Home
Julien Nioche edited this page Sep 20, 2016 · 26 revisions
- Introduction
- Configuration: how to configure the storm-crawler
- Registering Metadata for Serialization: If your topology doesn't extend ConfigurableTopology, you will need to manually register storm-crawler's Metadata class for serialization in Storm.
- FetcherBolt(s)
- Protocols: network protocols usable in storm-crawler
- JSoupParserBolt: parse HTML documents
- SiteMapParserBolt: how to handle sitemaps
- URLFilters: how to filter or normalise outlinks
- ParseFilters: extract metadata from documents
- IndexingBolts