Home
FCS (Focused Crawling Service) is a highly accessible web platform for so-called "focused crawling". This method of web crawling is driven by a single topic and tries to discover as many web pages relevant to that topic as possible. Unlike popular crawling approaches, which try to cover as large a subgraph of the Web as possible, this technique keeps the time and resources spent on crawling to a minimum.
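To illustrate the idea, here is a minimal sketch of a best-first focused crawl over a toy in-memory "web". It is not the FCS implementation: the function names (`keyword_relevance`, `focused_crawl`), the keyword-based scoring, the threshold, and the sample pages are all assumptions made for the example. The point it shows is that links are followed only from pages judged relevant, so off-topic regions of the graph are never fetched.

```python
import heapq


def keyword_relevance(text, topic_terms):
    """Hypothetical relevance score: fraction of topic terms found in the text."""
    words = set(text.lower().split())
    return sum(1 for term in topic_terms if term in words) / len(topic_terms)


def focused_crawl(pages, links, seeds, topic_terms, threshold=0.5, max_pages=10):
    """Best-first crawl over an in-memory toy web.

    pages: dict mapping URL -> page text (stands in for fetching a page)
    links: dict mapping URL -> list of outgoing URLs
    """
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    relevant = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        fetched += 1
        score = keyword_relevance(pages.get(url, ""), topic_terms)
        if score < threshold:
            continue  # off-topic page: do not follow its links
        relevant.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                # A link inherits the relevance of the page it was found on.
                heapq.heappush(frontier, (-score, target))
    return relevant


# Toy data (assumed for illustration only).
pages = {
    "a": "python web crawling tutorial",
    "b": "focused crawling of the web",
    "c": "cooking pasta recipes",
    "d": "web crawling frontier design",
    "e": "more pasta recipes",
}
links = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": []}

result = focused_crawl(pages, links, seeds=["a"], topic_terms=["web", "crawling"])
print(result)
```

In this toy graph, page "e" is reachable only through the off-topic page "c", so it is never fetched. That is where the saving over an exhaustive crawl comes from.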
The platform was designed to be available to a variety of clients working on different platforms and using different programming languages. It may be accessed either as a web application or via a REST API. Moreover, it supports customizable topic similarity methods (i.e. methods for evaluating how relevant a web page is to a given topic).
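As a rough illustration of what a pluggable topic similarity method could look like, the sketch below defines two interchangeable scoring functions and a registry to select between them. Everything here is hypothetical: the names `cosine_similarity`, `jaccard_similarity`, `SIMILARITY_METHODS`, and `score_page` are invented for the example and are not part of the FCS API.

```python
import math
from collections import Counter


def cosine_similarity(page_text, topic_text):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    a = Counter(page_text.lower().split())
    b = Counter(topic_text.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def jaccard_similarity(page_text, topic_text):
    """Jaccard similarity between the sets of distinct words."""
    a = set(page_text.lower().split())
    b = set(topic_text.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical registry: a client would choose a method by name.
SIMILARITY_METHODS = {
    "cosine": cosine_similarity,
    "jaccard": jaccard_similarity,
}


def score_page(page_text, topic_text, method="cosine"):
    """Score a page against a topic using the selected similarity method."""
    return SIMILARITY_METHODS[method](page_text, topic_text)


print(score_page("web crawling guide", "web crawling", method="jaccard"))
```

Keeping every method behind the same two-argument signature means the crawler does not need to know which one is in use.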
FCS is meant to be fully scalable and efficient regardless of the number of clients. Because the resources needed for crawling tasks vary dynamically with the number of clients and their requests, the platform can scale its resources to match current needs. During massive crawls, resources such as computing power, storage space, and bandwidth are exhausted very quickly, yet maintaining a large static installation can be extremely expensive. Moreover, such high demand occurs only periodically, so a static installation wastes resources during periods of low usage. The autoscaling mechanism keeps efficiency high irrespective of the number of crawling tasks and reduces running costs in periods of low demand. In the future, the included autoscaling features are intended to run on major cloud providers.
More detailed information about FCS may be found on the other Wiki pages.