Home
Andrés Aguilar-Umana edited this page Jul 24, 2017 · 7 revisions
The Heritrix Connector is designed to be used with the Aspire Content Processing system.
It's a web crawler (obviously!) based on the Heritrix engine. It accepts a number of seed URLs (and all the other usual Heritrix parameters), starts the Heritrix engine, and passes Heritrix a job that performs all the hard work.
URLs found by Heritrix are passed to the Aspire Content Processing system as adds. The connector handles deletes through settings that allow a number of iterations, or a number of days since a URL was last seen, to pass before the URL is sent to Aspire as a delete.
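The delete handling described above can be pictured with a small sketch. This is not the connector's actual code (the connector itself runs inside Aspire). It is a minimal Python illustration that assumes a URL becomes a delete as soon as either threshold is reached. The names `UrlRecord`, `should_delete`, `urls_to_delete`, `max_missed_crawls` and `max_days_unseen` are hypothetical and do not correspond to real connector settings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Set


@dataclass
class UrlRecord:
    """Hypothetical bookkeeping entry for a URL found by an earlier crawl."""
    url: str
    last_seen_at: datetime
    missed_crawls: int = 0  # consecutive crawl iterations in which the URL was not found


def should_delete(record: UrlRecord, now: datetime,
                  max_missed_crawls: Optional[int] = None,
                  max_days_unseen: Optional[int] = None) -> bool:
    """Return True once either configured threshold has been reached (assumed semantics)."""
    if max_missed_crawls is not None and record.missed_crawls >= max_missed_crawls:
        return True
    if max_days_unseen is not None and now - record.last_seen_at >= timedelta(days=max_days_unseen):
        return True
    return False


def urls_to_delete(records: Iterable[UrlRecord], seen_urls: Set[str], now: datetime,
                   max_missed_crawls: Optional[int] = None,
                   max_days_unseen: Optional[int] = None) -> List[str]:
    """Update the records after one crawl iteration and list the URLs to send as deletes."""
    deletes = []
    for record in records:
        if record.url in seen_urls:
            # Found again in this crawl: reset both counters.
            record.last_seen_at = now
            record.missed_crawls = 0
            continue
        # Not found in this crawl: count the miss, then check the thresholds.
        record.missed_crawls += 1
        if should_delete(record, now, max_missed_crawls, max_days_unseen):
            deletes.append(record.url)
    return deletes
```

In this sketch a URL that briefly disappears (for example, because a site was down during one crawl) is not deleted immediately, which is the point of waiting for several iterations or days before sending the delete.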
In order to use this connector "as is", you'll need to download and install the Aspire Content Processing system.
Downloading and Installing Aspire