Running committer without running collector again #16

Open
hardreddata opened this issue Dec 3, 2023 · 3 comments
Comments

@hardreddata

Hi,

I ran the collector and have the working folders here. The SQL commit failed as the database was down.

Is it possible to just re-run the committer part without rerunning the collector? The collector took quite a bit of time.

Many thanks.

@sakanaosama

sakanaosama commented Dec 8, 2023

Hi,

If you're using version 3.x or later, here's what we can do:

  1. Enable "commitLeftoversOnInit" in the configuration (default is false).
    https://opensource.norconex.com/committers/sql/v3/apidocs/com/norconex/committer/sql/SQLCommitter.html
  2. Change maxDocuments to 0 to avoid fetching further new documents.
  3. Find stored error indexes in the "error" directory, as shown below:
       workdir
       ...
       > queue
       > error
         >> batch-xxxxxxx
           >>> failed-index
  4. Move the error index to the "queue" folder:
       workdir
       > queue
         >> batch-xxxxxxx
           >>> failed-index
       > error
  5. Restart the crawler, retaining the previous crawling status (using the working directory).
    Creating a backup of the working folders and testing in a non-production environment is recommended.
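
The move from "error" to "queue" can also be scripted. The following is only a minimal sketch, not part of the Norconex tooling: it assumes the layout shown above ("queue" and "error" directly under the working directory, one folder per failed batch), and the function name `requeue_failed_batches` and its `dry_run` flag are made up for illustration. It defaults to a dry run so nothing moves until the plan has been reviewed.

```python
import shutil
from pathlib import Path


def requeue_failed_batches(workdir, dry_run=True):
    """Move each batch folder from <workdir>/error to <workdir>/queue.

    Assumes the directory layout described in this thread; the names
    "error" and "queue" come from the thread, everything else here is
    a hypothetical helper. Returns the (source, target) pairs that
    were moved, or that would be moved when dry_run is True.
    """
    error_dir = Path(workdir) / "error"
    queue_dir = Path(workdir) / "queue"
    moves = []

    if not error_dir.is_dir():
        # No failed batches, or the layout differs from the assumption.
        return moves

    queue_dir.mkdir(parents=True, exist_ok=True)

    for batch in sorted(error_dir.iterdir()):
        if not batch.is_dir():
            continue  # only whole batch folders are moved
        target = queue_dir / batch.name
        if target.exists():
            continue  # never overwrite a batch that is already queued
        moves.append((batch, target))
        if not dry_run:
            shutil.move(str(batch), str(target))

    return moves
```

Calling `requeue_failed_batches(workdir)` with the path to the working directory lists what would be moved; passing `dry_run=False` performs the move. Back up the working folders first, as recommended above.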

Ryan Ng

@hardreddata
Author

Thanks for the advice.

I am still running the older 2.9.x version. If there is no solution for that version, I will just re-crawl from the start over the holiday period.

@sakanaosama

Regrettably, this feature is only available from version 3.x onward. It may be time to consider an upgrade. I will proceed to close this ticket.

Thank you,
Ryan Ng
