Committer closed without sending any documents #40

Open
benzaita opened this issue Feb 10, 2020 · 1 comment

benzaita commented Feb 10, 2020

I have configured an Elasticsearch domain in AWS and verified it works by PUTting a document into it using curl.

However, when I run the http-collector configured with the elasticsearch-committer, the committer just closes without sending any documents or reporting any errors:

INFO  [AbstractCrawler] MyWebsite: Crawler finishing: committing documents.
INFO  [ElasticsearchCommitter] Elasticsearch RestClient closed.
INFO  [AbstractCrawler] MyWebsite: 4 reference(s) processed.
INFO  [CrawlerEventManager]          CRAWLER_FINISHED
INFO  [AbstractCrawler] MyWebsite: Crawler completed.
INFO  [AbstractCrawler] MyWebsite: Crawler executed in 12 seconds.
INFO  [SitemapStore] MyWebsite: Closing sitemap store...
INFO  [JobSuite] Running MyWebsite: END (Mon Feb 10 08:58:07 UTC 2020)

This line (INFO [ElasticsearchCommitter] Elasticsearch RestClient closed.) is the only output I get from the committer, which is configured as follows:

        <committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
            <nodes>https://hostname-in-aws</nodes>
            <indexName>mywebsite</indexName>
            <queueSize>1</queueSize>
            <commitBatchSize>1</commitBatchSize>
            <ignoreResponseErrors>false</ignoreResponseErrors>
        </committer>

How can I increase the log level? Or what could be the problem here?

@essiembre
Contributor

Look into the collector directory for a file called log4j.properties. You can use it to raise the log level.
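As a rough sketch of what raising the level might look like, assuming the file uses Log4j 1.x properties syntax: the logger name below is inferred from the committer class name in the config above and is an assumption, not something taken from the shipped file, so check it against the entries already in your log4j.properties.

```properties
# Assumption: Log4j 1.x syntax, where "log4j.logger.<package>=<LEVEL>"
# sets the level for every class under that package.
# The package name is taken from the committer class in the config above.
log4j.logger.com.norconex.committer.elasticsearch=DEBUG
```

If the committer is doing something before it closes, DEBUG output from that package is where it would be expected to show up.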

You are showing only the last part of your log. I am curious to see what shows up before it. Only 4 documents were processed. The logs should tell you whether they were rejected. For a document to make it to Elasticsearch, you should see log entries with DOCUMENT_COMMITTED_ADD in them. Do you see any?

If you only see REJECTED_... and you cannot figure it out, you can change the log level for those rejections to get more details explaining why they were rejected.
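One quick way to run that check over a full log is to count the event names it contains. The following is a minimal sketch, not part of the collector: the function name `summarize_events` and the log path passed on the command line are hypothetical, and it only assumes that event names appear in the log text as DOCUMENT_COMMITTED_ADD or with a REJECTED_ prefix, as described above.

```python
import re
import sys
from collections import Counter

# Matches the committed event named above, or any token starting with REJECTED_.
EVENT_PATTERN = re.compile(r"\b(DOCUMENT_COMMITTED_ADD|REJECTED_[A-Z_]+)\b")


def summarize_events(lines):
    """Count committed and rejected event names found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for event in EVENT_PATTERN.findall(line):
            counts[event] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical path): python summarize_events.py /path/to/collector.log
    if len(sys.argv) != 2:
        sys.exit("usage: summarize_events.py <log file>")
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        totals = summarize_events(log_file)
    if not totals:
        print("No DOCUMENT_COMMITTED_ADD or REJECTED_ events found.")
    for name, count in sorted(totals.items()):
        print(f"{name}: {count}")
```

If the summary shows no DOCUMENT_COMMITTED_ADD entries at all, nothing reached the committer's queue, which would be consistent with the committer closing without sending anything.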

If you cannot figure it out, please attach your config.
