Committer queue is not fully processed #16

Open
jsteggink opened this issue May 21, 2018 · 10 comments

Comments

@jsteggink

The committer queue is not fully processed because each commit is capped by the queueSize property. Since the queue can grow larger than queueSize, and it is only processed when a commit is called, the file queue keeps growing.
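To make the mechanism concrete, here is a hypothetical Java simulation. The class and method names are made up and it does not use the committer-core classes; it only assumes that documents arrive faster than one capped commit can remove them, so every cycle leaves a few behind and the final capped commit cannot catch up.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the reported behaviour, NOT the actual
// AbstractFileQueueCommitter code: every commit sends at most queueSize
// documents, even when more than that are waiting in the queue.
class CappedQueueSimulation {

    /** Returns how many documents are still queued once the crawl is over. */
    static int simulate(int totalDocs, int queueSize, int addedPerCycle) {
        Deque<Integer> queue = new ArrayDeque<>();
        int added = 0;
        while (added < totalDocs) {
            // Documents keep arriving, e.g. from several crawler threads...
            for (int i = 0; i < addedPerCycle && added < totalDocs; i++) {
                queue.add(added++);
            }
            // ...but a commit is only triggered once queueSize is reached,
            // and then it removes just queueSize documents.
            if (queue.size() >= queueSize) {
                for (int i = 0; i < queueSize; i++) {
                    queue.poll();
                }
            }
        }
        // The final commit at the end of the crawl is capped as well.
        for (int i = 0; i < queueSize && !queue.isEmpty(); i++) {
            queue.poll();
        }
        return queue.size();
    }

    public static void main(String[] args) {
        System.out.println(simulate(1000, 10, 15)); // prints 320
        System.out.println(simulate(1000, 10, 10)); // prints 0
    }
}
```

With 15 documents arriving per cycle and only 10 committed, the backlog grows by 5 per cycle; when arrivals match the cap, nothing is left over.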

https://github.com/Norconex/committer-core/blob/master/norconex-committer-core/src/main/java/com/norconex/committer/core/AbstractFileQueueCommitter.java#L175

@essiembre
Contributor

When using AbstractFileQueueCommitter directly, commit will only be called at the end, like you mention, unless you call it yourself more frequently. If you want it to be called after every X documents, have a look at the AbstractBatchCommitter subclass, which does it for you.
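A rough illustration of what "commit after X documents" means. This is not the AbstractBatchCommitter API; the class and method names are invented for the sketch. Each add() triggers a commit once a full batch is pending, and one last commit() flushes the remainder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the batching idea; names are illustrative and
// are not the Norconex AbstractBatchCommitter API.
class BatchingCommitterSketch {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private final List<List<String>> sentBatches = new ArrayList<>();

    BatchingCommitterSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Queues a document and commits automatically once a batch is full. */
    synchronized void add(String docId) {
        pending.add(docId);
        if (pending.size() >= batchSize) {
            commit();
        }
    }

    /** Sends everything pending; must be called once more at the end of the crawl. */
    synchronized void commit() {
        if (!pending.isEmpty()) {
            sentBatches.add(new ArrayList<>(pending)); // stand-in for the real send
            pending.clear();
        }
    }

    synchronized List<List<String>> sentBatches() {
        return new ArrayList<>(sentBatches);
    }

    public static void main(String[] args) {
        BatchingCommitterSketch committer = new BatchingCommitterSketch(3);
        for (int i = 1; i <= 7; i++) {
            committer.add("doc" + i);
        }
        committer.commit(); // final flush of the leftover document
        System.out.println(committer.sentBatches());
        // prints [[doc1, doc2, doc3], [doc4, doc5, doc6], [doc7]]
    }
}
```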

Does that address your issue?

@jsteggink
Author

Thanks for your reply, Pascal. I came across this issue while using the Solr Committer. It builds on AbstractFileQueueCommitter (from Committer Core), which is why I'm posting the issue here.

The commit is also called by commitIfReady (AbstractCommitter), which in turn is called by the add and remove methods in AbstractCommitter. commitIfReady also checks the queue size, which means that in practice commit is only run from there. However, since commitIfReady() and commit() use the same queue size, and these methods can be called concurrently by multiple threads, the queue keeps growing: items are added to it while a commit is still being processed. A quick solution would be to remove the queue limit in the commit() method.
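The suggested quick fix, letting commit() drain the whole queue rather than a single queueSize-sized slice, could look roughly like this. It is a hypothetical in-memory version (a ConcurrentLinkedQueue instead of files on disk, invented class names), with several threads adding documents while commits are running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the suggested fix, assuming an in-memory queue in place of the
// real file queue: commit() keeps sending batches until the queue is empty
// instead of stopping after a single queueSize-sized batch.
class DrainingCommitter {
    private final Queue<String> queue = new ConcurrentLinkedQueue<>();
    private final int batchSize;
    private final AtomicInteger committed = new AtomicInteger();

    DrainingCommitter(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Called concurrently by crawler threads, similar in spirit to add() + commitIfReady(). */
    void add(String doc) {
        queue.add(doc);
        if (queue.size() >= batchSize) { // size() is O(n) here; fine for a sketch
            commit();
        }
    }

    /** Drains everything, including documents added while this commit is running. */
    synchronized void commit() {
        List<String> batch = new ArrayList<>(batchSize);
        String doc;
        while ((doc = queue.poll()) != null) {
            batch.add(doc);
            if (batch.size() == batchSize) {
                send(batch);
            }
        }
        send(batch); // leftover partial batch, if any
    }

    private void send(List<String> batch) {
        committed.addAndGet(batch.size()); // stand-in for the real Solr request
        batch.clear();
    }

    int committedCount() {
        return committed.get();
    }

    /** Runs several producer threads, then a final commit; returns how many docs were committed. */
    static int run(int threads, int docsPerThread, int batchSize) throws InterruptedException {
        DrainingCommitter committer = new DrainingCommitter(batchSize);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < docsPerThread; i++) {
                    committer.add("doc-" + id + "-" + i);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        committer.commit(); // final commit at the end of the crawl
        return committer.committedCount();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 2500, 100)); // prints 10000
    }
}
```

Because poll() is atomic and commit() is synchronized, each document is sent exactly once; the trade-off is that producer threads calling add() can block while a long drain is in progress.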

A lot happens in the commit() method, mainly because it rereads the complete file-queue directory and then iterates over filesToCommit.
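One way to avoid those directory rescans, sketched under assumptions (invented class name, one file per document, an in-memory index of queued paths): list the directory only once at startup to recover leftovers, and let commits take paths from the index. A real implementation would also need to requeue a batch whose send fails.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.Stream;

// Hypothetical sketch (not the committer-core file layout): documents are still
// persisted as files, but their paths are also kept in an in-memory index so a
// commit never has to list the whole queue directory again.
class IndexedFileQueue {
    private final Path dir;
    private final Queue<Path> index = new ConcurrentLinkedQueue<>();

    IndexedFileQueue(Path dir) throws IOException {
        this.dir = dir;
        // A single directory scan at startup recovers files left by a previous run.
        try (Stream<Path> existing = Files.list(dir)) {
            existing.filter(Files::isRegularFile).sorted().forEach(index::add);
        }
    }

    /** Writes the document to disk first, then makes it visible to committers. */
    void add(String content) throws IOException {
        Path file = Files.createTempFile(dir, "doc-", ".queued"); // unique file name
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
        index.add(file);
    }

    /** Takes up to max paths from the in-memory index: no directory listing needed. */
    List<Path> nextBatch(int max) {
        List<Path> batch = new ArrayList<>();
        Path next;
        while (batch.size() < max && (next = index.poll()) != null) {
            batch.add(next);
        }
        return batch;
    }

    /** Deletes the files of a batch once it has been sent successfully. */
    void acknowledge(List<Path> batch) throws IOException {
        for (Path file : batch) {
            Files.deleteIfExists(file);
        }
    }

    int pending() {
        return index.size();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("committer-queue-sketch");
        IndexedFileQueue queue = new IndexedFileQueue(dir);
        for (int i = 0; i < 5; i++) {
            queue.add("document " + i);
        }
        List<Path> batch = queue.nextBatch(3);
        queue.acknowledge(batch); // pretend the batch was sent
        System.out.println(batch.size() + " sent, " + queue.pending() + " pending"); // prints "3 sent, 2 pending"
        queue.acknowledge(queue.nextBatch(10));
        Files.delete(dir); // succeeds because every queued file was acknowledged
    }
}
```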

I would suggest building a more robust commit queue, perhaps event-based. Something like RxJava could help make committer-core more pluggable, so people can implement their own queues.
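RxJava and Reactor are external libraries, so as a dependency-free stand-in, here is a hedged sketch using the JDK's own Reactive Streams interfaces (java.util.concurrent.Flow and SubmissionPublisher, Java 9+). The class names are hypothetical. The committer pulls documents at its own pace (backpressure: the publisher's submit() blocks when the subscriber falls behind) and flushes the remainder on completion, so nothing is left queued.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

// Hypothetical event-driven committer built on java.util.concurrent.Flow
// (a swapped-in stand-in for RxJava/Reactor, not committer-core code).
class ReactiveCommitSketch {

    /** Requests documents one batch at a time and commits each full batch. */
    static class BatchCommitSubscriber implements Flow.Subscriber<String> {
        private final int batchSize;
        private final List<String> buffer = new ArrayList<>();
        final List<Integer> committedBatchSizes = new ArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        BatchCommitSubscriber(int batchSize) {
            this.batchSize = batchSize;
        }

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(batchSize); // demand exactly one batch worth of documents
        }

        @Override public void onNext(String doc) {
            buffer.add(doc);
            if (buffer.size() == batchSize) {
                flush();
                subscription.request(batchSize); // ready for the next batch
            }
        }

        @Override public void onError(Throwable t) {
            done.countDown();
        }

        @Override public void onComplete() {
            flush(); // commit the final partial batch
            done.countDown();
        }

        private void flush() {
            if (!buffer.isEmpty()) {
                committedBatchSizes.add(buffer.size()); // stand-in for the real send
                buffer.clear();
            }
        }
    }

    /** Publishes docs documents and returns the sizes of the committed batches. */
    static List<Integer> run(int docs, int batchSize) throws InterruptedException {
        BatchCommitSubscriber sub = new BatchCommitSubscriber(batchSize);
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(sub);
            for (int i = 0; i < docs; i++) {
                publisher.submit("doc" + i); // blocks while the subscriber's buffer is full
            }
        } // close() signals onComplete once the submitted items are delivered
        if (!sub.done.await(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("committer did not finish in time");
        }
        return sub.committedBatchSizes;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(10, 3)); // prints [3, 3, 3, 1]
    }
}
```

Swapping in a different queue (Kafka, RabbitMQ, a persistent store) would then mostly mean providing a different Publisher, while the committing Subscriber stays the same.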

What do you think?

@essiembre
Contributor

If it can't keep up right now, you may unfortunately have to slow it down. I agree the queue could be improved, and I am already sold on the idea of being able to supply your own queue. I am marking this as a feature request.

The Committers will be seriously revisited in the next major version, and something like RxJava will be considered. Have you used RxJava in any projects yourself? Any examples?

@jsteggink
Author

I have some experience with Reactor, another Reactive Streams framework, which is also used by the Spring framework. It would be my first choice, as it targets Java 8 and integrates easily with Kafka and RabbitMQ.

@truezjz

truezjz commented Jun 27, 2019

Hi Pascal,

I'm also facing this issue: after crawling 174K files, 12,000 files were left unprocessed in the committer-queue folder.
I understand the issue will be addressed in the next version; is there a workaround for the time being?

@jsteggink
Author

I fixed it last year. I can put together a pull request tomorrow.

@truezjz

truezjz commented Jun 27, 2019

Thanks for the prompt response, Jeroen, I will check it out. That will be in committer-core, right?

@jsteggink
Author

Yes, it's in committer-core. I need a little extra time to write some unit tests, and I have some changes in my fork that I haven't committed yet. In the meantime, you can take a look at what I did: https://github.com/jsteggink/committer-core
I have added a reactive committer and a persistent queue based on RocksDB. This makes committer-core more stable, way faster and potentially more scalable. In the future, different persistent queue implementations could be added.

@truezjz

truezjz commented Jun 28, 2019

Hi Jeroen, thanks for the update. As you mentioned, the current code in your fork is not your final submission; I'm looking forward to it.
Meanwhile, as you mentioned, a temporary solution is to set the queue size in the committer to an unlimited number, but the disadvantage is that it needs a lot of storage to hold the committer queue, correct?

@truezjz

truezjz commented Aug 20, 2019

@jsteggink Jeroen, any update? If needed, I can help with testing.
