
Benchmarks #200

Open · wants to merge 4 commits into master
Conversation

paualarco (Member) commented Jun 1, 2020

WIP
This PR aims to add benchmarking for monix-kafka. #116
The approach is to spin up a Kafka cluster using Docker containers; for the moment only one broker, but more might be added later.
The plan/strategy for this benchmark is explained in more detail in the readme.md within the benchmarks subproject.

I guess it would also be cool to have benchmarks for previousVersion and nextVersion in order to compare them, but maybe that should go in a different PR?

Avasil (Collaborator) commented Jun 1, 2020

Awesome, I'm very happy to see it!
Not sure how reliable these benchmarks will be, but hopefully they are good enough to spot any noticeable regressions or areas for improvement (when compared to other libraries).

@paualarco paualarco marked this pull request as draft June 2, 2020 13:58
@paualarco paualarco marked this pull request as ready for review September 27, 2020 10:04
paualarco (Member, Author) commented

@Avasil I have made some progress on this one ☝️

Finally, the benchmarks are structured as follows:

Consumer Observable

  • Topic with partitioning of 1 and 2.
  • Manual Commit
  • AutoCommit (Async and Sync)

Single Producer

  • Synchronous, for topics with partitioning of 1 and 2

Sink Producer

  • Topic with partitioning of 1 and 2.
  • Parallelism of 100
  • Single threaded with no parallelism

If you think other scenarios should be contemplated, please suggest :)
It would also be good in the future to compare with Kafka integrations from other reactive stream libraries.
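For reference, a single-producer scenario of the kind listed above could be sketched with JMH roughly as follows. This is a minimal sketch, not the code in this PR; the topic name, broker address, and benchmark parameters are assumptions, and it needs a running single-broker cluster (e.g. the Docker one mentioned earlier):

```scala
import java.util.concurrent.TimeUnit

import monix.execution.Scheduler.Implicits.global
import monix.kafka.{KafkaProducer, KafkaProducerConfig}
import org.openjdk.jmh.annotations._

@State(Scope.Thread)
@BenchmarkMode(Array(Mode.Throughput))
@OutputTimeUnit(TimeUnit.SECONDS)
class SingleProducerBenchmark {

  // Assumes a broker reachable on localhost; adjust to the Docker setup
  val producerCfg: KafkaProducerConfig = KafkaProducerConfig.default.copy(
    bootstrapServers = List("127.0.0.1:9092")
  )

  // Hypothetical topic, created beforehand with 1 (or 2) partitions
  val topic = "benchmark-topic-1p"

  val producer: KafkaProducer[String, String] =
    KafkaProducer[String, String](producerCfg, global)

  @Benchmark
  def syncProduce(): Unit =
    // Sends one record and blocks until the broker acknowledges it
    producer.send(topic, "message").runSyncUnsafe()

  @TearDown
  def tearDown(): Unit =
    producer.close().runSyncUnsafe()
}
```

With sbt-jmh this would be run via `jmh:run`, and JMH takes care of warmup and measurement iterations.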

Avasil (Collaborator) commented Sep 27, 2020

Fantastic, I hope I can finally find some time to update the library 😅

> If you think other scenarios should be contemplated, please suggest :)
> It would also be good in the future to compare with Kafka integrations from other reactive stream libraries.

I remember this blog post from a while ago. We could do something similar, that is, compare against the plain Kafka producer/consumer to see what kind of overhead we introduce.

paualarco (Member, Author) commented

Writing a blog post would be nice; we could even include it in the web docs once they get merged.
But probably most important now is to keep the library updated. On that side, is there something I could do to help with ongoing maintenance?

Avasil (Collaborator) commented Sep 27, 2020

> Writing a blog post would be nice; we could even include it in the web docs once they get merged.

I meant it more as a benchmark scenario. I'm hesitant to advertise benchmarks here because the inaccuracy / error is very high (e.g. 49.599 ± 12.737) and, well, the library really needs an update :D

> On that side, is there something I could do to help with ongoing maintenance?

It would be awesome to pick up and fix the issue described in #104.
I've been procrastinating on it for a year now. :)

Some stuff is on me, like checking out your docs PR.

Other than that, I would probably release a version (1.0.0?) that is similar to the current master, then remove all version-specific modules, maybe do some refactoring around Serializers, returning Consumer as a Resource, etc., and release 2.0.0 soon after. But that's more "mid-term".

paualarco (Member, Author) commented

> I'm hesitant to advertise benchmarks here because the inaccuracy / error is very high (e.g. 49.599 ± 12.737) and, well, the library really needs an update :D

Yup, the error is quite high. Do you know whether it is possible to replicate the benchmarks the blog post exposes using JMH?

They seem quite general, so in order to know how many elements were produced/consumed from the Kafka topics, we could maybe just run the same scenarios as in the tests, but with an unlimited number of elements and a timeout limit (each test in a separate topic), and then count them.
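The count-in-a-time-window idea could look roughly like this with monix-kafka. This is a sketch under assumed names: the config values, topic name, and 30-second window are illustrative, and a running broker is required:

```scala
import scala.concurrent.duration._

import monix.execution.Scheduler.Implicits.global
import monix.kafka.{KafkaConsumerConfig, KafkaConsumerObservable}

// Hypothetical consumer settings; assumes a broker on 127.0.0.1:9092
val consumerCfg: KafkaConsumerConfig = KafkaConsumerConfig.default.copy(
  bootstrapServers = List("127.0.0.1:9092"),
  groupId = "throughput-count"
)

// Consume from one topic for a fixed time window, then count the records
val consumedInWindow: Long =
  KafkaConsumerObservable[String, String](consumerCfg, List("benchmark-topic"))
    .takeByTimespan(30.seconds)
    .countL
    .runSyncUnsafe()
```

Running the same window against each library's consumer (and against the plain Kafka client) would give comparable per-window counts without needing JMH for this particular measurement.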

Avasil (Collaborator) commented Sep 27, 2020

> Yup, the error is quite high. Do you know whether it is possible to replicate the benchmarks the blog post exposes using JMH?

They probably have (or had) the benchmark there: https://github.com/akka/alpakka-kafka/tree/master/benchmarks/src

We don't have to do it exactly the same way.

> They seem quite general, so in order to know how many elements were produced/consumed from the Kafka topics, we could maybe just run the same scenarios as in the tests, but with an unlimited number of elements and a timeout limit (each test in a separate topic), and then count them.

I don't think we need to provide a msg/s result; I think it's more important to see what kind of overhead we introduce.
If our result is 80% of the plain Kafka result, then that's valuable information. And if, let's say, another library gets 90%, then we know there is room for improvement.
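Concretely, the comparison boils down to a simple ratio against the plain-client baseline. The numbers below are made up purely to illustrate the 80%/90% example from the comment:

```scala
// Hypothetical throughput figures from the same scenario, in msg/s
val plainKafka = 100000.0 // baseline: plain Kafka producer/consumer
val monixKafka = 80000.0  // monix-kafka wrapper
val otherLib   = 90000.0  // some other reactive Kafka integration

// Relative throughput: monix-kafka keeps 80% of the baseline while the
// other library keeps 90%, so there is measurable room for improvement
val monixRatio = monixKafka / plainKafka // 0.8
val otherRatio = otherLib / plainKafka   // 0.9
```

A ratio like this is also less sensitive to the absolute noise of the benchmark environment than raw msg/s figures, since both measurements run on the same hardware.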

Commits

  • Block
  • Fix
  • Binded connection
  • First benchmark for kafka producer
  • Added kafka benchmark strategy plan
  • Added sink and consumer benchmarks
  • Producer results
  • Akka
  • Removed references to akka
  • a
  • Final