Improve performance in sync client #446
Comments
Hey, what are your cluster's index settings? I would expect this with default cluster settings, as shards in an index are thread-bound as well. A default index will have 3 shards, and that's the maximum number of threads that can be used for ingestion.
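If shard count is the suspected limit, one way to rule it out is to create the test index with more primary shards. The sketch below only builds the request body. `number_of_shards` and `number_of_replicas` are standard OpenSearch index settings. The helper name `build_index_body` and the values are illustrative assumptions. With opensearch-py, the body would be passed to `client.indices.create(index=..., body=...)`.

```python
import json


def build_index_body(shards: int, replicas: int = 1) -> dict:
    """Build a hypothetical index-creation body with an explicit shard count.

    More primary shards let more shard-level work proceed in parallel on the
    cluster; the right number depends on data size and node count.
    """
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


# Example: a body for an index with 6 primaries and 1 replica each.
body = build_index_body(6)
print(json.dumps(body, indent=2))
# With a live cluster and opensearch-py installed (not run here):
# client.indices.create(index="my-test-index", body=body)
```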
Are you spawning threads and creating instances of the client? Can you please post a repro? Is your actual usage of the client sync or async? I do think that a client should be able to push the server into a state where it starts returning 429s.
@dblock @dtaivpp, I checked the cluster, and CPU utilization did not go above 60% even at the maximum thread count. That is why I think the problem is that the client is not able to push more requests. I also tried changing pool_max_size from 10 to 20, and even then there was no increase in the number of requests that could be sent to the cluster.
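Whether a larger pool helps depends on whether the pool is actually the bottleneck. Suppose each in-flight request must hold one of a fixed number of pooled connections. Then throughput cannot exceed roughly pool_size / latency, however many threads are started.

The sketch below models that situation. A `threading.BoundedSemaphore` stands in for the connection pool and `time.sleep` stands in for a request. It is a model under those assumptions, not opensearch-py's actual pooling code.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_pool(pool_size: int, threads: int, requests: int, latency: float) -> float:
    """Return the wall-clock seconds needed to finish `requests` simulated calls.

    `pool_size` models the connection pool: at most that many requests can be
    in flight at once, no matter how many worker threads are started.
    """
    pool = threading.BoundedSemaphore(pool_size)

    def fake_request() -> None:
        with pool:               # borrow a "connection" (blocks if none is free)
            time.sleep(latency)  # simulated network round trip

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as executor:
        for _ in range(requests):
            executor.submit(fake_request)
    # Leaving the `with` block waits for all submitted tasks to finish.
    return time.perf_counter() - start


if __name__ == "__main__":
    for pool_size in (4, 16):
        elapsed = run_with_pool(pool_size, threads=32, requests=64, latency=0.02)
        print(f"pool={pool_size:2d} threads=32 -> {64 / elapsed:.0f} req/s")
```

If raising the pool size changes nothing, as reported here, the limit is probably somewhere other than the pool.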
Are you using a sync client or an async client? My guess is that you're using a sync client. I wrote a benchmark that compares synchronous, threaded and asynchronous I/O in opensearch-py. I was not able to push either the synchronous or the threaded version of the client past a certain limit, no matter the amount of … Sync vs. threaded inserts make no difference. For 5x50 inserts, async outperforms sync 1.4x.
5x250 items: 2.5x improvement
5x1000 items: 3.5x improvement
We may be able to improve the synchronous client's insert performance, but I don't (yet) know how.
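The shape of that comparison can be reproduced without a cluster. The sketch below times the same number of simulated requests three ways:

- sequentially;
- through a thread pool;
- with `asyncio.gather`.

The fixed `LATENCY` sleep is an assumed stand-in for a network round trip. The ratios it prints therefore reflect only overlapped waiting, not real client overhead or opensearch-py behavior.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.01  # assumed per-request round trip, in seconds


def sync_request(_: int = 0) -> int:
    time.sleep(LATENCY)  # blocking I/O stand-in: the calling thread waits
    return 1


async def async_request() -> int:
    await asyncio.sleep(LATENCY)  # non-blocking: the event loop runs other tasks
    return 1


def run_sequential(n: int) -> float:
    """One request at a time, like a plain sync client in a single thread."""
    start = time.perf_counter()
    for i in range(n):
        sync_request(i)
    return time.perf_counter() - start


def run_threaded(n: int, workers: int) -> float:
    """Blocking requests spread over a thread pool."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as executor:
        list(executor.map(sync_request, range(n)))
    return time.perf_counter() - start


def run_async(n: int) -> float:
    """All requests awaited concurrently on one event loop."""
    async def main() -> None:
        await asyncio.gather(*(async_request() for _ in range(n)))

    start = time.perf_counter()
    asyncio.run(main())
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 50
    base = run_sequential(n)
    print(f"threaded (8 workers): {base / run_threaded(n, workers=8):.1f}x vs sequential")
    print(f"async:                {base / run_async(n):.1f}x vs sequential")
```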
Unfortunately not, and it's by design. The default …
@dblock can we use some other implementation of HttpConnection? Generally we use …
@dblock here is my thinking: as a user of the OpenSearch client, I find the sync client easy to use, and most other services offer the same thing. But if we cannot use threading with a simple sync client, that's a bummer, because the async client is not easy to use.
Yes it does. That one does allow changing …
What are other examples of open-source clients that talk HTTP and don't exhibit this kind of problem? PS: I did find https://number1.co.za/how-to-speed-up-http-calls-in-python-with-examples/, which claims that requests-futures can get close to async performance. That could be an idea for a reimplementation, but it ain't a small effort.
@dblock can we check Milvus?
Seems to be using gRPC, a whole other ball game.
Let me check whether I can find any other clients that are not gRPC-based. This seems to be a standard problem, and there should be a fix for it.
Can you run multiple Python processes with multiple clients? There is also bulk, which helps max out the connection.
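For ingestion, the `_bulk` API spreads one HTTP round trip over many documents. Its body is newline-delimited JSON: an action line followed by a source line for each document, ending with a trailing newline.

The sketch builds such a body with the standard library only. The index name and documents are made up. In opensearch-py, `helpers.bulk` or `client.bulk(body=...)` would normally be used instead.

```python
import json
from typing import Iterable


def build_bulk_body(index: str, docs: Iterable[dict]) -> str:
    """Return an NDJSON `_bulk` body that indexes each doc into `index`."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # source line
    return "\n".join(lines) + "\n"  # the bulk API requires a final newline


body = build_bulk_body("my-test-index", [{"title": "a"}, {"title": "b"}])
print(body, end="")
```

This only helps ingestion. As the next comment notes, it does not apply to query traffic.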
@wbeckler No, we don't want to run multiple clients, as that defeats the purpose of what we are trying to benchmark. Also, the requests we are trying to send are queries, not indexing, so bulk is not an option.
Thanks @navneet1v. I do think the problem is clear: we want the client to not block on I/O across requests.
Yes, that is correct.
Is it okay to use Python multiprocessing instead of multithreading in your use case?
@wbeckler we can use that, but the sync client is easy to use and is already integrated into various ANN benchmarks, which is why we want to keep things simple.
I'm suggesting that the benchmarking or high-volume query application rely on multiprocessing, so that the Python client would not be blocking on I/O. So instead of multithread: {do python client stuff}, you would have multiprocess: {do python client stuff}. I am not a Python expert, so I might be way off here, but that's how it seems it should work from my naive perspective.
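The multiprocessing suggestion can be sketched with `concurrent.futures.ProcessPoolExecutor`. Its `initializer` lets each worker process build its own client once and then reuse it, since clients hold sockets and should not be shared across processes.

Assumptions in this sketch:

- `FakeClient` is a hypothetical stand-in for an opensearch-py `OpenSearch` instance. A real client would be constructed in `_init_worker`.
- The `fork` start method keeps the example runnable as a plain script on Linux and macOS. With `spawn`, the pool must be created under an `if __name__ == "__main__":` guard.

```python
import multiprocessing
import os
import time
from concurrent.futures import ProcessPoolExecutor


class FakeClient:
    """Hypothetical stand-in for a real search client."""

    def search(self, query: str) -> dict:
        time.sleep(0.01)  # simulated round trip
        return {"query": query, "pid": os.getpid()}


_client = None  # one client per worker process, set by the initializer


def _init_worker() -> None:
    global _client
    _client = FakeClient()  # a real OpenSearch(...) client would go here


def run_query(query: str) -> dict:
    return _client.search(query)


def run_in_processes(queries: list, processes: int) -> list:
    ctx = multiprocessing.get_context("fork")  # POSIX-only assumption
    with ProcessPoolExecutor(max_workers=processes, mp_context=ctx,
                             initializer=_init_worker) as executor:
        return list(executor.map(run_query, queries))  # preserves input order


if __name__ == "__main__":
    results = run_in_processes([f"q{i}" for i in range(8)], processes=4)
    print(sorted({r["pid"] for r in results}))  # PIDs of the worker processes
```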
I am going to take another look at this; assigning to myself.
Updated benchmarks in https://github.com/dblock/opensearch-py/tree/benchmarks/samples/benchmarks.
We are getting an improvement by using threads (1.7x with 2, 5.1x with 8, with diminishing returns beyond that: 5.5x with 32). If we set the …
Sync vs. async: async throughput really just depends on the pool size (3 iterations).
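Why async throughput tracks pool size can be modeled with an `asyncio.Semaphore` playing the role of the HTTP connection limit. In aiohttp, for instance, that limit is the `TCPConnector` `limit`.

The latency value and function names here are assumptions for illustration. Only the shape of the result carries over: more slots mean more overlapped waits.

```python
import asyncio
import time


async def run_async_with_pool(pool_size: int, requests: int, latency: float) -> float:
    """Time `requests` concurrent simulated calls limited to `pool_size` in flight."""
    pool = asyncio.Semaphore(pool_size)

    async def fake_request() -> None:
        async with pool:                  # wait for a free "connection"
            await asyncio.sleep(latency)  # simulated round trip

    start = time.perf_counter()
    await asyncio.gather(*(fake_request() for _ in range(requests)))
    return time.perf_counter() - start


if __name__ == "__main__":
    for size in (1, 5, 25):
        elapsed = asyncio.run(run_async_with_pool(size, requests=50, latency=0.01))
        print(f"pool={size:2d}: {50 / elapsed:.0f} req/s")
```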
I've spent a lot of time on the synchronous client, writing multiple benchmarks and digging through the code to understand how pooling works. I've varied the number of threads, the number of clients, search vs. ingestion, and data sizes, and switched between the default … The simplest benchmark is to do …
This does show that … Overall, these benchmarks pretty conclusively show that the client does not have the limitation described, and we need to dig deeper into how it's being used to find the bottleneck.
@dblock Thanks for doing the deep-dive. Most of the time, as users, we go by the examples. If the connection class is the only problem, we should just fix the examples and our documentation. Also, can we mark RequestsHttpConnection as deprecated and document that using this connection class can degrade performance?
In #535 I documented the connection pools and made sure … I don't think we should deprecate …
Makes sense. My whole idea was that we should provide our recommendation for best performance, and also describe the challenges of using other connection libraries.
It's great to see the benchmarks for this client! @dblock Is there any way we can make these benchmarks part of this repo or part of opensearch-benchmark?
I renamed the issue to "improve performance in sync client". @navneet1v I know you've been running some anecdotal benchmarks with k-NN. What are your most recent numbers for RequestsHttpConnection, Urllib3HttpConnection, multiple clients and async?
@dblock Here are the results.
With RequestsHttpConnection:
Multi-threading
Multi-processing
With Urllib3HttpConnection:
Multi-threading
Multi-processing
Thanks, this is helpful as a comparison of multithreading vs. multiprocessing and thank you for the conclusions, I agree with them. Here's a more readable version.
tl;dr: with urllib3 we're seeing a 30% improvement over where this issue started.

We can conclude that urllib3 is 14% better than requests in pure transport. If we are doing more work, urllib3 performs 30% better than requests (likely because it has less blocking behavior). Finally, because of additional processing, multithreading is roughly 2x slower than multiprocessing with urllib3.

I think we need to move to proper benchmarking from here, with the goal of bringing the 156ms p99 in multithreading above closer to the 84.30ms of multiprocessing. We have 3 kinds of requests possible:
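Since the goal is stated in p99 terms, the benchmarks need to agree on how a percentile is computed. The sketch below applies the nearest-rank definition to raw per-request latencies. That choice is an assumption, since other tools interpolate between samples. The sample data is synthetic, not the measurements from this thread.

```python
import math


def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank; integer math avoids float drift
    return ordered[rank - 1]


latencies_ms = list(range(1, 101))  # synthetic: 1 ms .. 100 ms
print(percentile(latencies_ms, 50), percentile(latencies_ms, 99))  # 50 99
```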
It's clear from @navneet1v's numbers that the blocking nature of the client has an outsized impact on overall processing. So apart from pure client transport benchmarking along the (3) variations above, we will need to add processing to simulate realistic scenarios.
What is the bug?
I am using opensearch-py client version 2.2.0 to search a cluster. I started using multithreading to push more and more requests to the cluster, but after 4 threads the throughput does not increase. No throttling is happening on the server, but the client is not able to push more requests.
How can one reproduce the bug?
Create an OpenSearch cluster and use the opensearch-py client with multithreading to start running queries. We tested with 1, 2, 4, 8, 16, 32, 48 and 96 threads.
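A repro along these lines can be structured as a sweep over the thread counts listed, measuring completed requests per second at each step. In the sketch, `query_once` is a hypothetical placeholder that sleeps instead of calling the cluster. In a real run, it would issue a search through one shared client. If throughput flattens while cluster CPU stays low, that suggests a client-side limit.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def query_once() -> None:
    """Hypothetical placeholder for one search request (e.g. client.search(...))."""
    time.sleep(0.005)


def throughput(threads: int, requests_per_thread: int = 20) -> float:
    """Completed requests per second with `threads` workers sharing the work."""
    total = threads * requests_per_thread
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as executor:
        for _ in range(total):
            executor.submit(query_once)
    # Exiting the `with` block waits until every submitted request is done.
    return total / (time.perf_counter() - start)


if __name__ == "__main__":
    for threads in (1, 2, 4, 8, 16, 32, 48, 96):
        print(f"{threads:3d} threads: {throughput(threads):8.0f} req/s")
```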
What is the expected behavior?
The expectation is that the OpenSearch client should be able to push more and more requests, and if the OpenSearch cluster is not able to accept them, we should get a throttling exception.
What is your host/environment?
Linux.
Do you have any screenshots?
NA
Do you have any additional context?
I tried increasing the worker pool size from 10 to 20, but that didn't help.