Hi there :-)
Is there a possibility to configure multiple users / concurrent request sessions?
I'd like to simulate how the different backends behave when, instead of a single user, e.g. 8 users access the LLM concurrently.
I know it is possible to configure batches, but there should be a performance difference between 1 user sending a batch of 8 requests and 8 users independently sending a batch of 1 request each. Please correct me if that is not true :-)
Thanks a lot, and I appreciate the work on optimum-benchmark!
Yes, that's possible. It will have to be integrated at the backend level; for example, if you look at the py-txi backend, you'll see that it has an async method (which is converted into a sync one for our batched inference scenario). That method could be used by a scenario that specifically targets server-like concurrency and takes the number of concurrent users as configuration instead of the batch size, etc.
Overall, this will mostly require an InferenceServerScenario that implements the logic, plus some async methods (async_forward, async_generate, etc.) in the backends you want to target.
I have already discussed this with @mht-sharma and it could be a great feature to compare server backends (TGI, vLLM, TRT-LLM) more adequately.
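A minimal sketch of what such a scenario loop could look like, assuming a backend exposes an `async_generate` coroutine as suggested above. The names `run_server_scenario`, `user_session`, `num_concurrent_users`, and `requests_per_user` are illustrative and not part of the actual optimum-benchmark API:

```python
# Illustrative sketch only -- not the actual optimum-benchmark API.
# Assumes the backend object exposes a hypothetical `async_generate(inputs)` coroutine.
import asyncio
import time


async def user_session(backend, inputs, num_requests: int) -> list:
    """Simulates one user sending `num_requests` independent requests sequentially."""
    latencies = []
    for _ in range(num_requests):
        start = time.perf_counter()
        await backend.async_generate(inputs)  # hypothetical backend coroutine
        latencies.append(time.perf_counter() - start)
    return latencies


async def run_server_scenario(backend, inputs, num_concurrent_users: int, requests_per_user: int):
    """Launches all user sessions concurrently, in contrast to batched inference
    where a single call carries a batch of size `num_concurrent_users`."""
    sessions = [
        user_session(backend, inputs, requests_per_user)
        for _ in range(num_concurrent_users)
    ]
    return await asyncio.gather(*sessions)


# Usage with a hypothetical backend instance:
# per_user_latencies = asyncio.run(
#     run_server_scenario(backend, inputs, num_concurrent_users=8, requests_per_user=1)
# )
```

This is exactly the distinction raised in the issue: 8 concurrent sessions of 1 request each exercise the server's request scheduling and continuous batching, whereas a single batch of 8 only measures batched forward/generate throughput.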