[Performance]: GPU utilization is low when running large batches on H100 #6560
Comments
Definitely, it is listed in #5805.
@sleepwalker2017 By the way, could you please share the command you used when profiling vLLM with Nsight?
The command is nothing special, I think it's only
Thanks for your quick reply. I see. It seems that you were using nsys to profile a single .py script. I thought that you were benchmarking the service.
For benchmarking the server, I will add NVTX markers to delimit the start and end of the profiled region, placed just before and after the call to .step.
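A minimal sketch of what such markers could look like, assuming PyTorch is installed and using its `torch.cuda.nvtx.range_push` / `range_pop` calls. The names `nvtx_range`, `profiled_step`, and `FakeEngine` are hypothetical and only for illustration; `FakeEngine` stands in for whatever object exposes `.step()` in the server. When PyTorch or a CUDA build is not available, the helper silently becomes a no-op.

```python
import contextlib

try:
    # torch.cuda.nvtx exposes range_push / range_pop around the NVTX API.
    from torch.cuda import nvtx as _nvtx
except ImportError:
    _nvtx = None  # PyTorch not installed: markers become no-ops.


@contextlib.contextmanager
def nvtx_range(name):
    """Open an NVTX range called `name` for the duration of the with-block."""
    pushed = False
    if _nvtx is not None:
        try:
            _nvtx.range_push(name)
            pushed = True
        except RuntimeError:
            # CPU-only PyTorch builds raise here; fall back to a no-op.
            pushed = False
    try:
        yield
    finally:
        if pushed:
            _nvtx.range_pop()


class FakeEngine:
    """Hypothetical stand-in for an engine object that exposes .step()."""

    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1
        return self.steps


def profiled_step(engine):
    """Call engine.step() with a marker just before and just after it."""
    with nvtx_range("engine.step"):
        return engine.step()


if __name__ == "__main__":
    engine = FakeEngine()
    for _ in range(3):
        profiled_step(engine)
    print(engine.steps)
```

With ranges like this in place, each `.step` call should show up as a named NVTX range on the Nsight Systems timeline, which makes it easier to separate the time spent inside `.step` from the time spent between calls.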
Proposal to improve performance
Hi all, I'm running Vicuna 13B on an H100 using FP8, and I find that when the batch size is large, say 64 or 96, GPU utilization is low, about 60%. This is an important cause of the low performance.
I did some analysis, and part of this is caused by the scheduling and post-processing of requests.
Do you have any plans for improving this?
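To put the 60% figure in perspective, here is a rough back-of-the-envelope sketch. It assumes that "utilization" means the fraction of wall-clock time the GPU is busy, and that all of the idle time could be removed without slowing the GPU work itself; both are simplifying assumptions, and the function names are hypothetical.

```python
def idle_fraction(gpu_utilization):
    """Fraction of wall-clock time the GPU sits idle."""
    if not 0.0 < gpu_utilization <= 1.0:
        raise ValueError("gpu_utilization must be in (0, 1]")
    return 1.0 - gpu_utilization


def max_speedup_if_idle_removed(gpu_utilization):
    """Upper bound on throughput gain if all GPU idle time were eliminated.

    Assumes the GPU-busy time per step stays the same, so wall time
    shrinks from 1.0 to gpu_utilization (in normalized units).
    """
    if not 0.0 < gpu_utilization <= 1.0:
        raise ValueError("gpu_utilization must be in (0, 1]")
    return 1.0 / gpu_utilization


if __name__ == "__main__":
    util = 0.60  # figure reported above for batch sizes 64 and 96
    print(f"idle fraction: {idle_fraction(util):.2f}")
    print(f"max speedup:   {max_speedup_if_idle_removed(util):.2f}x")
```

Under these assumptions, about 40% of each step is spent with the GPU idle, and removing all of it would bound the gain at roughly 1.67x. Only part of that idle time is attributed to scheduling and post-processing here, so the realistic gain from fixing those two alone would be smaller.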
Report of performance regression
No response
Misc discussion on performance
No response
Your current environment (if you think it is necessary)