[Performance]: GPU utilization is low when running large batches on H100 #6560

Open
sleepwalker2017 opened this issue Jul 19, 2024 · 5 comments
Labels
performance Performance-related issues


@sleepwalker2017

Proposal to improve performance

Hi all, I'm running Vicuna 13B on H100 using FP8, and I find that when the batch size is large, say 64 or 96, GPU utilization is low (about 60%). This is an important cause of the low performance.

I did some analysis, and part of this is caused by request scheduling and post-processing.

Do you have any plans for improving this?
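For anyone reproducing this: one way to quantify the utilization drop from a profile is to compute the fraction of wall-clock time during which at least one GPU kernel is running, which is roughly what `nvidia-smi` reports as GPU utilization. Host-side work that does not overlap with the GPU, such as the scheduling and post-processing between steps mentioned above, shows up as gaps in that timeline. This sketch assumes kernel (start, end) timestamps have already been exported from a trace; `gpu_busy_fraction` and the sample intervals are hypothetical and only for illustration.

```python
from typing import Iterable, Optional, Tuple

Interval = Tuple[float, float]


def gpu_busy_fraction(kernels: Iterable[Interval], window: Interval) -> float:
    """Fraction of `window` during which at least one kernel is running.

    `kernels` are (start, end) timestamps in any consistent unit (e.g. ns
    exported from a profiler trace). Overlapping kernels, e.g. on multiple
    streams, are merged so concurrent work is not double-counted.
    """
    win_start, win_end = window
    if win_end <= win_start:
        raise ValueError("window must have positive length")

    # Clip every kernel to the window and drop those entirely outside it.
    clipped = sorted(
        (max(s, win_start), min(e, win_end))
        for s, e in kernels
        if e > win_start and s < win_end
    )

    busy = 0.0
    cur_start: Optional[float] = None
    cur_end: Optional[float] = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Idle gap before this kernel: close the previous merged interval.
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlaps or touches the current merged interval: extend it.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        busy += cur_end - cur_start

    return busy / (win_end - win_start)


# Two overlapping kernels, a 2-unit idle gap (e.g. host-side scheduling),
# then one more kernel: 8 of 10 time units are busy.
print(gpu_busy_fraction([(0, 4), (2, 6), (8, 10)], (0, 10)))  # 0.8
```

Merging intervals first matters because summing raw kernel durations would overcount whenever kernels from different streams run concurrently.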



Your current environment (if you think it is necessary)

The output of `python collect_env.py`
sleepwalker2017 added the performance (Performance-related issues) label on Jul 19, 2024
@youkaichao
Member

Definitely, it is listed in #5805.

@VincentXWD

@sleepwalker2017 btw, could you please share the command you used to profile vLLM with Nsight?
Thanks!

@sleepwalker2017
Author

> @sleepwalker2017 btw could you please share the command when profiling vLLM using Nsight?

The command is nothing special; I think it's just `nsys profile python xxx.py`. You can refer to the nsys manual for usage details.
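The invocation really is just `nsys profile python xxx.py`. As a hedged sketch, the hypothetical helper `nsys_command` (not part of vLLM or nsys) builds that same command as an argument list and adds two standard `nsys profile` options: `--trace` to choose which APIs are traced and `--output` to name the report file. Check `nsys profile --help` for what your nsys version supports; `xxx.py` is the placeholder script name from the comment.

```python
import shlex
import sys
from typing import List, Optional


def nsys_command(script: str, output: Optional[str] = None,
                 trace: str = "cuda,nvtx,osrt") -> List[str]:
    """Build `nsys profile [options] <python> <script>` as an argument list."""
    cmd = ["nsys", "profile", f"--trace={trace}"]
    if output is not None:
        # nsys appends the report file extension itself.
        cmd.append(f"--output={output}")
    # Use the current interpreter so the profiled run sees the same environment.
    cmd += [sys.executable, script]
    return cmd


print(shlex.join(nsys_command("xxx.py", output="vllm_profile")))
```

To actually launch it, pass the list to `subprocess.run(cmd, check=True)`. Keeping `nvtx` in `--trace` is what makes any NVTX ranges added to the code visible in the report.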

@VincentXWD

> > @sleepwalker2017 btw could you please share the command when profiling vLLM using Nsight?
>
> The command is nothing special, I think it's only nsys profile python xxx.py. you can refer to nsys manual to see the usage.

Thanks for your quick reply, I see. So you were using nsys to profile a single .py script; I thought you were benchmarking the service.

@artetaout

For benchmarking the server, I would add NVTX markers just before and after `.step` to mark where the profiled region starts and ends.
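A minimal sketch of that idea, assuming an engine object with `has_unfinished_requests()` and `step()` methods. vLLM's `LLMEngine` exposes both, but how the API server drives the engine differs across versions, so treat the loop as illustrative. Each `step()` call is wrapped in a named NVTX range via `torch.cuda.nvtx.range`, falling back to a no-op when PyTorch or CUDA is unavailable; `run_steps_with_markers` and `nvtx_range_or_noop` are hypothetical helper names.

```python
import contextlib
from typing import Any, Callable, ContextManager, List


def nvtx_range_or_noop(name: str) -> ContextManager[None]:
    """NVTX range via PyTorch when CUDA is available, else a no-op context."""
    try:
        import torch
    except ImportError:
        return contextlib.nullcontext()
    if torch.cuda.is_available():
        return torch.cuda.nvtx.range(name)
    return contextlib.nullcontext()


def run_steps_with_markers(
    engine: Any,
    max_steps: int,
    marker: Callable[[str], ContextManager[None]] = nvtx_range_or_noop,
) -> List[Any]:
    """Call engine.step() until it is idle, wrapping each call in a marker.

    Assumes `engine` has has_unfinished_requests() and step() methods
    (hypothetical interface modeled on vLLM's LLMEngine).
    """
    outputs: List[Any] = []
    for i in range(max_steps):
        if not engine.has_unfinished_requests():
            break
        # Everything between range push and pop (scheduling, launching the
        # forward pass, post-processing) falls inside one named range.
        with marker(f"step_{i}"):
            outputs.extend(engine.step())
    return outputs
```

With ranges in place, nsys can be told to collect only inside the marked region (see the `--capture-range=nvtx` and `--nvtx-capture` options in the nsys documentation), and each step's host-side scheduling and post-processing shows up within its range on the timeline, alongside the GPU kernels it launches.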
