-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check if the buffers fit GPU memory after device map auto inferred #2412
Conversation
60b749c
to
fb87eb1
Compare
fb87eb1
to
a3e5732
Compare
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
a3e5732
to
a1eac15
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for digging into the buffer issue @notsyncing ! Could you add some tests to see that the buffer calculation is right and that we raise the error when the buffers don't fit the remaining space on the gpu. Can you have a second look @muellerzr ?
Hi @notsyncing, thanks for your work ! I've replied to your questions. This is not pressing but are you planning to finish this PR ? Otherwise, I can finish it and add you as co-author =) |
Yes, I'm planning to finish it. But for the multi-gpu scenario, I cannot test it because I have only one gpu. Would you mind helping testing this? Thanks! |
We can run the relevant tests for you when you are ready/the normal CI here is green and report back any failures (on a multi-gpu runner), yes :) |
a1eac15
to
0e4c945
Compare
Now I added a check to see if the buffer can fit any GPU, and only print the warning when every GPU does not fit. Will this work in your scenario? |
As for the tests, it seems difficult to write a test for this if it only prints a warning. I propose that we could use a environment variable (like if os.getenv("ACCELERATE_BUFFER_SIZE_CHECK") == "raise":
raise ValueError("xxx")
else:
logger.warn("yyy") Would this be better? |
You can use |
Also there's no need to force push, we squash the commit history when merging |
Thanks ! Let's do that first and see the feedback. We can always change this behavior if needed. Most probably, the user will receive this warning if all his gpus are full and some of the layers are offloaded to cpu/disk. Hence, the buffer won't fit any of the gpus. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the iteration. I left a few comments. Could you also add tests that would trigger the warning for a single and multi gpu setup ? We can test the multi gpu test if needed !
0e4c945
to
5a3dc39
Compare
@SunMarc I have added two test cases in |
5a3dc39
to
15f6544
Compare
15f6544
to
01c7a56
Compare
* For some models, like TheBloke/WizardCoder-33B-V1.1-GPTQ, contain a huge buffer, which may cause OOM on GPU memory if not using offload_buffers. This commit adds a check for such case.
01c7a56
to
2a278f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this PR and adding the tests ! I ran them and they passed. I left a couple of minor comments/suggestions. We can merge this PR after fixing these =)
@SunMarc all fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating ! LGTM. I re-opened a comment I made. Could you have a quick look ? Also, can you have a second look @muellerzr ?
Added the missing assertions. I did not notice them before, sorry 😂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Overall this looks sound to me :)
What does this PR do?
Hello, I'm trying using
accelerate
to offload a large model (https://huggingface.co/TheBloke/WizardCoder-33B-V1.1-GPTQ) to CPU, with following code (requires #2383 if using Intel GPU, and huggingface/transformers#28755):I have one Intel Arc A770 16GB GPU in my machine, but the code above always OOM on the GPU.
After some digging, I found that this model
TheBloke/WizardCoder-33B-V1.1-GPTQ
contains a huge buffer (model is 17GB, but buffers are more than 16GB), so it will not fit into my GPU at all withoutoffload_buffers
.So I created this PR, to check if the buffer can fit on the GPU, and raise an exception for such case.
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.