Fix memory leak in fp8 causing OOM (and potentially 3x vRAM usage) #2089
Conversation
Very good find, thanks for investigating and fixing this.
The movement of the code block in `accelerator.py` is presumably due to the order in which `convert_model` needs to be called?
(Note for later: the 2nd error message in that block is messed up.)
@BenjaminBossan Correct, it's that note about how we can save extra vRAM by converting to fp8 before moving the model to CUDA.
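A minimal sketch of that ordering (not the actual `accelerate` code; `convert_to_fp8` below is a hypothetical stand-in for the FP8 layer-swapping that `convert_model` performs):

```python
import torch
import torch.nn as nn


def convert_to_fp8(model: nn.Module) -> nn.Module:
    # Hypothetical stand-in: in practice this swaps nn.Linear / LayerNorm
    # modules for their FP8-capable (transformer_engine) counterparts in place.
    return model


model = nn.Sequential(
    nn.Linear(4096, 4096, dtype=torch.bfloat16),
    nn.Linear(4096, 4096, dtype=torch.bfloat16),
)

# Wasteful ordering: the full bf16 model is materialized on the GPU first,
# and only afterwards are its layers swapped out.
# model = convert_to_fp8(model.cuda())

# Leaner ordering (what this PR moves toward): convert while the model is
# still on CPU, then move the already-converted model to the GPU.
model = convert_to_fp8(model)
model = model.cuda()
```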
So that's how it should've been done... Thank you, it's amazing to see this is finally resolved.
Thanks for investigating this @muellerzr!
What does this PR do?
Doing `.copy()` like we were doing before leads to a huge memory leak during fp8, and putting the model on CUDA only after converting the layers can also reduce memory (a rough illustration follows the benchmark below).

Example benchmark:
Model: `"meta-llama/Llama-2-7b-hf"`
Note: for FP8, the weights are in bf16 beforehand.
Before fix:
Memory in FP32: 25.23 GB
Memory on BF16: 12.61 GB
Memory on FP8: 37.23 GB <- Warning sign that something is amiss!
After fix:
Memory on FP8:
- `.cuda()` before (current implementation in `.prepare()`): 12.87 GB
- `.cuda()` after: 12.61 GB
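The PR doesn't say exactly how these numbers were collected; one way to gather figures like the ones above (an assumption, not the benchmark script used here) is with `torch.cuda.max_memory_allocated`:

```python
import torch


def report_peak_gpu_memory(tag: str) -> None:
    # Peak memory allocated on the current CUDA device since the last reset, in GB.
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"{tag}: {peak_gb:.2f} GB")


torch.cuda.reset_peak_memory_stats()
# model = ...  # load in bf16, convert to FP8, and move to CUDA in the order under test
report_peak_gpu_memory("Memory on FP8")
```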
(Finally) fixes #1430
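As a rough illustration of the `.copy()` issue mentioned above (a sketch, not the actual change in `accelerate`): if each FP8-capable replacement layer is handed a copy of the original weight instead of reusing its storage, the converted model can hold every weight twice for as long as the original tensors stay referenced, which would help explain the inflated FP8 footprint in the benchmark.

```python
import torch
import torch.nn as nn

# Original bf16 layer and its replacement (a plain Linear stands in here for
# the FP8-capable transformer_engine module).
old_linear = nn.Linear(4096, 4096, dtype=torch.bfloat16)
new_linear = nn.Linear(4096, 4096, dtype=torch.bfloat16)

# Copying: the replacement gets an independent tensor, so the original
# weight's memory stays allocated for as long as old_linear is referenced.
new_linear.weight.data = old_linear.weight.data.clone()

# Reusing storage: the replacement points at the same underlying tensor, and
# swapping the module does not duplicate the weights.
new_linear.weight.data = old_linear.weight.data
```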
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@BenjaminBossan @LysandreJik