Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove explicit copy of model.forward() and instead check/store the id to unwrap #2104

Closed
wants to merge 2 commits into from

Conversation

muellerzr
Copy link
Collaborator

What does this PR do?

This PR solves another bug with fp8 where having a link to the original forward in the model would cause yet another memory spike:

(From the NVIDIA team):

we still experience OOM in fp8, but in model function call. In fp8, the forward step takes 15 GB of memory, while in bf16, it is 3GB.

The solution is to get rid of _original_forward_func (which was shown to be the issue) and to instead rely on the id of the function and don't make copies of it/copies of references.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@BenjaminBossan @SunMarc

@muellerzr muellerzr changed the title Try without hooks Remove explicit copy of model.forward() and instead check/store the id to unwrap Oct 31, 2023
@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Oct 31, 2023

The documentation is not available anymore as the PR was closed or merged.

Copy link
Member

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work identifying and fixing the issue.

Regarding the solution, it isn't really pretty, but there is probably no pretty solution.

What confuses me is why the previous code would lead to a memory spike but not this one. Clearly, it can't be the case that, by holding a reference to the original module's forward method, accelerate somehow inhibits garbage collection. Since we can still retrieve the original method through unwrapping, a reference to the original module has to exist.

It could be some kind of circular reference issue, but that should be detected by the GC. I wonder if this points to some kind of more fundamental bug. Also, it's really unclear to me how it related to fp8.

Regarding id, I very rarely make us of it. Can we be sure that it will do the right thing? I think it should work, but we should consider all possible edge cases.

@muellerzr muellerzr closed this Oct 31, 2023
@muellerzr muellerzr deleted the try-without-hooks branch October 31, 2023 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants