[CPU offload hooks] hooks with overlapped transfers and computations #3267

Open
sayakpaul opened this issue Nov 29, 2024 · 0 comments
Labels
feature request Request for a new feature to be added to Accelerate

Comments

@sayakpaul
Member

diffusers relies heavily on cpu_offload() to implement enable_sequential_cpu_offload(). It keeps the modules of a model on the CPU while they are not in use and moves each one onto the GPU only when it is needed for computation.

These frequent transfers block the underlying computation, which adds quite a bit of latency. But offloading also helps tremendously in running very big models on consumer hardware (very important, as a good diffusion model is actually composed of multiple big models).

So the question is: can we overlap communication with computation? https://gist.github.com/gau-nernst/9408e13c32d3c6e7025d92cce6cba140 implements a hook that uses CUDA streams to provide the same functionality as enable_sequential_cpu_offload() but is significantly faster. See the results:

image
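
To make the idea concrete, here is a minimal sketch of overlapping weight transfers with computation. It is not the code from the gist above and not an existing accelerate API: the helper names `load_weights` and `forward_with_prefetch` are made up for illustration, and it assumes a plain list of layers that run one after another. While layer i computes on the default stream, the weights of layer i+1 are copied on a side CUDA stream. For the copies to really overlap, the CPU weights should live in pinned memory (assumed, not done here). Without a GPU the sketch falls back to ordinary synchronous execution.

```python
import torch
import torch.nn as nn
from torch.func import functional_call


def load_weights(layer, device, stream):
    """Copy a layer's parameters and buffers to `device` (hypothetical helper).

    Returns the copied tensors and a CUDA event marking the end of the copy,
    or None for the event when there is no side stream (CPU fallback).
    """
    tensors = {**dict(layer.named_parameters()), **dict(layer.named_buffers())}
    if stream is None:
        return {k: v.to(device) for k, v in tensors.items()}, None
    with torch.cuda.stream(stream):
        # non_blocking only overlaps if the CPU tensors are pinned (assumption)
        on_device = {k: v.to(device, non_blocking=True) for k, v in tensors.items()}
        done = torch.cuda.Event()
        done.record(stream)
    return on_device, done


@torch.no_grad()
def forward_with_prefetch(layers, x, device):
    """Run `layers` in order, prefetching layer i+1 while layer i computes."""
    device = torch.device(device)
    stream = torch.cuda.Stream(device=device) if device.type == "cuda" else None
    x = x.to(device)
    pending = load_weights(layers[0], device, stream)
    for i, layer in enumerate(layers):
        weights, done = pending
        # Queue the next copy before queuing this layer's compute
        if i + 1 < len(layers):
            pending = load_weights(layers[i + 1], device, stream)
        if done is not None:
            compute = torch.cuda.current_stream(device)
            compute.wait_event(done)  # compute waits only for its own weights
            for w in weights.values():
                # Tell the allocator these tensors are used on the compute stream
                w.record_stream(compute)
        # The module itself stays on CPU; the device copies are passed in
        x = functional_call(layer, weights, (x,))
    return x
```

At most two layers' weights are on the device at any time, because the previous layer's copies are released once the loop moves on. A real hook would also have to handle modules that are reused, nested modules, and outputs that must be moved back to the CPU.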

As @SunMarc and I were discussing, it'd be extremely cool to have a similar hook supported in accelerate so that we can make diffusion models, in particular, more accessible without giving up speed entirely.

Cc: @DN6 @a-r-r-o-w as well.

@sayakpaul added the feature request label Nov 29, 2024