Support SageAttention #1820
This is very interesting. It appears to be a quantized version, so the accuracy may be inferior, but this shouldn't be an issue if we use … I'm curious whether Windows is supported.
Triton is listed as a requirement; there are working Triton wheels for Windows at https://github.com/woct0rdho/triton-windows
I confirmed that SageAttention works on Windows by following the steps here: https://www.reddit.com/r/StableDiffusion/comments/1h7hunp/how_to_run_hunyuanvideo_on_a_single_24gb_vram_card/ Documenting these steps would be difficult, so I will not do so. The problem is that some scripts have SDPA hardcoded, so I think introducing a global option would be a solution.
I did a quick test with SDXL and Flux LoRAs; the approach of having it replace SDPA globally (as below) doesn't work due to incompatible head_dims.
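Given the head_dim incompatibility reported above, a blanket replacement likely needs a dispatch step: route through SageAttention only when the head dimension is one its kernels support, and fall back to stock SDPA otherwise. The supported set `{64, 128}` below is an assumption to make the sketch concrete; check the SageAttention documentation for the actual constraint:

```python
# Hedged sketch of a head_dim-aware backend chooser. The supported
# head_dim set is an ASSUMPTION for illustration, not confirmed here.

SAGE_SUPPORTED_HEAD_DIMS = {64, 128}  # assumed; verify against SageAttention docs

def pick_attention_backend(head_dim, sage_available=True):
    """Return which attention backend to use for a given head_dim."""
    if sage_available and head_dim in SAGE_SUPPORTED_HEAD_DIMS:
        return "sageattention"
    return "sdpa"  # safe fallback for incompatible head_dims
```

With a wrapper like this installed in place of a raw global swap, models whose head_dims SageAttention cannot handle would silently keep using SDPA instead of failing.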
Looks like a faster attention method, similar to xformers and SDPA:
https://github.com/thu-ml/SageAttention