Support SageAttention #1820
This is very interesting. It appears to be a quantized version, so the accuracy may be inferior, but this shouldn't be an issue if we use … I'm curious whether Windows is supported.
Triton is listed as a requirement; there are working Triton wheels for Windows at https://github.com/woct0rdho/triton-windows
I confirmed that SageAttention works on Windows by following the steps here: https://www.reddit.com/r/StableDiffusion/comments/1h7hunp/how_to_run_hunyuanvideo_on_a_single_24gb_vram_card/ Documenting these steps would be difficult, so I will not do so. The problem is that some scripts have SDPA hardcoded, so I think introducing a global option would be a solution.
I did a quick test with SDXL and Flux LoRAs; the approach of having it replace SDPA globally (as below) doesn't work due to incompatible head_dims.
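Given the head_dim incompatibility reported above, a blanket replacement likely needs a dispatch step: route through SageAttention only when the head dimension is one its kernels support, and fall back to stock SDPA otherwise. The supported set `{64, 128}` below is an assumption to make the sketch concrete; check the SageAttention documentation for the actual constraint:

```python
# Hedged sketch of a head_dim-aware backend chooser. The supported
# head_dim set is an ASSUMPTION for illustration, not confirmed here.

SAGE_SUPPORTED_HEAD_DIMS = {64, 128}  # assumed; verify against SageAttention docs

def pick_attention_backend(head_dim, sage_available=True):
    """Return which attention backend to use for a given head_dim."""
    if sage_available and head_dim in SAGE_SUPPORTED_HEAD_DIMS:
        return "sageattention"
    return "sdpa"  # safe fallback for incompatible head_dims
```

With a wrapper like this installed in place of a raw global swap, models whose head_dims SageAttention cannot handle would silently keep using SDPA instead of failing.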
Looks like a faster attention method, similar to xformers and SDPA:
https://github.com/thu-ml/SageAttention