Why mochi diffusers video output is worse than mochi official code? #10144
Comments
Official Diffusers code: mochi.mp4

PR #10033 (close to merge): mochi_new.mp4

Quality has much improved, now fixed. @foreverpiano
Hi @foreverpiano, yes that branch should address the quality issues. We're currently looking into whether we can remove the dependency on torch autocast and still match the original repo.
Related issue, but I don't see MochiPipeline in release 0.31. How are you generating these videos?
@jmahajan117 Install from
I think the issue is still not fixed. The Mochi attention implementation does not apply an attention mask, which is needed for text padding: the encoder query is padded to length 256, so the padded positions should be masked out. Padding with zeros is not equivalent to applying an attention mask, because the zero-logit positions still receive weight in the softmax.

Although this difference might be negligible since attention has some robustness, there are still numerical discrepancies compared to a varlen attention implementation using cu_seq_len, due to issues in the attention implementation. cc @hlky @DN6 @a-r-r-o-w
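To illustrate the point above, here is a minimal NumPy sketch (with hypothetical sizes: 4 real keys padded to 8) showing that zero-padding keys is not the same as masking them. Padded positions have logit 0 after the dot product, so they still receive nonzero softmax weight and rescale the real positions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
d = 16
q = rng.standard_normal(d)
k = rng.standard_normal((4, d))   # 4 real keys
v = rng.standard_normal((4, d))

pad = 4                            # padded up to 8 positions
k_pad = np.vstack([k, np.zeros((pad, d))])
v_pad = np.vstack([v, np.zeros((pad, d))])

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = k_pad @ q / np.sqrt(d)

# Zero-padded keys: each padded position has logit q . 0 = 0, so it still
# receives softmax weight and dilutes the real positions.
out_padded = softmax(logits) @ v_pad

# Properly masked attention: padded logits set to -inf, excluding them exactly.
masked_logits = logits.copy()
masked_logits[4:] = -np.inf
out_masked = softmax(masked_logits) @ v_pad

print(np.allclose(out_padded, out_masked))  # False
```

The two outputs differ precisely by the softmax normalization: the padded variant equals the masked one scaled by the total weight the real positions retain, which is why results drift from a cu_seq_len-based implementation.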
@foreverpiano Could you use this branch? https://github.com/huggingface/diffusers/tree/mochi-quality
@a-r-r-o-w The implementation appears better since it takes the mask into consideration. I believe this addresses my previous concern. Let me look into this version's video output.
Describe the bug
The quality of the generated video is worse than that of the official Mochi code.
Reproduction
Run the code with the official prompt.
Logs
No response
System Info
diffusers@main
Who can help?
@a-r-r-o-w @yiyixuxu