Hello, I noticed that you're able to train on more than 300 frames using an A100 GPU. I'm curious about your training process - are you only training the to_q or the entire motion module?
I've been using the official AnimateDiff training script, and training on just 32 frames consumes about 30GB of VRAM. I'm wondering if you've implemented any optimizations to improve efficiency. It would be helpful if you could share some details about your training setup and any techniques you're using. Thanks!
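For concreteness, here is a minimal sketch of what I mean by "only training `to_q`" versus the whole motion module, assuming a diffusers-style UNet where motion-module parameters carry `motion_modules` / `to_q` in their names (the naming is an assumption, not something from the official script):

```python
def select_trainable_params(unet, train_only_to_q=True):
    """Freeze everything except the motion module (optionally only its to_q projections)."""
    trainable = []
    for name, param in unet.named_parameters():
        in_motion_module = "motion_modules" in name
        if in_motion_module and (not train_only_to_q or "to_q" in name):
            param.requires_grad = True
            trainable.append(param)
        else:
            param.requires_grad = False
    return trainable
```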
Right now I'm training a LoRA at 1024x576x3 and it takes 23.8 GB on my 3090.
memory-offload everything that isn't needed for training (VAE, text encoder)
precache samples (encode latents and embeddings into .pth files)
keep an eye on gradients
A rough sketch of the first two points is below.
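This is only an illustration of the offload + precache idea; the names (`vae`, `text_encoder`, `tokenizer`, `dataset`, the cache path) are placeholders and not latentflow's actual API:

```python
import os
import torch

@torch.no_grad()
def precache(dataset, vae, text_encoder, tokenizer, device="cuda"):
    """Encode latents and text embeddings once, save them to .pth files,
    then offload the VAE and text encoder so training never keeps them on GPU."""
    os.makedirs("cache", exist_ok=True)
    vae.to(device)
    text_encoder.to(device)
    for i, sample in enumerate(dataset):
        pixels = sample["pixel_values"].to(device)              # (frames, c, h, w)
        latents = vae.encode(pixels).latent_dist.sample() * 0.18215
        ids = tokenizer(sample["caption"], padding="max_length",
                        truncation=True, return_tensors="pt").input_ids.to(device)
        embeds = text_encoder(ids)[0]
        torch.save({"latents": latents.cpu(), "embeds": embeds.cpu()},
                   f"cache/sample_{i}.pth")
    # The training loop only reads the cached tensors, so these can leave the GPU.
    vae.to("cpu")
    text_encoder.to("cpu")
    torch.cuda.empty_cache()
```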
I'm using my own framework, latentflow: https://github.com/tumurzakov/latentflow. It can be hard to understand and use, but you could take a look at the training code. Maybe it will be useful for you.
Thanks for the suggestions! I'll give them a try. I've noticed that the official AnimateDiff code doesn't enable gradient checkpointing by default, and it can save a lot of GPU memory.
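Turning it on is a one-liner with diffusers (trading extra compute for much lower activation memory); here `unet` is assumed to be the model loaded by the training script:

```python
# Recompute activations during backward instead of storing them.
unet.enable_gradient_checkpointing()
# If the text encoder were also being trained, transformers exposes the analogous switch:
# text_encoder.gradient_checkpointing_enable()
```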