How about the generated video quality when using more than 100 frames for training? #19

Open
junwenxiong opened this issue Jun 12, 2024 · 4 comments

Comments

@junwenxiong

How about the quality when using more than 100 frames for training?

@tumurzakov
Owner

tumurzakov commented Jun 13, 2024

I concentrated on training at 48 frames and got quite good results. The model became much smoother than the existing 16- or 24-frame models. I chose 48 frames because that is the limit for my 24 GB card at 720x480 resolution. I got 320 frames on an A100 at 512x288, and it is very expensive.
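To give a rough feel for why frame count and resolution trade off against memory, here is a back-of-the-envelope sketch. It assumes a VAE that downsamples each spatial side by 8x and full temporal attention at every latent position; neither detail is stated in this thread, and the function names are made up for illustration.

```python
# Rough arithmetic only; the 8x VAE downsampling is an assumption.
VAE_DOWNSCALE = 8  # assumed spatial downsampling factor of the VAE


def latent_positions(frames: int, width: int, height: int) -> int:
    """Number of latent positions: frames x latent width x latent height."""
    return frames * (width // VAE_DOWNSCALE) * (height // VAE_DOWNSCALE)


def temporal_attention_pairs(frames: int, width: int, height: int) -> int:
    """Frame-to-frame pairs, assuming full temporal attention per latent position."""
    spatial = (width // VAE_DOWNSCALE) * (height // VAE_DOWNSCALE)
    return spatial * frames * frames


configs = {
    "48f @ 720x480": (48, 720, 480),
    "320f @ 512x288": (320, 512, 288),
}
for name, cfg in configs.items():
    print(name, latent_positions(*cfg), temporal_attention_pairs(*cfg))
```

Under these assumptions the 320-frame setup has roughly 2.8x the latent positions of the 48-frame one and about 19x the temporal attention pairs, which is in line with it needing a much larger card.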

I train on videos of 100+ frames quite often, but now I use tiles and add a frameN word to the conditioning. I tried another version with 3D conditioning as HxWxF for training on HD video with tiles, but it is too expensive. It is much better to infer at 1280x720 and then use SR (super-resolution).
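One possible reading of the tiling idea, sketched under loud assumptions: "tiles" is taken to mean fixed-length windows along the time axis, and the "frameN word" is taken to be a text token naming the window's first frame. The helper names, the overlap, and the exact token format are hypothetical and not taken from the project's code.

```python
from typing import List, Tuple


def make_tiles(num_frames: int, tile_len: int, stride: int) -> List[Tuple[int, int]]:
    """Split [0, num_frames) into windows of tile_len frames, stepping by stride."""
    if num_frames <= tile_len:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - tile_len + 1, stride))
    # Add a final window if the regular steps stop short of the last frame.
    if starts[-1] + tile_len < num_frames:
        starts.append(num_frames - tile_len)
    return [(s, s + tile_len) for s in starts]


def tile_prompt(caption: str, start_frame: int) -> str:
    """Prefix the caption with a hypothetical 'frameN' word for the tile start."""
    return f"frame{start_frame} {caption}"


tiles = make_tiles(num_frames=120, tile_len=48, stride=36)
prompts = [tile_prompt("a dog running on a beach", start) for start, _ in tiles]
for (start, end), prompt in zip(tiles, prompts):
    print(start, end, prompt)
```

Each window fits a 48-frame training budget, while the frame word gives the model a hint about where in the longer clip the window sits.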

And it is better to use LoRA than to train the model directly, because of catastrophic forgetting, but that is obvious.
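For readers unfamiliar with why LoRA helps here, a minimal pure-Python sketch of the idea follows. It is not the training code used in this project: the class name, rank, and initialization scale are illustrative assumptions. The base weight stays frozen and only a small low-rank update is trained, so the original behaviour is preserved at the start and can always be recovered by dropping the update.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during fine-tuning
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 0.0],
     [0.0, 0.0, 0.0, 4.0]]
layer = LoRALinear(W, rank=1, alpha=1.0)
x = [1.0, 2.0, 3.0, 4.0]
print(layer.forward(x), layer.trainable_params())
```

With rank 1 on a 4x4 weight, only 8 values are trainable instead of 16, and because B starts at zero the layer initially reproduces the frozen model exactly.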

@tumurzakov
Owner

I forgot: I trained the 48-frame model first at 256x144 and then at 512x288, and did the same for the 96-frame model, for ~100k steps. The 96-frame model did not allow using any other extensions such as cnet or IPAdapter because of the memory limit. I have since added model RAM offload in my other project, latentflow, so I think I could now. But I don't need to: the 48-frame model covers all my needs for now.

@tumurzakov
Owner

About quality.

My trained models can't infer anything useful without extensions like cnet or a special LoRA. But I don't need that. I mostly use AD for video stylization. My models are much smoother than adv3, for example, because it was trained on 24 frames; on the other hand, adv3 is much better trained and has more versatile output.

My training dataset is not very big: ~5000 videos. Such a dataset is hard to build because of scene cuts and the lack of descriptions. Cuts are a very big problem; I spent a lot of time cleaning them out of the dataset.
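As an illustration of the kind of cleaning involved, here is a minimal sketch that flags hard cuts by thresholding the mean absolute difference between consecutive frames. It is a generic baseline, not the method used for this dataset; the threshold value and the toy grayscale frames are assumptions, and gradual transitions such as fades would need something more robust.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_cuts(frames: List[Sequence[float]], threshold: float) -> List[int]:
    """Return every index i where frame i differs sharply from frame i - 1."""
    return [
        i for i in range(1, len(frames))
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold
    ]


def split_at_cuts(num_frames: int, cuts: List[int]) -> List[range]:
    """Turn cut positions into cut-free frame ranges."""
    bounds = [0] + cuts + [num_frames]
    return [range(start, end) for start, end in zip(bounds, bounds[1:])]


# Toy clip: three dark frames followed by three bright frames (flattened grayscale).
clip = [[10, 12, 11, 9]] * 3 + [[200, 198, 205, 201]] * 3
cuts = find_cuts(clip, threshold=50.0)  # threshold is a hypothetical value
segments = split_at_cuts(len(clip), cuts)
print(cuts, segments)
```

Segments shorter than the training length (for example 48 frames) could then be discarded, so that no training sample spans a cut.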

@aihopper

A slightly related question: did you try training for 8 frames? Thanks for sharing, BTW.

4 participants
@tumurzakov @junwenxiong @aihopper and others