Add EMA #893
Conversation
Amazing, thank you so much.
Thank you for this! I will review and merge it soon.
I hope this can be tested and merged. This will be for SD 1.5 only, right?
tests on SD1:
It works with SDXL LoRA training. SDXL finetuning with EMA would need a lot of memory; it probably won't fit in 24GB, so it's not that useful.
But if it works, great. For best quality we can run it on a higher-VRAM machine.
Trained LyCORIS on multiple characters with
@vvern999 EMA was a huge improvement when I was using the Automatic1111 DreamBooth extension. Can you verify SDXL too? Also, how did you come up with the value --ema_decay=0.9995?
I know it is difficult, but please, for the love of god, STOP adding features to only some of the trainers. It causes so much unnecessary confusion, especially when it is silently not doing anything, like in this case when using --enable_ema with sdxl_train.py with this PR. If you don't want to add the functionality, inform the user that it's not doing anything.
It adds a huge quality improvement to SD 1.5-based training. Why are you complaining?
@IdiotSandwichTheThird Plenty of other training scripts like hcp-diffusion, everdream2, onetrainer, etc. support EMA for SDXL training. @FurkanGozukara |
I would really prefer Kohya to support it. Can you add EMA for SDXL in Kohya DreamBooth?
Can we expect EMA for SDXL?
@vvern999
Additionally, I could not get EMA to actually train with the train_db script: the resulting EMA checkpoint ended up being the same as the base model, and only the non-EMA one was actually trained.
Either something is wrong with copying/updating the weights, or the default decay value is too high.
Okay, with ema_decay 0.5 there is indeed a difference. I guess I set it way too high, possibly also because of the extremely low LR I chose.
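For a rough sense of scale (back-of-the-envelope math, not anything measured in this PR): with decay `d`, a single update's contribution to the EMA halves roughly every `log(0.5)/log(d)` steps, so the two values compare like this:

```python
import math

def ema_half_life(decay):
    # Steps needed for one update's contribution to the EMA to halve.
    return math.log(0.5) / math.log(decay)

print(ema_half_life(0.9995))  # ~1386 steps: needs a long run to move away from the base weights
print(ema_half_life(0.5))     # 1 step: the EMA tracks the latest weights almost exactly
```

So with the default 0.9995 and a short, low-LR run, the EMA checkpoint staying close to the base model is expected.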
Just curious - what is "large" in this context? Dataset size varies greatly between LoRA training and finetuning, for example. Are we talking about 100 images or 10k?
There's a new method of doing EMA: https://arxiv.org/abs/2312.02696 Closing this PR. It would be better to implement the method from that paper instead.
I am looking forward to the EMA implementation for SDXL. When can we expect it?
EMA is not added to the code yet, right?
I also regret that EMA was not implemented; if it could have achieved better results, it would have been worth using.
I tested EMA on OneTrainer and it definitely improves quality. EMA can also be made to run on the CPU there, so no extra VRAM is needed. However, EMA on OneTrainer didn't work on SDXL, only on SD 1.5.
There's a nice explanation of how it should work - https://github.com/cloneofsimo/karras-power-ema-tutorial
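As a sketch of the core idea from that paper/tutorial (my reading of it, not code from any trainer): instead of a fixed `--ema_decay`, the per-step decay follows a power-function profile that depends on the current step, so the averaging window scales with the length of the run:

```python
def power_ema_decay(step: int, gamma: float) -> float:
    # Power-function EMA profile from Karras et al. 2023 (arXiv:2312.02696),
    # as I understand it: beta_t = (1 - 1/t) ** (gamma + 1), with t starting at 1.
    # Early in training beta is small (the EMA tracks the model closely);
    # later it approaches 1 (the EMA averages over a widening window).
    # The paper's post-hoc reconstruction of other EMA profiles from stored
    # snapshots is not shown here.
    return (1.0 - 1.0 / step) ** (gamma + 1.0)
```

This avoids having to hand-tune a fixed decay for the run length, which is the problem discussed above.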
Awesome.
@vvern999 @FurkanGozukara I am still interested in knowing whether EMA for full SDXL finetuning is implemented or not?
I think it is not implemented.
@FurkanGozukara @vvern999 Can you explain in detail, as I am new to this?
I didn't test EMA on Kohya at all yet.
Add EMA support. Usually it is used like this:
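(The original command example is not preserved here; judging from the flags mentioned in the comments above, `--enable_ema` and `--ema_decay`, the invocation was presumably along these lines, with the exact flag names possibly differing in the PR itself:)

```
accelerate launch train_db.py ... --enable_ema --ema_decay=0.9995
```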
In this implementation, both the EMA and non-EMA weights are saved by default (the model is saved as 2 files). Usually you only need the EMA weights if you are using EMA.
EMA is usually used when training large models, but I tried it on a short finetuning run and on LoRA training, and it seems to work, though the difference between the EMA and normal weights is small.
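For anyone new to the technique, here is a minimal sketch of a classic constant-decay EMA (illustrative only, not the exact code from this PR; the helper names are made up):

```python
import copy
import torch

def create_ema(model: torch.nn.Module) -> torch.nn.Module:
    # Keep a frozen shadow copy of the model; this is what gets saved as the "EMA" checkpoint.
    ema_model = copy.deepcopy(model)
    ema_model.requires_grad_(False)
    return ema_model

@torch.no_grad()
def update_ema(ema_model: torch.nn.Module, model: torch.nn.Module, decay: float = 0.9995):
    # Called after each optimizer step: ema = decay * ema + (1 - decay) * current.
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)
```

At inference time you would load the EMA checkpoint instead of the regular one.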