Caching latents and Text Encoder outputs with multiple GPUs #1690
Conversation
Awesome. Will this work automatically when multiple GPUs are used?
Yes 😀 In FLUX.1, caching the Text Encoder outputs also takes time, so we've made it compatible with multiple GPUs. We'd appreciate it if you could test it. Please note that
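For context, below is a minimal sketch of how caching work can be split across GPUs with Hugging Face accelerate. This is a hedged illustration of the general idea, not sd-scripts' actual implementation; the function and variable names (`cache_outputs_multi_gpu`, `encode_fn`, `items`) are hypothetical placeholders.

```python
# Minimal sketch (not sd-scripts' actual code): splitting the caching workload
# across GPUs with Hugging Face accelerate. Each process caches an interleaved
# shard of the dataset, so N GPUs each handle roughly 1/N of the items.
from accelerate import Accelerator

accelerator = Accelerator()

def cache_outputs_multi_gpu(items, encode_fn):
    # Hypothetical helper: `encode_fn` would run the Text Encoder / VAE on one
    # item and save the result to disk (e.g. as an .npz next to the image).
    shard = items[accelerator.process_index::accelerator.num_processes]
    for item in shard:
        encode_fn(item)
    # Wait until every process has finished writing its shard before training starts.
    accelerator.wait_for_everyone()
```

Launched with `accelerate launch --num_processes N`, each process sees a different `process_index`, so the shards do not overlap and no extra coordination is needed beyond the final barrier.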
@kohya-ss those `--highvram` and `--lowvram` options made no measurable difference in my earlier tests. What do they actually do? I tested both for FLUX fine-tuning and FLUX LoRA training. I can test FLUX LoRA multi-GPU caching; fine-tuning still requires 80 GB GPUs, and the fused backward pass is not working.
Currently,
I've done some more research recently, but so far I don't know of any way to improve memory usage with multi-GPU fine-tuning other than DeepSpeed or FSDP.
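For reference, a minimal sketch of enabling FSDP through accelerate's plugin API is shown below. This is generic accelerate usage under stated assumptions, not sd-scripts' own DeepSpeed/FSDP integration; `model`, `optimizer`, and `dataloader` are placeholders, and the script would need to be started with `accelerate launch` across multiple processes.

```python
# Hedged sketch: sharding model parameters, gradients, and optimizer state
# across GPUs with FSDP via accelerate. Run with e.g.
#   accelerate launch --num_processes 2 train.py
from torch.distributed.fsdp import ShardingStrategy
from accelerate import Accelerator, FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin(
    sharding_strategy=ShardingStrategy.FULL_SHARD,  # shard params, grads, and optimizer state
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

# Placeholder objects; in a real script these come from the training setup.
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```

The trade-off is the usual one: sharding lowers per-GPU memory at the cost of extra communication, which is why it helps multi-GPU fine-tuning where a single 24–48 GB card is otherwise not enough.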
@kohya-ss you are awesome, thank you so much.
I hope the full version of GPT o1 will help find the right solution very soon.
I hope it will be possible to push the cached data directly to HF, so that latents can be pulled directly on a cloud platform without spending time on caching, and so that disk space can be saved on large datasets.
Certainly, how to handle large datasets is a big challenge. I don't have much experience working with large-scale datasets, but I think we should also consider using WebDataset, etc. Also, since the relative cost of AE/VAE processing decreases during large-scale training, caching latents may not be necessary.
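As a rough illustration of the "push the cache to HF" idea, the sketch below uploads a cache directory to a Hugging Face Hub dataset repo and restores it on a cloud machine before training. The repo id and paths are hypothetical; this is one possible workflow, not a feature of sd-scripts.

```python
# Hedged sketch: sharing precomputed latent / Text Encoder caches via the
# Hugging Face Hub so caching time is paid only once. Repo id and paths are
# hypothetical placeholders.
from huggingface_hub import upload_folder, snapshot_download

# On the machine that computed the caches (e.g. .npz files next to the images):
upload_folder(
    repo_id="your-name/flux-latent-cache",  # hypothetical dataset repo
    repo_type="dataset",
    folder_path="train_data/cache",
)

# On the cloud training machine, restore the cache before starting training:
snapshot_download(
    repo_id="your-name/flux-latent-cache",
    repo_type="dataset",
    local_dir="train_data/cache",
)
```

For very large datasets, streaming the samples with WebDataset-style tar shards avoids keeping everything on local disk at all, at the cost of recomputing latents on the fly as noted above.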