
SANA support #1807

Open
recoilme opened this issue Nov 26, 2024 · 13 comments

Comments

@recoilme

Could you please add minimal SANA support?
https://github.com/NVlabs/Sana

SANA's training implementation is far from optimal:
NVlabs/Sana#49

It is GPU-intensive and lacks caching, Multi Aspect Ratio bucketing, and Adafactor support.

We love you, Mr. Kohya!

@kohya-ss
Owner

Sana is very interesting. However, I am concerned that the weight license is CC BY-NC-SA 4.0.
If this applies not only to the model but also to the generated images, the use cases for the model will be limited.

It seems that a pull request for optimization is under development in the Sana repository, so I will wait for that first.

@CorradoF

Sana is very interesting. However, I am concerned that the weight license is CC BY-NC-SA 4.0. If this applies not only to the model but also to the generated images, the use cases for the model will be limited.

What's the difference from FLUX.1 dev? Isn't it research-only too?

@kohya-ss
Owner

What's the difference from FLUX.1 dev? Isn't it research-only too?

From my understanding, the output of FLUX.1 dev is not covered by the license of FLUX.1 dev.

https://github.com/black-forest-labs/flux/blob/main/model_licenses/LICENSE-FLUX1-dev

a. “Derivative” means any (i) modified version of the FLUX.1 [dev] Model (including but not limited to any customized or fine-tuned version thereof), (ii) work based on the FLUX.1 [dev] Model, or (iii) any other derivative work thereof. For the avoidance of doubt, Outputs are not considered Derivatives under this License.

@CorradoF

What's the difference from FLUX.1 dev? Isn't it research-only too?

From my understanding, the output of FLUX.1 dev is not covered by the license of FLUX.1 dev.

https://github.com/black-forest-labs/flux/blob/main/model_licenses/LICENSE-FLUX1-dev

a. “Derivative” means any (i) modified version of the FLUX.1 [dev] Model (including but not limited to any customized or fine-tuned version thereof), (ii) work based on the FLUX.1 [dev] Model, or (iii) any other derivative work thereof. For the avoidance of doubt, Outputs are not considered Derivatives under this License.

Interesting, thank you. I had always heard differently on Reddit; my bad for not reading it myself.

If NVIDIA doesn't change the license, eventually there will be a Pony-style Sana. Having things already prepared for that could be an advantage. Thank you anyway for the great work you've done so far.

@recoilme
Author

Sana is very interesting. However, I am concerned that the weight license is CC BY-NC-SA 4.0.

It looks like the training code is under Apache now, though I'm not sure:
NVlabs/Sana@335d445

@Bocchi-Chan2023

NVlabs/Sana#54
Code only, unfortunately.

@recoilme
Author

@Muinez added very good code for MAR (multi-aspect-ratio) bucketing and simplified local training:
https://github.com/Muinez/Sana

I added code for model load/save and fixed some small bugs:
https://github.com/recoilme/Sana

And here is an example of how to use it and train a small model from scratch in bf16:
https://github.com/recoilme/Sana/blob/main/TRAIN.md
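For anyone unfamiliar with MAR bucketing, the general idea can be sketched as follows. This is a minimal illustration of the technique, not the actual code from the linked repos, and `make_buckets` / `assign_bucket` are hypothetical names: images are grouped into fixed (width, height) buckets that share a roughly constant pixel budget, so each batch has a uniform shape without cropping everything to a square.

```python
import math

def make_buckets(base=1024, step=64, max_ratio=2.0):
    """Enumerate (w, h) buckets near base*base pixels, in multiples of step."""
    buckets = []
    w = step
    while w <= base * max_ratio:
        # Pick the height that keeps the pixel count close to base*base.
        h = round(base * base / w / step) * step
        if h > 0 and max(w / h, h / w) <= max_ratio:
            buckets.append((w, h))
        w += step
    return sorted(set(buckets))

def assign_bucket(width, height, buckets):
    """Pick the bucket whose log-aspect-ratio is closest to the image's."""
    ar = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - ar))

buckets = make_buckets()
print(assign_bucket(1920, 1080, buckets))  # -> (1344, 768) with these defaults
```

Batches are then drawn from one bucket at a time, so tensors stack without padding.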

@Bocchi-Chan2023

@Muinez added very good code for MAR bucketing and simplified local training: https://github.com/Muinez/Sana

I added code for model load/save and fixed some small bugs: https://github.com/recoilme/Sana

And here is an example of how to use it and train a small model from scratch in bf16: https://github.com/recoilme/Sana/blob/main/TRAIN.md

Is it possible to improve the way the dataset cache is generated? When I tried the official code, it used all the system RAM and froze my PC.

@recoilme
Author

recoilme commented Nov 30, 2024

Is it possible to improve the way the dataset cache is generated?

The official code doesn't have a cache.
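For anyone hitting the RAM issue described above, one common workaround is to cache encoded latents on disk instead of accumulating them all in memory. A minimal sketch, where `encode` is a stand-in for a real (expensive) VAE encode call, not Sana's actual API:

```python
import hashlib
import os
import pickle
import tempfile

def encode(image_path):
    # Placeholder for an expensive VAE encode; returns fake "latents".
    return [len(image_path), hash(image_path) % 997]

class DiskLatentCache:
    """Encode each image at most once; keep results on disk, not in RAM."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, key):
        digest = hashlib.sha256(key.encode()).hexdigest()
        return os.path.join(self.cache_dir, digest + ".pkl")

    def get(self, image_path):
        path = self._path(image_path)
        if os.path.exists(path):            # cache hit: load from disk
            with open(path, "rb") as f:
                return pickle.load(f)
        latents = encode(image_path)        # cache miss: encode once
        with open(path, "wb") as f:
            pickle.dump(latents, f)
        return latents

cache = DiskLatentCache(tempfile.mkdtemp())
a = cache.get("img_001.png")   # encodes and writes to disk
b = cache.get("img_001.png")   # loads from disk, no re-encode
```

With something like this, peak memory stays bounded by one batch of latents rather than the whole dataset, and the VAE can be dropped from the GPU entirely after the caching pass.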

@Bocchi-Chan2023

Is it possible to improve the way the dataset cache is generated?

The official code doesn't have a cache.

Oh, maybe I'm misunderstanding something.
It started using several times more memory than Sigma fine-tuning, so I feel something is wrong.

@recoilme
Author

recoilme commented Dec 1, 2024

Current status:
0.6B on an A40 (48 GB GPU) with batch size 256 at 512-1024 resolution
1.6B with batch size 16 at 512-2048 resolution

It looks like training from scratch in bf16 is succeeding:
https://wandb.ai/recoilme/potato/runs/8?nw=nwuserrecoilme

Note that this is not the official training code.
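As an aside on why from-scratch training in bf16 tends to work without loss scaling: bfloat16 keeps float32's 8-bit exponent and drops the mantissa to 7 explicit bits, so it covers the same numeric range with less precision. The truncation can be illustrated in pure Python:

```python
import struct

def to_bf16(x):
    """Round a float32 value to the nearest bfloat16 (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Standard trick: add 0x7FFF plus the lowest kept bit, then drop 16 bits.
    rounded = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", rounded))[0]

print(to_bf16(3.14159))  # -> 3.140625: only ~2-3 decimal digits survive
print(to_bf16(1e30))     # huge magnitudes still fit (same exponent range as fp32)
```

Because the exponent range matches float32, gradients rarely underflow to zero the way they do in fp16, which is what makes plain bf16 training viable.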

@Manni1000

But I have seen someone say that it runs with 24 GB of VRAM if you use VAE caching. Is this true or false?

@recoilme
Author

recoilme commented Dec 2, 2024

But I have seen someone say that it runs with 24 GB of VRAM if you use VAE caching. Is this true or false?

True. I train the 0.6B model with batch size 256 and the 1.6B model with batch size 24 on a 48 GB GPU with a VAE cache.
