
Stable-cascade support #1982

Open
bmaltais opened this issue Feb 18, 2024 · 46 comments
Labels
enhancement New feature or request

Comments

@bmaltais
Owner

I have started work on supporting Stable Cascade in the GUI... hope it will not be too much of a pain to implement. Let's discuss it here.

bmaltais added the enhancement (New feature or request) label Feb 18, 2024
bmaltais pinned this issue Feb 18, 2024
@GamingDaveUk

Happy to see this. Not much I can add to the discussion other than a thank-you for undertaking this. People like yourself creating and maintaining these tools are the reason so much content exists. Thank you.

@futureflix87

Thank you so much!!

@bmaltais
Owner Author

bmaltais commented Feb 18, 2024

Can someone share a toml config file for a simple one-concept finetuning? I never do finetuning, and apparently using .toml is the way to go now... and I have no clue how to configure it ;-)

My first quest in making the GUI is getting to properly finetune a Stable Cascade model... and I need a proper .toml to run this example command:

& accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 2 stable_cascade_train_stage_c.py `
  --mixed_precision bf16 --save_precision bf16 --max_data_loader_n_workers 0 --persistent_data_loader_workers `
  --gradient_checkpointing --learning_rate 1e-4 `
  --optimizer_type adafactor --optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" `
  --max_train_epochs 10 --save_every_n_epochs 1 `
  --output_dir e:\model\test --output_name sc_test `
  --stage_c_checkpoint_path "E:\models\stable_cascade\stage_c_bf16.safetensors" `
  --effnet_checkpoint_path "E:\models\stable_cascade\effnet_encoder.safetensors" `
  --previewer_checkpoint_path "E:\models\stable_cascade\previewer.safetensors" `
  --dataset_config "D:\kohya_ss\examples\stable_cascade\test_dataset.toml" `
  --sample_every_n_epochs 1 --sample_prompts "D:\kohya_ss\examples\stable_cascade\prompt.txt" `
  --adaptive_loss_weight

Once I am successful I will be in a better place to judge how to put the GUI together... At first I thought I would just extend the finetuning tab to support Stable Cascade... but I think it might just be better to create a dedicated tab for it... still unsure...
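
As a side note for anyone following along: the file passed to --sample_prompts is plain text with one prompt per line, and sd-scripts accepts optional trailing flags such as --w/--h (size), --s (steps), --d (seed), --l (CFG scale) and --n (negative prompt). A minimal sketch with made-up prompts, assuming the example dataset's zxc toy token:

zxc toy posing at the beach --w 1024 --h 1024 --s 30 --d 42
zxc toy on a wooden desk, studio lighting --w 1024 --h 1024 --n blurry, low quality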

@bmaltais
Owner Author

I have figured it out...

[[datasets]]
resolution = 1024
batch_size = 4
keep_tokens = 1
enable_bucket = true

  [[datasets.subsets]]
  image_dir = 'd:\kohya_ss\examples\stable_cascade\test_dataset'
  num_repeats = 10
  class_tokens = 'toy'
  caption_extension = '.txt'

@bmaltais
Owner Author

bmaltais commented Feb 18, 2024

Looks like I am successful in finetuning...

[image]

Finetuning with zxc as the class token for toy and prompting with zxc toy posing at the beach --W 800 --H 1200... so there is hope.

Looks like the best epoch was 7... after that it went downhill.

@bmaltais
Owner Author

I have shared the test dataset in the stable_cascade branch. Look under the examples folder. You can play with it for now.

@bmaltais
Owner Author

I tested the results of the model in ComfyUI and they are not great... sort of washed out... Most certainly bad training parameters... Will take a while to figure out proper SC finetuning parameters...

[image]

@bmaltais
Owner Author

If you find better parameters for better results please share. Training SC is hugely VRAM intensive.

@311-code

311-code commented Feb 19, 2024

I have a 4090 24GB. I'll dive into this today and report back.

How many photos do you recommend I use for ideal use in Cascade to test this?

@bmaltais
Owner Author

I did my test with 8… I don’t think the disappointing result is due to that… I tried using other optimisers but I don’t have enough VRAM.

@gesen2egee
Contributor

[image]
Maybe it's because of this.

@bmaltais
Owner Author

Using the latest updated code in sd-scripts produces better results... still not perfect... kohya is working on allowing stage_b training... hoping this will fix the issue with the final look:

[image]

@311-code

311-code commented Feb 21, 2024

Ok, I feel like I'm close, but I'm not familiar with this new code. Is there any basic info you can provide on where to put the training images and the format of the sample .json for Cascade? It's very different.

@bmaltais
Owner Author

I did provide everything in the stable_cascade branch. Look in the examples folder in that branch. You will find the dataset, the toml file for the dataset, etc. The new way of configuring the images for finetuning in the latest sd-scripts code is to use a .toml file... this is what the new SC Finetuning tab is configured to use...

@311-code

311-code commented Feb 22, 2024

Thank you, I completely missed your examples folder.

I just read everything here also: https://github.com/kohya-ss/sd-scripts/tree/stable-cascade as per your main page link. Went and read the stable cascade branch. (good info + docs folder for general fine-tune) but had to translate the Japanese.

After replacing the examples folder with the additions/images/toml, where do I place all of the files? I am assuming I either leave them there or move them to your empty "dataset" folder. Edit/Update: the .toml file in the examples folder controls the dataset location.

@bmaltais
Owner Author

The dataset can be anywhere. Simply edit the toml file to point to it and specify the repeats, resolution, etc.
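
For example, pointing the config at a new dataset usually only means editing the subset block of the toml shown earlier (the path and tokens below are placeholders):

  [[datasets.subsets]]
  image_dir = 'D:\my_datasets\my_concept'   # your own image folder
  num_repeats = 10
  class_tokens = 'zxc toy'
  caption_extension = '.txt'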

@311-code

311-code commented Feb 22, 2024

Ok, so the SC fine tuning tab always looks at the examples folder for the toml file. I will edit the toml with the images path.

Trying this out again today!

@bmaltais
Owner Author

> Ok, so the SC fine tuning tab always looks at the examples folder for the toml file, got it. I will edit the toml with the images path.
>
> Trying this out again today!

Actually, it does not. Just make sure to put the path to your toml in the SC Finetuning tab and it should work. It does not need to be in the examples folder.

@311-code

311-code commented Feb 22, 2024

Ok, got it. I see it under SC Finetuning tab > Folders > Dataset toml path (it looks for .json by default); I selected all filetypes, then chose the model files for each field, then the .toml in the \examples\stable_cascade folder. It's training now.

Not sure if you have plans to map the .toml file to Gradio interface input fields with some instructions in there, but I think it would help a lot for novices like me getting into this.

Thanks again for working on this btw; it's pretty huge for the community to have easy-to-use training like this imo.

Edit: I keep editing my posts because my brain can't think straight the last few days and I want the info to be as clear as possible for users.

@bmaltais
Owner Author

No worries, I keep editing mine too :-)

As far as a GUI to manage and create the toml dataset file goes, it might be possible, but I feel it might just be easier to create one by hand. The complexity of building a Gradio interface for that is beyond my current knowledge… but I am sure someone could do it.

If someone wants to take a crack at creating a toml dataset Gradio GUI class, I will gladly add it to the interface.
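
In case it helps anyone who wants to try: a minimal sketch of what such a tab could look like, assuming the GUI's existing Gradio stack. Everything here (write_dataset_toml, the field layout) is hypothetical and untested; it just emits the same structure as the hand-written example earlier in the thread.

import gradio as gr

def write_dataset_toml(image_dir, resolution, batch_size, num_repeats, class_tokens, output_path):
    # Emit the same [[datasets]] / [[datasets.subsets]] structure as the manual example.
    toml_text = (
        "[[datasets]]\n"
        f"resolution = {int(resolution)}\n"
        f"batch_size = {int(batch_size)}\n"
        "enable_bucket = true\n"
        "\n"
        "  [[datasets.subsets]]\n"
        f"  image_dir = '{image_dir}'\n"
        f"  num_repeats = {int(num_repeats)}\n"
        f"  class_tokens = '{class_tokens}'\n"
        "  caption_extension = '.txt'\n"
    )
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(toml_text)
    return f"Saved dataset config to {output_path}"

with gr.Blocks() as demo:
    image_dir = gr.Textbox(label="Image folder")
    resolution = gr.Number(label="Resolution", value=1024)
    batch_size = gr.Number(label="Batch size", value=4)
    num_repeats = gr.Number(label="Repeats", value=10)
    class_tokens = gr.Textbox(label="Class tokens", value="toy")
    output_path = gr.Textbox(label="Output .toml path", value="dataset.toml")
    status = gr.Textbox(label="Status", interactive=False)
    gr.Button("Write .toml").click(
        write_dataset_toml,
        inputs=[image_dir, resolution, batch_size, num_repeats, class_tokens, output_path],
        outputs=status,
    )

demo.launch()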

@311-code

311-code commented Feb 23, 2024

I had some luck with slightly higher-quality outputs and a few issues; here are some samples:

[images 1–4]

The first one has a prompt censorship joke if you look closely haha.

Problems: likeness is not completely there; samples during training look good at 800 steps (I should be telling you epochs to make this easier, apologies) but not as good at 800 steps in ComfyUI, so I used the 1800-step checkpoint. Another issue is that samples are stuck at 192x192.
[image: samples during training]

For some reason I have to use an overtrained checkpoint with a text_model (for the CLIP node) from fewer steps than the checkpoint to get decent results, or even mixing in a text_model from another training of Ted somehow gets better results.

@bmaltais
Owner Author

bmaltais commented Feb 23, 2024

Thank you for sharing this. I will test it out later tonight after work and family stuff ;-) I will update the content of the branch with your updates so it can help others who want to cut their teeth on this ;-)

The sample you provided is actually pretty great. Probably a combination of your parameters, source data, and training the text encoder.

@bmaltais
Owner Author

Interesting results...

UNet and TE:

[image]

TE only:

[image]

UNet only:

[image]

Conclusion... the TE has the most importance as far as likeness goes... but without the trained UNet the result is quite fuzzy...

@311-code

311-code commented Feb 23, 2024

That first one is much better likeness than I ever got.

I really had to fight with text encoder/UNet model combinations. Maybe increasing the text encoder learning rate a bit could help, as it has the biggest impact?

@bmaltais
Owner Author

Looks like the TE is overfitting while the UNet is way underfitted. Maybe increasing the UNet LR to 0.0001 might help balance learning between the two and prevent overfitting.

@311-code
Copy link

311-code commented Feb 26, 2024

Ok, I spent a couple more days testing. Tried a few things: no captions, changing classes, general tokens, training myself with 60 photos like I would on SDXL.

Overall it's very difficult to figure out the right combination of UNet model and text encoder model to use in ComfyUI, or what number of steps is best for 60 photos on Cascade. Maybe this will change with future diffusers updates? To complicate things, the 13 Ted photos look good at 800 steps in the samples, then fall off, but get decent again at 1800 steps. It makes me wonder if it would look more like Ted if I did more epochs.

It seems to take a long time to fully finetune Cascade at the time of writing, and I'm struggling to figure it out. It didn't look like me overall and looked pretty undertrained at 3400 steps with batch size 3 and 60 photos. I'm thinking this is going to need a lot more steps, which doesn't seem in line with it "training faster than SDXL". I could increase the learning rate on everything here again, but on SDXL that always seemed to make the results worse.

This guy is getting pretty decent results with his cat at 8000 steps (but overfitting), using a very large batch size of 7, with kohya scripts directly: https://www.reddit.com/r/StableDiffusion/comments/1azmhte/my_cat_in_different_stylesstablecascade_stagec/

I ran out of disk space though, because I fell asleep and it was saving too often, every 300 steps.

@vgaggia

vgaggia commented Feb 26, 2024

I'm actually busy trying to train Stable Cascade on a dataset of around 180k images, although I am using OneTrainer because it seems to be less memory intensive for some reason.

I have also noticed that the training constantly gets better and worse as it goes. It's going to be a while for my training to finish on a single GPU, so no clue when I can actually show some results.

@vgaggia

vgaggia commented Feb 26, 2024

I sure will find out if it's a massive fail!

Have you considered trying a very high learning rate? Maybe it trains differently than we're used to; it is supposed to be easier to train, if I remember right.

@betterftr

For me it generates 192x192 samples during training; I'm trying to figure out why, since I set --w and --h to 1024.

@bmaltais
Owner Author

The small samples are related to how the sd-scripts code is actually being used. Nothing I can do; this is something only kohya can address… but given how heavy creating samples is, I suspect this was by design.

@betterftr

Well, as a temporary solution one can increase --w and --h to 4096 for 4x sample size :D

@jordoh

jordoh commented Mar 1, 2024

Someone posted a workflow for converting the unet models here to work with the official ComfyUI workflow (to get rid of that error). Simple enough. I've been out of town but will try it when I get back.

comfyanonymous/ComfyUI#2893 (comment)

Note that this only loads the unet, not the CLIP, so you aren't able to utilize the (more effective) text encoder training.

@311-code

311-code commented Mar 1, 2024

Thanks for the info. Can we convert the clip model also and just drag it into the positive and negative prompts then? (with the load CLIP node in the official ComfyUI workflow)

And wondering if there's any point in doing this over just using the unet; I was hoping it might give better results.

@jordoh

jordoh commented Mar 1, 2024

> Thanks for the info. Can we convert the clip model also and just drag it into the positive and negative prompts then? (with the load CLIP node in the official ComfyUI workflow)

Maybe? I've been trying this with a model trained by the original Stable Cascade repo code and get errors as the model it produces isn't loadable as a clip model (I don't have a separate text encoder model from that process). It might work for kohya-ss trained models though - I'd be very interested to know if it does.

> And wondering if there's any point in doing this over just using the unet; I was hoping it might give better results.

Yes, there's definitely a point; see this comment upthread for comparisons. For person-trained models, I'm unable to achieve any likeness with just the UNet (vs. generating with the Stable Cascade repo, which uses the trained CLIP).

@311-code

311-code commented Mar 2, 2024

Yup, saw that before. Sorry for the confusion; I meant is there "any point" to using the official ComfyUI workflow vs the unet workflow for this. I wonder if there would be a difference.

@jordoh

jordoh commented Mar 3, 2024

> Yup, saw that before. Sorry for the confusion; I meant "any point" to the ComfyUI workflow vs the unet workflow if we got unet and clip working in both workflows. I wonder if there would be a difference.

Oh, thanks for clarifying. I think I understand what you meant now: is there any difference between saving off a checkpoint with the trained unet and then using that saved checkpoint, vs. loading the trained unet directly? It seems unlikely that would affect the output, as it's the same model, CLIP, and VAE either way, but using the saved-off checkpoint might save some memory or load time.

@311-code

311-code commented Mar 4, 2024

Yes, thanks for the info.

Something I just discovered that I never knew about the kohya GUI: you can edit the prompt.txt in the samples folder while it's training to change the samples.

This was pretty helpful. It's useful if you are saving a lot of checkpoints every however many epochs/steps and want to see something different.

@segalinc

segalinc commented Mar 4, 2024

Will this feature work on multiple GPUs?

@sapkun

sapkun commented Mar 5, 2024

When will the ControlNet training script be released for Stable Cascade?

@paboum
Contributor

paboum commented Mar 8, 2024

> Looks like the TE is overfitting while the UNet is way underfitted. Maybe increasing the UNet LR to 0.0001 might help balance learning between the two and prevent overfitting.

Please try adaptive optimizers already, e.g. Prodigy. I'm a newbie here and have never even used those LR parameters. Also, I hope this new feature will work fine with those, so at least one test is in order.
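
For anyone who wants to test that, a hedged sketch of how the optimizer lines in the earlier stage-C command might be swapped out. Prodigy is adaptive, so the learning rate is conventionally set to 1.0; the optimizer_args shown are common prodigyopt options and are assumptions here, not verified against this branch:

  --optimizer_type prodigy --learning_rate 1.0 `
  --optimizer_args "decouple=True" "weight_decay=0.01" "use_bias_correction=True" `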

@311-code

311-code commented Mar 12, 2024

I will need to look into prodigy also. I've heard good things.

Just want to give an update though: I tried to train a '60s celebrity with 74 photos on Cascade; tried a ton of settings and text encoder/unet model combinations, LR settings, and step counts.

Can't get SDXL DreamBooth or full-finetuning-level results with a trained human. Tried a ton of stuff over like 8 hours. I think now that SD3 is coming out I may just wait it out.

@segalinc

segalinc commented Mar 12, 2024 via email

@311-code

311-code commented Mar 13, 2024

I was thinking of trying that out but heard it may not train the text encoder like this does. Edit: Never mind, I believe it can.

I will give it a go though just to see how it compares, thanks!

@311-code

311-code commented Mar 15, 2024

Some info from the kohya Cascade branch, since things have stagnated here, if anyone wants to try:

The official default learning rate for Cascade is 1e-4 (0.0001), and the official settings use bf16 for training.

The first time, specify --text_model_checkpoint_path and --save_text_model to save the Text Encoder weights. From the next time, specify --text_model_checkpoint_path to load the saved weights (a hedged command sketch follows the note below).

Note:

A quick clarification, Stable Cascade uses Stage A & B to compress images and Stage C is used for the text-conditional
learning. Therefore, it makes sense to train a LoRA or ControlNet only for Stage C. You also don't train a LoRA or
ControlNet for the Stable Diffusion VAE right?

If your GPU allows for it, you should definitely go for the large Stage C, which has 3.6 billion parameters.
It is a lot better and was finetuned a lot more. Also, the ControlNet and Lora examples are only for the large Stage C at the moment.
For Stage B the difference is not so big. The large Stage B is better at reconstructing small details,
but if your GPU is not so powerful, just go for the smaller one.
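
A hedged sketch of how those text-model flags might be appended to the stage-C command from earlier in the thread (the text-model path is a placeholder):

First run (trains and saves the Text Encoder weights):

& accelerate launch ... stable_cascade_train_stage_c.py `
  <same arguments as the earlier example> `
  --save_text_model --text_model_checkpoint_path "E:\models\stable_cascade\text_model.safetensors"

Subsequent runs (loads the saved weights):

& accelerate launch ... stable_cascade_train_stage_c.py `
  <same arguments as the earlier example> `
  --text_model_checkpoint_path "E:\models\stable_cascade\text_model.safetensors"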

I finally got Onetrainer working to compare, will report back.

Edit: Comparing to the kohya GUI but had a side issue. OneTrainer seems to have a custom-made diffusers-to-.safetensors converter that runs after training, and it's not great imo. I would recommend doing a manual conversion of a backup, from the diffusers loader node to the checkpoint save node in ComfyUI, if comparing.

@mhaines94108

> I tested the results of the model in ComfyUI and they are not great... sort of washed out... Most certainly bad training parameters... Will take a while to figure out proper SC finetuning parameters...
>
> [image]

I have spent several weeks trying to fine-tune Stable Cascade on a dataset of ~50K photos, and my results have a very similar finger-painted look. I've been using the sample code straight from the Stable Cascade repo. I guess I'll try kohya's scripts.
