Stable-cascade support #1982
I have started work on supporting stable-cascade in the GUI... hope it will not be too much of a pain to implement. Let's discuss it in here.

Comments
Happy to see this, not much I can add to the discussion other than a thank you for undertaking this. People like yourself creating and maintaining these tools are the reason so much content exists. Thank you
Thank you so much!!
Can someone share a toml config file for a simple one-concept finetuning? I have never done finetuning, and apparently using a .toml is the way to go now... and I have no clue how to configure it ;-) My first quest in making the GUI is getting a Stable Cascade model to finetune properly... and I need a proper .toml to run this example command:
Once I am successful I will be in a better place to judge how to put the GUI together... At first I thought I would just extend the finetuning tab to support Stable Cascade... but I think it might just be better to create a dedicated tab for it... still unsure...
I have figured it out...
I have shared the test dataset in the stable_cascade branch. Look under the examples folder. You can play with it for now.
If you find parameters that give better results, please share. Training SC is hugely VRAM intensive.
I have a 4090 with 24 GB. I'll dive into this today and report back. How many photos do you recommend I use to test this properly in Cascade?
I did my test with 8… I don't think the disappointing result is due to that… I tried using other optimizers, but I don't have enough VRAM.
Ok, I feel like I'm close, but I'm not familiar with this new code. Is there any basic info you can provide on where to put the training images and the format of the sample .json for Cascade? It's very different.
I did provide everything in the stable_cascade branch. Look in the examples folder in that branch. You will find the dataset, the toml file for the dataset, etc. The new way of configuring the images for finetuning in the latest sd-scripts code is to use a .toml file... this is what the new SC Finetuning tab is configured to use...
Thank you, I completely missed your examples folder. I also just read everything here: https://github.com/kohya-ss/sd-scripts/tree/stable-cascade as per your main page link. Went and read the stable cascade branch (good info, plus a docs folder for general finetuning), though I had to translate the Japanese. After replacing the examples folder with the additions/images/toml, where do I place all of the files? I am assuming I either leave them there or use your empty "dataset" folder. Edit/Update: The .toml file in the examples folder controls the dataset location.
The dataset can be anywhere. Simply edit the toml file to point to it and specify the repeats, resolution, etc.
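For anyone landing here later, a minimal sketch of what such a dataset TOML can look like (all paths, resolutions, and repeat counts below are placeholders, not values taken from the examples folder):

```toml
# Minimal single-concept dataset config for sd-scripts (placeholder values).
[general]
enable_bucket = true        # bucket images by aspect ratio instead of hard-cropping
caption_extension = ".txt"  # one caption file per image, same base name

[[datasets]]
resolution = 1024           # training resolution
batch_size = 1

  [[datasets.subsets]]
  image_dir = "/path/to/your/images"  # folder with the images and caption files
  num_repeats = 10                    # times to repeat this subset per epoch
```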
Ok, so the SC Finetuning tab always looks in the examples folder for the toml file. I will edit the toml with the images path. Trying this out again today!
Actually it does not. Just make sure to put the path to your toml in the SC Finetuning tab and it should work. It does not need to be in the examples folder.
Ok, got it. I see it under SC Finetuning tab > folders > Dataset toml path (it looks for .json by default), so I selected all filetypes, then chose the model files for each field, then the .toml in the \examples\stable_cascade folder. It's training now. Not sure if you have plans to map the .toml file to gradio interface input fields with some instructions, but I think it would help novices like me a lot in getting into this. Thanks again for working on this, btw; it's pretty huge for the community to have easy-to-use training like this, imo. Edit: I keep editing my posts because my brain can't think straight the last few days and I want the info to be as clear as possible for users.
No worries, I keep editing mine too :-) As for a GUI to manage and create the toml dataset file, it might be possible, but I feel it might just be easier to create one by hand. The complexity of building a gradio interface for that is beyond my current knowledge… but I am sure someone could do it. If someone wants to take a crack at creating a toml dataset gradio GUI class, I will gladly add it to the interface.
Thank you for sharing this. I will test it out later tonight after work and family stuff ;-) I will update the content of the branch with your updates so it can help others who want to cut their teeth on this ;-) The sample you provided is actually pretty great. Probably a combination of your parameters, source data, and training the text encoder.
That first one is a much better likeness than I ever got. I really had to fight with text encoder and unet model combinations. Maybe increasing the text encoder learning rate a bit could help, as it has the biggest impact?
Looks like the TE is overfitting while the UNet is way underfitted. Maybe increasing the UNet LR to 0.0001 would help balance learning between the two and prevent overfitting.
Ok, I spent a couple more days testing. Tried a few things: no captions, changing classes, general tokens, training myself with 60 photos like I would on SDXL. Overall it's very difficult to figure out the right combination of unet model and text encoder model to use in ComfyUI, or what number of steps is best for 60 photos in Cascade. Maybe this will change with future diffusers updates? To complicate things, the 13 Ted photos look good at 800 steps in the samples, then fall off, but get decent again at 1800 steps. It makes me wonder if the Ted likeness would improve with more epochs.

It takes a long time to fully finetune Cascade at the time of writing, and I'm struggling to figure it out. It didn't look like me overall and looked pretty undertrained at 3400 steps with batch size 3 and 60 photos. I'm thinking this is going to need a lot more steps, which doesn't seem in line with it "training faster than SDXL". I could increase the learning rate on everything here again, but in SDXL that always seemed to make the results worse. This guy is getting pretty decent results on his cat, though, at 8000 steps (but overfitting), using a very large batch size of 7, with kohya scripts directly: https://www.reddit.com/r/StableDiffusion/comments/1azmhte/my_cat_in_different_stylesstablecascade_stagec/ I ran out of disk space, though, because I fell asleep and it was saving too often (every 300 steps).
I'm actually busy trying to train Stable Cascade on a dataset of around 180k images, although I am using OneTrainer because it seems to be less memory intensive for some reason. I have also noticed that the training gets better and worse constantly as it trains. It's gonna be a while for my training to finish on a single GPU, so no clue when I can actually show some results.
I sure will find out if it's a massive fail! Have you considered trying a very high learning rate? Maybe it trains differently than we're used to; it is supposed to be easier to train, if I remember right.
For me it generates 192x192 samples during training; trying to figure out why, since I set --w and --h to 1024.
The small samples are related to how the sd-scripts code is actually being used. Nothing I can do; this is something only Kohya can address… but given how heavy creating samples is, I suspect this was by design.
Well, as a temporary solution one can increase the --w and --h to 4096 for 4x the size :D
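For context, in other sd-scripts trainers each line of the sample prompt file can carry its own generation flags, along these lines (the prompt and values are placeholders, and the stable_cascade branch may differ):

```
a photo of a cat sitting on a sofa --w 4096 --h 4096 --s 28 --l 4.0 --d 42
```

where --w/--h set the sample size, --s the sampling steps, --l the CFG scale, and --d the seed.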
Note that this only loads the unet, not the CLIP, so you aren't able to utilize the (more effective) text encoder training.
Thanks for the info. Can we convert the CLIP model also and just drag it into positive and negative prompts then (with a load clip node, for the official ComfyUI workflow)? And I'm wondering if there's any point in doing this over just using the unet; I was hoping it might give better results.
Maybe? I've been trying this with a model trained by the original Stable Cascade repo code and get errors, as the model it produces isn't loadable as a CLIP model (I don't have a separate text encoder model from that process). It might work for kohya-ss trained models, though; I'd be very interested to know if it does.
Yes, there's definitely a point; see this comment upthread for comparisons. For person-trained models, I'm unable to achieve any likeness with just the unet (vs. generating with the Stable Cascade repo, which uses the trained CLIP).
Yup, saw that before. Sorry for the confusion; I meant is there "any point" to using the official ComfyUI workflow vs. the unet workflow for this. I wonder if there would be a difference.
Oh, thanks for clarifying; I think I understand what you meant now: is there any difference between saving off a checkpoint with the trained unet and then using that saved checkpoint, vs. using the trained unet directly? It seems unlikely that would affect the output, as it's the same model, CLIP, and VAE either way, but it might save some memory or load time to use the saved-off checkpoint.
Yes, thanks for the info. Something I just discovered that I never knew about the kohya GUI: you can edit the prompt.txt in the samples folder while it's training to change the samples. This was pretty helpful. I'm finding it useful if you are saving a lot of checkpoints every however many epochs/steps and want to see something different.
Will this feature work on multiple GPUs?
When will the ControlNet training script be released for Stable Cascade?
Please try adaptive optimizers already, e.g. Prodigy. I'm a newbie here and have never even used those LR parameters. Also, I hope this new feature will work fine with those, so at least one test is in order.
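For reference, sd-scripts exposes optimizers through --optimizer_type / --optimizer_args; a hedged sketch of what a Prodigy run could look like (the argument values are illustrative, Prodigy needs `pip install prodigyopt`, and I have not verified this against the stable_cascade branch specifically):

```
--optimizer_type prodigy --learning_rate 1.0 --optimizer_args "decouple=True" "use_bias_correction=True" "safeguard_warmup=True"
```

With Prodigy the learning rate is conventionally left at 1.0, and the optimizer adapts the effective step size itself.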
I will need to look into Prodigy also; I've heard good things. Just want to give an update, though: I tried to train a 60s celebrity with 74 photos on Cascade, and tried a ton of settings, text encoder model/unet model combinations, LR settings, and steps. I can't get SDXL DreamBooth or full-finetuning-level results with a trained human or a flexible model. Tried a ton of stuff over about 8 hours. I think now that SD3 is coming out I may just wait it out and focus on SDXL again for now. I hope SD3 is not as difficult to train as this is, but we'll see. I could still be doing something wrong here; I can't produce the results I had above with Ted.
Have you checked what OneTrainer is doing? It seems like people are getting really nice results using it.
Was thinking of trying that out, but I heard it may not train the text encoder like this does. Edit: Never mind, I believe it can. I will give it a go just to see how it compares, thanks!
Some info from the Kohya Cascade branch, since things have stagnated here, if anyone wants to try: the official default learning rate for Cascade is 1e-4 (0.0001), and the official settings use bf16 for training.
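A hedged launch sketch based on those defaults (the script name and flags follow the stable_cascade branch and the other sd-scripts trainers as far as I can tell; double-check everything against the branch docs, and add the Stage C / EfficientNet / text encoder checkpoint paths the script requires, as shown in the examples folder):

```sh
# Hedged sketch; verify the script name and flags against the stable_cascade branch.
accelerate launch stable_cascade_train_stage_c.py \
  --mixed_precision bf16 --save_precision bf16 \
  --learning_rate 1e-4 \
  --dataset_config /path/to/dataset.toml \
  --output_dir /path/to/output
```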
I finally got OneTrainer working to compare; will report back. Edit: Comparing to the Kohya GUI, but I had a side issue. OneTrainer seems to have a custom-made diffusers-to-.safetensors converter it runs after training, and it's not great, imo. I would recommend doing a manual conversion of a backup from…
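If anyone wants to do such a manual conversion, a minimal sketch, assuming a plain PyTorch .bin checkpoint in a diffusers folder (the file paths are placeholders, and the key names are copied as-is, so ComfyUI may still need them remapped):

```python
# Minimal diffusers .bin -> .safetensors dump; keys are left untouched,
# so the result may still need key remapping before ComfyUI will load it.
import torch
from safetensors.torch import save_file

state_dict = torch.load("unet/diffusion_pytorch_model.bin", map_location="cpu")
save_file({k: v.contiguous() for k, v in state_dict.items()}, "stage_c.safetensors")
```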
I have spent several weeks trying to fine-tune Stable Cascade on a dataset of ~50K photos, and my results have a very similar finger-painted look. I've been using the sample code straight from the Stable Cascade repo. I guess I'll try Kohya's scripts.