Adding Validation Loss to detect Overtraining #193

Open
Naegles opened this issue Feb 15, 2023 · 9 comments

Comments

@Naegles

Naegles commented Feb 15, 2023

I saw that this repo has added the ability to use a validation loss to help figure out the optimal amount of training. Might be an interesting addition.

https://github.com/victorchall/EveryDream2trainer/blob/main/doc/VALIDATION.md
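A minimal sketch of how a per-epoch validation loss can flag overtraining: keep the epoch with the lowest held-out loss, and stop once that loss has failed to improve for `patience` epochs. The helper name `best_epoch`, the `patience` rule, and the loss values are illustrative assumptions. They are not taken from EveryDream2 or from this repository.

```python
from typing import List, Optional, Tuple


def best_epoch(val_losses: List[float], patience: int = 3) -> Tuple[int, Optional[int]]:
    """Return (epoch with the lowest validation loss, epoch to stop at or None).

    Hypothetical helper: the run counts as overtraining once the validation
    loss has not beaten its best value for `patience` consecutive epochs.
    """
    best_idx = 0
    for epoch, loss in enumerate(val_losses):
        if loss < val_losses[best_idx]:
            best_idx = epoch  # new best checkpoint
        elif epoch - best_idx >= patience:
            return best_idx, epoch  # no improvement for `patience` epochs
    return best_idx, None  # never triggered: keep training or use best_idx


# Illustrative numbers only: the loss falls, bottoms out at epoch 2, then rises.
losses = [0.20, 0.17, 0.15, 0.16, 0.18, 0.19]
print(best_epoch(losses, patience=3))  # -> (2, 5)
```

The checkpoint saved at the returned best epoch is the one to keep. Because diffusion losses are noisy, a larger `patience` or a smoothed loss curve may be needed in practice.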

@shirayu
Contributor

shirayu commented Feb 18, 2023

Good idea.
I believe that it is not enough to simply observe loss/average, since the weights are continually being updated.
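A toy illustration of this point in plain Python follows. It uses a hypothetical one-parameter model `y_hat = w * x` trained with per-sample SGD, and all names and data are made up. The running average of training loss mixes losses measured under different weights. A validation loss instead evaluates one fixed snapshot of the weights, on data the optimizer never saw.

```python
from typing import List, Tuple

Data = List[Tuple[float, float]]


def mse(w: float, data: Data) -> float:
    """Mean squared error of y_hat = w * x with the weight w held fixed."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train_one_epoch(w: float, data: Data, lr: float) -> Tuple[float, float]:
    """Per-sample SGD; returns (new weight, running average of training losses)."""
    losses = []
    for x, y in data:
        err = w * x - y
        losses.append(err ** 2)   # loss measured with the *current* weight...
        w -= lr * 2 * err * x     # ...which is then immediately updated
    return w, sum(losses) / len(losses)


train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # toy data, y = 2x
held_out = [(5.0, 10.0), (6.0, 12.0)]

w, running_avg = train_one_epoch(0.0, train, lr=0.01)
print(f"running average of training loss: {running_avg:.2f}")
print(f"training loss at final weights:   {mse(w, train):.2f}")
print(f"held-out loss at final weights:   {mse(w, held_out):.2f}")
```

The running average describes no single model. The last two numbers each describe the weights you would actually save.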

@kohya-ss
Owner

Thank you for the suggestion!
I was wondering whether validation loss is a meaningful metric, since SD loss fluctuates so much, but the document in the EveryDream repo is very interesting.

I will consider implementing validation loss, but it will take some time...
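Part of that fluctuation comes from the timestep and noise that are randomly sampled for every loss evaluation. One common way to make a validation loss comparable across evaluations is to re-seed the sampler identically on every validation pass, so that only the weights change between measurements. This sketch presents that as an assumption, not as what EveryDream2 or this repo does. `toy_denoising_loss` is a hypothetical stand-in for a real diffusion loss, and the 1000-step schedule is also an assumption.

```python
import random
from typing import Optional

NUM_TIMESTEPS = 1000  # assumption: a DDPM-style schedule with 1000 steps


def toy_denoising_loss(weight: float, timestep: int, noise: float) -> float:
    """Hypothetical stand-in for one sample's diffusion loss.

    The value depends on the randomly drawn timestep and noise, which is
    what makes single-batch losses jump around from step to step.
    """
    predicted_noise = weight * noise * (1.0 - timestep / NUM_TIMESTEPS)
    return (predicted_noise - noise) ** 2


def validation_loss(weight: float, num_samples: int, seed: Optional[int] = None) -> float:
    """Average toy loss over `num_samples` draws.

    With a fixed `seed`, a fresh RNG draws the same timesteps and noise on
    every call, so two evaluations differ only because the weights differ.
    With seed=None the draws change on every call.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        t = rng.randrange(NUM_TIMESTEPS)
        eps = rng.gauss(0.0, 1.0)
        total += toy_denoising_loss(weight, t, eps)
    return total / num_samples


# Same seed: identical draws, so the two weights are compared on equal terms.
print(validation_loss(0.5, 64, seed=1234), validation_loss(1.0, 64, seed=1234))
# No seed: re-evaluating the *same* weight gives a different number each time.
print(validation_loss(1.0, 64), validation_loss(1.0, 64))
```

Fixing the draws does not make the loss less noisy across images. It removes the sampling noise from the *comparison* between checkpoints.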

@uwidev

uwidev commented Mar 30, 2023

I think this would be a good means to determine learning rates. There's a lot of speculation as to the proper rate, but with validation, we should be able to prove appropriate learning rates. The question then would be if that value would be universally good when training something specific (e.g. style), or good for everything. I think there's a lot of potential to be gained with this.
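Here is a toy sketch of choosing a learning rate with validation loss: train the same model with several learning rates for the same number of epochs, then keep the rate that reaches the lowest held-out loss. Every part of it is a hypothetical illustration. That includes the one-parameter model `y_hat = w * x`, the noiseless data, the candidate rates, and the names `fit`, `val_loss`, and `sweep`. A real LoRA run would evaluate a held-out image set instead.

```python
from typing import Dict, List, Sequence, Tuple

Data = List[Tuple[float, float]]


def fit(lr: float, data: Data, epochs: int) -> float:
    """Train the toy model y_hat = w * x with per-sample SGD on squared error."""
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x  # gradient of (w*x - y)**2 w.r.t. w
    return w


def val_loss(w: float, data: Data) -> float:
    """Mean squared error on held-out data with the weight frozen."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def sweep(lrs: Sequence[float], train: Data, val: Data, epochs: int = 5) -> Dict[float, float]:
    """Validation loss reached by each learning rate after the same number of epochs."""
    return {lr: val_loss(fit(lr, train, epochs), val) for lr in lrs}


train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # toy data, y = 2x
val = [(5.0, 10.0), (6.0, 12.0)]

results = sweep([0.001, 0.01, 0.05, 0.2], train, val)
best_lr = min(results, key=results.get)
print(results)
print("best learning rate:", best_lr)
```

Too small a rate barely moves the weight, and too large a rate diverges. Both show up directly as a high validation loss. Because the toy data has no noise, this only demonstrates the selection mechanics, not overfitting.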

@slashedstar

So... is this still being considered, or was it scrapped? @kohya-ss

@kohya-ss
Owner

kohya-ss commented May 7, 2023

Sorry it has taken so long. This is on the task list, but I have not been able to get to it due to priority issues with other tasks.

It also requires enhancing the dataset classes, so it will take some time...

@AMorporkian

I have implemented this in a fork I originally created to implement the new noise scheduling functions. I won't create a PR because the massive refactoring I have done is largely experimental, but you can check it out here. I'm doing a run with wandb right now to test LoRA settings.

https://github.com/AMorporkian/kohya_ss/tree/hypertune

Sorry for the absolutely terrible commits, I was just messing around and the messages are not at all descriptive. If there isn't any movement on this by next week, I'll take some time to actually make some proper code that would integrate better.

@kmacmcfarlane

This seems like a really cool idea. I have to make a lot of different training runs to figure out the optimal settings for a given training set. Even when training by epochs, the number of training images still seems to affect the results somewhat.
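One plausible contributor to this is an assumption, not a diagnosis. With a fixed number of epochs, the number of optimizer steps still grows with the dataset, so "10 epochs" on 40 images means twice as many weight updates as on 20 images. The quick calculation below assumes each image is seen `repeats` times per epoch (a hypothetical parameter name here), ignores gradient accumulation, and counts a partial final batch as a step.

```python
def total_steps(num_images: int, repeats: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps for a run, assuming no gradient accumulation and that a
    partial final batch still counts as one step."""
    samples_per_epoch = num_images * repeats
    steps_per_epoch = (samples_per_epoch + batch_size - 1) // batch_size  # ceiling division
    return steps_per_epoch * epochs


for n in (20, 40):
    print(n, "images ->", total_steps(n, repeats=10, batch_size=2, epochs=10), "steps")
```

Comparing runs by validation loss at equal step counts, rather than equal epoch counts, may help separate this effect from the content of the dataset itself.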

@rockerBOO
Contributor

> I have implemented this in a fork I originally created to implement the new noise scheduling functions. I won't create a PR because the massive refactoring I have done is largely experimental, but you can check it out here. I'm doing a run with wandb right now to test LoRA settings.
>
> https://github.com/AMorporkian/kohya_ss/tree/hypertune

I borrowed some of your ideas (random_split and collate updates) and put together a copy/paste validation loss. That saved me from having to create a custom dataset just to get the ball rolling.

If anyone else is interested in testing, see #914. Expanding this into validation datasets would be the next step, but this works well with minimal changes.
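The split mentioned here follows the same idea as PyTorch's `torch.utils.data.random_split` with a seeded generator. For readers without the fork at hand, here is a dependency-free sketch of the idea: a seeded, reproducible hold-out split, so that every run validates on the same images. `split_train_val` and its parameters are hypothetical. This is not the code in #914.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_val(items: Sequence[T], val_fraction: float, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Reproducibly hold out roughly `val_fraction` of `items` for validation.

    The same seed always yields the same split, so validation losses from
    different runs are measured on the same images.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    if len(items) < 2:
        raise ValueError("need at least two items to split")
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)  # local RNG: global random state is untouched
    # At least one validation item, and at least one item left for training.
    n_val = min(max(1, round(len(items) * val_fraction)), len(items) - 1)
    val_idx = set(indices[:n_val])
    train = [item for i, item in enumerate(items) if i not in val_idx]
    val = [item for i, item in enumerate(items) if i in val_idx]
    return train, val


images = [f"img_{i:03d}.png" for i in range(50)]  # hypothetical file names
train, val = split_train_val(images, val_fraction=0.1, seed=42)
print(len(train), len(val))  # -> 45 5
```

If the held-out images changed between runs, their validation losses would not be directly comparable, which is why the seed is fixed.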

@jacquesfeng123

Any updates on this? :D

Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants