fix a bug in 2nd order schedulers when using in ensemble of experts config #5511
Conversation
The documentation is not available anymore as the PR was closed or merged.
Hey @yiyixuxu, seems like you adapted the original fix I shared in the issue description, which does indeed work for some cases. But in our testing we found it didn't work in all cases. Here is the final code we ended up shipping in our fork of diffusers: playgroundai@1bab033

This line didn't make sense to us:

Overall I am not fully satisfied with the latest code in our fork because I don't fully understand second order schedulers, but empirically, the idea is that when splitting the timesteps array, the left split must be of even length and the right split must be of odd length. I.e. if we have timesteps

This seems to work for various combinations of
I don't think it is the same as your fix - it seems like the

I would really appreciate it if you can try out the code in this PR and share with me if you find examples where it fails :)

To answer some of your questions:
yes
Heun is implemented based on algorithm 2 in the k-diffusion paper https://arxiv.org/abs/2206.00364. Feel free to take a look if you're interested. Basically, for a 2nd order scheduler, for each actual "step" we need to run the model forward pass twice to approximate the solution. In order to fit this into our API and be consistent with our design philosophy, we decided to duplicate the timesteps. So if you intended to run
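The duplication described above can be sketched outside of diffusers. This is a toy illustration with made-up timestep values, not the library's actual implementation:

```python
import numpy as np

# Toy sketch (not the diffusers implementation): for a 2nd order scheduler
# like Heun, every timestep after the first is repeated so the model runs
# two forward passes per actual step.
timesteps = np.array([3, 2, 1, 0])
heun_timesteps = np.concatenate([timesteps[:1], np.repeat(timesteps[1:], 2)])
print(heun_timesteps.tolist())  # [3, 2, 2, 1, 1, 0, 0]
```

The bracketed entries in the toy example from the PR description (`3, [2], 2, [1], 1, [0], 0`) correspond to the repeated positions here.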
Review threads (resolved) on:
src/diffusers/pipelines/controlnet/pipeline_controlnet_inpaint_sd_xl.py
src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_inpaint.py
src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_img2img.py
@nhnt11 I think your code changes here are very similar to / the same as what has been done here. I've tested this PR and it fully fixes the problem (nice job @yiyixuxu). What's happening here is the following:

This means that when we filter out all timesteps that are lower than a certain number so that we can start there, we always end up with an even number of

or

It is not possible to end up with something like

Now, Heun schedulers are designed to always finish a full step in the middle of duplicated timesteps, e.g.

=> Therefore we always need to add plus one here to ensure that the Heun step is finished.
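A minimal numeric sketch of this argument, using the toy duplicated schedule from the PR description (assumed values, not diffusers code):

```python
import numpy as np

# Heun-style duplicated schedule (toy numbers): every timestep after the
# first appears twice.
timesteps = np.array([3, 2, 2, 1, 1, 0, 0])
cutoff = 2

# Filtering below a cutoff keeps whole duplicated pairs, so the count of
# remaining entries is always even.
remaining = int((timesteps < cutoff).sum())
print(remaining)  # 4

# Starting from an even count would begin mid-way through a Heun step;
# adding one makes the run start on a completed-step boundary.
refiner_timesteps = timesteps[-(remaining + 1):]
print(refiner_timesteps.tolist())  # [2, 1, 1, 0, 0]
```

The padded result matches the corrected refiner schedule `2, [1], 1, [0], 0` from the PR description.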
Thanks for the explanations! I am on the move so I definitely need to read everything again, but just want to call out that when I logged the timesteps for other second order schedulers (like DPM2), the duplication you mentioned was not present - only Heun had it. Maybe that's also a bug? I am not sure.
Some code:

```python
from diffusers import KDPM2DiscreteScheduler

common_config = {'beta_start': 0.00085, 'beta_end': 0.012, 'beta_schedule': 'scaled_linear'}
dpm2sched = KDPM2DiscreteScheduler(**common_config)
dpm2sched.set_timesteps(25)
print(dpm2sched.timesteps)
```

Result:
I'm guessing this is because DPM2 has an interpolation step, so while the timesteps are not exactly duplicated, it's effectively the same principle. BTW thanks again for engaging as I learn this stuff - I want to make sure we are wiring things correctly in our production setup :) And big thanks for this PR!
```python
# mean that we cut the timesteps in the middle of the denoising step
# (between 1st and 2nd derivative) which leads to incorrect results. By adding 1
# we ensure that the denoising process always ends after the 2nd derivative step of the scheduler
num_inference_steps = num_inference_steps + 1
```
I don't think `num_inference_steps` will always be even. Consider the case where the scheduler has been set up by calling `set_timesteps(25)` and then supplied `denoising_start = 0.6`. The following code prints an even number:

```python
sched = SCHEDULERS["DPM2"]
sched.set_timesteps(25)
denoising_start = 0.6
discrete_timestep_cutoff = int(
    round(
        sched.config.num_train_timesteps
        - (denoising_start * sched.config.num_train_timesteps)
    )
)
num_inference_steps = (sched.timesteps < discrete_timestep_cutoff).sum().item()
print(num_inference_steps)
```

This prints 20. For `denoising_start=0.5`, it prints 25. So `num_inference_steps` may be either odd or even depending on `num_inference_steps` and `denoising_start` passed into the pipeline call.

I think we should add 1 to it only if it's even, and not if it's odd. No?
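The suggestion amounts to a conditional pad. A hypothetical sketch (the function name is illustrative, not a diffusers API):

```python
# Hedged sketch of the suggestion: pad the step count only when it is even,
# so schedules like DPM2's, which can already yield an odd count, are left
# alone while even counts are bumped past the mid-step boundary.
def pad_num_inference_steps(num_inference_steps: int) -> int:
    if num_inference_steps % 2 == 0:
        return num_inference_steps + 1
    return num_inference_steps

print(pad_num_inference_steps(20))  # 21
print(pad_num_inference_steps(25))  # 25
```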
Ah, DPM2 can indeed be different if it interpolates (Heun can't).
Nice catch! I'll open a PR to fix it.
Thanks @patrickvonplaten, all test cases I threw at it now produce good images.

I am just curious why we don't need any extra logic in the base step (where `denoising_end` is supplied to the base SDXL pipeline without img2img), since I would expect that we would want to ensure that the final timestep there is a second order timestep, and as far as I can tell it still has the odd/even issue when splitting. Maybe I am wrong!

In any case, latest `main` now produces good images for all my test cases like I said. Thank you! 🙏
…onfig (huggingface#5511) * fix * fix copies * remove heun from tests * add back heun and fix the tests to include 2nd order * fix the other test too * Apply suggestions from code review * Apply suggestions from code review * Apply suggestions from code review * make style * add more comments --------- Co-authored-by: yiyixuxu <yixu310@gmail,com> Co-authored-by: Patrick von Platen <[email protected]>
This PR fixes the bug reported in #5447 and allows users to use 2nd order schedulers for the ensemble of experts approach.

I'm using a toy example to quickly illustrate the error we are having:

- Say the timesteps are `3, 2, 1, 0`
- With `set_timesteps`, we repeat the index for the 2nd order update (steps in `[]` are 2nd order updates): `3, [2], 2, [1], 1, [0], 0`
- If we split at `1`, we will have `3, [2], 2` for the base and `[1], 1, [0], 0` for the refiner
- -> You can see that the timesteps for the refiner are wrong now; they should be `2, [1], 1, [0], 0`

this script now works
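The toy example above can be reproduced with a small sketch. This is simplified illustration code, not the actual pipeline logic:

```python
import numpy as np

# Toy base/refiner split (sketch): base handles the first portion of the
# duplicated schedule, the refiner takes the rest padded by one timestep so
# each Heun step stays whole. The two halves intentionally share the
# boundary timestep 2.
timesteps = np.array([3, 2, 2, 1, 1, 0, 0])
cutoff = 2
num_refiner_steps = int((timesteps < cutoff).sum()) + 1  # 4 + 1 = 5
base_timesteps = timesteps[: len(timesteps) - num_refiner_steps + 1]
refiner_timesteps = timesteps[-num_refiner_steps:]
print(base_timesteps.tolist())     # [3, 2, 2]
print(refiner_timesteps.tolist())  # [2, 1, 1, 0, 0]
```

This reproduces the corrected split from the toy example: `3, [2], 2` for the base and `2, [1], 1, [0], 0` for the refiner.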