[refactor] FreeInit #6644
Conversation
@DN6 @sayakpaul In light of #6571, I propose moving FreeInit and future quality/efficiency-related utilities into separate mixins (Sayak said something similar in his review comment on #6571). As this technique can be applied to any video pipeline for better temporal consistency (at the cost of a few more sampling iterations), I think it would be good to add support to the major ones. Moving it into a separate mixin would greatly reduce the number of copied lines; the only changes required would be abstracting the denoise loop and adding an if-statement. What do you think?
@a-r-r-o-w Works for me. You can define the mixin inside the animatediff pipeline module.
@DN6 What do you think about the refactor? To use FreeInit in other pipelines, the following changes would be required:
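Roughly, something like the following. This is a minimal sketch only; names such as FreeInitMixin, _apply_free_init, and _denoise are illustrative assumptions rather than the actual diffusers API:

import torch

class FreeInitMixin:
    """Illustrative mixin holding FreeInit state and the noise-reinitialization step."""

    def enable_free_init(self, num_iters: int = 3):
        self._free_init_num_iters = num_iters
        self._free_init_enabled = True

    def disable_free_init(self):
        self._free_init_enabled = False

    def _apply_free_init(self, latents: torch.Tensor, generator=None) -> torch.Tensor:
        # The frequency-domain mixing of re-diffused latents with fresh noise
        # (the mathematical part of FreeInit) would live here.
        raise NotImplementedError

Each video pipeline would then factor its denoising loop into a helper (here called _denoise) and wrap it in an outer loop guarded by an if-statement:

for it in range(self._free_init_num_iters if self._free_init_enabled else 1):
    if self._free_init_enabled and it > 0:
        latents = self._apply_free_init(latents, generator=generator)
    latents = self._denoise(latents, ...)  # the existing, abstracted denoise loop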
IMO we don't need to introduce a mixin that is used by just one model/pipeline. cc @patrickvonplaten here
The reason for adding the mixin is that the technique can be applied to AnimateDiff, TextToVideoSynth, ModelScope, and possibly other video models that are currently or will be supported by diffusers. At the cost of a few extra steps, FreeInit allows much more temporally coherent videos to be generated. I've been using it in ComfyUI for a while now, and I believe many others are too. Combined with sliding-window context support for AnimateDiff, you can generate really high-quality, temporally coherent, long videos without random jumpiness. The other way to add it to different pipelines, if it's okay on the diffusers team's end, would be to copy-paste the FreeInit-related logic, but IMO that would lead to a lot of code duplication (~400 lines per pipeline), which seems quite unnecessary.
@DN6
So FreeInit can be used with the following models that we're about to add (they're all based on AnimateDiff): #6289, #6328, and #6698. I haven't checked the results for T2VidSynth, but the results for AnimateDiff-based models are quite good. Here's an example of PIA generation with (left) and without (right). I believe I2VGenXL (#6665) is based on
does it make sense to make separate pipelines for freeinit? the denoising loop is different, no? this pipeline is a lot harder to read now compared to the last time I looked at it, before freeinit.
I'm also not a big fan of the pipeline becoming much less readable, but having a whole new pipeline might also be quite bad for discoverability. I think keeping it in the same pipeline is still the better of the (not so good) options. Is there any way we can keep it in the same pipeline while also keeping the code easy to understand / clean?
Agreed. As long as it's only in one pipeline, there's no need for a mixin. I'd say we should only do mixins if it applies to 5+ pipelines (or if we're sure it applies to every video pipeline).
I personally don't think it would be a good decision to maintain separate pipelines, as FreeInit is a technique/feature just like FreeU. It is quite general and can be applied to multiple video generation strategies. I can definitely vouch for FreeInit given the quality improvement it provides. At the moment, we can apply this technique directly to AnimateDiff (t2v, i2v, v2v, controlnet, PIA) and TextToVideoSynth. I haven't tested it with SVD yet.
I think we need to try and refactor this so all the mathematical parts are abstracted away and we are left with just readable code. There is not much to FreeInit:
- an outer loop that repeats the denoising process a few times, re-diffusing the result each iteration
- construction of spatio-temporal low-pass frequency filters
- noise reinitialization, which mixes the low-frequency component of the latents with the high-frequency component of freshly sampled noise
The latter two can easily be moved into a mixin that no one really needs to look into unless they want to understand FreeInit. What's left is finding a good way to incorporate the extra loop, which needs access to a few denoising arguments.
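To make the mathematical parts concrete, here is a minimal sketch of the frequency filtering and noise reinitialization, assuming a (batch, channels, frames, height, width) latent layout and a Gaussian low-pass filter (one of the filter choices described in the FreeInit paper); the function names are illustrative, not diffusers API:

import torch

def gaussian_low_pass_mask(shape, d_s=0.25, d_t=0.25):
    # Gaussian low-pass mask over the (frames, height, width) frequency axes,
    # equal to 1.0 at the center of the (shifted) spectrum and decaying outward.
    num_frames, height, width = shape[-3], shape[-2], shape[-1]
    f = torch.linspace(-1, 1, num_frames).view(num_frames, 1, 1)
    h = torch.linspace(-1, 1, height).view(1, height, 1)
    w = torch.linspace(-1, 1, width).view(1, 1, width)
    d2 = (f / d_t) ** 2 + (h / d_s) ** 2 + (w / d_s) ** 2
    return torch.exp(-0.5 * d2)

def reinitialize_noise(latents, noise, mask):
    # Keep the low frequencies of the (re-diffused) latents and take the
    # high frequencies from freshly sampled noise.
    dims = (-3, -2, -1)
    latents_freq = torch.fft.fftshift(torch.fft.fftn(latents, dim=dims), dim=dims)
    noise_freq = torch.fft.fftshift(torch.fft.fftn(noise, dim=dims), dim=dims)
    mixed = latents_freq * mask + noise_freq * (1 - mask)
    mixed = torch.fft.ifftshift(mixed, dim=dims)
    return torch.fft.ifftn(mixed, dim=dims).real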
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@yiyixuxu @patrickvonplaten Is the readability issue mainly due to the fact that there are two denoising loops that can run within the pipeline? I do think it's worth exploring a bit how FreeInit can be incorporated here because the quality gains are definitely noticeable.
personally, I found our pipelines to be very readable because you can always get a good idea of what's going on by quickly reading through the steps in the pipeline; that is just not the case here, though. I think it would be ok to have a nested denoising loop structure inside the pipeline itself
also, how much is freeinit used? i.e., it's slower, but would more people rather use it than go without it?
Would something like this work?

num_free_init_iters = self._free_init_num_iters if self.free_init_enabled else 1
for i in range(num_free_init_iters):
    if self.free_init_enabled:
        latents, num_warmup_steps = self.apply_free_init(...)
    with self.progress_bar(total=num_inference_steps) as progress_bar:
        ...
        # (our regular denoising loop)
        ...
This is how I had it originally in the PR that added FreeInit support, but we didn't like the idea at the time. I feel that nesting the loop like this might limit how changes/additions are made in the future. Hmm, I will think of a better way to refactor this.
I think @yiyixuxu's primary concern here is the readability of the video pipelines. I think we can revisit the idea of the nested loop. I opened PR #6874 with a proposal that takes into account both your concerns, @a-r-r-o-w and @yiyixuxu. LMK what you think.
In light of #6874, we can close this, as it mostly solves the problem. We can revisit the refactoring of the outer loop introduced by FreeInit if we have future changes that build on something similar, as mentioned by Yiyi in #6874's review. Thanks!
What does this PR do?
Removes the additional generator that was added for FreeInit and reduces the LoC a bit. I figured the second generator wasn't required and that you can still get consistent generation by just using the one passed for inference. Results are the same as before.
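As a minimal illustration of the point (arbitrary shapes and seed, not taken from the PR): fresh noise for each FreeInit iteration can be drawn from the same torch.Generator that was passed for inference, so runs stay reproducible without a dedicated second generator.

import torch

generator = torch.Generator(device="cpu").manual_seed(42)

# Initial latents, as prepared for inference
# (example shape: batch, channels, frames, height, width).
latents = torch.randn((1, 4, 16, 64, 64), generator=generator)

# Later, inside a FreeInit iteration: the same generator yields the fresh
# high-frequency noise, so the whole run stays deterministic for a given seed.
fresh_noise = torch.randn(latents.shape, generator=generator)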
Who can review?
@DN6