Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[refactor] FreeInit #6644

Closed
wants to merge 4 commits into from
Closed

Conversation

a-r-r-o-w
Copy link
Member

What does this PR do?

Removes the additional generator that was added for freeinit and reduces LoC a bit. I figured the second generator wasn't required and you can still get consistent generation by just using the one passed for inference. Results are the same as before:

Who can review?

@DN6

@a-r-r-o-w
Copy link
Member Author

a-r-r-o-w commented Jan 19, 2024

@DN6 @sayakpaul In light of #6571, I propose moving freeinit and future quality/efficiency-related utilities into separate mixins (Sayak said something similar in his review comment in 6571). As this technique can be applied to any video pipeline for better consistency (at the cost of a few more sampling iterations), I think it would be good to add support to the major ones. Moving it to a separate mixin will greatly reduce the number of copied lines and the only changes that would be required would be abstracting the denoise loop and adding an if-statement. What do you think?

@DN6
Copy link
Collaborator

DN6 commented Jan 22, 2024

@a-r-r-o-w Works for me. You can define the mixin inside the animatediff pipeline module.

@a-r-r-o-w
Copy link
Member Author

@DN6 What do you think about the refactor? To use FreeInit in other pipelines, the following changes would be required:

  • Add FreeInitMixin to base classes the pipeline derives from.
  • Abstract the denoising loop into separate method, and branch to the freeinit loop if enabled or otherwise default denoise loop. Note that the current implementation of FreeInitMixin has a magical call to self._denoise_loop and is not explicit. We should probably rethink this.
  • denoise_args is a dict required to hold parameters to the actual denoising loop. This seems okay because we know what arguments the actual denoise loop would take. FreeInit only requires that it contains latents, timesteps and num_inference_steps at least.

@yiyixuxu
Copy link
Collaborator

We don't need to introduce a mixin that is just used by one model/pipeline IMO

cc @patrickvonplaten here

@a-r-r-o-w
Copy link
Member Author

a-r-r-o-w commented Jan 24, 2024

We don't need to introduce a mixin that is just used by one model/pipeline IMO

cc @patrickvonplaten here

The reason for adding the mixin is that the technique can be applied to all AnimateDiff, TextToVideoSynth, Modelscope and maybe other video models that are currently or will be supported by diffusers. At the cost of few extra steps, freeinit allows much more temporally coherent videos to be generated and I've been using it on ComfyUI for a while now, and I believe many others are too. Combined with sliding window context support for AnimateDiff, you can generate really high quality, temporally coherent and long videos without random jumpy-ness.

The other way to go about adding it to different pipelines, if it's okay on the diffusers teams' end, would be to copy-paste freeinit related logic but IMO it would lead to a lot of code duplication (~400 lines per pipeline), which seems quite unnecessary.

@yiyixuxu
Copy link
Collaborator

@DN6
do you think this is something we should add to every video pipeline?

@DN6
Copy link
Collaborator

DN6 commented Jan 24, 2024

So FreeInit can be used with the following models that we're about to add (they're all based on AnimateDiff) #6289 #6328 and #6698

I haven't checked the results for T2VidSynth, but the results for AnimateDiff based models is quite good. Here's an example of PIA generation with (left) and without (right)
freeint-vs-default

I believe I2VGenXL (#6665) is based on modelscope-damo-text-to-video-synthesis so I think FreeInit can be applied there as well. For now I would say we keep the Mixin limited to just AnimateDiff based models. Although we might need to think about the design a little bit more.

@yiyixuxu
Copy link
Collaborator

yiyixuxu commented Jan 24, 2024

does it make sense to make separate pipelines for freeinit? the denoising loop is different, no?

This pipeline is a lot harder to read now compared with the last time when I looked at it without the freeinit.

@patrickvonplaten
Copy link
Contributor

does it make sense to make separate pipelines for freeinit? the denoising loop is different, no?

This pipeline is a lot harder to read now compared with the last time when I looked at it without the freeinit.

I'm also not a big fan of the pipeline being much more unreadable, but having a whole new pipeline might also be quite bad for discoverability. Think putting in the same pipeline is still the better of the (not so good) options. Is there any way we can move it into the same pipeline while keeping the code easy-to-understand / clean?

@patrickvonplaten
Copy link
Contributor

We don't need to introduce a mixin that is just used by one model/pipeline IMO

cc @patrickvonplaten here

Agree. As long as it's only in one pipeline, no need for a mixin. I'd say we should only do mixins if it applies for 5+ pipelines (or if we're sure it applies to every video pipeline)

@a-r-r-o-w
Copy link
Member Author

Agree. As long as it's only in one pipeline, no need for a mixin. I'd say we should only do mixins if it applies for 5+ pipelines (or if we're sure it applies to every video pipeline)

I personally don't think it would be a good decision to maintain separate pipelines as FreeInit is a technique/feature just like FreeU. It is quite general and can be applied to multiple video generation strategies. I can definitely vouch for the usage of FreeInit for the quality improvement it provides. At the moment, we can apply this technique directly to AnimateDiff (t2v, i2v, v2v, controlnet, PIA) and TextToVideoSynth. I'm yet to test with SVD.

does it make sense to make separate pipelines for freeinit? the denoising loop is different, no?
This pipeline is a lot harder to read now compared with the last time when I looked at it without the freeinit.

I'm also not a big fan of the pipeline being much more unreadable, but having a whole new pipeline might also be quite bad for discoverability. Think putting in the same pipeline is still the better of the (not so good) options. Is there any way we can move it into the same pipeline while keeping the code easy-to-understand / clean?

I think we need to try and refactor this so all the mathematical parts are abstracted away and we are left with just readable code. There is not much to FreeInit:

  • Extra loop
  • Each iteration of the loop "cleans" up latents based on certain interpretation it draws from intermediate video generations
  • Some mathematical magic to apply filters on the latents

The latter two can be easily moved into a mixin that no one really needs to look into unless they'd like to understand FreeInit. What's left is finding a good way to incorporate the extra loop that needs access to a few denoising arguments.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@DN6
Copy link
Collaborator

DN6 commented Jan 26, 2024

@yiyixuxu @patrickvonplaten Is the readability issue mainly due to the fact that there are two denoising loops that can run within the pipeline?

I do think it's worth exploring a bit how FreeInit can be incorporated here because the quality gains are definitely noticeable.

@yiyixuxu
Copy link
Collaborator

yiyixuxu commented Jan 26, 2024

Is the readability issue mainly due to the fact that there are two denoising loops that can run within the pipeline?

personally, I found our pipelines to be very readable because you can always get a good idea about what's going on by quickly reading through the steps in the __call__ method. We abstracted some of the details away with methods such as prepare_latents, encode_prompt but they are usually self-exploratory, and their definitions are pretty consistent across all the pipelines so that users don't really need to look into them when trying to understand the pipeline

this is just not the case here though. I think it would be ok to have a nested denoising loop structure inside __call__ and abstract away the freeinit related code as much as possible just like @a-r-r-o-w proposed:) - but would it work when freeinit is diabled?

@yiyixuxu
Copy link
Collaborator

also, how much is freeinit used? i.e. it's slower but more people would rather use it than without it?

@yiyixuxu
Copy link
Collaborator

Would something like this work?

num_free_inti_iters = self._free_init_num_Iters if self.free_init_enabled else 1

for i in range(num_free_init_iters):
       if self.free_init_enabled:
               latents, num_warmup_steps = self.apply_free_init(..)
       with self.pregress_gar(total=num_Inference_steps) as progress_bar:
               ....
               (our regular denoising loop)
               ...
   

@a-r-r-o-w
Copy link
Member Author

This is how I had it originally in the PR that added freeinit support but we didn't like the idea at the time. I feel nesting the loop like this might limit how changes/additions are done in the future? Hmm, I will think of a better way to refactor this.

@DN6
Copy link
Collaborator

DN6 commented Feb 6, 2024

I think @yiyixuxu's primary concern here is that the video pipelines __call__ looks very different now with the two inference functions if we use FreeInit as is. Which is fair. I think trying to keep all the pipelines looking as similar as possible does help readability.

I think we can revisit the idea of the nested loop. I opened a PR: #6874 with a proposal that takes into account both your concerns @a-r-r-o-w and @yiyixuxu.

LMK what you think.

@a-r-r-o-w
Copy link
Member Author

In light of 6874, we can close this as it solves the problem mostly. We can revisit the refactoring of the outer loop introduced by FreeInit if we have future changes that build on something similar as mentioned by Yiyi in 6874's review. Thanks

@a-r-r-o-w a-r-r-o-w closed this Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants