Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add imagic to community pipelines #958

Merged

Conversation

MarkRich
Copy link
Contributor

@MarkRich MarkRich commented Oct 24, 2022

Part of #895 and bigger story #841

Followed the rough code / parameters given here: https://github.com/justinpinkney/stable-diffusion/blob/main/notebooks/imagic.ipynb.

A few notes for reviews:

  • Seems like a bad idea to modify the stable diffusion pipeline to take text embeddings; that said it wasn't clear how we plan to re-use this pipeline given these custom generated embeddings. Any advice on how to pass these in? Or is modification given reasonable?
  • Also seems like a bad idea to define the learning rates where I do. That said this is a bit of an odd stable diffusion pipeline since we're training in the __call__, so not sure where these are expected to go.
  • Similar comment as in Add Composable diffusion to community pipeline examples  #951: Test condition given in readme doesn't work locally and need to make a few changes to make it work locally, but I expect this is only an issue w/ local testing and/or I am missing something w.r.t. how these scripts are intended to be tested.

Results:

Requires 24gb of vram and takes about 7-10 minutes on a 3090, though apparently it's 30g vram in 5pm on an a100 in original script. So reasonable performance?

Initial Image:
obama

Prompt: "A photo of Barack Obama smiling with a big grin"

Image from just text embedding:
image_from_pure_text_embeds

Image after text embeddings have been optimized:
image_from_modified_text_embeds

Final image at alpha = 0.8
final_image_finetuned_and_embedding_8

Final Image at alpha = 1.5
final_image_finetuned_and_embedding_15

Final Image at alpha = 2.
final_image_finetuned_and_embedding_20

Looking forward to any comments!

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Oct 24, 2022

The documentation is not available anymore as the PR was closed or merged.

@patil-suraj patil-suraj self-assigned this Oct 24, 2022
Copy link
Contributor

@patrickvonplaten patrickvonplaten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job @MarkRich!

The design is generally fine with me! Also related to #955 - seems like there are multiple use cases for custom text_embeddings already

@patil-suraj could you do a more in-depth review?

Copy link
Contributor

@patil-suraj patil-suraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool @MarkRich , thanks a lot for adding the feature

Th pr looking really good! Just left a few nits.

And I'm not sure yet, if we should modify the StableDiffusionPipeline to allow text_embeddinsg, we are discussing it here #955

For now, since we are adding a custom pipeline, I would suggest we could add to functions to the pipeline.

  1. pipeline.train to train the embeddings and mode
  2. pipeline.__call__ or pipeline.generate to generate the images.

wdyt @patrickvonplaten

examples/community/README.md Outdated Show resolved Hide resolved
examples/community/imagic_stable_diffusion.py Outdated Show resolved Hide resolved
examples/community/imagic_stable_diffusion.py Outdated Show resolved Hide resolved
examples/community/imagic_stable_diffusion.py Outdated Show resolved Hide resolved
Comment on lines +232 to +223
optimizer = torch.optim.Adam(
[text_embeddings], # only optimize the embeddings
lr=embedding_learning_rate,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also allow the option to use 8 but optimizer

examples/community/imagic_stable_diffusion.py Outdated Show resolved Hide resolved
examples/community/imagic_stable_diffusion.py Show resolved Hide resolved
examples/community/imagic_stable_diffusion.py Outdated Show resolved Hide resolved
examples/community/imagic_stable_diffusion.py Outdated Show resolved Hide resolved
@MarkRich MarkRich force-pushed the add-imagic-to-community-pipelines branch from 7c97305 to 39d7d9e Compare October 27, 2022 17:27
@MarkRich
Copy link
Contributor Author

Addressed all your comments @patil-suraj aside from 8-bit optimization which may take slightly longer for to instrument due to an unrelated error. Let me know if you have any other comments!

Copy link
Contributor

@patil-suraj patil-suraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for addressing the comments @MarkRich !Looks good, will give it try now and then merge soon :)

Copy link
Contributor

@patil-suraj patil-suraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example works great, I just two more comments, then it should be good to merge :)

examples/community/imagic_stable_diffusion.py Outdated Show resolved Hide resolved
examples/community/README.md Outdated Show resolved Hide resolved
Copy link
Contributor

@patrickvonplaten patrickvonplaten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks a lot :-)
@patil-suraj feel free to merge whenever

@MarkRich MarkRich force-pushed the add-imagic-to-community-pipelines branch from f499721 to a3448a3 Compare November 1, 2022 03:28
@MarkRich
Copy link
Contributor Author

MarkRich commented Nov 1, 2022

Updated with comments from the review; should be good to go!

@patil-suraj
Copy link
Contributor

Thanks a lot @MarkRich ! The tests failures are unrelated, merging!

@patil-suraj patil-suraj merged commit a793b1f into huggingface:main Nov 1, 2022
@zhongyi-zhou
Copy link

@MarkRich Thanks for the amazing code!
I have one question on this line:

noise_pred = self.unet(noisy_latents, timesteps, text_embeddings).sample

Why it is text_embeddings instead of text_embeddings_orig?

According to the Imagic paper, in the "model fine-tuning" stage, it says

  • "We fine-tune them with the same reconstruction loss, but conditioned on $e_{tgt}$, as $e_{opt}$ is optimized for the base model only" (Page 5; the second column)
    P 3M N LB(2R}$MPUMVKC9

From my understanding $e_{tgt}$ is text_embeddings_orig and $e_{opt}$ is text_embeddings in this code.
In practice, these two values should be very similar, and probably that's the reason why the code still work amazingly well.

Please correct me if I am wrong. Thanks!

@shiranzada
Copy link

shiranzada commented Mar 20, 2023

Thanks @zhongyi-zhou, we condition on e_tgt during finetuning only for the super resolution models of Imagen. This part is not relevant for Stable Diffusion.

For the base model (Imagen-64 or LDM) we condition on e_opt during finetuning rather than on e_tgt in order to overfit the image (for e_opt)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants