Add imagic to community pipelines #958
The documentation is not available anymore as the PR was closed or merged.
Great job @MarkRich!
The design is generally fine with me! This is also related to #955: it seems there are already multiple use cases for custom `text_embeddings`.
@patil-suraj could you do a more in-depth review?
Very cool @MarkRich, thanks a lot for adding the feature!
The PR is looking really good! I just left a few nits.
I'm not sure yet whether we should modify the `StableDiffusionPipeline` to allow `text_embeddings`; we are discussing it in #955.
For now, since we are adding a custom pipeline, I would suggest adding two functions to the pipeline: `pipeline.train` to train the embeddings and model, and `pipeline.__call__` or `pipeline.generate` to generate the images.
wdyt @patrickvonplaten
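A toy sketch of the two training stages such a `pipeline.train` would bundle, under loudly stated assumptions: `toy_imagic_train` is a hypothetical helper (not the pipeline's code), the "embedding" is a single float, the "model" is a list of per-pixel weights, and squared error stands in for the diffusion loss. Stage A updates only the embedding with the model frozen; stage B freezes the result (`e_opt`) and updates only the model.

```python
def toy_imagic_train(x, theta, e_tgt, lr=0.1, steps=200):
    """Hypothetical toy: model output is [theta_i * e]; squared error vs. x is the loss."""
    theta = list(theta)

    # Stage A: optimize only the embedding, starting from e_tgt; theta stays frozen.
    e = e_tgt
    for _ in range(steps):
        grad_e = sum(2.0 * t * (t * e - xi) for t, xi in zip(theta, x))
        e -= lr * grad_e
    e_opt = e

    # Stage B: freeze e_opt and fine-tune only the model weights.
    for _ in range(steps):
        theta = [t - lr * 2.0 * e_opt * (t * e_opt - xi) for t, xi in zip(theta, x)]

    return e_opt, theta


e_opt, theta = toy_imagic_train(x=[1.0, 3.0], theta=[1.0, 1.0], e_tgt=0.0)
reconstruction = [t * e_opt for t in theta]
```

With `x=[1.0, 3.0]`, the best single embedding only reaches about `[2.0, 2.0]`, and stage B then fits the weights so the reconstruction approaches `[1.0, 3.0]`. Keeping both stages in `train` and sampling in `__call__` lets one expensive training run serve many generations.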
```python
optimizer = torch.optim.Adam(
    [text_embeddings],  # only optimize the embeddings
    lr=embedding_learning_rate,
)
```
Maybe also allow the option to use an 8-bit optimizer.
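One way the suggested option could look, as a hedged sketch: `select_optimizer_class` and `use_8bit_adam` are hypothetical names (the flag mirrors `--use_8bit_adam` in the diffusers DreamBooth training example), and `bnb.optim.Adam8bit` is the 8-bit Adam from bitsandbytes. The import is deferred so bitsandbytes stays an optional dependency.

```python
def select_optimizer_class(use_8bit_adam: bool = False):
    """Return an Adam optimizer class, optionally the 8-bit variant from bitsandbytes."""
    if use_8bit_adam:
        try:
            import bitsandbytes as bnb  # optional dependency, imported only when requested
        except ImportError as err:
            raise ImportError(
                "8-bit Adam requires bitsandbytes: `pip install bitsandbytes`"
            ) from err
        return bnb.optim.Adam8bit
    import torch  # default path: regular full-precision Adam

    return torch.optim.Adam
```

Usage would then be `optimizer = select_optimizer_class(use_8bit_adam)([text_embeddings], lr=embedding_learning_rate)`, leaving the default behavior unchanged.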
Resolved (outdated) review thread on `src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py`.
Branch updated from 7c97305 to 39d7d9e.
Addressed all your comments @patil-suraj, aside from 8-bit optimization, which may take a bit longer to implement due to an unrelated error. Let me know if you have any other comments!
Thanks a lot for addressing the comments @MarkRich! Looks good, I will give it a try now and then merge soon :)
The example works great! I just have two more comments, then it should be good to merge :)
Looks good! Thanks a lot :-)
@patil-suraj feel free to merge whenever
Commit "…on for imagic pipeline"; branch updated from f499721 to a3448a3.
Updated to address the review comments; should be good to go!
Thanks a lot @MarkRich! The test failures are unrelated; merging!
@MarkRich Thanks for the amazing code!
Why is it … According to the Imagic paper, in the "model fine-tuning" stage, it says …
From my understanding … Please correct me if I am wrong. Thanks!
Thanks @zhongyi-zhou! We condition on e_tgt during fine-tuning only for the super-resolution models of Imagen; that part is not relevant to Stable Diffusion. For the base model (Imagen-64 or LDM), we condition on e_opt rather than e_tgt during fine-tuning, so that the model overfits to the input image when conditioned on e_opt.
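For reference, here are the three Imagic stages as we read them in the paper, written out so the conditioning in each stage is explicit. The notation is ours: $f_\theta$ is the noise-predicting denoiser, $\mathbf{x}$ the input image, $\mathbf{x}_t$ its noised version at step $t$, and $\eta$ the interpolation weight (which the pipeline's `alpha` appears to play the role of). Stage B, the one discussed above, conditions on $\mathbf{e}_{opt}$ rather than $\mathbf{e}_{tgt}$:

```math
\begin{aligned}
\mathcal{L}(\mathbf{x}, \mathbf{e}, \theta) &= \mathbb{E}_{t,\epsilon}\left[\lVert \epsilon - f_\theta(\mathbf{x}_t, t, \mathbf{e}) \rVert_2^2\right] \\
\text{(A)}\quad \mathbf{e}_{opt} &= \arg\min_{\mathbf{e}} \mathcal{L}(\mathbf{x}, \mathbf{e}, \theta) \quad \text{(initialized at } \mathbf{e}_{tgt}\text{, } \theta \text{ frozen)} \\
\text{(B)}\quad \hat{\theta} &= \arg\min_{\theta} \mathcal{L}(\mathbf{x}, \mathbf{e}_{opt}, \theta) \quad (\mathbf{e}_{opt} \text{ frozen}) \\
\text{(C)}\quad \bar{\mathbf{e}} &= \eta\, \mathbf{e}_{tgt} + (1 - \eta)\, \mathbf{e}_{opt}
\end{aligned}
```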
Part of #895 and bigger story #841
Follows the rough code and parameters from https://github.com/justinpinkney/stable-diffusion/blob/main/notebooks/imagic.ipynb.
A few notes for reviewers:
… `__call__`, so I'm not sure where these are expected to go.

Results:
Requires 24 GB of VRAM and takes about 7-10 minutes on an RTX 3090, though the original script reportedly needs 30 GB of VRAM and about 5 minutes on an A100. So reasonable performance?
Initial image:
Prompt: "A photo of Barack Obama smiling with a big grin"
Image from the text embedding alone:
Image after the text embeddings have been optimized:
Final image at alpha = 0.8:
Final image at alpha = 1.5:
Final image at alpha = 2:
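The alpha values above blend the two embeddings before sampling. A minimal sketch of that blend, assuming it follows the Imagic paper's interpolation `alpha * e_tgt + (1 - alpha) * e_opt`: `interpolate_embeddings` is a hypothetical helper operating on plain lists, whereas the pipeline works on tensors.

```python
def interpolate_embeddings(e_tgt, e_opt, alpha):
    """Blend target-text and optimized embeddings: alpha * e_tgt + (1 - alpha) * e_opt."""
    if len(e_tgt) != len(e_opt):
        raise ValueError("embeddings must have the same length")
    # alpha = 0 gives e_opt, alpha = 1 gives e_tgt, alpha > 1 extrapolates past e_tgt.
    return [alpha * t + (1.0 - alpha) * o for t, o in zip(e_tgt, e_opt)]
```

Under this reading, alpha = 1.5 and alpha = 2 extrapolate beyond the target text embedding, which would explain why those edits look stronger than the one at alpha = 0.8.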
Looking forward to any comments!