use 🧨diffusers model #1384
Conversation
and hey, I already hit the first obstacle to using a stock diffusers pipeline: the stock pipelines take the prompt as text, but Invoke does its own handling of the text and wants to pass in the data for the CLIP text embeddings instead. This is fine; diffusers pretty much expects that most applications doing anything interesting will need to customize their pipeline anyway. It just means a bit more code is required to get even the basic proof of concept up.
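The shape of that customization, sketched here with stand-in classes rather than the real diffusers or InvokeAI API (all names below are illustrative, not actual library calls):

```python
# Hypothetical sketch: the stock pipeline encodes the prompt itself, while an
# Invoke-style pipeline needs an entry point that accepts precomputed CLIP
# text embeddings directly. Class and method names are made up for clarity.
class StockPipeline:
    def __call__(self, prompt: str):
        # Stock path: text goes in, the pipeline does its own encoding.
        text_embeddings = self.encode_prompt(prompt)
        return self.denoise(text_embeddings)

    def encode_prompt(self, prompt: str):
        # Stand-in for CLIP text encoding.
        return [float(len(prompt))]

    def denoise(self, text_embeddings):
        # Stand-in for the diffusion denoising loop.
        return {"conditioned_on": text_embeddings}


class InvokeStylePipeline(StockPipeline):
    def generate_from_embeddings(self, text_embeddings):
        # Invoke-style path: skip encode_prompt and condition on the
        # embeddings the application already computed.
        return self.denoise(text_embeddings)


pipe = InvokeStylePipeline()
out = pipe.generate_from_embeddings([0.25, -0.5])
```

The point is only structural: the customized pipeline reuses the denoising machinery but bypasses the text-encoding step.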
Very cool to see that
Patrick, why do you say that almost like it's a surprise? 😄 Was serving as an application backend not the plan for diffusers all along? Don't make me second-guess myself here. It'll make me look bad in front of the Invoke devs! 🙈 As for what diffusers could do to help, a fine place to start would be refactoring the StableDiffusionPipeline to aid reusability and extensibility: huggingface/diffusers#551 (comment)
I've pushed a proof of concept for txt2img. It is super rough, but it does succeed in producing an image for a prompt. I've updated this PR's main description with a checklist of things we need to do to support it for real.
Haha, that sounds good - we've started factoring out methods as done in this PR: huggingface/diffusers#1224
and update associated things in Generate & Generator to not instantly fail when that happens
Update: made model loading much better. Made output much worse, like no-longer-recognizable worse. But I committed anyway because it does run, and it's so much easier to fiddle with now that it's not taking extra gigabytes of RAM. I suspect this implementation:

InvokeAI/ldm/invoke/generator/diffusers_pipeline.py, lines 327 to 333 in b39d04d

but maybe it's something else, like
Fixed! I didn't notice it was making 256px images instead of 512.
Added initial support for switching schedulers. Some of them look like they need further configuration.
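One common pattern for scheduler switching is a registry mapping user-facing names to factories; a minimal sketch with hypothetical names (the real scheduler classes and their per-scheduler configuration live in diffusers, and misbehaving schedulers can simply be left unregistered):

```python
# Hypothetical name-to-factory registry; the dicts stand in for real
# scheduler instances and their extra configuration needs.
SCHEDULER_REGISTRY = {
    "ddim": lambda: {"name": "DDIM", "needs_config": False},
    "k_lms": lambda: {"name": "LMS", "needs_config": True},
    "k_euler": lambda: {"name": "Euler", "needs_config": True},
}

def make_scheduler(key: str):
    # Unknown (or deliberately removed) scheduler names fail loudly.
    try:
        return SCHEDULER_REGISTRY[key]()
    except KeyError:
        raise ValueError(f"unknown scheduler: {key}")
```

With this shape, dropping a scheduler that "is not behaving" is a one-line deletion from the registry.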
Remove IPNDM scheduler; it is not behaving.
Found the missing bit. k_lms and k_euler schedulers fixed.
Force-pushed from 8ec60b3 to fdf2ed2.
Conflicts: `ldm/invoke/model_cache.py`, `setup.py`
The current test failure seems to be the same as the failure in
We get to remove some code by using methods that were factored out in the base class.
Conflicts: `ldm/invoke/generator/diffusers_pipeline.py`
now that we can use it directly from diffusers 0.8.1
Pushed support for img2img. Seems to be working, at least with DDIM. LMS and Euler don't do so well. Might be a few things to follow up on to get proper reproducible-with-seed results.
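The reproducible-with-seed requirement generally comes down to deriving every bit of sampling noise from an explicitly seeded generator rather than any global random state. A stdlib sketch of that pattern (not the actual torch/diffusers calls):

```python
import random

def sample_noise(seed: int, n: int = 4) -> list:
    # All randomness flows from one RNG seeded explicitly per image,
    # never the shared global state, so the same seed always yields
    # the same noise regardless of what ran before it.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]
```

Under this discipline, `sample_noise(42)` returns identical values on every call, which is what makes a seed reproduce an image.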
Conflicts: `.github/workflows/test-invoke-conda.yml`
Force-pushed from 2712af1 to afae108.
The RunwayML models still do.
Conflicts: `ldm/invoke/generator/base.py`, `ldm/invoke/generator/inpaint.py`, `ldm/invoke/generator/omnibus.py`
Models in the CompVis and stabilityai repos no longer require them. (But runwayml still does.)
Conflicts: `.github/workflows/test-invoke-conda.yml`, `.github/workflows/test-invoke-pip.yml`
Conflicts: `environments-and-requirements/requirements-base.txt`
→ Moved to #1583. [Can't change the working branch of an existing PR.]
I think the plan is that we keep the public APIs in `ldm.invoke.generator` stable while swapping out the implementations to be diffusers-based. That looks like it'll be primarily in the `make_image` methods of those Generators.

It might be possible to split things up by the different tasks (txt2img, inpainting, etc.) into separate PRs? I'd be in favor of that if it makes smaller PRs, but I don't know yet whether it will help that much.
Usage

Add a section to your `models.yaml` for the model. Note the `format: diffusers`. The `repo_name` is as it appears on huggingface.co.

To Do: txt2img

- `free_gpu_mem` with features from 🤗 Accelerate
- `invoke_ai_web_server`. Not sure if the other instances are still in use?
- `threshold`
- `models.yaml.example`
- `configure_invokeai` (formerly `preload_models`)

To Do: txt2img

- waiting on upstream diffusers: `models.diffusion.cross_attention_control` might be an obstacle, as that's not in stock `diffusers` yet and it meddles with some internals. The prompt-to-prompt authors do have a reference implementation that uses diffusers: https://nbviewer.org/github/google/prompt-to-prompt/blob/main/prompt-to-prompt_stable.ipynb

To Do: inpainting

- discussion thread: https://discord.com/channels/1020123559063990373/1031668022294884392
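For reference, a `models.yaml` entry of the shape the Usage section describes might look like the following. The model name and repository are purely illustrative; only the `format: diffusers` and `repo_name` keys come from the description above.

```yaml
# Hypothetical entry; key names follow the Usage notes above,
# the model name and repo value are only examples.
diffusers-sd:
  description: A Stable Diffusion model in 🧨diffusers format
  format: diffusers
  repo_name: some-org/some-diffusers-model
```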