
use 🧨diffusers model #1384

Closed · wants to merge 63 commits

Conversation

@keturn (Contributor) commented Nov 5, 2022

→ Moved to #1583

[Can't change the working branch of an existing PR.]


I think the plan is that we keep the public APIs in ldm.invoke.generator stable while swapping out the implementations to be diffusers-based.

That looks like it'll be primarily in the make_image methods of those Generators.

It might be possible to split things up by the different tasks (txt2img, inpainting, etc.) into separate PRs. I'd be in favor of that if it makes the PRs smaller, but I don't know yet whether it will help that much.
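The plan above can be sketched roughly as follows. The class and pipeline here are illustrative stand-ins, not InvokeAI's actual code; the point is only that make_image keeps its public signature while the body delegates to a diffusers-based pipeline:

```python
# Illustrative sketch only: Txt2Img and fake_pipeline are stand-ins.
class Txt2Img:
    def __init__(self, pipeline):
        # pipeline: any callable with a diffusers-style interface
        self.pipeline = pipeline

    def make_image(self, prompt: str, steps: int = 50):
        # public API unchanged; the implementation underneath is swapped out
        return self.pipeline(prompt, num_inference_steps=steps)

# stub standing in for a diffusers pipeline
def fake_pipeline(prompt, num_inference_steps):
    return f"image({prompt!r}, steps={num_inference_steps})"

print(Txt2Img(fake_pipeline).make_image("a cat"))
```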

Test invoke.py

Usage

Add a section to your models.yaml like this:

diffusers-1.5:
  description: Diffusers version of Stable Diffusion version 1.5
  format: diffusers
  repo_name: runwayml/stable-diffusion-v1-5

Note the format: diffusers.
The repo_name is as it appears on huggingface.co.
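A hypothetical helper (not InvokeAI code) showing how a `format: diffusers` entry might be translated into arguments for `diffusers.StableDiffusionPipeline.from_pretrained`; the float16 handling corresponds to the "honor float16 setting" item in the checklist below:

```python
# Hypothetical sketch; key names match the models.yaml section above.
def diffusers_load_args(entry: dict, use_float16: bool = False) -> dict:
    if entry.get("format") != "diffusers":
        raise ValueError("not a diffusers-format entry")
    args = {"pretrained_model_name_or_path": entry["repo_name"]}
    if use_float16:
        # in real code this would be torch_dtype=torch.float16
        args["torch_dtype"] = "float16"
    return args

entry = {
    "description": "Diffusers version of Stable Diffusion version 1.5",
    "format": "diffusers",
    "repo_name": "runwayml/stable-diffusion-v1-5",
}
print(diffusers_load_args(entry))
```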

To Do: txt2img

  • don't load redundant models! (i.e. both the ckpt and the diffusers formats)
  • allow scheduler selection
  • support extra scheduler parameters (e.g. DDIM's eta).
  • honor float16 setting for loading model
  • honor free_gpu_mem with features from 🤗 Accelerate
  • update callback users with the new signature.
    • at least done for invoke_ai_web_server. Not sure if the other instances are still in use?
  • fix prompt fragment weighting. Refer to WeightedFrozenCLIPEmbedder.
  • honor threshold
  • honor safety-checker setting
  • update models.yaml.example
  • update configure_invokeai (formerly preload_models)

To Do: img2img

  • make sure we use the correct seeded noise

waiting on upstream diffusers

To Do: inpainting

discussion thread: https://discord.com/channels/1020123559063990373/1031668022294884392

@keturn (Contributor Author) commented Nov 5, 2022

and hey, I already hit the first obstacle to using a stock diffusers pipeline: the stock pipelines take in the prompt as text, but Invoke does its own handling of the text and wants to pass in the data for the CLIP text embeddings instead.

This is fine; diffusers pretty much expects that most applications doing anything interesting will need to customize their pipeline anyway. It just means a bit more code is required to get even the basic proof of concept up.
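The customization described above can be sketched as a pipeline variant whose `__call__` accepts precomputed text embeddings rather than a prompt string, so the caller keeps its own prompt handling. All names here are illustrative stand-ins, not the diffusers API:

```python
# Stand-in sketch; not real diffusers code.
class EmbeddingPipeline:
    def __call__(self, text_embeddings, num_inference_steps: int = 50):
        # a stock pipeline would tokenize and CLIP-encode a prompt here;
        # this variant skips straight to denoising with the given embeddings
        return {"shape": len(text_embeddings), "steps": num_inference_steps}

out = EmbeddingPipeline()([0.1, 0.2, 0.3], num_inference_steps=10)
print(out)
```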

@patrickvonplaten (Contributor) commented:

Very cool to see that diffusers can be useful to serve as a backend for this library. If you need any help with the migration or require additional features, we're very open to help 🤗

@keturn (Contributor Author) commented Nov 9, 2022

Patrick, why do you say that almost like it's a surprise? 😄 Was serving as an application backend not the plan for diffusers all along? Don't make me second-guess myself here. It'll make me look bad in front of the Invoke devs! 🙈

As for what diffusers could do to help, a fine place to start would be refactoring the StableDiffusionPipeline to aid reusability and extensibility: huggingface/diffusers#551 (comment)

@keturn (Contributor Author) commented Nov 9, 2022

I've pushed a proof of concept for txt2img. It is super rough, but it does succeed in producing an image for a prompt.

I've updated this PR's main description with a checklist of things we need to do to support it for real.

@patrickvonplaten (Contributor) commented:

> I've pushed a proof of concept for txt2img. It is super rough, but it does succeed in producing an image for a prompt.
>
> I've updated this PR's main description with a checklist of things we need to do to support it for real.

Haha, that sounds good - we've started factoring out methods as done in this PR: huggingface/diffusers#1224 - the __call__ method should get cleaner bit by bit ;-)

@keturn (Contributor Author) commented Nov 10, 2022

Update: made model loading much better. made output much worse.

Like no-longer-recognizable worse. But I committed anyway because it does run, and it's so much easier to fiddle with now that it's not taking extra gigabytes of RAM.

I suspect this implementation of get_learned_conditionings:

text_fragments = c[0]
text_input = self._tokenize(text_fragments)
with torch.inference_mode():
    token_ids = text_input.input_ids.to(self.text_encoder.device)
    text_embeddings = self.text_encoder(token_ids)[0]
return text_embeddings, text_input.input_ids

but maybe it's something else, like configure_model_padding.

@keturn (Contributor Author) commented Nov 10, 2022

fixed! I didn't notice it was making 256px images instead of 512.

@keturn (Contributor Author) commented Nov 10, 2022

Added initial support for switching schedulers. Some of them look like they need further configuration.
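Scheduler selection could look roughly like this: a map from Invoke's sampler names to diffusers scheduler class names. The class names are real diffusers classes (as of diffusers ~0.8); the mapping itself is an assumption for illustration:

```python
# Assumed mapping for illustration; not InvokeAI's actual table.
SCHEDULER_MAP = {
    "ddim": "DDIMScheduler",
    "k_lms": "LMSDiscreteScheduler",
    "k_euler": "EulerDiscreteScheduler",
    "k_euler_a": "EulerAncestralDiscreteScheduler",
}

def scheduler_class_name(sampler: str) -> str:
    try:
        return SCHEDULER_MAP[sampler]
    except KeyError:
        raise ValueError(f"unknown sampler: {sampler}") from None

print(scheduler_class_name("k_lms"))
```

In real code one would then swap the scheduler in with something like `getattr(diffusers, name).from_config(pipeline.scheduler.config)`, reusing the pipeline's existing scheduler config.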

@keturn (Contributor Author) commented Nov 10, 2022

Found the missing bit. k_lms and k_euler schedulers fixed.

@keturn (Contributor Author) commented Nov 23, 2022

The current test failure seems to be the same as the failure in development rather than anything specific to this PR.

Commit messages from the branch:
  • We get to remove some code by using methods that were factored out in the base class. (# Conflicts: ldm/invoke/generator/diffusers_pipeline.py)
  • now that we can use it directly from diffusers 0.8.1
@keturn (Contributor Author) commented Nov 24, 2022

Pushed support for img2img. Seems to be working, at least with DDIM. LMS and Euler don't do so well.

Might be a few things to follow up on to get proper reproducible-with-seed results.
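Reproducible-with-seed results come down to drawing the initial noise from a generator seeded per image, so the same seed reproduces the same starting latents. Sketched here with stdlib `random` as a stand-in for torch (in torch this would be `torch.Generator(device).manual_seed(seed)` passed to `torch.randn(..., generator=g)`):

```python
# Stand-in illustration using stdlib random instead of torch.
import random

def seeded_noise(seed: int, n: int) -> list:
    rng = random.Random(seed)  # analogous to torch.Generator().manual_seed(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# same seed yields the same starting noise
assert seeded_noise(42, 4) == seeded_noise(42, 4)
```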

@keturn mentioned this pull request Nov 27, 2022 (31 tasks)
@keturn (Contributor Author) commented Nov 27, 2022

→ Moved to #1583

[Can't change the working branch of an existing PR.]

@keturn closed this Nov 27, 2022