From 35b0799263202a628ca6f2ea42a889607ce4e49c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?M=2E=20Tolga=20Cang=C3=B6z?= <mtcangoz@gmail.com> Date: Sat, 9 Dec 2023 09:49:20 +0300 Subject: [PATCH] Fix typos and trim trailing whitespaces --- README.md | 4 +- examples/README.md | 10 +- examples/community/README.md | 176 +++++++++--------- examples/research_projects/README.md | 4 +- .../README.md | 10 +- .../multi_token_clip.py | 0 .../requirements.txt | 0 .../requirements_flax.txt | 0 .../textual_inversion.py | 0 .../textual_inversion_flax.py | 0 .../research_projects/onnxruntime/README.md | 2 +- .../onnxruntime/text_to_image/README.md | 4 +- .../onnxruntime/textual_inversion/README.md | 8 +- examples/research_projects/realfill/README.md | 2 +- .../research_projects/sdxl_flax/README.md | 18 +- examples/t2i_adapter/README_sdxl.md | 4 +- examples/text_to_image/README.md | 30 +-- examples/textual_inversion/README.md | 12 +- .../unconditional_image_generation/README.md | 6 +- examples/wuerstchen/text_to_image/README.md | 2 +- src/diffusers/pipelines/README.md | 52 +++--- ...inous_encoder.py => continuous_encoder.py} | 0 .../pipeline_spectrogram_diffusion.py | 2 +- .../pipelines/stable_diffusion/README.md | 22 +-- 24 files changed, 183 insertions(+), 185 deletions(-) rename examples/research_projects/{mulit_token_textual_inversion => multi_token_textual_inversion}/README.md (97%) rename examples/research_projects/{mulit_token_textual_inversion => multi_token_textual_inversion}/multi_token_clip.py (100%) rename examples/research_projects/{mulit_token_textual_inversion => multi_token_textual_inversion}/requirements.txt (100%) rename examples/research_projects/{mulit_token_textual_inversion => multi_token_textual_inversion}/requirements_flax.txt (100%) rename examples/research_projects/{mulit_token_textual_inversion => multi_token_textual_inversion}/textual_inversion.py (100%) rename examples/research_projects/{mulit_token_textual_inversion => multi_token_textual_inversion}/textual_inversion_flax.py (100%) rename src/diffusers/pipelines/spectrogram_diffusion/{continous_encoder.py => continuous_encoder.py} (100%) diff --git a/README.md b/README.md index 489e0d154af2..b52813d7e1a7 100644 --- a/README.md +++ b/README.md @@ -77,7 +77,7 @@ Please refer to the [How to use Stable Diffusion in Apple Silicon](https://huggi ## Quickstart -Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 15000+ checkpoints): +Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 16000+ checkpoints): ```python from diffusers import DiffusionPipeline @@ -219,7 +219,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9 - https://github.com/deep-floyd/IF - https://github.com/bentoml/BentoML - https://github.com/bmaltais/kohya_ss -- +6000 other amazing GitHub repositories 💪 +- +7000 other amazing GitHub repositories 💪 Thank you for using us ❤️. diff --git a/examples/README.md b/examples/README.md index f0d8a6bb57f0..8bcde2d84f1c 100644 --- a/examples/README.md +++ b/examples/README.md @@ -18,8 +18,7 @@ limitations under the License. Diffusers examples are a collection of scripts to demonstrate how to effectively use the `diffusers` library for a variety of use cases involving training or fine-tuning. -**Note**: If you are looking for **official** examples on how to use `diffusers` for inference, -please have a look at [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines). +**Note**: If you are looking for **official** examples on how to use `diffusers` for inference, please have a look at [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines). Our examples aspire to be **self-contained**, **easy-to-tweak**, **beginner-friendly** and for **one-purpose-only**. More specifically, this means: @@ -27,11 +26,10 @@ More specifically, this means: - **Self-contained**: An example script shall only depend on "pip-install-able" Python packages that can be found in a `requirements.txt` file. Example scripts shall **not** depend on any local files. This means that one can simply download an example script, *e.g.* [train_unconditional.py](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/train_unconditional.py), install the required dependencies, *e.g.* [requirements.txt](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/requirements.txt) and execute the example script. - **Easy-to-tweak**: While we strive to present as many use cases as possible, the example scripts are just that - examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, most of the examples fully expose the preprocessing of the data and the training loop to allow you to tweak and edit them as required. - **Beginner-friendly**: We do not aim for providing state-of-the-art training scripts for the newest models, but rather examples that can be used as a way to better understand diffusion models and how to use them with the `diffusers` library. We often purposefully leave out certain state-of-the-art methods if we consider them too complex for beginners. -- **One-purpose-only**: Examples should show one task and one task only. Even if a task is from a modeling -point of view very similar, *e.g.* image super-resolution and image modification tend to use the same model and training method, we want examples to showcase only one task to keep them as readable and easy-to-understand as possible. +- **One-purpose-only**: Examples should show one task and one task only. Even if a task is from a modeling point of view very similar, *e.g.* image super-resolution and image modification tend to use the same model and training method, we want examples to showcase only one task to keep them as readable and easy-to-understand as possible. We provide **official** examples that cover the most popular tasks of diffusion models. -*Official* examples are **actively** maintained by the `diffusers` maintainers and we try to rigorously follow our example philosophy as defined above. +*Official* examples are **actively** maintained by the `diffusers` maintainers and we try to rigorously follow our example philosophy as defined above. If you feel like another important example should exist, we are more than happy to welcome a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=) or directly a [Pull Request](https://github.com/huggingface/diffusers/compare) from you! Training examples show how to pretrain or fine-tune diffusion models for a variety of tasks. Currently we support: @@ -39,7 +37,7 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie | Task | 🤗 Accelerate | 🤗 Datasets | Colab |---|---|:---:|:---:| | [**Unconditional Image Generation**](./unconditional_image_generation) | ✅ | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) -| [**Text-to-Image fine-tuning**](./text_to_image) | ✅ | ✅ | +| [**Text-to-Image fine-tuning**](./text_to_image) | ✅ | ✅ | | [**Textual Inversion**](./textual_inversion) | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb) | [**Dreambooth**](./dreambooth) | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_dreambooth_training.ipynb) | [**ControlNet**](./controlnet) | ✅ | ✅ | - diff --git a/examples/community/README.md b/examples/community/README.md index 78a89acf7a57..e121e68bc9ce 100755 --- a/examples/community/README.md +++ b/examples/community/README.md @@ -8,13 +8,13 @@ If a community doesn't work as expected, please open an issue and ping the autho | Example | Description | Code Example | Colab | Author | |:--------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------:| -| LLM-grounded Diffusion (LMD+) | LMD greatly improves the prompt following ability of text-to-image generation models by introducing an LLM as a front-end prompt parser and layout planner. [Project page.](https://llm-grounded-diffusion.github.io/) [See our full codebase (also with diffusers).](https://github.com/TonyLianLong/LLM-groundedDiffusion) | [LLM-grounded Diffusion (LMD+)](#llm-grounded-diffusion) | [Huggingface Demo](https://huggingface.co/spaces/longlian/llm-grounded-diffusion) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1SXzMSeAB-LJYISb2yrUOdypLz4OYWUKj) | [Long (Tony) Lian](https://tonylian.com/) | -| CLIP Guided Stable Diffusion | Doing CLIP guidance for text to image generation with Stable Diffusion | [CLIP Guided Stable Diffusion](#clip-guided-stable-diffusion) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/CLIP_Guided_Stable_diffusion_with_diffusers.ipynb) | [Suraj Patil](https://github.com/patil-suraj/) | +| LLM-grounded Diffusion (LMD+) | LMD greatly improves the prompt following ability of text-to-image generation models by introducing an LLM as a front-end prompt parser and layout planner. [Project page.](https://llm-grounded-diffusion.github.io/) [See our full codebase (also with diffusers).](https://github.com/TonyLianLong/LLM-groundedDiffusion) | [LLM-grounded Diffusion (LMD+)](#llm-grounded-diffusion) | [Huggingface Demo](https://huggingface.co/spaces/longlian/llm-grounded-diffusion) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1SXzMSeAB-LJYISb2yrUOdypLz4OYWUKj) | [Long (Tony) Lian](https://tonylian.com/) | +| CLIP Guided Stable Diffusion | Doing CLIP guidance for text to image generation with Stable Diffusion | [CLIP Guided Stable Diffusion](#clip-guided-stable-diffusion) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/CLIP_Guided_Stable_diffusion_with_diffusers.ipynb) | [Suraj Patil](https://github.com/patil-suraj/) | | One Step U-Net (Dummy) | Example showcasing of how to use Community Pipelines (see https://github.com/huggingface/diffusers/issues/841) | [One Step U-Net](#one-step-unet) | - | [Patrick von Platen](https://github.com/patrickvonplaten/) | | Stable Diffusion Interpolation | Interpolate the latent space of Stable Diffusion between different prompts/seeds | [Stable Diffusion Interpolation](#stable-diffusion-interpolation) | - | [Nate Raw](https://github.com/nateraw/) | | Stable Diffusion Mega | **One** Stable Diffusion Pipeline with all functionalities of [Text2Image](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py), [Image2Image](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py) and [Inpainting](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py) | [Stable Diffusion Mega](#stable-diffusion-mega) | - | [Patrick von Platen](https://github.com/patrickvonplaten/) | | Long Prompt Weighting Stable Diffusion | **One** Stable Diffusion Pipeline without tokens length limit, and support parsing weighting in prompt. | [Long Prompt Weighting Stable Diffusion](#long-prompt-weighting-stable-diffusion) | - | [SkyTNT](https://github.com/SkyTNT) | -| Speech to Image | Using automatic-speech-recognition to transcribe text and Stable Diffusion to generate images | [Speech to Image](#speech-to-image) | - | [Mikail Duzenli](https://github.com/MikailINTech) +| Speech to Image | Using automatic-speech-recognition to transcribe text and Stable Diffusion to generate images | [Speech to Image](#speech-to-image) | - | [Mikail Duzenli](https://github.com/MikailINTech) | Wild Card Stable Diffusion | Stable Diffusion Pipeline that supports prompts that contain wildcard terms (indicated by surrounding double underscores), with values instantiated randomly from a corresponding txt file or a dictionary of possible values | [Wildcard Stable Diffusion](#wildcard-stable-diffusion) | - | [Shyam Sudhakaran](https://github.com/shyamsn97) | | [Composable Stable Diffusion](https://energy-based-model.github.io/Compositional-Visual-Generation-with-Composable-Diffusion-Models/) | Stable Diffusion Pipeline that supports prompts that contain "|" in prompts (as an AND condition) and weights (separated by "|" as well) to positively / negatively weight prompts. | [Composable Stable Diffusion](#composable-stable-diffusion) | - | [Mark Rich](https://github.com/MarkRich) | | Seed Resizing Stable Diffusion | Stable Diffusion Pipeline that supports resizing an image and retaining the concepts of the 512 by 512 generation. | [Seed Resizing](#seed-resizing) | - | [Mark Rich](https://github.com/MarkRich) | @@ -24,27 +24,27 @@ If a community doesn't work as expected, please open an issue and ping the autho | Text Based Inpainting Stable Diffusion | Stable Diffusion Inpainting Pipeline that enables passing a text prompt to generate the mask for inpainting | [Text Based Inpainting Stable Diffusion](#image-to-image-inpainting-stable-diffusion) | - | [Dhruv Karan](https://github.com/unography) | | Bit Diffusion | Diffusion on discrete data | [Bit Diffusion](#bit-diffusion) | - | [Stuti R.](https://github.com/kingstut) | | K-Diffusion Stable Diffusion | Run Stable Diffusion with any of [K-Diffusion's samplers](https://github.com/crowsonkb/k-diffusion/blob/master/k_diffusion/sampling.py) | [Stable Diffusion with K Diffusion](#stable-diffusion-with-k-diffusion) | - | [Patrick von Platen](https://github.com/patrickvonplaten/) | -| Checkpoint Merger Pipeline | Diffusion Pipeline that enables merging of saved model checkpoints | [Checkpoint Merger Pipeline](#checkpoint-merger-pipeline) | - | [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) | +| Checkpoint Merger Pipeline | Diffusion Pipeline that enables merging of saved model checkpoints | [Checkpoint Merger Pipeline](#checkpoint-merger-pipeline) | - | [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) | Stable Diffusion v1.1-1.4 Comparison | Run all 4 model checkpoints for Stable Diffusion and compare their results together | [Stable Diffusion Comparison](#stable-diffusion-comparisons) | - | [Suvaditya Mukherjee](https://github.com/suvadityamuk) | MagicMix | Diffusion Pipeline for semantic mixing of an image and a text prompt | [MagicMix](#magic-mix) | - | [Partho Das](https://github.com/daspartho) | | Stable UnCLIP | Diffusion Pipeline for combining prior model (generate clip image embedding from text, UnCLIPPipeline `"kakaobrain/karlo-v1-alpha"`) and decoder pipeline (decode clip image embedding to image, StableDiffusionImageVariationPipeline `"lambdalabs/sd-image-variations-diffusers"` ). | [Stable UnCLIP](#stable-unclip) | - | [Ray Wang](https://wrong.wang) | -| UnCLIP Text Interpolation Pipeline | Diffusion Pipeline that allows passing two prompts and produces images while interpolating between the text-embeddings of the two prompts | [UnCLIP Text Interpolation Pipeline](#unclip-text-interpolation-pipeline) | - | [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) | -| UnCLIP Image Interpolation Pipeline | Diffusion Pipeline that allows passing two images/image_embeddings and produces images while interpolating between their image-embeddings | [UnCLIP Image Interpolation Pipeline](#unclip-image-interpolation-pipeline) | - | [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) | +| UnCLIP Text Interpolation Pipeline | Diffusion Pipeline that allows passing two prompts and produces images while interpolating between the text-embeddings of the two prompts | [UnCLIP Text Interpolation Pipeline](#unclip-text-interpolation-pipeline) | - | [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) | +| UnCLIP Image Interpolation Pipeline | Diffusion Pipeline that allows passing two images/image_embeddings and produces images while interpolating between their image-embeddings | [UnCLIP Image Interpolation Pipeline](#unclip-image-interpolation-pipeline) | - | [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) | | DDIM Noise Comparative Analysis Pipeline | Investigating how the diffusion models learn visual concepts from each noise level (which is a contribution of [P2 weighting (CVPR 2022)](https://arxiv.org/abs/2204.00227)) | [DDIM Noise Comparative Analysis Pipeline](#ddim-noise-comparative-analysis-pipeline) | - | [Aengus (Duc-Anh)](https://github.com/aengusng8) | -| CLIP Guided Img2Img Stable Diffusion Pipeline | Doing CLIP guidance for image to image generation with Stable Diffusion | [CLIP Guided Img2Img Stable Diffusion](#clip-guided-img2img-stable-diffusion) | - | [Nipun Jindal](https://github.com/nipunjindal/) | +| CLIP Guided Img2Img Stable Diffusion Pipeline | Doing CLIP guidance for image to image generation with Stable Diffusion | [CLIP Guided Img2Img Stable Diffusion](#clip-guided-img2img-stable-diffusion) | - | [Nipun Jindal](https://github.com/nipunjindal/) | | TensorRT Stable Diffusion Text to Image Pipeline | Accelerates the Stable Diffusion Text2Image Pipeline using TensorRT | [TensorRT Stable Diffusion Text to Image Pipeline](#tensorrt-text2image-stable-diffusion-pipeline) | - | [Asfiya Baig](https://github.com/asfiyab-nvidia) | -| EDICT Image Editing Pipeline | Diffusion pipeline for text-guided image editing | [EDICT Image Editing Pipeline](#edict-image-editing-pipeline) | - | [Joqsan Azocar](https://github.com/Joqsan) | -| Stable Diffusion RePaint | Stable Diffusion pipeline using [RePaint](https://arxiv.org/abs/2201.0986) for inpainting. | [Stable Diffusion RePaint](#stable-diffusion-repaint ) | - | [Markus Pobitzer](https://github.com/Markus-Pobitzer) | +| EDICT Image Editing Pipeline | Diffusion pipeline for text-guided image editing | [EDICT Image Editing Pipeline](#edict-image-editing-pipeline) | - | [Joqsan Azocar](https://github.com/Joqsan) | +| Stable Diffusion RePaint | Stable Diffusion pipeline using [RePaint](https://arxiv.org/abs/2201.0986) for inpainting. | [Stable Diffusion RePaint](#stable-diffusion-repaint ) | - | [Markus Pobitzer](https://github.com/Markus-Pobitzer) | | TensorRT Stable Diffusion Image to Image Pipeline | Accelerates the Stable Diffusion Image2Image Pipeline using TensorRT | [TensorRT Stable Diffusion Image to Image Pipeline](#tensorrt-image2image-stable-diffusion-pipeline) | - | [Asfiya Baig](https://github.com/asfiyab-nvidia) | -| Stable Diffusion IPEX Pipeline | Accelerate Stable Diffusion inference pipeline with BF16/FP32 precision on Intel Xeon CPUs with [IPEX](https://github.com/intel/intel-extension-for-pytorch) | [Stable Diffusion on IPEX](#stable-diffusion-on-ipex) | - | [Yingjie Han](https://github.com/yingjie-han/) | -| CLIP Guided Images Mixing Stable Diffusion Pipeline | Сombine images using usual diffusion models. | [CLIP Guided Images Mixing Using Stable Diffusion](#clip-guided-images-mixing-with-stable-diffusion) | - | [Karachev Denis](https://github.com/TheDenk) | +| Stable Diffusion IPEX Pipeline | Accelerate Stable Diffusion inference pipeline with BF16/FP32 precision on Intel Xeon CPUs with [IPEX](https://github.com/intel/intel-extension-for-pytorch) | [Stable Diffusion on IPEX](#stable-diffusion-on-ipex) | - | [Yingjie Han](https://github.com/yingjie-han/) | +| CLIP Guided Images Mixing Stable Diffusion Pipeline | Сombine images using usual diffusion models. | [CLIP Guided Images Mixing Using Stable Diffusion](#clip-guided-images-mixing-with-stable-diffusion) | - | [Karachev Denis](https://github.com/TheDenk) | | TensorRT Stable Diffusion Inpainting Pipeline | Accelerates the Stable Diffusion Inpainting Pipeline using TensorRT | [TensorRT Stable Diffusion Inpainting Pipeline](#tensorrt-inpainting-stable-diffusion-pipeline) | - | [Asfiya Baig](https://github.com/asfiyab-nvidia) | -| IADB Pipeline | Implementation of [Iterative α-(de)Blending: a Minimalist Deterministic Diffusion Model](https://arxiv.org/abs/2305.03486) | [IADB Pipeline](#iadb-pipeline) | - | [Thomas Chambon](https://github.com/tchambon) +| IADB Pipeline | Implementation of [Iterative α-(de)Blending: a Minimalist Deterministic Diffusion Model](https://arxiv.org/abs/2305.03486) | [IADB Pipeline](#iadb-pipeline) | - | [Thomas Chambon](https://github.com/tchambon) | Zero1to3 Pipeline | Implementation of [Zero-1-to-3: Zero-shot One Image to 3D Object](https://arxiv.org/abs/2303.11328) | [Zero1to3 Pipeline](#Zero1to3-pipeline) | - | [Xin Kong](https://github.com/kxhit) | -Stable Diffusion XL Long Weighted Prompt Pipeline | A pipeline support unlimited length of prompt and negative prompt, use A1111 style of prompt weighting | [Stable Diffusion XL Long Weighted Prompt Pipeline](#stable-diffusion-xl-long-weighted-prompt-pipeline) | - | [Andrew Zhu](https://xhinker.medium.com/) | -FABRIC - Stable Diffusion with feedback Pipeline | pipeline supports feedback from liked and disliked images | [Stable Diffusion Fabric Pipeline](#stable-diffusion-fabric-pipeline) | - | [Shauray Singh](https://shauray8.github.io/about_shauray/) | -sketch inpaint - Inpainting with non-inpaint Stable Diffusion | sketch inpaint much like in automatic1111 | [Masked Im2Im Stable Diffusion Pipeline](#stable-diffusion-masked-im2im) | - | [Anatoly Belikov](https://github.com/noskill) | -prompt-to-prompt | change parts of a prompt and retain image structure (see [paper page](https://prompt-to-prompt.github.io/)) | [Prompt2Prompt Pipeline](#prompt2prompt-pipeline) | - | [Umer H. Adil](https://twitter.com/UmerHAdil) | +Stable Diffusion XL Long Weighted Prompt Pipeline | A pipeline support unlimited length of prompt and negative prompt, use A1111 style of prompt weighting | [Stable Diffusion XL Long Weighted Prompt Pipeline](#stable-diffusion-xl-long-weighted-prompt-pipeline) | - | [Andrew Zhu](https://xhinker.medium.com/) | +FABRIC - Stable Diffusion with feedback Pipeline | pipeline supports feedback from liked and disliked images | [Stable Diffusion Fabric Pipeline](#stable-diffusion-fabric-pipeline) | - | [Shauray Singh](https://shauray8.github.io/about_shauray/) | +sketch inpaint - Inpainting with non-inpaint Stable Diffusion | sketch inpaint much like in automatic1111 | [Masked Im2Im Stable Diffusion Pipeline](#stable-diffusion-masked-im2im) | - | [Anatoly Belikov](https://github.com/noskill) | +prompt-to-prompt | change parts of a prompt and retain image structure (see [paper page](https://prompt-to-prompt.github.io/)) | [Prompt2Prompt Pipeline](#prompt2prompt-pipeline) | - | [Umer H. Adil](https://twitter.com/UmerHAdil) | | Latent Consistency Pipeline | Implementation of [Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference](https://arxiv.org/abs/2310.04378) | [Latent Consistency Pipeline](#latent-consistency-pipeline) | - | [Simian Luo](https://github.com/luosiallen) | | Latent Consistency Img2img Pipeline | Img2img pipeline for Latent Consistency Models | [Latent Consistency Img2Img Pipeline](#latent-consistency-img2img-pipeline) | - | [Logan Zoellner](https://github.com/nagolinc) | | Latent Consistency Interpolation Pipeline | Interpolate the latent space of Latent Consistency Models with multiple prompts | [Latent Consistency Interpolation Pipeline](#latent-consistency-interpolation-pipeline) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1pK3NrLWJSiJsBynLns1K1-IDTW9zbPvl?usp=sharing) | [Aryan V S](https://github.com/a-r-r-o-w) | @@ -77,7 +77,7 @@ import torch from diffusers import DiffusionPipeline pipe = DiffusionPipeline.from_pretrained( - "longlian/lmd_plus", + "longlian/lmd_plus", custom_pipeline="llm_grounded_diffusion", custom_revision="main", variant="fp16", torch_dtype=torch.float16 @@ -112,7 +112,7 @@ import torch from diffusers import DiffusionPipeline pipe = DiffusionPipeline.from_pretrained( - "longlian/lmd_plus", + "longlian/lmd_plus", custom_pipeline="llm_grounded_diffusion", variant="fp16", torch_dtype=torch.float16 ) @@ -139,7 +139,7 @@ images[0].save("./lmd_plus_generation.jpg") ### CLIP Guided Stable Diffusion -CLIP guided stable diffusion can help to generate more realistic images +CLIP guided stable diffusion can help to generate more realistic images by guiding stable diffusion at every denoising step with an additional CLIP model. The following code requires roughly 12GB of GPU RAM. @@ -159,7 +159,7 @@ guided_pipeline = DiffusionPipeline.from_pretrained( custom_pipeline="clip_guided_stable_diffusion", clip_model=clip_model, feature_extractor=feature_extractor, - + torch_dtype=torch.float16, ) guided_pipeline.enable_attention_slicing() @@ -180,7 +180,7 @@ for i in range(4): generator=generator, ).images[0] images.append(image) - + # save images locally for i, img in enumerate(images): img.save(f"./clip_guided_sd/image_{i}.png") @@ -234,7 +234,7 @@ frame_filepaths = pipe.walk( ) ``` -The output of the `walk(...)` function returns a list of images saved under the folder as defined in `output_dir`. You can use these images to create videos of stable diffusion. +The output of the `walk(...)` function returns a list of images saved under the folder as defined in `output_dir`. You can use these images to create videos of stable diffusion. > **Please have a look at https://github.com/nateraw/stable-diffusion-videos for more in-detail information on how to create videos using stable diffusion as well as more feature-complete functionality.** @@ -310,7 +310,7 @@ import torch pipe = DiffusionPipeline.from_pretrained( 'hakurei/waifu-diffusion', custom_pipeline="lpw_stable_diffusion", - + torch_dtype=torch.float16 ) pipe=pipe.to("cuda") @@ -377,7 +377,7 @@ diffuser_pipeline = DiffusionPipeline.from_pretrained( custom_pipeline="speech_to_image_diffusion", speech_model=model, speech_processor=processor, - + torch_dtype=torch.float16, ) @@ -435,7 +435,7 @@ import torch pipe = DiffusionPipeline.from_pretrained( "CompVis/stable-diffusion-v1-4", custom_pipeline="wildcard_stable_diffusion", - + torch_dtype=torch.float16, ) prompt = "__animal__ sitting on a __object__ wearing a __clothing__" @@ -449,7 +449,7 @@ out = pipe( ) ``` -### Composable Stable diffusion +### Composable Stable diffusion [Composable Stable Diffusion](https://energy-based-model.github.io/Compositional-Visual-Generation-with-Composable-Diffusion-Models/) proposes conjunction and negation (negative prompts) operators for compositional generation with conditional diffusion models. @@ -499,7 +499,7 @@ tvu.save_image(grid, f'{prompt}_{args.weights}' + '.png') ``` ### Imagic Stable Diffusion -Allows you to edit an image using stable diffusion. +Allows you to edit an image using stable diffusion. ```python import requests @@ -539,7 +539,7 @@ image = res.images[0] image.save('./imagic/imagic_image_alpha_2.png') ``` -### Seed Resizing +### Seed Resizing Test seed resizing. Originally generate an image in 512 by 512, then generate image with same seed at 512 by 592 using seed resizing. Finally, generate 512 by 592 using original stable diffusion pipeline. ```python @@ -667,14 +667,14 @@ diffuser_pipeline = DiffusionPipeline.from_pretrained( detection_pipeline=language_detection_pipeline, translation_model=trans_model, translation_tokenizer=trans_tokenizer, - + torch_dtype=torch.float16, ) diffuser_pipeline.enable_attention_slicing() diffuser_pipeline = diffuser_pipeline.to(device) -prompt = ["a photograph of an astronaut riding a horse", +prompt = ["a photograph of an astronaut riding a horse", "Una casa en la playa", "Ein Hund, der Orange isst", "Un restaurant parisien"] @@ -715,7 +715,7 @@ mask_image = PIL.Image.open(mask_path).convert("RGB").resize((512, 512)) pipe = DiffusionPipeline.from_pretrained( "runwayml/stable-diffusion-inpainting", custom_pipeline="img2img_inpainting", - + torch_dtype=torch.float16 ) pipe = pipe.to("cuda") @@ -758,8 +758,8 @@ prompt = "a cup" # the masked out region will be replaced with this image = pipe(image=image, text=text, prompt=prompt).images[0] ``` -### Bit Diffusion -Based https://arxiv.org/abs/2208.04202, this is used for diffusion on discrete data - eg, discreate image data, DNA sequence data. An unconditional discreate image can be generated like this: +### Bit Diffusion +Based https://arxiv.org/abs/2208.04202, this is used for diffusion on discrete data - eg, discreate image data, DNA sequence data. An unconditional discreate image can be generated like this: ```python from diffusers import DiffusionPipeline @@ -837,8 +837,8 @@ Usage:- ```python from diffusers import DiffusionPipeline -#Return a CheckpointMergerPipeline class that allows you to merge checkpoints. -#The checkpoint passed here is ignored. But still pass one of the checkpoints you plan to +#Return a CheckpointMergerPipeline class that allows you to merge checkpoints. +#The checkpoint passed here is ignored. But still pass one of the checkpoints you plan to #merge for convenience pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", custom_pipeline="checkpoint_merger") @@ -861,16 +861,16 @@ image = merged_pipe(prompt).images[0] ``` Some examples along with the merge details: -1. "CompVis/stable-diffusion-v1-4" + "hakurei/waifu-diffusion" ; Sigmoid interpolation; alpha = 0.8 +1. "CompVis/stable-diffusion-v1-4" + "hakurei/waifu-diffusion" ; Sigmoid interpolation; alpha = 0.8 ![Stable plus Waifu Sigmoid 0.8](https://huggingface.co/datasets/NagaSaiAbhinay/CheckpointMergerSamples/resolve/main/stability_v1_4_waifu_sig_0.8.png) -2. "hakurei/waifu-diffusion" + "prompthero/openjourney" ; Inverse Sigmoid interpolation; alpha = 0.8 +2. "hakurei/waifu-diffusion" + "prompthero/openjourney" ; Inverse Sigmoid interpolation; alpha = 0.8 ![Stable plus Waifu Sigmoid 0.8](https://huggingface.co/datasets/NagaSaiAbhinay/CheckpointMergerSamples/resolve/main/waifu_openjourney_inv_sig_0.8.png) -3. "CompVis/stable-diffusion-v1-4" + "hakurei/waifu-diffusion" + "prompthero/openjourney"; Add Difference interpolation; alpha = 0.5 +3. "CompVis/stable-diffusion-v1-4" + "hakurei/waifu-diffusion" + "prompthero/openjourney"; Add Difference interpolation; alpha = 0.5 ![Stable plus Waifu plus openjourney add_diff 0.5](https://huggingface.co/datasets/NagaSaiAbhinay/CheckpointMergerSamples/resolve/main/stable_waifu_openjourney_add_diff_0.5.png) @@ -937,8 +937,8 @@ pipe = DiffusionPipeline.from_pretrained( img = Image.open('phone.jpg') mix_img = pipe( - img, - prompt = 'bed', + img, + prompt = 'bed', kmin = 0.3, kmax = 0.5, mix_factor = 0.5, @@ -1049,7 +1049,7 @@ print(pipeline.prior_scheduler) ### UnCLIP Text Interpolation Pipeline -This Diffusion Pipeline takes two prompts and interpolates between the two input prompts using spherical interpolation ( slerp ). The input prompts are converted to text embeddings by the pipeline's text_encoder and the interpolation is done on the resulting text_embeddings over the number of steps specified. Defaults to 5 steps. +This Diffusion Pipeline takes two prompts and interpolates between the two input prompts using spherical interpolation ( slerp ). The input prompts are converted to text embeddings by the pipeline's text_encoder and the interpolation is done on the resulting text_embeddings over the number of steps specified. Defaults to 5 steps. ```python import torch @@ -1086,7 +1086,7 @@ The resulting images in order:- ### UnCLIP Image Interpolation Pipeline -This Diffusion Pipeline takes two images or an image_embeddings tensor of size 2 and interpolates between their embeddings using spherical interpolation ( slerp ). The input images/image_embeddings are converted to image embeddings by the pipeline's image_encoder and the interpolation is done on the resulting image_embeddings over the number of steps specified. Defaults to 5 steps. +This Diffusion Pipeline takes two images or an image_embeddings tensor of size 2 and interpolates between their embeddings using spherical interpolation ( slerp ). The input images/image_embeddings are converted to image embeddings by the pipeline's image_encoder and the interpolation is done on the resulting image_embeddings over the number of steps specified. Defaults to 5 steps. ```python import torch @@ -1127,8 +1127,8 @@ The resulting images in order:- ![result5](https://huggingface.co/datasets/NagaSaiAbhinay/UnCLIPImageInterpolationSamples/resolve/main/starry_to_flowers_5.png) ### DDIM Noise Comparative Analysis Pipeline -#### **Research question: What visual concepts do the diffusion models learn from each noise level during training?** -The [P2 weighting (CVPR 2022)](https://arxiv.org/abs/2204.00227) paper proposed an approach to answer the above question, which is their second contribution. +#### **Research question: What visual concepts do the diffusion models learn from each noise level during training?** +The [P2 weighting (CVPR 2022)](https://arxiv.org/abs/2204.00227) paper proposed an approach to answer the above question, which is their second contribution. The approach consists of the following steps: 1. The input is an image x0. @@ -1170,7 +1170,7 @@ Here is the result of this pipeline (which is DDIM) on CelebA-HQ dataset. ### CLIP Guided Img2Img Stable Diffusion -CLIP guided Img2Img stable diffusion can help to generate more realistic images with an initial image +CLIP guided Img2Img stable diffusion can help to generate more realistic images with an initial image by guiding stable diffusion at every denoising step with an additional CLIP model. The following code requires roughly 12GB of GPU RAM. @@ -1322,8 +1322,8 @@ target_prompt = "A golden retriever" # run the pipeline result_image = pipeline( - base_prompt=base_prompt, - target_prompt=target_prompt, + base_prompt=base_prompt, + target_prompt=target_prompt, image=cropped_image, ) @@ -1537,7 +1537,7 @@ python -m pip install intel_extension_for_pytorch==<version_name> -f https://dev pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", custom_pipeline="stable_diffusion_ipex") # For Float32 pipe.prepare_for_ipex(prompt, dtype=torch.float32, height=512, width=512) #value of image height/width should be consistent with the pipeline inference -# For BFloat16 +# For BFloat16 pipe.prepare_for_ipex(prompt, dtype=torch.bfloat16, height=512, width=512) #value of image height/width should be consistent with the pipeline inference ``` @@ -1545,7 +1545,7 @@ Then you can use the ipex pipeline in a similar way to the default stable diffus ```python # For Float32 image = pipe(prompt, num_inference_steps=20, height=512, width=512).images[0] #value of image height/width should be consistent with 'prepare_for_ipex()' -# For BFloat16 +# For BFloat16 with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16): image = pipe(prompt, num_inference_steps=20, height=512, width=512).images[0] #value of image height/width should be consistent with 'prepare_for_ipex()' ``` @@ -1604,21 +1604,21 @@ latency = elapsed_time(pipe4) print("Latency of StableDiffusionPipeline--fp32",latency) ``` - + ### CLIP Guided Images Mixing With Stable Diffusion ![clip_guided_images_mixing_examples](https://huggingface.co/datasets/TheDenk/images_mixing/resolve/main/main.png) -CLIP guided stable diffusion images mixing pipeline allows to combine two images using standard diffusion models. -This approach is using (optional) CoCa model to avoid writing image description. +CLIP guided stable diffusion images mixing pipeline allows to combine two images using standard diffusion models. +This approach is using (optional) CoCa model to avoid writing image description. [More code examples](https://github.com/TheDenk/images_mixing) ### Stable Diffusion XL Long Weighted Prompt Pipeline -This SDXL pipeline support unlimited length prompt and negative prompt, compatible with A1111 prompt weighted style. +This SDXL pipeline support unlimited length prompt and negative prompt, compatible with A1111 prompt weighted style. -You can provide both `prompt` and `prompt_2`. if only one prompt is provided, `prompt_2` will be a copy of the provided `prompt`. Here is a sample code to use this pipeline. +You can provide both `prompt` and `prompt_2`. if only one prompt is provided, `prompt_2` will be a copy of the provided `prompt`. Here is a sample code to use this pipeline. ```python from diffusers import DiffusionPipeline @@ -1639,8 +1639,8 @@ neg_prompt = "blur, low quality, carton, animate" pipe.to("cuda") images = pipe( - prompt = prompt - , negative_prompt = neg_prompt + prompt = prompt + , negative_prompt = neg_prompt ).images[0] pipe.to("cpu") @@ -1648,7 +1648,7 @@ torch.cuda.empty_cache() images ``` -In the above code, the `prompt2` is appended to the `prompt`, which is more than 77 tokens. "birds" are showing up in the result. +In the above code, the `prompt2` is appended to the `prompt`, which is more than 77 tokens. "birds" are showing up in the result. ![Stable Diffusion XL Long Weighted Prompt Pipeline sample](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_long_weighted_prompt.png) ## Example Images Mixing (with CoCa) @@ -1700,7 +1700,7 @@ mixing_pipeline.enable_attention_slicing() mixing_pipeline = mixing_pipeline.to("cuda") # Pipeline running -generator = torch.Generator(device="cuda").manual_seed(17) +generator = torch.Generator(device="cuda").manual_seed(17) def download_image(url): response = requests.get(url) @@ -1729,7 +1729,7 @@ pipe_images = mixing_pipeline( ### Stable Diffusion Mixture Tiling This pipeline uses the Mixture. Refer to the [Mixture](https://arxiv.org/abs/2302.02412) paper for more details. - + ```python from diffusers import LMSDiscreteScheduler, DiffusionPipeline @@ -1802,7 +1802,7 @@ image.save('tensorrt_inpaint_mecha_robot.png') ### Stable Diffusion Mixture Canvas This pipeline uses the Mixture. Refer to the [Mixture](https://arxiv.org/abs/2302.02412) paper for more details. - + ```python from PIL import Image from diffusers import LMSDiscreteScheduler, DiffusionPipeline @@ -2011,7 +2011,7 @@ Reference Image ![reference_image](https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png) -Output Image +Output Image `prompt: 1 girl` @@ -2022,7 +2022,7 @@ Reference Image ![reference_image](https://github.com/huggingface/diffusers/assets/34944964/449bdab6-e744-4fb2-9620-d4068d9a741b) -Output Image +Output Image `prompt: A dog` @@ -2103,7 +2103,7 @@ Let's have a look at the images (*512X512*) | Without Feedback | With Feedback (1st image) | |---------------------|---------------------| -| ![Image 1](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/fabric_wo_feedback.jpg) | ![Feedback Image 1](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/fabric_w_feedback.png) | +| ![Image 1](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/fabric_wo_feedback.jpg) | ![Feedback Image 1](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/fabric_w_feedback.png) | ### Masked Im2Im Stable Diffusion Pipeline @@ -2256,7 +2256,7 @@ pipe.to(torch_device="cuda", torch_dtype=torch.float32) prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k" # Can be set to 1~50 steps. LCM support fast inference even <= 4 steps. Recommend: 1~8 steps. -num_inference_steps = 4 +num_inference_steps = 4 images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0, lcm_origin_steps=50, output_type="pil").images ``` @@ -2292,7 +2292,7 @@ input_image=Image.open("myimg.png") strength = 0.5 #strength =0 (no change) strength=1 (completely overwrite image) # Can be set to 1~50 steps. LCM support fast inference even <= 4 steps. Recommend: 1~8 steps. -num_inference_steps = 4 +num_inference_steps = 4 images = pipe(prompt=prompt, image=input_image, strength=strength, num_inference_steps=num_inference_steps, guidance_scale=8.0, lcm_origin_steps=50, output_type="pil").images ``` @@ -2345,7 +2345,7 @@ assert len(images) == (len(prompts) - 1) * num_interpolation_steps ``` ### StableDiffusionUpscaleLDM3D Pipeline -[LDM3D-VR](https://arxiv.org/pdf/2311.03226.pdf) is an extended version of LDM3D. +[LDM3D-VR](https://arxiv.org/pdf/2311.03226.pdf) is an extended version of LDM3D. The abstract from the paper is: *Latent diffusion models have proven to be state-of-the-art in the creation and manipulation of visual outputs. However, as far as we know, the generation of depth maps jointly with RGB is still limited. We introduce LDM3D-VR, a suite of diffusion models targeting virtual reality development that includes LDM3D-pano and LDM3D-SR. These models enable the generation of panoramic RGBD based on textual prompts and the upscaling of low-resolution inputs to high-resolution RGBD, respectively. Our models are fine-tuned from existing pretrained models on datasets containing panoramic/high-resolution RGB images, depth maps and captions. Both models are evaluated in comparison to existing related methods* @@ -2386,8 +2386,8 @@ upscaled_depth.save(f"upscaled_lemons_depth.png") ''' ### ControlNet + T2I Adapter Pipeline -This pipelines combines both ControlNet and T2IAdapter into a single pipeline, where the forward pass is executed once. -It receives `control_image` and `adapter_image`, as well as `controlnet_conditioning_scale` and `adapter_conditioning_scale`, for the ControlNet and Adapter modules, respectively. Whenever `adapter_conditioning_scale = 0` or `controlnet_conditioning_scale = 0`, it will act as a full ControlNet module or as a full T2IAdapter module, respectively. +This pipelines combines both ControlNet and T2IAdapter into a single pipeline, where the forward pass is executed once. +It receives `control_image` and `adapter_image`, as well as `controlnet_conditioning_scale` and `adapter_conditioning_scale`, for the ControlNet and Adapter modules, respectively. Whenever `adapter_conditioning_scale = 0` or `controlnet_conditioning_scale = 0`, it will act as a full ControlNet module or as a full T2IAdapter module, respectively. ```py import cv2 @@ -2538,7 +2538,7 @@ pipe = RegionalPromptingStableDiffusionPipeline.from_single_file(model_path, vae rp_args = { "mode":"rows", "div": "1;1;1" -} +} prompt =""" green hair twintail BREAK @@ -2567,7 +2567,7 @@ for image in images: ### Cols, Rows mode In the Cols, Rows mode, you can split the screen vertically and horizontally and assign prompts to each region. The split ratio can be specified by 'div', and you can set the division ratio like '3;3;2' or '0.1;0.5'. Furthermore, as will be described later, you can also subdivide the split Cols, Rows to specify more complex regions. -In this image, the image is divided into three parts, and a separate prompt is applied to each. The prompts are divided by 'BREAK', and each is applied to the respective region. +In this image, the image is divided into three parts, and a separate prompt is applied to each. The prompts are divided by 'BREAK', and each is applied to the respective region. ![sample](https://github.com/hako-mikan/sd-webui-regional-prompter/blob/imgs/rp_pipeline2.png) ``` green hair twintail BREAK @@ -2625,7 +2625,7 @@ prompt =""" a girl in street with shirt, tie, skirt BREAK red, shirt BREAK green, tie BREAK -blue , skirt +blue , skirt """ ``` ![sample](https://github.com/hako-mikan/sd-webui-regional-prompter/blob/imgs/rp_pipeline3.png) @@ -2644,7 +2644,7 @@ If only one input is given for multiple regions, they are all assumed to be the The difference is that in Prompt, duplicate regions are added, whereas in Prompt-EX, duplicate regions are overwritten sequentially. Since they are processed in order, setting a TARGET with a large regions first makes it easier for the effect of small regions to remain unmuffled. ### Accuracy -In the case of a 512 x 512 image, Attention mode reduces the size of the region to about 8 x 8 pixels deep in the U-Net, so that small regions get mixed up; Latent mode calculates 64*64, so that the region is exact. +In the case of a 512 x 512 image, Attention mode reduces the size of the region to about 8 x 8 pixels deep in the U-Net, so that small regions get mixed up; Latent mode calculates 64*64, so that the region is exact. ``` girl hair twintail frills,ribbons, dress, face BREAK girl, ,face @@ -2675,13 +2675,13 @@ Negative prompts are equally effective across all regions, but it is possible to To activate Regional Prompter, it is necessary to enter settings in rp_args. The items that can be set are as follows. rp_args is a dictionary type. ### Input Parameters -Parameters are specified through the `rp_arg`(dictionary type). +Parameters are specified through the `rp_arg`(dictionary type). ``` rp_args = { "mode":"rows", "div": "1;1;1" -} +} pipe(prompt =prompt, rp_args = rp_args) ``` @@ -2760,7 +2760,7 @@ The Pipeline supports `compel` syntax. Input prompts using the `compel` structur def get_kernel(self): return self.k - + self.kernel_size = kernel_size self.conv = Blurkernel(blur_type='gaussian', kernel_size=kernel_size, @@ -2835,7 +2835,7 @@ The Pipeline supports `compel` syntax. Input prompts using the `compel` structur * ![sample](https://github.com/tongdaxu/Images/assets/22267548/0ceb5575-d42e-4f0b-99c0-50e69c982209) * The reconstruction is perceptually similar to the source image, but different in details. * In dps_pipeline.py, we also provide a super-resolution example, which should produce: - * Downsampled image: + * Downsampled image: * ![dps_mea](https://github.com/tongdaxu/Images/assets/22267548/ff6a33d6-26f0-42aa-88ce-f8a76ba45a13) * Reconstructed image: * ![dps_generated_image](https://github.com/tongdaxu/Images/assets/22267548/b74f084d-93f4-4845-83d8-44c0fa758a5f) @@ -2930,7 +2930,7 @@ The original repo can be found at [repo](https://github.com/PRIS-CV/DemoFusion). - `show_image` (`bool`, defaults to False): Determine whether to show intermediate results during generation. -``` +```py from diffusers import DiffusionPipeline pipe = DiffusionPipeline.from_pretrained( @@ -2945,24 +2945,24 @@ prompt = "Envision a portrait of an elderly woman, her face a canvas of time, fr negative_prompt = "blurry, ugly, duplicate, poorly drawn, deformed, mosaic" images = pipe( - prompt, + prompt, negative_prompt=negative_prompt, - height=3072, - width=3072, - view_batch_size=16, + height=3072, + width=3072, + view_batch_size=16, stride=64, - num_inference_steps=50, + num_inference_steps=50, guidance_scale=7.5, - cosine_scale_1=3, - cosine_scale_2=1, - cosine_scale_3=1, + cosine_scale_1=3, + cosine_scale_2=1, + cosine_scale_3=1, sigma=0.8, - multi_decoder=True, + multi_decoder=True, show_image=True ) ``` You can display and save the generated images as: -``` +```py def image_grid(imgs, save_path=None): w = 0 @@ -2980,7 +2980,7 @@ def image_grid(imgs, save_path=None): if save_path != None: img.save(save_path + "/img_{}.jpg".format((i + 1) * 1024)) w += w_ - + return grid image_grid(images, save_path="./outputs/") diff --git a/examples/research_projects/README.md b/examples/research_projects/README.md index ef50d423e68f..f1716eea8d61 100644 --- a/examples/research_projects/README.md +++ b/examples/research_projects/README.md @@ -1,7 +1,7 @@ # Research projects -This folder contains various research projects using 🧨 Diffusers. -They are not really maintained by the core maintainers of this library and often require a specific version of Diffusers that is indicated in the requirements file of each folder. +This folder contains various research projects using 🧨 Diffusers. +They are not really maintained by the core maintainers of this library and often require a specific version of Diffusers that is indicated in the requirements file of each folder. Updating them to the most recent version of the library will require some work. To use any of them, just run the command diff --git a/examples/research_projects/mulit_token_textual_inversion/README.md b/examples/research_projects/multi_token_textual_inversion/README.md similarity index 97% rename from examples/research_projects/mulit_token_textual_inversion/README.md rename to examples/research_projects/multi_token_textual_inversion/README.md index 1303f73c1756..97a71b41c7ea 100644 --- a/examples/research_projects/mulit_token_textual_inversion/README.md +++ b/examples/research_projects/multi_token_textual_inversion/README.md @@ -1,6 +1,6 @@ ## [Deprecated] Multi Token Textual Inversion -**IMPORTART: This research project is deprecated. Multi Token Textual Inversion is now supported natively in [the officail textual inversion example](https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion#running-locally-with-pytorch).** +**IMPORTART: This research project is deprecated. Multi Token Textual Inversion is now supported natively in [the official textual inversion example](https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion#running-locally-with-pytorch).** The author of this project is [Isamu Isozaki](https://github.com/isamu-isozaki) - please make sure to tag the author for issue and PRs as well as @patrickvonplaten. @@ -17,9 +17,9 @@ Feel free to add these options to your training! In practice num_vec_per_token a [Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like stable diffusion on your own images using just 3-5 examples. The `textual_inversion.py` script shows how to implement the training procedure and adapt it for stable diffusion. -## Running on Colab +## Running on Colab -Colab for training +Colab for training [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb) Colab for inference @@ -53,7 +53,7 @@ accelerate config ### Cat toy example -You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-5`, so you'll need to visit [its card](https://huggingface.co/runwayml/stable-diffusion-v1-5), read the license and tick the checkbox if you agree. +You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-5`, so you'll need to visit [its card](https://huggingface.co/runwayml/stable-diffusion-v1-5), read the license and tick the checkbox if you agree. You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens). @@ -63,7 +63,7 @@ Run the following command to authenticate your token huggingface-cli login ``` -If you have already cloned the repo, then you won't need to go through these steps. +If you have already cloned the repo, then you won't need to go through these steps. <br> diff --git a/examples/research_projects/mulit_token_textual_inversion/multi_token_clip.py b/examples/research_projects/multi_token_textual_inversion/multi_token_clip.py similarity index 100% rename from examples/research_projects/mulit_token_textual_inversion/multi_token_clip.py rename to examples/research_projects/multi_token_textual_inversion/multi_token_clip.py diff --git a/examples/research_projects/mulit_token_textual_inversion/requirements.txt b/examples/research_projects/multi_token_textual_inversion/requirements.txt similarity index 100% rename from examples/research_projects/mulit_token_textual_inversion/requirements.txt rename to examples/research_projects/multi_token_textual_inversion/requirements.txt diff --git a/examples/research_projects/mulit_token_textual_inversion/requirements_flax.txt b/examples/research_projects/multi_token_textual_inversion/requirements_flax.txt similarity index 100% rename from examples/research_projects/mulit_token_textual_inversion/requirements_flax.txt rename to examples/research_projects/multi_token_textual_inversion/requirements_flax.txt diff --git a/examples/research_projects/mulit_token_textual_inversion/textual_inversion.py b/examples/research_projects/multi_token_textual_inversion/textual_inversion.py similarity index 100% rename from examples/research_projects/mulit_token_textual_inversion/textual_inversion.py rename to examples/research_projects/multi_token_textual_inversion/textual_inversion.py diff --git a/examples/research_projects/mulit_token_textual_inversion/textual_inversion_flax.py b/examples/research_projects/multi_token_textual_inversion/textual_inversion_flax.py similarity index 100% rename from examples/research_projects/mulit_token_textual_inversion/textual_inversion_flax.py rename to examples/research_projects/multi_token_textual_inversion/textual_inversion_flax.py diff --git a/examples/research_projects/onnxruntime/README.md b/examples/research_projects/onnxruntime/README.md index 204d9c951c99..a7935189f61b 100644 --- a/examples/research_projects/onnxruntime/README.md +++ b/examples/research_projects/onnxruntime/README.md @@ -2,4 +2,4 @@ **This research project is not actively maintained by the diffusers team. For any questions or comments, please contact Prathik Rao (prathikr), Sunghoon Choi (hanbitmyths), Ashwini Khade (askhade), or Peng Wang (pengwa) on github with any questions.** -This aims to provide diffusers examples with ONNXRuntime optimizations for training/fine-tuning unconditional image generation, text to image, and textual inversion. Please see individual directories for more details on how to run each task using ONNXRuntime. \ No newline at end of file +This aims to provide diffusers examples with ONNXRuntime optimizations for training/fine-tuning unconditional image generation, text to image, and textual inversion. Please see individual directories for more details on how to run each task using ONNXRuntime. diff --git a/examples/research_projects/onnxruntime/text_to_image/README.md b/examples/research_projects/onnxruntime/text_to_image/README.md index cd9397939ac2..48bce2065444 100644 --- a/examples/research_projects/onnxruntime/text_to_image/README.md +++ b/examples/research_projects/onnxruntime/text_to_image/README.md @@ -34,7 +34,7 @@ accelerate config ### Pokemon example -You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree. +You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree. You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens). @@ -68,7 +68,7 @@ accelerate launch --mixed_precision="fp16" train_text_to_image.py \ --learning_rate=1e-05 \ --max_grad_norm=1 \ --lr_scheduler="constant" --lr_warmup_steps=0 \ - --output_dir="sd-pokemon-model" + --output_dir="sd-pokemon-model" ``` Please contact Prathik Rao (prathikr), Sunghoon Choi (hanbitmyths), Ashwini Khade (askhade), or Peng Wang (pengwa) on github with any questions. \ No newline at end of file diff --git a/examples/research_projects/onnxruntime/textual_inversion/README.md b/examples/research_projects/onnxruntime/textual_inversion/README.md index 9f08983eaaad..261c4b49eeaf 100644 --- a/examples/research_projects/onnxruntime/textual_inversion/README.md +++ b/examples/research_projects/onnxruntime/textual_inversion/README.md @@ -3,9 +3,9 @@ [Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like stable diffusion on your own images using just 3-5 examples. The `textual_inversion.py` script shows how to implement the training procedure and adapt it for stable diffusion. -## Running on Colab +## Running on Colab -Colab for training +Colab for training [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb) Colab for inference @@ -39,7 +39,7 @@ accelerate config ### Cat toy example -You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-5`, so you'll need to visit [its card](https://huggingface.co/runwayml/stable-diffusion-v1-5), read the license and tick the checkbox if you agree. +You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-5`, so you'll need to visit [its card](https://huggingface.co/runwayml/stable-diffusion-v1-5), read the license and tick the checkbox if you agree. You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens). @@ -49,7 +49,7 @@ Run the following command to authenticate your token huggingface-cli login ``` -If you have already cloned the repo, then you won't need to go through these steps. +If you have already cloned the repo, then you won't need to go through these steps. <br> diff --git a/examples/research_projects/realfill/README.md b/examples/research_projects/realfill/README.md index b70f425368e5..91821031d2e0 100644 --- a/examples/research_projects/realfill/README.md +++ b/examples/research_projects/realfill/README.md @@ -35,7 +35,7 @@ from accelerate.utils import write_basic_config write_basic_config() ``` -When running `accelerate config`, if we specify torch compile mode to True there can be dramatic speedups. +When running `accelerate config`, if we specify torch compile mode to True there can be dramatic speedups. ### Toy example diff --git a/examples/research_projects/sdxl_flax/README.md b/examples/research_projects/sdxl_flax/README.md index 612fdf1edd43..252b7b596ec0 100644 --- a/examples/research_projects/sdxl_flax/README.md +++ b/examples/research_projects/sdxl_flax/README.md @@ -72,8 +72,8 @@ params = jax.tree_util.tree_map(lambda x: x.astype(jnp.bfloat16), params) params["scheduler"] = scheduler_state ``` This section adjusts the data types of the model parameters. -We convert all parameters to `bfloat16` to speed-up the computation with model weights. -**Note** that the scheduler parameters are **not** converted to `blfoat16` as the loss +We convert all parameters to `bfloat16` to speed-up the computation with model weights. +**Note** that the scheduler parameters are **not** converted to `blfoat16` as the loss in precision is degrading the pipeline's performance too significantly. **3. Define Inputs to Pipeline** @@ -146,12 +146,12 @@ For this we will be using a JAX feature called [Ahead of Time](https://jax.readt In [sdxl_single_aot.py](./sdxl_single_aot.py) we give a simple example of how to write our own parallelization logic for text-to-image generation pipeline in JAX using [StabilityAI's Stable Diffusion XL](stabilityai/stable-diffusion-xl-base-1.0) -We add a `aot_compile` function that compiles the `pipeline._generate` function +We add a `aot_compile` function that compiles the `pipeline._generate` function telling JAX which input arguments are static, that is, arguments that -are known at compile time and won't change. In our case, it is num_inference_steps, +are known at compile time and won't change. In our case, it is num_inference_steps, height, width and return_latents. -Once the function is compiled, these parameters are omitted from future calls and +Once the function is compiled, these parameters are omitted from future calls and cannot be changed without modifying the code and recompiling. ```python @@ -205,9 +205,9 @@ def generate( g = jnp.array([guidance_scale] * prompt_ids.shape[0], dtype=jnp.float32) g = g[:, None] images = p_generate( - prompt_ids, - p_params, - rng, + prompt_ids, + p_params, + rng, g, None, neg_prompt_ids) @@ -220,7 +220,7 @@ def generate( The first forward pass after AOT compilation still takes a while longer than subsequent passes, this is because on the first pass, JAX uses Python dispatch, which Fills the C++ dispatch cache. -When using jit, this extra step is done automatically, but when using AOT compilation, +When using jit, this extra step is done automatically, but when using AOT compilation, it doesn't happen until the function call is made. ```python diff --git a/examples/t2i_adapter/README_sdxl.md b/examples/t2i_adapter/README_sdxl.md index d583341c367f..1e5a19fedad1 100644 --- a/examples/t2i_adapter/README_sdxl.md +++ b/examples/t2i_adapter/README_sdxl.md @@ -42,7 +42,7 @@ from accelerate.utils import write_basic_config write_basic_config() ``` -When running `accelerate config`, if we specify torch compile mode to True there can be dramatic speedups. +When running `accelerate config`, if we specify torch compile mode to True there can be dramatic speedups. ## Circle filling dataset @@ -85,7 +85,7 @@ accelerate launch train_t2i_adapter_sdxl.py \ To better track our training experiments, we're using the following flags in the command above: * `report_to="wandb` will ensure the training runs are tracked on Weights and Biases. To use it, be sure to install `wandb` with `pip install wandb`. -* `validation_image`, `validation_prompt`, and `validation_steps` to allow the script to do a few validation inference runs. This allows us to qualitatively check if the training is progressing as expected. +* `validation_image`, `validation_prompt`, and `validation_steps` to allow the script to do a few validation inference runs. This allows us to qualitatively check if the training is progressing as expected. Our experiments were conducted on a single 40GB A100 GPU. diff --git a/examples/text_to_image/README.md b/examples/text_to_image/README.md index e2cbaca2a9d8..94fd63f33067 100644 --- a/examples/text_to_image/README.md +++ b/examples/text_to_image/README.md @@ -36,7 +36,7 @@ Note also that we use PEFT library as backend for LoRA training, make sure to ha ### Pokemon example -You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree. +You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree. You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens). @@ -71,7 +71,7 @@ accelerate launch --mixed_precision="fp16" train_text_to_image.py \ --learning_rate=1e-05 \ --max_grad_norm=1 \ --lr_scheduler="constant" --lr_warmup_steps=0 \ - --output_dir="sd-pokemon-model" + --output_dir="sd-pokemon-model" ``` <!-- accelerate_snippet_end --> @@ -145,11 +145,11 @@ accelerate launch --mixed_precision="fp16" --multi_gpu train_text_to_image.py \ --train_batch_size=1 \ --gradient_accumulation_steps=4 \ --gradient_checkpointing \ - --max_train_steps=15000 \ + --max_train_steps=15000 \ --learning_rate=1e-05 \ --max_grad_norm=1 \ --lr_scheduler="constant" --lr_warmup_steps=0 \ - --output_dir="sd-pokemon-model" + --output_dir="sd-pokemon-model" ``` @@ -157,7 +157,7 @@ accelerate launch --mixed_precision="fp16" --multi_gpu train_text_to_image.py \ We support training with the Min-SNR weighting strategy proposed in [Efficient Diffusion Training via Min-SNR Weighting Strategy](https://arxiv.org/abs/2303.09556) which helps to achieve faster convergence by rebalancing the loss. In order to use it, one needs to set the `--snr_gamma` argument. The recommended -value when using it is 5.0. +value when using it is 5.0. You can find [this project on Weights and Biases](https://wandb.ai/sayakpaul/text2image-finetune-minsnr) that compares the loss surfaces of the following setups: @@ -167,7 +167,7 @@ You can find [this project on Weights and Biases](https://wandb.ai/sayakpaul/tex For our small Pokemons dataset, the effects of Min-SNR weighting strategy might not appear to be pronounced, but for larger datasets, we believe the effects will be more pronounced. -Also, note that in this example, we either predict `epsilon` (i.e., the noise) or the `v_prediction`. For both of these cases, the formulation of the Min-SNR weighting strategy that we have used holds. +Also, note that in this example, we either predict `epsilon` (i.e., the noise) or the `v_prediction`. For both of these cases, the formulation of the Min-SNR weighting strategy that we have used holds. ## Training with LoRA @@ -186,7 +186,7 @@ on consumer GPUs like Tesla T4, Tesla V100. ### Training -First, you need to set up your development environment as is explained in the [installation section](#installing-the-dependencies). Make sure to set the `MODEL_NAME` and `DATASET_NAME` environment variables. Here, we will use [Stable Diffusion v1-4](https://hf.co/CompVis/stable-diffusion-v1-4) and the [Pokemons dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions). +First, you need to set up your development environment as is explained in the [installation section](#installing-the-dependencies). Make sure to set the `MODEL_NAME` and `DATASET_NAME` environment variables. Here, we will use [Stable Diffusion v1-4](https://hf.co/CompVis/stable-diffusion-v1-4) and the [Pokemons dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions). **___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___** @@ -197,7 +197,7 @@ export MODEL_NAME="CompVis/stable-diffusion-v1-4" export DATASET_NAME="lambdalabs/pokemon-blip-captions" ``` -For this example we want to directly store the trained LoRA embeddings on the Hub, so +For this example we want to directly store the trained LoRA embeddings on the Hub, so we need to be logged in and add the `--push_to_hub` flag. ```bash @@ -225,11 +225,11 @@ The above command will also run inference as fine-tuning progresses and log the The final LoRA embedding weights have been uploaded to [sayakpaul/sd-model-finetuned-lora-t4](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4). **___Note: [The final weights](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4/blob/main/pytorch_lora_weights.bin) are only 3 MB in size, which is orders of magnitudes smaller than the original model.___** -You can check some inference samples that were logged during the course of the fine-tuning process [here](https://wandb.ai/sayakpaul/text2image-fine-tune/runs/q4lc0xsw). +You can check some inference samples that were logged during the course of the fine-tuning process [here](https://wandb.ai/sayakpaul/text2image-fine-tune/runs/q4lc0xsw). ### Inference -Once you have trained a model using above command, the inference can be done simply using the `StableDiffusionPipeline` after loading the trained LoRA weights. You +Once you have trained a model using above command, the inference can be done simply using the `StableDiffusionPipeline` after loading the trained LoRA weights. You need to pass the `output_dir` for loading the LoRA weights which, in this case, is `sd-pokemon-model-lora`. ```python @@ -248,9 +248,9 @@ image.save("pokemon.png") If you are loading the LoRA parameters from the Hub and if the Hub repository has a `base_model` tag (such as [this](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4/blob/main/README.md?code=true#L4)), then -you can do: +you can do: -```py +```py from huggingface_hub.repocard import RepoCard lora_model_id = "sayakpaul/sd-model-finetuned-lora-t4" @@ -287,7 +287,7 @@ python train_text_to_image_flax.py \ --max_train_steps=15000 \ --learning_rate=1e-05 \ --max_grad_norm=1 \ - --output_dir="sd-pokemon-model" + --output_dir="sd-pokemon-model" ``` To run on your own training files prepare the dataset according to the format required by `datasets`, you can find the instructions for how to do that in this [document](https://huggingface.co/docs/datasets/v2.4.0/en/image_load#imagefolder-with-metadata). @@ -321,5 +321,5 @@ According to [this issue](https://github.com/huggingface/diffusers/issues/2234#i ## Stable Diffusion XL -* We support fine-tuning the UNet shipped in [Stable Diffusion XL](https://huggingface.co/papers/2307.01952) via the `train_text_to_image_sdxl.py` script. Please refer to the docs [here](./README_sdxl.md). -* We also support fine-tuning of the UNet and Text Encoder shipped in [Stable Diffusion XL](https://huggingface.co/papers/2307.01952) with LoRA via the `train_text_to_image_lora_sdxl.py` script. Please refer to the docs [here](./README_sdxl.md). +* We support fine-tuning the UNet shipped in [Stable Diffusion XL](https://huggingface.co/papers/2307.01952) via the `train_text_to_image_sdxl.py` script. Please refer to the docs [here](./README_sdxl.md). +* We also support fine-tuning of the UNet and Text Encoder shipped in [Stable Diffusion XL](https://huggingface.co/papers/2307.01952) with LoRA via the `train_text_to_image_lora_sdxl.py` script. Please refer to the docs [here](./README_sdxl.md). diff --git a/examples/textual_inversion/README.md b/examples/textual_inversion/README.md index 0a1d8a459fc6..0a2723f0982f 100644 --- a/examples/textual_inversion/README.md +++ b/examples/textual_inversion/README.md @@ -3,9 +3,9 @@ [Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like stable diffusion on your own images using just 3-5 examples. The `textual_inversion.py` script shows how to implement the training procedure and adapt it for stable diffusion. -## Running on Colab +## Running on Colab -Colab for training +Colab for training [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb) Colab for inference @@ -84,11 +84,11 @@ accelerate launch textual_inversion.py \ A full training run takes ~1 hour on one V100 GPU. -**Note**: As described in [the official paper](https://arxiv.org/abs/2208.01618) +**Note**: As described in [the official paper](https://arxiv.org/abs/2208.01618) only one embedding vector is used for the placeholder token, *e.g.* `"<cat-toy>"`. -However, one can also add multiple embedding vectors for the placeholder token -to increase the number of fine-tuneable parameters. This can help the model to learn -more complex details. To use multiple embedding vectors, you should define `--num_vectors` +However, one can also add multiple embedding vectors for the placeholder token +to increase the number of fine-tuneable parameters. This can help the model to learn +more complex details. To use multiple embedding vectors, you should define `--num_vectors` to a number larger than one, *e.g.*: ```bash --num_vectors 5 diff --git a/examples/unconditional_image_generation/README.md b/examples/unconditional_image_generation/README.md index d83dc928c7a1..2990b3abf3f5 100644 --- a/examples/unconditional_image_generation/README.md +++ b/examples/unconditional_image_generation/README.md @@ -27,7 +27,7 @@ And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) e accelerate config ``` -### Unconditional Flowers +### Unconditional Flowers The command to train a DDPM UNet model on the Oxford Flowers dataset: @@ -52,7 +52,7 @@ A full training run takes 2 hours on 4xV100 GPUs. <img src="https://user-images.githubusercontent.com/26864830/180248660-a0b143d0-b89a-42c5-8656-2ebf6ece7e52.png" width="700" /> -### Unconditional Pokemon +### Unconditional Pokemon The command to train a DDPM UNet model on the Pokemon dataset: @@ -96,7 +96,7 @@ accelerate launch --mixed_precision="fp16" --multi_gpu train_unconditional.py \ --logger="wandb" ``` -To be able to use Weights and Biases (`wandb`) as a logger you need to install the library: `pip install wandb`. +To be able to use Weights and Biases (`wandb`) as a logger you need to install the library: `pip install wandb`. ### Using your own data diff --git a/examples/wuerstchen/text_to_image/README.md b/examples/wuerstchen/text_to_image/README.md index 5378e3ef5253..8b2040b4ca7f 100644 --- a/examples/wuerstchen/text_to_image/README.md +++ b/examples/wuerstchen/text_to_image/README.md @@ -72,7 +72,7 @@ In a nutshell, LoRA allows adapting pretrained models by adding pairs of rank-de ### Prior Training -First, you need to set up your development environment as explained in the [installation](#Running-locally-with-PyTorch) section. Make sure to set the `DATASET_NAME` environment variable. Here, we will use the [Pokemon captions dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions). +First, you need to set up your development environment as explained in the [installation](#Running-locally-with-PyTorch) section. Make sure to set the `DATASET_NAME` environment variable. Here, we will use the [Pokemon captions dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions). ```bash export DATASET_NAME="lambdalabs/pokemon-blip-captions" diff --git a/src/diffusers/pipelines/README.md b/src/diffusers/pipelines/README.md index 7562040596e9..d5125ae5caf2 100644 --- a/src/diffusers/pipelines/README.md +++ b/src/diffusers/pipelines/README.md @@ -1,33 +1,33 @@ # 🧨 Diffusers Pipelines Pipelines provide a simple way to run state-of-the-art diffusion models in inference. -Most diffusion systems consist of multiple independently-trained models and highly adaptable scheduler +Most diffusion systems consist of multiple independently-trained models and highly adaptable scheduler components - all of which are needed to have a functioning end-to-end diffusion system. As an example, [Stable Diffusion](https://huggingface.co/blog/stable_diffusion) has three independently trained models: - [Autoencoder](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/models/vae.py#L392) - [Conditional Unet](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/models/unet_2d_condition.py#L12) - [CLIP text encoder](https://huggingface.co/docs/transformers/main/en/model_doc/clip#transformers.CLIPTextModel) -- a scheduler component, [scheduler](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_pndm.py), +- a scheduler component, [scheduler](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_pndm.py), - a [CLIPImageProcessor](https://huggingface.co/docs/transformers/main/en/model_doc/clip#transformers.CLIPImageProcessor), - as well as a [safety checker](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py). -All of these components are necessary to run stable diffusion in inference even though they were trained +All of these components are necessary to run stable diffusion in inference even though they were trained or created independently from each other. -To that end, we strive to offer all open-sourced, state-of-the-art diffusion system under a unified API. +To that end, we strive to offer all open-sourced, state-of-the-art diffusion system under a unified API. More specifically, we strive to provide pipelines that - 1. can load the officially published weights and yield 1-to-1 the same outputs as the original implementation according to the corresponding paper (*e.g.* [LDMTextToImagePipeline](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/latent_diffusion), uses the officially released weights of [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752)), -- 2. have a simple user interface to run the model in inference (see the [Pipelines API](#pipelines-api) section), +- 2. have a simple user interface to run the model in inference (see the [Pipelines API](#pipelines-api) section), - 3. are easy to understand with code that is self-explanatory and can be read along-side the official paper (see [Pipelines summary](#pipelines-summary)), - 4. can easily be contributed by the community (see the [Contribution](#contribution) section). -**Note** that pipelines do not (and should not) offer any training functionality. +**Note** that pipelines do not (and should not) offer any training functionality. If you are looking for *official* training examples, please have a look at [examples](https://github.com/huggingface/diffusers/tree/main/examples). ## Pipelines Summary -The following table summarizes all officially supported pipelines, their corresponding paper, and if +The following table summarizes all officially supported pipelines, their corresponding paper, and if available a colab notebook to directly try them out. | Pipeline | Source | Tasks | Colab @@ -35,35 +35,35 @@ available a colab notebook to directly try them out. | [dance diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/dance_diffusion) | [**Dance Diffusion**](https://github.com/Harmonai-org/sample-generator) | *Unconditional Audio Generation* | | [ddpm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddpm) | [**Denoising Diffusion Probabilistic Models**](https://arxiv.org/abs/2006.11239) | *Unconditional Image Generation* | | [ddim](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddim) | [**Denoising Diffusion Implicit Models**](https://arxiv.org/abs/2010.02502) | *Unconditional Image Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) -| [latent_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752) | *Text-to-Image Generation* | -| [latent_diffusion_uncond](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion_uncond) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752) | *Unconditional Image Generation* | -| [pndm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pndm) | [**Pseudo Numerical Methods for Diffusion Models on Manifolds**](https://arxiv.org/abs/2202.09778) | *Unconditional Image Generation* | -| [score_sde_ve](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/score_sde_ve) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | *Unconditional Image Generation* | -| [score_sde_vp](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/score_sde_vp) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | *Unconditional Image Generation* | +| [latent_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752) | *Text-to-Image Generation* | +| [latent_diffusion_uncond](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion_uncond) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752) | *Unconditional Image Generation* | +| [pndm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pndm) | [**Pseudo Numerical Methods for Diffusion Models on Manifolds**](https://arxiv.org/abs/2202.09778) | *Unconditional Image Generation* | +| [score_sde_ve](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/score_sde_ve) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | *Unconditional Image Generation* | +| [score_sde_vp](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/score_sde_vp) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | *Unconditional Image Generation* | | [stable_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | *Text-to-Image Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb) | [stable_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | *Image-to-Image Text-Guided Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb) | [stable_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | *Text-Guided Image Inpainting* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb) -| [stochastic_karras_ve](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stochastic_karras_ve) | [**Elucidating the Design Space of Diffusion-Based Generative Models**](https://arxiv.org/abs/2206.00364) | *Unconditional Image Generation* | +| [stochastic_karras_ve](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stochastic_karras_ve) | [**Elucidating the Design Space of Diffusion-Based Generative Models**](https://arxiv.org/abs/2206.00364) | *Unconditional Image Generation* | -**Note**: Pipelines are simple examples of how to play around with the diffusion systems as described in the corresponding papers. +**Note**: Pipelines are simple examples of how to play around with the diffusion systems as described in the corresponding papers. However, most of them can be adapted to use different scheduler components or even different model components. Some pipeline examples are shown in the [Examples](#examples) below. ## Pipelines API -Diffusion models often consist of multiple independently-trained models or other previously existing components. +Diffusion models often consist of multiple independently-trained models or other previously existing components. -Each model has been trained independently on a different task and the scheduler can easily be swapped out and replaced with a different one. +Each model has been trained independently on a different task and the scheduler can easily be swapped out and replaced with a different one. During inference, we however want to be able to easily load all components and use them in inference - even if one component, *e.g.* CLIP's text encoder, originates from a different library, such as [Transformers](https://github.com/huggingface/transformers). To that end, all pipelines provide the following functionality: - [`from_pretrained` method](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/pipeline_utils.py#L139) that accepts a Hugging Face Hub repository id, *e.g.* [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) or a path to a local directory, *e.g.* -"./stable-diffusion". To correctly retrieve which models and components should be loaded, one has to provide a `model_index.json` file, *e.g.* [runwayml/stable-diffusion-v1-5/model_index.json](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json), which defines all components that should be +"./stable-diffusion". To correctly retrieve which models and components should be loaded, one has to provide a `model_index.json` file, *e.g.* [runwayml/stable-diffusion-v1-5/model_index.json](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json), which defines all components that should be loaded into the pipelines. More specifically, for each model/component one needs to define the format `<name>: ["<library>", "<class name>"]`. `<name>` is the attribute name given to the loaded instance of `<class name>` which can be found in the library or pipeline folder called `"<library>"`. -- [`save_pretrained`](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/pipeline_utils.py#L90) that accepts a local path, *e.g.* `./stable-diffusion` under which all models/components of the pipeline will be saved. For each component/model a folder is created inside the local path that is named after the given attribute name, *e.g.* `./stable_diffusion/unet`. -In addition, a `model_index.json` file is created at the root of the local path, *e.g.* `./stable_diffusion/model_index.json` so that the complete pipeline can again be instantiated +- [`save_pretrained`](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/pipeline_utils.py#L90) that accepts a local path, *e.g.* `./stable-diffusion` under which all models/components of the pipeline will be saved. For each component/model a folder is created inside the local path that is named after the given attribute name, *e.g.* `./stable_diffusion/unet`. +In addition, a `model_index.json` file is created at the root of the local path, *e.g.* `./stable_diffusion/model_index.json` so that the complete pipeline can again be instantiated from the local path. - [`to`](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/pipeline_utils.py#L118) which accepts a `string` or `torch.device` to move all models that are of type `torch.nn.Module` to the passed device. The behavior is fully analogous to [PyTorch's `to` method](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.to). -- [`__call__`] method to use the pipeline in inference. `__call__` defines inference logic of the pipeline and should ideally encompass all aspects of it, from pre-processing to forwarding tensors to the different models and schedulers, as well as post-processing. The API of the `__call__` method can strongly vary from pipeline to pipeline. *E.g.* a text-to-image pipeline, such as [`StableDiffusionPipeline`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py) should accept among other things the text prompt to generate the image. A pure image generation pipeline, such as [DDPMPipeline](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/ddpm) on the other hand can be run without providing any inputs. To better understand what inputs can be adapted for +- [`__call__`] method to use the pipeline in inference. `__call__` defines inference logic of the pipeline and should ideally encompass all aspects of it, from pre-processing to forwarding tensors to the different models and schedulers, as well as post-processing. The API of the `__call__` method can strongly vary from pipeline to pipeline. *E.g.* a text-to-image pipeline, such as [`StableDiffusionPipeline`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py) should accept among other things the text prompt to generate the image. A pure image generation pipeline, such as [DDPMPipeline](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/ddpm) on the other hand can be run without providing any inputs. To better understand what inputs can be adapted for each pipeline, one should look directly into the respective pipeline. **Note**: All pipelines have PyTorch's autograd disabled by decorating the `__call__` method with a [`torch.no_grad`](https://pytorch.org/docs/stable/generated/torch.no_grad.html) decorator because pipelines should @@ -71,12 +71,12 @@ not be used for training. If you want to store the gradients during the forward ## Contribution -We are more than happy about any contribution to the officially supported pipelines 🤗. We aspire +We are more than happy about any contribution to the officially supported pipelines 🤗. We aspire all of our pipelines to be **self-contained**, **easy-to-tweak**, **beginner-friendly** and for **one-purpose-only**. -- **Self-contained**: A pipeline shall be as self-contained as possible. More specifically, this means that all functionality should be either directly defined in the pipeline file itself, should be inherited from (and only from) the [`DiffusionPipeline` class](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/pipeline_utils.py#L56) or be directly attached to the model and scheduler components of the pipeline. -- **Easy-to-use**: Pipelines should be extremely easy to use - one should be able to load the pipeline and -use it for its designated task, *e.g.* text-to-image generation, in just a couple of lines of code. Most +- **Self-contained**: A pipeline shall be as self-contained as possible. More specifically, this means that all functionality should be either directly defined in the pipeline file itself, should be inherited from (and only from) the [`DiffusionPipeline` class](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/pipeline_utils.py#L56) or be directly attached to the model and scheduler components of the pipeline. +- **Easy-to-use**: Pipelines should be extremely easy to use - one should be able to load the pipeline and +use it for its designated task, *e.g.* text-to-image generation, in just a couple of lines of code. Most logic including pre-processing, an unrolled diffusion loop, and post-processing should all happen inside the `__call__` method. - **Easy-to-tweak**: Certain pipelines will not be able to handle all use cases and tasks that you might like them to. If you want to use a certain pipeline for a specific use case that is not yet supported, you might have to copy the pipeline file and tweak the code to your needs. We try to make the pipeline code as readable as possible so that each part –from pre-processing to diffusing to post-processing– can easily be adapted. If you would like the community to benefit from your customized pipeline, we would love to see a contribution to our [community-examples](https://github.com/huggingface/diffusers/tree/main/examples/community). If you feel that an important pipeline should be part of the official pipelines but isn't, a contribution to the [official pipelines](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines) would be even better. - **One-purpose-only**: Pipelines should be used for one task and one task only. Even if two tasks are very similar from a modeling point of view, *e.g.* image2image translation and in-painting, pipelines shall be used for one task only to keep them *easy-to-tweak* and *readable*. @@ -93,8 +93,8 @@ pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") pipe = pipe.to("cuda") prompt = "a photo of an astronaut riding a horse on mars" -image = pipe(prompt).images[0] - +image = pipe(prompt).images[0] + image.save("astronaut_rides_horse.png") ``` diff --git a/src/diffusers/pipelines/spectrogram_diffusion/continous_encoder.py b/src/diffusers/pipelines/spectrogram_diffusion/continuous_encoder.py similarity index 100% rename from src/diffusers/pipelines/spectrogram_diffusion/continous_encoder.py rename to src/diffusers/pipelines/spectrogram_diffusion/continuous_encoder.py diff --git a/src/diffusers/pipelines/spectrogram_diffusion/pipeline_spectrogram_diffusion.py b/src/diffusers/pipelines/spectrogram_diffusion/pipeline_spectrogram_diffusion.py index 93af3b1189d0..88725af452c2 100644 --- a/src/diffusers/pipelines/spectrogram_diffusion/pipeline_spectrogram_diffusion.py +++ b/src/diffusers/pipelines/spectrogram_diffusion/pipeline_spectrogram_diffusion.py @@ -29,7 +29,7 @@ from ..onnx_utils import OnnxRuntimeModel from ..pipeline_utils import AudioPipelineOutput, DiffusionPipeline -from .continous_encoder import SpectrogramContEncoder +from .continuous_encoder import SpectrogramContEncoder from .notes_encoder import SpectrogramNotesEncoder diff --git a/src/diffusers/pipelines/stable_diffusion/README.md b/src/diffusers/pipelines/stable_diffusion/README.md index 66df9a811afb..5b6424308f02 100644 --- a/src/diffusers/pipelines/stable_diffusion/README.md +++ b/src/diffusers/pipelines/stable_diffusion/README.md @@ -6,13 +6,13 @@ Stable Diffusion was proposed in [Stable Diffusion Announcement](https://stabili The summary of the model is the following: -*Stable Diffusion is a text-to-image model that will empower billions of people to create stunning art within seconds. It is a breakthrough in speed and quality meaning that it can run on consumer GPUs. You can see some of the amazing output that has been created by this model without pre or post-processing on this page. The model itself builds upon the work of the team at CompVis and Runway in their widely used latent diffusion model combined with insights from the conditional diffusion models by our lead generative AI developer Katherine Crowson, Dall-E 2 by Open AI, Imagen by Google Brain and many others. We are delighted that AI media generation is a cooperative field and hope it can continue this way to bring the gift of creativity to all.* +*Stable Diffusion is a text-to-image model that will empower billions of people to create stunning art within seconds. It is a breakthrough in speed and quality meaning that it can run on consumer GPUs. You can see some of the amazing output that has been created by this model without pre or post-processing on this page. The model itself builds upon the work of the team at CompVis and Runway in their widely used latent diffusion model combined with insights from the conditional diffusion models by our lead generative AI developer Katherine Crowson, Dall-E 2 by Open AI, Imagen by Google Brain and many others. We are delighted that AI media generation is a cooperative field and hope it can continue this way to bring the gift of creativity to all.* ## Tips: - Stable Diffusion has the same architecture as [Latent Diffusion](https://arxiv.org/abs/2112.10752) but uses a frozen CLIP Text Encoder instead of training the text encoder jointly with the diffusion model. - An in-detail explanation of the Stable Diffusion model can be found under [Stable Diffusion with 🧨 Diffusers](https://huggingface.co/blog/stable_diffusion). -- If you don't want to rely on the Hugging Face Hub and having to pass a authentication token, you can +- If you don't want to rely on the Hugging Face Hub and having to pass a authentication token, you can download the weights with `git lfs install; git clone https://huggingface.co/runwayml/stable-diffusion-v1-5` and instead pass the local path to the cloned folder to `from_pretrained` as shown below. - Stable Diffusion can work with a variety of different samplers as is shown below. @@ -28,7 +28,7 @@ download the weights with `git lfs install; git clone https://huggingface.co/run ### Using Stable Diffusion without being logged into the Hub. -If you want to download the model weights using a single Python line, you need to be logged in via `huggingface-cli login`. +If you want to download the model weights using a single Python line, you need to be logged in via `huggingface-cli login`. ```python from diffusers import DiffusionPipeline @@ -61,8 +61,8 @@ pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") pipe = pipe.to("cuda") prompt = "a photo of an astronaut riding a horse on mars" -image = pipe(prompt).images[0] - +image = pipe(prompt).images[0] + image.save("astronaut_rides_horse.png") ``` @@ -75,13 +75,13 @@ from diffusers import StableDiffusionPipeline, DDIMScheduler scheduler = DDIMScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler") pipe = StableDiffusionPipeline.from_pretrained( - "runwayml/stable-diffusion-v1-5", + "runwayml/stable-diffusion-v1-5", scheduler=scheduler, ).to("cuda") prompt = "a photo of an astronaut riding a horse on mars" -image = pipe(prompt).images[0] - +image = pipe(prompt).images[0] + image.save("astronaut_rides_horse.png") ``` @@ -94,13 +94,13 @@ from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler lms = LMSDiscreteScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler") pipe = StableDiffusionPipeline.from_pretrained( - "runwayml/stable-diffusion-v1-5", + "runwayml/stable-diffusion-v1-5", scheduler=lms, ).to("cuda") prompt = "a photo of an astronaut riding a horse on mars" -image = pipe(prompt).images[0] - +image = pipe(prompt).images[0] + image.save("astronaut_rides_horse.png") ```