From f0ca3ed66a23d440e3ddcb799fe0458babceb76c Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Mon, 20 May 2024 09:49:16 -0700 Subject: [PATCH 1/4] noise schedule --- docs/source/en/_toctree.yml | 2 + .../en/using-diffusers/scheduler_features.md | 73 +++++++++++++++++++ docs/source/en/using-diffusers/schedulers.md | 56 -------------- 3 files changed, 75 insertions(+), 56 deletions(-) create mode 100644 docs/source/en/using-diffusers/scheduler_features.md diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml index 2f4651ba3417..34c86503d42b 100644 --- a/docs/source/en/_toctree.yml +++ b/docs/source/en/_toctree.yml @@ -59,6 +59,8 @@ title: Distributed inference with multiple GPUs - local: using-diffusers/merge_loras title: Merge LoRAs + - local: using-diffusers/scheduler_features + title: Scheduler features - local: using-diffusers/callback title: Pipeline callbacks - local: using-diffusers/reusing_seeds diff --git a/docs/source/en/using-diffusers/scheduler_features.md b/docs/source/en/using-diffusers/scheduler_features.md new file mode 100644 index 000000000000..8bbe3d1fad14 --- /dev/null +++ b/docs/source/en/using-diffusers/scheduler_features.md @@ -0,0 +1,73 @@ + + +# Scheduler features + +The scheduler is an important component of any diffusion model because it controls the entire denoising (or sampling) process. There are many types of schedulers, some are optimized for speed and some for quality. With Diffusers, you can modify the scheduler configuration to use custom noise schedules, sigmas, and rescale the noise schedule. Changing these parameters can have profound effects on inference quality and speed. + +This guide will demonstrate how to use these features to improve inference quality. + +> [!TIP] +> Diffusers currently only supports the `timesteps` and `sigmas` parameters for a select list of schedulers and pipelines. Feel free to open a [feature request](https://github.com/huggingface/diffusers/issues/new/choose) if you want to extend these parameters to a scheduler and pipeline that does not currently support it! + +## Timestep schedules + +The timestep or noise schedule determines the amount of noise at each sampling step. The scheduler uses this to generate an image with the corresponding amount of noise at each step. The timestep schedule is generated from the scheduler's default configuration, but you can customize the scheduler to use new and optimized sampling schedules that aren't in Diffusers yet. + +For example, [Align Your Steps (AYS)](https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/) is a method for optimizing a sampling schedule to generate a high-quality image in as little as 10 steps. This optimal schedule for 10 steps was calculated to be: + +```py +from diffusers.schedulers import AysSchedules + +sampling_schedule = AysSchedules["StableDiffusionXLTimesteps"] +print(sampling_schedule) +"[999, 845, 730, 587, 443, 310, 193, 116, 53, 13]" +``` + +You can use the AYS sampling schedule in a pipeline by passing it to the `timesteps` parameter. 
+ +```py +pipeline = StableDiffusionXLPipeline.from_pretrained( + "SG161222/RealVisXL_V4.0", + torch_dtype=torch.float16, + variant="fp16", +).to("cuda") +pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, algorithm_type="sde-dpmsolver++") + +prompt = "A cinematic shot of a cute little rabbit wearing a jacket and doing a thumbs up" +generator = torch.Generator(device="cpu").manual_seed(2487854446) +image = pipeline( + prompt=prompt, + negative_prompt="", + generator=generator, + timesteps=sampling_schedule, +).images[0] +``` + +
+
+ +
[Image comparison: AYS timestep schedule 10 steps | Linearly-spaced timestep schedule 10 steps | Linearly-spaced timestep schedule 25 steps]
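To see how the AYS schedule differs from a default schedule with the same step budget, you can build both and print them. The snippet below is a minimal sketch that assumes the `pipeline` and `sampling_schedule` objects from the example above.

```py
# Compare the scheduler's default 10-step schedule with the AYS schedule.
# Assumes `pipeline` and `sampling_schedule` are defined as in the example above.
from diffusers import DPMSolverMultistepScheduler

scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
scheduler.set_timesteps(num_inference_steps=10)
print(scheduler.timesteps)  # default spacing for 10 steps
print(sampling_schedule)    # AYS: [999, 845, 730, 587, 443, 310, 193, 116, 53, 13]
```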
+ +## Sigmas + +## Rescale noise schedule diff --git a/docs/source/en/using-diffusers/schedulers.md b/docs/source/en/using-diffusers/schedulers.md index bfc8aa1a2108..01dab2bed7fe 100644 --- a/docs/source/en/using-diffusers/schedulers.md +++ b/docs/source/en/using-diffusers/schedulers.md @@ -212,62 +212,6 @@ images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True). images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:]))) ``` -## Custom Timestep Schedules - -With all our schedulers, you can choose one of the popular timestep schedules using configurations such as `timestep_spacing`, `interpolation_type`, and `use_karras_sigmas`. Some schedulers also provide the flexibility to use a custom timestep schedule. You can use any list of arbitrary timesteps, we will use the AYS timestep schedule here as example. It is a set of 10-step optimized timestep schedules released by researchers from Nvidia that can achieve significantly better quality compared to the preset timestep schedules. You can read more about their research [here](https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/). - -```python -from diffusers.schedulers import AysSchedules -sampling_schedule = AysSchedules["StableDiffusionXLTimesteps"] -print(sampling_schedule) -``` -``` -[999, 845, 730, 587, 443, 310, 193, 116, 53, 13] -``` - -You can then create a pipeline and pass this custom timestep schedule to it as `timesteps`. - -```python -pipe = StableDiffusionXLPipeline.from_pretrained( - "SG161222/RealVisXL_V4.0", - torch_dtype=torch.float16, - variant="fp16", -).to("cuda") - -pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, algorithm_type="sde-dpmsolver++") - -prompt = "A cinematic shot of a cute little rabbit wearing a jacket and doing a thumbs up" - -generator = torch.Generator(device="cpu").manual_seed(2487854446) - -image = pipe( - prompt=prompt, - negative_prompt="", - generator=generator, - timesteps=sampling_schedule, -).images[0] -``` -The generated image has better quality than the default linear timestep schedule for the same number of steps, and it is similar to the default timestep scheduler when running for 25 steps. - -
-
- -
-[Image comparison: AYS timestep schedule 10 steps | Linearly-spaced timestep schedule 10 steps | Linearly-spaced timestep schedule 25 steps]
- -> [!TIP] -> 🤗 Diffusers currently only supports `timesteps` and `sigmas` for a selected list of schedulers and pipelines, but feel free to open a [feature request](https://github.com/huggingface/diffusers/issues/new/choose) if you want to extend feature to a scheduler and pipeline that does not currently support it! - - ## Models Models are loaded from the [`ModelMixin.from_pretrained`] method, which downloads and caches the latest version of the model weights and configurations. If the latest files are available in the local cache, [`~ModelMixin.from_pretrained`] reuses files in the cache instead of re-downloading them. From bb6f7ab3f089c500e706a60c87fd2b428b2aa96f Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Mon, 20 May 2024 14:48:59 -0700 Subject: [PATCH 2/4] sigmas and zero snr --- .../en/using-diffusers/image_quality.md | 46 +------ .../en/using-diffusers/scheduler_features.md | 121 +++++++++++++++++- 2 files changed, 120 insertions(+), 47 deletions(-) diff --git a/docs/source/en/using-diffusers/image_quality.md b/docs/source/en/using-diffusers/image_quality.md index 8961f88b904d..c25fa1467edf 100644 --- a/docs/source/en/using-diffusers/image_quality.md +++ b/docs/source/en/using-diffusers/image_quality.md @@ -12,54 +12,10 @@ specific language governing permissions and limitations under the License. # Controlling image quality -The components of a diffusion model, like the UNet and scheduler, can be optimized to improve the quality of generated images leading to better image lighting and details. These techniques are especially useful if you don't have the resources to simply use a larger model for inference. You can enable these techniques during inference without any additional training. +The components of a diffusion model, like the UNet and scheduler, can be optimized to improve the quality of generated images leading to better details. These techniques are especially useful if you don't have the resources to simply use a larger model for inference. You can enable these techniques during inference without any additional training. This guide will show you how to turn these techniques on in your pipeline and how to configure them to improve the quality of your generated images. -## Lighting - -The Stable Diffusion models aren't very good at generating images that are very bright or dark because the scheduler doesn't start sampling from the last timestep and it doesn't enforce a zero signal-to-noise ratio (SNR). The [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://hf.co/papers/2305.08891) paper fixes these issues which are now available in some Diffusers schedulers. - -> [!TIP] -> For inference, you need a model that has been trained with *v_prediction*. To train your own model with *v_prediction*, add the following flag to the [train_text_to_image.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py) or [train_text_to_image_lora.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py) scripts. -> -> ```bash -> --prediction_type="v_prediction" -> ``` - -For example, load the [ptx0/pseudo-journey-v2](https://hf.co/ptx0/pseudo-journey-v2) checkpoint which was trained with `v_prediction` and the [`DDIMScheduler`]. Now you should configure the following parameters in the [`DDIMScheduler`]. 
- -* `rescale_betas_zero_snr=True` to rescale the noise schedule to zero SNR -* `timestep_spacing="trailing"` to start sampling from the last timestep - -Set `guidance_rescale` in the pipeline to prevent over-exposure. A lower value increases brightness but some of the details may appear washed out. - -```py -from diffusers import DiffusionPipeline, DDIMScheduler - -pipeline = DiffusionPipeline.from_pretrained("ptx0/pseudo-journey-v2", use_safetensors=True) - -pipeline.scheduler = DDIMScheduler.from_config( - pipeline.scheduler.config, rescale_betas_zero_snr=True, timestep_spacing="trailing" -) -pipeline.to("cuda") -prompt = "cinematic photo of a snowy mountain at night with the northern lights aurora borealis overhead, 35mm photograph, film, professional, 4k, highly detailed" -generator = torch.Generator(device="cpu").manual_seed(23) -image = pipeline(prompt, guidance_rescale=0.7, generator=generator).images[0] -image -``` - -
-
- -
-[Image comparison: default Stable Diffusion v2-1 image | image with zero SNR and trailing timestep spacing enabled]
- ## Details [FreeU](https://hf.co/papers/2309.11497) improves image details by rebalancing the UNet's backbone and skip connection weights. The skip connections can cause the model to overlook some of the backbone semantics which may lead to unnatural image details in the generated image. This technique does not require any additional training and can be applied on the fly during inference for tasks like image-to-image and text-to-video. diff --git a/docs/source/en/using-diffusers/scheduler_features.md b/docs/source/en/using-diffusers/scheduler_features.md index 8bbe3d1fad14..9bdd77c53058 100644 --- a/docs/source/en/using-diffusers/scheduler_features.md +++ b/docs/source/en/using-diffusers/scheduler_features.md @@ -23,7 +23,7 @@ This guide will demonstrate how to use these features to improve inference quali The timestep or noise schedule determines the amount of noise at each sampling step. The scheduler uses this to generate an image with the corresponding amount of noise at each step. The timestep schedule is generated from the scheduler's default configuration, but you can customize the scheduler to use new and optimized sampling schedules that aren't in Diffusers yet. -For example, [Align Your Steps (AYS)](https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/) is a method for optimizing a sampling schedule to generate a high-quality image in as little as 10 steps. This optimal schedule for 10 steps was calculated to be: +For example, [Align Your Steps (AYS)](https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/) is a method for optimizing a sampling schedule to generate a high-quality image in as little as 10 steps. The optimal [10-step schedule](https://github.com/huggingface/diffusers/blob/a7bf77fc284810483f1e60afe34d1d27ad91ce2e/src/diffusers/schedulers/scheduling_utils.py#L51) for Stable Diffusion XL is: ```py from diffusers.schedulers import AysSchedules @@ -41,7 +41,7 @@ pipeline = StableDiffusionXLPipeline.from_pretrained( torch_dtype=torch.float16, variant="fp16", ).to("cuda") -pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, algorithm_type="sde-dpmsolver++") +pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, algorithm_type="sde-dpmsolver++") prompt = "A cinematic shot of a cute little rabbit wearing a jacket and doing a thumbs up" generator = torch.Generator(device="cpu").manual_seed(2487854446) @@ -70,4 +70,121 @@ image = pipeline( ## Sigmas +The `sigmas` parameter is the amount of noise added at each timestep according to the timestep schedule. Like the `timesteps` parameter, you can customize the `sigmas` parameter to control how much noise is added at each step. When you use a custom `sigmas` value, the `timesteps` are calculated from the custom `sigmas` value and the default scheduler configuration is ignored. + +For example, you can manually pass the [sigmas](https://github.com/huggingface/diffusers/blob/6529ee67ec02fcf58d2fd9242164ea002b351d75/src/diffusers/schedulers/scheduling_utils.py#L55) for something like the 10-step AYS schedule from before to the pipeline. 

```py
import torch

from diffusers import DiffusionPipeline, EulerDiscreteScheduler

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipeline = DiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)

sigmas = [14.615, 6.315, 3.771, 2.181, 1.342, 0.862, 0.555, 0.380, 0.234, 0.113, 0.0]
prompt = "anthropomorphic capybara wearing a suit and working with a computer"
generator = torch.Generator(device="cuda").manual_seed(123)
image = pipeline(
    prompt=prompt,
    num_inference_steps=10,
    sigmas=sigmas,
    generator=generator,
).images[0]
```

If you take a look at the scheduler's `timesteps` attribute, you'll see that it is the same as the AYS timestep schedule because the timestep schedule is calculated from the `sigmas`.

```py
print(f"timesteps: {pipeline.scheduler.timesteps}")
"timesteps: tensor([999., 845., 730., 587., 443., 310., 193., 116., 53., 13.], device='cuda:0')"
```

### Karras sigmas

> [!TIP]
> Refer to the scheduler API [overview](../api/schedulers/overview) for a list of schedulers that support Karras sigmas.

Karras schedulers use the timestep schedule and sigmas from the [Elucidating the Design Space of Diffusion-Based Generative Models](https://hf.co/papers/2206.00364) paper. This scheduler variant applies a smaller amount of noise per step as it approaches the end of the sampling process compared to other schedulers, and it can increase the level of detail in the generated image.

Enable Karras sigmas by setting `use_karras_sigmas=True` in the scheduler.

```py
import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, algorithm_type="sde-dpmsolver++", use_karras_sigmas=True)

prompt = "A cinematic shot of a cute little rabbit wearing a jacket and doing a thumbs up"
generator = torch.Generator(device="cpu").manual_seed(2487854446)
image = pipeline(
    prompt=prompt,
    negative_prompt="",
    generator=generator,
).images[0]
```
+
+ +
[Image comparison: Karras sigmas enabled | Karras sigmas disabled]
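To see what the Karras spacing actually changes, you can compare the sigmas a scheduler produces with and without it. The snippet below is a minimal sketch that assumes the `pipeline` object from the example above.

```py
# Inspect how the Karras spacing redistributes the sigmas over 10 steps.
# Assumes `pipeline` is defined as in the example above.
from diffusers import DPMSolverMultistepScheduler

for use_karras in [False, True]:
    scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, use_karras_sigmas=use_karras)
    scheduler.set_timesteps(num_inference_steps=10)
    print(f"use_karras_sigmas={use_karras}: {scheduler.sigmas}")
```

With `use_karras_sigmas=True`, the sigmas cluster more densely toward the end of sampling, which is where the additional detail comes from.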

## Rescale noise schedule

In the [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://hf.co/papers/2305.08891) paper, the authors discovered that common noise schedules allowed some signal to leak into the last timestep. This signal leakage at inference can cause models to only generate images with medium brightness. By enforcing a zero signal-to-noise ratio (SNR) for the timestep schedule and sampling from the last timestep, the model can generate very bright or dark images.

> [!TIP]
> For inference, you need a model that has been trained with *v_prediction*. To train your own model with *v_prediction*, add the following flag to the [train_text_to_image.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py) or [train_text_to_image_lora.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py) scripts.
>
> ```bash
> --prediction_type="v_prediction"
> ```

For example, load the [ptx0/pseudo-journey-v2](https://hf.co/ptx0/pseudo-journey-v2) checkpoint, which was trained with `v_prediction`, and the [`DDIMScheduler`]. Configure the following parameters in the [`DDIMScheduler`]:

* `rescale_betas_zero_snr=True` to rescale the noise schedule to zero SNR
* `timestep_spacing="trailing"` to start sampling from the last timestep

Set `guidance_rescale` in the pipeline to prevent over-exposure. A lower value increases brightness, but some of the details may appear washed out.

```py
import torch
from diffusers import DiffusionPipeline, DDIMScheduler

pipeline = DiffusionPipeline.from_pretrained("ptx0/pseudo-journey-v2", use_safetensors=True)

pipeline.scheduler = DDIMScheduler.from_config(
    pipeline.scheduler.config, rescale_betas_zero_snr=True, timestep_spacing="trailing"
)
pipeline.to("cuda")
prompt = "cinematic photo of a snowy mountain at night with the northern lights aurora borealis overhead, 35mm photograph, film, professional, 4k, highly detailed"
generator = torch.Generator(device="cpu").manual_seed(23)
image = pipeline(prompt, guidance_rescale=0.7, generator=generator).images[0]
image
```
+
+ +
[Image comparison: default Stable Diffusion v2-1 image | image with zero SNR and trailing timestep spacing enabled]
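Because the best `guidance_rescale` value depends on the prompt and seed, it can help to sweep a few values and compare the results. The snippet below is a small sketch that reuses the `pipeline` and `prompt` from the example above; the output filenames are only illustrative.

```py
# Sweep guidance_rescale to trade off brightness range against washed-out details.
# Assumes `pipeline` and `prompt` are defined as in the example above.
import torch

for rescale in [0.0, 0.5, 0.7, 1.0]:
    generator = torch.Generator(device="cpu").manual_seed(23)
    image = pipeline(prompt, guidance_rescale=rescale, generator=generator).images[0]
    image.save(f"guidance_rescale_{rescale}.png")
```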
From c16d56ab061061d64bdfbc5805254fa72888d300 Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Tue, 21 May 2024 12:14:58 -0700 Subject: [PATCH 3/4] feedback --- .../en/using-diffusers/scheduler_features.md | 43 +++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/docs/source/en/using-diffusers/scheduler_features.md b/docs/source/en/using-diffusers/scheduler_features.md index 9bdd77c53058..5828bd575b4a 100644 --- a/docs/source/en/using-diffusers/scheduler_features.md +++ b/docs/source/en/using-diffusers/scheduler_features.md @@ -68,6 +68,49 @@ image = pipeline( +## Timestep spacing + +The way sample steps are selected in the schedule can affect the quality of the generated image, especially with respect to [rescaling the noise schedule](#rescale-noise-schedule), which can enable a model to generate much brighter or darker images. Diffusers provides three timestep spacing methods: + +- `leading` creates evenly spaced steps +- `linspace` includes the first and last steps and evenly selects the remaining intermediate steps +- `trailing` only includes the last step and evenly selects the remaining intermediate steps starting from the end + +It is recommended to use the `trailing` spacing method because it generates higher quality images with more details when there are fewer sample steps. But the difference in quality is not as obvious for more standard sample step values. + +```py +import torch +from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler + +pipeline = StableDiffusionXLPipeline.from_pretrained( + "SG161222/RealVisXL_V4.0", + torch_dtype=torch.float16, + variant="fp16", +).to("cuda") +pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, timestep_spacing="trailing") + +prompt = "A cinematic shot of a cute little black cat sitting on a pumpkin at night" +generator = torch.Generator(device="cpu").manual_seed(2487854446) +image = pipeline( + prompt=prompt, + negative_prompt="", + generator=generator, + num_inference_steps=5, +).images[0] +image +``` + +
+
+ +
[Image comparison: trailing spacing after 5 steps | leading spacing after 5 steps]
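You can inspect exactly which timesteps each spacing method selects by instantiating the scheduler with each option. The snippet below is a minimal sketch that assumes the `pipeline` object from the example above.

```py
# Compare the timesteps each spacing method selects for 5 steps.
# Assumes `pipeline` is defined as in the example above.
from diffusers import DPMSolverMultistepScheduler

for spacing in ["leading", "linspace", "trailing"]:
    scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, timestep_spacing=spacing)
    scheduler.set_timesteps(num_inference_steps=5)
    print(f"{spacing}: {scheduler.timesteps}")
```

Note that `leading` never reaches the noisiest timestep, while `trailing` starts sampling from it, which is part of why `trailing` holds up better at low step counts.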

## Sigmas

The `sigmas` parameter is the amount of noise added at each timestep according to the timestep schedule. Like the `timesteps` parameter, you can customize the `sigmas` parameter to control how much noise is added at each step. When you use a custom `sigmas` value, the `timesteps` are calculated from the custom `sigmas` value and the default scheduler configuration is ignored.

From e635b1f6295cccbac68d6c31a32492170699be8b Mon Sep 17 00:00:00 2001
From: Steven Liu
Date: Tue, 28 May 2024 14:31:09 -0700
Subject: [PATCH 4/4] feedback

---
 docs/source/en/using-diffusers/scheduler_features.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/source/en/using-diffusers/scheduler_features.md b/docs/source/en/using-diffusers/scheduler_features.md
index 5828bd575b4a..445acdccc489 100644
--- a/docs/source/en/using-diffusers/scheduler_features.md
+++ b/docs/source/en/using-diffusers/scheduler_features.md
@@ -152,6 +152,8 @@ print(f"timesteps: {pipeline.scheduler.timesteps}")

> [!TIP]
> Refer to the scheduler API [overview](../api/schedulers/overview) for a list of schedulers that support Karras sigmas.
+>
+> Karras sigmas should not be used for models that weren't trained with them. For example, the base Stable Diffusion XL model shouldn't use Karras sigmas, but the [DreamShaperXL](https://hf.co/Lykon/dreamshaper-xl-1-0) model can since it was trained with Karras sigmas.

Karras schedulers use the timestep schedule and sigmas from the [Elucidating the Design Space of Diffusion-Based Generative Models](https://hf.co/papers/2206.00364) paper. This scheduler variant applies a smaller amount of noise per step as it approaches the end of the sampling process compared to other schedulers, and it can increase the level of detail in the generated image.
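For example, a Karras-trained checkpoint such as DreamShaperXL can be paired with `use_karras_sigmas=True`. The snippet below is a minimal sketch, assuming the checkpoint loads with the standard `StableDiffusionXLPipeline`.

```py
# Enable Karras sigmas with a model that was trained with them (DreamShaperXL).
# Assumes the checkpoint is compatible with StableDiffusionXLPipeline.
import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "Lykon/dreamshaper-xl-1-0",
    torch_dtype=torch.float16,
).to("cuda")
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
    pipeline.scheduler.config, use_karras_sigmas=True
)
```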