Much worse performance from StableDiffusionControlNetInpaintPipeline than sd-webui-controlnet #6101
do you mind posting the image of the mask?
thank you. if you search the github issues you'll find one discussing inpainting in Diffusers vs A1111. there's some postprocessing you have to do, using the mask to actually composite the inpainted area into the original image. i wanted to see the mask so i could be more clear what the end result should be.
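For reference, the compositing step described above can be done with PIL. This is only a rough sketch; `init_image`, `mask_image`, and `result` are stand-ins for the source image, the inpaint mask, and the image returned by the pipeline:

```python
from PIL import Image, ImageFilter

def composite_inpaint(init_image: Image.Image, mask_image: Image.Image,
                      result: Image.Image, blur_radius: int = 4) -> Image.Image:
    # feather the mask edge slightly so the seam between the inpainted
    # region and the untouched pixels is less visible
    mask = mask_image.convert("L").filter(ImageFilter.GaussianBlur(blur_radius))
    # keep the pipeline output inside the (blurred) mask, the original elsewhere
    return Image.composite(result, init_image, mask)
```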
sorry, to clarify: are you saying this is something I can solve myself with some postprocessing of the mask beforehand? I'm not sure I found the right issue you're referencing, do you mean this one? #5808
#4536 might actually be what you need.
Thanks, I will play around with this, but this issue seems different to me: I'm seeing very different inpainting behavior within the mask than I get from A1111, not issues outside the mask. (Although I have actually noticed that in some other projects, so this is good to know.)
well, the DDIM scheduler in Diffusers has some issues (#6068 comes to mind mostly), so you might want to try Euler or even Euler A.
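For reference, switching schedulers in diffusers is a one-line config swap; a minimal sketch, assuming `pipe` is an already-loaded Stable Diffusion pipeline:

```python
from diffusers import EulerDiscreteScheduler, EulerAncestralDiscreteScheduler

# Euler
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
# or Euler Ancestral ("Euler a" in A1111)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
```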
Hi @brandonwsaw. It seems that you used
They are adding mask_blur support, but the inpaint pipeline doesn't work well.
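A1111's Mask blur setting can be approximated on the diffusers side by feathering the mask before passing it to the pipeline. A minimal sketch with PIL; the radius of 4 mirrors the A1111 default used later in this issue:

```python
from PIL import ImageFilter

# approximate A1111's "Mask blur: 4" by blurring the mask before inpainting
blurred_mask = mask_image.convert("L").filter(ImageFilter.GaussianBlur(radius=4))
```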
Hi @brandonwsaw, thanks for the issue! Yeah, I think there are lots of differences in settings; most have been summarized by @bghira and @StandardAI above.
Thanks, interesting to know about mask blur, post-processing, and especially the masked content, but I did play with those and they don't seem responsible. I turned off mask blur and used the 0.999 strength trick in the example above. A1111 also produces a similar result with masked content set to latent noise. I'm not exactly sure what Pixel Perfect is; here's the UI, the default is False:
Interesting...
@brandonwsaw
hi @brandonwsaw

```python
from diffusers.utils import load_image
import numpy as np
import torch
from PIL import Image
from diffusers import EulerAncestralDiscreteScheduler, ControlNetModel, StableDiffusionControlNetInpaintPipeline

init_image = load_image("yiyi_image_girl.png")
generator = torch.Generator(device="cpu").manual_seed(478847657)
mask_image = load_image("yiyi_image_mask_girl.png")

def make_inpaint_condition(image, image_mask):
    # note: image_mask is accepted but not applied here, so the ControlNet
    # conditioning is the unmasked image (the usual diffusers example sets
    # masked pixels to -1.0 before handing the image to the ControlNet)
    image = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    image = np.expand_dims(image, 0).transpose(0, 3, 1, 2)
    image = torch.from_numpy(image)
    return image

control_image = make_inpaint_condition(init_image, mask_image)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "stablediffusionapi/anything-v5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
pipe.safety_checker = None
pipe.requires_safety_checker = False

# generate images
output_images = pipe(
    "red hair",
    num_inference_steps=20,
    generator=generator,
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,
    guidance_scale=7,
    controlnet_conditioning_scale=0.5,
    strength=0.999,
).images

# save the images
for i, image in enumerate(output_images):
    image.save(f"test_5_output{i+3}.png")
```
How can I do this for SDXL? There is no sdxl-controlnet-inpaint model.
Isn't this what you are looking for, or did I misunderstand something?
No, I'm looking for the SDXL version of this model.
OK then, sorry 😅.
@yiyixuxu thanks for looking into this. I don't think mask size is the issue here; I grabbed a quick screenshot with the snipping tool to post here, which is why one of them has slightly different dimensions. But the image/mask I used in my script are both 512x512 (below). And in A1111, I'm using their native inpaint function to draw on top of the original image, so the image/mask must be identical.

Interesting, I'll give that mask inpaint condition a shot, seems neat. But I do suspect there's something going on with ControlNet: I'm getting worse results even outside of hair recoloring. Here's an example of changing the mouth; again the results are pretty different. It's harder to see the differences because it's smaller (that's why I picked the hair example to show), but Diffusers has more artifacts, blurry lines, and generally lower quality. I don't want to take up more of your time if you don't think there's something underlying here, but after spending a lot of time trying to recreate A1111 results with Diffusers across different experiments, it feels like the ControlNet for Diffusers isn't as effective for inpainting.

A1111
i think the difference comes down to seeds. although A1111's output has worse image compression artifacts. the inpainted mouth looks bad there, too. some kind of image ghosting, lips where they don't belong or something? as opposed to Diffusers... but i don't think it's "much worse results" with Diffusers. am i missing it? i don't have the best eyes.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hey, I'm running into the same issue. Did you find a solution to this small quality difference?
Hey folks, I'm getting much worse behavior with Diffusers than A1111 when using ControlNet inpainting. I'm using the exact same model, seed, inputs, etc., but it's clear the inpainting behavior is very different. Below is one example, but I have more if it's helpful.
Lots of artifacts from Diffusers; A1111 essentially just recolors. Thanks for all your help, and let me know how else I can be helpful.
Original Image
Diffusers Inpainting
A1111 Inpainting
Diffusers Script:
A1111 Settings
red hair Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 478847657, Size: 1024x1024, Model hash: a1535d0a42, Denoising strength: 1, Mask blur: 4, ControlNet 0: "Module: none, Model: control_v11p_sd15_inpaint [ebff9138], Weight: 0.3, Resize Mode: Crop and Resize, Low Vram: False, Guidance Start: 0, Guidance End: 1, Pixel Perfect: False, Control Mode: Balanced", Version: v1.6.0-2-g4afaaf8a
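For comparison, those A1111 parameters map roughly onto `StableDiffusionControlNetInpaintPipeline` arguments as sketched below. This is not the original script; it reuses the `pipe` / `init_image` / `mask_image` / `control_image` setup from the reproduction script earlier in the thread (note the ControlNet weight of 0.3 here versus the 0.5 used there):

```python
# rough mapping of the A1111 parameters above onto diffusers arguments
generator = torch.Generator(device="cpu").manual_seed(478847657)                     # Seed: 478847657
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)  # Sampler: Euler a

result = pipe(
    "red hair",
    num_inference_steps=20,             # Steps: 20
    guidance_scale=7,                   # CFG scale: 7
    generator=generator,
    image=init_image,                   # 1024x1024 source image
    mask_image=mask_image,              # Mask blur: 4 has to be applied to the mask manually
    control_image=control_image,
    controlnet_conditioning_scale=0.3,  # ControlNet Weight: 0.3
    control_guidance_start=0.0,         # Guidance Start: 0
    control_guidance_end=1.0,           # Guidance End: 1
    strength=1.0,                       # Denoising strength: 1
).images[0]
```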
--
Bonus Example (Top: Diffusers, Bottom: A1111)