Much worse performance from StableDiffusionControlNetInpaintPipeline than sd-webui-controlnet #6101
do you mind posting the image of the mask?
thank you. if you search the github issues you'll find one discussing inpainting in Diffusers vs A1111. there's some postprocessing you have to do, using the mask to actually composite the inpainted area into the original image. i wanted to see the mask so i could be more clear what the end result should be.
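For reference, the compositing step described above can be done with PIL. This is only a rough sketch; `init_image`, `mask_image`, and `result` are stand-ins for the source image, the inpaint mask, and the image returned by the pipeline:

```python
from PIL import Image, ImageFilter

def composite_inpaint(init_image: Image.Image, mask_image: Image.Image,
                      result: Image.Image, blur_radius: int = 4) -> Image.Image:
    # feather the mask edge slightly so the seam between the inpainted
    # region and the untouched pixels is less visible
    mask = mask_image.convert("L").filter(ImageFilter.GaussianBlur(blur_radius))
    # keep the pipeline output inside the (blurred) mask, the original elsewhere
    return Image.composite(result, init_image, mask)
```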
sorry, to clarify: are you saying this is something I can solve myself with some postprocessing of the mask beforehand? I'm not sure I found the right issue you're referencing, do you mean this one? #5808
#4536 might actually be what you need.
Thanks, I will play around with this, but this issue seems different to me: I'm seeing very different inpainting behavior within the mask than I get from A1111, not issues outside the mask. (Although I have actually noticed that in some other projects, so this is good to know.)
well, the DDIM scheduler in Diffusers has some issues (#6068 comes to mind mostly), so you might want to try Euler or even Euler A.
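For reference, switching schedulers in diffusers is a one-line config swap; a minimal sketch, assuming `pipe` is an already-loaded Stable Diffusion pipeline:

```python
from diffusers import EulerDiscreteScheduler, EulerAncestralDiscreteScheduler

# Euler
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
# or Euler Ancestral ("Euler a" in A1111)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
```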
Hi @brandonwsaw. It seems that you used
They are adding mask_blur support, but the inpaint pipeline doesn't work well.
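A1111's Mask blur setting can be approximated on the diffusers side by feathering the mask before passing it to the pipeline. A minimal sketch with PIL; the radius of 4 mirrors the A1111 default used later in this issue:

```python
from PIL import ImageFilter

# approximate A1111's "Mask blur: 4" by blurring the mask before inpainting
blurred_mask = mask_image.convert("L").filter(ImageFilter.GaussianBlur(radius=4))
```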
Hi @brandonwsaw, thanks for the issue! Yeah, I think there are lots of differences in settings; most have been summarized by @bghira and @StandardAI above.
Thanks, interesting to know about mask blur, post-processing, and especially the masked content, but I did play with those and they don't seem responsible. I turned off mask blur and used the 0.999 strength trick in the example above. A1111 also produces a similar result with masked content set to latent noise. I'm not exactly sure what Pixel Perfect is; here's the UI, the default is False:
Interesting...
@brandonwsaw
hi @brandonwsaw

```python
from diffusers.utils import load_image
import numpy as np
import torch
from PIL import Image
from diffusers import EulerAncestralDiscreteScheduler, ControlNetModel, StableDiffusionControlNetInpaintPipeline

init_image = load_image("yiyi_image_girl.png")
generator = torch.Generator(device="cpu").manual_seed(478847657)
mask_image = load_image("yiyi_image_mask_girl.png")

def make_inpaint_condition(image, image_mask):
    # note: image_mask is accepted but not applied here, so the ControlNet
    # conditioning is the unmasked image (the usual diffusers example sets
    # masked pixels to -1.0 before handing the image to the ControlNet)
    image = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    image = np.expand_dims(image, 0).transpose(0, 3, 1, 2)
    image = torch.from_numpy(image)
    return image

control_image = make_inpaint_condition(init_image, mask_image)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "stablediffusionapi/anything-v5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
pipe.safety_checker = None
pipe.requires_safety_checker = False

# generate images
output_images = pipe(
    "red hair",
    num_inference_steps=20,
    generator=generator,
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,
    guidance_scale=7,
    controlnet_conditioning_scale=0.5,
    strength=0.999,
).images

# save the images
for i, image in enumerate(output_images):
    image.save(f"test_5_output{i+3}.png")
```
How can I do this for SDXL? There is no sdxl-controlnet-inpaint model.
Isn't this what you are looking for, or did I misunderstand something?
No, I'm looking for the SDXL version of this model.
OK then, sorry 😅.
@yiyixuxu thanks for looking into this. I don't think mask size is the issue here; I grabbed a quick screenshot with the snipping tool to post here, which is why one of them has slightly different dimensions. But the image/mask I used in my script are both 512x512 (below). And in A1111, I'm using their native inpaint function to draw on top of the original image, so the image/mask must be identical.

Interesting, I'll give that mask inpaint condition a shot, seems neat. But I do suspect there's something going on with ControlNet: I'm getting worse results even outside of hair recoloring. Here's an example of changing the mouth; again the results are pretty different. It's harder to see the differences because it's smaller (that's why I picked the hair example to show), but Diffusers has more artifacts, blurry lines, and generally lower quality. I don't want to take up more of your time if you don't think there's something underlying here, but after spending a lot of time trying to recreate A1111 results with Diffusers across different experiments, it feels like the ControlNet for Diffusers isn't as effective for inpainting.

A1111
i think the difference comes down to seeds. although A1111's output has worse image compression artifacts. the inpainted mouth looks bad there, too. some kind of image ghosting, lips where they don't belong or something? as opposed to Diffusers... but i don't think it's "much worse results" with Diffusers. am i missing it? i don't have the best eyes.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hey, I'm running into the same issue. Did you find a solution to this small quality difference?
Hey folks, I'm getting much worse behavior with Diffusers than A1111 when using ControlNet inpainting. I'm using the exact same model, seed, inputs, etc., but it's clear the inpainting behavior is very different. Below is one example, but I have more if it's helpful.
Lots of artifacts from Diffusers; A1111 essentially just recolors. Thanks for all your help, and let me know how else I can be helpful.
Original Image
Diffusers Inpainting
A1111 Inpainting
Diffusers Script:
A1111 Settings
red hair Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 478847657, Size: 1024x1024, Model hash: a1535d0a42, Denoising strength: 1, Mask blur: 4, ControlNet 0: "Module: none, Model: control_v11p_sd15_inpaint [ebff9138], Weight: 0.3, Resize Mode: Crop and Resize, Low Vram: False, Guidance Start: 0, Guidance End: 1, Pixel Perfect: False, Control Mode: Balanced", Version: v1.6.0-2-g4afaaf8a
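For comparison, those A1111 parameters map roughly onto `StableDiffusionControlNetInpaintPipeline` arguments as sketched below. This is not the original script; it reuses the `pipe` / `init_image` / `mask_image` / `control_image` setup from the reproduction script earlier in the thread (note the ControlNet weight of 0.3 here versus the 0.5 used there):

```python
# rough mapping of the A1111 parameters above onto diffusers arguments
generator = torch.Generator(device="cpu").manual_seed(478847657)                     # Seed: 478847657
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)  # Sampler: Euler a

result = pipe(
    "red hair",
    num_inference_steps=20,             # Steps: 20
    guidance_scale=7,                   # CFG scale: 7
    generator=generator,
    image=init_image,                   # 1024x1024 source image
    mask_image=mask_image,              # Mask blur: 4 has to be applied to the mask manually
    control_image=control_image,
    controlnet_conditioning_scale=0.3,  # ControlNet Weight: 0.3
    control_guidance_start=0.0,         # Guidance Start: 0
    control_guidance_end=1.0,           # Guidance End: 1
    strength=1.0,                       # Denoising strength: 1
).images[0]
```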
--
Bonus Example (Top: Diffusers, Bottom: A1111)