
[Pipeline] Add LEDITS++ pipelines #6074

Merged — 56 commits merged into huggingface:main on Mar 13, 2024

Conversation

@manuelbrack (Contributor) commented Dec 6, 2023

What does this PR do?

We add a set of pipelines implementing the LEDITS++ image editing method, with implementations for Stable Diffusion, SDXL, and DeepFloyd IF.
Additionally, we made some minor adjustments to the DPM-Solver scheduler to support image inversion.
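
For reference, a rough usage sketch of the Stable Diffusion variant. The model id, image path, and argument values below are placeholders and may differ from the final API:

import torch
from diffusers import LEditsPPPipelineStableDiffusion
from diffusers.utils import load_image

# Model id and image path are placeholders.
pipe = LEditsPPPipelineStableDiffusion.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = load_image("path/to/input.png")

# 1) Invert the input image; the inversion state is stored on the pipeline.
pipe.invert(image=image, num_inversion_steps=50, skip=0.1)

# 2) Run the edit against the stored inversion.
edited = pipe(
    editing_prompt=["sunglasses"],
    edit_guidance_scale=5.0,
    edit_threshold=0.75,
    reverse_editing_direction=False,
).images[0]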

There are still some obvious TODOs left that we would appreciate some help with:

  • Finalize IF implementation
  • Adjust input checks to the new image-editing setting (e.g. fail gracefully if inference is called before inversion)
  • Add documentation
  • Write unit tests

@patrickvonplaten, @apolinario, @linoytsaban you should all be able to commit to our fork. It would be great if you could help @kathath and me out a bit 😄

Who can review?

@patrickvonplaten

@manuelbrack changed the title from "Add LEDITS++ pipelines [WIP]" to "[Pipeline] Add LEDITS++ pipelines" on Dec 6, 2023
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@patrickvonplaten patrickvonplaten requested a review from DN6 February 13, 2024 15:41
@patrickvonplaten (Contributor)

Can we give this another review? (cc @yiyixuxu and maybe @DN6)

@manuelbrack manuelbrack requested a review from yiyixuxu February 13, 2024 16:34
@yiyixuxu (Collaborator) left a comment

Two questions:

  1. For edit_warmup_steps, edit_guidance_scale, edit_threshold, reverse_editing_direction, edit_cooldown_steps: do we really need to allow all of these arguments to be lists? If so, let's make sure that if they are passed as an int we convert them into a list of the expected length, so we do not need these if ... else ... statements inside the noise edit loop (see the normalization sketch after the loop outline below).
  2. We should refactor the noise edit loop here. That means you will need to create a few utility functions or methods on the pipeline:
for c, (noise_pred_edit_concept, ...) in enumerate(zip(noise_pred_edit_concepts, ...)):
    if i >= edit_warmup_steps_c and (edit_cooldown_steps_c is None or i < edit_cooldown_steps_c):
        ...
        if user_mask is not None:
            noise_guidance_edit_tmp = noise_guidance_edit_tmp * user_mask

        if use_cross_attn_mask:
            noise_guidance_edit_tmp, attn_mask = self.apply_cross_attn_mask(...)

            if use_intersect_mask:
                noise_guidance_edit_tmp = self.apply_intersect_mask(noise_guidance_edit_tmp, attn_mask)
            else:
                noise_guidance_edit_tmp = noise_guidance_edit_tmp * attn_mask

        else:
            noise_guidance_edit_tmp = make_up_a_name_here()
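
A minimal sketch of the normalization suggested in point 1 (the helper name here is made up for illustration):

def _to_per_concept_list(value, num_concepts):
    # Broadcast a scalar setting to one value per editing concept.
    if isinstance(value, list):
        if len(value) != num_concepts:
            raise ValueError(f"Expected {num_concepts} values, got {len(value)}.")
        return value
    return [value] * num_concepts

# The same conversion would apply to edit_guidance_scale, edit_threshold, etc.
edit_warmup_steps = _to_per_concept_list(edit_warmup_steps, len(noise_pred_edit_concepts))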

Comment on lines +390 to +393
scheduler = DPMSolverMultistepScheduler.from_config(
    scheduler.config, algorithm_type="sde-dpmsolver++", solver_order=2
)
logger.warning(
@yiyixuxu (Collaborator) commented Feb 15, 2024

Let's:

  1. throw an error or a warning here
  2. update the docstring examples to explicitly use the DPM or DDIM scheduler

pipe = LEditsPPPipelineStableDiffusion.from_pretrained()
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, algorithm_type="sde-dpmsolver++", solver_order=2
)
...

clip_skip: Optional[int] = None,
callback_on_step_end: Optional[Callable[[int, int, Dict], None]] = None,
callback_on_step_end_tensor_inputs: List[str] = ["latents"],
**kwargs,
Collaborator

Suggested change
- **kwargs,

I don't think this is used.
Comment on lines +908 to +911
if use_intersect_mask:
    use_cross_attn_mask = True

if use_cross_attn_mask:
@yiyixuxu (Collaborator) commented Feb 15, 2024

OK, got it now. Let's pass these two arguments to check_inputs:

if use_intersect_mask and not use_cross_attn_mask:
    raise ValueError("...")

I would also be OK with updating the argument and throwing a warning here instead, if that's what you prefer.


latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)

text_embed_input = text_embeddings
Collaborator

Suggested change
- text_embed_input = text_embeddings

# predict the noise residual
noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embed_input).sample
Collaborator

Suggested change
- noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embed_input).sample
+ noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample

Comment on lines +975 to +978
if isinstance(edit_warmup_steps, list):
    edit_warmup_steps_c = edit_warmup_steps[c]
else:
    edit_warmup_steps_c = edit_warmup_steps
Collaborator

We can make sure edit_warmup_steps is a list and remove the if/else here.

Suggested change
- if isinstance(edit_warmup_steps, list):
-     edit_warmup_steps_c = edit_warmup_steps[c]
- else:
-     edit_warmup_steps_c = edit_warmup_steps

Comment on lines +982 to +994
if isinstance(edit_guidance_scale, list):
    edit_guidance_scale_c = edit_guidance_scale[c]
else:
    edit_guidance_scale_c = edit_guidance_scale

if isinstance(edit_threshold, list):
    edit_threshold_c = edit_threshold[c]
else:
    edit_threshold_c = edit_threshold

if isinstance(reverse_editing_direction, list):
    reverse_editing_direction_c = reverse_editing_direction[c]
else:
    reverse_editing_direction_c = reverse_editing_direction
Collaborator

Suggested change
- if isinstance(edit_guidance_scale, list):
-     edit_guidance_scale_c = edit_guidance_scale[c]
- else:
-     edit_guidance_scale_c = edit_guidance_scale
- if isinstance(edit_threshold, list):
-     edit_threshold_c = edit_threshold[c]
- else:
-     edit_threshold_c = edit_threshold
- if isinstance(reverse_editing_direction, list):
-     reverse_editing_direction_c = reverse_editing_direction[c]
- else:
-     reverse_editing_direction_c = reverse_editing_direction

Comment on lines +996 to +1001
if isinstance(edit_cooldown_steps, list):
    edit_cooldown_steps_c = edit_cooldown_steps[c]
elif edit_cooldown_steps is None:
    edit_cooldown_steps_c = i + 1
else:
    edit_cooldown_steps_c = edit_cooldown_steps
Collaborator

Suggested change
- if isinstance(edit_cooldown_steps, list):
-     edit_cooldown_steps_c = edit_cooldown_steps[c]
- elif edit_cooldown_steps is None:
-     edit_cooldown_steps_c = i + 1
- else:
-     edit_cooldown_steps_c = edit_cooldown_steps

if self.sem_guidance is None:
    self.sem_guidance = torch.zeros((len(timesteps), *noise_pred_uncond.shape))

for c, noise_pred_edit_concept in enumerate(noise_pred_edit_concepts):
Collaborator

Let's try to refactor this loop to have the structure below:

for c, (
    noise_pred_edit_concept, edit_warmup_steps_c, edit_guidance_scale_c,
    edit_threshold_c, reverse_editing_direction_c, edit_cooldown_steps_c,
) in enumerate(
    zip(
        noise_pred_edit_concepts, edit_warmup_steps, edit_guidance_scale,
        edit_threshold, reverse_editing_direction, edit_cooldown_steps,
    )
):
    if i >= edit_warmup_steps_c and (edit_cooldown_steps_c is None or i < edit_cooldown_steps_c):
        if reverse_editing_direction_c:
            noise_guidance_edit_tmp = noise_guidance_edit_tmp * -1

        noise_guidance_edit_tmp = noise_guidance_edit_tmp * edit_guidance_scale_c

        if user_mask is not None:
            noise_guidance_edit_tmp = noise_guidance_edit_tmp * user_mask

        if use_cross_attn_mask:
            noise_guidance_edit_tmp, attn_mask = self.apply_cross_attn_mask(...)

            if use_intersect_mask:
                noise_guidance_edit_tmp = self.apply_intersect_mask(noise_guidance_edit_tmp, attn_mask)
            else:
                noise_guidance_edit_tmp = noise_guidance_edit_tmp * attn_mask

        else:
            noise_guidance_edit_tmp = make_up_a_name_here()

Comment on lines +1003 to +1004
if i >= edit_cooldown_steps_c:
    continue
Collaborator

Instead of the two continue statements, let's do:

if i >= edit_warmup_steps_c and i < edit_cooldown_steps_c:
    ...

@@ -236,6 +237,23 @@ def step_index(self):
        """
        return self._step_index

    @property
Collaborator

You didn't make these changes, no?
Why are they showing up here?

@manuelbrack (Contributor, Author)

That's strange, because this is not different from the current main branch in diffusers, so it isn't a change at all.

@linoytsaban linoytsaban merged commit 00eca4b into huggingface:main Mar 13, 2024
15 checks passed
@vladmandic (Contributor) commented Mar 13, 2024

This looks very promising; a few comments to polish it for production:

  • Cannot make either the SD or SDXL pipeline work with model CPU offloading,
    as it doesn't pull text_encoder from CPU to GPU on invert().
  • LEditsPPPipelineStableDiffusionXL fails in invert().encode_image when running in fp16,
    since the VAE gets upcast by default, causing a float-vs-half runtime error.
    Workaround: pipe.vae.config.force_upcast = False (see the sketch after this list).
  • The pipelines do not work with a GPU torch.Generator(),
    since the tensor never gets moved to the same device as the generator-created noise:
    xts[idx] = self.scheduler.add_noise(x0, noise, torch.Tensor([t]))
  • The SDXL pipeline does not take width/height/clip_skip args like the SD pipeline does.
  • Neither pipeline sets pipe.num_timesteps, which makes it hard to check progress in callbacks, etc.
    Should be as simple as pipe.num_timesteps = len(pipe.inversion_steps).
  • guidance_rescale > 0 causes a runtime error, since noise_pred_edit_concepts is a tuple,
    not an ndarray, so .mean() does not work.
  • If source_guidance_scale is <= 1, it causes runtime errors instead of disabling guidance.
  • The math for edit_cooldown_steps seems strange, as it uses absolute values, not values relative to timesteps.
  • Generally, not sure why an explicit invert() is even needed;
    it could easily be combined into the main pipeline's __call__ method,
    especially since its output is discarded and it sets values on the pipeline directly.
  • There is a note in the docs (https://huggingface.co/docs/diffusers/main/en/api/pipelines/ledits_pp)
    that DPM++ 2M in diffusers is sub-optimal, but no info on why or example outputs?
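
For reference, a minimal sketch of the fp16 workaround from the second bullet; the model id is just an example and this is untested against the merged pipeline:

import torch
from diffusers import LEditsPPPipelineStableDiffusionXL

pipe = LEditsPPPipelineStableDiffusionXL.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
# Keep the VAE in fp16 during invert() to avoid the float-vs-half mismatch.
pipe.vae.config.force_upcast = False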

@yiyixuxu (Collaborator)

thanks for the feedback! @vladmandic

@manuelbrack can we address them in the refactor PR too?

@manuelbrack mentioned this pull request on Mar 14, 2024
@vladmandic (Contributor)

one more item:

  • The pipeline fails in aggregate_attention on non-standard input image sizes, for example 872x512 or even a simple 560x560.
