
Support Flux IP Adapter #10261

Open · wants to merge 11 commits into main
Conversation

@hlky (Collaborator) commented on Dec 17, 2024

What does this PR do?

Adds support for XLabs Flux IP Adapter.

Example

import torch
from diffusers import FluxPipeline
from diffusers.utils import load_image

pipe: FluxPipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = load_image("assets_statue.jpg").resize((1024, 1024))

pipe.load_ip_adapter(
    "XLabs-AI/flux-ip-adapter",
    weight_name="ip_adapter.safetensors",
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14",
)
pipe.set_ip_adapter_scale(1.0)

image = pipe(
    width=1024,
    height=1024,
    prompt='wearing sunglasses',
    negative_prompt="",
    true_cfg=4.0,
    generator=torch.Generator().manual_seed(4444),
    ip_adapter_image=image,
).images[0]

image.save('flux_ipadapter_4444.jpg')

Input: assets_statue · Output: flux_ipadapter_4444

flux-ip-adapter-v2


Note: true_cfg=1.0 is important, and the result is sensitive to strength: a fixed strength may not work; see here for more strength schedules. Good results will require experimentation with strength schedules and the start/stop values. Results also vary with the input image; I had no success with the statue image used for the v1 test.

Multiple input images are not yet supported (dev note: apply torch.mean to the batch of image_embeds and to ip_attention).
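The dev note above could look roughly like this. This is a hypothetical sketch, not code from the PR; `pool_image_embeds` is an assumed helper name:

```python
import torch

def pool_image_embeds(image_embeds: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: average a batch of per-image embeddings over
    # the batch dimension into a single embedding, per the dev note.
    # The same mean would be applied to ip_attention.
    return image_embeds.mean(dim=0, keepdim=True)

# Example: two images' encoder outputs, shape (batch=2, tokens=1, dim=768)
embeds = torch.stack([torch.ones(1, 768), torch.zeros(1, 768)])
pooled = pool_image_embeds(embeds)  # shape (1, 1, 768)
```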

import torch
from diffusers import FluxPipeline
from diffusers.utils import load_image

pipe: FluxPipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = load_image("monalisa.jpg").resize((1024, 1024))

pipe.load_ip_adapter(
    "XLabs-AI/flux-ip-adapter-v2",
    weight_name="ip_adapter.safetensors",
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14",
)

def LinearStrengthModel(start, finish, size):
    return [
        (start + (finish - start) * (i / (size - 1))) for i in range(size)
    ]

ip_strengths = LinearStrengthModel(0.3, 0.92, 19)
pipe.set_ip_adapter_scale(ip_strengths)

image = pipe(
    width=1024,
    height=1024,
    prompt='wearing red sunglasses, golden chain and a green cap',
    negative_prompt="",
    true_cfg=1.0,
    generator=torch.Generator().manual_seed(0),
    ip_adapter_image=image,
).images[0]

image.save('result.jpg')
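The linear ramp used above is only one possible strength schedule. As a hypothetical alternative (not part of the PR), a half-cosine ease-in with the same signature could be passed to set_ip_adapter_scale instead:

```python
import math

def CosineStrengthModel(start, finish, size):
    # Ease from `start` to `finish` along a half-cosine curve:
    # slow at the ends, fastest in the middle of the schedule.
    return [
        start + (finish - start) * (1 - math.cos(math.pi * i / (size - 1))) / 2
        for i in range(size)
    ]

ip_strengths = CosineStrengthModel(0.3, 0.92, 19)
```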

Input: monalisa · Output: result

Notes

  • XLabs Flux IP Adapter produces bad results when used without CFG.
    • Verifiable in the original codebase: set --timestep_to_start_cfg greater than the number of steps to disable CFG.
  • XLabs Flux IP Adapter also produces bad results when CFG is run as a single batch (negative and positive concatenated).
  • This PR copies most of the changes from our pipeline_flux_with_cfg community example, except that positive and negative prompts are run separately.
  • The conversion script is optional; original weights are converted on the fly by load_ip_adapter.
  • load_ip_adapter supports image_encoder_pretrained_model_name_or_path (e.g. "openai/clip-vit-large-patch14") rather than just image_encoder_folder, and also supports image_encoder_dtype (default torch.float16).
  • This required some changes to FluxTransformerBlock because of where ip_attention is applied to the hidden_states; see here in the original codebase.
  • flux-ip-adapter-v2 will be fixed and tested shortly.
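The "run positive and negative separately" point can be sketched as follows. This is a simplified illustration, not the PR's actual code; `denoise` stands in for the Flux transformer forward pass:

```python
import torch

def true_cfg_step(denoise, latents, pos_cond, neg_cond, true_cfg: float):
    # "True CFG": the positive and negative prompts run as SEPARATE
    # forward passes (never concatenated into one batch), then the two
    # noise predictions are combined with the guidance scale.
    noise_pos = denoise(latents, pos_cond)
    noise_neg = denoise(latents, neg_cond)
    return noise_neg + true_cfg * (noise_pos - noise_neg)
```

Note that with true_cfg=1.0 this combination reduces to the positive prediction alone.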

Fixes #9825

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul @yiyixuxu @DN6

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hlky added the `roadmap` (Add to current release roadmap) label on Dec 17, 2024
Successfully merging this pull request may close: Support IPAdapters for FLUX pipelines.