Add Immiscible Noise algorithm #1395
base: dev
Conversation
This sounds interesting. I fetched your branch and ran one of my standard training runs (110 images, mostly high quality/resolution, with decent captions) at these learning rates: Tenc: 1e-10. Those are very slow learning rates, but the images still became 'wobbly' almost immediately, and even after 1500 iterations they hadn't recovered. Do other people see something similar?

Edit: Re-ran the same training run without --immiscible_noise and the images were sharp again, so the low-quality images I saw are associated with --immiscible_noise, and not with that cudnn warning.
@v0xie Your loss graph says these were trained with batch size 1, so there's nothing to assign. The fact that it's still affecting the loss tells me something is wrong with the implementation.
The immiscible noise is supposed to replace the original random noise, but the code is adding both to the latents. Based on the paper, we only need to:
Something like this (I don't know if my distance calculation is efficient, but it does work in fp16):

```diff
 def get_noise_noisy_latents_and_timesteps(args, noise_scheduler, latents):
     # Sample noise that we'll add to the latents
-    noise = torch.randn_like(latents, device=latents.device)
+    if args.immiscible_diffusion:
+        # Immiscible Diffusion https://arxiv.org/abs/2406.12303
+        from scipy.optimize import linear_sum_assignment
+        n = args.immiscible_diffusion  # arg is an integer for how many noise tensors to generate
+        size = [n] + list(latents.shape[1:])
+        noise = torch.randn(size, dtype=latents.dtype, layout=latents.layout, device=latents.device)
+        # find similar latent-noise pairs
+        latents_expanded = latents.half().unsqueeze(1).expand(-1, n, *latents.shape[1:])
+        noise_expanded = noise.half().unsqueeze(0).expand(latents.shape[0], *noise.shape)
+        dist = (latents_expanded - noise_expanded) ** 2
+        dist = dist.mean(list(range(2, dist.dim()))).cpu()
+        noise = noise[linear_sum_assignment(dist)[1]]
+    else:
+        noise = torch.randn_like(latents, device=latents.device)
```
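For readers who want to see the assignment step in isolation, here is a minimal NumPy/SciPy sketch of the same idea. The function name and shapes are illustrative, not from the PR; it only demonstrates the cost-matrix-plus-`linear_sum_assignment` pattern the diff above uses:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_noise(latents, noise):
    """Permute `noise` so each latent is paired with a nearby noise sample.

    latents: (B, ...) array; noise: (N, ...) array with N >= B.
    Returns the B noise samples chosen by minimum-cost matching.
    """
    b, n = latents.shape[0], noise.shape[0]
    # Mean squared distance between every latent/noise pair -> (B, N) cost matrix
    lat = latents.reshape(b, -1)[:, None, :]   # (B, 1, D)
    noi = noise.reshape(n, -1)[None, :, :]     # (1, N, D)
    dist = ((lat - noi) ** 2).mean(axis=2)     # (B, N)
    row_ind, col_ind = linear_sum_assignment(dist)
    return noise[col_ind]

rng = np.random.default_rng(0)
latents = rng.normal(size=(4, 4, 8, 8))
noise = rng.normal(size=(16, 4, 8, 8))   # n > batch size, as in the diff above
assigned = assign_noise(latents, noise)

# The matched pairing is never worse (in total distance) than a naive pairing
matched = ((latents - assigned) ** 2).mean(axis=(1, 2, 3)).sum()
naive = ((latents - noise[:4]) ** 2).mean(axis=(1, 2, 3)).sum()
assert matched <= naive
```

Note that `linear_sum_assignment` accepts a rectangular cost matrix, which is what allows generating more noise candidates than latents.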
Hey @feffy380, my first impression is that your code seems to be working. I set n = 32 for my first run with it (because I hadn't yet read the part of the paper where they recommend 1024), and I think I saw quality improvements even at that low value. I'm restarting a new run with n = 1024 now. Maybe make the default 1024, so people don't need to know what value to pass in?

One thing I noticed is that even though my training images are all real-world images, the sample renders continue to show cartoon-styled images for longer than usual; I saw one even at iteration 550. I don't think that's an issue, since it looks like the model will learn to stop doing that, but I found it interesting to note. (I stopped at iteration 650, so I don't know if I'd have gotten any more cartoon-style samples.)
Thank you for testing @araleza, and thank you for the detailed review @feffy380! I incorporated the suggested changes and I'm running some tests now.
My test run with noise batch size 1024 has reached 11000 iterations now with feffy380's code (I haven't tried the new updated version from v0xie yet), and it's looking good. My sample images differ in quality (better lighting, and fewer facial distortions on the difficult training images) from how they usually look without the immiscible noise parameter set. I'd like to try more training runs at different learning rates to be more confident, but as far as I can tell, this is a positive change.
Hi, so I grabbed the latest code in your branch again, @v0xie . I'm still seeing lots of very noisy, damaged images. When I look at the code, it seems there are two sections, the part that feffy380 wrote, and a second section that looks like this:
If I comment out the call to immiscible_diffusion() (which still leaves the call to immiscible_diffusion_get_noise() in place), then the noisy corruption on the images goes away. Looking at the paper you linked, I can see why you added that second call, but I think there must be a bug in that implementation. :(

@feffy380: I've now done lots of runs with just the section of code you provided in place. These are the BEST runs of SDXL training that I've done to date. The quality gains are amazing; it's like a new model. And thanks go to @v0xie for finding this great paper.
Like I said before, adding noise to the latents like this is wrong because the noise_scheduler already does that a few lines later. You get noisy results because the latents now carry 2x noise, but the unet is only removing 1x noise. The extra noise has effectively become part of the ground truth, which completely corrupts the dataset.
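The double-noise effect described above can be checked numerically. This is a toy NumPy sketch (scalar stand-ins for latents; `a` plays the role of the scheduler's alpha_bar at some timestep, and the value 0.5 is arbitrary): pre-adding the assigned noise to the latents and then letting the scheduler add it again greatly inflates the effective noise power, while the UNet's target still accounts for only one unit of noise.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)    # stand-in for latents (unit variance)
eps = rng.normal(size=100_000)  # assigned noise
a = 0.5                         # alpha_bar at some timestep (illustrative)

# Correct: the scheduler adds the noise exactly once
correct = np.sqrt(a) * x + np.sqrt(1 - a) * eps          # variance ~ 1.0

# Buggy: noise pre-added to the latents, then added again by the scheduler
buggy = np.sqrt(a) * (x + eps) + np.sqrt(1 - a) * eps    # variance ~ 2.5

print(correct.var(), buggy.var())
```

With a = 0.5 the noise coefficient in the buggy path becomes sqrt(a) + sqrt(1 - a) ≈ 1.41, so the total variance is roughly 2.5 instead of 1.0, matching the visibly noisy samples reported in the thread.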
@feffy380, I think that call is there to try to implement Step 3 in this part of the paper:

Is there some other way of doing that step that might be correct, and better than just picking the closest noise to the current latent?
You're absolutely correct about the double noise add, @feffy380. Removed it and it's much improved. What's funny is that even with the double noise add I was getting pretty good results, which might speak to the effectiveness of this method. Results after removing the double noise add; I also trained a test with immiscible_noise=4096, which didn't add any noticeable delay to training, at least at 512^2.
@araleza Step 3 is adding noise to the latents, which is what noise_scheduler.add_noise() already does a few lines later.
@feffy380, thanks for helping me understand; I don't have a very strong knowledge of PyTorch commands. The bit that still confuses me, though, is that the code that's now been removed has this section:

And that looks exactly like Step 3 in the paper. But the bit we've kept doesn't have anything that looks like that equation. So how come it still works? Does the code section that's still around (i.e. the immiscible_diffusion_get_noise() function) implement that function with the two square roots in some way that isn't so obviously written out explicitly?

Edit: Or maybe those square roots are inside noise_scheduler.add_noise()?
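On the edit's question: yes, those square roots live inside the scheduler. For example, diffusers' DDPMScheduler.add_noise computes essentially x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise, which is the equation from Step 3. A minimal NumPy sketch of that computation (the function and schedule value are illustrative, not the library's actual code):

```python
import numpy as np

def add_noise(x0, noise, alpha_bar_t):
    """Forward diffusion step: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*noise.

    This is the two-square-root equation from Step 3 of the paper; schedulers
    such as diffusers' DDPMScheduler.add_noise compute the same thing, which
    is why the kept code never needs to spell it out explicitly.
    """
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

x0 = np.full(3, 2.0)
noise = np.zeros(3)
# With alpha_bar = 1 (t = 0) the sample is untouched
assert np.allclose(add_noise(x0, noise, 1.0), x0)
# With alpha_bar = 0.25, the clean signal is scaled by sqrt(0.25) = 0.5
assert np.allclose(add_noise(x0, noise, 0.25), 1.0)
```

So the kept code only chooses *which* noise tensor to use; the scheduler still applies the noising equation itself.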
After doing some testing I'm actually getting consistently slightly worse results with the latest iteration of this PR compared to 7b487ce. Certain high frequency details that appeared consistently with the original are lost when reusing the same settings and dataset. Not really sure why. |
Figured I'll put this out there, since there appears to have been an update for immiscible diffusion (v2 on the arXiv?), along with code examples. I've simplified it down for a single-process use case (I think this works as intended?). A notable change is the distance calculation, which seems to be rather different. In any case, it worked rather well on a test run, so I felt the need to share.

```python
import torch
from scipy.optimize import linear_sum_assignment

# https://github.com/yhli123/Immiscible-Diffusion/blob/main/stable_diffusion/conditional_ft_train_sd.py#L941
def immiscible_diffusion_get_noise_v2(latents, n=None):
    """
    Generates noise for immiscible diffusion, simplified for single process.
    """
    with torch.no_grad():
        batch_size = latents.shape[0] if n is None else n
        size = [batch_size] + list(latents.shape[1:])
        noise = torch.randn(size, dtype=latents.dtype, layout=latents.layout, device=latents.device)  # [B, C, H, W]
        # Distance calculation
        distance = torch.linalg.vector_norm(
            0.10 * latents.to(torch.float16).flatten(start_dim=1).unsqueeze(1) -
            0.10 * noise.to(torch.float16).flatten(start_dim=1).unsqueeze(0),
            dim=2
        )  # [B, B]
        _, col_ind = linear_sum_assignment(distance.cpu().numpy())
        noise = noise[col_ind].to(latents.device)  # Assign the permuted noise
        return noise
```

In get_noise_noisy_latents_and_timesteps (or your model-specific noisy-latent function), replace the existing noise generation with a call to this function.
This PR implements the algorithm from "Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment" (Li et al., 2024): https://arxiv.org/abs/2406.12303

The algorithm assigns each training image's latents to nearby noise before the noise is added, so each image is projected onto only nearby noise. This is supposed to speed up convergence and capture more fine detail in the trained model.

The noise assignment operation adds some overhead to training time, but the paper reports it adding only 22.8 ms when training with a batch size of 1024.
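As a rough sanity check on that overhead claim, the assignment solver can be timed in isolation. This sketch only measures SciPy's `linear_sum_assignment` on a random 1024x1024 cost matrix (a stand-in for the real latent-to-noise distance matrix); it does not include the distance computation, and the timing will vary by machine:

```python
import time
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
cost = rng.random((1024, 1024))  # stand-in distance matrix for batch size 1024

t0 = time.perf_counter()
row_ind, col_ind = linear_sum_assignment(cost)
elapsed_ms = (time.perf_counter() - t0) * 1000
print(f"assignment for 1024x1024 cost matrix: {elapsed_ms:.1f} ms")

# The result is a permutation: every noise sample is used exactly once
assert sorted(col_ind) == list(range(1024))
```

The per-step cost stays small because the matrix scales with batch size (or the `--immiscible_noise` value), not with image resolution.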
Use by adding the argument `--immiscible_noise`.

2024/06/27 - Outdated results:
Here are some experimental results trained on the "monster_toy" dataset from the Dreambooth repository (https://github.com/google/dreambooth/blob/main/dataset/monster_toy/00.jpg). Keep in mind the dataset is only 5 images, so by Epoch 30 the model is already starting to be overtrained.

Training with Huber loss:

Training with no Huber loss:
The loss/epoch graph looks like the FID/Training Steps graphs from the paper:
Thank you for your consideration!