Use torch in get_2d_rotary_pos_embed #10155

Open · hlky wants to merge 2 commits into main from np-get-2d-rotary-pos-embed
Conversation

@hlky (Collaborator) commented Dec 9, 2024

What does this PR do?

Refactors get_2d_rotary_pos_embed to use torch instead of numpy, and adds a device argument so that tensors can be created directly on e.g. CUDA.

Usage of get_2d_rotary_pos_embed in the HunyuanDiT pipelines is updated to pass device.

The torch and numpy versions match numerically.
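
For illustration, a minimal sketch of what the torch-based grid construction with a device argument could look like (names and exact logic here are assumptions based on the description above, not the merged code):

import torch


def _get_pos_grid(crops_coords, grid_size, device=None):
    # Build the position grid directly with torch so it can be created on
    # e.g. CUDA, instead of np.linspace followed by a host-to-device copy.
    (top, left), (bottom, right) = crops_coords
    # Emulate np.linspace(start, stop, num, endpoint=False) with arange.
    grid_h = top + (bottom - top) * torch.arange(grid_size[0], dtype=torch.float32, device=device) / grid_size[0]
    grid_w = left + (right - left) * torch.arange(grid_size[1], dtype=torch.float32, device=device) / grid_size[1]
    return torch.meshgrid(grid_h, grid_w, indexing="ij")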

Reproduction
from diffusers.models.embeddings import get_2d_rotary_pos_embed
import torch


def get_resize_crop_region_for_grid(src, tgt_size):
    th = tw = tgt_size
    h, w = src

    r = h / w

    # resize
    if r > 1:
        resize_height = th
        resize_width = int(round(th / h * w))
    else:
        resize_width = tw
        resize_height = int(round(tw / w * h))

    crop_top = int(round((th - resize_height) / 2.0))
    crop_left = int(round((tw - resize_width) / 2.0))

    return (crop_top, crop_left), (crop_top + resize_height, crop_left + resize_width)


height, width = 1024, 1024
height = int((height // 16) * 16)
width = int((width // 16) * 16)
num_attention_heads = 16
attention_head_dim = 88
patch_size = 2
inner_dim = num_attention_heads * attention_head_dim
grid_height = height // 8 // patch_size
grid_width = width // 8 // patch_size
base_size = 512 // 8 // patch_size
grid_crops_coords = get_resize_crop_region_for_grid(
    (grid_height, grid_width), base_size
)

# numpy-based path (previous implementation)
image_rotary_emb_np = get_2d_rotary_pos_embed(
    inner_dim // num_attention_heads,
    grid_crops_coords,
    (grid_height, grid_width),
    output_type="np",
)

# torch-based path (this PR)
image_rotary_emb = get_2d_rotary_pos_embed(
    inner_dim // num_attention_heads,
    grid_crops_coords,
    (grid_height, grid_width),
    output_type="pt",
)

# Both paths return a (cos, sin) tuple of torch tensors; check they match.
torch.testing.assert_close(image_rotary_emb[0], image_rotary_emb_np[0])
torch.testing.assert_close(image_rotary_emb[1], image_rotary_emb_np[1])
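
The same comparison can be run with the embedding created on the GPU, to exercise the CUDA path discussed below (a sketch extending the script above; the device argument comes from this PR's description):

# Sketch: compare CUDA-created embeddings against the numpy-path output.
if torch.cuda.is_available():
    image_rotary_emb_cuda = get_2d_rotary_pos_embed(
        inner_dim // num_attention_heads,
        grid_crops_coords,
        (grid_height, grid_width),
        device="cuda",  # argument added in this PR
        output_type="pt",
    )
    print(torch.abs(image_rotary_emb_cuda[0].cpu() - image_rotary_emb_np[0]).max())
    print(torch.abs(image_rotary_emb_cuda[1].cpu() - image_rotary_emb_np[1]).max())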

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul @yiyixuxu

@HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hlky (Collaborator, Author) commented Dec 10, 2024

Downstream usage should be OK: this was already returning torch.Tensor, and integrations handle device casting.
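
For illustration, a typical integration-side cast might look like this (a generic sketch, not code from any particular pipeline):

import torch

# The embedding is a (cos, sin) tuple of torch.Tensors; an integration
# moves it to the execution device before use.
image_rotary_emb = (torch.zeros(4, 8), torch.zeros(4, 8))  # placeholder tensors
device = "cuda" if torch.cuda.is_available() else "cpu"
image_rotary_emb = tuple(t.to(device) for t in image_rotary_emb)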

@yiyixuxu (Collaborator) commented

Did we run the Hunyuan tests?

@hlky (Collaborator, Author) commented Dec 10, 2024

The checkpoint used in the slow test returns a 404:

"XCLiu/HunyuanDiT-0523", revision="refs/pr/2", torch_dtype=torch.float16

https://huggingface.co/XCLiu/HunyuanDiT-0523

@yiyixuxu (Collaborator) commented

Just running its docstring example manually would be fine for now; we should update the test too.

@hlky (Collaborator, Author) commented Dec 10, 2024

There's a slight change to the image.

import torch
from diffusers import HunyuanDiTPipeline
pipe = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
)
pipe.to("cuda")
prompt = "An astronaut riding a horse"
image = pipe(prompt, generator=torch.Generator("cuda").manual_seed(0)).images[0]

Main: [output image]

PR: [output image]

@hlky (Collaborator, Author) commented Dec 10, 2024

Unclear why, though; I'll run the test again. Edit: I hadn't run the reproduction on CUDA, which might account for the difference.

>>> torch.abs(image_rotary_emb[0].flatten() - image_rotary_emb_np[0].flatten()).max()
tensor(0.)
>>> torch.abs(image_rotary_emb[1].flatten() - image_rotary_emb_np[1].flatten()).max()
tensor(0.)

@hlky (Collaborator, Author) commented Dec 10, 2024

Yes, there's a very minor difference when we create the tensors on CUDA.

>>> torch.abs(image_rotary_emb[0].cpu().flatten() - image_rotary_emb_np[0].flatten()).max()
tensor(1.7881e-07)
>>> torch.abs(image_rotary_emb[1].cpu().flatten() - image_rotary_emb_np[1].flatten()).max()
tensor(1.7881e-07)

It's below torch.testing.assert_close's default tolerance for float32, though (rtol=1.3e-6, atol=1e-5): https://pytorch.org/docs/stable/testing.html
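
A minimal check confirming the observed difference passes at those defaults:

import torch

a = torch.zeros(4)
b = a + 1.7881e-07  # max abs difference observed on CUDA
# Passes: 1.7881e-07 is well below the float32 atol of 1e-5.
torch.testing.assert_close(a, b)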

cc @yiyixuxu

@DN6 added the roadmap (Add to current release roadmap) label on Dec 11, 2024
@hlky force-pushed the np-get-2d-rotary-pos-embed branch from af5ecd9 to f2e7731 on December 17, 2024 16:46
@hlky (Collaborator, Author) commented Dec 17, 2024

I've added output_type and a deprecation message, as in #10156.
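
For context, a minimal sketch of the deprecation pattern used in diffusers (the version string and message here are illustrative, not necessarily the PR's exact values):

from diffusers.utils import deprecate


def get_2d_rotary_pos_embed_sketch(output_type="np"):
    if output_type == "np":
        # Emits a deprecation warning pointing users at the torch path.
        deprecate(
            "output_type=='np'",
            "0.33.0",  # illustrative removal version
            "`output_type='np'` is deprecated; pass `output_type='pt'` for torch tensors.",
        )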

Labels: close-to-merge, roadmap
Project status: In Progress
4 participants