aten.native_group_norm.default #538

Open
jdh8 opened this issue Dec 2, 2024 · 6 comments · May be fixed by #556

jdh8 (Collaborator) commented Dec 2, 2024

No description provided.

swimdi self-assigned this Dec 3, 2024
swimdi linked a pull request Dec 4, 2024 that will close this issue
swimdi (Collaborator) commented Dec 9, 2024

Hi @ayerofieiev-tt, #556 cannot be merged right now because test_group_norm.py fails:

pytest tests/lowering/normalization/test_group_norm.py

FAILED tests/lowering/normalization/test_group_norm.py::test_group_norm[input_shape0-32-True] - AssertionError: 0.7651781854059824
FAILED tests/lowering/normalization/test_group_norm.py::test_group_norm[input_shape1-32-True] - AssertionError: 0.7730704167640353
FAILED tests/lowering/normalization/test_group_norm.py::test_group_norm[input_shape2-32-True] - AssertionError: 0.7757072186311633
FAILED tests/lowering/normalization/test_group_norm.py::test_group_norm[input_shape3-32-True] - AssertionError: 0.7739301248305729

I have the impression that this test passed before; I don't know why it fails now.

Because #556 is implemented based on test_group_norm.py in tt-metal, I also checked that test and found that it fails as well:

cd tt-metal
pytest tests/ttnn/unit_tests/operations/test_group_norm.py

FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_height_sharded[num_groups=32-W=32-H=32-C=320-N=1] - AssertionError: 0.6637590850804874
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=2-C=320-H=64-W=64-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7984854165794627
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=640-H=1-W=2048-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7983983472927303
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=640-H=1-W=4096-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7984879175666398
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=960-H=1-W=2048-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7984376872323262
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=960-H=1-W=4096-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7984574303213386
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=1280-H=1-W=512-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7983521590058813
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=1280-H=1-W=2048-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7984879110816785
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=1920-H=1-W=512-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7983146229245102
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=1920-H=1-W=2048-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7984574283259387
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=2560-H=1-W=512-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7983983458326283
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid_tile_layout[N=1-C=1280-H=1-W=512-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7648465865679823
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid_tile_layout[N=1-C=1280-H=1-W=2048-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7683442917280272
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid_tile_layout[N=1-C=2560-H=1-W=512-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7650540714895019

I'm not sure why the tt-metal CI doesn't catch this test. Can you help double-check test_group_norm.py on tt-metal?

bbradelTT commented:

These seem to be passing in the all-post-commit CI runs:
https://github.com/tenstorrent/tt-metal/actions/runs/12248006217

I found one of them in:
94_ttnn-unit-tests (wormhole_b0, N150) _ ttnn group 1 wormhole_b0 N150.txt

2024-12-10T03:03:26.7620904Z PASSED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=960-H=1-W=4096-num_groups=32-device_params={'l1_small_size': 0}]

bbradelTT commented:

I found that if I ran locally:

  • with a Debug build there were failures
  • with a Release build everything passed

bbradelTT commented:

I created tenstorrent/tt-metal#15868
I'm not sure when that can be looked at.

As a workaround, you would need to use a release build.

swimdi (Collaborator) commented Dec 11, 2024

Hi @bbradelTT:
OK, no wonder I had the impression that this case passed before. Using the Python package built by ./build_metal.sh, I can reproduce the success (pytest tests/lowering/normalization/test_group_norm.py passes).

But pytorch2.0_ttnn uses ttnn from the wheel package:

https://github.com/tenstorrent/tt-metal/releases/download/v0.53.1-rc15/metal_libs-0.53.1rc15+wormhole.b0-cp38-cp38-linux_x86_64.whl

So I can't choose between a debug or release build, which means I can't apply the workaround you suggested (the wheel package still fails).
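
For context, installing ttnn from that prebuilt wheel is just a pip install of the URL above; this is shown only as an illustration of the setup described, not a command quoted from the thread:

pip install https://github.com/tenstorrent/tt-metal/releases/download/v0.53.1-rc15/metal_libs-0.53.1rc15+wormhole.b0-cp38-cp38-linux_x86_64.whl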

BTW, even when I use the release build, these input variations still fail:

#555

You can reproduce with pytest tests/lowering/normalization/test_group_norm.py --runxfail or by running this code:

import torch
import ttnn
from tests.ttnn.utils_for_testing import assert_with_pcc

device = ttnn.open_device(device_id=0)

inputs = [
    (1, 320, 32, 32, 32),  # pass
    (1, 1280, 16, 16, 32),  # pass
    (2, 320, 64, 64, 32),  # pass
    (1, 1280, 1, 512, 32),  # pass
    (1, 1280, 8, 8, 32),  # failed
    (1, 2560, 8, 8, 32),  # failed
    (1, 256, 50, 68, 32),  # failed
    (1, 256, 25, 34, 32),  # failed
    (1, 256, 13, 17, 32),  # failed
    (1, 256, 7, 9, 32),  # failed
]

for input in inputs:
    N, C, H, W, num_groups = input

    grid_size = ttnn.CoreGrid(y=8, x=8)

    # torch input tensor
    torch_input_tensor = torch.rand((N, C, H, W), dtype=torch.bfloat16)
    torch_weight = torch.ones((C,), dtype=torch.bfloat16)
    torch_bias = torch.rand((C,), dtype=torch.bfloat16)
    torch_output_tensor = torch.nn.functional.group_norm(
        torch_input_tensor, num_groups, weight=torch_weight, bias=torch_bias
    )
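    # Reshape the NCHW reference output into the flattened (N, 1, H*W, C) layout
    # used by the ttnn path below, so the final comparison operates on matching shapes.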
    torch_output_tensor = torch_output_tensor.permute(0, 2, 3, 1).view(N, 1, W * H, C)

    # input tensor
    input_tensor = torch_input_tensor.permute(0, 2, 3, 1).view(N, 1, W * H, C)
    input_tensor = ttnn.from_torch(
        input_tensor,
        dtype=ttnn.DataType.BFLOAT16,
        layout=ttnn.ROW_MAJOR_LAYOUT,
        device=device,
        memory_config=ttnn.DRAM_MEMORY_CONFIG,
    )

    # input mask
    input_mask_tensor = ttnn.create_group_norm_input_mask(C, num_groups, grid_size.y)
    input_mask_tensor = ttnn.from_torch(
        input_mask_tensor,
        dtype=ttnn.DataType.BFLOAT8_B,
        layout=ttnn.TILE_LAYOUT,
        device=device,
        memory_config=ttnn.DRAM_MEMORY_CONFIG,
    )

    # gamma/beta
    gamma = ttnn.create_group_norm_weight_bias_rm(torch_weight, C, grid_size.y)
    beta = ttnn.create_group_norm_weight_bias_rm(torch_bias, C, grid_size.y)

    gamma_t = ttnn.from_torch(
        gamma,
        dtype=ttnn.DataType.BFLOAT16,
        layout=ttnn.ROW_MAJOR_LAYOUT,
        device=device,
        memory_config=ttnn.DRAM_MEMORY_CONFIG,
    )
    beta_t = ttnn.from_torch(
        beta,
        dtype=ttnn.DataType.BFLOAT16,
        layout=ttnn.ROW_MAJOR_LAYOUT,
        device=device,
        memory_config=ttnn.DRAM_MEMORY_CONFIG,
    )

    # shard config
    grid_coord = ttnn.CoreCoord(grid_size.x - 1, grid_size.y - 1)
    shard_grid = ttnn.CoreRangeSet({ttnn.CoreRange(ttnn.CoreCoord(0, 0), grid_coord)})
    shard_shape = N * H * W // grid_size.x, C // grid_size.y
    shard_spec = ttnn.ShardSpec(shard_grid, shard_shape, ttnn.ShardOrientation.COL_MAJOR, False)
    sharded_mem_config = ttnn.MemoryConfig(
        ttnn.types.TensorMemoryLayout.BLOCK_SHARDED, ttnn.types.BufferType.L1, shard_spec
    )
    input_tensor = ttnn.to_memory_config(input_tensor, sharded_mem_config)

    # groupnorm
    output_tensor = ttnn.group_norm(
        input_tensor,
        num_groups=num_groups,
        input_mask=input_mask_tensor,
        weight=gamma_t,
        bias=beta_t,
        memory_config=sharded_mem_config,
        core_grid=grid_size,
        inplace=False,
    )

    # output tensor
    output_tensor = ttnn.to_memory_config(output_tensor, ttnn.L1_MEMORY_CONFIG)
    output_tensor = ttnn.from_device(output_tensor)
    output_tensor = ttnn.to_torch(output_tensor)

    assert_with_pcc(torch_output_tensor, output_tensor, 0.9997)
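
For reference, the numbers in the AssertionError messages above are the measured PCC (Pearson correlation coefficient) values reported when the correlation falls below the test's threshold (0.9997 in the snippet above). A minimal sketch of such a comparison, as a hypothetical helper for illustration only and not the actual tests.ttnn.utils_for_testing implementation:

import torch

def pcc(expected: torch.Tensor, actual: torch.Tensor) -> float:
    # Flatten both tensors and compute the Pearson correlation coefficient.
    e = expected.flatten().to(torch.float32)
    a = actual.flatten().to(torch.float32)
    e = e - e.mean()
    a = a - a.mean()
    return float((e * a).sum() / (e.norm() * a.norm()))

def assert_with_pcc_sketch(expected, actual, threshold=0.9997):
    # Fails with the measured PCC, mirroring the values shown in the failing tests above.
    measured = pcc(expected, actual)
    assert measured >= threshold, measured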

swimdi moved this from Todo to In Progress in PyTorch 2.0 TT-NN Compiler Dec 12, 2024
bbradelTT commented:

I have a PR up for review: tenstorrent/tt-metal#16093
If you'd like, you could try it to see whether it unblocks you.
