aten.native_group_norm.default #538

Open
jdh8 opened this issue Dec 2, 2024 · 6 comments · May be fixed by #556

jdh8 (Collaborator) commented Dec 2, 2024

No description provided.

swimdi self-assigned this Dec 3, 2024
swimdi linked a pull request Dec 4, 2024 that will close this issue
swimdi (Collaborator) commented Dec 9, 2024

Hi @ayerofieiev-tt, #556 cannot be merged right now because test_group_norm.py fails:

pytest tests/lowering/normalization/test_group_norm.py

FAILED tests/lowering/normalization/test_group_norm.py::test_group_norm[input_shape0-32-True] - AssertionError: 0.7651781854059824
FAILED tests/lowering/normalization/test_group_norm.py::test_group_norm[input_shape1-32-True] - AssertionError: 0.7730704167640353
FAILED tests/lowering/normalization/test_group_norm.py::test_group_norm[input_shape2-32-True] - AssertionError: 0.7757072186311633
FAILED tests/lowering/normalization/test_group_norm.py::test_group_norm[input_shape3-32-True] - AssertionError: 0.7739301248305729

I have the impression that this test passed before; I don't know why it fails now.

Because #556 is implemented based on test_group_norm.py in tt-metal, I also checked that test and found that it fails as well:

cd tt-metal
pytest tests/ttnn/unit_tests/operations/test_group_norm.py

FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_height_sharded[num_groups=32-W=32-H=32-C=320-N=1] - AssertionError: 0.6637590850804874
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=2-C=320-H=64-W=64-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7984854165794627
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=640-H=1-W=2048-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7983983472927303
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=640-H=1-W=4096-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7984879175666398
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=960-H=1-W=2048-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7984376872323262
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=960-H=1-W=4096-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7984574303213386
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=1280-H=1-W=512-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7983521590058813
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=1280-H=1-W=2048-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7984879110816785
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=1920-H=1-W=512-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7983146229245102
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=1920-H=1-W=2048-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7984574283259387
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=2560-H=1-W=512-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7983983458326283
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid_tile_layout[N=1-C=1280-H=1-W=512-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7648465865679823
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid_tile_layout[N=1-C=1280-H=1-W=2048-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7683442917280272
FAILED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid_tile_layout[N=1-C=2560-H=1-W=512-num_groups=32-device_params={'l1_small_size': 0}] - AssertionError: 0.7650540714895019

I'm not sure why the tt-metal CI doesn't catch this test. Can you help double-check test_group_norm.py on tt-metal?

bbradelTT commented:

These seem to be passing in the all-post-commit CI runs:
https://github.com/tenstorrent/tt-metal/actions/runs/12248006217

I found one of them in:
94_ttnn-unit-tests (wormhole_b0, N150) _ ttnn group 1 wormhole_b0 N150.txt

2024-12-10T03:03:26.7620904Z PASSED tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=1-C=960-H=1-W=4096-num_groups=32-device_params={'l1_small_size': 0}]

bbradelTT commented:

I found that if I ran locally:

  • with a Debug build there were failures
  • with a Release build everything passed

bbradelTT commented:

I created tenstorrent/tt-metal#15868
I'm not sure when that can be looked at.

As a workaround, you would need to use a release build.

swimdi (Collaborator) commented Dec 11, 2024

Hi @bbradelTT:
OK, no wonder I had the impression that this case passed before. Using the Python package built by ./build_metal.sh, I can reproduce the success (pytest tests/lowering/normalization/test_group_norm.py passes).

But pytorch2.0_ttnn uses ttnn from the wheel package:

https://github.com/tenstorrent/tt-metal/releases/download/v0.53.1-rc15/metal_libs-0.53.1rc15+wormhole.b0-cp38-cp38-linux_x86_64.whl

So I can't choose between a debug or release build, which means I can't apply the workaround you suggested (the wheel package still fails).
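
For context, installing ttnn from that prebuilt wheel is just a pip install of the URL above; this is shown only as an illustration of the setup described, not a command quoted from the thread:

pip install https://github.com/tenstorrent/tt-metal/releases/download/v0.53.1-rc15/metal_libs-0.53.1rc15+wormhole.b0-cp38-cp38-linux_x86_64.whl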

BTW, even when I use the release build, these input variations still fail:

#555

You can reproduce with pytest tests/lowering/normalization/test_group_norm.py --runxfail or by running this code:

import torch
import ttnn
from tests.ttnn.utils_for_testing import assert_with_pcc

device = ttnn.open_device(device_id=0)

inputs = [
    (1, 320, 32, 32, 32),  # pass
    (1, 1280, 16, 16, 32),  # pass
    (2, 320, 64, 64, 32),  # pass
    (1, 1280, 1, 512, 32),  # pass
    (1, 1280, 8, 8, 32),  # failed
    (1, 2560, 8, 8, 32),  # failed
    (1, 256, 50, 68, 32),  # failed
    (1, 256, 25, 34, 32),  # failed
    (1, 256, 13, 17, 32),  # failed
    (1, 256, 7, 9, 32),  # failed
]

for input in inputs:
    N, C, H, W, num_groups = input

    grid_size = ttnn.CoreGrid(y=8, x=8)

    # torch input tensor
    torch_input_tensor = torch.rand((N, C, H, W), dtype=torch.bfloat16)
    torch_weight = torch.ones((C,), dtype=torch.bfloat16)
    torch_bias = torch.rand((C,), dtype=torch.bfloat16)
    torch_output_tensor = torch.nn.functional.group_norm(
        torch_input_tensor, num_groups, weight=torch_weight, bias=torch_bias
    )
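    # Reshape the NCHW reference output into the flattened (N, 1, H*W, C) layout
    # used by the ttnn path below, so the final comparison operates on matching shapes.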
    torch_output_tensor = torch_output_tensor.permute(0, 2, 3, 1).view(N, 1, W * H, C)

    # input tensor
    input_tensor = torch_input_tensor.permute(0, 2, 3, 1).view(N, 1, W * H, C)
    input_tensor = ttnn.from_torch(
        input_tensor,
        dtype=ttnn.DataType.BFLOAT16,
        layout=ttnn.ROW_MAJOR_LAYOUT,
        device=device,
        memory_config=ttnn.DRAM_MEMORY_CONFIG,
    )

    # input mask
    input_mask_tensor = ttnn.create_group_norm_input_mask(C, num_groups, grid_size.y)
    input_mask_tensor = ttnn.from_torch(
        input_mask_tensor,
        dtype=ttnn.DataType.BFLOAT8_B,
        layout=ttnn.TILE_LAYOUT,
        device=device,
        memory_config=ttnn.DRAM_MEMORY_CONFIG,
    )

    # gamma/beta
    gamma = ttnn.create_group_norm_weight_bias_rm(torch_weight, C, grid_size.y)
    beta = ttnn.create_group_norm_weight_bias_rm(torch_bias, C, grid_size.y)

    gamma_t = ttnn.from_torch(
        gamma,
        dtype=ttnn.DataType.BFLOAT16,
        layout=ttnn.ROW_MAJOR_LAYOUT,
        device=device,
        memory_config=ttnn.DRAM_MEMORY_CONFIG,
    )
    beta_t = ttnn.from_torch(
        beta,
        dtype=ttnn.DataType.BFLOAT16,
        layout=ttnn.ROW_MAJOR_LAYOUT,
        device=device,
        memory_config=ttnn.DRAM_MEMORY_CONFIG,
    )

    # shard config
    grid_coord = ttnn.CoreCoord(grid_size.x - 1, grid_size.y - 1)
    shard_grid = ttnn.CoreRangeSet({ttnn.CoreRange(ttnn.CoreCoord(0, 0), grid_coord)})
    shard_shape = N * H * W // grid_size.x, C // grid_size.y
    shard_spec = ttnn.ShardSpec(shard_grid, shard_shape, ttnn.ShardOrientation.COL_MAJOR, False)
    sharded_mem_config = ttnn.MemoryConfig(
        ttnn.types.TensorMemoryLayout.BLOCK_SHARDED, ttnn.types.BufferType.L1, shard_spec
    )
    input_tensor = ttnn.to_memory_config(input_tensor, sharded_mem_config)

    # groupnorm
    output_tensor = ttnn.group_norm(
        input_tensor,
        num_groups=num_groups,
        input_mask=input_mask_tensor,
        weight=gamma_t,
        bias=beta_t,
        memory_config=sharded_mem_config,
        core_grid=grid_size,
        inplace=False,
    )

    # output tensor
    output_tensor = ttnn.to_memory_config(output_tensor, ttnn.L1_MEMORY_CONFIG)
    output_tensor = ttnn.from_device(output_tensor)
    output_tensor = ttnn.to_torch(output_tensor)

    assert_with_pcc(torch_output_tensor, output_tensor, 0.9997)
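
For reference, the numbers in the AssertionError messages above are the measured PCC (Pearson correlation coefficient) values reported when the correlation falls below the test's threshold (0.9997 in the snippet above). A minimal sketch of such a comparison, as a hypothetical helper for illustration only and not the actual tests.ttnn.utils_for_testing implementation:

import torch

def pcc(expected: torch.Tensor, actual: torch.Tensor) -> float:
    # Flatten both tensors and compute the Pearson correlation coefficient.
    e = expected.flatten().to(torch.float32)
    a = actual.flatten().to(torch.float32)
    e = e - e.mean()
    a = a - a.mean()
    return float((e * a).sum() / (e.norm() * a.norm()))

def assert_with_pcc_sketch(expected, actual, threshold=0.9997):
    # Fails with the measured PCC, mirroring the values shown in the failing tests above.
    measured = pcc(expected, actual)
    assert measured >= threshold, measured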

swimdi moved this from Todo to In Progress in PyTorch 2.0 TT-NN Compiler Dec 12, 2024
bbradelTT commented:

I have a PR up for review: tenstorrent/tt-metal#16093
If you'd like, you could try it to see whether it unblocks you.
