
Reshape -> Transpose gives bad PCC #13745

Open
yieldthought opened this issue Oct 11, 2024 · 23 comments
Labels: bug (Something isn't working), llama3, LLM_bug, Op Generalization (Generalization and relaxations of requirements in Ops), P1

Comments

@yieldthought
Contributor

yieldthought commented Oct 11, 2024

Describe the bug

    tt_input = tt_input.reshape(1, 2048, 4, 128)
    tt_output = ttnn.transpose(tt_input, 1, 2)

gives 0.0 PCC compared to Torch:

    torch_ref = torch_input.view(1, 2048, 4, 128)
    torch_ref = torch_ref.transpose(1, 2)

To Reproduce

import torch
import ttnn
from tests.ttnn.utils_for_testing import assert_with_pcc

def test_transpose_with_reshape(device):
    # Create input tensor
    torch_input = torch.rand((1, 1, 2048, 512), dtype=torch.bfloat16)
    
    # TT operations
    tt_input = ttnn.from_torch(
        torch_input,
        dtype=ttnn.DataType.BFLOAT16,
        layout=ttnn.ROW_MAJOR_LAYOUT,
        device=device,
        memory_config=ttnn.L1_MEMORY_CONFIG,
    )
    tt_input = tt_input.reshape(1, 2048, 4, 128)
    tt_output = ttnn.transpose(tt_input, 1, 2)
    
    # Convert back to PyTorch for comparison
    tt_result = ttnn.to_torch(tt_output)
    
    # PyTorch reference operations
    torch_ref = torch_input.view(1, 2048, 4, 128)
    torch_ref = torch_ref.transpose(1, 2)
    
    # Compare results
    assert_with_pcc(torch_ref, tt_result, 0.9999)

Expected behavior
Expected this to match torch behaviour.

Please complete the following environment information:
Internal IRD-supplied N150

Additional context
Seen bringing up Llama 3.2

@yieldthought added the bug (Something isn't working), LLM_bug, Op Generalization (Generalization and relaxations of requirements in Ops), llama3, and P1 labels on Oct 11, 2024
@sjameelTT
Contributor

Have you tried ttnn.reshape(tt_input, (1, 2048, 4, 128))? The tensor.op APIs are unreliable at the moment since they haven't been updated; we will get to updating them once we're done getting full functionality.

@sjameelTT
Contributor

sjameelTT commented Oct 11, 2024

I tested on my branch sjameel/transpose_pad and it worked with ttnn.reshape. I will aim to get that into main soon.

@sjameelTT
Contributor

@yieldthought is this still an issue on main?

@ntarafdar
Contributor

@yieldthought can you comment if this is fixed? @sjameelTT just pushed a fix to transpose

@yieldthought
Contributor Author

The test as provided in this issue still fails on main:

expected_pytorch_result = tensor([[[[0.0508, 0.3359, 0.5977,  ..., 0.5312, 0.7422, 0.4688],
          [0.5156, 0.5273, 0.5117,  ..., 0.5898, 0.7...406, 0.1172, 0.8828],
          [0.4883, 0.9336, 0.3477,  ..., 0.6133, 0.1484, 0.0469]]]],
       dtype=torch.bfloat16)
actual_pytorch_result = TorchTensor([[[[0.0508, 0.3359, 0.5977,  ..., 0.5312, 0.7422, 0.4688],
               [0.6914, 0.9375, 0.6523,  ..., 0...9, 0.6562],
               [0.4883, 0.9336, 0.3477,  ..., 0.6133, 0.1484, 0.0469]]]],
            dtype=torch.bfloat16)
pcc = 0.9999

    def assert_with_pcc(expected_pytorch_result, actual_pytorch_result, pcc=0.9999):
        assert list(expected_pytorch_result.shape) == list(
            actual_pytorch_result.shape
        ), f"list(expected_pytorch_result.shape)={list(expected_pytorch_result.shape)} vs list(actual_pytorch_result.shape)={list(actual_pytorch_result.shape)}"
        pcc_passed, pcc_message = comp_pcc(expected_pytorch_result, actual_pytorch_result, pcc)
>       assert pcc_passed, construct_pcc_assert_message(pcc_message, expected_pytorch_result, actual_pytorch_result)
E       AssertionError: 0.015757835260159045

tests/ttnn/utils_for_testing.py:30: AssertionError

@cglagovichTT
Contributor

cglagovichTT commented Nov 25, 2024

@ntarafdar can you add this test and arbitrary variants of it to your sweeps? The shapes in this test can be parametrized nicely to give lots of coverage for reshape -> transpose.

In addition, please test {single device, multidevice} x {L1 interleaved, DRAM interleaved} x {RM, TILE}. If any of these configurations is not yet supported, the op should assert on it.
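
A minimal pytest-style sketch of the single-device part of that sweep, parametrizing the issue's repro. The test name, the second shape pair, and the use of ttnn.DRAM_MEMORY_CONFIG / ttnn.TILE_LAYOUT alongside the configs from the repro are assumptions for illustration; the multi-device fixtures are omitted.

import pytest
import torch
import ttnn
from tests.ttnn.utils_for_testing import assert_with_pcc


@pytest.mark.parametrize("layout", [ttnn.ROW_MAJOR_LAYOUT, ttnn.TILE_LAYOUT])
@pytest.mark.parametrize("memory_config", [ttnn.L1_MEMORY_CONFIG, ttnn.DRAM_MEMORY_CONFIG])
@pytest.mark.parametrize(
    "shape, new_shape",
    [((1, 1, 2048, 512), (1, 2048, 4, 128)), ((1, 1, 1024, 256), (1, 1024, 2, 128))],
)
def test_reshape_transpose_sweep(device, layout, memory_config, shape, new_shape):
    torch_input = torch.rand(shape, dtype=torch.bfloat16)

    tt_input = ttnn.from_torch(
        torch_input,
        dtype=ttnn.DataType.BFLOAT16,
        layout=layout,
        device=device,
        memory_config=memory_config,
    )
    # Same call as the original repro; swap in ttnn.reshape(tt_input, new_shape)
    # to cover the supported path as well.
    tt_input = tt_input.reshape(*new_shape)
    tt_output = ttnn.transpose(tt_input, 1, 2)
    tt_result = ttnn.to_torch(tt_output)

    torch_ref = torch_input.view(new_shape).transpose(1, 2)
    assert_with_pcc(torch_ref, tt_result, 0.9999)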

@sjameelTT
Contributor

This is a reshape issue, not a transpose issue. Replace:

    tt_input = tt_input.reshape(1, 2048, 4, 128)

with:

    tt_input = ttnn.reshape(tt_input, (1, 2048, 4, 128))

and it will pass. @ntarafdar we should probably make sure that the tensor.reshape pybind behaves the same as ttnn.reshape...
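
For reference, a sketch of the repro with only that one line swapped out (same shapes and configs as the original test; the function name here is just illustrative):

import torch
import ttnn
from tests.ttnn.utils_for_testing import assert_with_pcc


def test_transpose_with_ttnn_reshape(device):
    torch_input = torch.rand((1, 1, 2048, 512), dtype=torch.bfloat16)

    tt_input = ttnn.from_torch(
        torch_input,
        dtype=ttnn.DataType.BFLOAT16,
        layout=ttnn.ROW_MAJOR_LAYOUT,
        device=device,
        memory_config=ttnn.L1_MEMORY_CONFIG,
    )
    # Only change from the original repro: call the ttnn op
    # instead of the Tensor.reshape pybind.
    tt_input = ttnn.reshape(tt_input, (1, 2048, 4, 128))
    tt_output = ttnn.transpose(tt_input, 1, 2)
    tt_result = ttnn.to_torch(tt_output)

    torch_ref = torch_input.view(1, 2048, 4, 128).transpose(1, 2)
    assert_with_pcc(torch_ref, tt_result, 0.9999)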

@ntarafdar
Contributor

ahh CC: @jvegaTT

tt_input.reshape does a view change and ignores padding...
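
A plain-PyTorch analogy (not the tt-metal code) of why a metadata-only view that ignores how the buffer is physically laid out silently returns the wrong elements:

import torch

x = torch.arange(12).reshape(3, 4).t()        # logical shape (4, 3); storage is still the row-major (3, 4) buffer
metadata_only = x.as_strided((2, 6), (6, 1))  # reinterpret the raw buffer in place, like an unchecked view
proper = x.reshape(2, 6)                      # real reshape: follows the logical element order (copies here)
print(torch.equal(metadata_only, proper))     # False -> the view-style result holds the wrong values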

@ntarafdar
Contributor

I'm closing this issue, since this is now working with @sjameelTT's suggestion. @cglagovichTT please file a separate issue for additional testing.

@cglagovichTT
Contributor

I think the issue should stay open since tt_input.reshape(...) still has unexpected behavior. ttnn.reshape(...) is a workaround, so it can stay at P1.

@ntarafdar
Contributor

tt_input.reshape is not the API to use: we are using the ttnn API for transpose but a tensor API for reshape.
I agree this is a problem, but it is not the problem for this issue.

@cglagovichTT
Contributor

In that case tt_input.reshape should not be a valid call, so that pybind should be removed as part of resolving this issue.
The important thing here is that there's unexpected behavior from the user's perspective, which remains unresolved.

@ntarafdar
Contributor

ntarafdar commented Nov 25, 2024

This might be a more involved change.
A lot of APIs use this reshape because it abuses the data format under the hood to incorporate padding, and it is used inappropriately by developers everywhere.
We can see what needs to be done, but it might be a lot of work for little reward.

@ntarafdar ntarafdar reopened this Nov 25, 2024
@cglagovichTT
Contributor

I'd say that the reward is high. Unexpected API behavior like this costs developers lots of debug time.

@uaydonat
Contributor

Simple bugs like these are very time-consuming and annoying to customers.
There is no way for a customer/developer to know not to use tt_input.reshape. So it should either be removed completely, or only allow the specific use cases (for padding?) we know work, and assert on other cases while pointing to ttnn.reshape.

@jvegaTT
Contributor

jvegaTT commented Nov 26, 2024

tt_input.reshape just updates the metadata of the tensor without changing the data. This is sometimes what we want, since some reshapes involve no change in the physical locations of the data, or because we want to include some of the padding data from the padded shape in the logical shape. ttnn.reshape will quickly check whether it is valid to call tt_input.reshape and does so if possible; if not, it performs the actual reshape with the required data movement.

There are use cases for both of these functions. Would renaming one of them be the solution? Both are used extensively all over our code base, so it would be a very involved change.
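
A rough PyTorch analogy of the dispatch described above, not the actual tt-metal implementation; the wrapper name is made up:

import torch

def reshape_like_ttnn(t: torch.Tensor, shape):
    # Try the cheap metadata-only path first (analogous to tt_input.reshape);
    # fall back to a real data-moving reshape when the view is not valid.
    try:
        return t.view(*shape)
    except RuntimeError:
        return t.reshape(*shape)

x = torch.rand(1, 1, 2048, 512).transpose(2, 3)   # non-contiguous, so the view path is invalid
y = reshape_like_ttnn(x, (1, 2048, 4, 128))       # falls back to the copying path
print(y.shape, y.is_contiguous())                 # torch.Size([1, 2048, 4, 128]) True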

@cglagovichTT
Contributor

It sounds like tensor.reshape is doing a view without validating its inputs. I think a solution would be to change its name to something that indicates it's an internal, unsafe view op, or to add input validation and call it view. What do you think?

Also, someone can check this, but in PyTorch don't tensor.op(...) and torch.op(tensor, ...) usually have the same behavior?
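
For the PyTorch question above, a quick check in plain PyTorch (not ttnn) showing that the method and functional forms agree:

import torch

t = torch.rand(1, 2048, 4, 128)
# Method form and functional form produce identical results.
assert torch.equal(t.transpose(1, 2), torch.transpose(t, 1, 2))
assert torch.equal(t.reshape(1, 4, 2048, 128), torch.reshape(t, (1, 4, 2048, 128)))
print("method and functional forms match")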

nardoTT added commits that referenced this issue on Dec 4, Dec 5, and Dec 9, 2024
@yieldthought
Contributor Author

This test still fails on main with E AssertionError: 0.015991536341495492

nardoTT added commits that referenced this issue on Dec 16, 2024
### Ticket
#13745

### Problem description
Linking Tensor.reshape to ttnn.reshape. Adding Tensor.reshape as an
experimental operation under ttnn.


### Checklist
- [x] Post commit CI passes
https://github.com/tenstorrent/tt-metal/actions/runs/12202038977
- [x] Blackhole Post commit (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12204887897
- [ ] Model regression CI testing passes (if applicable)
- [x] Device performance regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12204897132
- [ ] New/Existing tests provide coverage for changes
@prajaramanTT

@yieldthought Is this still an open issue? If not, can you please close this ticket? Thanks.

@uaydonat
Contributor

On Dec 10, @yieldthought said the unit test was still failing.

@ntarafdar any updates since then? If not, you guys should try the unit test.

@yieldthought
Contributor Author

yieldthought commented Jan 13, 2025

The test in this issue still runs and still fails on main. What did you think the resolution was? Can you add the test code from this issue as a unit test on a branch, and only ping me here when it's passing, or when the API has been changed to prevent tensor.reshape, if that's the intended resolution?

@jvegaTT
Contributor

jvegaTT commented Jan 13, 2025

Hi, a fix was merged a couple of weeks ago, but it was subsequently reverted by Raymond because its interaction with a PR pushed just prior to it broke the testing infra. Nour is currently trying to re-land this fix; it is in the awaiting-reviewers stage: #16377
