Some group_norm tests are failing in debug build #15868

Closed
bbradelTT opened this issue Dec 10, 2024 · 1 comment
Labels: bug, op_cat: fused, P1

Comments

@bbradelTT
Contributor

Repro steps:

  • Build in Debug mode.
  • Run the test_group_norm tests. Testing on GS and WH showed that some subset of the following may fail:
  • pytest tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid
  • pytest tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid_tile_layout
  • pytest tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_height_sharded

Expected Behaviour:

  • All of the listed tests pass.

Actual Behaviour:

  • some tests fail

See tenstorrent/pytorch2.0_ttnn#538

Might be related to incorrect circular buffer (CB) handling; see, e.g., #15458.

@bbradelTT
Contributor Author

On GS with Debug:

pytest "tests/ttnn/unit_tests/operations/test_group_norm.py::test_group_norm_with_block_sharded_v2_8x8_grid[N=2-C=320-H=64-W=64-num_groups=32-device_params={'l1_small_size': 0}]"

produces

E       RuntimeError: TT_ASSERT @ ../tt_metal/impl/buffers/circular_buffer_types.cpp:109: false
E       info:
E       Cannot set to globally allocated buffer. Circular buffer size 131072 B exceeds allocated L1 buffer bank size of 98304 B
...
E        --- tt::tt_metal::v0::CircularBufferConfig::set_globally_allocated_address(tt::tt_metal::v0::Buffer const&)
E        --- ttnn::operations::normalization::groupnorm_multi_core_sharded(tt::tt_metal::Tensor const&, std::__1::optional<tt::tt_metal::Tensor const> const&, std::__1::optional<tt::tt_metal::Tensor const> const&, std::__1::optional<tt::tt_metal::Tensor const> const&, tt::tt_metal::Tensor&, float, unsigned int, unsigned int, MathFidelity, tt::tt_metal::DataType, tt::umd::xy_pair, bool)
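
The assert compares the requested circular buffer size against the size of the L1 buffer it is being bound to. A minimal Python model of that comparison is sketched below; `check_globally_allocated_cb` is a hypothetical name used only for illustration, not the tt-metal implementation, and the two sizes are taken from the message above.

```python
def check_globally_allocated_cb(cb_size_bytes, bank_size_bytes):
    """Hypothetical stand-in for the size check behind the TT_ASSERT."""
    if cb_size_bytes > bank_size_bytes:
        raise RuntimeError(
            "Cannot set to globally allocated buffer. "
            f"Circular buffer size {cb_size_bytes} B exceeds "
            f"allocated L1 buffer bank size of {bank_size_bytes} B"
        )


def check_fails(cb_size_bytes, bank_size_bytes):
    """Return True if the check rejects the given sizes."""
    try:
        check_globally_allocated_cb(cb_size_bytes, bank_size_bytes)
    except RuntimeError:
        return True
    return False


# Sizes reported in the failing run.
reported_failure = check_fails(131072, 98304)
# A CB that is exactly as large as the buffer it is bound to is accepted.
matching_size_ok = not check_fails(98304, 98304)
```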

bbradelTT added a commit that referenced this issue Jan 8, 2025
### Ticket
Link to Github Issue
#15868

### Problem description
- Some group norm tests fail in the Debug build with an assert
about CB size
- The underlying reason is that the in0 CB was always sized to the
nearest tile, even when the input was row major and therefore smaller
### What's changed
- Set the CB size for in0 based on the per bank size to get the right
size for row major and tile input
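
As a rough illustration of why tile-based sizing overshoots for row-major input, the sketch below reproduces the two sizes from the assert for the failing case (N=2, C=320, H=64, W=64 on an 8x8 grid). It rests on assumptions that are not stated in the issue: 2-byte elements, 32x32 tiles, the tensor flattened to (N*H*W, C) before block sharding, and row-major shard rows padded to a 32-byte boundary. The helper names are hypothetical and are not part of tt-metal.

```python
TILE_DIM = 32         # assumed tile edge, in elements
ELEM_BYTES = 2        # assumed 2-byte elements (e.g. bfloat16)
ROW_ALIGN_BYTES = 32  # assumed alignment of each row-major shard row


def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return -(-value // multiple) * multiple


def tile_rounded_bytes(rows, cols):
    """Shard size when both dimensions are padded to whole tiles."""
    return round_up(rows, TILE_DIM) * round_up(cols, TILE_DIM) * ELEM_BYTES


def row_major_bytes(rows, cols):
    """Shard size when each row is padded only to the alignment."""
    return rows * round_up(cols * ELEM_BYTES, ROW_ALIGN_BYTES)


# Failing case from the issue, flattened to (N*H*W, C) = (8192, 320)
# and split evenly over an 8x8 grid (assumption).
shard_rows = (2 * 64 * 64) // 8  # 1024
shard_cols = 320 // 8            # 40

old_cb_size = tile_rounded_bytes(shard_rows, shard_cols)  # 131072
per_bank_size = row_major_bytes(shard_rows, shard_cols)   # 98304
```

Under these assumptions the tile-rounded size matches the 131072 B in the assert and the row-major size matches the 98304 B bank size, which is consistent with sizing in0 from the per-bank size instead.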

### Checklist
- [x] Post commit CI passes: across
https://github.com/tenstorrent/tt-metal/actions/runs/12360041457 and
https://github.com/tenstorrent/tt-metal/actions/runs/12374697868, all
sub-jobs pass at least once
- [x] Blackhole Post commit (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12360044285
- [x] Model regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12360048449
- [x] Device performance regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12360046481
- [ ] **(For models and ops writers)** Full [new
models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml)
tests passes
- [x] New/Existing tests provide coverage for changes (in Debug)