v0.54.0-rc2
Pre-release
github-actions released this 19 Dec 02:02 · 72 commits to main since this release
Note: If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with that release, not the documentation on the main branch. The latest main may differ from this release.
The changelog below lists the changes since the last release.
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/12404541627
📦 Uncategorized
- #15836: Update reads, writes, and synchronize ttnn apis to take in sub device ids
- PR: #15812
- #13405: TTNN implementation of LENET model
- PR: #13473
- Unvendor nlohmann json
- PR: #15956
- Updated install_dependencies.sh to skip installing additional recommended packages and skip prompting for user input for certain package installations
- PR: #15977
- #0: Fix conv_transpose2d initting wrong compute_kernel_config variant
- PR: #15987
- Fix t3k unit test pipeline
- PR: #15986
- Run matmul based Conv2d with input from DRAM
- PR: #15861
- Add selu sweep
- PR: #15547
- Add TG support to llama3 family
- PR: #15724
- Fix Llama rope scaling factor, improve accuracy
- PR: #15909
- Let ttnn.reshape support 0 volume tensors
- PR: #15289
- #0: Update Llama3 README
- PR: #16006
- #0: Minor fix to Llama3 model config for TG
- PR: #16008
- #13944: Redesign memory packing API
- PR: #15980
- #0: Get rid of run_pre_post_commit_regressions* scripts and split CPP tests as much as we can
- PR: #15968
- Create new FD frequent pipeline to isolate unstable pgm benchmark tests
- PR: #16010
- Revert "#13405: TTNN implementation of LENET model (#13473)"
- PR: #16009
- #0: Dedup code in pytensor using generic lambdas and duck typing
- PR: #15989
- #14353: DRAM Read Alignment for Layernorm
- PR: #15993
- Afuller/fix clang tidy scan
- PR: #16017
- #0: Support arch-specific sfpi releases
- PR: #15831
- Enable too-small-loop-variable check
- PR: #15984
- Remove built cache of previous git commits.
- PR: #15344
- [tt-train] Make tests to open and close device explicitly
- PR: #15982
- Update ttcnn.md
- PR: #16025
- #0: Add bc to docker container for pgm dispatch math
- PR: #16030
- #16012: Revert conv2d changes because of perf regressions, pcc regressions, and increase in runtime
- PR: #16019
- Update ttcnn.md
- PR: #16031
- Enable noexcept-move-ctor check
- PR: #16018
- More updates to ttcnn.md
- PR: #16032
- disable workflow telemetry in prepare-metal-run
- PR: #16034
- Add support for pretty printing Conv2dConfig
- PR: #16027
- [tt-train] TT-train build is broken in main
- PR: #16035
- #0: created interleaved to sharded e2e sweep test
- PR: #16016
- Add support for padding along width dimension to ttnn.pad
- PR: #15985
- Bump umd
- PR: #15967
- #0: Prevent slice from padding up a 0 volume tensor
- PR: #15988
- #0: support unequal ranked inputs for broadcast in binary_ng
- PR: #15957
- #16014: Fix yolo4 e2e perf measurement
- PR: #16044
- Update CODEOWNERS - add experimental CCL section
- PR: #16039
- #15780: div ops debug
- PR: #15992
- Revert "#16012: Revert conv2d changes because of perf regressions, pc…
- PR: #16045
- #13127: Make TensorLayout::compute_physical_shard_shape public
- PR: #16023
- Link Tensor.reshape to ttnn.reshape
- PR: #15669
- #0: Fix merge conflicts originating from #15289
- PR: #16062
- Integrate chunked prefill into t3k Llama3-70B
- PR: #15921
- Bump MagicEnum to v0.9.7
- PR: #16065
- #15944: Fix pybind of create_sub_device_manager_with_fabric to call the correct function.
- PR: #16056
- [tt-train] Add option to disable wandb in examples
- PR: #16069
- Update perf and latest features for llm models (Dec 16)
- PR: #16060
- #16070: Use the same Docker image as built
- PR: #16071
- [tt-train] Bump magic_enum from 0.9.6 to 0.9.7
- PR: #16074
- Update ttcnn.md
- PR: #16077
- #13643: Extend binary-ng math support to match all primitive binary ops.
- PR: #16068
- #14530: remove up front padding from generic reduce
- PR: #16053
- Revert "#0: Fix merge conflicts originating from #15289"
- PR: #16080
- Revert "Link Tensor.reshape to ttnn.reshape"
- PR: #16081
- #15061: Implement multi-device tensor distribution APIs in terms of C++ ttnn tensors
- PR: #15886
- #0: Allow ttnn.pad to pad Tensor to an odd width in row major
- PR: #16079
- #15565 Add unit test to show sharding ttnn.from_torch problems
- PR: #15827
- #14977: conv config to use higher cores.
- PR: #15962
- Revert "#15565 Add unit test to show sharding ttnn.from_torch problems"
- PR: #16086
- [UMD] Removed set_*_params calls and constants
- PR: #15908
- #0: Remove some dead code
- PR: #16084
- Updated installation script
- PR: #16101
- Python -> Python3
- PR: #16063
- Add transpose WH sharded, generalize row major permute when N > 4, and do a minor refactor of ttnn::permute
- PR: #15881
- Adding ND support for tilize/untilize with padding
- PR: #15933
- [Llama3.2-11b vLLM Integration] Add support for paged cross attention, fixes for continuous batching, simplified decode forward call
- PR: #16076
- #0: Enable Local Sweeps and Use a Faster Interprocess Queue
- PR: #16098
- #15601: Implement support for MeshDevice::reshape(..)
- PR: #16029
- Remove setup_core_to_tlb_map
- PR: #16048
- #0: Let sharded_to_interleaved handle interleaved input
- PR: #16116
- #0: separate validation of conv weight and bias.
- PR: #15990
- #0: Minor refactor of pytensor and tensor implementation files
- PR: #16108
- C++ files should not be part of the API of a library
- PR: #16123
- #15857: Forge sweep test
- PR: #15858
- #15857: Unary forge sweep tests
- PR: #15901
- Fix some more namespace pollution caused by using namespace tt::tt_metal
- PR: #16090
- #15713 Bad Eltwise Binary ZEROACC
- PR: #16094
- #15565 Fix unit test to show sharding ttnn.from_torch problems
- PR: #16088
- Fix paged SDPA decode CB sizing issue
- PR: #16059
- Reland async dispatch with workaround for hang.
- PR: #16121
- #16119: Add forge traces to matmul and reduce sweeps
- PR: #16139
- #10034: Binary shift operators
- PR: #16055
- #0: Remove incorrect memory span assert
- PR: #16136
- Add forge sweeps for slice and transpose
- PR: #16112
- #0: Move memory config serialization in the corresponding header away from types.hpp
- PR: #16151
- #16114: Allow Binarized Programs to be Reused across WH Devices
- PR: #16120
- #0: aligning conv2d transpose as conv
- PR: #16128
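One entry above, support for unequal ranked inputs for broadcast in binary_ng (PR #15957), concerns how operand shapes of different rank are combined. As a rough illustration only, the sketch below implements the general NumPy-style broadcasting rule in plain Python. It assumes, without confirmation from these notes, that binary_ng follows the same right-aligned rule. The helper name broadcast_shape is hypothetical and is not part of ttnn.

```python
def broadcast_shape(a, b):
    """Return the broadcast result shape of two shapes (hypothetical helper).

    Assumes NumPy-style rules, not taken from ttnn: shapes are right-aligned,
    the lower-ranked shape is implicitly left-padded with 1s, and each aligned
    pair of dimensions must be equal or contain a 1.
    """
    rank = max(len(a), len(b))
    # Left-pad the lower-ranked shape with 1s so both shapes have equal rank.
    a = (1,) * (rank - len(a)) + tuple(a)
    b = (1,) * (rank - len(b)) + tuple(b)
    out = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            out.append(da)  # equal dims, or b's size-1 dim stretches to da
        elif da == 1:
            out.append(db)  # a's size-1 dim stretches to db
        else:
            raise ValueError(f"incompatible dimensions {da} and {db}")
    return tuple(out)


# A rank-4 activation combined with a rank-1 per-channel vector.
print(broadcast_shape((1, 32, 64, 128), (128,)))  # (1, 32, 64, 128)
```

The rank-1 operand is treated as shape (1, 1, 1, 128) before the per-dimension check, which is what allows inputs of unequal rank to be combined without an explicit reshape.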