v0.53.0-rc3
Pre-release
Released by github-actions on 30 Sep 02:18 · 1597 commits to main since this release
Note
If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, rather than the documentation on the main branch, as there may be differences between the latest main and the previous release.
The changelog follows, showing the changes since the last release.
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/11097804939
📦 Uncategorized
- #12883: Add initial unit tests for N300
- PR: #12922
- #12499: Migrate moreh_norm, moreh_norm_backward operations from tt_eager to ttnn
- PR: #12500
- #12321: Migrate moreh_bmm, moreh_bmm_backward operations from tt_eager to ttnn
- PR: #12322
- Add more eltwise sweeps, add new functions in sweep_framework/utils.py
- PR: #13003
- #12690: Port moreh_softmax and moreh_softmax_backward to ttnn
- PR: #12698
- #0: Bump falcon7b device perf test because we have a real bump
- PR: #13008
- Aliu/tech reports
- PR: #13010
- #11332: Move ttnn/examples to ttnn/ttnn/examples so users can call the examples directly; they are not meant to be part of the ttnn API
- PR: #11612
- Add sweeps for sign, deg2rad, rad2deg, relu6
- PR: #12994
- Revert "#10016: jit_build: link substitutes, tdma_xmov, noc"
- PR: #13009
- #12952: Update test_ccl_on_tg.cpp to work on TGG as well as TG
- PR: #12982
- [skip ci] #0: ViT report edits
- PR: #13015
- #12879: Use () so that workflow_call actually captures the call when we trigger off completed workflow runs and add them to workflows to properly capture
- PR: #13012
- [skip ci] #13019 Create remove-stale-branches.yaml
- PR: #13020
- #13019 Update remove-stale-branches.yaml
- PR: #13021
- Add tiny tile support for Tensor, matmul
- PR: #12908
- [skip ci] #13019 Add default recipient
- PR: #13023
- build tt metal in docker in CI
- PR: #11923
- Revert "build tt metal in docker in CI"
- PR: #13027
- [skip ci] #0: ViT tech report
- PR: #13032
- Mchiou/11762 build tt metal in docker
- PR: #13033
- #13013: Added tests to run in TGG unit tests workflow
- PR: #13016
- [skip ci] #13019 Update remove-stale-branches.yaml
- PR: #13025
- Mchiou/0 fix docker build storage
- PR: #13042
- #11531: Autogenerate API rst stub files, add summary table on API page
- PR: #12075
- Add --no-advice to perf report, small fixes
- PR: #13048
- preserve fp32 precision
- PR: #12794
- #0: Remove unnecessary using declarations
- PR: #13056
- #12775: Cleanup docker run action
- PR: #12777
- #0: Update to gcc-12.x, take 2
- PR: #12999
- #12945: update galaxy/n150 eth dispatch cores
- PR: #13031
- #13070: fix SD
- PR: #13073
- Update Llama codeowners
- PR: #12116
- #0: fix uncaught edge case in page update cache and added it to the test suite
- PR: #13074
- #12754: Migrate moreh_nll_loss operations (reduced and unreduced) from tt_eager to ttnn
- PR: #12807
- #8633: Add TT_Fatal for full and ones op
- PR: #12921
- #12985: Expose ttnn::ccl::Topology at the Python level
- PR: #12988
- #12556: Add queue_id and optional output tensors to assign_bw
- PR: #12573
- Support for increasing 1-D row major int32 tensors by one
- PR: #12773
- #12828: update ttnn matmul doc string
- PR: #13071
- Llama 3.1 8b DRAM-sharded matmuls
- PR: #12869
- Update perf and latest features for llm models (Sept 23)
- PR: #13064
- Work around CSV reporting 64 cores for DRAM-sharded matmuls
- PR: #13108
- #0: Fix PCC to correct bound
- PR: #13110
- #0: Simplify llrt/memory API
- PR: #13067
- #0: Fix caching race
- PR: #13063
- #0: Fix merge error with 80d6e48
- PR: #13112
- #11004: moreh: use env var for kernel src search path
- PR: #12541
- #12328: Fix Llama3.1-8B MLP tests running out of L1
- PR: #13113
- #11769: extend support for transposing/permuting bfloat8 tensors on n…
- PR: #13018
- #12141: Fixed matmul shape validation issue
- PR: #12989
- #0: move BufferType to device kernel accessible location
- PR: #12984
- #12658: update sweep export script and create initial graph script
- PR: #13051
- #0: ViT on WH
- PR: #13072
- [skip ci] Update README.md (ViT on n150)
- PR: #13119
- #0: Bump resnet50 ttnn 2cq compile time because it likely regressed due to the gcc RISC-V upgrade
- PR: #13121
- #0: Update WH Resnet compile time threshold
- PR: #13115
- Flash decode improvements r2
- PR: #13028
- #0: added support for n_heads > 1 for page cache prefill
- PR: #13117
- #0: Bump mamba compile time as it's not that important and the model is still performant, need to unblock people…
- PR: #13130
- #0: move Layout enum to device accessible location
- PR: #13118
- #0: Bump distilbert compile time because it keeps failing on it
- PR: #13135
- #13088: Cleanup set-1 unary backward ops
- PR: #13096
- #10033: Add forward support for gcd and lcm
- PR: #10241
- #13150: Cleanup LCM, GCD Macro
- PR: #13151
- Llama3.1 8b demo with tracing
- PR: #13153
- #13058: update matmul bias size validation
- PR: #13104
- #0: (MINOR) Update to v0.53.0
- PR: #13165
- #0: try with python 3.10
- PR: #13168
- #13145: Temporarily revert Resnet on Galaxy to use slower config for first conv to avoid hangs
- PR: #13146
- #0: Remove unnecessary ProgramDeleter
- PR: #13134
- #13127: Switch python get_legacy_shape to shape.with_tile_padding()
- PR: #13124
- Add sweeps for remainder, fmod, minimum, maximum, logical_and eltwise ops, rename eltwise sweeps
- PR: #13099
- Fix Yolo tests after updating weights shape in conv2d
- PR: #13163
- #13172: Use lower python version and cache dependencies
- PR: #13173
- #11830: Move l1/dram/pcie alignment into HAL
- PR: #12983
- #13014: optimize slice by adding a 4D uint32_t array implementation o…
- PR: #13125
- Add llk support for cumsum and transpose_wh_dest with relevant tests
- PR: #12925
- Add numeric stable option for softmax
- PR: #13068
- #12878: Add links to job and pipeline for CI/CD analytics
- PR: #13183
- #0: fix CCL nightly tests
- PR: #13164
- #12919: Cleanup set-2 Unary Backward ops
- PR: #13138
- #8865: Add sharded tensor support to dispatch profile infra
- PR: #12871
- #0: Update CODEOWNERS for ttnn/ttnn/operations/moreh.py
- PR: #13185
- #13137: Revise moreh_arange operation
- PR: #13139
- #13095: Refactor moreh_nll_loss operations
- PR: #13097
- #10439: ttnn implementation of vgg model
- PR: #12511
- #13175: Add new category to summary table in sweeps query tool
- PR: #13176
- #5174: Disable command buffer FIFOs on BH
- PR: #13079
- Update CODEOWNERS
- PR: #13209
- Fix demo_trace and add on-device argmax to test_llama_perf
- PR: #13201
- #0: fix program caching bug in post_all_gather
- PR: #13224
- Do not require test dispatch workflow to run on "in-service" runners
- PR: #12660
- Add a description of typical labels one could use in the test dispatch workflow
- PR: #13228
- Add an option to split dprint output by risc
- PR: #13131
- Add new "choose your own pipeline" workflow
- PR: #13230
- #11962: remove uint8 unpack reconfig code
- PR: #13218
- Add tg and tgg frequent tests to "Choose your pipeline" workflow
- PR: #13236
- Add options to select a subset of pipelines that a user would like to run
- PR: #13237
- Update names of perf-models and perf-device-models jobs
- PR: #13238
- #13086: Revising moreh_getitem
- PR: #13087
- Sweeps: log, log1p, log2, log10
- PR: #13045
- #12721: Cleanup set-3 Unary Backward ops
- PR: #13207
- #13212: Cleanup set-4 Unary backward ops
- PR: #13214
- Add initial (very limited) support for line reduce scatter
- PR: #13133
- pack kernel binary memory spans into one
- PR: #12977