Release v0.52.0-rc15 · tenstorrent/tt-metal

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with that release, not the versions on the main branch. The latest main may differ from the previous release.

The changelog below lists the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/10747521160

📦 Uncategorized

#0: Remove run_operation from async_runtime.hpp
- PR: #11757
#11640: Include simulation device in tt_cluster
- PR: #11766
#11342: Replace tt_lib with ttnn function in experimental/functional
- PR: #11356
#11649: update tt_lib with ttnn support for non working folder
- PR: #11654
Perf dashboard and batching support for Mistral-7B and Llama3.1-8B
- PR: #11603
Adding fix for llama CI failure caused by ttnn.experimental.tensor.typecast
- PR: #11765
Fold sharded support
- PR: #11722
#9450: add env flag to skip recompiling and reloading FW
- PR: #11681
Move semaphores into kernel config ring buffer
- PR: #11764
#10874: Enable test cases for concurrent instances in CCL all gather
- PR: #10885
[Falcon7b] Remove hf reference files and import from transformers instead
- PR: #11758
#11768: Fix watcher pause feature
- PR: #11780
[Improvement] Added some graph names in the separate file
- PR: #11732
Migrate CB configs into kernel config ring buffer
- PR: #11778
#0: Feed more data to visualizer
- PR: #11400
#11490: ttnn and tt_metal shapes are mixed
- PR: #11723
Migrate sharded ops from TTL to TTNN
- PR: #11546
#8865: Port ttnn ops to dispatch profiling infra
- PR: #11698
#11700: update write_tensor with copy_host_to_device_tensor
- PR: #11701
TTNN sweep low pic unit tests
- PR: #11775
Add sweeps for ops: topk, frac, trunc, ceil to TTNN
- PR: #11771
LLK Test Coverage Follow-up
- PR: #11715
Llama3.1 70b Prefill - MLP and Attention
- PR: #11724
#10866: Read profiler buffer with EnqueueReadBuffer in fast dispatch mode
- PR: #11781
Lpremovic/0 expand llk ctest coverage
- PR: #11653
#11313: Migrate layernorm_distributed to ttnn
- PR: #11696
[Blackhole Bringup] Fixes for maxpool
- PR: #11761
#11850: Remove Llama3.1-8B output matching to avoid blocking CI
- PR: #11851
modify keys within device_info
- PR: #11852
#0: remove extra arch-wormhole labels for single-card workflows
- PR: #11785
#0: fix cloud-virtual-machine label
- PR: #11863
#11564: added test for generating sample data with many different use cases to the visualizer
- PR: #11862
#0: Remove llk_io.cc for WH and BH as well. GS was removed in 7b8e627
- PR: #11864
#9527: Moving bcast to operations/data_movement
- PR: #11599
#10332: Make ttnn::event_synchronize block only in the app thread
- PR: #11543
#11554: Replace tt_lib in sweeps, integration_tests
- PR: #11556
#11877: Make dispatch core order in the core descriptor match for E75 with 1 and 2 CQs
- PR: #11878
#11845: fix worker ring direction assignment in reduce scatter
- PR: #11846
FD Optimizations/Cleanup
- PR: #11872
#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18
- PR: #11882
Revert "#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18"
- PR: #11887
#10163: Add backward support for remainder op
- PR: #9712
Added ttnn.hypot_bw unit test
- PR: #11843
#0: Add another codeowner for conv2d
- PR: #11849
#11334: Remove unnecessary code for previous ci/cd csvs
- PR: #11898
#0: Bump timeout for single-card perf tests to see if that helps with timeouts
- PR: #11893
Removed "" graph_consts.hpp
- PR: #11904
[Falcon7b] Re-enable decode perplexity test with seq len 2048
- PR: #11868
[Falcon7b] Fix duplicate loading of rotary embeddings in prefill/decode
- PR: #11871
[Falcon7b] Re-enable demo perf-mode tests on galaxy, update targets, prevent multinomial errors (during perf-mode) using nan-to-num
- PR: #11876
[Blackhole Bringup] Add pack_untilize tests & fixes
- PR: #11875
#0: Consolidate demo tests for single card and t3000 to use impls rather than copy
- PR: #11897
Collection of small dprint/watcher changes
- PR: #11906
#11917: disable test
- PR: #11918
#11706: Use new Conv2D API in UNet Shallow
- PR: #11902
#11925: Update ttnn.arange binding
- PR: #11926
#0: Remove test include from packet_demux
- PR: #11924
#7709: Fix exp like ops ttnn doc issues
- PR: #7879
#11126: Resnet Demo with new conv API
- PR: #11770
Added ttnn.argmax sweeps, API calls and unit tests
- PR: #11552
#10515: For matmul corner case, if CBs don't fit, choose different program config
- PR: #11892
[Mixtral8x7B] Increase demo max context length to 32k
- PR: #11777
Added ttnn.topk unit test
- PR: #11935
#0: (MINOR) Update to v0.52.0
- PR: #11946
#11847: Add tt-smi reset command environment variable for sweeps
- PR: #11901
#11000: Enable uint8 A2D and (un)pack reconfig
- PR: #11537
#0: Do not use mount-cloud-weka label because we may no longer need it as cloud fixed it
- PR: #11941
#0: fixed External Operation logging
- PR: #11958
#0: Update matmul_multi_core_reuse to support mixed precision
- PR: #11947
#11138: Move large global vars in prefetcher and dispatcher to the stack
- PR: #11922
Enabling BH L1 data cache
- PR: #11909
#0: Move Unary device operation to tmp
- PR: #11793
Moved tracked methods out of tensor
- PR: #11921
#11964: Only write branch if the repo is not detached
- PR: #11965
#11622: add concat sweep
- PR: #11733
#0: Refactor Python dynamic modules creation
- PR: #11798
#0: Update resnet test infra to print total batch size for multi device
- PR: #11966
#11930: Increase status checks
- PR: #11945
Convs on BH
- PR: #11977
#9630: assert out concat when concatenating along padded dimensions
- PR: #11869
Use product codes for cards instead of arch for eager-package-main
- PR: #11976
#11929: Move work_split_tilize
- PR: #11932
#11693: Move DeviceModule bindings and replace ttnn.experimental APIs
- PR: #11820
#11247: Remove in-place flag in binary operations
- PR: #11604
#11591: Move hack delay from trisc.cc to trisck.cc before run_kernel
- PR: #11963
#8865: Optimize softmax dispatch time
- PR: #11889
#0: skip yolov4 failing sub_modules
- PR: #11959
#11519: Restore path reservation for mms and convs
- PR: #11520
#5337: Fix Mixtral total number of generated tokens in perf benchmark
- PR: #11994
#11883: use fixed_string.size() instead of sizeof to ensure compatibility with newer versions of reflect
- PR: #11896
#11559: Replace tt_lib in tests/ttnn files
- PR: #11822
#11915: Add sweep vector tagging and related infra changes
- PR: #11970
#0: fix fetch q write assert by using correct data offset for enqueue write buffer
- PR: #11983
update conv path in CODEOWNERS
- PR: #11978
enable all enablable unit tests for convs with new api
- PR: #11981
Fix size_t compilation failure
- PR: #12003
Update perf and latest features for llm models (Aug 26)
- PR: #11905
Split up n300 demo tests into functionality and performance
- PR: #11969
#10718: Fix issue with negative pipeline queue times
- PR: #12010
#11642: demux ttnn::typecast into ttnn::experimental::typecast on gra…
- PR: #11985
#11569: Enable Conv2D WH unit tests for UNet shapes
- PR: #11589
#11591: Fix race by making only unpacker zero out RISCV_DEBUG_REG_DBG_FEATURE_DISABLE at start of kernel
- PR: #12011
Update CODEOWNERS
- PR: #12048
Add missing include to graph_trace_utils.hpp
- PR: #12050
#0: Always initialize l1_banking allocator even when size is 0
- PR: #12047
update slack notification include workflow run
- PR: #12054
#8868: Fixed conv for Stride>2
- PR: #11933
#11430: Refactoring moreh_mean
- PR: #11776
#11832: Remove tracking of writes per block and only track last block
- PR: #11999
#11644: Migrate AutoFormat to TTNN Experimental
- PR: #11823
Added ttnn.i0_bw unit test
- PR: #11891
#11938: Refactoring moreh_bmm
- PR: #12000
#11646: Replace ttnn.experimental.tensor.* in models/demos
- PR: #11943
Add support for cur_pos tensor arg in sdpa decode
- PR: #11788
#5659: Add Width Sharded support to Conv2d
- PR: #11582
Remove noinline attribute from sdpa_decode compute kernel
- PR: #12060
Updated sfpi compiler to address missing SFPNOP insertion
- PR: #12061
Move compute kernel config to TTNN
- PR: #11801
Add fold to resnet
- PR: #11940
[BugFix] Fixed tensor::is_allocated.
- PR: #12071
Revert "[BugFix] Fixed tensor::is_allocated."
- PR: #12082
#8598: sinh fix
- PR: #12056
#11646: Replace ttnn.experimental.tensor.* to ttnn.* in models/experimental, tests
- PR: #11821
#10754: Add data-parallel support for UNet Shallow on N300
- PR: #12062
#0: Fixed Conv2dConfig in broken tests
- PR: #12064
#0: Falcon40b T3K demo mismatch tokens fixed
- PR: #12105
#12069: Add catch and handling for device initialize exception, typic…
- PR: #12070
Point metal to new UMD main branch
- PR: #12097
Update CODEOWNERS
- PR: #12112
#11993: Fix offset calculation for uneven shard in reshard fast path
- PR: #12083
Update CODEOWNERS
- PR: #12114
#12117: Refactor DeviceMesh->MeshDevice, DeviceGrid->MeshShape
- PR: #12118
#11854: Move .umd that houses cluster descriptor to TT_METAL_HOME
- PR: #12113
Fused AllGather+Matmul
- PR: #11760
#12124: moreh_nll_loss: support large weight
- PR: #12126
[Bugfix] Fixed is allocated
- PR: #12109
#11990: Replace ttnn.experimental.tensor.* to ttnn.* in ttnn folder
- PR: #12005
#11132 Run Post-Commit Python Tests against wheels
- PR: #11870
#11109: refactoring moreh getitem forward
- PR: #12134
Revert "#11132 Run Post-Commit Python Tests against wheels"
- PR: #12149
#8865: Update reference dispatch times. Move tt_lib to ttnn.
- PR: #11991
#11809: migrate profiler into ttnn
- PR: #11942
Use new resnet test infra for GS
- PR: #12072
#12073: assert out on block and width sharded concat
- PR: #12111
[Bugfix] Fixed wrong capture counter
- PR: #12169
#11717: aligning jit build and collection of srcs used by multiple threads during risc compile
- PR: #12080
#11453: Add options to enable compilation with sanitizers
- PR: #12181
Revert "#11453: Add options to enable compilation with sanitizers"
- PR: #12183
#11453: Add options to enable compilation with sanitizers
- PR: #12185
#11623: Adding workaround for ND BH hang for MatmulMultiCoreMultiDRAMIn0MCastIn1MCast
- PR: #12186
Add UNet Shallow unit tests to post-commit test suite
- PR: #12145
#10808: use sharded concat in yolov4
- PR: #12182
#11858, #11859: Fix Dockerfile 20.04 and 22.04 sequence and requirements
- PR: #11861
#11720: Enable ttnn binary to be built using rpath as origin
- PR: #11721
add myself to codeowners on docker scripts and installation guide
- PR: #12202
Add GH workflow telemetry into prepare metal action
- PR: #12203
Complex imag and imag_bw, angle_bw TTNN sweeps
- PR: #12143
#11989: Code clean up
- PR: #12008
Support arbitrary prefill lengths in Mixtral demo
- PR: #12002
#5337: Fix Mixtral unit tests
- PR: #12213
