v0.52.0-rc2
Pre-release · github-actions released this on 29 Aug 02:13 · 2182 commits to main since this release
Note
If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, rather than the documentation on the main branch. There may be differences between the latest main and the previous release.
The changelog follows, showing the changes since the last release.
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/10607126656
📦 Uncategorized
- #0: Remove run_operation from async_runtime.hpp
- PR: #11757
- #11640: Include simulation device in tt_cluster
- PR: #11766
- #11342: Replace tt_lib with ttnn function in experimental/functional
- PR: #11356
- #11649: update tt_lib with ttnn support for non-working folder
- PR: #11654
- Perf dashboard and batching support for Mistral-7B and Llama3.1-8B
- PR: #11603
- Adding fix for llama CI failure caused by ttnn.experimental.tensor.typecast
- PR: #11765
- Fold sharded support
- PR: #11722
- #9450: add env flag to skip recompiling and reloading FW
- PR: #11681
- Move semaphores into kernel config ring buffer
- PR: #11764
- #10874: Enable test cases for concurrent instances in CCL all gather
- PR: #10885
- [Falcon7b] Remove hf reference files and import from transformers instead
- PR: #11758
- #11768: Fix watcher pause feature
- PR: #11780
- [Improvement] Added some graph names in a separate file
- PR: #11732
- Migrate CB configs into kernel config ring buffer
- PR: #11778
- #0: Feed more data to visualizer
- PR: #11400
- #11490: ttnn and tt_metal shapes are mixed
- PR: #11723
- Migrate sharded ops from TTL to TTNN
- PR: #11546
- #8865: Port ttnn ops to dispatch profiling infra
- PR: #11698
- #11700: update write_tensor with copy_host_to_device_tensor
- PR: #11701
- TTNN sweep low pic unit tests
- PR: #11775
- Add sweeps for ops: topk, frac, trunc, ceil to TTNN
- PR: #11771
- LLK Test Coverage Follow-up
- PR: #11715
- Llama3.1 70b Prefill - MLP and Attention
- PR: #11724
- #10866: Read profiler buffer with `EnqueueReadBuffer` in fast dispatch mode (a brief host-API sketch follows this list)
- PR: #11781
- Lpremovic/0 expand llk ctest coverage
- PR: #11653
- #11313: Migrate layernorm_distributed to ttnn
- PR: #11696
- [Blackhole Bringup] Fixes for maxpool
- PR: #11761
- #11850: Remove Llama3.1-8B output matching to avoid blocking CI
- PR: #11851
- modify keys within device_info
- PR: #11852
- #0: remove extra arch-wormhole labels for single-card workflows
- PR: #11785
- #0: fix cloud-virtual-machine label
- PR: #11863
- #11564: added test for generating sample data with many different use cases to the visualizer
- PR: #11862
- #0: Remove llk_io.cc for WH and BH as well. GS was removed in 7b8e627
- PR: #11864
- #9527: Moving bcast to operations/data_movement
- PR: #11599
- #10332: Make ttnn::event_synchronize block only in the app thread
- PR: #11543
- #11554: Replace tt_lib in sweeps, integration_tests
- PR: #11556
- #11877: Make dispatch core order in the core descriptor match for E75 with 1 and 2 CQs
- PR: #11878
- #11845: fix worker ring direction assignment in reduce scatter
- PR: #11846
- FD Optimizations/Cleanup
- PR: #11872
- #11881: Add `-Wno-vla-cxx-extension` to CMake to fix build on clang18
- PR: #11882
- Revert "#11881: Add
-Wno-vla-cxx-extension
to CMake to fix build on clang18"- PR: #11887
- #10163: Add backward support for remainder op
- PR: #9712
- Added ttnn.hypot_bw unit test
- PR: #11843
- #0: Add another codeowner for conv2d
- PR: #11849
- #11334: Remove unnecessary code for previous ci/cd csvs
- PR: #11898
- #0: Bump timeout for single-card perf tests to see if that helps with timeouts
- PR: #11893
- Removed "" graph_consts.hpp
- PR: #11904
- [Falcon7b] Re-enable decode perplexity test with seq len 2048
- PR: #11868
- [Falcon7b] Fix duplicate loading of rotary embeddings in prefill/decode
- PR: #11871
- [Falcon7b] Re-enable demo perf-mode tests on galaxy, update targets, prevent multinomial errors (during perf-mode) using nan-to-num
- PR: #11876
- [Blackhole Bringup] Add pack_untilize tests & fixes
- PR: #11875
- #0: Consolidate demo tests for single card and t3000 to use impls rather than copy
- PR: #11897
- Collection of small dprint/watcher changes
- PR: #11906
- #11917: disable test
- PR: #11918
- #11706: Use new Conv2D API in UNet Shallow
- PR: #11902
- #11925 Update ttnn.arange binding
- PR: #11926
- #0: Remove test include from packet_demux
- PR: #11924
- #7709: Fix exp like ops ttnn doc issues
- PR: #7879
- #11126: Resnet Demo with new conv API
- PR: #11770
- Added ttnn.argmax sweeps, API calls and unit tests
- PR: #11552
- #10515: For matmul corner case, if CBs don't fit, choose different program config
- PR: #11892
- [Mixtral8x7B] Increase demo max context length to 32k
- PR: #11777
- Added ttnn.topk unit test
- PR: #11935
- #0: (MINOR) Update to v0.52.0
- PR: #11946
- #11847: Add tt-smi reset command environment variable for sweeps
- PR: #11901
- #11000: Enable uint8 A2D and (un)pack reconfig
- PR: #11537
- #0: Do not use mount-cloud-weka label because we may no longer need it as cloud fixed it
- PR: #11941
- #0: fixed External Operation logging
- PR: #11958
- #0: Update matmul_multi_core_reuse to support mixed precision
- PR: #11947
- #11138: Move large global vars in prefetcher and dispatcher to the stack
- PR: #11922
- Enabling BH L1 data cache
- PR: #11909
- #0: Move Unary device operation to tmp
- PR: #11793
- Moved tracked methods out of tensor
- PR: #11921
- #11964: Only write branch if the repo is not detached
- PR: #11965
- #11622: add concat sweep
- PR: #11733
- #0: Refactor Python dynamic modules creation
- PR: #11798
- #0: Update resnet test infra to print total batch size for multi device
- PR: #11966
- #11930: Increase status checks
- PR: #11945
- Convs on BH
- PR: #11977
- #9630: assert out concat when concatenating along padded dimensions
- PR: #11869
- Use product codes for cards instead of arch for eager-package-main
- PR: #11976
- #11929: Move work_split_tilize
- PR: #11932
- #11693: Move DeviceModule bindings and replace ttnn.experimental APIs
- PR: #11820
- #11247: Remove in-place flag in binary operations
- PR: #11604
- #11591: Move hack delay from trisc.cc to trisck.cc before run_kernel
- PR: #11963
- #8865: Optimize softmax dispatch time
- PR: #11889
- #0: skip yolov4 failing sub_modules
- PR: #11959
- #11519: Restore path reservation for mms and convs
- PR: #11520
- #5337: Fix Mixtral total number of generated tokens in perf benchmark
- PR: #11994
- #11883: use fixed_string.size() instead of sizeof to ensure compatibility with newer versions of reflect
- PR: #11896
- #11559: Replace tt_lib in tests/ttnn files
- PR: #11822
- #11915: Add sweep vector tagging and related infra changes
- PR: #11970
- #0: fix fetch q write assert by using correct data offset for enqueue write buffer
- PR: #11983
- update conv path in CODEOWNERS
- PR: #11978
- enable all enablable unit tests for convs with new api
- PR: #11981
- Fix size_t compilation failure
- PR: #12003
- Update perf and latest features for llm models (Aug 26)
- PR: #11905
- Split up n300 demo tests into functionality and performance
- PR: #11969
- #10718: Fix issue with negative pipeline queue times
- PR: #12010
- #11642: demux ttnn::typecast into ttnn::experimental::typecast on gra…
- PR: #11985
- #11569: Enable Conv2D WH unit tests for UNet shapes
- PR: #11589
- #11591: Fix race by making only unpacker zero out RISCV_DEBUG_REG_DBG_FEATURE_DISABLE at start of kernel
- PR: #12011
- Update CODEOWNERS
- PR: #12048
- Add missing include to graph_trace_utils.hpp
- PR: #12050
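
For context on the fast-dispatch profiler entry above (#10866, `EnqueueReadBuffer`), the sketch below shows the general shape of a host-side buffer read through the command queue in fast dispatch mode. This is a minimal illustration assuming the `tt_metal` host API exposed via `tt_metal/host_api.hpp` (CreateDevice, CreateBuffer, EnqueueWriteBuffer/EnqueueReadBuffer); struct fields and exact signatures may differ between releases, so treat it as a sketch rather than an excerpt from the changed code.

```cpp
// Minimal sketch: write then read back a DRAM buffer via the command queue
// (fast dispatch). Assumes the tt_metal host API around this release line;
// buffer-config fields and signatures may differ in other versions.
#include <vector>
#include "tt_metal/host_api.hpp"

using namespace tt::tt_metal;

int main() {
    Device* device = CreateDevice(/*device_id=*/0);
    CommandQueue& cq = device->command_queue();

    // A single-page interleaved DRAM buffer.
    constexpr uint32_t buffer_size = 2048;
    InterleavedBufferConfig config{
        .device = device,
        .size = buffer_size,
        .page_size = buffer_size,
        .buffer_type = BufferType::DRAM};
    auto buffer = CreateBuffer(config);

    // Non-blocking write, then a blocking read-back of the same buffer.
    std::vector<uint32_t> src(buffer_size / sizeof(uint32_t), 0xDEADBEEF);
    std::vector<uint32_t> dst;
    EnqueueWriteBuffer(cq, buffer, src, /*blocking=*/false);
    EnqueueReadBuffer(cq, buffer, dst, /*blocking=*/true);

    CloseDevice(device);
    return 0;
}
```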