Commit

Merge branch 'main' into miacim/dprint/test_for_dprint_tensix_dest_reg
miacim authored Oct 10, 2024
2 parents e14293e + ab725ab commit 2029d7d
Showing 742 changed files with 14,366 additions and 28,467 deletions.
17 changes: 17 additions & 0 deletions .github/actions/ensure-active-weka-mount/action.yml
@@ -0,0 +1,17 @@
name: "Ensure Active Weka Mount"
description: "Make sure weka mount is active"

inputs:
  os:
    description: 'Runner OS'
    required: true

runs:
  using: "composite"
  steps:
    - name: Ensure active weka mount
      shell: bash
      run: |
        sudo systemctl restart mnt-MLPerf.mount
        sudo /etc/rc.local
        ls -al /mnt/MLPerf/bit_error_tests
1 change: 1 addition & 0 deletions .github/workflows/build-artifact.yaml
@@ -62,6 +62,7 @@ jobs:
build-artifact:
needs: build-docker-image
if: always()
timeout-minutes: 30
strategy:
matrix:
arch: ${{ fromJson(inputs.arch || '["grayskull", "wormhole_b0", "blackhole"]') }}
6 changes: 1 addition & 5 deletions .github/workflows/perf-models-impl.yaml
@@ -28,11 +28,7 @@ jobs:
- name: Enable Performance mode
run: |
sudo cpupower frequency-set -g performance
- name: Ensure weka mount is active
run: |
sudo systemctl restart mnt-MLPerf.mount
sudo /etc/rc.local
ls -al /mnt/MLPerf/bit_error_tests
- uses: ./.github/actions/ensure-active-weka-mount
- name: Set up dynamic env vars for build
run: |
echo "TT_METAL_HOME=$(pwd)" >> $GITHUB_ENV
6 changes: 1 addition & 5 deletions .github/workflows/t3000-demo-tests-impl.yaml
@@ -28,11 +28,7 @@ jobs:
- name: Enable performance mode
run: |
sudo cpupower frequency-set -g performance
- name: Ensure weka mount is active
run: |
sudo systemctl restart mnt-MLPerf.mount
sudo /etc/rc.local
ls -al /mnt/MLPerf/bit_error_tests
- uses: ./.github/actions/ensure-active-weka-mount
- name: Set up dynamic env vars for build
run: |
echo "TT_METAL_HOME=$(pwd)" >> $GITHUB_ENV
6 changes: 1 addition & 5 deletions .github/workflows/t3000-frequent-tests-impl.yaml
@@ -27,11 +27,7 @@ jobs:
runs-on: ["arch-wormhole_b0", "config-t3000", "in-service", "pipeline-functional"]
steps:
- uses: tenstorrent-metal/metal-workflows/.github/actions/[email protected]
- name: Ensure weka mount is active
run: |
sudo systemctl restart mnt-MLPerf.mount
sudo /etc/rc.local
ls -al /mnt/MLPerf/bit_error_tests
- uses: ./.github/actions/ensure-active-weka-mount
- name: Set up dynamic env vars for build
run: |
echo "TT_METAL_HOME=$(pwd)" >> $GITHUB_ENV
6 changes: 1 addition & 5 deletions .github/workflows/t3000-model-perf-tests-impl.yaml
@@ -30,11 +30,7 @@ jobs:
- name: Enable performance mode
run: |
sudo cpupower frequency-set -g performance
- name: Ensure weka mount is active
run: |
sudo systemctl restart mnt-MLPerf.mount
sudo /etc/rc.local
ls -al /mnt/MLPerf/bit_error_tests
- uses: ./.github/actions/ensure-active-weka-mount
- name: Set up dynamic env vars for build
run: |
echo "TT_METAL_HOME=$(pwd)" >> $GITHUB_ENV
1 change: 1 addition & 0 deletions .github/workflows/t3000-unit-tests-impl.yaml
@@ -30,6 +30,7 @@ jobs:
- name: Set up dynamic env vars for build
run: |
echo "TT_METAL_HOME=$(pwd)" >> $GITHUB_ENV
- uses: ./.github/actions/ensure-active-weka-mount
- uses: actions/download-artifact@v4
with:
name: TTMetal_build_${{ matrix.test-group.arch }}
2 changes: 1 addition & 1 deletion .github/workflows/tg-frequent-tests-impl.yaml
@@ -36,7 +36,7 @@ jobs:
run: tar -xvf ttm_${{ matrix.test-group.arch }}.tar
- uses: ./.github/actions/install-python-deps
- name: Run frequent regression tests
timeout-minutes: 60
timeout-minutes: 90
run: |
source ${{ github.workspace }}/python_env/bin/activate
cd $TT_METAL_HOME
6 changes: 1 addition & 5 deletions .github/workflows/tg-model-perf-tests-impl.yaml
@@ -37,11 +37,7 @@ jobs:
- name: Enable performance mode
run: |
sudo cpupower frequency-set -g performance
- name: Ensure weka mount is active
run: |
sudo systemctl restart mnt-MLPerf.mount
sudo /etc/rc.local
ls -al /mnt/MLPerf/bit_error_tests
- uses: ./.github/actions/ensure-active-weka-mount
- name: Set up dynamic env vars for build
run: |
echo "TT_METAL_HOME=$(pwd)" >> $GITHUB_ENV
6 changes: 1 addition & 5 deletions .github/workflows/tgg-model-perf-tests-impl.yaml
@@ -37,11 +37,7 @@ jobs:
- name: Enable performance mode
run: |
sudo cpupower frequency-set -g performance
- name: Ensure weka mount is active
run: |
sudo systemctl restart mnt-MLPerf.mount
sudo /etc/rc.local
ls -al /mnt/MLPerf/bit_error_tests
- uses: ./.github/actions/ensure-active-weka-mount
- name: Set up dynamic env vars for build
run: |
echo "TT_METAL_HOME=$(pwd)" >> $GITHUB_ENV
2 changes: 1 addition & 1 deletion .github/workflows/ttnn-post-commit-wrapper.yaml
@@ -25,4 +25,4 @@ jobs:
uses: ./.github/workflows/ttnn-post-commit.yaml
with:
arch: ${{ matrix.test-group.arch}}
runner-label: ${{ matrix.test-group.runner-label}}
runner-label: ${{ matrix.test-group.runner-label }}
52 changes: 51 additions & 1 deletion .github/workflows/ttnn-run-sweeps.yaml
@@ -13,37 +13,54 @@ on:
- add
- ccl.line_all_gather
- ccl.all_gather_n300
- eltwise.unary.abs.abs_pytorch2
- eltwise.unary.relu.relu
- eltwise.unary.relu.relu_pytorch2
- eltwise.unary.gelu.gelu
- eltwise.unary.gelu.gelu_pytorch2
- eltwise.unary.hardsigmoid.hardsigmoid_pytorch2
- eltwise.unary.leaky_relu.leaky_relu_pytorch2
- eltwise.unary.cos.cos
- eltwise.unary.cos.cos_pytorch2
- eltwise.unary.sin.sin
- eltwise.unary.sin.sin_pytorch2
- eltwise.unary.tril.tril_pytorch2
- eltwise.unary.clamp.clamp
- eltwise.unary.clip.clip
- eltwise.unary.cbrt.cbrt
- eltwise.unary.rsub.rsub
- eltwise.unary.rsub.rsub_pytorch2
- eltwise.unary.rsqrt.rsqrt_pytorch2
- eltwise.unary.rdiv.rdiv
- eltwise.unary.frac.frac
- eltwise.unary.ceil.ceil
- eltwise.unary.ceil.ceil_pytorch2
- eltwise.unary.trunc.trunc
- eltwise.unary.floor.floor
- eltwise.unary.floor.floor_pytorch2
- eltwise.unary.clone.clone
- eltwise.unary.elu.elu
- eltwise.unary.elu.elu_pytorch2
- eltwise.unary.erfc.erfc
- eltwise.unary.exp.exp
- eltwise.unary.exp.exp_pytorch2
- eltwise.unary.exp2.exp2
- eltwise.unary.expm1.expm1
- eltwise.unary.tanh.tanh
- eltwise.unary.tanh.tanh_pytorch2
- eltwise.unary.sign.sign
- eltwise.unary.rad2deg.rad2deg
- eltwise.unary.deg2rad.deg2rad
- eltwise.unary.relu6.relu6
- eltwise.unary.log.log
- eltwise.unary.log.log_pytorch2
- eltwise.unary.log1p.log1p
- eltwise.unary.log2.log2
- eltwise.unary.log10.log10
- eltwise.unary.bitwise.bitwise_and
- eltwise.unary.bitwise.bitwise_left_shift
- eltwise.unary.bitwise.bitwise_not
- eltwise.unary.bitwise.bitwise_not_pytorch2
- eltwise.unary.bitwise.bitwise_or
- eltwise.unary.bitwise.bitwise_right_shift
- eltwise.unary.bitwise.bitwise_xor
@@ -55,9 +72,10 @@ on:
- eltwise.unary.erfinv.erfinv
- eltwise.unary.i0.i0
- eltwise.unary.silu.silu
- eltwise.unary.silu.silu_pytorch2
- eltwise.unary.glu.glu
- eltwise.unary.lgamma.lgamma
- eltwise.unary.sigmoid.sigmoid
- eltwise.unary.sigmoid.sigmoid_pytorch2
- eltwise.unary.sigmoid_accurate.sigmoid_accurate
- eltwise.unary.tril.tril
- eltwise.unary.triu.triu
@@ -74,8 +92,28 @@ on:
- eltwise.unary.sinh.sinh
- eltwise.unary.relu_min.relu_min
- eltwise.unary.relu_max.relu_max
- eltwise.unary.softplus.softplus
- eltwise.unary_backward.clamp_bw.clamp_bw
- eltwise.unary_backward.hardtanh_bw.hardtanh_bw
- eltwise.unary_backward.mul_bw.mul_bw
- eltwise.unary_backward.softplus_bw.softplus_bw
- eltwise.unary_backward.threshold_bw.threshold_bw
- eltwise.unary_backward.div_bw.div_bw
- eltwise.unary_backward.log_bw.log_bw
- eltwise.unary_backward.relu6_bw.relu6_bw
- eltwise.unary.lgamma
- eltwise.unary.logit
- eltwise.unary.mish
- eltwise.unary.multigammaln
- eltwise.unary.isfinite
- eltwise.unary.isinf
- eltwise.unary.isnan
- eltwise.unary.isneginf
- eltwise.unary.isposinf
- eltwise.binary.add.add_all_pytorch2
- eltwise.binary.subtract.subtract
- eltwise.binary.multiply.multiply
- eltwise.binary.multiply.mul_tensor_pytorch2
- eltwise.binary.div.div
- eltwise.binary.div_no_nan.div_no_nan
- eltwise.binary.logical_or.logical_or_
@@ -90,21 +128,33 @@ on:
- eltwise.binary.remainder.remainder
- eltwise.binary.squared_difference.squared_difference
- eltwise.binary.squared_difference_output.squared_difference_output
- eltwise.binary.remainder.remainder_scalar_pytorch2
- eltwise.binary.bcast.bcast_h_sharded
- eltwise.binary.bcast.bcast
- eltwise.binary.eq.eq_scalar_pytorch2
- eltwise.binary.gt.gt_scalar_pytorch2
- eltwise.binary.le.le_tensor_pytorch2
- eltwise.binary.fmod.fmod
- eltwise.binary.floor_divide.floor_divide_pytorch2
- eltwise.binary.logaddexp.logaddexp
- eltwise.binary.ldexp.ldexp
- eltwise.binary.hypot.hypot
- eltwise.binary.xlogy.xlogy
- eltwise.composite.binary.addalpha.addalpha
- eltwise.composite.binary.subalpha.subalpha
- eltwise.composite.binary.minimum.minimum
- eltwise.composite.binary.minimum.minimum_pytorch2
- eltwise.composite.binary.maximum.maximum
- eltwise.composite.binary.maximum.maximum_pytorch2
- eltwise.composite.binary.pow.pow_pytorch2
- eltwise.composite.binary.pow.pow_scalar_pytorch2
- eltwise.composite.binary.pow.pow_tensor_pytorch2
- eltwise.ternary.addcmul.addcmul
- eltwise.ternary.addcdiv.addcdiv
- eltwise.ternary.mac.mac
- eltwise.ternary.lerp
- eltwise.ternary.where.where
- eltwise.ternary.where.where_pytorch2
- matmul.full.matmul_default_block_sharded
- matmul.full.matmul_default_height_sharded
- matmul.full.matmul_default_interleaved
5 changes: 5 additions & 0 deletions CMakeLists.txt
@@ -1,6 +1,11 @@
cmake_minimum_required(VERSION 3.16)
cmake_policy(VERSION 3.16)

# Sanity check, forgetting to clone submodules is a common omission and results in a poor error message
if (NOT EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/tt_metal/third_party/umd/CMakeLists.txt")
message(FATAL_ERROR "Missing submodules. Run: git submodule update --init --recursive")
endif()

############################################
# Project setup
############################################
8 changes: 4 additions & 4 deletions CODEOWNERS
@@ -126,10 +126,10 @@ tests/ttnn/distributed/ @cfjchu @ayerofieiev-tt @dmakoviichuk-tt
/models/ @tt-rkim @uaydonat
/models/*/**
models/conv_on_device_utils*.py @mywoodstock @shwetankTT @sankarmanoj-tt
functional_*/ @eyonland @patrickroberts @yan-zaretskiy @cfjchu @xanderchin
models/demos @eyonland @patrickroberts @yan-zaretskiy @cfjchu @xanderchin
functional_*/ @uaydonat @esmalTT
models/demos @uaydonat @tt-rkim
models/demos/metal_BERT_large_11 @tt-aho @TT-BrianLiu
models/demos/wormhole @uaydonat @eyonland
models/demos/wormhole @uaydonat @tt-rkim
models/demos/t3000 @uaydonat
models/demos/falcon7b_common @skhorasganiTT @djordje-tt @uaydonat
models/demos/wormhole/mamba @esmalTT @uaydonat @kpaigwar
@@ -145,7 +145,7 @@ models/demos/t3000/llama3_70b @cglagovichTT @uaydonat @johanna-rock-tt @djordje-
models/demos/t3000/mixtral8x7b @yieldthought @mtairum @uaydonat
models/demos/tg/llama3_70b @cglagovichTT @uaydonat @johanna-rock-tt @djordje-tt @kpaigwar
models/demos/tg/falcon7b @skhorasganiTT @djordje-tt @uaydonat
models/demos/grayskull @uaydonat @eyonland
models/demos/grayskull @uaydonat @tt-rkim
models/demos/**/*resnet* @mywoodstock @shwetankTT @tt-aho
models/experimental/functional_unet @esmalTT @uaydonat @mywoodstock
models/perf/ @uaydonat @tt-rkim
20 changes: 10 additions & 10 deletions METALIUM_GUIDE.md
@@ -126,7 +126,7 @@ kernel:
namespace NAMESPACE {
void MAIN {
mm_init();
acquire_dst(tt::DstMode::Tile);
acquire_dst();
cb_wait_front(tt::CB::c_in0, /* number of tiles */ 1);
cb_wait_front(tt::CB::c_in1, /* number of tiles */ 1);
@@ -140,7 +140,7 @@ void MAIN {
pack_tile(0, tt::CB::c_out0);
cb_push_back(tt::CB::c_out0, /* number of tiles */ 1);
release_dst(tt::DstMode::Tile);
release_dst();
}
} // namespace NAMESPACE
```
@@ -149,7 +149,7 @@ It takes two matrix tiles from `tt::CB::c_in0` and `tt::CB::c_in0` L1 and
conducts a single-tile matrix multiplication. Finally, it packs the result to
`tt::CB::c_out0`.

Note that tile registers are acquired by `acquire_dst(..)`, but actually we can
Note that tile registers are acquired by `acquire_dst()`, but actually we can
use `tile_regs_..()` functions for the more fine-grained tile register lock
mechanism. At the end of this section, we will explain more details.

@@ -226,26 +226,26 @@ inline __attribute__((always_inline)) void cb_wait_front(uint32_t cbid, uint32_t
}
```

Another interesting function is `acquire_dst(tt::DstMode mode)`:
Another interesting function is `acquire_dst()`:
* The UNPACK kernel has an empty one:
```
inline __attribute__((always_inline)) void acquire_dst(tt::DstMode mode) {
inline __attribute__((always_inline)) void acquire_dst() {
;
;
}
```
* The MATH kernel waits for DEST available:
```
inline __attribute__((always_inline)) void acquire_dst(tt::DstMode mode) {
inline __attribute__((always_inline)) void acquire_dst() {
( llk_math_wait_for_dest_available() );
;
}
```
* The UNPACK kernel waits for the end of MATH kernel:
```
inline __attribute__((always_inline)) void acquire_dst(tt::DstMode mode) {
inline __attribute__((always_inline)) void acquire_dst() {
;
( llk_packer_wait_for_math_done() );
@@ -254,14 +254,14 @@ inline __attribute__((always_inline)) void acquire_dst(tt::DstMode mode) {

[Its implementation](https://github.com/tenstorrent/tt-metal/blob/6d4951a20ca4c392888f924f038ae0780a8cc656/tt_metal/include/compute_kernel_api/reg_api.h#L28-L32) matches the preprocessed code:
```
ALWI void acquire_dst(tt::DstMode mode) {
ALWI void acquire_dst() {
MATH(( llk_math_wait_for_dest_available() ));
PACK(( llk_packer_wait_for_math_done() ));
}
```

Based on the implementation of `acquire_dst(..)`, if we use it, we can guess it
Based on the implementation of `acquire_dst()`, if we use it, we can guess it
executes UNPACK, MATH, PACK in order, which will help you to follow the
execution order and instructions that actually run on each kernel.

@@ -292,7 +292,7 @@ ALWI void tile_regs_release() {
}
```

We can replace `acquire_dst(..)` and `release_dst(..)` from the above example
We can replace `acquire_dst()` from the above example
with `tile_regs_..()` functions like:
```
namespace NAMESPACE {