Merge branch 'main' into bfilipovicTT/sharded-sweeps-1
bfilipovicTT authored Dec 20, 2024
2 parents 6aa8e2f + 4512b9f commit 48d9f6b
Showing 445 changed files with 16,638 additions and 3,728 deletions.
1 change: 1 addition & 0 deletions .github/actions/install-python-deps/action.yml
@@ -18,6 +18,7 @@ runs:
cache-dependency-path: |
tt_metal/python_env/requirements-dev.txt
docs/requirements-docs.txt
tests/sweep_framework/requirements-sweeps.txt
pyproject.toml
create_venv.sh
install-cmd: ./create_venv.sh
2 changes: 2 additions & 0 deletions .github/workflows/blackhole-post-commit.yaml
@@ -24,6 +24,8 @@ jobs:
build-docker-image:
uses: ./.github/workflows/build-docker-artifact.yaml
secrets: inherit
with:
os: "ubuntu-22.04-amd64"
build-artifact:
needs: build-docker-image
uses: ./.github/workflows/build-artifact.yaml
17 changes: 10 additions & 7 deletions .github/workflows/perf-device-models-impl.yaml
@@ -2,6 +2,11 @@ name: "[internal] Single-card Device perf regressions impl"

on:
workflow_call:
inputs:
os:
required: false
type: string
default: "ubuntu-20.04"

jobs:
device-perf:
@@ -22,22 +27,20 @@ jobs:
LD_LIBRARY_PATH: ${{ github.workspace }}/build/lib
runs-on: ${{ matrix.test-info.runs-on }}
steps:
- uses: tenstorrent/tt-metal/.github/actions/checkout-with-submodule-lfs@main
- name: Ensure weka mount is active
run: |
sudo systemctl restart mnt-MLPerf.mount
sudo /etc/rc.local
ls -al /mnt/MLPerf/bit_error_tests
- uses: tenstorrent/tt-metal/.github/actions/checkout-with-submodule-lfs@main
- name: Set up dynamic env vars for build
run: |
echo "TT_METAL_HOME=$(pwd)" >> $GITHUB_ENV
- uses: actions/download-artifact@v4
- uses: ./.github/actions/prepare-metal-run
with:
name: TTMetal_build_${{ matrix.test-info.arch }}_profiler
- name: Extract files
run: tar -xvf ttm_${{ matrix.test-info.arch }}.tar
- uses: ./.github/actions/install-python-deps
- name: Run device performance regressions
arch: ${{ matrix.test-info.arch }}
is_profiler: 'true'
- name: ${{ matrix.test-group.name }} tests
timeout-minutes: ${{ matrix.test-info.timeout }}
run: |
source python_env/bin/activate
1 change: 1 addition & 0 deletions .github/workflows/perf-device-models.yaml
@@ -11,6 +11,7 @@ jobs:
uses: ./.github/workflows/build-artifact.yaml
with:
tracy: true
os: "ubuntu-20.04-amd64"
secrets: inherit
device-perf:
needs: build-artifact-profiler
15 changes: 15 additions & 0 deletions .github/workflows/ttnn-run-sweeps.yaml
@@ -29,13 +29,17 @@ on:
- eltwise.unary.hardsigmoid.hardsigmoid_pytorch2
- eltwise.unary.leaky_relu.leaky_relu_pytorch2
- eltwise.unary.abs.abs
- eltwise.unary.abs.abs_forge
- eltwise.unary.cos.cos
- eltwise.unary.cos.cos_pytorch2
- eltwise.unary.cos.cos_forge
- eltwise.unary.sin.sin
- eltwise.unary.sin.sin_pytorch2
- eltwise.unary.sin.sin_forge
- eltwise.unary.tril.tril_pytorch2
- eltwise.unary.clamp.clamp
- eltwise.unary.clamp.clamp_sharded
- eltwise.unary.clamp.clamp_forge
- eltwise.unary.clamp.clamp_pytorch2
- eltwise.unary.clamp.clamp_min_pytorch2
- eltwise.unary.clip.clip
@@ -44,6 +48,7 @@ on:
- eltwise.unary.rsub.rsub
- eltwise.unary.rsub.rsub_pytorch2
- eltwise.unary.rsqrt.rsqrt_pytorch2
- eltwise.unary.rsqrt.rsqrt_forge
- eltwise.unary.rdiv.rdiv
- eltwise.unary.frac.frac
- eltwise.unary.frac.frac_sharded
@@ -53,17 +58,20 @@ on:
- eltwise.unary.trunc.trunc_sharded
- eltwise.unary.floor.floor
- eltwise.unary.floor.floor_sharded
- eltwise.unary.floor.floor_forge
- eltwise.unary.floor.floor_pytorch2
- eltwise.unary.clone.clone
- eltwise.unary.elu.elu
- eltwise.unary.elu.elu_pytorch2
- eltwise.unary.erfc.erfc
- eltwise.unary.exp.exp
- eltwise.unary.exp.exp_forge
- eltwise.unary.exp.exp_pytorch2
- eltwise.unary.exp2.exp2
- eltwise.unary.expm1.expm1
- eltwise.unary.tanh.tanh
- eltwise.unary.tanh.tanh_pytorch2
- eltwise.unary.tanh.tanh_forge
- eltwise.unary.atanh.atanh
- eltwise.unary.atan.atan
- eltwise.unary.sign.sign
@@ -72,9 +80,11 @@ on:
- eltwise.unary.relu6.relu6
- eltwise.unary.log.log
- eltwise.unary.log.log_pytorch2
- eltwise.unary.log.log_forge
- eltwise.unary.log1p.log1p
- eltwise.unary.log2.log2
- eltwise.unary.log10.log10
- eltwise.unary.sqrt.sqrt_forge
- eltwise.unary.bitwise.bitwise_and
- eltwise.unary.bitwise.bitwise_left_shift
- eltwise.unary.bitwise.bitwise_not
@@ -85,9 +95,11 @@ on:
- eltwise.unary.log_sigmoid.log_sigmoid
- eltwise.unary.logical_not.logical_not_
- eltwise.unary.logical_not.logical_not
- eltwise.unary.logical_not.logical_not_forge
- eltwise.unary.logical_not.logical_not_output
- eltwise.unary.logical_not.logical_not_pytorch2
- eltwise.unary.neg.neg_pytorch2
- eltwise.unary.neg.neg_forge
- eltwise.unary.erf.erf
- eltwise.unary.erfinv.erfinv
- eltwise.unary.i0.i0
@@ -184,6 +196,7 @@ on:
- eltwise.unary.lgamma.lgamma
- eltwise.unary.lgamma.lgamma_sharded
- eltwise.unary.logit.logit
- eltwise.unary.logit.logit_forge
- eltwise.unary.logit.logit_sharded
- eltwise.unary.mish.mish
- eltwise.unary.mish.mish_sharded
@@ -326,10 +339,12 @@ on:
- data_movement.concat.concat_pytorch2
- data_movement.slice.slice_pytorch2_rm
- data_movement.slice.slice_pytorch2_tiled
- data_movement.slice.slice_forge
- data_movement.permute.permute
- data_movement.permute.permute_pytorch2_tiled
- data_movement.permute.permute_pytorch2_rm
- data_movement.transpose.transpose_pytorch2
- data_movement.transpose.transpose_forge
- data_movement.transpose.transpose_interleaved
- data_movement.transpose.t_pytorch2
- data_movement.copy.copy
9 changes: 1 addition & 8 deletions CMakeLists.txt
@@ -253,18 +253,11 @@ add_link_options(
if(ENABLE_CODE_TIMERS)
add_compile_definitions(TT_ENABLE_CODE_TIMERS)
endif()
if(ENABLE_TRACY)
add_compile_definitions(TRACY_ENABLE)
add_compile_options(-fno-omit-frame-pointer)
add_link_options(-rdynamic)
endif()
include(tracy)

############################################################################################################################
# Build subdirectories
############################################################################################################################
if(ENABLE_TRACY)
include(tracy)
endif()

add_subdirectory(tt_metal)
add_subdirectory(ttnn)
12 changes: 6 additions & 6 deletions README.md
@@ -28,17 +28,17 @@
| [Falcon 7B](./models/demos/wormhole/falcon7b) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | 71 | 17.6 | 26 | 563.2 | [v0.53.0-rc44](https://github.com/tenstorrent/tt-metal/tree/v0.53.0-rc44) | |
| [Mistral 7B](./models/demos/wormhole/mistral7b) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | | 9.9 | 25 | 316.8 | [v0.51.0-rc28](https://github.com/tenstorrent/tt-metal/tree/v0.51.0-rc28) | |
| [Mamba 2.8B](./models/demos/wormhole/mamba) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | 48 | 12.3 | 41 | 393.6 | [v0.51.0-rc26](https://github.com/tenstorrent/tt-metal/tree/v0.51.0-rc26) | |
| [Llama 3.1 8B](./models/demos/llama3) | 1 | [n150](https://tenstorrent.com/hardware/wormhole) | 202 | 28.6 | 23 | 28.6 | [v0.53.1-rc7](https://github.com/tenstorrent/tt-metal/tree/v0.53.1-rc7) | |
| [Llama 3.2 1B](./models/demos/llama3) | 1 | [n150](https://tenstorrent.com/hardware/wormhole) | 71 | 90.8 | 160 | 90.8 | [v0.53.1-rc7](https://github.com/tenstorrent/tt-metal/tree/v0.53.1-rc7) | |
| [Llama 3.2 3B](./models/demos/llama3) | 1 | [n150](https://tenstorrent.com/hardware/wormhole) | 112 | 49.1 | 60 | 49.1 | [v0.53.1-rc7](https://github.com/tenstorrent/tt-metal/tree/v0.53.1-rc7) | |
| [Llama 3.1 8B](./models/demos/llama3) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | 151 | 22.8 | 23 | 729.6 | [v0.53.1-rc23](https://github.com/tenstorrent/tt-metal/tree/v0.53.1-rc23) | |
| [Llama 3.2 1B](./models/demos/llama3) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | 56 | 55.1 | 160 | 1763.2 | [v0.53.1-rc23](https://github.com/tenstorrent/tt-metal/tree/v0.53.1-rc23) | |
| [Llama 3.2 3B](./models/demos/llama3) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | 96 | 35.0 | 60 | 1120.0 | [v0.53.1-rc23](https://github.com/tenstorrent/tt-metal/tree/v0.53.1-rc23) | |
| [Falcon 7B (DP=8)](./models/demos/t3000/falcon7b) | 256 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 97 | 14.6 | 26 | 3737.6 | [v0.53.0-rc44](https://github.com/tenstorrent/tt-metal/tree/v0.53.0-rc44) | |
| [Llama 3.1 70B (TP=8)](./models/demos/t3000/llama3_70b) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 190 | 15.1 | 20 | 483.2 | [v0.53.0-rc36](https://github.com/tenstorrent/tt-metal/tree/v0.53.0-rc36) | [384f179](https://github.com/tenstorrent/vllm/tree/384f1790c3be16e1d1b10de07252be2e66d00935) |
| [Falcon 40B (TP=8)](./models/demos/t3000/falcon40b) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | | 5.3 | 36 | 169.6 | [v0.53.1-rc7](https://github.com/tenstorrent/tt-metal/tree/v0.53.1-rc7) | |
| [Falcon 40B (TP=8)](./models/demos/t3000/falcon40b) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | | 5.3 | 36 | 169.6 | [v0.53.1-rc23](https://github.com/tenstorrent/tt-metal/tree/v0.53.1-rc23) | |
| [Mixtral 8x7B (TP=8)](./models/demos/t3000/mixtral8x7b) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 230 | 14.6 | 33 | 467.2 | [v0.53.0-rc44](https://github.com/tenstorrent/tt-metal/tree/v0.53.0-rc44) | |
| [Falcon 7B (DP=32)](./models/demos/tg/falcon7b) | 1024 | [Galaxy](https://tenstorrent.com/hardware/galaxy) | 242 | 4.4 | 26 | 4505.6 | [v0.53.0-rc33](https://github.com/tenstorrent/tt-metal/tree/v0.53.0-rc33) | |
| [Llama 3.1 70B (DP=4, TP=8)](./models/demos/t3000/llama3_70b) | 128 | [Galaxy](https://tenstorrent.com/hardware/galaxy) | 190 | 14.3 | 20 | 1835.5 | [v0.52.0-rc31](https://github.com/tenstorrent/tt-metal/tree/v0.52.0-rc31) | |

> **Last Update:** December 7, 2024
> **Last Update:** December 16, 2024
>
> **Notes:**
>
@@ -113,7 +113,7 @@ Get started with [simple kernels](https://docs.tenstorrent.com/tt-metalium/lates
- [Data Formats](./tech_reports/data_formats/data_formats.md) (updated Sept 7th)
- [Reconfiguring Data Formats](./tech_reports/data_formats/reconfig_data_format.md) (updated Oct 17th)
- [Handling special floating-point numbers](./tech_reports/Handling_Special_Value/special_values.md) (updated Oct 5th)
- [Allocator](./tech_reports/memory/allocator.md) (Updated Oct 30th)
- [Allocator](./tech_reports/memory/allocator.md) (Updated Dec 19th)
- [Tensor Layouts](./tech_reports/tensor_layouts/tensor_layouts.md) (updated Sept 6th)
- [Saturating DRAM Bandwidth](./tech_reports/Saturating_DRAM_bandwidth/Saturating_DRAM_bandwidth.md) (updated Sept 6th)
- [Flash Attention on Wormhole](./tech_reports/FlashAttention/FlashAttention.md) (updated Sept 6th)
18 changes: 13 additions & 5 deletions cmake/tracy.cmake
@@ -1,17 +1,21 @@
# Built as outlined in Tracy documentation (pg.12)
set(TRACY_HOME ${PROJECT_SOURCE_DIR}/tt_metal/third_party/tracy)

if(NOT ENABLE_TRACY)
# Stub Tracy::TracyClient to provide the headers which themselves provide stubs
add_library(TracyClient INTERFACE)
add_library(Tracy::TracyClient ALIAS TracyClient)
target_include_directories(TracyClient SYSTEM INTERFACE ${TRACY_HOME}/public)
return()
endif()

add_subdirectory(${TRACY_HOME})

set_target_properties(
TracyClient
PROPERTIES
EXCLUDE_FROM_ALL
TRUE
)

set_target_properties(
TracyClient
PROPERTIES
LIBRARY_OUTPUT_DIRECTORY
"${PROJECT_BINARY_DIR}/lib"
ARCHIVE_OUTPUT_DIRECTORY
@@ -22,6 +26,10 @@ set_target_properties(
"tracy"
)

target_compile_definitions(TracyClient PUBLIC TRACY_ENABLE)
target_compile_options(TracyClient PUBLIC -fno-omit-frame-pointer)
target_link_options(TracyClient PUBLIC -rdynamic)

# Our current fork of tracy does not have CMake support for these subdirectories
# Once we update, we can change this
include(ExternalProject)
16 changes: 15 additions & 1 deletion dependencies/CMakeLists.txt
@@ -72,7 +72,7 @@ endif()
# magic_enum : https://github.com/Neargye/magic_enum
############################################################################################################################

CPMAddPackage(NAME magic_enum GITHUB_REPOSITORY Neargye/magic_enum GIT_TAG v0.9.6)
CPMAddPackage(NAME magic_enum GITHUB_REPOSITORY Neargye/magic_enum GIT_TAG v0.9.7)

############################################################################################################################
# fmt : https://github.com/fmtlib/fmt
@@ -97,3 +97,17 @@ CPMAddPackage(NAME pybind11 GITHUB_REPOSITORY pybind/pybind11 GIT_TAG b8f28551cc
############################################################################################################################

CPMAddPackage(NAME json GITHUB_REPOSITORY nlohmann/json GIT_TAG v3.9.1)

############################################################################################################################
# xtensor : https://github.com/xtensor-stack/xtensor
############################################################################################################################

CPMAddPackage(NAME xtl GITHUB_REPOSITORY xtensor-stack/xtl GIT_TAG 0.7.7 OPTIONS "XTL_ENABLE_TESTS OFF")
CPMAddPackage(NAME xtensor GITHUB_REPOSITORY xtensor-stack/xtensor GIT_TAG 0.25.0 OPTIONS "XTENSOR_ENABLE_TESTS OFF")
CPMAddPackage(
NAME xtensor-blas
GITHUB_REPOSITORY xtensor-stack/xtensor-blas
GIT_TAG 0.21.0
OPTIONS
"XTENSOR_ENABLE_TESTS OFF"
)
2 changes: 1 addition & 1 deletion dockerfile/ubuntu-20.04-amd64.Dockerfile
@@ -1,5 +1,5 @@
# TT-METAL UBUNTU 20.04 AMD64 DOCKERFILE
FROM ubuntu:20.04
FROM public.ecr.aws/ubuntu/ubuntu:20.04

ARG DEBIAN_FRONTEND=noninteractive
ENV DOXYGEN_VERSION=1.9.6
2 changes: 1 addition & 1 deletion dockerfile/ubuntu-22.04-amd64.Dockerfile
@@ -1,5 +1,5 @@
# TT-METAL UBUNTU 22.04 AMD64 DOCKERFILE
FROM ubuntu:22.04
FROM public.ecr.aws/ubuntu/ubuntu:22.04

ARG DEBIAN_FRONTEND=noninteractive
ARG UBUNTU_VERSION=22.04
2 changes: 1 addition & 1 deletion infra/VERSION
@@ -2,4 +2,4 @@
# This is solely for helping with version management via our release system
# as of this writing
# change
v0.53.0
v0.54.0
3 changes: 1 addition & 2 deletions install_dependencies.sh
@@ -22,7 +22,6 @@ FLAVOR=`grep '^ID=' /etc/os-release | awk -F= '{print $2}' | tr -d '"'`
VERSION=`grep '^VERSION_ID=' /etc/os-release | awk -F= '{print $2}' | tr -d '"'`
MAJOR=${VERSION%.*}
ARCH=`uname -m`
DEBIAN_FRONTEND="noninteractive"

usage()
{
@@ -122,7 +121,7 @@ install()
prep_ubuntu

echo "Installing packages..."
apt-get install -y --no-install-recommends "${UB_LIST[@]}"
DEBIAN_FRONTEND="noninteractive" apt-get install -y --no-install-recommends "${UB_LIST[@]}"
fi
}

6 changes: 6 additions & 0 deletions models/MODEL_UPDATES.md
@@ -4,6 +4,12 @@
>
> Please refer to the front-page [README](../README.md) for the latest verified release for each model.
## December 16, 2024

### [Llama 3.1/3.2](demos/llama3)
- Added support for batch size 32 and the maximum context length (131072 tokens).
- Added full hardware compatibility for the 1B/3B/8B/11B/70B models (all models are now compatible with N150, N300, QuietBox, and Galaxy, except for 70B, which is only supported on QuietBox and Galaxy due to its large size).

## December 2, 2024

### [Llama 3.1/3.2](demos/llama3)
2 changes: 1 addition & 1 deletion models/demos/convnet_mnist/tests/test_performance.py
@@ -119,7 +119,7 @@ def test_perf_device_bare_metal_convnet_mnist(batch_size, expected_perf):
subdir = "ttnn_convnet_mnist"
num_iterations = 1
margin = 0.03
expected_perf = 1753.5 if is_grayskull() else 2705.5
expected_perf = 1800 if is_grayskull() else 2800.5

command = f"pytest tests/ttnn/integration_tests/convnet_mnist/test_convnet_mnist.py"
cols = ["DEVICE FW", "DEVICE KERNEL", "DEVICE BRISC KERNEL"]
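
The hunk above raises `expected_perf` while leaving `margin = 0.03` unchanged. The code that consumes these values is not shown in this diff, so the following is only a sketch of how such a pair could be used, assuming the margin is a symmetric relative tolerance around the expected value. The helper name `within_margin` is hypothetical and is not taken from the repository.

```python
def within_margin(measured: float, expected: float, margin: float) -> bool:
    """Return True if `measured` lies inside expected * (1 +/- margin).

    Assumption: `margin` is a relative tolerance (0.03 means 3%).
    This helper is illustrative only; the real check lives elsewhere.
    """
    lower = expected * (1.0 - margin)
    upper = expected * (1.0 + margin)
    return lower <= measured <= upper


# New Grayskull target from the hunk: 1800 with a 3% margin.
# The accepted band under the stated assumption is [1746.0, 1854.0].
old_target_passes = within_margin(1753.5, 1800, 0.03)
below_band_passes = within_margin(1700.0, 1800, 0.03)
```

Under that assumption, a run that only just matched the old Grayskull target of 1753.5 would still fall inside the band for the new target of 1800, whereas a run at 1700 would not.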
24 changes: 8 additions & 16 deletions models/demos/llama3/demo/simple_vision_demo.py
@@ -189,22 +189,14 @@ def test_llama_multimodal_demo_text(
position_id = prefill_lens + gen_idx
next_token_tensor = next_tokens.reshape(max_batch_size, 1)

if enable_trace:
logits = generator.easy_trace(
position_id,
next_token_tensor,
batch_xattn_masks,
batch_text_masks,
xattn_caches,
)
else:
logits = generator.decode_forward(
position_id,
next_token_tensor,
batch_xattn_masks,
batch_text_masks,
xattn_caches,
)
logits = generator.decode_forward(
position_id,
next_token_tensor,
batch_xattn_masks,
batch_text_masks,
xattn_caches,
enable_trace=enable_trace,
)

next_tokens, next_texts = sampler(logits)
# Update next token
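
The hunk above replaces a caller-side `if enable_trace: ... else: ...` branch with a single `decode_forward(..., enable_trace=enable_trace)` call. The generator's implementation is not part of this diff, so the sketch below only illustrates the general pattern of moving the dispatch behind a keyword argument. `SketchGenerator` and its private methods are hypothetical stand-ins, not the real class.

```python
class SketchGenerator:
    """Hypothetical stand-in for the demo's generator (not the real API)."""

    def _decode_eager(self, position_id, tokens):
        # Placeholder for the ordinary (untraced) decode step.
        return ("eager", position_id, tokens)

    def _decode_traced(self, position_id, tokens):
        # Placeholder for a decode step that replays a captured trace.
        return ("traced", position_id, tokens)

    def decode_forward(self, position_id, tokens, enable_trace=False):
        # The branch that callers used to write now lives in one place.
        step = self._decode_traced if enable_trace else self._decode_eager
        return step(position_id, tokens)


generator = SketchGenerator()
eager_result = generator.decode_forward(5, [1, 2])
traced_result = generator.decode_forward(5, [1, 2], enable_trace=True)
```

With this shape, call sites pass the same arguments either way and the argument list cannot drift between two branches.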