
Enabling op model interface for constraints and L1 usage. #1554

Merged: 32 commits from mbezulj/2411-opmodel-plumbing-mnist-ops into main on Dec 31, 2024

Conversation

@mbezuljTT (Contributor) commented Dec 10, 2024

This PR plumbs OpModelInterface to the underlying tt-metal op queries for validation and L1 memory consumption.

TTNNOpModelInterface.td: getOpConstraints takes the input and output TTNNLayoutAttrs and returns a tuple of three values (see the sketch below this list):

  1. A boolean indicating whether the op is legal for the given input/output layouts.
  2. If the op is legal, a tuple of three values estimating the op's L1 memory usage in bytes:
    • the CB L1 peak allocation in bytes,
    • the tensor L1 peak allocation in bytes,
    • the output L1 buffer allocation in bytes.
  3. If the op is illegal, a string describing the failure.
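
As a rough illustration of that return shape, here is a minimal plain-C++ sketch; it only models the description above and is not the interface actually generated from TTNNOpModelInterface.td:

```cpp
// Sketch only: models the return shape described above, not the generated
// TTNNOpModelInterface. Names here are illustrative.
#include <cstddef>
#include <optional>
#include <string>
#include <tuple>

// CB L1 peak, tensor L1 peak, and output L1 buffer allocation, in bytes.
using L1Usage = std::tuple<std::size_t, std::size_t, std::size_t>;

struct OpConstraints {
  bool legal;                       // is the op legal for the given layouts?
  std::optional<L1Usage> l1Usage;   // populated only when legal
  std::optional<std::string> error; // populated only when illegal
};
```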

TTNNOpModelInterface.cpp implements hooks into the wrapper library TTNNOpModelLib (where the tt-metal API lives). For each op, the implementation takes

  • the tensor shapes (llvm::ArrayRef<>) of its operands,
  • the worker grid (used for virtual-to-physical core conversion),
  • op-specific parameters (e.g. the softmax dimension), and
  • the TTNNLayoutAttr layouts,

and passes them to the wrapper library TTNNOpModelLib (a sketch follows this list).
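
Below is a minimal sketch of what one such per-op hook might do, using toy stand-in types for the MLIR attributes and a hypothetical querySoftmaxConstraints wrapper entry point (softmax is only used as an example op); the real hooks live in TTNNOpModelInterface.cpp and forward to TTNNOpModelLib:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <tuple>
#include <vector>

// Toy stand-ins for the MLIR/TTNN types involved; illustrative only.
struct TTNNLayoutAttr { /* buffer type, memory layout, shard spec, ... */ };
struct GridAttr { std::uint32_t rows = 0, cols = 0; };    // worker grid
struct Operand { std::vector<std::int64_t> shape; };      // ranked tensor shape

using L1Usage = std::tuple<std::size_t, std::size_t, std::size_t>;
using QueryResult =
    std::tuple<bool, std::optional<L1Usage>, std::optional<std::string>>;

// Hypothetical wrapper entry point; the real one lives in TTNNOpModelLib and
// calls into tt-metal. Stubbed here so the sketch is self-contained.
QueryResult querySoftmaxConstraints(const std::vector<std::int64_t> &inputShape,
                                    int dimArg,
                                    const TTNNLayoutAttr &inputLayout,
                                    const TTNNLayoutAttr &outputLayout,
                                    const GridAttr &workerGrid) {
  return {true, L1Usage{0, 0, 0}, std::nullopt};
}

// Sketch of a per-op hook: pull the operand shape, the op-specific parameter
// (softmax dimension), the layouts, and the worker grid, then forward them to
// the wrapper library.
QueryResult softmaxGetOpConstraints(const Operand &input, int dimArg,
                                    const TTNNLayoutAttr &inputLayout,
                                    const TTNNLayoutAttr &outputLayout,
                                    const GridAttr &workerGrid) {
  return querySoftmaxConstraints(input.shape, dimArg, inputLayout,
                                 outputLayout, workerGrid);
}
```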

TTNNOpModelLib converts MLIR structures into tt-metal structures and calls into the underlying tt-metal op interface.

The underlying tt-metal op interface, ::ttnn::graph::query_op_constraints(..), consumes a target op (e.g. ttnn::relu) and its arguments in the order of the op's ::invoke function that we are targeting.
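
Very roughly, a call from the wrapper library into tt-metal might then look like the snippet below. Only the function name and the "arguments in ::invoke order" convention come from the description above; the device argument, the concrete argument list, and the variable names are assumptions:

```cpp
// Hedged sketch, not verified against tt-metal headers: the device argument
// and the exact per-op argument list below are assumptions.
auto query = ::ttnn::graph::query_op_constraints(
    ::ttnn::relu,      // target op
    device,            // opened device (see SingletonDeviceContext below)
    input_tensor_spec  // remaining args follow the targeted ::invoke order
);
```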

Implemented SingletonDeviceContext to avoid constantly opening/closing the device. Once a mock device is implemented on the tt-metal side (tenstorrent/tt-metal#14000), this class should ensure the opened device is the mock device.
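
A minimal sketch of such a singleton, assuming a simple open-once-and-reuse policy; the Device type and open() call here are toy stand-ins, not the tt-metal API:

```cpp
#include <memory>

// Toy stand-in for the tt-metal device handle; illustrative only.
struct Device {
  static std::unique_ptr<Device> open() { return std::make_unique<Device>(); }
};

// Keeps one device open for the lifetime of the process so that each op
// query does not pay the cost of opening/closing the device.
class SingletonDeviceContext {
public:
  static SingletonDeviceContext &get() {
    static SingletonDeviceContext instance;
    return instance;
  }
  Device &device() { return *m_device; }

  SingletonDeviceContext(const SingletonDeviceContext &) = delete;
  SingletonDeviceContext &operator=(const SingletonDeviceContext &) = delete;

private:
  SingletonDeviceContext() : m_device(Device::open()) {}
  std::unique_ptr<Device> m_device;
};
```

Once tt-metal provides the mock device, the constructor would be the single place to switch to it.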

Added 3 types of unit tests:

  • TestConversion - tests conversion of MLIR types to TTNN types
  • TestOpModelLib - tests the interface to the tt-metal API
  • TestOpModelInterface - tests the op model interface built into the ops

Due to differences between the tt-metal and LLVM project setups (compiler standard, exceptions), these are implemented as plain Google unit tests, unlike the other unit tests, which are also Google unit tests but are wrapped into LLVM (and invoked via llvm-lit).

As these tests require TT hardware (until the mock device is implemented), the Build tt-mlir op_model flavour was changed to use n300 runners.

Additionally, the op model interface is wired into the ShardSolver; mnist_sharded.mlir compiles and runs. @odjuricicTT confirmed the solution found is the one we expected.

An internal doc describing more details can be found here.

@mbezuljTT (Contributor, Author) commented

Q: If validation fails, we get an exception message from tt-metal. Should we wire this back to the API caller? Shall we make IsLegal return a tuple<bool, optional<std::string>>? @nobradovictt @odjuricicTT

@odjuricicTT (Contributor) commented

> Q: If validation fails, we get an exception message from tt-metal. Should we wire this back to the API caller? Shall we make IsLegal return a tuple<bool, optional<std::string>>?

I think it makes sense in the long run; we had something similar in Buda. Though I don't know what the scope of a change like this would be. We should definitely prioritize having something working e2e first.

@mbezuljTT (Contributor, Author) commented

> Q: If validation fails, we get an exception message from tt-metal. Should we wire this back to the API caller? Shall we make IsLegal return a tuple<bool, optional<std::string>>?
>
> I think it makes sense in the long run; we had something similar in Buda. Though I don't know what the scope of a change like this would be. We should definitely prioritize having something working e2e first.

It's a simple change. The error message is still a human-readable string, unusable to the compiler but perhaps useful to the people running/debugging the compiler. Moving to a program-friendly error message would be a much harder problem.

@mbezuljTT force-pushed the mbezulj/2411-opmodel-plumbing-mnist-ops branch 6 times, most recently from b63a86a to 99e5c29, on December 24, 2024 16:30
@mbezuljTT mbezuljTT marked this pull request as ready for review December 24, 2024 17:18
@odjuricicTT (Contributor) commented

> The third value is the Output L1 buffer allocation in bytes.

@mbezuljTT What will the third value be if the op is DPS (the output tensor is pre-allocated and passed in as an arg)?

@odjuricicTT (Contributor) left a review comment:

Went through half the changes, will continue after lunch.

Review thread on .github/workflows/build-and-test.yml (outdated, resolved).
@mbezuljTT (Contributor, Author) commented

> The third value is the Output L1 buffer allocation in bytes.
>
> @mbezuljTT What will the third value be if the op is DPS (the output tensor is pre-allocated and passed in as an arg)?

The op query functions implemented in TTNNOpModelLib are not DPS; therefore, it would be the size of the output tensor anyway. However, peak usage might be wrong in this case, as it might include the output tensor size (which is probably not what you want).

When the ops really become DPS, you would want to change how the op is invoked in TTNNOpModelLib.cpp to DPS as well. When you do that, the third value would become zero, but you can use another graph capture around create_device_tensor for the output allocation to get its size (a conceptual sketch follows).
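
A conceptual sketch of that idea, with toy types and hypothetical names rather than the actual tt-metal graph-capture API:

```cpp
// Conceptual sketch only (hypothetical names): measure the output allocation
// separately by capturing only the create-output step.
#include <cstddef>

struct CaptureScope {
  std::size_t allocatedBytes = 0;
  void onAllocate(std::size_t bytes) { allocatedBytes += bytes; }
};

// Stand-in for create_device_tensor: reports its allocation to the capture.
void createOutputTensor(CaptureScope &capture, std::size_t tensorBytes) {
  capture.onAllocate(tensorBytes);
}

std::size_t measureOutputAllocation(std::size_t tensorBytes) {
  CaptureScope capture;                     // begin capture
  createOutputTensor(capture, tensorBytes); // only the output allocation
  return capture.allocatedBytes;            // end capture -> output L1 size
}
```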

@odjuricicTT (Contributor) left a review comment:

Looks good! Thanks for pushing this all the way :)

I have a few more details to look over tomorrow.

Review thread on lib/Dialect/TTNN/Analysis/ShardSolver.cpp (outdated, resolved).
Comment on lines +544 to +567
```cpp
for (uint32_t i = 0; i < numOperands; i++) {
  auto operand = consumerOp->getOperand(i);
  auto input = mlir::cast<RankedTensorType>(operand.getType());

  if ((inputUnderCheckFound == false) &&
      (inputUnderCheck.getShape() == input.getShape())) {
    // this is the input we are checking compatibility for
    inputUnderCheckFound = true;
    inputLayouts.push_back(producerLayout);
  } else {
    // this is the other input that we DRAM interleave

    // what if it is tilized already?
    auto elementType =
        TileType::get(consumerOp->getContext(), input.getElementType());

    auto layout = TTNNLayoutAttr::get(
        consumerOp->getContext(), input.getShape(), elementType,
        BufferType::DRAM, workerGrid,
        TensorMemoryLayoutAttr::get(consumerOp->getContext(),
                                    TensorMemoryLayout::Interleaved));
    inputLayouts.push_back(layout);
  }
}
```
As discussed offline, a bit of cleanup is needed here. I'll do this in a follow-up PR.

Review thread on lib/Dialect/TTNN/Analysis/ShardSolver.cpp (resolved).
@mbezuljTT force-pushed the mbezulj/2411-opmodel-plumbing-mnist-ops branch from bcfb884 to 7c8b8a5 on December 30, 2024 14:06
@vmilosevic merged commit 3745a88 into main on Dec 31, 2024. 20 checks passed.
@vmilosevic deleted the mbezulj/2411-opmodel-plumbing-mnist-ops branch on December 31, 2024 11:21.