
Add sharding support in ttnn backend #541

Merged 6 commits into main on Sep 3, 2024

Conversation

@jnie-TT (Contributor) commented Aug 29, 2024

Related to #450 as this enables multi-core runs if sharding is feasible.

Related to #518 as this may rely on the compiler to generate legal memory layouts in the future.

Added sharding support in the ttnn backend and bootstrapped sharding attrs in the compiler. Since the compiler cannot yet dynamically generate legal memory layouts, the runtime will infer the memory layout for tensors - if a tensor resides in L1 and its shard shape is divisible by the tile shape, it creates a sharded memory config, else it uses interleaved as before.

Added a couple of sharding tests.
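For illustration, here is a minimal, self-contained sketch of the inference rule described above (the types and helper names are hypothetical stand-ins, not the actual runtime API):

#include <array>
#include <cstdint>

// Hypothetical stand-ins for the runtime's real types.
enum class MemorySpace { DeviceDRAM, DeviceL1, System };
enum class TensorMemoryLayout { Interleaved, BlockSharded };

struct MemoryConfigSketch {
  TensorMemoryLayout layout;
  std::array<std::uint32_t, 2> shardShape; // only meaningful when sharded
};

constexpr std::array<std::uint32_t, 2> kTileShape = {32, 32};

// If the tensor lives in L1 and its shard shape is divisible by the tile
// shape, build a block-sharded config; otherwise fall back to interleaved.
MemoryConfigSketch inferMemoryConfig(MemorySpace space,
                                     std::array<std::uint32_t, 2> shardShape) {
  bool divisible = (shardShape[0] % kTileShape[0] == 0) &&
                   (shardShape[1] % kTileShape[1] == 0);
  if (space == MemorySpace::DeviceL1 && divisible) {
    return {TensorMemoryLayout::BlockSharded, shardShape};
  }
  return {TensorMemoryLayout::Interleaved, {0, 0}};
}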

@nobradovictt (Contributor)

Thanks for making this change @jnie-TT, right at the moment when we need it! Minor comments left.

@@ -604,14 +611,24 @@ LayoutAttr LayoutAttr::withElementType(::mlir::MLIRContext *context,
                                        Type elementType) {
   return LayoutAttr::get(
       context, getLinear(), getOobVal(), getGrid(),
-      buildMemRef(context, getShardShape(), elementType, getMemorySpace()));
+      buildMemRef(context, getShardShape(), elementType, getMemorySpace()),
+      getMemlayout());
Contributor

Nit: getMemLayout would be aligned with the rest of the attrs.

@@ -27,6 +30,26 @@ ::mlir::LogicalResult mlir::tt::ttnn::ToMemoryConfigOp::verify() {
if (not outputLayout) {
return emitOpError("Output tensor type missing layout attribute");
}

// This will always be false for now until the compiler optimizer supports it
Contributor

Could you provide a bit more information on why? OK, the Optimizer does not generate it today (row/height sharding), but would something break if it did?

Contributor Author

It wouldn't break anything as long as the generated memory layout is correct. This is a pass in the verifier to make sure that the compiler is generating a correct memory layout. Currently it checks that if the memory layout is sharded it must be block sharded, and that the shard shape is divisible by the tile shape (which is asserted in ttnn).

Contributor

Then should we make the verifier run this check only if (outputLayout.getMemlayout() == ::mlir::tt::TensorMemoryLayout::BlockSharded) and not throw an error for the others? Or did you mean it as a forcing function, so that when height sharding is generated we come here and update the code with checks for the other types?
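For reference, a self-contained sketch of the constraint under discussion, using plain C++ stand-ins rather than the actual verifier code or its types (the enum cases here mirror the ones discussed in this PR):

#include <array>
#include <cstdint>
#include <optional>
#include <string>

enum class TensorMemoryLayout { None, Interleaved, HeightSharded, WidthSharded, BlockSharded };

// Returns an error message if the layout violates the constraints described
// above: any sharded layout must (for now) be block sharded, and the shard
// shape must be divisible by the tile shape.
std::optional<std::string>
checkShardedLayout(TensorMemoryLayout layout,
                   std::array<std::uint32_t, 2> shardShape,
                   std::array<std::uint32_t, 2> tileShape = {32, 32}) {
  bool sharded = layout == TensorMemoryLayout::HeightSharded ||
                 layout == TensorMemoryLayout::WidthSharded ||
                 layout == TensorMemoryLayout::BlockSharded;
  if (!sharded) {
    return std::nullopt; // nothing to check for interleaved/undef layouts
  }
  if (layout != TensorMemoryLayout::BlockSharded) {
    return "only block-sharded tensor memory layouts are supported for now";
  }
  if (shardShape[0] % tileShape[0] != 0 || shardShape[1] % tileShape[1] != 0) {
    return "shard shape must be divisible by the tile shape";
  }
  return std::nullopt;
}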

::ttnn::Tensor result;
if (isOnHost(inputTensor)) {
result =
updateLayoutAndDataType(inputTensor, targetDataTypeTTNN, false, true);
Contributor

Could you name the const params via comments, at least true /* paramName */?
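For instance, applied to the call above (the parameter names here are hypothetical - the real signature may use different names):

// Hypothetical parameter names, shown only to illustrate the convention.
result = updateLayoutAndDataType(inputTensor, targetDataTypeTTNN,
                                 /*shouldTilize=*/false, /*shouldUntilize=*/true);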

Contributor Author

Just as a side note, this is waiting on compiler support as well - ideally we want to determine whether to tilize based on the tile shape. However, it seems that currently the tile shape is always 1x1.

runtime/lib/ttnn/program.cpp (comment thread resolved)
// CHECK: %[[C:.*]] = "ttnn.empty"[[C:.*]]
%0 = tensor.empty() : tensor<256x512xf32>
// CHECK: %[[C:.*]] = "ttnn.relu"[[C:.*]]
%1 = "ttir.relu"(%arg0, %0) <{operandSegmentSizes = array<i32: 1, 1>, operand_constraints = [#any_device, #any_device]}> : (tensor<256x512xf32>, tensor<256x512xf32>) -> tensor<256x512xf32>
Contributor

Where do we check that layout is indeed sharded?

Contributor Author

Currently it's inferred by the runtime; later the runtime will just check the TensorMemoryLayout. But since the TensorMemoryLayout is always UnDef for now, the runtime checks the shard shape and memory space - if the shard shape is divisible by the tile shape and the memory space is L1, then the runtime shards the tensor implicitly. This is a hacky workaround until the compiler can generate the correct memory layouts.

@nobradovictt (Contributor)

Can we run multi-core in DRAM interleaved mode as well?

@nobradovictt (Contributor)

With the current default block sharding mode, will we perform any tensor deallocation, or will tensors remain until the end of execution? Is there an open issue to provide compiler support for tensor alloc/dealloc?

@jnie-TT (Contributor Author) commented Aug 29, 2024

> Can we run multi-core in DRAM interleaved mode as well?

That's correct. In fact, we try to run it on the whole compute_with_storage_grid for unary ops.

@jnie-TT (Contributor Author) commented Aug 29, 2024

> With the current default block sharding mode, will we perform any tensor deallocation, or will tensors remain until the end of execution? Is there an open issue to provide compiler support for tensor alloc/dealloc?

The tensors are by default stored until the end of execution (end of submit, when the tensorPool goes out of scope). I agree that we should perform dynamic alloc/dealloc once tensors are no longer needed (maybe add a dealloc op on the compiler side), or else we could run out of L1 space if we have a chain of ops. I don't know if there's an active effort to support this at the moment - I can create an issue to track it so we don't forget.

Update: Created issue #553

Comment on lines 367 to 368
auto [isLayoutChange, isGridChange, isFormatChange, isMemorySpaceChange,
isMemoryLayoutChange] = op.compoundComponents();
Contributor

Please create a struct encompassing all these changes instead of expanding a tuple.
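One possible shape for such a struct - names are illustrative only, not the code that eventually landed:

// Hypothetical struct bundling the compound-change flags instead of a tuple.
struct CompoundComponents {
  bool isLayoutChange = false;
  bool isGridChange = false;
  bool isFormatChange = false;
  bool isMemorySpaceChange = false;
  bool isMemoryLayoutChange = false;
};

// compoundComponents() would then return this struct by value, e.g.:
//   CompoundComponents components = op.compoundComponents();
//   if (components.isMemoryLayoutChange) { ... }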

@@ -32,7 +32,7 @@ ::mlir::LogicalResult mlir::tt::ttir::ToLayoutOp::verify() {
return success();
}

-std::tuple<bool, bool, bool, bool>
+std::tuple<bool, bool, bool, bool, bool>
Contributor

See the comment below and change this function to return the created struct.

@@ -90,7 +90,7 @@ def TTIR_ToLayoutOp : TTIR_Op<"to_layout", [DestinationStyleOpInterface, TTIROpI
// return {OperandConstraint::Any, OperandConstraint::Any};
}
// Returns a tuple of booleans indicating if the op changes layout, grid, format, or memory space.
Contributor

Update the comment (it should also mention the new memory layout component).

@@ -573,6 +573,13 @@ mlir::Type LayoutAttr::getScalarElementType() const {
return elementType;
}

bool LayoutAttr::isSharded() const {
Contributor

I think we need to call this hasShardedTensorMemoryLayout, because isSharded has a very different meaning for the direct-to-metal path.
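A minimal sketch of what the renamed helper could look like; the accessor name (getMemLayout) and the HeightSharded case are assumptions here, since only the WidthSharded and BlockSharded cases appear in the snippet below:

// Sketch only: rename of isSharded() that returns true for any of the
// sharded TensorMemoryLayout cases.
bool LayoutAttr::hasShardedTensorMemoryLayout() const {
  ::mlir::tt::TensorMemoryLayout layout = getMemLayout();
  return layout == ::mlir::tt::TensorMemoryLayout::HeightSharded ||
         layout == ::mlir::tt::TensorMemoryLayout::WidthSharded ||
         layout == ::mlir::tt::TensorMemoryLayout::BlockSharded;
}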

def TT_WidthSharded : I32EnumAttrCase<"WidthSharded", 4, "width_sharded">;
def TT_BlockSharded : I32EnumAttrCase<"BlockSharded", 5, "block_sharded">;

def TT_TensorMemoryLayout : I32EnumAttr<"TensorMemoryLayout", "TT TensorMemoryLayout",
Contributor

@rpavlovicTT / @nobradovictt for the direct-to-metal path we will not be using this enum at all; in fact, we should always assert that it's UndefLayout. It just seemed like more work / more complicated to specialize the LayoutAttr for TTNN in some way, especially since the optimizer will be written against the TTIR dialect and this is one of its main optimizations. I wonder if there's another way we could solve this issue though; it's definitely going to lead to confusion. Perhaps at least we should name this enum TTNNTensorMemoryLayout despite it living in the TT dialect.

Contributor

The truly "proper" way to solve this would be to make LayoutAttr an interface, where we can instantiate either a TTNN-based layout or a direct-to-metal-based layout; it's just a decent-sized refactor for a single enum. It's one of those things, though, where we could keep making exceptions and end up in a situation where we need the refactor anyway.

Contributor

Maybe we can decide on the proper refactor for this when we introduce D Metal backend support in the Optimizer. In the meantime, as you suggested, we can assert in the D Metal pipeline that this is UndefLayout; I'm even fine with renaming it to TTNNTensorMemoryLayout for now.

Contributor

I agree, we can go forward with renaming the attribute. However, I'd definitely refactor it as soon as we find dev cycles for it. We should strive not to leave too much debt behind.

Contributor Author

Created issue #596 to track this

@@ -72,6 +72,26 @@ def TT_MemorySpace : I32EnumAttr<"MemorySpace", "TT MemorySpace",
let cppNamespace = "::mlir::tt";
}

def TT_UndefLayout : I32EnumAttrCase<"UndefLayout", 0, "undef_layout">; // For host tensors
Contributor

Maybe we should call this None to match the flatbuffer enum definition. Also, the comment "For host tensors" isn't strictly true; it'll also always be None/Undef for the direct-to-metal path.

@jnie-TT force-pushed the jnie/ttnn_sharding_rebased branch 4 times, most recently from 7dfffc6 to 8b46052, on September 3, 2024 at 02:15
@jnie-TT force-pushed the jnie/ttnn_sharding_rebased branch 2 times, most recently from 67a1a30 to d7e10a1, on September 3, 2024 at 15:38
@jnie-TT force-pushed the jnie/ttnn_sharding_rebased branch from d7e10a1 to 04612f9 on September 3, 2024 at 16:10
@jnie-TT merged commit c75811b into main on Sep 3, 2024
13 checks passed