Support lowering add and mul through ttir.generic metal backend #468

nsmithtt · 2024-08-22T02:14:17Z

The core of this change is generating a loop nest from arith ops on tensors. Consider the following ttir.generic body:

  ^bb0(%arg2: tensor<64x128xf32, #tt.buffer<memref<2x4x!tt.tile<32x32, f32>, #l1_>, alias>>, %arg3, %arg4):
    %8 = arith.addf %arg2, %arg3 : tensor<64x128xf32, #tt.buffer<memref<2x4x!tt.tile<32x32, f32>, #l1_>, alias>>
    "ttir.yield"(%8) : (tensor<64x128xf32, #tt.buffer<memref<2x4x!tt.tile<32x32, f32>, #l1_>, alias>>) -> ()
  })

It is lowered into a loop nest using the scf dialect:

  "ttkernel.binary_op_init_common"(%arg2, %arg3, %arg4)
  "ttkernel.add_tiles_init"(%arg2, %arg3)
  %8 = scf.for %arg5 = %c0_i32 to %c2_i32 step %c1_i32 iter_args(%arg6 = %c0_i32) -> (i32)  : i32 {
    %9 = scf.for %arg7 = %c0_i32 to %c4_i32 step %c1_i32 iter_args(%arg8 = %arg6) -> (i32)  : i32 {
      "ttkernel.tile_regs_acquire"() : () -> ()
      "ttkernel.add_tiles"(%arg2, %arg3, %arg8, %arg8, %c0_i32)
      "ttkernel.tile_regs_commit"() : () -> ()
      "ttkernel.tile_regs_wait"() : () -> ()
      "ttkernel.pack_tile"(%c0_i32, %arg4, %arg8)
      "ttkernel.tile_regs_release"() : () -> ()
      %10 = arith.addi %arg8, %c1_i32 : i32
      scf.yield %10 : i32
    }
    scf.yield %9 : i32
  }
  "ttkernel.return"() : () -> ()

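To make the shape of the generated nest easier to follow, here is a plain C++ model of it. This is only an illustrative sketch, not code from the PR: `tilesAlong` and `visitTiles` are hypothetical names, and the model assumes the tensor shape is an exact multiple of the tile shape. It shows how a 64x128 tensor of 32x32 tiles yields loop bounds 2 and 4, and how the `iter_args` carry one linear tile index through both loops.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of tiles along one dimension.
// Assumes the extent is an exact multiple of the tile extent.
std::int32_t tilesAlong(std::int32_t extent, std::int32_t tileExtent) {
  assert(tileExtent > 0 && extent % tileExtent == 0 &&
         "shape must be tile-aligned");
  return extent / tileExtent;
}

// Hypothetical model of the generated scf.for nest. Returns the linear tile
// indices in the order the inner loop body would see them.
std::vector<std::int32_t> visitTiles(std::int32_t rows, std::int32_t cols,
                                     std::int32_t tileRows,
                                     std::int32_t tileCols) {
  const std::int32_t outerBound = tilesAlong(rows, tileRows); // 2 for 64/32
  const std::int32_t innerBound = tilesAlong(cols, tileCols); // 4 for 128/32

  std::vector<std::int32_t> visited;
  std::int32_t outerIndex = 0; // plays the role of %arg6 (outer iter_arg)
  for (std::int32_t i = 0; i < outerBound; ++i) {
    std::int32_t innerIndex = outerIndex; // %arg8 is initialized from %arg6
    for (std::int32_t j = 0; j < innerBound; ++j) {
      // In the lowered IR, add_tiles and pack_tile use %arg8 at this point.
      visited.push_back(innerIndex);
      ++innerIndex; // %10 = arith.addi %arg8, %c1_i32
    }
    outerIndex = innerIndex; // scf.yield %9 feeds the next outer iteration
  }
  return visited;
}
```

Calling `visitTiles(64, 128, 32, 32)` in this model visits eight tiles with indices 0 through 7, matching the `memref<2x4x!tt.tile<32x32, f32>>` layout in the example above.
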
nsmithtt · 2024-08-26T15:33:27Z

If someone has a chance to review this, that'd be great!

lib/Dialect/TTMetal/Transforms/Passes.cpp

rpavlovicTT · 2024-08-27T10:53:46Z

Nice work! Looks good to me, minor comments.

rpavlovicTT · 2024-08-28T08:27:40Z

lib/Dialect/TTMetal/Transforms/Passes.cpp

+
+    // Build the inner loop compute / unpack / pack
+    {
+      Value output = computeBlock->getArgument(numDPSInputs);


Here we don't do push_back/pop_front on the CBs. Did you skip that intentionally?

nsmithtt · 2024-08-28

Yes, intentionally skipped. This currently only works in buffer mode alias. When we add support for buffer mode stream, we'll need to generate the CB push/pops for the streaming inputs.
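For readers unfamiliar with the protocol being discussed: in streaming use, a consumer typically waits for pages at the front of a circular buffer and pops them when done, while a producer reserves space at the back and pushes when the data is ready. The PR does not show what the generated stream-mode code will look like, so the following is only a mock under stated assumptions. `MockCB` and `streamTiles` are hypothetical, the mock tracks page counts only, and its wait/reserve calls return a flag instead of blocking as a real circular buffer would.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a circular buffer; it only counts pages.
struct MockCB {
  std::int32_t capacity;
  std::int32_t filled;   // pages pushed and visible to the consumer
  std::int32_t reserved; // pages reserved by the producer, not yet pushed

  explicit MockCB(std::int32_t cap, std::int32_t initiallyFilled = 0)
      : capacity(cap), filled(initiallyFilled), reserved(0) {}

  // Producer side: claim space, then publish it.
  bool reserveBack(std::int32_t n) {
    if (filled + reserved + n > capacity) return false;
    reserved += n;
    return true;
  }
  void pushBack(std::int32_t n) {
    assert(n <= reserved);
    reserved -= n;
    filled += n;
  }

  // Consumer side: check for data, then release it.
  bool waitFront(std::int32_t n) const { return filled >= n; }
  void popFront(std::int32_t n) {
    assert(n <= filled);
    filled -= n;
  }
};

// Hypothetical per-tile streaming loop: consume one page from each input and
// produce one page on the output. Returns how many tiles were processed.
std::int32_t streamTiles(MockCB &in0, MockCB &in1, MockCB &out,
                         std::int32_t numTiles) {
  std::int32_t produced = 0;
  for (std::int32_t t = 0; t < numTiles; ++t) {
    if (!in0.waitFront(1) || !in1.waitFront(1) || !out.reserveBack(1)) break;
    // The compute and pack steps for one tile would go here.
    in0.popFront(1);
    in1.popFront(1);
    out.pushBack(1);
    ++produced;
  }
  return produced;
}
```

In alias mode, as in this PR, none of this bookkeeping is emitted, which is why the loop body above only contains the tile register and compute/pack ops.
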

nsmithtt requested review from sdjordjevicTT, rpavlovicTT, mrakitaTT and nobradovictt as code owners August 22, 2024 02:14

rpavlovicTT reviewed Aug 27, 2024

rpavlovicTT approved these changes Aug 28, 2024

nsmithtt force-pushed the nsmith/kernel11 branch from 3027752 to 9396436 Compare August 28, 2024 21:14

nsmithtt merged commit 82c079b into main Aug 29, 2024
13 checks passed

nsmithtt deleted the nsmith/kernel11 branch August 29, 2024 01:27
