[Metal Direct] Implement div op #780

Merged on Oct 1, 2024 (3 commits)

pjanevskiTT
Contributor

Fixes #535

Implement Div op for tt-metal backend.

Important notes for review

  • Div is implemented as recip + mul. Input 1 of the div is passed through recip in the DST register, and then input 0 is multiplied by the reciprocal of input 1 to produce the quotient.

  • To avoid intermediate buffers, the recip tiles of the original input 1 are packed back into the input 1 CB. Input 0 and input 1 are then used as the ordinary inputs to the multiply step.

  • More code has moved into the convert compute init and convert compute functions. The code that used to live outside those functions was mostly specific to eltwise ops; now that recip is needed as well, it can no longer stay outside them.
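The first two notes can be illustrated with a small host-side sketch. This is not the tt-metal API: `Tile`, `recip_in_place`, `mul` and `div_via_recip` are hypothetical names, and a tile is modelled as a plain vector of floats. The point is only to show the divisor buffer being overwritten with its reciprocal and then reused as the second multiply input, so no intermediate buffer is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for one tile: a flat buffer of floats.
using Tile = std::vector<float>;

// Step 1: overwrite the divisor tile with its element-wise reciprocal,
// mirroring how the recip tiles are packed back into the input 1 CB.
void recip_in_place(Tile &in1) {
  for (float &x : in1) {
    x = 1.0f / x;
  }
}

// Step 2: element-wise multiply of input 0 with the already inverted input 1.
Tile mul(const Tile &in0, const Tile &in1) {
  assert(in0.size() == in1.size());
  Tile out(in0.size());
  for (std::size_t i = 0; i < in0.size(); ++i) {
    out[i] = in0[i] * in1[i];
  }
  return out;
}

// div(a, b) computed as a * recip(b). Note that in1 is clobbered.
Tile div_via_recip(const Tile &in0, Tile &in1) {
  recip_in_place(in1);
  return mul(in0, in1);
}
```

One consequence of this scheme is that the original divisor values are gone once the reciprocal has been written. Also, `a * (1 / b)` rounds twice in floating point, so it can differ slightly from a direct `a / b`.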

Generated kernel

#include <cstdint>
#include "compute_kernel_api/common.h"
#include "compute_kernel_api/tilize.h"
#include "compute_kernel_api/untilize.h"
#include "compute_kernel_api/eltwise_binary.h"
#include "compute_kernel_api/tile_move_copy.h"
#include "compute_kernel_api/eltwise_unary/sfpu_split_includes.h"
#include "compute_kernel_api/eltwise_unary/recip.h"
namespace NAMESPACE {
void kernel_main() {
  ::tt::CB v1 = ::tt::CB::c_in0;
  ::tt::CB v2 = ::tt::CB::c_in1;
  ::tt::CB v3 = ::tt::CB::c_out0;
  binary_op_init_common(v1, v2, v3);
  mul_tiles_init_f();
  int32_t v4 = 0;
  int32_t v5 = 0;
  int32_t v6 = 2;
  int32_t v7 = 1;
  int32_t v8;
  v8 = v4;
  for (int32_t v9 = v5; v9 < v6; v9 += v7) {
    int32_t v10 = 0;
    int32_t v11 = 4;
    int32_t v12 = 1;
    int32_t v13;
    v13 = v8;
    for (int32_t v14 = v10; v14 < v11; v14 += v12) {
      int32_t v15 = 0;
      int32_t v16 = 1;
      copy_tile_to_dst_init_short();
      cb_reserve_back(v2, v16);
      tile_regs_acquire();
      recip_tile_init();
      copy_tile(v2, v15, v15);
      recip_tile(v15);
      tile_regs_commit();
      tile_regs_wait();
      pack_tile(v15, v2, v15);
      tile_regs_release();
      cb_push_back(v2, v16);
      cb_wait_front(v2, v16);
      mul_tiles_init(v1, v2);
      int32_t v17 = 0;
      tile_regs_acquire();
      mul_tiles(v1, v2, v13, v17, v17);
      tile_regs_commit();
      tile_regs_wait();
      pack_tile(v17, v3, v13);
      tile_regs_release();
      int32_t v18 = 1;
      cb_pop_front(v2, v18);
      int32_t v19 = 1;
      uint32_t v20 = (uint32_t) v13;
      uint32_t v21 = (uint32_t) v19;
      uint32_t v22 = v20 + v21;
      int32_t v23 = (int32_t) v22;
      v13 = v23;
    };
    int32_t v24 = 0;
    v8 = v13;
  }
  return;
}

void MAIN { kernel_main(); }
}
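
The index bookkeeping in the generated kernel is easier to follow once the emitted temporaries are stripped away. The sketch below is a plain C++ model, not kernel code: `visited_tile_indices` is a hypothetical helper, and its two parameters stand in for the loop bounds `v6` (2) and `v11` (4). It records the running index that the kernel carries through `v8` and `v13` and uses as the input 0 tile index and the output tile index.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the sequence of running tile indices the loop nest visits.
// outer_trips and inner_trips play the roles of the bounds v6 and v11.
std::vector<int32_t> visited_tile_indices(int32_t outer_trips,
                                          int32_t inner_trips) {
  assert(outer_trips >= 0 && inner_trips >= 0);
  std::vector<int32_t> visited;
  int32_t running = 0;  // plays the role of v8 / v13 in the kernel
  for (int32_t i = 0; i < outer_trips; ++i) {
    for (int32_t j = 0; j < inner_trips; ++j) {
      visited.push_back(running);  // tile index used by mul_tiles / pack_tile
      ++running;                   // the v20..v23 cast-and-add sequence
    }
  }
  return visited;
}
```

With the bounds from the kernel above, `visited_tile_indices(2, 4)` yields eight indices, 0 through 7, one per recip + mul pair.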

Contributor

@nsmithtt nsmithtt left a comment

This looks awesome! Minor comments inline.

@nsmithtt
Contributor

Clang tidy is a little pedantic:

/__w/tt-mlir/tt-mlir/lib/Dialect/TTKernel/IR/TTKernelOps.cpp:8:1: error: #includes are not sorted properly [llvm-include-order,-warnings-as-errors]
    8 | #include "mlir/IR/BuiltinOps.h"
      | ^        ~~~~~~~~~~~~~~~~~~~~~~
      |          "mlir/Dialect/SCF/IR/SCF.h"
    9 | #include "ttmlir/Dialect/TT/IR/TT.h"
      |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |          "mlir/IR/BuiltinOps.h"
   10 | #include "ttmlir/Dialect/TTKernel/IR/TTKernel.h"
      |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |          "ttmlir/Dialect/TT/IR/TT.h"
   11 | #include "ttmlir/Dialect/TTMetal/IR/TTMetalOps.h"
      |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |          "ttmlir/Dialect/TTKernel/IR/TTKernel.h"
   12 | #include "mlir/Dialect/SCF/IR/SCF.h"
      |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |          "ttmlir/Dialect/TTMetal/IR/TTMetalOps.h"
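
For this particular log, the order clang-tidy suggests coincides with plain lexicographic order of the header paths. The sketch below checks that under a simplifying assumption: the real `llvm-include-order` check also groups headers by category, which is not modelled here, and `sort_includes` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Simplified model: sort include paths lexicographically.
// Assumption: all headers belong to one category, as in the log above.
std::vector<std::string> sort_includes(std::vector<std::string> paths) {
  std::sort(paths.begin(), paths.end());
  assert(std::is_sorted(paths.begin(), paths.end()));
  return paths;
}
```

Applied to the five headers in the diagnostic, this moves `mlir/Dialect/SCF/IR/SCF.h` from the last position to the first and leaves the relative order of the other four unchanged.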

@pjanevskiTT pjanevskiTT force-pushed the pjanevski/generic_metal_div branch 4 times, most recently from 60de972 to 99e1b20 Compare September 24, 2024 18:47
@pjanevskiTT
Contributor Author

@kmitrovicTT I have rebased on main. Since I changed a few things in the flow that generates compute init and compute, please take a look at those changes.

@pjanevskiTT pjanevskiTT force-pushed the pjanevski/generic_metal_div branch from 99e1b20 to 55afd59 Compare September 25, 2024 06:37
@pjanevskiTT pjanevskiTT force-pushed the pjanevski/generic_metal_div branch from 6c62439 to 20c455e Compare September 26, 2024 10:29
@pjanevskiTT pjanevskiTT force-pushed the pjanevski/generic_metal_div branch from 4bea10c to 6f052fe Compare October 1, 2024 06:48
@pjanevskiTT pjanevskiTT merged commit f983235 into main Oct 1, 2024
12 checks passed
Development

Successfully merging this pull request may close these issues.

[Metal Direct] Eltwise div
4 participants