Add hoist mechanism for specific TTIR ops, with placeholder analysis pass #1586
Conversation
Force-pushed from cbded69 to 08b45f6
@vwellsTT, this looks great; a couple of things we should add:
- Let's add more context to the PR description. Let's call out that this is part _ of ?? parts, with a few additional sentences explaining that it is part of the CPU fallback path. It's probably worth having a top-level statement that outlines the whole plan broken into parts, then stating which part of that plan this PR tackles.
- We need a directed test proving that the hoist pass hoisted correctly.
Force-pushed from 08b45f6 to d2e3e8a
}
}

// If no CPU module exists, create one
What is the intention of splitting the IR into two different modules? Do we need this split? Currently we have a very simple IR structure with one module and many functions, so all our passes work on a single module.
Well, the hoisted funcs will be lowered via a separate pipeline, so I don't think we want any of the existing passes running on them. I believe @nsmithtt originally proposed splitting the CPU funcs into a new module.
Yeah, the intention behind splitting into a new module is that we can have a 1-1 mapping of module to device target type. E.g. the main module is tagged with a system desc, i.e. all the functions within that module are intended to run on the device/TTNN framework, while a CPU module is tagged with a CPU target triple; organizationally it's a bit easier to read / discover that all functions inside that module are intended for CPU (rough sketch below).
On top of that, there are two other reasons:
- The logic for lowering to LLVM / compiling to a dylib becomes much simpler, because we just hand over the CPU module op as the root node for conversion. If CPU and TTNN entry-point functions sat freely together in the same module, we'd have to run a cleanup pass to remove the TTNN functions so that the LLVM lowering passes don't run on them.
- As far as we can tell, IREE does it this way too.
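For illustration only, a minimal MLIR sketch of the module layout being described; the @cpu_module symbol name and the llvm.target_triple attribute here are my assumptions for the example, not necessarily what the pass emits:

module {
  // Top-level module: functions here target the device / TTNN path, and the module
  // would be tagged with the system desc (omitted in this sketch).
  func.func @device_fn(%arg0: tensor<32x32xbf16>) -> tensor<32x32xbf16> {
    return %arg0 : tensor<32x32xbf16>
  }

  // Nested module holding only CPU-fallback functions; tagged with a CPU target
  // triple and handed to the LLVM lowering pipeline as the root op.
  module @cpu_module attributes {llvm.target_triple = "x86_64-unknown-linux-gnu"} {
    func.func @hoisted_fn(%arg0: tensor<32x32xbf16>) -> tensor<32x32xbf16> {
      return %arg0 : tensor<32x32xbf16>
    }
  }
}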
Force-pushed from 43cb4bb to 929317f
Force-pushed from cf95320 to b1ff4af
@@ -0,0 +1,37 @@
// RUN: ttmlir-opt --ttir-cpu-hoist-transform="hoist-locations=add_op1,add_op2,add_op3,add_op4" %s | tee /tmp/output.mlir | FileCheck %s
Good idea, I like that a lot more. Then I don't need a specific param => I can have a single analysis pass, instead of "manual" and "automatic" variants, etc. I will sync w/ Nick on this.
auto localFunc = llvm::dyn_cast_or_null<func::FuncOp>(
    sourceModule.lookupSymbol(functionName));

// if we have not already emitted an equivalent function call, perform this
All comments should start with a capital letter; we have some recently introduced guidelines here. Please update comments throughout the change to adhere to the guidelines.
func.func @add(%arg0: tensor<32x32xbf16>, %arg1: tensor<32x32xbf16>) -> tensor<32x32xbf16> {
  %0 = tensor.empty() : tensor<32x32xbf16>
  // CHECK: %[[C:.*]] = call @hoisted_ttir.add_32x32xbf16_32x32xbf16_32x32xbf16_func
Replace [[C:.*]] with {{.*}} if the captured text isn't used afterwards.
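For context, a minimal sketch of the two forms (illustrative lines, not taken from the test itself):

// A named capture is only worthwhile when the value is referenced again:
// CHECK: %[[C:.*]] = call @hoisted_ttir.add_32x32xbf16_32x32xbf16_32x32xbf16_func
// CHECK: return %[[C]]
// If the result is never referenced again, an anonymous match is enough:
// CHECK: %{{.*}} = call @hoisted_ttir.add_32x32xbf16_32x32xbf16_32x32xbf16_func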
}

// generate unique name base on operation type + argument tensors dims & types
static llvm::SmallString<16> generateHoistedFuncName(mlir::Operation *op) {
This might end up generating 2 equal names if the parameters are equal. Should we append a random string to the fn name to "guarantee" uniqueness?
That's right, but I think that's desirable. Below, I check whether a func of that name already exists; if so, I skip emitting a new func and simply call the existing one, to avoid generating duplicate functions. My testcase exercises this logic as well (2 of the 4 calls have the same signature, so only 3 funcs are generated).
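For illustration, a rough MLIR sketch of the resulting calls when two hoisted ops share a signature (the surrounding IR and the call signature are simplified assumptions, not the pass's exact output):

// Two ops with identical operand shapes/types reuse one shared hoisted func.
func.func @two_adds(%arg0: tensor<32x32xbf16>, %arg1: tensor<32x32xbf16>) -> tensor<32x32xbf16> {
  %0 = call @hoisted_ttir.add_32x32xbf16_32x32xbf16_32x32xbf16_func(%arg0, %arg1)
      : (tensor<32x32xbf16>, tensor<32x32xbf16>) -> tensor<32x32xbf16>
  // Second hoisted add with the same signature calls the same func; no duplicate is emitted.
  %1 = call @hoisted_ttir.add_32x32xbf16_32x32xbf16_32x32xbf16_func(%0, %arg1)
      : (tensor<32x32xbf16>, tensor<32x32xbf16>) -> tensor<32x32xbf16>
  return %1 : tensor<32x32xbf16>
}
func.func private @hoisted_ttir.add_32x32xbf16_32x32xbf16_32x32xbf16_func(tensor<32x32xbf16>, tensor<32x32xbf16>) -> tensor<32x32xbf16>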
@@ -0,0 +1,37 @@
// RUN: ttmlir-opt --ttir-cpu-hoist-transform="hoist-locations=add_op1,add_op2,add_op3,add_op4" %s | tee /tmp/output.mlir | FileCheck %s
remove tee
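I.e., the RUN line would presumably become:

// RUN: ttmlir-opt --ttir-cpu-hoist-transform="hoist-locations=add_op1,add_op2,add_op3,add_op4" %s | FileCheck %s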
Force-pushed from 6ef45fd to 64011b2
Force-pushed from 64011b2 to f7e800b
Goal: The end-to-end goal is to integrate a path to compile and execute specific ops or sets of ops on the CPU.
Context:
The entire task will be split into (tentatively) 7 PRs, as follows:
This PR represents the 1st point above. Here, we build a (placeholder) hoisting analysis plus a transform pass to mark specific ops to be hoisted, pull them into separate functions in a new "cpu" module, and replace each original op with a call to the corresponding func.
Example:
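(The original example is not reproduced here; the before/after sketch below is illustrative. The generic "ttir.add" form and the exact call/func signatures are my approximations of what the pass produces, not copied from it.)

Before:

func.func @add(%arg0: tensor<32x32xbf16>, %arg1: tensor<32x32xbf16>) -> tensor<32x32xbf16> {
  %0 = tensor.empty() : tensor<32x32xbf16>
  %1 = "ttir.add"(%arg0, %arg1, %0) : (tensor<32x32xbf16>, tensor<32x32xbf16>, tensor<32x32xbf16>) -> tensor<32x32xbf16>
  return %1 : tensor<32x32xbf16>
}

After (roughly):

module {
  func.func @add(%arg0: tensor<32x32xbf16>, %arg1: tensor<32x32xbf16>) -> tensor<32x32xbf16> {
    %0 = call @hoisted_ttir.add_32x32xbf16_32x32xbf16_32x32xbf16_func(%arg0, %arg1)
        : (tensor<32x32xbf16>, tensor<32x32xbf16>) -> tensor<32x32xbf16>
    return %0 : tensor<32x32xbf16>
  }
  func.func private @hoisted_ttir.add_32x32xbf16_32x32xbf16_32x32xbf16_func(tensor<32x32xbf16>, tensor<32x32xbf16>) -> tensor<32x32xbf16>

  module @cpu_module {
    func.func @hoisted_ttir.add_32x32xbf16_32x32xbf16_32x32xbf16_func(%arg0: tensor<32x32xbf16>, %arg1: tensor<32x32xbf16>) -> tensor<32x32xbf16> {
      %0 = tensor.empty() : tensor<32x32xbf16>
      %1 = "ttir.add"(%arg0, %arg1, %0) : (tensor<32x32xbf16>, tensor<32x32xbf16>, tensor<32x32xbf16>) -> tensor<32x32xbf16>
      return %1 : tensor<32x32xbf16>
    }
  }
}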