-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
merge kernels in existing XCLBIN #418
base: main
Are you sure you want to change the base?
Conversation
6791d86
to
aaec27e
Compare
f5d34a8
to
5f49f35
Compare
b9c1259
to
34c47e3
Compare
2a09422
to
a11e419
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool that the e2e test works! Just to confirm: these matmuls form 3 different dispatches?
compiler/plugins/target/AMD-AIE/iree-amd-aie/Target/AIETarget.cpp
Outdated
Show resolved
Hide resolved
@@ -350,11 +368,24 @@ LogicalResult AIETargetBackend::serializeExecutable( | |||
{ | |||
SmallVector<StringRef> cmdEnvRefs{cmdEnv.begin(), cmdEnv.end()}; | |||
int result = llvm::sys::ExecuteAndWait(cmdArgs[0], cmdArgs, cmdEnvRefs); | |||
if (result != 0) | |||
if (result != 0 && AttemptingMerge) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we detect before hand if the same kernel appears twice? Just because it'll be better to know why we failed. My concern is that we'll stop being able to merge for some other reason, and not know about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I correctly understand the case you are thinking about but if I understand correctly you are thinking some kernel cannot be merged in the existing XCLBIN and if a later kernel is same as the one that couldnt be we will just not be able to merge that either?
If what I interpret is correct then a few reasons why this is not of concern.
- IREE has deduplication passes after dispatch formation so same kernel is never appearing twice.
- The success of the merge is independent of the content in the kernel. The only case that I am aware of when it will fail is when we exceed the number of PDIs that xclbin supports. We have one PDI per kernel for now until we get to super kernels. I believe the PDI limit is 16. So my idea here is the 17th kernel will fail and then start a new xclbin, and the process repeats for the 33rd kernel and so on. And as far as the xclbin utility is concerned all 33 of these kernels (PDIs) could have same content and that fact wouldn't matter. A bit hard to test this though and we arent at that scale anyway now so havent worked on testing that.
Yes that is correct. |
On further investigating I realized that with the wheel the CI is using, no actual merging is taking place, you can see this error in the CI test added
This causes the merging to fail and it goes through the regular new xclbin per dispatch path. |
This PR uses a new xclbin generation tool from Xilinx/mlir-aie#1561 that allows us to merge kernel PDIs into the same XCLBIN.
A mlir-aie wheel bump is needed to use the tool.
An e2e example which forms multiple (three) dispatches is added.