[WIP][AMDAIEConvertToDma] Support memref shape collapse/expand #800
Conversation
Could you add some motivation/comments/logic so it is easy to follow?
I've added a description to the PR, but I'm not sure if that's what you're requesting? Motivation: `memref.expand_shape` and `memref.collapse_shape` enter the IR when I add [...]. Would you like more comments in the code? I am happy to add these if so.
```mlir
// CHECK-SAME: [0, 0, 0, 0] [20, 5, 10, 10] [500, 100, 10, 1]
// src of dma cpy:
// CHECK-SAME: [0, 0, 0, 0] [20, 5, 10, 10] [500, 10, 50, 1]
func.func @multidim_without_expand() {
```
I would change the test names to include the keyword pack/unpack. Same for the tests below.
```cpp
if (size.value() % innerTiles[i] != 0) {
  std::optional<int64_t> maybeSize =
      getConstantIntValue(sizes[innerDimsPos[i]]);
  assert(maybeSize.has_value() && "size expected to be constant here.");
```
I would expect this to emit an error too, to be consistent with the strides handling below.
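Something like the following would match that pattern (a sketch only — assuming the surrounding function returns `LogicalResult` and `op` is the op being rewritten; the diagnostic wording is illustrative):

```cpp
// Hedged sketch: report a recoverable error rather than asserting,
// mirroring how the strides path below handles non-constant values.
std::optional<int64_t> maybeSize =
    getConstantIntValue(sizes[innerDimsPos[i]]);
if (!maybeSize.has_value())
  return op->emitOpError("expected a constant size for the packed dimension");
```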
```cpp
std::optional<int64_t> stride =
    getConstantIntValue(strides[innerDimsPos[i]]);
```
I would expect the same name format here (`maybeStride`) as in the previous snippet, and then take `stride = maybeStride.value()`.
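That is, something like (sketch of the suggested rename):

```cpp
// Consistent naming with maybeSize above: keep the optional separate
// from the unwrapped value.
std::optional<int64_t> maybeStride =
    getConstantIntValue(strides[innerDimsPos[i]]);
int64_t stride = maybeStride.value();
```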
```cpp
OpBuilder builder(subviewOp.getContext());
builder.setInsertionPoint(subviewOp);
offsets = getIndexOpFoldResultSum(builder, subviewOp.getLoc(), offsets,
```
It would be better to add some comments here, or above the utility function, explaining how the offsets are generated.
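For example, a doc comment along these lines (my reading of the call site — the exact semantics of `getIndexOpFoldResultSum` should be confirmed against the utility itself):

```cpp
// Hedged guess at the contract, phrased as a doc comment:
// getIndexOpFoldResultSum(builder, loc, lhs, rhs) returns the elementwise
// sums lhs[i] + rhs[i] as OpFoldResults: two constants fold into a constant
// index attribute, otherwise an arith.addi is created at the insertion
// point. Here it composes the subview's offsets onto the offsets
// accumulated from earlier ops in the chain.
```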
```cpp
}

// Starting from the allocation, update the offsets, sizes, and strides.
for (auto iter = chain.rbegin(); iter != chain.rend(); ++iter) {
```
I'm okay with your method of first going up the chain to find the defining op and then going down the chain to update the addressing. But I think recursion could make the code cleaner.
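Something along these lines, perhaps (a sketch of the recursive alternative; the `updateFrom*` helpers are illustrative stand-ins for the per-op logic in this PR):

```cpp
#include "llvm/ADT/TypeSwitch.h"

// Hedged sketch: recurse up to the alloc first, then apply each op's
// offset/size/stride update on the way back down, so no explicit chain
// vector is needed.
LogicalResult updateFromChain(Value v, SmallVector<OpFoldResult> &offsets,
                              SmallVector<OpFoldResult> &sizes,
                              SmallVector<OpFoldResult> &strides) {
  Operation *def = v.getDefiningOp();
  // Base case: the allocation, where offsets/sizes/strides are initialized.
  if (!def || isa<memref::AllocOp>(def)) return success();
  // Recurse toward the alloc before applying this op's update (post-order).
  if (failed(updateFromChain(def->getOperand(0), offsets, sizes, strides)))
    return failure();
  return llvm::TypeSwitch<Operation *, LogicalResult>(def)
      .Case<memref::SubViewOp>(
          [&](auto op) { return updateFromSubview(op, offsets, sizes, strides); })
      .Case<memref::ExpandShapeOp>(
          [&](auto op) { return updateFromExpandShape(op, offsets, sizes, strides); })
      .Case<memref::CollapseShapeOp>(
          [&](auto op) { return updateFromCollapseShape(op, offsets, sizes, strides); })
      .Default([](Operation *) { return failure(); });
}
```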
```cpp
rewriter.eraseOp(op);
if (failed(mlir::verify(src)) || failed(mlir::verify(dst))) {
  return failure();
}
```
Why is this needed?
My bad, left over from debugging.
```cpp
if (failed(mlir::verify(packOrUnpackOp))) {
  return failure();
}
```
Same question, why is this needed?
```cpp
    SmallVector<OpFoldResult> &sizes,
    SmallVector<OpFoldResult> &strides) {
  MLIRContext *ctx = expandShapeOp.getContext();
  auto reassociationIndices = expandShapeOp.getReassociationIndices();
```
Don't use `auto` here.
I intentionally didn't use `auto` here, because IMO `SmallVector<SmallVector<long, 2>, 4>` is uninformative. Why pollute code (and brain) with the detail that someone years ago thought that 4 groups of size 2 seemed like a good choice for static sizes?
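For what it's worth, upstream MLIR provides the alias `ReassociationIndices` (a `SmallVector<int64_t, 2>`), which names the type explicitly without spelling out the inline sizes — a possible middle ground:

```cpp
// Explicit type via the upstream alias, avoiding both `auto` and the
// raw nested SmallVector spelling.
SmallVector<ReassociationIndices> reassociationIndices =
    expandShapeOp.getReassociationIndices();
```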
```cpp
    SmallVector<OpFoldResult> &offsets,
    SmallVector<OpFoldResult> &sizes,
    SmallVector<OpFoldResult> &strides) {
  auto reassociationIndices = collapseOp.getReassociationIndices();
```
Don't use `auto` here.
```cpp
  sizes.push_back(getAsIndexOpFoldResult(ctx, dim));
}

// Offsets - merge reassociation groups.
```
I would put some explanation of how the offsets are decided here, and in other places too.
Yes, I think more comments and explanations are needed, especially for the utility functions (e.g. [...]).
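As a concrete example of what such a comment could explain: for `collapse_shape`, the offsets of a reassociation group can be merged by linearizing them with the group's sizes, row-major. A sketch under that assumption (not the PR's exact code; static values assumed for brevity):

```cpp
// Hedged sketch: merge the per-dimension offsets of one reassociation
// group into a single offset for the collapsed dimension, using standard
// row-major linearization: ((o0 * s1 + o1) * s2 + o2) * ...
int64_t mergeGroupOffsets(ArrayRef<int64_t> groupOffsets,
                          ArrayRef<int64_t> groupSizes) {
  int64_t collapsed = 0;
  for (auto [offset, size] : llvm::zip_equal(groupOffsets, groupSizes))
    collapsed = collapsed * size + offset;
  return collapsed;
}
```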
```cpp
auto add = [&](Value v, IntegerAttr attr) {
  if (attr.getInt() == 0) return v;
  return builder.create<arith::AddIOp>(loc, v, getConstant(attr.getInt()))
```
I'm wondering if there's an easier way / existing utility in upstream that can do this, instead of creating new ops.
Yeah, I wondered the same thing.
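One upstream candidate that comes to mind (an assumption on my part — I haven't checked whether it fits this call site) is `affine::makeComposedFoldedAffineApply`, which folds constant operands instead of materializing new ops:

```cpp
#include "mlir/Dialect/Affine/IR/AffineOps.h"

// Hedged sketch: sum two OpFoldResults via the upstream affine helper.
// If both operands fold to constants, no new op is created; otherwise a
// composed affine.apply is built at the insertion point.
OpFoldResult addOfrs(OpBuilder &b, Location loc, OpFoldResult lhs,
                     OpFoldResult rhs) {
  AffineExpr d0, d1;
  bindDims(b.getContext(), d0, d1);
  return affine::makeComposedFoldedAffineApply(b, loc, d0 + d1, {lhs, rhs});
}
```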
compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIEConvertToDma.cpp
@yzhang93 thanks for your review. Please don't review this again until I remove the [WIP]. This is the "large PR" which I said could be eliminated if I could get `linalg-fold-unit-extent-dims` to not create `collapse_shape` and `expand_shape` ops. I'm experimenting with `useRankReducingSlices = true` in that pass now, as suggested by Mahesh, to see what happens; if that works I might abandon this PR.
This PR extends `iree-amdaie-convert-to-dma` to handle more situations.

The goal of the pass `iree-amdaie-convert-to-dma` is to convert `iree_linalg_ext.pack`, `iree_linalg_ext.unpack` and `linalg.copy` operations into `amdaie.dma_cpy_nd` operations. The `linalg.copy` ops are of no concern in this PR, as they are converted to pack and unpack ops very early in this pass. The logic for `iree_linalg_ext.unpack` is essentially the same as for `iree_linalg_ext.pack`, so I will only discuss `iree_linalg_ext.pack` in the next paragraphs.

There are 2 main differences between `iree_linalg_ext.pack` and `amdaie.dma_cpy_nd` which the pass needs to handle.

The first is that the operands of `amdaie.dma_cpy_nd` are `amdaie.logicalobjectfifo`s, which are essentially just `memref.alloc` ops (tied to a set of tiles). The operation `iree_linalg_ext.pack`, on the other hand, has operands which are not as directly connected to `memref.alloc` ops, as they can be `memref.subview`s of allocations, or indeed any arbitrary chain of `memref.subview`, `memref.expand_shape`, `memref.collapse_shape`, etc. The pass therefore needs to find the `memref.alloc` at the start of the chain which ultimately defines the operand of the `iree_linalg_ext.pack`, and build the `amdaie.dma_cpy_nd` based on that `memref.alloc`. Before this PR, it was assumed that the chain connecting a `memref.alloc` to an `iree_linalg_ext.pack` was at most a single `memref.subview` op. This PR extends this to a chain of any length. It avoids recursion. Please see the lit tests for examples.

The second is that `amdaie.dma_cpy_nd` has `offsets`, `sizes` and `strides`. These need to be derived from the `iree_linalg_ext.pack` and all of the operations in the chain from the `memref.alloc` to the `iree_linalg_ext.pack`. With this PR, each of the operations `memref.subview`, `memref.collapse_shape` and `memref.expand_shape` in a chain from `memref.alloc` to `iree_linalg_ext.pack` has specific logic for modifying `offsets`, `sizes` and `strides`. The vectors `offsets`, `sizes` and `strides` are initialized at the `memref.alloc`, and then each op in the chain to the `iree_linalg_ext.pack` mutates them. They are then mutated one last time based on the `iree_linalg_ext.pack` op's permutation and tile sizes.

Please see the lit tests for examples of these modifications.
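To make the first point concrete, here is a hedged sketch of the two-phase traversal described above (the `updateFrom*` helpers and `packOperand` are illustrative names, not the PR's actual code):

```cpp
// Hedged sketch of the chain walk; illustrative, not the PR's exact code.
void buildAddressing(Value packOperand, SmallVector<OpFoldResult> &offsets,
                     SmallVector<OpFoldResult> &sizes,
                     SmallVector<OpFoldResult> &strides) {
  // Phase 1: walk up the use-def chain to the memref.alloc that
  // ultimately defines the pack operand, recording the ops in between.
  SmallVector<Operation *> chain;
  Value source = packOperand;
  while (Operation *def = source.getDefiningOp()) {
    if (isa<memref::AllocOp>(def)) break;
    chain.push_back(def);
    source = def->getOperand(0);  // subview/expand/collapse have one source
  }
  // Phase 2: offsets/sizes/strides start from the alloc; replay the chain
  // from the alloc back down, letting each op mutate the addressing.
  for (Operation *op : llvm::reverse(chain)) {
    if (auto sv = dyn_cast<memref::SubViewOp>(op))
      updateFromSubview(sv, offsets, sizes, strides);
    else if (auto ex = dyn_cast<memref::ExpandShapeOp>(op))
      updateFromExpandShape(ex, offsets, sizes, strides);
    else if (auto co = dyn_cast<memref::CollapseShapeOp>(op))
      updateFromCollapseShape(co, offsets, sizes, strides);
  }
  // Finally, the pack op's permutation and inner tile sizes are applied.
}
```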