[WIP][AMDAIEConvertToDma] Support memref shape collapse/expand #800
Conversation
Could you add some motivation/comments/logic so it is easy to follow?
I've added a description to the PR, but I'm not sure if that's what you're requesting? Motivation: `memref.expand_shape` and `memref.collapse_shape` enter the IR when I add [...]. Would you like more comments in the code? I am happy to add these if so.
```mlir
// CHECK-SAME: [0, 0, 0, 0] [20, 5, 10, 10] [500, 100, 10, 1]
// src of dma cpy:
// CHECK-SAME: [0, 0, 0, 0] [20, 5, 10, 10] [500, 10, 50, 1]
func.func @multidim_without_expand() {
```
I would change the test names to include the keyword pack/unpack. Same for the tests below.
```cpp
if (size.value() % innerTiles[i] != 0) {
  std::optional<int64_t> maybeSize =
      getConstantIntValue(sizes[innerDimsPos[i]]);
  assert(maybeSize.has_value() && "size expected to be constant here.");
```
I would expect this to emit an error too, to be consistent with the strides handling below.
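Something like the following would match that pattern (a sketch only — assuming the surrounding function returns `LogicalResult` and `op` is the op being rewritten; the diagnostic wording is illustrative):

```cpp
// Hedged sketch: report a recoverable error rather than asserting,
// mirroring how the strides path below handles non-constant values.
std::optional<int64_t> maybeSize =
    getConstantIntValue(sizes[innerDimsPos[i]]);
if (!maybeSize.has_value())
  return op->emitOpError("expected a constant size for the packed dimension");
```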
```cpp
std::optional<int64_t> stride =
    getConstantIntValue(strides[innerDimsPos[i]]);
```
I would expect the same name format here (`maybeStride`) as in the previous snippet, and then take `stride = maybeStride.value()`.
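That is, something like (sketch of the suggested rename):

```cpp
// Consistent naming with maybeSize above: keep the optional separate
// from the unwrapped value.
std::optional<int64_t> maybeStride =
    getConstantIntValue(strides[innerDimsPos[i]]);
int64_t stride = maybeStride.value();
```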
```cpp
OpBuilder builder(subviewOp.getContext());
builder.setInsertionPoint(subviewOp);
offsets = getIndexOpFoldResultSum(builder, subviewOp.getLoc(), offsets,
```
It would be better to add some comments here, or above the utility function, explaining how the offsets are generated.
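For example, a doc comment along these lines (my reading of the call site — the exact semantics of `getIndexOpFoldResultSum` should be confirmed against the utility itself):

```cpp
// Hedged guess at the contract, phrased as a doc comment:
// getIndexOpFoldResultSum(builder, loc, lhs, rhs) returns the elementwise
// sums lhs[i] + rhs[i] as OpFoldResults: two constants fold into a constant
// index attribute, otherwise an arith.addi is created at the insertion
// point. Here it composes the subview's offsets onto the offsets
// accumulated from earlier ops in the chain.
```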
```cpp
}

// Starting from the allocation, update the offsets, sizes, and strides.
for (auto iter = chain.rbegin(); iter != chain.rend(); ++iter) {
```
I'm okay with your method of first going up the chain to find the defining op and then going down the chain to update the addressing. But I think recursion could make the code cleaner.
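Something along these lines, perhaps (a sketch of the recursive alternative; the `updateFrom*` helpers are illustrative stand-ins for the per-op logic in this PR):

```cpp
#include "llvm/ADT/TypeSwitch.h"

// Hedged sketch: recurse up to the alloc first, then apply each op's
// offset/size/stride update on the way back down, so no explicit chain
// vector is needed.
LogicalResult updateFromChain(Value v, SmallVector<OpFoldResult> &offsets,
                              SmallVector<OpFoldResult> &sizes,
                              SmallVector<OpFoldResult> &strides) {
  Operation *def = v.getDefiningOp();
  // Base case: the allocation, where offsets/sizes/strides are initialized.
  if (!def || isa<memref::AllocOp>(def)) return success();
  // Recurse toward the alloc before applying this op's update (post-order).
  if (failed(updateFromChain(def->getOperand(0), offsets, sizes, strides)))
    return failure();
  return llvm::TypeSwitch<Operation *, LogicalResult>(def)
      .Case<memref::SubViewOp>(
          [&](auto op) { return updateFromSubview(op, offsets, sizes, strides); })
      .Case<memref::ExpandShapeOp>(
          [&](auto op) { return updateFromExpandShape(op, offsets, sizes, strides); })
      .Case<memref::CollapseShapeOp>(
          [&](auto op) { return updateFromCollapseShape(op, offsets, sizes, strides); })
      .Default([](Operation *) { return failure(); });
}
```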
```cpp
rewriter.eraseOp(op);
if (failed(mlir::verify(src)) || failed(mlir::verify(dst))) {
  return failure();
}
```
Why is this needed?
My bad, left over from debugging.
```cpp
if (failed(mlir::verify(packOrUnpackOp))) {
  return failure();
}
```
Same question, why is this needed?
```cpp
    SmallVector<OpFoldResult> &sizes,
    SmallVector<OpFoldResult> &strides) {
  MLIRContext *ctx = expandShapeOp.getContext();
  auto reassociationIndices = expandShapeOp.getReassociationIndices();
```
Don't use `auto` here.
I intentionally didn't use `auto` here, because IMO `SmallVector<SmallVector<long, 2>, 4>` is uninformative. Why pollute code (and brain) with the detail that someone years ago thought that 4 groups of size 2 seemed like a good choice for static sizes?
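For what it's worth, upstream MLIR provides the alias `ReassociationIndices` (a `SmallVector<int64_t, 2>`), which names the type explicitly without spelling out the inline sizes — a possible middle ground:

```cpp
// Explicit type via the upstream alias, avoiding both `auto` and the
// raw nested SmallVector spelling.
SmallVector<ReassociationIndices> reassociationIndices =
    expandShapeOp.getReassociationIndices();
```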
```cpp
    SmallVector<OpFoldResult> &offsets,
    SmallVector<OpFoldResult> &sizes,
    SmallVector<OpFoldResult> &strides) {
  auto reassociationIndices = collapseOp.getReassociationIndices();
```
Don't use `auto` here.
```cpp
  sizes.push_back(getAsIndexOpFoldResult(ctx, dim));
}

// Offsets - merge reassociation groups.
```
I would put some explanation of how the offsets are decided here, and in other places too.
Yes, I think more comments and explanations are needed, especially for the utility functions (e.g. [...]).
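As a concrete example of what such a comment could explain: for `collapse_shape`, the offsets of a reassociation group can be merged by linearizing them with the group's sizes, row-major. A sketch under that assumption (not the PR's exact code; static values assumed for brevity):

```cpp
// Hedged sketch: merge the per-dimension offsets of one reassociation
// group into a single offset for the collapsed dimension, using standard
// row-major linearization: ((o0 * s1 + o1) * s2 + o2) * ...
int64_t mergeGroupOffsets(ArrayRef<int64_t> groupOffsets,
                          ArrayRef<int64_t> groupSizes) {
  int64_t collapsed = 0;
  for (auto [offset, size] : llvm::zip_equal(groupOffsets, groupSizes))
    collapsed = collapsed * size + offset;
  return collapsed;
}
```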
```cpp
auto add = [&](Value v, IntegerAttr attr) {
  if (attr.getInt() == 0) return v;
  return builder.create<arith::AddIOp>(loc, v, getConstant(attr.getInt()))
```
I'm wondering if there's an easier way / existing utility in upstream that can do this, instead of creating new ops.
Yeah, I wondered the same thing.
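One upstream candidate that comes to mind (an assumption on my part — I haven't checked whether it fits this call site) is `affine::makeComposedFoldedAffineApply`, which folds constant operands instead of materializing new ops:

```cpp
#include "mlir/Dialect/Affine/IR/AffineOps.h"

// Hedged sketch: sum two OpFoldResults via the upstream affine helper.
// If both operands fold to constants, no new op is created; otherwise a
// composed affine.apply is built at the insertion point.
OpFoldResult addOfrs(OpBuilder &b, Location loc, OpFoldResult lhs,
                     OpFoldResult rhs) {
  AffineExpr d0, d1;
  bindDims(b.getContext(), d0, d1);
  return affine::makeComposedFoldedAffineApply(b, loc, d0 + d1, {lhs, rhs});
}
```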
compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIEConvertToDma.cpp
@yzhang93 thanks for your review. Please don't review this again until I remove the [WIP]. This is the "large PR" which I said could be eliminated if I could get `linalg-fold-unit-extent-dims` to not create `collapse_shape` and `expand_shape` ops. I'm experimenting with `useRankReducingSlices = true` in that pass now, as suggested by Mahesh, to see what happens; if that works I might abandon this PR.
This PR extends `iree-amdaie-convert-to-dma` to handle more situations.

The goal of the pass `iree-amdaie-convert-to-dma` is to convert `iree_linalg_ext.pack`, `iree_linalg_ext.unpack` and `linalg.copy` operations into `amdaie.dma_cpy_nd` operations. The `linalg.copy` ops are of no concern in this PR, as they are converted to pack and unpack ops very early in this pass. The logic for `iree_linalg_ext.unpack` is essentially the same as for `iree_linalg_ext.pack`, so I will only discuss `iree_linalg_ext.pack` in the next paragraphs.

There are 2 main differences between `iree_linalg_ext.pack` and `amdaie.dma_cpy_nd` which the pass needs to handle.

The first is that the operands of `amdaie.dma_cpy_nd` are `amdaie.logicalobjectfifo`s, which are essentially just `memref.alloc` ops (tied to a set of tiles). The operation `iree_linalg_ext.pack`, on the other hand, has operands which are not as directly connected to `memref.alloc` ops, as they can be `memref.subview`s of allocations, or indeed any arbitrary chain of `memref.subview`, `memref.expand_shape`, `memref.collapse_shape`, etc. The pass therefore needs to find the `memref.alloc` at the start of the chain which ultimately defines the operand of the `iree_linalg_ext.pack`, and build the `amdaie.dma_cpy_nd` based on that `memref.alloc`. Before this PR, it was assumed that the chain connecting a `memref.alloc` to an `iree_linalg_ext.pack` was at most a single `memref.subview` op. This PR extends this to a chain of any length. It avoids recursion. Please see the lit tests for examples.

The second is that `amdaie.dma_cpy_nd` has `offsets`, `sizes` and `strides`. These need to be derived from the `iree_linalg_ext.pack` and all of the operations in the chain from the `memref.alloc` to the `iree_linalg_ext.pack`. With this PR, each of the operations `memref.subview`, `memref.collapse_shape` and `memref.expand_shape` in a chain from `memref.alloc` to `iree_linalg_ext.pack` has specific logic for modifying `offsets`, `sizes` and `strides`. The vectors `offsets`, `sizes` and `strides` are initialized at the `memref.alloc`, and then each op in the chain to the `iree_linalg_ext.pack` mutates them. They are then mutated one last time based on the `iree_linalg_ext.pack` op's permutation and tile sizes.

Please see the lit tests for examples of these modifications.
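To make the first point concrete, here is a hedged sketch of the two-phase traversal described above (the `updateFrom*` helpers and `packOperand` are illustrative names, not the PR's actual code):

```cpp
// Hedged sketch of the chain walk; illustrative, not the PR's exact code.
void buildAddressing(Value packOperand, SmallVector<OpFoldResult> &offsets,
                     SmallVector<OpFoldResult> &sizes,
                     SmallVector<OpFoldResult> &strides) {
  // Phase 1: walk up the use-def chain to the memref.alloc that
  // ultimately defines the pack operand, recording the ops in between.
  SmallVector<Operation *> chain;
  Value source = packOperand;
  while (Operation *def = source.getDefiningOp()) {
    if (isa<memref::AllocOp>(def)) break;
    chain.push_back(def);
    source = def->getOperand(0);  // subview/expand/collapse have one source
  }
  // Phase 2: offsets/sizes/strides start from the alloc; replay the chain
  // from the alloc back down, letting each op mutate the addressing.
  for (Operation *op : llvm::reverse(chain)) {
    if (auto sv = dyn_cast<memref::SubViewOp>(op))
      updateFromSubview(sv, offsets, sizes, strides);
    else if (auto ex = dyn_cast<memref::ExpandShapeOp>(op))
      updateFromExpandShape(ex, offsets, sizes, strides);
    else if (auto co = dyn_cast<memref::CollapseShapeOp>(op))
      updateFromCollapseShape(co, offsets, sizes, strides);
  }
  // Finally, the pack op's permutation and inner tile sizes are applied.
}
```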