[Transform][Fusion] Introduce an iterative tiling and fusion pass in forward and backward fashion #87

Yun-Fly · 2024-05-17T09:52:29Z

Tracking issue: Support fine-grain fusion #54

Matmul fusion with element-wise/reduce/broadcast ops.
Pre-op and post-op fusion (a.k.a. producer and consumer fusion, respectively).
Multi-consumer and multi-producer support.
Support multi-level tiling and fusion.
Add flexible options to control the boundary of iterative fusion.
Enable default tiling when no op is tiled before fusion.
Cost-model to determine whether to fuse or not.

lib/gc/Transforms/AnyTilableFusion.cpp

ZhennanQin · 2024-06-03T08:46:19Z

#122 updates the LLVM version. Please rebase after that PR is merged.

test/gc/Transform/fine-grained-fusion.mlir

lib/gc/Transforms/FineGrainedFusion.cpp

dchigarev · 2024-07-30T10:27:03Z

@Yun-Fly thanks for your PR!

I've been able to test your pass in the GPU pipeline (linalg->xegpu->gpu exe) by replacing the tile-consumer-and-fuse-producers pass from the tpp-mlir repo, and everything works just fine (if I change the default tiling value in the code to something greater than 16).

What prevents us from replacing the 1 in this line with a value taken from a pass parameter?

graph-compiler/lib/gc/Transforms/IterativeTilingAndFusion.cpp

Line 625 in 111d276

defaultTileSize[en] = rewriter.getIndexAttr(1);

Yun-Fly · 2024-07-30T12:38:21Z

> @Yun-Fly thanks for your PR!
>
> I've been able to test your pass in the GPU pipeline (linalg->xegpu->gpu exe) by replacing the tile-consumer-and-fuse-producers pass from the tpp-mlir repo, and everything works just fine (if I change the default tiling value in the code to something greater than 16).

Glad to hear about this progress!

> What prevents us from replacing the 1 in this line with a value taken from a pass parameter?
>
> graph-compiler/lib/gc/Transforms/IterativeTilingAndFusion.cpp
>
> Line 625 in 111d276
>
> defaultTileSize[en] = rewriter.getIndexAttr(1);

Yes, that's a great point! First, I would like to share some context with you:

In the CPU pipeline, this fusion pass is placed after another deep-tiled-matmul pass. In other words, we expect matmuls to be already tiled with the best config (a.k.a. tileSize).
As a fallback, the line you pointed out sets the default tileSize for all kinds of linalg ops in case no op is tiled before fusion, as you said. However, in general, it is hard to assign a specific tileSize to a particular op through a pass parameter. E.g., if there are two matmuls, how do we pass different tileSizes to them? In MLIR, although this may be customized with transform.match in a unit test, it is not feasible in an automatic pass.

I will still try to add a parameter that passes a tileSize for a certain kind of linalg op, e.g. -tile-size={"matmul", {32, 32}}. But in real cases (like MLP), matmuls of different shapes may prefer different tileSizes for better performance. That is why we have a standalone pass to tile matmuls before fusion in the CPU pipeline. Would you like to share your plan for this in the GPU pipeline?

kurapov-peter

@Yun-Fly, could you please add test cases that showcase the described scenarios? This would really help in understanding pre/post conditions.

Yun-Fly · 2024-07-31T03:00:44Z

> I will still try to add a parameter that passes a tileSize for a certain kind of linalg op, e.g. -tile-size={"matmul", {32, 32}}. But in real cases (like MLP), matmuls of different shapes may prefer different tileSizes for better performance. That is why we have a standalone pass to tile matmuls before fusion in the CPU pipeline. Would you like to share your plan for this in the GPU pipeline?

Hi @dchigarev, I have added a new parameter called default-tile-size. You can specify the tileSize with gc-opt -iterative-tiling-and-fusion="default-tile-size=matmul:{32,32}". This affects all matmuls in your input .mlir, which sounds similar to TPP. The difference is that we can also assign different tileSizes to other kinds of ops besides contraction ops, e.g. ="default-tile-size=matmul:{32,32},reduce:{16,16}". NOTE: these default tileSizes only take effect when an op of that kind needs to be tiled on its own, rather than being fused into another already-tiled op/group.
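
To make the option format concrete, here is a minimal, self-contained sketch of how a string such as matmul:{32,32},reduce:{16,16} could be split into a per-op-kind tile-size table. It is not the code in this PR: the function name parseDefaultTileSize is hypothetical, std::vector stands in for SmallVector, and the real pass receives the value through the MLIR pass-option machinery rather than a hand-written parser.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Same shape as the OpTileSizeMap alias discussed in the review, but with
// std::vector so the sketch builds without LLVM headers.
using OpTileSizeMap = std::unordered_map<std::string, std::vector<int64_t>>;

// Hypothetical parser for "kind:{a,b},kind:{c,d}".
// Returns std::nullopt when the input is malformed.
std::optional<OpTileSizeMap> parseDefaultTileSize(const std::string &spec) {
  OpTileSizeMap result;
  std::size_t pos = 0;
  while (pos < spec.size()) {
    // The op kind is everything up to the next ":{".
    std::size_t colon = spec.find(":{", pos);
    if (colon == std::string::npos || colon == pos)
      return std::nullopt;
    std::string opKind = spec.substr(pos, colon - pos);
    std::size_t close = spec.find('}', colon + 2);
    if (close == std::string::npos)
      return std::nullopt;
    // Read the comma-separated non-negative integers between the braces.
    std::vector<int64_t> sizes;
    int64_t value = 0;
    bool hasDigit = false;
    for (std::size_t i = colon + 2; i <= close; ++i) {
      char c = spec[i];
      if (c >= '0' && c <= '9') {
        value = value * 10 + (c - '0');
        hasDigit = true;
      } else if ((c == ',' || c == '}') && hasDigit) {
        sizes.push_back(value);
        value = 0;
        hasDigit = false;
      } else {
        return std::nullopt; // Unexpected character or empty entry.
      }
    }
    // The closing brace is only accepted right after a number.
    assert(!sizes.empty());
    result[opKind] = sizes;
    // Entries are separated by a single comma.
    pos = close + 1;
    if (pos < spec.size()) {
      if (spec[pos] != ',')
        return std::nullopt;
      ++pos;
    }
  }
  return result;
}
```

With the example string above, the sketch produces two entries, one for matmul with {32, 32} and one for reduce with {16, 16}; an input such as matmul:{32,x} is rejected.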

Yun-Fly · 2024-07-31T03:05:27Z

> @Yun-Fly, could you please add test cases that showcase the described scenarios? This would really help in understanding pre/post conditions.

Sure, I will arrange them later. Pre/post-op fusion is also known as producer/consumer fusion, respectively.

dchigarev · 2024-07-31T07:07:35Z

@Yun-Fly

> I have added a new parameter called default-tile-size.

Thank you! I think this will be enough for our first gpu-pipeline prototype.

kurapov-peter

The first batch of comments/questions. I tried to go inside-out, starting with the tiling-using-interface code.

lib/gc/Transforms/IterativeTilingAndFusion.cpp

test/gc/Transform/iterative-tiling-and-fusion.mlir

lib/gc/Transforms/TilingUsingInterfaceX.cpp

lib/gc/Transforms/TilingUsingInterfaceX.h

lib/gc/Transforms/TilingUsingInterfaceX.cpp

kurapov-peter

Second portion.

lib/gc/Transforms/IterativeTilingAndFusion.cpp

kurapov-peter · 2024-08-01T10:39:13Z

lib/gc/Transforms/IterativeTilingAndFusion.cpp

+      FailureOr<int64_t> cstTileSizes =
+          ValueBoundsConstraintSet::computeConstantBound(
+              presburger::BoundType::UB, tileSizes[resultExpr.index()], nullptr,
+              true);


Suggested change

-      FailureOr<int64_t> cstTileSizes =
-          ValueBoundsConstraintSet::computeConstantBound(
-              presburger::BoundType::UB, tileSizes[resultExpr.index()], nullptr,
-              true);
+      FailureOr<int64_t> cstTileSizes =
+          ValueBoundsConstraintSet::computeConstantBound(
+              presburger::BoundType::UB, tileSizes[resultExpr.index()],
+              /*stopCondition=*/nullptr, /*closedUB=*/true);

Btw, are there cases when we know the tile sizes statically?

We assume all tileSizes are static rather than dynamic. BTW, do we need to support dynamic shapes at this moment?

It's OK to start with the static case. We'll need the dynamic case as well, of course.

kurapov-peter · 2024-08-01T10:45:05Z

lib/gc/Transforms/IterativeTilingAndFusion.cpp

+      if (!cstIterDomain || failed(cstTileSizes) ||
+          cstIterDomain != cstTileSizes)


What does this check exactly?

As the name of this filter (noTilingOnReductionFilter) says, we need to ensure that no reduction dimension is tiled; otherwise, the computed result will be wrong. I.e., the tileSize must equal the IterationDomain on that dimension. Of course, if either of them is dynamic, they cannot be compared until runtime, so we just return failure in that case.
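
The rule described in this reply can be sketched without any MLIR types. The snippet below is only an illustration under stated assumptions: noTilingOnReduction and MaybeConst are hypothetical names, and std::optional stands in for "a constant bound could be computed", which the real filter obtains from ValueBoundsConstraintSet::computeConstantBound.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// A value whose constant bound may be unknown (i.e. dynamic).
using MaybeConst = std::optional<int64_t>;

// Returns true only if every reduction dimension is provably untiled,
// meaning its tile size equals the full iteration-domain extent.
bool noTilingOnReduction(const std::vector<bool> &isReductionDim,
                         const std::vector<MaybeConst> &iterDomain,
                         const std::vector<MaybeConst> &tileSizes) {
  assert(isReductionDim.size() == iterDomain.size() &&
         isReductionDim.size() == tileSizes.size());
  for (std::size_t d = 0; d < isReductionDim.size(); ++d) {
    if (!isReductionDim[d])
      continue; // Parallel dimensions may be tiled freely.
    // Dynamic extent or dynamic tile size: not comparable until runtime,
    // so reject conservatively.
    if (!iterDomain[d] || !tileSizes[d])
      return false;
    // A tile smaller than the reduction extent would split the reduction.
    if (*iterDomain[d] != *tileSizes[d])
      return false;
  }
  return true;
}
```

For a 64x64x128 matmul whose last dimension is the reduction, tile sizes {32, 32, 128} pass the check, while {32, 32, 64} or an unknown reduction tile size do not.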

kurapov-peter · 2024-08-01T12:28:29Z

lib/gc/Transforms/IterativeTilingAndFusion.cpp

+    if (!failed(cstSize) && cstInnerSize) {
+      if (*cstSize % *cstInnerSize == 0)
+        continue;


Is this to cover some weird uneven tiling/generic case? I mean, what's the reason a tail check is not sufficient?

As the name of this filter (exactTilingOnPackUnPackFilter) says, uneven tiling/the generic case is not expected so far, because it would involve a more complex imperfect tiling case.
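
As a rough illustration of the "exact tiling" requirement, the sketch below accepts a set of tile sizes only when each one is a known constant that is an exact multiple of the corresponding inner tile size, mirroring the `*cstSize % *cstInnerSize == 0` test quoted above. The function name and the use of std::optional for possibly-dynamic values are assumptions for this example, not the PR's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// A value whose constant bound may be unknown (i.e. dynamic).
using MaybeConst = std::optional<int64_t>;

// Returns true only if every tile size is a known multiple of the matching
// inner tile size, so no partial (tail) tile can appear.
bool isExactTilingOnPackedDims(const std::vector<MaybeConst> &tileSizes,
                               const std::vector<MaybeConst> &innerTileSizes) {
  assert(tileSizes.size() == innerTileSizes.size());
  for (std::size_t d = 0; d < tileSizes.size(); ++d) {
    const MaybeConst &size = tileSizes[d];
    const MaybeConst &inner = innerTileSizes[d];
    // Unknown or non-positive values cannot be proven exact: reject.
    if (!size || !inner || *inner <= 0)
      return false;
    // A remainder means an imperfect tile, which is not supported here.
    if (*size % *inner != 0)
      return false;
  }
  return true;
}
```

Under these assumptions, tile sizes {32, 64} with inner tiles {16, 32} are accepted, whereas {24, 64} with the same inner tiles are rejected because 24 is not a multiple of 16.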

lib/gc/Transforms/IterativeTilingAndFusion.cpp

kurapov-peter · 2024-08-01T13:52:24Z

lib/gc/Transforms/IterativeTilingAndFusion.cpp

+  MLIRContext *ctx;
+};
+
+using OpTileSizeMap = std::unordered_map<std::string, SmallVector<int64_t>>;


Wouldn't it be easier and safer to use the type ID for it?

Yeah, I agree. But the problem is that the default tileSize is passed as an argument, like gc-opt -iterative-tiling-and-fusion="default-tile-size=matmul:{32,32}". The MLIR option parser actually converts it to std::string.

lib/gc/Transforms/IterativeTilingAndFusion.cpp

kurapov-peter

Thanks, looks better now. This still needs even more testing. We can clean up all the minor things later on.

Yun-Fly added the WIP work in progress label May 17, 2024

Yun-Fly requested a review from ZhennanQin May 17, 2024 09:52

Yun-Fly force-pushed the yunfei/fine_grained_fusion branch from 1fa34f4 to 5cc9bca Compare May 17, 2024 11:48

BRUCE11111 reviewed May 20, 2024

lib/gc/Transforms/AnyTilableFusion.cpp (outdated, resolved)

BRUCE11111 reviewed May 20, 2024

lib/gc/Transforms/AnyTilableFusion.cpp (outdated, resolved)

Yun-Fly force-pushed the yunfei/fine_grained_fusion branch from af801cc to d130996 Compare May 31, 2024 09:14

Yun-Fly force-pushed the yunfei/fine_grained_fusion branch from 9e6fbc6 to 8ef6702 Compare June 3, 2024 13:20

Yun-Fly force-pushed the yunfei/fine_grained_fusion branch 4 times, most recently from ee371dc to 5de318f Compare July 8, 2024 09:02

Yun-Fly linked an issue Jul 9, 2024 that may be closed by this pull request

Support fine-grain fusion #54

Closed

Yun-Fly force-pushed the yunfei/fine_grained_fusion branch 3 times, most recently from 9a4d734 to 938c66f Compare July 15, 2024 02:31

Yun-Fly changed the title ~~[Transform][Fusion] enable fine-grained fusion based on diffusion~~ [Transform][Fusion] enable fine-grained fusion by forward and backward slice Jul 15, 2024

Yun-Fly force-pushed the yunfei/fine_grained_fusion branch 4 times, most recently from 17c88aa to 42c24b0 Compare July 22, 2024 07:21

Yun-Fly changed the title ~~[Transform][Fusion] enable fine-grained fusion by forward and backward slice~~ [Transform][Fusion] enable fine-grained fusion in forward and backward fashion Jul 22, 2024

Yun-Fly force-pushed the yunfei/fine_grained_fusion branch from 42c24b0 to 8d37c20 Compare July 22, 2024 07:36

Yun-Fly added 7 commits July 29, 2024 01:10

init diffusion

8ede55c

support fuse consumer into innerMost ConsumerAnchor

5563efd

add coordination on multi-level anchor

e95a500

rebase

a23f1b3

fix clang check

cb1d6f9

support reduce and multi-consumers

1e92bb2

sync to latest upstream PR

2dd910e

Yun-Fly requested review from Menooker, kurapov-peter, BRUCE11111, zhczhong, yifeizh2, ciyongch and AshburnLee July 30, 2024 02:24

zhczhong reviewed Jul 30, 2024

fix comment and rename pass name

111d276

dchigarev mentioned this pull request Jul 30, 2024

Bring tile-consumer-and-fuse-producers pass from TPP #194

Closed

kurapov-peter reviewed Jul 30, 2024

add default tileSize option to pass

d0c456f

Yun-Fly force-pushed the yunfei/fine_grained_fusion branch from e839fb2 to d0c456f Compare July 31, 2024 06:07

dchigarev mentioned this pull request Jul 31, 2024

Make linalg->xegpu->gpu_exe pipeline working #193

Closed

5 tasks

kurapov-peter reviewed Jul 31, 2024

dchigarev mentioned this pull request Jul 31, 2024

Aling 'linalg-to-xegpu' pass with patched XeGPU dialect #201

Merged

add FileCheck and fix comment

3ad0e30

kurapov-peter reviewed Aug 1, 2024

zhczhong mentioned this pull request Aug 5, 2024

Centralize target description query through DLTI and add verifier pass #210

Merged

fix second portion comment

4dd5214

Yun-Fly force-pushed the yunfei/fine_grained_fusion branch from 443100f to 4dd5214 Compare August 5, 2024 08:20

kurapov-peter approved these changes Aug 5, 2024

dchigarev approved these changes Aug 5, 2024

kurapov-peter merged commit 3be8dec into main Aug 5, 2024
4 checks passed

kurapov-peter deleted the yunfei/fine_grained_fusion branch August 5, 2024 11:53

kurapov-peter Aug 1, 2024

Yun-Fly Aug 5, 2024

kurapov-peter Aug 5, 2024

kurapov-peter Aug 1, 2024

Yun-Fly Aug 5, 2024

kurapov-peter Aug 1, 2024

Yun-Fly Aug 5, 2024

kurapov-peter Aug 1, 2024

Yun-Fly Aug 5, 2024
