Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Transform][Fusion] Introduce an iterative tiling and fusion pass in forward and backward fashion #87

Merged
merged 15 commits into from
Aug 5, 2024

Conversation

Yun-Fly
Copy link
Contributor

@Yun-Fly Yun-Fly commented May 17, 2024

Track issue: Support fine-grain fusion #54

  • Matmul fusion with element-wise/reduce/broadcast ops.
  • Pre-op and post-op fusion(a.k.a. producer and consumer fusion respectively).
  • Multi-consumer and multi-producer support.
  • Support multiple level tiling and fusion.
  • Add flexible options to control the boundary of iterative fusion.
  • Enable default tiling when no op is tiled before fusion.
  • Cost-model to determine whether to fuse or not.

@Yun-Fly Yun-Fly added the WIP work in progress label May 17, 2024
@Yun-Fly Yun-Fly requested a review from ZhennanQin May 17, 2024 09:52
@Yun-Fly Yun-Fly force-pushed the yunfei/fine_grained_fusion branch from 1fa34f4 to 5cc9bca Compare May 17, 2024 11:48
@Yun-Fly Yun-Fly force-pushed the yunfei/fine_grained_fusion branch from af801cc to d130996 Compare May 31, 2024 09:14
@ZhennanQin
Copy link
Contributor

ZhennanQin commented Jun 3, 2024

#122 is to update LLVM version. Please rebase after that PR is merged.

@Yun-Fly Yun-Fly force-pushed the yunfei/fine_grained_fusion branch from 9e6fbc6 to 8ef6702 Compare June 3, 2024 13:20
@Yun-Fly Yun-Fly force-pushed the yunfei/fine_grained_fusion branch 4 times, most recently from ee371dc to 5de318f Compare July 8, 2024 09:02
@Yun-Fly Yun-Fly linked an issue Jul 9, 2024 that may be closed by this pull request
@Yun-Fly Yun-Fly force-pushed the yunfei/fine_grained_fusion branch 3 times, most recently from 9a4d734 to 938c66f Compare July 15, 2024 02:31
@Yun-Fly Yun-Fly changed the title [Transform][Fusion] enable fine-grained fusion based on diffusion [Transform][Fusion] enable fine-grained fusion by forward and backward slice Jul 15, 2024
@Yun-Fly Yun-Fly force-pushed the yunfei/fine_grained_fusion branch 4 times, most recently from 17c88aa to 42c24b0 Compare July 22, 2024 07:21
@Yun-Fly Yun-Fly changed the title [Transform][Fusion] enable fine-grained fusion by forward and backward slice [Transform][Fusion] enable fine-grained fusion in forward and backward fashion Jul 22, 2024
@Yun-Fly Yun-Fly force-pushed the yunfei/fine_grained_fusion branch from 42c24b0 to 8d37c20 Compare July 22, 2024 07:36
test/gc/Transform/fine-grained-fusion.mlir Outdated Show resolved Hide resolved
lib/gc/Transforms/FineGrainedFusion.cpp Outdated Show resolved Hide resolved
lib/gc/Transforms/FineGrainedFusion.cpp Outdated Show resolved Hide resolved
lib/gc/Transforms/FineGrainedFusion.cpp Outdated Show resolved Hide resolved
lib/gc/Transforms/FineGrainedFusion.cpp Outdated Show resolved Hide resolved
@dchigarev
Copy link
Contributor

@Yun-Fly thanks for your PR!

I've been able to test your pass in the GPU pipeline (linalg->xegpu->gpu exe) by replacing tile-consumer-and-fuse-producers pass from tpp-mlir repo and everything works just fine (if change the default tiling value in the code to something greater than 16).

What prevents us from changing 1 in this line to a value from a parameter to the pass?

defaultTileSize[en] = rewriter.getIndexAttr(1);

@Yun-Fly
Copy link
Contributor Author

Yun-Fly commented Jul 30, 2024

@Yun-Fly thanks for your PR!

I've been able to test your pass in the GPU pipeline (linalg->xegpu->gpu exe) by replacing tile-consumer-and-fuse-producers pass from tpp-mlir repo and everything works just fine (if change the default tiling value in the code to something greater than 16).

Glad to hear this progress!

What prevents us from changing 1 in this line to a value from a parameter to the pass?

defaultTileSize[en] = rewriter.getIndexAttr(1);

Yes, its a quite great point! Firstly, I would like to share some context with you:

  1. In CPU pipeline, this fusion pass is placed after another deep-tiled-matmul pass. In another word, we expects matmuls already tiled by best config(a.k.a tileSize).
  2. As a fallback, the code line you pointed out is default tileSize for all kinds of linalg ops in case of no op is tiled before fusion as your said. However, in general, it is hard to designate specific tileSize to a certain op by a value from a parameter to the pass. E.g. if there exist two matmul, how do we pass different tileSize to them? In MLIR, although it maybe customized by transform.match in UT, it is not feasible in automatic pass.

I would still try to add a parameter to pass tileSize for certain kind of linalg op. i.e. -tile-size={"matmul", {32, 32}}. But in real case(like MLP), different shapes of matmuls may prefer different tileSize for better performance. And that is why we have a standalone pass to tile matmul before fusion in CPU pipeline. Would you like to share your plan about this in GPU pipeline?

Copy link
Contributor

@kurapov-peter kurapov-peter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yun-Fly, could you please add test cases that showcase the described scenarios? This would really help in understanding pre/post conditions.

@Yun-Fly
Copy link
Contributor Author

Yun-Fly commented Jul 31, 2024

I would still try to add a parameter to pass tileSize for certain kind of linalg op. i.e. -tile-size={"matmul", {32, 32}}. But in real case(like MLP), different shapes of matmuls may prefer different tileSize for better performance. And that is why we have a standalone pass to tile matmul before fusion in CPU pipeline. Would you like to share your plan about this in GPU pipeline?

Hi, @dchigarev . I have added new parameter called default-tile-size. You can specify tileSize by gc-opt -iterative-tiling-and-fusion="default-tile-size=matmul:{32,32}". This behavior will affect all matmuls in your input .mlir, which sounds similar to TPP. The difference is that we can also designate different tileSize for other kinds of ops except for contraction op, saying ="default-tile-size=matmul:{32,32},reduce:{16,16}". NOTE that, all of these default tileSize only make sense when the certain kind of op need self tiling rather than fused with any other already tiled op/group.

@Yun-Fly
Copy link
Contributor Author

Yun-Fly commented Jul 31, 2024

@Yun-Fly, could you please add test cases that showcase the described scenarios? This would really help in understanding pre/post conditions.

Sure, I will arrange them later. pre/post-op fusion is as known as producer/consumer fusion respectively.

@Yun-Fly Yun-Fly force-pushed the yunfei/fine_grained_fusion branch from e839fb2 to d0c456f Compare July 31, 2024 06:07
@dchigarev
Copy link
Contributor

@Yun-Fly

I have added new parameter called default-tile-size.

Thank you! I think this will be enough for our first gpu-pipeline prototype

Copy link
Contributor

@kurapov-peter kurapov-peter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first pack of the comments/questions. Tried to go inside-out on the tiling using interface.

lib/gc/Transforms/IterativeTilingAndFusion.cpp Outdated Show resolved Hide resolved
test/gc/Transform/iterative-tiling-and-fusion.mlir Outdated Show resolved Hide resolved
lib/gc/Transforms/TilingUsingInterfaceX.cpp Show resolved Hide resolved
lib/gc/Transforms/TilingUsingInterfaceX.h Outdated Show resolved Hide resolved
lib/gc/Transforms/TilingUsingInterfaceX.cpp Outdated Show resolved Hide resolved
lib/gc/Transforms/TilingUsingInterfaceX.cpp Outdated Show resolved Hide resolved
lib/gc/Transforms/TilingUsingInterfaceX.cpp Show resolved Hide resolved
lib/gc/Transforms/TilingUsingInterfaceX.cpp Show resolved Hide resolved
lib/gc/Transforms/TilingUsingInterfaceX.cpp Show resolved Hide resolved
lib/gc/Transforms/TilingUsingInterfaceX.cpp Show resolved Hide resolved
Copy link
Contributor

@kurapov-peter kurapov-peter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second portion.

Comment on lines +134 to +137
FailureOr<int64_t> cstTileSizes =
ValueBoundsConstraintSet::computeConstantBound(
presburger::BoundType::UB, tileSizes[resultExpr.index()], nullptr,
true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
FailureOr<int64_t> cstTileSizes =
ValueBoundsConstraintSet::computeConstantBound(
presburger::BoundType::UB, tileSizes[resultExpr.index()], nullptr,
true);
FailureOr<int64_t> cstTileSizes =
ValueBoundsConstraintSet::computeConstantBound(
presburger::BoundType::UB, tileSizes[resultExpr.index()], /*stopCondition=*/nullptr,
/*closedUB=*/true);

Btw, are there cases when we know the tile sizes statically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We assume all the tileSize is static rather than dymamic. BTW, do we need to support dynamic shape at this moment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK to start with the static case. We'll need the dynamic as well of course.

Comment on lines +138 to +139
if (!cstIterDomain || failed(cstTileSizes) ||
cstIterDomain != cstTileSizes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this check exactly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the name of this filter(noTilingOnReductionFilter) said, we need to ensure there exist no tiling on any reduction dimension. Otherwise, it will lead to wrong calculation result. I.e. tileSize should equal to IterationDomain. Certainly, if either of them is not dynamic, they are incomparable until RUNTIME , so just return failure for such case.

Comment on lines +199 to +201
if (!failed(cstSize) && cstInnerSize) {
if (*cstSize % *cstInnerSize == 0)
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this to cover some weird uneven tiling/generic case? I mean, what's the reason a tail check is not sufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the name of this filter(exactTilingOnPackUnPackFilter) said, uneven tiling/generic case is not expected so far because it will involve more complex imperfect tiling case.

MLIRContext *ctx;
};

using OpTileSizeMap = std::unordered_map<std::string, SmallVector<int64_t>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be easier and safer to use the type ID for it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree. But the problem is that the default tileSize is passed by argument, like gc-opt -iterative-tiling-and-fusion="default-tile-size=matmul:{32,32}". By MLIR parser, they will be converted to std::string in fact.

lib/gc/Transforms/IterativeTilingAndFusion.cpp Outdated Show resolved Hide resolved
lib/gc/Transforms/IterativeTilingAndFusion.cpp Outdated Show resolved Hide resolved
@Yun-Fly Yun-Fly force-pushed the yunfei/fine_grained_fusion branch from 443100f to 4dd5214 Compare August 5, 2024 08:20
Copy link
Contributor

@kurapov-peter kurapov-peter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks better now. It this still needs even more testing. All the minor things we can clean up later on.

@kurapov-peter kurapov-peter merged commit 3be8dec into main Aug 5, 2024
4 checks passed
@kurapov-peter kurapov-peter deleted the yunfei/fine_grained_fusion branch August 5, 2024 11:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support fine-grain fusion
6 participants