Align 'linalg-to-xegpu' pass with patched XeGPU dialect #201
- loc, vecLoadType, tile, vnniAxisAttr, transpose,
+ loc, vecLoadType, tile, packedAttr, transpose, transpose_bit,
`vnniAxis` -> `packedAttr`: instead of a VNNI axis (0, 1), specify a "packed" attribute that is equivalent to `vnni_axis=0`.
`transpose_bit`: allows transposing data while loading. It isn't used by this lowering pass.
@@ -1057,7 +1076,7 @@ static LogicalResult createDPASKernel(linalg::LinalgOp linalgOp,
// Load A sub-tiles.
SmallVector<Value> loadVecA =
loadNdDescTiles(rewriter, loc, tilesA, readCacheHint, vnniConfA);
`vnniConfA` can't be used during loading since `vnniAxis=1` is no longer supported. However, we still need this config to compute proper tiles for `xegpu.dpas` later in the code.
Signed-off-by: dchigarev <[email protected]>
- // CHECK: %[[tC:.+]] = xegpu.update_nd_offset %[[rootC]], [0, 0]
+ // CHECK: %[[tC:.+]] = xegpu.update_nd_offset %[[rootC]], [%c0, %c0]
IMEX doesn't support constant offsets (see intel/mlir-extensions#815).
@@ -63,7 +63,7 @@ func.func @matmul(%arg0: memref<32x32xf16>, %arg1: memref<32x32xf16>, %arg2: mem
  // Extract DPAS-sized chunks from larger loaded tile A.
  // Tile B is already in the correct shape.
- // CHECK: %[[vA_flat:.+]] = vector.shape_cast %[[vA]] : vector<32x8x2xf16> to vector<512xf16>
+ // CHECK: %[[vA_flat:.+]] = vector.shape_cast %[[vA]] : vector<32x16xf16> to vector<512xf16>
We do not load the A matrix via `vnni_axis=1` anymore (see `packed_attr`).
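The shape change in the test above can be checked with a little arithmetic. The following is a minimal sketch, assuming that VNNI packing with a factor of 2 for f16 splits the chosen axis in two and appends an inner dimension of size 2. The helpers `vnniShape` and `numElements` are hypothetical names for illustration and are not part of the pass or of IMEX.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: divide `axis` by `factor` and append `factor` as the
// innermost dimension (assumed model of VNNI packing, not the dialect's code).
std::vector<std::size_t> vnniShape(std::vector<std::size_t> shape,
                                   std::size_t axis, std::size_t factor) {
  shape[axis] /= factor;
  shape.push_back(factor);
  return shape;
}

// Total number of elements described by a shape.
std::size_t numElements(const std::vector<std::size_t> &shape) {
  std::size_t n = 1;
  for (std::size_t d : shape)
    n *= d;
  return n;
}
```

Under that assumption, a 32x16 tile packed along axis 1 becomes 32x8x2, which matches the old CHECK line. Both the packed and the plain shape hold 512 elements, so the `vector.shape_cast` to `vector<512xf16>` stays valid after the change.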
The IMEX changes are merged in Menooker:dev.
@@ -8,8 +8,8 @@ if (NOT DEFINED IMEX_INCLUDES)
# TODO: Change to main https://github.com/intel/mlir-extensions when all the
# required functionality is merged.
gc_fetch_content(imex 496b240093b5e132b60c5ee69878300fe69be300 https://github.com/Menooker/mlir-extensions
SET IMEX_CHECK_LLVM_VERSION=ON IMEX_ENABLE_L0_RUNTIME=0
gc_fetch_content(imex d5bbd635dee500b8cff138686833bacfac5ade78 https://github.com/Menooker/mlir-extensions
Updated to the latest commit in the `dev` branch.
cl_platform_id platform; // OpenCL platform
cl_device_id device; // device ID
CL_SAFE_CALL(clGetPlatformIDs(1, &platform, NULL));
CL_SAFE_CALL(clGetDeviceIDs(platform, *devtype, 1, &device, NULL));
return device;
The old logic searched for a device of the requested type in only one platform (and couldn't find any GPU on my machine). I rewrote the logic to iterate over all available platforms and return the first suitable device.
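The search order described in this comment can be sketched without the OpenCL runtime. This is a minimal illustration under stated assumptions, not the code from this PR: `Device`, `Platform`, and `findFirstDevice` are hypothetical stand-ins for `cl_device_id`, `cl_platform_id`, and the rewritten lookup, and the device type is modeled as a plain bitmask.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the OpenCL handles; not the real OpenCL types.
struct Device {
  int id;
  std::uint64_t type; // bitmask, analogous to cl_device_type
};
struct Platform {
  std::vector<Device> devices;
};

// Scan every platform in order and return the first device whose type
// matches the requested mask, or std::nullopt if no platform has one.
std::optional<Device> findFirstDevice(const std::vector<Platform> &platforms,
                                      std::uint64_t wantedType) {
  for (const Platform &platform : platforms)
    for (const Device &device : platform.devices)
      if (device.type & wantedType)
        return device;
  return std::nullopt;
}
```

With the real API, the outer loop would presumably run over the list returned by `clGetPlatformIDs`, and a platform without a matching device would be skipped instead of being treated as a fatal error.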
Closes #192
This PR updates the `linalg-to-xegpu` pass to make it compatible with the `xegpu-to-vc-func` pass from IMEX. The PR also adds a simple e2e test for the `linalg->xegpu->gpu exe` pipeline.