
Align 'linalg-to-xegpu' pass with patched XeGPU dialect #201

Merged: 13 commits merged into intel:main on Aug 7, 2024


@dchigarev (Contributor) commented Jul 30, 2024

Closes #192

This PR updates the linalg-to-xegpu pass to make it compatible with the xegpu-to-vc-func pass from IMEX.

The PR also adds a simple end-to-end test for the linalg -> xegpu -> GPU execution pipeline.

Review comment on lines 745 to 762:

```diff
-loc, vecLoadType, tile, vnniAxisAttr, transpose,
+loc, vecLoadType, tile, packedAttr, transpose, transpose_bit,
```

@dchigarev (Contributor Author):

  1. `vnniAxis` -> `packedAttr`: instead of specifying a VNNI axis (0 or 1), we now set the "packed" attribute, which is equivalent to `vnni_axis=0`.
  2. `transpose_bit`: allows transposing data while loading. It isn't used by this lowering pass.
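To make the "packed" semantics more concrete: VNNI packing along axis 0 interleaves each pair of consecutive rows, so a K x N tile becomes a (K/2) x N x 2 tile. The helper below is a hypothetical, host-side sketch of that index mapping (`vnniPackAxis0` is not XeGPU or IMEX code); it assumes a row-major input and a packing factor that divides the row count, e.g. 2 for f16 as in this PR's tests.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of XeGPU/IMEX): VNNI-pack a row-major
// rows x cols matrix along axis 0 with packing factor `vnni`, producing a
// row-major (rows / vnni) x cols x vnni array.
template <typename T>
std::vector<T> vnniPackAxis0(const std::vector<T> &src, std::size_t rows,
                             std::size_t cols, std::size_t vnni) {
  assert(vnni != 0 && rows % vnni == 0 && src.size() == rows * cols);
  std::vector<T> dst(src.size());
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c) {
      // Element (r, c) moves to (r / vnni, c, r % vnni) in the packed tile.
      std::size_t idx = ((r / vnni) * cols + c) * vnni + r % vnni;
      dst[idx] = src[r * cols + c];
    }
  return dst;
}
```

For a 4x2 input with rows {0, 1}, {2, 3}, {4, 5}, {6, 7} and a factor of 2, the result is {0, 2, 1, 3, 4, 6, 5, 7}: the elements of each row pair are interleaved column by column.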

```diff
@@ -1057,7 +1076,7 @@ static LogicalResult createDPASKernel(linalg::LinalgOp linalgOp,

// Load A sub-tiles.
SmallVector<Value> loadVecA =
    loadNdDescTiles(rewriter, loc, tilesA, readCacheHint, vnniConfA);
```

@dchigarev (Contributor Author):

`vnniConfA` can't be used during loading because `vnniAxis=1` is no longer supported. However, we still need this config to compute the proper tiles for `xegpu.dpas` later in the code.
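As a sketch of how a VNNI config can still matter after the load itself stops packing: the packed shape that DPAS works with can be derived from the plain tile shape plus the packing factor and axis. `VnniConfig` and `vnniPackedShape` below are hypothetical stand-ins, not the pass's actual types; the factor of 2 for f16 matches the `vector<32x8x2xf16>` shape in this PR's old FileCheck line.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the pass's VNNI configuration (the real type of
// vnniConfA is not shown in this PR): packing factor and packing axis.
struct VnniConfig {
  std::int64_t vnniFactor; // elements per packed group, e.g. 2 for f16 (assumed)
  int axis;                // 0 = pack rows, 1 = pack columns
};

// Shape of a 2-D tile after VNNI packing: the packed dimension is divided by
// the factor and a trailing dimension of size `vnniFactor` is appended.
std::vector<std::int64_t> vnniPackedShape(std::array<std::int64_t, 2> shape,
                                          const VnniConfig &cfg) {
  assert(cfg.axis == 0 || cfg.axis == 1);
  assert(cfg.vnniFactor > 0 && shape[cfg.axis] % cfg.vnniFactor == 0);
  std::vector<std::int64_t> packed = {shape[0], shape[1]};
  packed[cfg.axis] /= cfg.vnniFactor;
  packed.push_back(cfg.vnniFactor);
  return packed;
}
```

With a factor of 2, a 32x16 tile packed along axis 1 gives 32x8x2, and a 16x16 tile packed along axis 0 (the "packed" case) gives 8x16x2.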

@dchigarev marked this pull request as ready for review on July 31, 2024 at 15:06.
Signed-off-by: dchigarev <[email protected]>
Resolved review thread on cmake/imex.cmake (outdated).
```diff
-// CHECK: %[[tC:.+]] = xegpu.update_nd_offset %[[rootC]], [0, 0]
+// CHECK: %[[tC:.+]] = xegpu.update_nd_offset %[[rootC]], [%c0, %c0]
```

@dchigarev (Contributor Author):

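IMEX doesn't support constant offsets (see intel/mlir-extensions#815), so the test now expects the `xegpu.update_nd_offset` offsets as SSA values (`%c0`) instead of literal zeros.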

```diff
@@ -63,7 +63,7 @@ func.func @matmul(%arg0: memref<32x32xf16>, %arg1: memref<32x32xf16>, %arg2: mem

 // Extract DPAS-sized chunks from larger loaded tile A.
 // Tile B is already in the correct shape.
-// CHECK: %[[vA_flat:.+]] = vector.shape_cast %[[vA]] : vector<32x8x2xf16> to vector<512xf16>
+// CHECK: %[[vA_flat:.+]] = vector.shape_cast %[[vA]] : vector<32x16xf16> to vector<512xf16>
```

@dchigarev (Contributor Author):

We no longer load the A matrix with `vnni_axis=1` (see `packedAttr`).
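Why the flattened A vector is unaffected by this change: packing along axis 1 only splits each row into adjacent pairs, so element (r, c) lands at the same row-major offset in a 32x8x2 tile as in a 32x16 tile, and both `vector.shape_cast` to `vector<512xf16>` in the same order. The hypothetical check below (not code from the pass) verifies that index arithmetic; it assumes the column count is divisible by the packing factor.

```cpp
#include <cassert>
#include <cstddef>

// Row-major flat offset of element (r, c) in a rows x cols tile.
std::size_t flatIndex2D(std::size_t r, std::size_t c, std::size_t cols) {
  return r * cols + c;
}

// Row-major flat offset of the same element after VNNI packing along axis 1
// with factor `vnni`: (r, c) -> (r, c / vnni, c % vnni) in a
// rows x (cols / vnni) x vnni tile.
std::size_t flatIndexVnniAxis1(std::size_t r, std::size_t c, std::size_t cols,
                               std::size_t vnni) {
  return (r * (cols / vnni) + c / vnni) * vnni + c % vnni;
}

// True if packing along axis 1 keeps every element at the same flat offset,
// i.e. the packed tile is only a reshape of the unpacked one.
bool vnniAxis1IsReshape(std::size_t rows, std::size_t cols, std::size_t vnni) {
  assert(vnni != 0 && cols % vnni == 0);
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      if (flatIndex2D(r, c, cols) != flatIndexVnniAxis1(r, c, cols, vnni))
        return false;
  return true;
}
```

Packing along axis 0 has no such property: it interleaves rows, so packed loads really do rearrange the data.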

@Menooker commented Aug 2, 2024

The IMEX changes have been merged into Menooker:dev.

```diff
@@ -8,8 +8,8 @@ if (NOT DEFINED IMEX_INCLUDES)

 # TODO: Change to main https://github.com/intel/mlir-extensions when all the
 # required functionality is merged.
-gc_fetch_content(imex 496b240093b5e132b60c5ee69878300fe69be300 https://github.com/Menooker/mlir-extensions
-SET IMEX_CHECK_LLVM_VERSION=ON IMEX_ENABLE_L0_RUNTIME=0
+gc_fetch_content(imex d5bbd635dee500b8cff138686833bacfac5ade78 https://github.com/Menooker/mlir-extensions
```

@dchigarev (Contributor Author) commented Aug 2, 2024:

Updated to the latest commit in the dev branch.

Resolved review thread on include/gc/Transforms/CMakeLists.txt (outdated).
Review comment on removed lines 132 to 136:

```diff
-cl_platform_id platform; // OpenCL platform
-cl_device_id device; // device ID
-CL_SAFE_CALL(clGetPlatformIDs(1, &platform, NULL));
-CL_SAFE_CALL(clGetDeviceIDs(platform, *devtype, 1, &device, NULL));
-return device;
```

@dchigarev (Contributor Author) commented Aug 5, 2024:

The old logic searched for a device of the requested type in only one platform (and couldn't find any GPU on my machine). I rewrote the logic to iterate over all available platforms and return the first suitable device.
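The search described above, iterating over every platform and returning the first device of the requested type, can be sketched without an OpenCL runtime. In the hypothetical helper below, integer IDs stand in for `cl_platform_id` / `cl_device_id`, and the callback stands in for a per-platform `clGetDeviceIDs` query; this is an illustration of the control flow, not the PR's actual code.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-ins for cl_platform_id and cl_device_id.
using PlatformId = int;
using DeviceId = int;

// Returns the first device of the requested type found on any platform.
// `devicesOfType` plays the role of clGetDeviceIDs(platform, devtype, ...)
// and returns an empty list when a platform has no matching device.
std::optional<DeviceId> findFirstDevice(
    const std::vector<PlatformId> &platforms,
    const std::function<std::vector<DeviceId>(PlatformId)> &devicesOfType) {
  for (PlatformId platform : platforms) {
    std::vector<DeviceId> devices = devicesOfType(platform);
    // Keep searching instead of giving up after the first platform.
    if (!devices.empty())
      return devices.front();
  }
  return std::nullopt;
}
```

In real OpenCL code, `clGetDeviceIDs` returns `CL_DEVICE_NOT_FOUND` for a platform with no device of the requested type, so that status has to mean "try the next platform" rather than abort through a macro like `CL_SAFE_CALL`.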

@kurapov-peter merged commit dd1a80d into intel:main on Aug 7, 2024. All 4 checks passed.
Issues closed by this pull request: Make linalg-to-xegpu pass compatible with the patched XeGPU dialect from IMEX

4 participants