Add OpenCL runtime #191

Menooker · 2024-07-29T08:17:46Z

This PR introduces an OCL stream wrapper and implements both the gpux dialect's runtime and the upstream runtime on top of OCL.

IMEX runtime wrapper

If context and device are both passed as nullptr, we return a global stream object, which is responsible for releasing the context. If both are given, we "borrow" the context and do not release it in the destructor of the stream object.
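The own-versus-borrow rule above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeContext`, `Stream`, `globalStream` and `releasedContexts` are hypothetical names, and the counter stands in for what would be a `clReleaseContext` call in a real runtime.

```cpp
#include <cassert>

// Hypothetical stand-in for cl_context; a real runtime would hold the OpenCL
// handle and call clReleaseContext where this sketch bumps a counter.
struct FakeContext {};

// Counts how many contexts have been released by streams (illustration only).
static int releasedContexts = 0;

class Stream {
public:
  // Owning mode: no context was supplied, so the stream creates one and is
  // responsible for releasing it in its destructor.
  Stream() : context_(new FakeContext()), ownsContext_(true) {}

  // Borrowing mode: the caller supplied the context and keeps ownership.
  explicit Stream(FakeContext *borrowed)
      : context_(borrowed), ownsContext_(false) {
    assert(borrowed != nullptr && "borrowed context must not be null");
  }

  Stream(const Stream &) = delete;
  Stream &operator=(const Stream &) = delete;

  ~Stream() {
    // Only release what this stream created; borrowed contexts are untouched.
    if (ownsContext_) {
      ++releasedContexts;
      delete context_;
    }
  }

  FakeContext *context() const { return context_; }
  bool ownsContext() const { return ownsContext_; }

private:
  FakeContext *context_;
  bool ownsContext_;
};

// Process-wide default stream, used when the caller passes no context.
Stream &globalStream() {
  static Stream stream;
  return stream;
}
```

With this shape, a caller that already has a context constructs `Stream(&ctx)` and remains responsible for `ctx`, while `globalStream()` hands out a single owning stream for everyone else.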

Future work:

IMEX upstream has a problem in gpux lowering, so we currently do nothing in gpuStreamDestroy. This needs to be fixed once the gpux issue is resolved.
The gpux dialect should expose a way to pass the context and queue, so that we can borrow them in our stream object.
The lifetime of each cl_program and cl_kernel is managed by the stream object in a std::vector<....>. The stream object itself is not thread-safe.
Should we cache the cl_program and cl_kernel? Currently, every time we load a GPU module and get a kernel, we create a new cl_program and cl_kernel instance. They could be cached to reduce the host-side load.
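One possible shape for the caching idea in the last item, as a sketch only: `FakeKernel` and `KernelCache` are hypothetical names, the string key stands in for however a module would actually be identified, and the `builds()` counter marks where a real runtime would create the cl_program and cl_kernel. Like the stream object, this cache is not thread-safe.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a built kernel handle (cl_kernel in a real runtime).
struct FakeKernel {
  std::string name;
};

class KernelCache {
public:
  // Returns the cached kernel for (moduleBlob, kernelName), building it only
  // on the first request for that pair.
  FakeKernel *getOrCreate(const std::string &moduleBlob,
                          const std::string &kernelName) {
    assert(!kernelName.empty() && "kernel name must not be empty");
    auto key = std::make_pair(moduleBlob, kernelName);
    auto it = cache_.find(key);
    if (it != cache_.end())
      return &it->second; // cache hit: no new program or kernel is built
    ++builds_;            // cache miss: this is where the build would happen
    auto inserted = cache_.emplace(key, FakeKernel{kernelName});
    return &inserted.first->second;
  }

  // Number of kernels actually built so far.
  int builds() const { return builds_; }

private:
  // std::map keeps element addresses stable across later insertions.
  std::map<std::pair<std::string, std::string>, FakeKernel> cache_;
  int builds_ = 0;
};
```

Requesting the same module and kernel name twice returns the same pointer and leaves `builds()` unchanged, which is the host-side saving the question is about.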

Upstream runtime wrapper

We use the same stream object class as in the IMEX wrapper. The upstream ROCm/CUDA/SYCL wrappers use a raw queue pointer (like sycl::queue*) as the type of the queue parameter in the mgpu* APIs.
Rationale for why we still need a wrapped stream object over the OCL cl_queue:

Some of the OCL APIs need the cl_context and cl_device as well as the cl_queue, but the mgpu* wrappers pass only a single parameter for the queue.
We are using the USM extension of OCL. The USM-related APIs are not directly visible to users as normal C library functions. They need to be queried (cl_context -> cl_device -> cl_platform -> query extensions), so we need to cache the extension function table for a "queue" object.
Setting the parameters of a cl_kernel might take some time. We might be able to cache cl_kernel objects and their parameters in the wrapped stream object.
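The first two points can be illustrated with a small sketch in which one opaque pointer carries the context, device, queue and a cached extension function table. Every name here is hypothetical (`FakeContext`, `ExtFunctions`, `wrapperMemAlloc` and so on), and the function pointers are backed by host `malloc`/`free` so the sketch runs without a GPU. A real runtime would resolve the USM entry points from the platform instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for cl_context, cl_device_id and cl_command_queue.
struct FakeContext {};
struct FakeDevice {};
struct FakeQueue {};

// Extension entry points, resolved once and cached with the stream.
struct ExtFunctions {
  void *(*sharedAlloc)(FakeContext *, FakeDevice *, std::size_t);
  void (*memFree)(FakeContext *, void *);
};

// Everything an allocation needs, reachable from a single pointer.
struct Stream {
  FakeContext *context;
  FakeDevice *device;
  FakeQueue *queue;
  ExtFunctions ext;
};

// Hypothetical wrapper in the style of the mgpu* entry points: it receives
// only one opaque "queue" argument and recovers the rest from the stream.
void *wrapperMemAlloc(std::size_t bytes, void *opaqueStream) {
  auto *s = static_cast<Stream *>(opaqueStream);
  assert(s != nullptr && s->ext.sharedAlloc != nullptr);
  return s->ext.sharedAlloc(s->context, s->device, bytes);
}

void wrapperMemFree(void *ptr, void *opaqueStream) {
  auto *s = static_cast<Stream *>(opaqueStream);
  assert(s != nullptr && s->ext.memFree != nullptr);
  s->ext.memFree(s->context, ptr);
}

// Host-memory stand-ins for the extension functions (illustration only).
void *fakeSharedAlloc(FakeContext *, FakeDevice *, std::size_t bytes) {
  return std::malloc(bytes);
}
void fakeMemFree(FakeContext *, void *ptr) { std::free(ptr); }
```

A raw queue pointer alone could not serve these wrappers, because the allocation call needs the context and device and the function pointers have to live somewhere between calls.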

Future work & fix-me:

mgpuModuleLoad and mgpuModuleGetFunction do not take a queue parameter, but the OCL APIs need the device and context. We use a "thread_local" trick to pass the previously used queue as the context for the OCL API. This should be fine for single-thread, single-stream cases, but it is error-prone.
Check whether dynamic shared memory works with mgpuLaunchKernel.
We assume that every parameter passed to a GPU kernel has the same size as void*. Is that safe for OCL?
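The "thread_local" trick from the first item, and why it is fragile, can be sketched like this. The names (`Stream`, `lastUsedStream`, `wrapperLaunch`, `streamForModuleLoad`) are hypothetical and do not correspond to the functions in this PR.

```cpp
#include <cassert>

// Hypothetical stream type; the id only serves to tell instances apart.
struct Stream {
  int id;
};

// Last stream used on the current thread. Each thread has its own copy.
static thread_local Stream *lastUsedStream = nullptr;

// Hypothetical queue-taking wrapper: besides doing its work, it records the
// stream it was called with.
void wrapperLaunch(Stream *stream) {
  assert(stream != nullptr);
  lastUsedStream = stream;
}

// Hypothetical lookup used by entry points that take no queue parameter,
// such as a module loader: they fall back to the recorded stream.
Stream *streamForModuleLoad() {
  assert(lastUsedStream != nullptr &&
         "no stream has been used on this thread yet");
  return lastUsedStream;
}
```

With two streams on one thread, the module loader silently picks whichever was used most recently, and a thread that has never used a stream has nothing to fall back on. Those are the cases where the approach becomes error-prone.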

Menooker · 2024-07-29T08:20:34Z

lib/gc/Transforms/GPU/LinalgToXeGPU.cpp

    vecLoadType = getVnniVector(tileType.getShape(), tileType.getElementType(),
                                *vnniConf);
  }

  SmallVector<Value> loadVec;
  for (auto tile : loadTiles) {
    auto loadOp = rewriter.create<xegpu::LoadNdOp>(
-        loc, vecLoadType, tile, vnniAxisAttr, transpose,
+        loc, vecLoadType, tile, vnniAxisAttr, transpose, nullptr,


@dchigarev Sorry I am late to the party. IMEX's XeGPU dialect interfaces are a bit different from the upstream ones. If we compile GC + IMEX + patched LLVM, the compiler complains here. In this PR I have updated the code just to make the compiler happy; it does not yet pass the UTs related to this part of the code.

@Menooker thanks, I'm good with your current changes.

I'm aware that the current linalg-to-xegpu pass is incompatible with the patched XeGPU dialect from IMEX. I'm currently working on a separate PR to make them compatible.

p.s. an issue to track (#192)

Good to know that you are aware of the issue. :)

I just realized that this breaks the normal build for GPU. I think we should ifdef it or smth.

I think we shouldn't include this pass in CPU builds at all (only if -DGC_USE_GPU=1)

I agree with you. Actually, these changes in my PR will break the CPU CI. I will skip building the pass in CMake for CPU-only mode.

AndreyPavlenko · 2024-07-30T13:31:40Z

lib/gc/ExecutionEngine/OpenCLRuntime/CMakeLists.txt

+if(NOT CXX_HAS_FRTTI_FLAG)
+    message(FATAL_ERROR "CXX compiler does not accept flag -frtti")
+endif()
+target_compile_options (opencl-runtime PUBLIC -fexceptions -frtti)


Suggested change

-target_compile_options (opencl-runtime PUBLIC -fexceptions -frtti)
+target_compile_options (GcOpenclRuntime PUBLIC -fexceptions -frtti)

AndreyPavlenko · 2024-07-30T13:33:32Z

lib/gc/ExecutionEngine/OpenCLRuntime/CMakeLists.txt

+    )
+
+message(STATUS "OpenCL Libraries: ${OpenCL_LIBRARIES}")
+target_link_libraries(opencl-runtime PRIVATE ${OpenCL_LIBRARIES})


Suggested change

-target_link_libraries(opencl-runtime PRIVATE ${OpenCL_LIBRARIES})
+target_link_libraries(GcOpenclRuntime PUBLIC ${OpenCL_LIBRARIES})

Thanks, changed. Also use PUBLIC as suggested.

kurapov-peter · 2024-07-30T15:00:54Z

README.md

+ * make sure the OpenCL runtime is installed in your system. You can either
+  install using OS-provided package (Ubuntu 22.04)
+```sh
+sudo apt install -y intel-opencl-icd opencl-c-headers


Isn't the dev package needed btw?

I was following this link https://dgpu-docs.intel.com/installation-guides/ubuntu/ubuntu-jammy-arc.html

Seems like intel-opencl-icd is for the libraries and opencl-c-headers for the headers?

AndreyPavlenko · 2024-07-30T15:19:50Z

lib/gc/ExecutionEngine/OpenCLRuntime/CMakeLists.txt

+
+if(NOT OpenCL_FOUND)
+    message(FATAL_ERROR "OpenCL not found.")
+endif()


Suggested change

-if(NOT OpenCL_FOUND)
-    message(FATAL_ERROR "OpenCL not found.")
-endif()

This is redundant when the REQUIRED option is used.

Thanks. Removed now.

Menooker · 2024-07-31T06:02:46Z

Hi @kurapov-peter, I have updated this PR to support upstream-style mgpu* wrapper APIs.

Menooker · 2024-07-31T06:39:21Z

include/gc/Transforms/Passes.td

@@ -32,7 +32,7 @@ def ConvertOneDNNGraphToLinalg : Pass<"convert-onednn-graph-to-linalg"> {
  ];
 }

-
+#ifdef GC_USE_GPU


@dchigarev Please help review this change. Thanks! It totally disables the xegpu passes when GPU is OFF, just to keep the compiler and CI happy in CPU mode.

I'm good with this change

init

e3d32fa

Menooker added the WIP work in progress label Jul 29, 2024

Menooker requested review from kurapov-peter and dchigarev July 29, 2024 08:17

simplify

ad9752c

Menooker removed the WIP work in progress label Jul 29, 2024

Menooker changed the title ~~[WIP] Add OpenCL runtime~~ Add OpenCL runtime Jul 29, 2024

Menooker linked an issue Jul 29, 2024 that may be closed by this pull request

Add OpenCL runtime #190

Closed

kurapov-peter approved these changes Jul 29, 2024

dchigarev approved these changes Jul 29, 2024

AndreyPavlenko reviewed Jul 30, 2024

kurapov-peter reviewed Jul 30, 2024

AndreyPavlenko reviewed Jul 30, 2024

This was referenced Jul 30, 2024

Initial GPU pipeline #202

Open

Add ocloc and opencl runtime to CI #203

Closed

Mei, Yijie added 3 commits July 31, 2024 04:05

add upstream wrappers

38c270d

fix comments

6abb056

fix cmake

f9d5cb7

skip xegpu if not GC_USE_GPU

2da86f2

dchigarev self-requested a review July 31, 2024 06:46

dchigarev approved these changes Jul 31, 2024

kurapov-peter merged commit 0d1d9c6 into main Jul 31, 2024
4 checks passed
