Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OpenCL runtime #191

Merged
merged 6 commits into from
Jul 31, 2024
Merged

Add OpenCL runtime #191

merged 6 commits into from
Jul 31, 2024

Conversation

Menooker
Copy link

@Menooker Menooker commented Jul 29, 2024

This PR introduces a OCL stream wrapper, and the implementation of gpux dialect's and upstream runtime with OCL.

IMEX runtime wrapper

If context and device is passed with nullptr, we will return a global stream object. The stream object is responsible for releasing the context. If both are given, we "borrow" the context and we don't release the context at the destructor of the stream object.

Future work:

  • IMEX upstream has a problem in gpux lowering. So we actually do nothing at gpuStreamDestroy. Need to fix it when the gpux issue is fixed.
  • gpux dialect should expose a way to pass context and queue, so that we can borrow them in our stream object.
  • The lifetime of the cl_program and cl_kernel is managed by the stream object in a std::vector<....>. The stream object itself is not thread-safe.
  • Shall we cache the cl_program and cl_kernel? Currently, every time we load GPU module and get kernel, we create a new instance of cl_program and cl_kernel. They can actually be cached to ease the load of host-side.

Upstream runtime wrapper

We use the same stream object class as used in IMEX wrapper. In upstream ROCM/Cuda/Sycl wrapper, they use raw queue pointer as the type of queue parameter in mgpu* APIS (like sycl::queue*).
Reasonale on why we still need a wrapped stream object over OCL cl_queue:

  • some of the OCL APIs needs cl_context, cl_device as well as cl_queue. And the mgpu* wrappers only passes a single parameter for queue.
  • we are using USM extension of OCL. The USM related APIs are not directly visible to users as normal C-library funcs. They need to queried from (cl_context -> cl_device -> cl_platform -> query extensions`). So we need to cache the extension function table for a "queue" object.
  • setting parameters of cl_kernel might take some time. We might be able to cache cl_kernel and their parameters in the wrapped stream object.

Future work & fix-me:

  • mgpuModuleLoad and mgpuModuleGetFunction does not have queue in the function parameters. However, OCL APIs needs the device and context. We are using a "thread_local" trick to pass the previously used queue as the context for the OCL API. It should be OK in most cases for single-thread & single-stream cases, but it is error-prone.
  • Check if dynamic shared memory works for mgpuLaunchKernel
  • We assume that all parameters passed to a GPU kernel has the same size of void*. Is it safe for OCL?

@Menooker Menooker added the WIP work in progress label Jul 29, 2024
vecLoadType = getVnniVector(tileType.getShape(), tileType.getElementType(),
*vnniConf);
}

SmallVector<Value> loadVec;
for (auto tile : loadTiles) {
auto loadOp = rewriter.create<xegpu::LoadNdOp>(
loc, vecLoadType, tile, vnniAxisAttr, transpose,
loc, vecLoadType, tile, vnniAxisAttr, transpose, nullptr,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dchigarev Sorry I am late for the party. The IMEX's XeGPU dialect interfaces are a bit different from the upstream one. If we compile GC+IMEX+Patched LLVM, the compiler complains here. Here in the PR, I have updated the code, just to make the compiler happy, but not yet pass the UTs related to this part of code.

Copy link
Contributor

@dchigarev dchigarev Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Menooker thanks, I'm good with your current changes.

I'm aware that the current linalg-to-xegpu pass is incompatible with the patched XeGPU dialect from IMEX. I'm currently working on a separate PR to make them compatible.

p.s. an issue to track (#192)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know that you are aware of the issue. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that this breaks the normal build for GPU. I think we should ifdef it or smth.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't include this pass in CPU builds at all (if -DGC_USE_GPU=1)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't include this pass in CPU builds at all (if -DGC_USE_GPU=1)

I agree with you. Actually these changes in my PR will break the CPU CI. I will skip build the pass in cmake for CPU-only mode.

@Menooker Menooker removed the WIP work in progress label Jul 29, 2024
@Menooker Menooker changed the title [WIP] Add OpenCL runtime Add OpenCL runtime Jul 29, 2024
@Menooker Menooker linked an issue Jul 29, 2024 that may be closed by this pull request
lib/gc/ExecutionEngine/OpenCLRuntime/CMakeLists.txt Outdated Show resolved Hide resolved
if(NOT CXX_HAS_FRTTI_FLAG)
message(FATAL_ERROR "CXX compiler does not accept flag -frtti")
endif()
target_compile_options (opencl-runtime PUBLIC -fexceptions -frtti)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
target_compile_options (opencl-runtime PUBLIC -fexceptions -frtti)
target_compile_options (GcOpenclRuntime PUBLIC -fexceptions -frtti)

lib/gc/ExecutionEngine/OpenCLRuntime/CMakeLists.txt Outdated Show resolved Hide resolved
)

message(STATUS "OpenCL Libraries: ${OpenCL_LIBRARIES}")
target_link_libraries(opencl-runtime PRIVATE ${OpenCL_LIBRARIES})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
target_link_libraries(opencl-runtime PRIVATE ${OpenCL_LIBRARIES})
target_link_libraries(GcOpenclRuntime PUBLIC ${OpenCL_LIBRARIES})

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, changed. Also use PUBLIC as suggested.

* make sure the OpenCL runtime is installed in your system. You can either
install using OS-provided package (Ubuntu 22.04)
```sh
sudo apt install -y intel-opencl-icd opencl-c-headers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the dev package needed btw?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following this link https://dgpu-docs.intel.com/installation-guides/ubuntu/ubuntu-jammy-arc.html

Seems like intel-opencl-icd is for the libraries and opencl-c-headers for the headers?

Comment on lines 2 to 5

if(NOT OpenCL_FOUND)
message(FATAL_ERROR "OpenCL not found.")
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(NOT OpenCL_FOUND)
message(FATAL_ERROR "OpenCL not found.")
endif()

This is redundant when the REQUIRED option is used.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Removed now.

@Menooker
Copy link
Author

Hi @kurapov-peter , I have updated this PR to support upstream-style mgpu* wrapper APIs.

@@ -32,7 +32,7 @@ def ConvertOneDNNGraphToLinalg : Pass<"convert-onednn-graph-to-linalg"> {
];
}


#ifdef GC_USE_GPU
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dchigarev Please help to view this change. Thx!. This is to totally disable xegpu passes when GPU is OFF. Just to make compiler & CI happy for CPU mode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this change

@dchigarev dchigarev self-requested a review July 31, 2024 06:46
@kurapov-peter kurapov-peter merged commit 0d1d9c6 into main Jul 31, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add OpenCL runtime
4 participants