-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add OpenCL runtime #191
Add OpenCL runtime #191
Conversation
vecLoadType = getVnniVector(tileType.getShape(), tileType.getElementType(), | ||
*vnniConf); | ||
} | ||
|
||
SmallVector<Value> loadVec; | ||
for (auto tile : loadTiles) { | ||
auto loadOp = rewriter.create<xegpu::LoadNdOp>( | ||
loc, vecLoadType, tile, vnniAxisAttr, transpose, | ||
loc, vecLoadType, tile, vnniAxisAttr, transpose, nullptr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dchigarev Sorry I am late for the party. The IMEX's XeGPU dialect interfaces are a bit different from the upstream one. If we compile GC+IMEX+Patched LLVM, the compiler complains here. Here in the PR, I have updated the code, just to make the compiler happy, but not yet pass the UTs related to this part of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know that you are aware of the issue. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that this breaks the normal build for GPU. I think we should ifdef it or smth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn't include this pass in CPU builds at all (if -DGC_USE_GPU=1
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn't include this pass in CPU builds at all (if
-DGC_USE_GPU=1
)
I agree with you. Actually these changes in my PR will break the CPU CI. I will skip build the pass in cmake for CPU-only mode.
if(NOT CXX_HAS_FRTTI_FLAG) | ||
message(FATAL_ERROR "CXX compiler does not accept flag -frtti") | ||
endif() | ||
target_compile_options (opencl-runtime PUBLIC -fexceptions -frtti) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
target_compile_options (opencl-runtime PUBLIC -fexceptions -frtti) | |
target_compile_options (GcOpenclRuntime PUBLIC -fexceptions -frtti) |
) | ||
|
||
message(STATUS "OpenCL Libraries: ${OpenCL_LIBRARIES}") | ||
target_link_libraries(opencl-runtime PRIVATE ${OpenCL_LIBRARIES}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
target_link_libraries(opencl-runtime PRIVATE ${OpenCL_LIBRARIES}) | |
target_link_libraries(GcOpenclRuntime PUBLIC ${OpenCL_LIBRARIES}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, changed. Also use PUBLIC
as suggested.
* make sure the OpenCL runtime is installed in your system. You can either | ||
install using OS-provided package (Ubuntu 22.04) | ||
```sh | ||
sudo apt install -y intel-opencl-icd opencl-c-headers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the dev
package needed btw?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was following this link https://dgpu-docs.intel.com/installation-guides/ubuntu/ubuntu-jammy-arc.html
Seems like intel-opencl-icd
is for the libraries and opencl-c-headers
for the headers?
|
||
if(NOT OpenCL_FOUND) | ||
message(FATAL_ERROR "OpenCL not found.") | ||
endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if(NOT OpenCL_FOUND) | |
message(FATAL_ERROR "OpenCL not found.") | |
endif() |
This is redundant when the REQUIRED option is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Removed now.
Hi @kurapov-peter , I have updated this PR to support upstream-style |
@@ -32,7 +32,7 @@ def ConvertOneDNNGraphToLinalg : Pass<"convert-onednn-graph-to-linalg"> { | |||
]; | |||
} | |||
|
|||
|
|||
#ifdef GC_USE_GPU |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dchigarev Please help to view this change. Thx!. This is to totally disable xegpu passes when GPU is OFF. Just to make compiler & CI happy for CPU mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with this change
This PR introduces a OCL stream wrapper, and the implementation of gpux dialect's and upstream runtime with OCL.
IMEX runtime wrapper
If context and device is passed with nullptr, we will return a global stream object. The stream object is responsible for releasing the context. If both are given, we "borrow" the context and we don't release the context at the destructor of the stream object.
Future work:
gpuStreamDestroy
. Need to fix it when the gpux issue is fixed.context
andqueue
, so that we can borrow them in our stream object.std::vector<....>
. The stream object itself is not thread-safe.Upstream runtime wrapper
We use the same stream object class as used in IMEX wrapper. In upstream ROCM/Cuda/Sycl wrapper, they use raw queue pointer as the type of
queue
parameter inmgpu*
APIS (likesycl::queue*
).Reasonale on why we still need a wrapped stream object over OCL
cl_queue
:cl_context
,cl_device
as well ascl_queue
. And themgpu*
wrappers only passes a single parameter forqueue
.cl_kernel
might take some time. We might be able to cachecl_kernel
and their parameters in the wrapped stream object.Future work & fix-me:
mgpuModuleLoad
andmgpuModuleGetFunction
does not havequeue
in the function parameters. However, OCL APIs needs the device and context. We are using a "thread_local" trick to pass the previously usedqueue
as the context for the OCL API. It should be OK in most cases for single-thread & single-stream cases, but it is error-prone.mgpuLaunchKernel
void*
. Is it safe for OCL?