OpenCL bindings for Julia
Julia interface for the OpenCL parallel computation API
This package aims to be a complete solution for OpenCL programming in Julia, similar in scope to PyOpenCL for Python. It provides a high level api for OpenCL to make programing GPU's and multicore CPU's much less onerous.
OpenCL.jl provides access to OpenCL API versions 1.0, 1.1, 1.2 and 2.0.
- PyOpenCL by Andreas Klockner
- oclpb by Sean Ross
- Boost.Compute by Kyle Lutz
- rust-opencl
OpenCL.jl has had contributions from many developers.
- Julia
v"0.4.x"
is supported on therelease-0.4
branch and the OpenCL.jl versionsv"0.4.x"
. Only bug-fixes will be applied. - Julia
v"0.5.x"
is supported on themaster
branch and the OpenCL.jl versionsv"0.5.x"
. - Julia
v"0.6.x"
is experimentally supported on themaster
branch and the OpenCL.jl versionsv"0.5.x"
.
- Julia
v"0.3.x"
was supported on OpenCL.jl versionsv"0.3.x"
. It should still be installable and work.
- Install an OpenCL driver. If you use OSX, OpenCL is already available
- Checkout the packages from the Julia repl
Pkg.add("OpenCL")
- OpenCL will be installed in your
.julia
directory cd
into your.julia
directory to run the tests and try out the examples- To update to the latest development version, from the Julia repl:
Pkg.update()
using LinearAlgebra
using OpenCL
const sum_kernel = "
__kernel void sum(__global const float *a,
__global const float *b,
__global float *c)
{
int gid = get_global_id(0);
c[gid] = a[gid] + b[gid];
}
"
a = rand(Float32, 50_000)
b = rand(Float32, 50_000)
device, ctx, queue = cl.create_compute_context()
a_buff = cl.Buffer(Float32, ctx, (:r, :copy), hostbuf=a)
b_buff = cl.Buffer(Float32, ctx, (:r, :copy), hostbuf=b)
c_buff = cl.Buffer(Float32, ctx, :w, length(a))
p = cl.Program(ctx, source=sum_kernel) |> cl.build!
k = cl.Kernel(p, "sum")
queue(k, size(a), nothing, a_buff, b_buff, c_buff)
r = cl.read(queue, c_buff)
if isapprox(norm(r - (a+b)), zero(Float32))
@info "Success!"
else
@error "Norm should be 0.0f"
end
Here's a rough translation between the OpenCL API in C to this Julia version. Optional arguments are indicated by [name?]
(see clCreateBuffer
, for example). For a quick reference to the C version, see the Khronos quick reference card or the online specification.
Whenever there's an cl.info(object, :info_name)
, you can also call info_name(object)
. This will give the same information, but is documented and type stable.
C | Julia | Notes |
---|---|---|
clGetPlatformIDs |
cl.platforms() |
|
clGetPlatformInfo |
cl.info(platform, :symbol) |
Platform info: :profile , :version , :name , :vendor , :extensions |
clGetDeviceIDs |
cl.devices() , cl.devices(platform) , cl.devices(:type) |
Device types: :all , :cpu , :gpu , :accelerator , :custom , :default |
clGetDeviceInfo |
cl.info(device, :symbol) |
Device info: :driver_version , :version , :profile , :extensions , :platform , :name , :device_type , :has_image_support , :queue_properties , :has_queue_out_of_order_exec , :has_queue_profiling , :has_native_kernel , :vendor_id , :max_compute_units , :max_work_item_size , :max_clock_frequency , :address_bits , :max_read_image_args , :max_write_image_args , :global_mem_size , :max_mem_alloc_size , :max_const_buffer_size , :local_mem_size , :has_local_mem , :host_unified_memory , :available , :compiler_available , :max_work_group_size , :max_work_item_dims , :max_parameter_size , :profiling_timer_resolution , :max_image2d_shape , :max_image3d_shape |
clCreateContext |
cl.context(queue) , cl.context(CLMemObject) , cl.context(CLArray) |
|
clReleaeContext |
cl.release! |
C | Julia | Notes |
---|---|---|
clCreateBuffer |
cl.Buffer(type, context, [length?]; [hostbuf?]) , cl.Buffer(type, context, flags, [length?]; [hostbuf?]) |
Memory flags: :rw , :r , :w , :use , :alloc , :copy |
clEnqueueCopyBuffer |
cl.copy!(queue, destination, source) |
|
clEnqueueFillBuffer |
cl.enqueue_fill_buffer(queue, buffer, pattern, offset, nbytesm wait_for) |
|
clEnqueueReadBuffer |
cl.enqueue_read_buffer(queue, buffer, hostbuf, dev_offset, wait_for, is_blocking) |
|
clEnqueueWriteBuffer |
cl.enqueue_write_buffer(queue, buffer, hostbuf, byte_count, offset, wait_for, is_blocking) |
C | Julia | Notes |
---|---|---|
clCreateProgramWithSource |
cl.Program(ctx; source) |
|
clCreateProgramWithBinaries |
cl.Program(ctx; binaries) |
|
clReleaseProgram |
cl.release! |
|
clBuildProgram |
cl.build!(progrm, options) |
|
clGetProgramInfo |
cl.info(program, :symbol) |
Program info: :reference_count , :devices , :context , :num_devices , :source , :binaries , :build_log , :build_status |
C | Julia | Notes |
---|---|---|
clCreateKernel |
cl.Kernel(program, "kernel_name") |
|
clGetKernelInfo |
cl.info(kernel, :symbol) |
Kernel info: :name , :num_args , :reference_count , :program , :attributes |
clEnqueueNDRangeKernel |
cl.enqueue_kernel(queue, kernel, global_work_size) , cl.enqueue_kernel(queue, kernel, global_work_size, local_work_size; global_work_offset, wait_on) |
|
clSetKernelArg |
cl.set_arg!(kernel, idx, arg) |
idx starts at 1 |
clCreateUserEvent |
cl.UserEvent(ctx; retain) |
|
clGetEventInfo |
cl.info(event, :symbol) |
Event info: :context , :command_queue , :reference_count , :command_type , :status , :profile_start , :profile_end , :profile_queued , :profile_submit , :profile_duration |
clWaitForEvents |
cl.wait(event) , cl.wait(events) |
|
clEnqueueMarkerWithWaitList |
cl.enqueue_marker_with_wait_list(queue, wait_for) |
|
clEnqueueBarrierWithWaitList |
cl.enqueue_barrier_with_wait_list(queue, wait_for) |