Laboratory 5
The objectives of the current laboratory session are the following:
- Fundamentals of OpenCL API:
- Environment setup
- Host <-> Device communication
- Simple kernel programming
- Host-Device synchronization basics
- Basic kernel programming
- Gaussian Blurring
- Sobel Edge Detection
Before starting with the exercises, we will install the CImg Library, a small, open-source C++ toolkit for image processing. The library can be downloaded from http://cimg.eu/files/CImg_latest.zip. More documentation and tutorials are available at http://cimg.eu/
In order to use this library you should include the following code in your .cpp file:
#include "CImg.h" // Path to the previous download directory
using namespace cimg_library;
Let's start by writing the following simple code:
#include "CImg.h"
using namespace cimg_library;
int main() {
CImg<unsigned char> img(640,400,1,3); // Define a 640x400 color image with 8 bits per color component
img.fill(0); // Set pixel values to 0 (color : black)
unsigned char purple[] = { 255,0,255 }; // Define a purple color
img.draw_text(100,100,"Hello World",purple); // Draw a purple "Hello world" at coordinates (100,100)
img.display("My first CImg code"); // Display the image in a display window
return 0;
}
To compile, use the following command:
$prompt> g++ source.cpp -o executable -I[PATH_TO_CIMG] -lm -lpthread -lX11
Next, we show code that loads an image, prints some statistics about it, and finally accesses a given pixel.
#define cimg_use_jpeg
#include <iostream>
#include "CImg.h"
using namespace cimg_library;
int main(){
CImg<unsigned char> img("image.jpg"); // Load image file "image.jpg" at object img
std::cout << "Image width: " << img.width() << ", Image height: " << img.height() << ", Number of slices: " << img.depth() << ", Number of channels: " << img.spectrum() << std::endl; // Dump some characteristics of the loaded image
int i = XXX;
int j = XXX;
std::cout << std::hex << (int) img(i, j, 0, 0) << std::endl; //print pixel value for channel 0 (red)
std::cout << std::hex << (int) img(i, j, 0, 1) << std::endl; //print pixel value for channel 1 (green)
std::cout << std::hex << (int) img(i, j, 0, 2) << std::endl; //print pixel value for channel 2 (blue)
img.display("My first CImg code"); // Display the image in a display window
return 0;
}
To compile, use the following command:
$prompt> g++ source.cpp -o executable -I[PATH_TO_CIMG] -lm -lpthread -lX11 -ljpeg
Task. Write a program that reads an image from a file, dumps the image data into an array, and prints the value of a chosen pixel on the screen. There is an example image in the repository, but you can choose any image you want.
Task. Extend the previous work by drawing a blue line across the middle of the original image, and check the result by displaying it on the screen.
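As a starting point for both tasks, the sketch below shows one way to index a raw pixel array. It assumes the planar layout that CImg uses for its internal buffer (all values of channel 0, then channel 1, then channel 2) and a single-slice image. The function names `planar_index` and `draw_middle_blue_line` are hypothetical helpers, not part of CImg.

```c
#include <stddef.h>

/* Offset of pixel (x, y) of channel c in a planar buffer:
   all of channel 0 first, then channel 1, then channel 2.
   Assumes a single slice (depth 1). */
size_t planar_index(int x, int y, int c, int width, int height) {
    return (size_t)c * width * height + (size_t)y * width + x;
}

/* Draw a horizontal blue line across the middle row of an RGB planar image. */
void draw_middle_blue_line(unsigned char *data, int width, int height) {
    int y = height / 2;
    for (int x = 0; x < width; x++) {
        data[planar_index(x, y, 0, width, height)] = 0;   /* red   */
        data[planar_index(x, y, 1, width, height)] = 0;   /* green */
        data[planar_index(x, y, 2, width, height)] = 255; /* blue  */
    }
}
```

With a CImg object, the pointer returned by `img.data()` could be passed as `data`. The same index arithmetic is useful later inside OpenCL kernels, which only see a flat buffer.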
In this step, we present the basic OpenCL API needed to set up the environment, communicate between host and device, launch a simple kernel, and perform basic synchronization.
Throughout this section, the following resources will be helpful:
- Abbreviated reference card of the OpenCL API version 2.1 (https://www.khronos.org/files/opencl21-reference-guide.pdf)
- Reference manual of the OpenCL API version 2.1 (https://www.khronos.org/registry/OpenCL/specs/opencl-2.1.pdf)
To set up the environment, run the script /opt/intel/oneapi/setvars.sh from the command line:
$prompt> . /opt/intel/oneapi/setvars.sh
After that, you can use the clinfo command to check that some devices have been found in the system.
In this section, we will develop a basic program that sets up the OpenCL environment, in order to become familiar with the API. To begin with, please have a look at basic_environ.c. In that file, you will find an "almost" complete skeleton of an OpenCL program up to the creation of the command_queue. The first task is to complete the code in basic_environ.c: start by filling in each occurrence of "/***???***/". Pay attention to the selection of the device to use. Next, extend this basic OpenCL program so that it prints more information about the platform and the devices.
To compile, use the following command:
$prompt> g++ source.cpp -o executable -lOpenCL
Question. How many platforms does this computing system have? How many devices does each of these platforms have?
Once steps 1 to 4 are completed (they lead up to the creation of a command_queue for the selected device), this section proceeds with the basics of creating a new kernel and launching it on the device. Follow the steps below, taking into account that the portions of the code marked /***???***/ must be completed:
- Create a new file with the kernel you would like to execute on the device. It can be named, for example, kernel.cl. This kernel simply squares each element of an input vector (raises it to the power of two). Feel free to write any other simple kernel you would like to test.
__kernel void pow_of_two(
__global float *in,
__global float *out,
const unsigned int count){
int i = get_global_id(/***???***/);
if(i < count){
out[i] = in[i] * in[i];
}
}
- Add code to the file basic_environ.c to load the source code of kernel.cl into an array of characters to be processed by the OpenCL runtime. In particular, we will use the function clCreateProgramWithSource(). Consider the following code:
// Calculate size of the file
FILE *fileHandler = fopen(/***???***/, "r");
fseek(/***???***/, 0, SEEK_END);
size_t fileSize = ftell(fileHandler);
rewind(fileHandler);
// read kernel source into buffer
char * sourceCode = (char*) malloc(fileSize + 1);
sourceCode[fileSize] = '\0';
fread(sourceCode, sizeof(char), fileSize, fileHandler);
fclose(fileHandler);
// create program from buffer
program = clCreateProgramWithSource(/***???***/);
cl_error(err, "Failed to create program with source\n");
free(sourceCode);
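The loading code above does not check whether the file could be opened or fully read. The following is a minimal sketch of the same steps with those checks added. The helper name `read_text_file` is hypothetical, and the file is opened in binary mode ("rb") as an assumption, so that the size reported by ftell() matches the number of bytes read on every platform.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole text file into a NUL-terminated heap buffer.
   Returns NULL on any failure; the caller must free() the result. */
char *read_text_file(const char *path, size_t *size_out) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;

    /* Determine the file size by seeking to the end */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 0) { fclose(f); return NULL; }
    rewind(f);

    char *buf = (char *)malloc((size_t)len + 1);
    if (buf == NULL) { fclose(f); return NULL; }

    /* Read everything and make sure nothing was truncated */
    size_t got = fread(buf, 1, (size_t)len, f);
    fclose(f);
    if (got != (size_t)len) { free(buf); return NULL; }

    buf[len] = '\0';
    if (size_out != NULL) *size_out = (size_t)len;
    return buf;
}
```

The returned buffer can then be handed to clCreateProgramWithSource() and released with free() afterwards, as in the code above.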
- Build the source code of the program for a given device with clBuildProgram(). In addition, it is convenient to inspect the build log produced by the compilation process to make sure the build was correct.
// Build the executable and check errors
err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
if (err != CL_SUCCESS){
size_t len;
char buffer[2048];
printf("Error: Some error at building process.\n");
clGetProgramBuildInfo(/***???***/);
printf("%s\n", buffer);
exit(-1);
}
- Create a kernel from the selected kernel function in the file kernel.cl: __kernel void pow_of_two ( ... ).
// Create a compute kernel with the program we want to run
kernel = clCreateKernel(/***???***/);
cl_error(err, "Failed to create kernel from the program\n");
- Create and initialize the input and output arrays in host memory. Taking into account the kernel definition, the data type of these arrays should be floating point.
- Create the input and output arrays in device memory. The computing device will refer to these objects as the source and destination of the power of two calculations, and they will be passed as arguments of the kernel function in the file kernel.cl.
// Create OpenCL buffers visible to the OpenCL runtime
cl_mem in_device_object = clCreateBuffer(context, CL_MEM_READ_ONLY, /***???***/, NULL, &err);
cl_error(err, "Failed to create memory buffer at device\n");
cl_mem out_device_object = clCreateBuffer(context, CL_MEM_WRITE_ONLY, /***???***/, NULL, &err);
cl_error(err, "Failed to create memory buffer at device\n");
- Copy all the floating point values of the input array in host memory to the corresponding input array in device memory.
// Write data into the memory object
err = clEnqueueWriteBuffer(command_queue, in_device_object, CL_TRUE, 0, sizeof(float) * count,
/***???***/, 0, NULL, NULL);
cl_error(err, "Failed to enqueue a write command\n");
- Set the proper arguments of the OpenCL kernel object, as they are specified in the kernel function in kernel.cl. First, a pointer to the input memory on the device, which holds the source values on which the power of two operation will be performed. Second, a pointer to the output memory on the device, which will hold the result values once the computing device finishes the computations. Third, the number of elements of the input and output arrays.
// Set the arguments to the kernel
err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &in_device_object);
cl_error(err, "Failed to set argument 0\n");
err = clSetKernelArg(kernel, 1, sizeof(cl_mem), &out_device_object);
cl_error(err, "Failed to set argument 1\n");
err = clSetKernelArg(kernel, 2, /***???***/);
cl_error(err, "Failed to set argument 2\n");
- After all the previous steps, we are ready to launch the kernel function to be executed at the computing device.
// Launch Kernel
local_size = 128;
global_size = /***???***/;
err = clEnqueueNDRangeKernel(/***???***/, 1, NULL, &global_size, &local_size, 0, NULL, NULL);
cl_error(err, "Failed to launch kernel to the device\n");
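When choosing the global size, keep in mind that in OpenCL versions before 2.0 the global size must be evenly divisible by the local size, otherwise clEnqueueNDRangeKernel() fails with CL_INVALID_WORK_GROUP_SIZE. One common approach, sketched below under that assumption, is to round the element count up to the next multiple of the local size. The helper name `round_up_global_size` is hypothetical.

```c
#include <stddef.h>

/* Smallest multiple of local_size that is >= count.
   local_size must be greater than zero. */
size_t round_up_global_size(size_t count, size_t local_size) {
    return ((count + local_size - 1) / local_size) * local_size;
}
```

For example, 1000 elements with a local size of 128 would give a global size of 1024. The extra 24 work-items are harmless because the kernel already guards its work with `if(i < count)`.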
- Read all the results of the calculations from the output device memory object and copy them to the output array in host memory.
// Read data from device memory back to host memory
err = clEnqueueReadBuffer(/***???***/, CL_TRUE, 0, /***???***/, 0, NULL, NULL);
cl_error(err, "Failed to enqueue a read command\n");
- Write code to check the correctness of the execution.
- Release all the OpenCL objects allocated throughout this program.
clReleaseMemObject(/***???***/);
clReleaseMemObject(/***???***/);
clReleaseProgram(/***???***/);
clReleaseKernel(/***???***/);
clReleaseCommandQueue(/***???***/);
clReleaseContext(/***???***/);
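For the correctness check requested in the steps above, one possibility is to recompute the squares on the host and compare them with the values read back from the device. The sketch below assumes the pow_of_two kernel from kernel.cl. The function name `count_square_mismatches` and the tolerance parameter are hypothetical choices, not part of the lab code.

```c
#include <stddef.h>

/* Count elements where out[i] differs from in[i] * in[i] by more than
   a tolerance scaled by the expected value. Returns 0 when all match. */
size_t count_square_mismatches(const float *in, const float *out,
                               size_t count, float tol) {
    size_t errors = 0;
    for (size_t i = 0; i < count; i++) {
        float expected = in[i] * in[i];
        float diff = out[i] - expected;
        if (diff < 0.0f) diff = -diff;
        /* expected is never negative, so it can scale the tolerance directly */
        float scale = expected > 1.0f ? expected : 1.0f;
        if (diff > tol * scale) errors++;
    }
    return errors;
}
```

A small tolerance such as 1e-5f is a reasonable starting point here, since host and device may round floating point results slightly differently.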
In this section we propose two basic problems[^1] to be implemented by launching an OpenCL kernel that performs the core computations of each one. To check the correctness of each solution, show the corresponding results on the screen, either as numerical values or as transformed images.
A Gaussian filter is an image processing tool used to smooth or blur an image, often for reducing noise or preparing an image for further processing like edge detection. The Gaussian filter applies a weighted average to each pixel, where pixels closer to the center of the filter have more influence than those further away. This method preserves image details better than simple averaging.
- Gaussian Kernel: The filter uses a kernel (a matrix of values) that approximates the shape of a Gaussian function:
$$G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}$$
Here, $\sigma$ is the standard deviation of the Gaussian, which controls how wide the blur is, and $x$ and $y$ are the horizontal and vertical offsets from the center of the kernel.
- Convolution Process: The Gaussian kernel is convolved over the image, meaning it is slid across the image and a weighted average is calculated for each pixel based on the surrounding pixels. This process produces a smoothing effect that reduces sharp changes in intensity, like noise, while preserving edges.
- Smoothing Effect: By adjusting $\sigma$ and the kernel size, the filter can provide a variable level of smoothness. This can reduce noise while keeping the general structure and edges of the image more intact than simpler averaging methods.
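A typical 3x3 Gaussian kernel, in its common integer approximation, is:
$$K = \frac{1}{16} \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$$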
More information at https://www.southampton.ac.uk/~msn/book/new_demo/gaussian/ and https://www.southampton.ac.uk/~msn/book/
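Before writing the OpenCL kernel, it can help to have a CPU reference against which to compare the device output. The sketch below assumes the common 3x3 integer kernel (1 2 1; 2 4 2; 1 2 1) divided by 16, a single-channel image stored row by row, and borders handled by repeating the nearest edge pixel. All of these are illustrative choices, and the function names are hypothetical.

```c
/* Clamp v to the range [lo, hi]. */
static int clamp_int(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* 3x3 Gaussian blur of a single-channel, row-major image.
   Border pixels are handled by repeating the nearest edge pixel. */
void gaussian_blur_3x3(const unsigned char *in, unsigned char *out,
                       int width, int height) {
    static const int k[3][3] = { {1, 2, 1}, {2, 4, 2}, {1, 2, 1} }; /* sums to 16 */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int acc = 0;
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int sx = clamp_int(x + dx, 0, width - 1);
                    int sy = clamp_int(y + dy, 0, height - 1);
                    acc += k[dy + 1][dx + 1] * in[sy * width + sx];
                }
            }
            /* Divide by the kernel sum, adding half of it first to round */
            out[y * width + x] = (unsigned char)((acc + 8) / 16);
        }
    }
}
```

In an OpenCL version, the two outer loops would typically disappear: each work-item computes one output pixel, taking x and y from its global IDs. For a color image, the same function can be applied to each channel separately.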
Sobel Edge Detection is a common technique used in image processing to detect edges by highlighting areas of rapid intensity change. It applies the Sobel operator, which calculates the gradient of the image intensity at each pixel, emphasizing regions with high spatial frequency—typically corresponding to edges. This method is simple, efficient, and can detect both horizontal and vertical edges.
- The Sobel operator uses two convolution kernels (or filters):
$$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$$
- Sobel Edge Detection Steps
  - Apply Convolution: The image is convolved with both the $G_x$ and $G_y$ kernels to get two gradient images: one for the horizontal gradient ($G_x$) and one for the vertical gradient ($G_y$).
  - Calculate Gradient Magnitude: The magnitude of the gradient at each pixel is computed as:
$$G = \sqrt{G_x^2 + G_y^2}$$
- The following links can be helpful: https://github.com/petermlm/SobelFilter/tree/master and https://medium.com/@twinnroshan/understanding-and-implementing-edge-detection-in-c-with-sobel-operator-31159f26587c
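As a CPU reference for the Sobel kernel, the sketch below computes both gradients at one interior pixel. It assumes the standard 3x3 Sobel kernels and a single-channel, row-major image. The function names are hypothetical. To stay free of library dependencies, it returns the squared magnitude; applying sqrtf() from <math.h> to it gives the gradient magnitude.

```c
/* Sobel gradients at an interior pixel (1 <= x < width-1, 1 <= y < height-1)
   of a single-channel, row-major image. The kernels are applied as written,
   without flipping; flipping would only change the sign of the gradients. */
void sobel_at(const unsigned char *img, int width, int x, int y,
              int *gx, int *gy) {
    static const int kx[3][3] = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };
    static const int ky[3][3] = { {-1, -2, -1}, {0, 0, 0}, {1, 2, 1} };
    int sx = 0, sy = 0;
    for (int dy = -1; dy <= 1; dy++) {
        for (int dx = -1; dx <= 1; dx++) {
            int p = img[(y + dy) * width + (x + dx)];
            sx += kx[dy + 1][dx + 1] * p;
            sy += ky[dy + 1][dx + 1] * p;
        }
    }
    *gx = sx;
    *gy = sy;
}

/* Squared gradient magnitude; its square root is the edge strength. */
int sobel_magnitude_squared(int gx, int gy) {
    return gx * gx + gy * gy;
}
```

Note that the magnitude can exceed 255 for 8-bit input, so the result usually has to be clamped or rescaled before being stored as an image. Border pixels need a separate policy, for example leaving them at zero.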
[^1]: Feel free to implement any other problem of your choice as long as the complexity is similar.
Overall Task. For this final part of the lab, measure or estimate the following metrics for at least one of the previous exercises. These metrics are intended to characterize the application itself, especially its key OpenCL aspects. Consider the methodological approaches seen in the course or anything else you find appropriate. In any case, please describe your approach in the report.
- Execution time of the overall program
- Execution time of the kernel
- Bandwidth to/from memory and to/from the kernel: the amount of data exchanged with memory per second, both between host memory and device memory and between the kernel and device memory
- Throughput of the kernel. Effective work per second
- Memory footprint of the program. Pay special attention to the OpenCL kernel
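Once a duration has been measured, the bandwidth and throughput figures follow from simple arithmetic. The sketch below assumes durations in nanoseconds, which is the unit reported by OpenCL event profiling (clGetEventProfilingInfo() with CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END, on a queue created with CL_QUEUE_PROFILING_ENABLE). The function names are hypothetical.

```c
/* Bandwidth in GB/s (1 GB = 1e9 bytes) for `bytes` moved in
   `elapsed_ns` nanoseconds. */
double bandwidth_gb_per_s(double bytes, double elapsed_ns) {
    return bytes / elapsed_ns; /* bytes per nanosecond equals GB per second */
}

/* Throughput in elements (for example pixels) processed per second. */
double throughput_per_s(double elements, double elapsed_ns) {
    return elements * 1e9 / elapsed_ns;
}
```

For a host-to-device transfer, `bytes` would be the size passed to clEnqueueWriteBuffer(). For the kernel, it would be the total number of bytes it reads and writes, which has to be estimated from the kernel code.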
The following documents and resources must be submitted via Moodle. The requested material must refer to at least one of the two implementations proposed in the "Basic kernel programming" section of this lab: Gaussian Filter and Edge Detection. The remaining sections of the lab script and their questions are intended to help you with this final part; please do not address them in the report.
Specifically, the following must be delivered:
- A report detailing: reasoned answers to the questions in section "Basic kernel programming", rationales behind the design, implementation and tuning decisions taken, and any additional information you deem significant. Clear and thorough explanations of these elements are critical for the lab's grading criteria.
- Length: your choice. Not too much, not too little.
- Source files used.
Deadline: The submission deadline is December 8th, 2024.
Please share your feedback on this laboratory with me, thanks.
Programming and Architecture of Computing Systems