Project 7: Hello CUDA

CUDA C++

CUDA C++ is an extension of C++ for writing code that runs on NVIDIA GPUs. Once compiled by nvcc, CUDA code can be linked with standard C++ files to create GPU-accelerated programs.

CUDA implementation code usually has the file extension .cu, which tells CMake to compile it with nvcc. CUDA headers use the same .h extension as C and C++ headers, so ordinary C++ code can include them and link against CUDA C++ code, provided the declarations they expose are plain C++.
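One way to picture the .h/.cu split is the sketch below. The file names (fill.h, fill.cu) and the functions fill_kernel and fill_on_gpu are hypothetical and not part of this project; the block size of 256 is also an assumption. The header exposes only a plain C++ declaration, while the kernel and its launch stay inside the .cu file where nvcc can compile them.

```cuda
// ---- fill.h (hypothetical name): plain C++, safe to include from .cpp files ----
#ifndef FILL_H
#define FILL_H
// Sets host_data[0..n) to value using the GPU. Returns true on success.
bool fill_on_gpu(float* host_data, int n, float value);
#endif

// ---- fill.cu (hypothetical name): compiled by nvcc ----
#include <cstddef>
#include <cuda_runtime.h>

// Each thread writes at most one element.
__global__ void fill_kernel(float* data, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = value;
}

bool fill_on_gpu(float* host_data, int n, float value)
{
    if (n <= 0)
        return true;  // nothing to do

    float* device_data = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    if (cudaMalloc(&device_data, bytes) != cudaSuccess)
        return false;

    int threads_per_block = 256;  // assumed block size
    int blocks = (n + threads_per_block - 1) / threads_per_block;
    fill_kernel<<<blocks, threads_per_block>>>(device_data, n, value);

    // cudaMemcpy waits for the kernel to finish before copying the result back.
    cudaError_t status = cudaMemcpy(host_data, device_data, bytes,
                                    cudaMemcpyDeviceToHost);
    cudaFree(device_data);
    return status == cudaSuccess;
}
```

A regular .cpp file would include fill.h and call fill_on_gpu like any other C++ function; only fill.cu needs to be compiled by nvcc.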

Walkthrough

The heart of this simple CUDA application is a function that adds the elements of two arrays.

These arrays can be very large (~1M elements in this program). A single CPU core would have to process every element in sequence, but CUDA allows the work to be split across many GPU threads that run in parallel.
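For comparison, a single-threaded CPU version of the same operation might look like the sketch below. The name add_cpu is hypothetical and does not appear in this project; it is plain host code, so it compiles in either a .cpp or a .cu file.

```cuda
// Sequential baseline: one CPU thread visits every element in order
// and stores the sum back into y.
void add_cpu(int n, const float* x, float* y)
{
    for (int i = 0; i < n; i++)
        y[i] = x[i] + y[i];
}
```

Every iteration here is independent of the others, which is what makes the loop a good candidate for splitting across GPU threads.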

Here's how it's done:

// Kernel: adds the elements of two arrays, storing the result in y
__global__
void add(int n, float* x, float* y)
{
    // Global index of this thread across the whole grid
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    // Total number of threads in the grid
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}
  • __global__ tells the compiler that this function runs on the GPU and is called from the CPU. __global__ functions are known as kernels. Host code (code that runs on the CPU) launches kernels, which run as device code (code that runs on the GPU).
  • CUDA provides built-in variables inside __global__ functions. Four of these are:
    • threadIdx.x (the index of the thread within its block)
    • blockIdx.x (the index of that thread's block within the grid)
    • blockDim.x (the number of threads in each block)
    • gridDim.x (the number of blocks in the grid)
  • The structure of this for loop is so common that it has a name: the grid-stride loop. Each thread starts at its own global index and advances by the total number of threads in the grid (blockDim.x * gridDim.x), so all n elements are covered no matter how many threads are launched.
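As a rough sketch of how host code might launch this kernel: the helper run_add, the use of unified memory (cudaMallocManaged), the block size of 256, and the test values 1.0f and 2.0f are all assumptions made for illustration, and this project's actual host code may differ.

```cuda
#include <cmath>
#include <cstddef>
#include <cuda_runtime.h>

// Same kernel as in the walkthrough.
__global__ void add(int n, float* x, float* y)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}

// Hypothetical helper: runs add() on n elements and returns the largest
// deviation from the expected sum of 3.0f, or -1.0f if allocation fails.
float run_add(int n)
{
    float* x = nullptr;
    float* y = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Unified memory is accessible from both the CPU and the GPU.
    if (cudaMallocManaged(&x, bytes) != cudaSuccess)
        return -1.0f;
    if (cudaMallocManaged(&y, bytes) != cudaSuccess) {
        cudaFree(x);
        return -1.0f;
    }

    // Initialize the inputs on the host.
    for (int i = 0; i < n; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    // Launch enough blocks of 256 threads to cover all n elements.
    int block_size = 256;  // assumed block size
    int num_blocks = (n + block_size - 1) / block_size;
    add<<<num_blocks, block_size>>>(n, x, y);

    // Kernel launches are asynchronous; wait before reading y on the host.
    cudaDeviceSynchronize();

    // Every element of y should now be 3.0f.
    float max_error = 0.0f;
    for (int i = 0; i < n; i++)
        max_error = std::fmax(max_error, std::fabs(y[i] - 3.0f));

    cudaFree(x);
    cudaFree(y);
    return max_error;
}
```

Calling run_add(1 << 20) from main exercises the kernel on roughly 1M elements. Because of the grid-stride loop, the kernel would still produce the right sums with a smaller launch such as add<<<1, 256>>>(n, x, y); each thread would simply process more elements.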

Resources

Installing CUDA Tools

Writing Applications in CUDA C++