ECE408 is a great course offered by UIUC on learning CUDA. However, for students not enrolled at UIUC, it can be challenging to follow along with the labs and milestone project, which require access to specific computing resources.
This project aims to make it easier to learn the ECE408 material in your own local environment (your own GPU, operating system, etc.). It provides implementations of key assignments from the course so you can get hands-on CUDA experience.
- ECE408 Class Schedule: https://wiki.illinois.edu/wiki/display/ECE408/Class+Schedule
- ECE408 Labs and Project: https://wiki.illinois.edu/wiki/display/ECE408/Labs+and+Project
By completing this project, you will:
- Gain practical experience with CUDA
Even without access to UIUC's specific infrastructure, you can get hands-on practice with the key concepts from ECE408. Let me know if any part of the course content is unclear or missing - I'm happy to add more details to cover the full curriculum.
This is a repository containing subfolders for individual ECE408 CUDA labs and milestone projects.
To run them, follow these steps:
- Clone this repository locally, either with `git clone` or by downloading the zip file from GitHub and extracting it.
- Each lab and project has a CMake file for compilation. Enter the subfolder, create a build directory, and run the `cmake ..` command to generate the Makefile.
- Run the `make` command to compile and generate the executable.
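For reference, a per-lab CMakeLists.txt might look roughly like the following sketch. The project name, target name, source file, and GPU architecture are illustrative assumptions, not taken from this repository; check each subfolder's actual CMake file.

```cmake
# Minimal sketch of a CUDA lab build file (names are hypothetical)
cmake_minimum_required(VERSION 3.18)
project(ece408_lab LANGUAGES CXX CUDA)

# "lab_solution" and "template.cu" are placeholders for the real target/source
add_executable(lab_solution template.cu)

# 89 = compute capability 8.9 (e.g. RTX 4090); adjust for your GPU
set_target_properties(lab_solution PROPERTIES CUDA_ARCHITECTURES "89")
```

With a file like this in place, the `mkdir build && cd build && cmake .. && make` workflow above produces the executable.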
Here are separate explanations for the milestone project and the labs:
Milestone project:
- The project code is in mini-dnn/GPU_conv_baseline.cu, a baseline CUDA convolution layer implementation.
- You need to modify this file and apply the CUDA optimization techniques you have learned, such as shared memory and loop unrolling, to accelerate the convolution computation.
- Optimize incrementally, implementing one technique at a time; test each step and record the speedup.
- The goal is to maximize the convolution layer's execution speed and approach peak performance.
- Commit your optimized code to GitHub so you can compare your results with others'.
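As an illustration of the shared-memory technique mentioned above, here is a minimal sketch of a tiled 2D convolution kernel. All names (`TILE_WIDTH`, `MASK_WIDTH`, `d_mask`, `conv2d_tiled`) are illustrative assumptions, not taken from mini-dnn/GPU_conv_baseline.cu.

```cuda
#include <cuda_runtime.h>

#define TILE_WIDTH 16                 // output tile computed per block (assumed)
#define MASK_WIDTH 7                  // convolution mask width (assumed)
#define MASK_RADIUS (MASK_WIDTH / 2)

// Mask in constant memory: cached, read-only, shared by all threads
__constant__ float d_mask[MASK_WIDTH * MASK_WIDTH];

__global__ void conv2d_tiled(const float *in, float *out, int H, int W) {
    // Shared tile covers the output tile plus a halo of MASK_RADIUS on each side
    __shared__ float tile[TILE_WIDTH + MASK_WIDTH - 1][TILE_WIDTH + MASK_WIDTH - 1];

    int ty = threadIdx.y, tx = threadIdx.x;
    int row_o = blockIdx.y * TILE_WIDTH + ty;  // output element for this thread
    int col_o = blockIdx.x * TILE_WIDTH + tx;
    int row_i = row_o - MASK_RADIUS;           // corresponding input element
    int col_i = col_o - MASK_RADIUS;

    // Cooperative load: every thread loads one input element, zero-padding borders
    if (row_i >= 0 && row_i < H && col_i >= 0 && col_i < W)
        tile[ty][tx] = in[row_i * W + col_i];
    else
        tile[ty][tx] = 0.0f;
    __syncthreads();

    // Only the inner TILE_WIDTH x TILE_WIDTH threads produce an output element
    if (ty < TILE_WIDTH && tx < TILE_WIDTH && row_o < H && col_o < W) {
        float acc = 0.0f;
        for (int i = 0; i < MASK_WIDTH; ++i)
            for (int j = 0; j < MASK_WIDTH; ++j)
                acc += d_mask[i * MASK_WIDTH + j] * tile[ty + i][tx + j];
        out[row_o * W + col_o] = acc;
    }
}
```

This kernel would be launched with `(TILE_WIDTH + MASK_WIDTH - 1)` threads per block dimension, so each thread loads exactly one shared-memory element; each input value is then reused by up to `MASK_WIDTH * MASK_WIDTH` output computations instead of being re-fetched from global memory.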
Labs:
- Each lab has a separate folder containing code templates.
- Edit the specified files and complete the code according to the lab requirements.
- The labs cover various aspects of CUDA programming, such as memory management and thread organization.
- Run the `run_datasets` script to verify that your implementation is correct.
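As an example of the template-completion pattern, an early lab typically asks you to fill in a kernel like the following vector addition. This is a generic sketch, not the repository's actual template; the function name and launch parameters are assumptions.

```cuda
#include <cuda_runtime.h>

// Each thread computes one element of c = a + b
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard against overshoot
        c[i] = a[i] + b[i];
}

// Typical launch from host code (d_a, d_b, d_c are device pointers):
//   int threads = 256;
//   int blocks  = (n + threads - 1) / threads;    // ceiling division
//   vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);
```

The bounds check `i < n` matters because the grid is rounded up to a whole number of blocks, so some threads in the last block may fall past the end of the arrays.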
Example device-query output from my system:

```
There is 1 device supporting CUDA
Device 0 name: NVIDIA GeForce RTX 4090
- Computational Capabilities: 8.9
- Maximum global memory size: 25756696576
- Maximum constant memory size: 65536
- Maximum shared memory size per block: 49152
- Maximum block dimensions: 1024 x 1024 x 64
- Maximum grid dimensions: 2147483647 x 65535 x 65535
- Warp size: 32
```
This shows that my system has one GPU (an NVIDIA GeForce RTX 4090) available for CUDA computation, with 24 GB of global memory and compute capability 8.9.
The GPU supports block dimensions up to 1024 x 1024 x 64 (with at most 1024 threads total per block) and very large grid dimensions for parallel processing.
The output confirms that my GPU setup is correctly configured for running CUDA programs and training neural networks.
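Output like the above can be produced with the CUDA runtime API via `cudaGetDeviceCount` and `cudaGetDeviceProperties`; a minimal sketch:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("There is %d device supporting CUDA\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d name: %s\n", dev, prop.name);
        printf("- Computational Capabilities: %d.%d\n", prop.major, prop.minor);
        printf("- Maximum global memory size: %zu\n", prop.totalGlobalMem);
        printf("- Maximum constant memory size: %zu\n", prop.totalConstMem);
        printf("- Maximum shared memory size per block: %zu\n", prop.sharedMemPerBlock);
        printf("- Maximum block dimensions: %d x %d x %d\n",
               prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
        printf("- Maximum grid dimensions: %d x %d x %d\n",
               prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
        printf("- Warp size: %d\n", prop.warpSize);
    }
    return 0;
}
```

Compile with `nvcc` and run it on your own machine to check what your GPU supports before starting the labs.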