This repository is the implementation of "Accelerating Convolutional Neural Network by Exploiting Sparsity on GPUs".
+ Ubuntu 18.04
+ CUDA == 10.2
+ cuDNN == 7.6
- CUDA == 12.2
- cuDNN == 8.8+
If you build with CMake and C++, you can use clang as the compiler; the LLVM libraries can be used as well:
- llvm-16
- clang-16+
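A quick sanity check of the installed toolkit can save a failed build. The following is a minimal sketch, not part of the repository: the helper names `parse_release` and `version_ge` are hypothetical, it assumes GNU `sort -V` is available (as on Ubuntu), and it compares against 10.2 only because that is the older of the two CUDA versions listed above.

```shell
#!/usr/bin/env bash
# Sketch only: check the CUDA release reported by nvcc against a minimum version.

# Extract "12.2" from a line such as
# "Cuda compilation tools, release 12.2, V12.2.140" read on stdin.
parse_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# version_ge A B: succeeds when version A >= version B (uses GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

MIN_CUDA="10.2"   # assumption: the older CUDA version listed above

if command -v nvcc >/dev/null 2>&1; then
  found="$(nvcc --version | parse_release)"
  if version_ge "$found" "$MIN_CUDA"; then
    echo "CUDA $found found (>= $MIN_CUDA)"
  else
    echo "CUDA $found found, older than $MIN_CUDA"
  fi
else
  echo "nvcc not found on PATH"
fi
```

The script only reports what it finds; it does not check cuDNN, whose version is not exposed by `nvcc`.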
- Vgg-16
Running ECR for the convolution layer:
cd OCPA/ECR/ECR/time_vgg
nvcc batchedECR.cu -o batchedECR.out
./batchedECR.out 32
Running PECR for the convolution+pooling layer:
cd OCPA/PECR/pecr/time_vgg
nvcc batchedPECR.cu -o batchedPECR.out
./batchedPECR.out 32
Running cuDNN (using Tensor Cores) for the convolution layer:
cd OCPA/ECR/cudnn/time_vgg
make
./cudnn 32
Running cuDNN (using Tensor Cores) for the convolution+pooling layer:
cd OCPA/PECR/cudnn/time_vgg
make
./cudnn 32
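The four VGG-16 runs above can be chained in one script. This is a sketch under stated assumptions, not a script shipped with the repository: it assumes it is started from the directory that contains `OCPA`, the function names are hypothetical, and `DRY_RUN` is a made-up convenience variable that prints the commands instead of executing them. The argument (32 in the commands above) is simply forwarded to each binary.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the VGG-16 commands listed above.
# DRY_RUN=1 (hypothetical variable) prints each command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build one .cu file with nvcc and run the resulting binary.
# $1 = directory, $2 = file stem (e.g. batchedECR), $3 = argument for the binary
bench_nvcc() {
  ( run cd "$1" && run nvcc "$2.cu" -o "$2.out" && run "./$2.out" "$3" )
}

# Build the cuDNN baseline with make and run it.
# $1 = directory, $2 = argument for the binary
bench_make() {
  ( run cd "$1" && run make && run ./cudnn "$2" )
}

# Run all four VGG-16 benchmarks; stops at the first failure.
run_vgg_benchmarks() {
  arg="${1:-32}"   # 32 is the value used in the commands above
  bench_nvcc OCPA/ECR/ECR/time_vgg   batchedECR  "$arg" &&
  bench_nvcc OCPA/PECR/pecr/time_vgg batchedPECR "$arg" &&
  bench_make OCPA/ECR/cudnn/time_vgg  "$arg" &&
  bench_make OCPA/PECR/cudnn/time_vgg "$arg"
}
```

After sourcing the file, setting `DRY_RUN=1` and calling `run_vgg_benchmarks 32` previews the commands; without `DRY_RUN` they are executed. Each benchmark runs in a subshell, so the working directory of the caller is left unchanged.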
- Resnet-50
Running ECR for the convolution layer:
cd OCPA/ECR/ECR/time_resnet
nvcc batchedECR.cu -o batchedECR.out
./batchedECR.out 32
Running PECR for the convolution+pooling layer:
cd OCPA/PECR/pecr/time_resnet
nvcc batchedPECR.cu -o batchedPECR.out
./batchedPECR.out 32
Running cuDNN (using Tensor Cores) for the convolution layer:
cd OCPA/ECR/cudnn/time_resnet
make
./cudnn 32
Running cuDNN (using Tensor Cores) for the convolution+pooling layer:
cd OCPA/PECR/cudnn/time_resnet
make
./cudnn 32
The running times of other cuDNN algorithms can be obtained in the same way as the cuDNN runs above.
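Once two running times have been collected, a common way to compare them is the ratio of the baseline time to the optimized time. The helper below is a minimal sketch of that ratio only; the name `speedup` is hypothetical, and it is not claimed to match what the repository's own programs compute.

```shell
#!/usr/bin/env bash
# Sketch only: speedup = baseline time / optimized time.
# Usage: speedup <baseline_time> <optimized_time>   (same unit for both, e.g. ms)
speedup() {
  awk -v base="$1" -v opt="$2" 'BEGIN {
    if (opt <= 0) {
      print "optimized time must be positive" > "/dev/stderr"
      exit 1
    }
    printf "%.2f\n", base / opt
  }'
}
```

For example, `speedup 10 4` prints `2.50`. These numbers are placeholders, not measurements.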
The Vgg-16 and Resnet-50 speedups can be obtained by running the programs under the folder OCPA/speedup, as shown in the following figure: