CUDA Stream Compaction

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 2

Name: Zhan Xiong Chin
Tested on: Windows 7 Professional, Intel(R) Xeon(R) CPU E5-1630 v4 @ 3.70 GHz 3.70 GHz, GTX 1070 8192MB (SIG Lab)

Overview

This implements a GPU-based scan (i.e. computes the prefix sums of an array) and stream compaction (i.e. moves nonzero elements to the front of array). There are versions for a CPU-based scan, a naive GPU-based scan (uses O(nlogn) additions) and a work-efficient GPU-based scan (uses O(n) additions).

Build Instructions

See here

Performance analysis

Milliseconds to calculate prefix sums for arrays of given sizes:

Log2 of nanoseconds for same data:

The timings above did not include the time needed to copy arrays onto device.

For small arrays, the naive algorithm appears to be the fastest, though this may be due to the inability to effectively time CPU execution for times smaller than 1 millisecond. The CPU algorithm still beats out the naive algorithm for most of the timings, suggesting that the extra logn additions needed is slowing down the execution of the naive algorithm significantly.

The efficient algorithm is approximately 3 times faster than the naive algorithm and twice as fast as the CPU algorithm. This is in line with the O(n) operations needed by both of them; the GPU algorithm is making better use of its multiple cores to achieve this speedup.

Compared to the thrust-based implementation, all algorithms beat it for small arrays, but it is significantly faster than all other algorithms for large arrays. Based on the large number of registers used and the small grid size, it may be using the work-efficient algorithm, but with a larger base than 2 (e.g. ternary or quarternary tree).

Test output

S:\cis565\Project2-Stream-Compaction\build>Release\cis565_stream_compaction_test
.exe

****************
** SCAN TESTS **
****************
    [  38  19  38  37   5  47  15  35   0  12   3   0  42 ...  10   0 ]
==== cpu scan, power-of-two ====
    [   0  38  57  95 132 137 184 199 234 234 246 249 249 ... 205473618 20547362
8 ]
==== cpu scan, non-power-of-two ====
    [   0  38  57  95 132 137 184 199 234 234 246 249 249 ... 205473252 20547325
5 ]
    passed
==== naive scan, power-of-two ====
    [   0  38  57  95 132 137 184 199 234 234 246 249 249 ... 205473618 20547362
8 ]
    passed
==== naive scan, non-power-of-two ====
    passed
==== work-efficient scan, power-of-two ====
    passed
==== work-efficient scan, non-power-of-two ====
    passed
==== thrust scan, power-of-two ====
    passed
==== thrust scan, non-power-of-two ====
    passed

*****************************
** STREAM COMPACTION TESTS **
*****************************
    [   2   3   2   1   3   1   1   1   2   0   1   0   2 ...   0   0 ]
==== cpu compact without scan, power-of-two ====
    [   2   3   2   1   3   1   1   1   2   1   2   1   1 ...   2   2 ]
    passed
==== cpu compact without scan, non-power-of-two ====
    [   2   3   2   1   3   1   1   1   2   1   2   1   1 ...   1   3 ]
    passed
==== cpu compact with scan ====
    [   2   3   2   1   3   1   1   1   2   1   2   1   1 ...   2   2 ]
    passed
==== work-efficient compact, power-of-two ====
    passed
==== work-efficient compact, non-power-of-two ====
    passed

S:\cis565\Project2-Stream-Compaction\build>

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
cmake		cmake
images		images
src		src
stream_compaction		stream_compaction
.cproject		.cproject
.gitignore		.gitignore
.project		.project
CMakeLists.txt		CMakeLists.txt
GNUmakefile		GNUmakefile
INSTRUCTION.md		INSTRUCTION.md
README.md		README.md
cis565_stream_compaction_test.launch		cis565_stream_compaction_test.launch

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CUDA Stream Compaction

Overview

Build Instructions

Performance analysis

Test output

About

Releases

Packages

Languages

czxcjx/Project2-Stream-Compaction

Folders and files

Latest commit

History

Repository files navigation

CUDA Stream Compaction

Overview

Build Instructions

Performance analysis

Test output

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages