Project-2

A Study in Parallel Algorithms: Stream Compaction

Parts 2 & 3: Scan comparison

As shown in the plot, the serial version is faster when the array is small but becomes slower than the GPU versions when the array is large: it runs on a single thread, so the time spent in its loop grows with the array size. The global memory version is slightly slower than the shared memory version, because reading and writing global memory is slower than reading and writing shared memory.
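To make the comparison concrete, here is a minimal CPU-only sketch of the two scan strategies, not the project's actual code. The function names `exclusive_scan_serial` and `exclusive_scan_naive_rounds` are made up for illustration, and the second function only emulates on the CPU the double-buffered naive scan that a GPU kernel would run with one thread per element.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Serial exclusive scan: out[i] = in[0] + ... + in[i-1], with out[0] = 0.
// One thread walks the whole array, so the time grows linearly with n.
std::vector<int> exclusive_scan_serial(const std::vector<int>& in) {
    std::vector<int> out(in.size(), 0);
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;
        running += in[i];
    }
    return out;
}

// CPU emulation (assumption: illustrative only) of the naive parallel scan.
// Each pass of the outer loop is one "round"; on a GPU every iteration of
// the inner loop would be handled by its own thread.
std::vector<int> exclusive_scan_naive_rounds(const std::vector<int>& in) {
    const std::size_t n = in.size();
    std::vector<int> cur(n, 0), next(n, 0);

    // Shift the input right by one so the result is an exclusive scan.
    for (std::size_t i = 1; i < n; ++i) {
        cur[i] = in[i - 1];
    }

    // About log2(n) rounds; each round reads from cur and writes to next.
    for (std::size_t offset = 1; offset < n; offset *= 2) {
        for (std::size_t i = 0; i < n; ++i) {
            next[i] = (i >= offset) ? cur[i] + cur[i - offset] : cur[i];
        }
        std::swap(cur, next);
    }
    return cur;
}
```

Both functions return the same result for the same input. The serial version does all of its additions one after another, while the round-based version needs only about log2(n) rounds of independent additions, which is why a GPU can overtake the serial loop once the array is large enough to keep many threads busy.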

Part 4: Stream compaction comparison

As shown in the plot, the Thrust version is faster than my GPU version, but the gap narrows as the array size grows. I think my GPU version is slower because the code is not optimized and it still uses global memory in some places.
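For reference, stream compaction is usually built from three steps: flag the elements to keep, run an exclusive scan over the flags, and scatter the kept elements to the scanned indices. The sketch below shows those steps on the CPU. It is only an illustration: the name `compact_nonzero` and the "keep non-zero values" predicate are assumptions, not necessarily what the project uses.

```cpp
#include <cstddef>
#include <vector>

// Scan-and-scatter stream compaction (CPU sketch).
// Assumption: an element is kept when it is non-zero.
std::vector<int> compact_nonzero(const std::vector<int>& in) {
    const std::size_t n = in.size();
    std::vector<int> flags(n, 0);
    std::vector<int> indices(n, 0);

    // Step 1: flag each element that should survive.
    for (std::size_t i = 0; i < n; ++i) {
        flags[i] = (in[i] != 0) ? 1 : 0;
    }

    // Step 2: exclusive scan of the flags gives each survivor its
    // output position; the final running total is the output size.
    int running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        indices[i] = running;
        running += flags[i];
    }

    // Step 3: scatter the survivors to their scanned positions.
    std::vector<int> out(static_cast<std::size_t>(running));
    for (std::size_t i = 0; i < n; ++i) {
        if (flags[i] != 0) {
            out[static_cast<std::size_t>(indices[i])] = in[i];
        }
    }
    return out;
}
```

On a GPU, steps 1 and 3 map to one thread per element and step 2 is the parallel scan, so each step reads and writes whole arrays. If those intermediate arrays live in global memory, that traffic is one plausible source of the gap to Thrust described above.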

References

http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html
http://docs.nvidia.com/cuda/thrust/#axzz3EameA17V

wulinjiansheng/Project2-StreamCompaction