Project 1: Xueyin Wan #24

Open
wants to merge 16 commits into
base: master
from
67 changes: 60 additions & 7 deletions README.md
@@ -1,10 +1,63 @@
**University of Pennsylvania, CIS 565: GPU Programming and Architecture,
Project 1 - Flocking**

* (TODO) YOUR NAME HERE
* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
####University of Pennsylvania
####CIS 565: GPU Programming and Architecture

### (TODO: Your README)
##Project 1 - Flocking

Include screenshots, analysis, etc. (Remember, this is public, so don't put
anything here that you don't want to share with the world.)
* Xueyin Wan
* Tested on: Windows 10, i7-4870 @ 2.50GHz 16GB, NVIDIA GeForce GT 750M 2GB (Personal Laptop)

==================================================================
###Final Result Screenshot
![alt text](https://github.com/xueyinw/Project1-CUDA-Flocking/blob/master/images/Xueyin_Performance.gif "Xueyin's Performance Analysis")

####Parameters:
* Number of boids = 15000
* dT = 0.2
* Algorithm used in the screenshot: Coherent Uniform Grid
* BlockSize = 128
* rule1Distance = 5.0f, rule1Scale = 0.01f
* rule2Distance = 3.0f, rule2Scale = 0.1f
* rule3Distance = 5.0f, rule3Scale = 0.1f
* maxSpeed = 1.0f
* scene_scale = 100.0f

==================================================================
###Performance Analysis


I chose the first method, disabling visualization (#define VISUALIZE 0), to measure performance.
###Without Visualization
####(#define VISUALIZE 0)
| Number of boids | 5000 | 15000 | 25000 | 35000 | 45000 | 55000 | 65000 | 75000 | 85000 | 95000 |
| ------------- |:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:| -----:|
| Brute Force neighbor search FPS | 57.7 | 6.6 | 2.2 | | | | | | | |
| Uniform Grid neighbor search FPS | 580 | 250 | 160 | 108.4 | 80.4 | 63.6 | 53.2 | 42.7 | 30.5 | 25.7 |
| Coherent Uniform Grid neighbor search FPS | 680 | 300 | 180 | 130 | 100.7 | 78.3 | 67.4 | 57.4 | 49.5 | 39.7 |

The chart below visualizes these results.
![alt text](https://github.com/xueyinw/Project1-CUDA-Flocking/blob/master/images/AlgorithmComparision.png "Xueyin's Updated Chart")

The chart compares the FPS of the Brute Force, Uniform Grid, and Coherent Uniform Grid neighbor searches as the number of boids increases.

###Questions & Answers
####1. For each implementation, how does changing the number of boids affect performance? Why do you think this is?
Answer:

* Brute Force neighbor search: the frame rate drops very quickly as the number of boids increases (57.7 FPS at 5000 boids, 2.2 FPS at 25000), because every boid checks every other boid, so the work grows with the square of the boid count.
* Uniform Grid neighbor search: the frame rate drops much more slowly, because each boid only checks boids in nearby grid cells. In the table above it stays above 60 FPS up to about 55000 boids, which is far better than Brute Force neighbor search.
* Coherent Uniform Grid neighbor search: in the table above the frame rate stays above 60 FPS up to about 65000 boids, which is much better than Brute Force neighbor search and a little better than Uniform Grid neighbor search.
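
To illustrate why the grid scales better, here is a minimal CPU-side sketch (not the project's CUDA kernels) that counts how many candidate neighbors each approach has to examine. The names `bruteForceChecks` and `gridChecks` are hypothetical, and the sketch assumes a cubic grid where each boid scans the 27 cells around its own cell; the actual implementation may scan a different number of cells.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Brute force: every boid is tested against every other boid.
std::size_t bruteForceChecks(std::size_t numBoids) {
    return numBoids == 0 ? 0 : numBoids * (numBoids - 1);
}

// Uniform grid (assumed layout): positions lie in [0, res * cellWidth)^3
// and each boid only examines boids in its own cell and the adjacent cells.
std::size_t gridChecks(const std::vector<Vec3>& pos, float cellWidth, int res) {
    assert(cellWidth > 0.0f && res > 0);
    const std::size_t r = static_cast<std::size_t>(res);

    // Convert one coordinate to a cell coordinate, clamped to the grid.
    auto coord = [&](float v) {
        int c = static_cast<int>(std::floor(v / cellWidth));
        return std::min(std::max(c, 0), res - 1);
    };
    // Flatten a 3D cell coordinate into a 1D index.
    auto flat = [&](std::size_t x, std::size_t y, std::size_t z) {
        return x + r * (y + r * z);
    };

    // Count how many boids fall into each cell.
    std::vector<std::size_t> count(r * r * r, 0);
    for (const Vec3& p : pos) {
        ++count[flat(coord(p.x), coord(p.y), coord(p.z))];
    }

    // Sum the candidates each boid would examine.
    std::size_t checks = 0;
    for (const Vec3& p : pos) {
        const int cx = coord(p.x), cy = coord(p.y), cz = coord(p.z);
        for (int dz = -1; dz <= 1; ++dz) {
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = cx + dx, ny = cy + dy, nz = cz + dz;
                    if (nx < 0 || ny < 0 || nz < 0 ||
                        nx >= res || ny >= res || nz >= res) {
                        continue;  // neighbor cell lies outside the grid
                    }
                    checks += count[flat(nx, ny, nz)];
                }
            }
        }
        checks -= 1;  // the boid does not compare against itself
    }
    return checks;
}
```

With boids spread evenly over the scene, the grid count grows roughly with the boid count times the local density, while the brute-force count grows with the square of the boid count.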



####2. For each implementation, how does changing the block count and block size affect performance? Why do you think this is?

Answer:

* Generally speaking, performance improves when the block count decreases and the block size increases.
* However, to get good performance we need to balance block count against block size, and choose both values carefully so that memory performance also improves.
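
Block count and block size are linked: for a fixed number of boids, a larger block size means fewer blocks. The sketch below shows the usual ceiling-division idiom for deriving the block count. The helper names `blocksFor` and `idleThreads` are hypothetical and are not taken from the project's source.

```cpp
#include <cassert>

// Number of blocks needed so that every object gets one thread.
int blocksFor(int numObjects, int blockSize) {
    assert(numObjects >= 0 && blockSize > 0);
    return (numObjects + blockSize - 1) / blockSize;  // ceiling division
}

// Threads launched in the last block that have no object to process.
int idleThreads(int numObjects, int blockSize) {
    return blocksFor(numObjects, blockSize) * blockSize - numObjects;
}
```

For the parameters listed above (15000 boids, BlockSize = 128), this gives 118 blocks with 104 idle threads in the last block.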

####3. For the coherent uniform grid: did you experience any performance improvements with the more coherent uniform grid? Was this the outcome you expected? Why or why not?
Answer:

* Yes. As the table in the Performance Analysis section shows, Coherent Uniform Grid neighbor search is faster than Uniform Grid neighbor search. When implementing Coherent Uniform Grid neighbor search in part 2.3, I rearranged the boid data itself so that the positions and velocities of all boids in one cell are also contiguous in memory. This data can then be accessed directly, which is more efficient than in the Uniform Grid neighbor search in part 2.1. The result is what I expected, since the GPU performs better when accessing contiguous memory.
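
The rearrangement described in this answer can be sketched on the CPU as a sort by cell index followed by a gather. This is only an illustration under assumed names (`makeCoherent`, `cellOfBoid`); a GPU version would perform the sort and the gather in parallel, and the same gather would be applied to the velocities.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Vec3 { float x, y, z; };

// Returns a copy of `pos` reordered so that boids sharing a grid cell
// sit next to each other in memory. cellOfBoid[i] is the cell of boid i.
std::vector<Vec3> makeCoherent(const std::vector<Vec3>& pos,
                               const std::vector<int>& cellOfBoid) {
    assert(pos.size() == cellOfBoid.size());

    // order[k] = original index of the boid that ends up in slot k.
    std::vector<std::size_t> order(pos.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return cellOfBoid[a] < cellOfBoid[b];
                     });

    // Gather: after this step a cell's boids occupy one contiguous range,
    // so a neighbor search reads them without going through `order`.
    std::vector<Vec3> sorted(pos.size());
    for (std::size_t k = 0; k < order.size(); ++k) {
        sorted[k] = pos[order[k]];
    }
    return sorted;
}
```

The extra gather pass costs some time each step, but afterwards threads that process the same cell read adjacent memory locations.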
Binary file added images/AlgorithmComparision.png
Binary file added images/Xueyin_Performance.gif
2 changes: 1 addition & 1 deletion src/CMakeLists.txt
@@ -10,5 +10,5 @@ set(SOURCE_FILES

cuda_add_library(src
${SOURCE_FILES}
OPTIONS -arch=sm_20
OPTIONS -arch=sm_30
)