Project 4: Jian Ru #9

Open
wants to merge 10 commits into base: master

73 changes: 68 additions & 5 deletions README.md
@@ -5,13 +5,76 @@ CUDA Rasterizer

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4**

* (TODO) YOUR NAME HERE
* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
* Jian Ru
* Tested on: Windows 10, i7-4850 @ 2.3GHz 16GB, GT 750M 2GB (Personal)

### (TODO: Your README)
### Overview

*DO NOT* leave the README to the last minute! It is a crucial part of the
project, and we will not be able to grade you without a good README.
**Summary:** A basic software rasterizer implemented in CUDA

**Features:**

* Basic pipeline
  * Vertex assembly and vertex shader
  * Primitive assembly
  * Rasterization
  * Depth test
  * Fragment shader
* SSAO
* Tile-based rasterization
* Texture mapping with perspective-correct interpolation and bilinear filtering
* Back-face culling

**Result:**

![](renders/demo.gif)

### Performance Analysis

* SSAO
  * Implemented after the due date. Please do not count it toward my grade, because I do not want to use my late days.
  * Screen space ambient occlusion darkens areas where occlusion is likely to happen. The main idea is to take a set of samples in the upper hemisphere and, at run time, for each pixel, transform them into eye space: randomly rotate them about the z-axis, align the z-axis of their local frame with the pixel's eye-space normal, and place them around the pixel's eye-space position. Each sample is then projected into screen space, and its depth is compared with the depth stored at that screen location (this comparison must use LINEAR depth). A sample is occluded if the stored surface is in front of it (less deep). The ratio of occluded samples determines how much the pixel's color is darkened.
  * SSAO is not physically accurate, but it is usually visually plausible and cheap enough for real-time rendering. The effect is usually subtle but does enhance the realism of object appearance.
  * In the test, I used 64 samples and a 4x4 noise texture. The cost is about 3 ms; it scales with render-target resolution and is not affected by scene complexity.
    ![](renders/ssao_perf.png)

| 2_cylinder_engine | di |
| --- | --- |
| ![](renders/ssao_engine.png) | ![](renders/ssao_di.png) |

| SSAO On | SSAO Off |
| --- | --- |
| ![](renders/ssao_demo1.png) | ![](renders/ssao_demo2.png) |
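
The SSAO steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it is written as plain host C++ so it can be compiled and checked without a GPU, whereas in the rasterizer the per-pixel function would run in a CUDA kernel with one thread per pixel. The `Vec3` helpers, the kernel-scaling curve, the `1e-3f` depth bias and the hypothetical `sceneDepthAt` callback (standing in for "project the sample to the screen and read the linear depth buffer") are all assumptions, and the usual range check that ignores occluders far outside the radius is omitted. `linearizeDepth` assumes an OpenGL-style projection whose NDC depth lies in [-1, 1].

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <random>
#include <vector>

// Minimal eye-space vector type for this sketch (not the project's glm types).
struct Vec3 { float x, y, z; };

Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 a) { return scale(a, 1.0f / std::sqrt(dot(a, a))); }

// Build `count` sample offsets inside the unit hemisphere around +z.
// The length bound grows as t*t, so most samples cluster near the pixel.
std::vector<Vec3> makeHemisphereKernel(int count, unsigned seed) {
    assert(count > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> u(0.0f, 1.0f);
    std::vector<Vec3> kernel;
    for (int i = 0; i < count; ++i) {
        // z is kept strictly positive so every direction is in the upper hemisphere.
        Vec3 dir = normalize({2.0f * u(rng) - 1.0f, 2.0f * u(rng) - 1.0f, 0.05f + 0.95f * u(rng)});
        float t = float(i) / float(count);
        kernel.push_back(scale(dir, u(rng) * (0.1f + 0.9f * t * t)));
    }
    return kernel;
}

// Pick the rotation vector for pixel (px, py) from a 4x4 noise texture tiled over the screen.
Vec3 noiseAt(const std::vector<Vec3>& noise4x4, int px, int py) {
    assert(noise4x4.size() == 16 && px >= 0 && py >= 0);
    return noise4x4[(px % 4) + 4 * (py % 4)];
}

// Convert an OpenGL-style NDC depth in [-1, 1] back to positive linear eye-space depth.
float linearizeDepth(float ndcZ, float zNear, float zFar) {
    return 2.0f * zNear * zFar / (zFar + zNear - ndcZ * (zFar - zNear));
}

// Fraction of kernel samples that are occluded for one pixel.
//   p, n         : eye-space position and unit normal of the pixel (camera looks down -z)
//   randVec      : rotation vector from the noise texture (must not be parallel to n)
//   radius       : sampling radius in eye-space units
//   sceneDepthAt : hypothetical callback that projects an eye-space point to the
//                  screen and returns the LINEAR depth stored in the depth buffer there
float ssaoOcclusion(Vec3 p, Vec3 n, Vec3 randVec, float radius,
                    const std::vector<Vec3>& kernel,
                    const std::function<float(Vec3)>& sceneDepthAt) {
    // Gram-Schmidt: a tangent built from randVec rotates the kernel randomly about n.
    Vec3 t = normalize(add(randVec, scale(n, -dot(randVec, n))));
    Vec3 b = cross(n, t);
    int occluded = 0;
    for (const Vec3& k : kernel) {
        // The tangent frame (t, b, n) maps the kernel's +z axis onto the normal.
        Vec3 offset = add(add(scale(t, k.x), scale(b, k.y)), scale(n, k.z));
        Vec3 s = add(p, scale(offset, radius));
        float sampleDepth = -s.z;
        // Occluded if the visible surface at the sample's pixel is in front of the sample.
        if (sceneDepthAt(s) < sampleDepth - 1e-3f) ++occluded;
    }
    return float(occluded) / float(kernel.size());
}
```

The returned ratio would then be used to darken the pixel, for example `color *= 1.0f - occlusion`; that is one common choice and not necessarily the exact weighting used in this project.
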

* Tile-based rasterization
  * The following graph compares tile-based rasterization (with and without culling) against per-primitive rasterization on several 3D models. The tile-based approach performs better on models made of fewer but larger triangles, and it is slightly slower than the basic approach when the scene consists of small triangles. Even so, its execution time is more stable, and it eliminates write conflicts entirely. The drawback is the extra memory needed to maintain a triangle list for each tile: the lists can be quite large if we size them conservatively, and we risk data corruption if we make them too small. Back-face culling matters here because it can significantly reduce the number of triangles overlapping each tile, which shrinks the triangle lists and improves performance.
    ![](renders/ras_perf.png)
  * Performance vs. tile size
    * I experimented with several tile sizes and found that 8x8 tiles work best on my computer.
      ![](renders/ts_perf.png)
    * According to the performance report from NSight, my tile-based rasterization kernel is expensive and resource-hungry. It uses 63 registers per thread (I tried to reduce register usage but could not get the count any lower; I suspect nvcc had already optimized it) and contains fairly complex logic. Because of this, increasing the tile size does not improve occupancy once each SM is close to saturation (NSight's occupancy data supports this explanation), yet it increases the number of triangles each tile has to process. More work per tile without higher occupancy is why increasing the tile size beyond 8 does not improve performance. On the other hand, because of my GPU's limit on active blocks per SM, decreasing the tile size further cannot increase the number of active blocks on each SM, and NSight shows occupancy dropping from 50% to 25%. With a tile size of 4, each block contains only 16 threads, fewer than the warp size of 32, so half of the threads in each warp do nothing. This is another reason not to decrease the tile size further.
      ![](renders/occupancy_graph.png)
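
A rough host-side sketch of the per-tile triangle lists discussed above, assuming a separate binning pass that runs before the per-tile rasterization kernel. Each tile gets a fixed number of slots, which makes the memory trade-off explicit: if `capacity` is too small, triangles are lost (the data-corruption risk mentioned above, counted here in `dropped` instead of overwriting memory), and if it is too large, memory grows as `tilesX * tilesY * capacity`. Back-face culling runs before binning, so culled triangles never take a slot. `binTriangles`, `TileBins` and the winding convention are hypothetical, and the bounding-box test is conservative, so a triangle may be listed in tiles it does not actually cover.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };
struct Tri { Vec2 v[3]; };  // screen-space vertex positions in pixels

// Twice the signed area. Treating positive area as front-facing is an assumption:
// the sign flips with the winding convention and with a y-down screen space.
float signedArea2(const Tri& t) {
    return (t.v[1].x - t.v[0].x) * (t.v[2].y - t.v[0].y) -
           (t.v[2].x - t.v[0].x) * (t.v[1].y - t.v[0].y);
}

struct TileBins {
    int tilesX = 0, tilesY = 0, capacity = 0;
    std::vector<int> counts;  // number of triangles recorded per tile
    std::vector<int> triIds;  // fixed-size list of `capacity` slots per tile
    int dropped = 0;          // tile/triangle pairs lost because a list was full
};

TileBins binTriangles(const std::vector<Tri>& tris, int width, int height,
                      int tileSize, int capacity, bool cullBackFaces) {
    assert(width > 0 && height > 0 && tileSize > 0 && capacity > 0);
    TileBins bins;
    bins.tilesX = (width + tileSize - 1) / tileSize;
    bins.tilesY = (height + tileSize - 1) / tileSize;
    bins.capacity = capacity;
    bins.counts.assign(bins.tilesX * bins.tilesY, 0);
    bins.triIds.assign(bins.tilesX * bins.tilesY * capacity, -1);
    for (int id = 0; id < static_cast<int>(tris.size()); ++id) {
        const Tri& t = tris[id];
        if (cullBackFaces && signedArea2(t) <= 0.0f) continue;  // back-facing or degenerate
        float minX = std::min({t.v[0].x, t.v[1].x, t.v[2].x});
        float maxX = std::max({t.v[0].x, t.v[1].x, t.v[2].x});
        float minY = std::min({t.v[0].y, t.v[1].y, t.v[2].y});
        float maxY = std::max({t.v[0].y, t.v[1].y, t.v[2].y});
        if (maxX < 0 || maxY < 0 || minX >= width || minY >= height) continue;  // off screen
        // Conservative: every tile touched by the bounding box receives the triangle.
        int tx0 = std::max(0, static_cast<int>(minX) / tileSize);
        int tx1 = std::min(bins.tilesX - 1, static_cast<int>(maxX) / tileSize);
        int ty0 = std::max(0, static_cast<int>(minY) / tileSize);
        int ty1 = std::min(bins.tilesY - 1, static_cast<int>(maxY) / tileSize);
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx) {
                int tile = ty * bins.tilesX + tx;
                // On the GPU this slot would be claimed with atomicAdd on counts[tile].
                if (bins.counts[tile] < capacity)
                    bins.triIds[tile * capacity + bins.counts[tile]++] = id;
                else
                    ++bins.dropped;
            }
    }
    return bins;
}
```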

| Tile Size 4 | Tile Size 8 |
| --- | --- |
| ![](renders/ts4_occupancy.png) | ![](renders/ts8_occupancy.png) |

| Tile Size 16 | Tile Size 32 |
| --- | --- |
| ![](renders/ts16_occupancy.png) | ![](renders/ts32_occupancy.png) |
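
The occupancy argument can be checked against a simplified version of the CUDA occupancy calculation. This sketch assumes the per-SM limits of a compute capability 3.0 (Kepler) device, which we believe the GT 750M is: 16 resident blocks, 64 resident warps and 65,536 registers, with registers allocated per warp in units of 256. Shared memory and other limits are ignored, and `SmLimits`, `occupancy` and `idleLaneFraction` are hypothetical names. With 63 registers per thread, the model gives 25% for 4x4 tiles and 50% for 8x8, in line with the NSight figures quoted above, and also 50% for 16x16 and 32x32, where the register file rather than the block limit becomes the bottleneck.

```cpp
#include <algorithm>
#include <cassert>

// Assumed per-SM limits for compute capability 3.0 (Kepler GK10x).
struct SmLimits {
    int maxBlocks = 16;      // resident blocks per SM
    int maxWarps = 64;       // resident warps per SM (2048 threads)
    int registers = 65536;   // 32-bit registers per SM
    int regAllocUnit = 256;  // per-warp register allocation granularity
    int warpSize = 32;
};

// Theoretical occupancy (active warps / maximum warps) of a kernel launched
// with tileSize x tileSize threads per block. Shared memory is ignored.
double occupancy(int tileSize, int regsPerThread, const SmLimits& sm = SmLimits()) {
    assert(tileSize > 0 && regsPerThread > 0);
    int threads = tileSize * tileSize;
    int warpsPerBlock = (threads + sm.warpSize - 1) / sm.warpSize;  // a partial warp costs a full warp
    int regsPerWarp = (regsPerThread * sm.warpSize + sm.regAllocUnit - 1) /
                      sm.regAllocUnit * sm.regAllocUnit;
    int byWarps = sm.maxWarps / warpsPerBlock;
    int byRegs = sm.registers / (regsPerWarp * warpsPerBlock);
    int blocks = std::min({sm.maxBlocks, byWarps, byRegs});
    return static_cast<double>(blocks * warpsPerBlock) / sm.maxWarps;
}

// Fraction of lanes in the block's warps that have no thread (16 of 32 for 4x4).
double idleLaneFraction(int tileSize, int warpSize = 32) {
    int threads = tileSize * tileSize;
    int warps = (threads + warpSize - 1) / warpSize;
    return 1.0 - static_cast<double>(threads) / (warps * warpSize);
}
```

For example, `occupancy(4, 63)` is limited by the 16-block cap with only one (half-empty) warp per block, while `occupancy(8, 63)` reaches the same block cap with two full warps per block.
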

* Bilinear interpolation vs. nearest neighbor
  * Based on my experiments, bilinear texture filtering has a small performance cost but improves texture quality, especially when the object is close to the camera.
    ![](renders/interp_perf.png)

| Bilinear | Nearest Neighbor |
| --- | --- |
| ![](renders/bilerp.png) | ![](renders/nn.png) |
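
The difference between the two filters can be sketched on a single-channel texture. Nearest neighbor reads one texel, while bilinear filtering reads the four texels around the sample point and blends them by fractional distance, which is where the small extra cost comes from. This is an illustrative host C++ sketch: the `Texture` struct and its clamp-to-edge addressing are assumptions (the checkerboard glTF in this PR actually requests repeat wrapping, `10497`), and texel centers are assumed to sit at half-integer coordinates.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Single-channel texture with clamp-to-edge addressing (illustrative only).
struct Texture {
    int width, height;
    std::vector<float> texels;  // row-major, width * height values
    float at(int x, int y) const {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return texels[y * width + x];
    }
};

// Nearest neighbor: one fetch, the texel containing (u, v).
float sampleNearest(const Texture& t, float u, float v) {
    return t.at(static_cast<int>(std::floor(u * t.width)),
                static_cast<int>(std::floor(v * t.height)));
}

// Bilinear: four fetches blended by the distance to the surrounding texel centers.
float sampleBilinear(const Texture& t, float u, float v) {
    assert(!t.texels.empty());
    float fx = u * t.width - 0.5f;  // texel centers sit at (i + 0.5) / width
    float fy = v * t.height - 0.5f;
    int x0 = static_cast<int>(std::floor(fx));
    int y0 = static_cast<int>(std::floor(fy));
    float ax = fx - x0, ay = fy - y0;  // blend weights in [0, 1)
    float top = t.at(x0, y0) * (1.0f - ax) + t.at(x0 + 1, y0) * ax;
    float bottom = t.at(x0, y0 + 1) * (1.0f - ax) + t.at(x0 + 1, y0 + 1) * ax;
    return top * (1.0f - ay) + bottom * ay;
}
```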

* Perspective correctness
  * Since the barycentric coordinates of a triangle are computed in screen space, we cannot use them directly to interpolate vertex attributes linearly. The reason is that perspective projection does not preserve distances (parallel lines converge to a single point after projection), so equal steps in screen space do not correspond to equal steps across the triangle in eye space. Fortunately, it is well known that linearly interpolating the reciprocals of the vertex depths (in eye space) and inverting the result gives the perspective-correct depth. For other vertex attributes, we linearly interpolate each attribute divided by its vertex's depth and multiply the result by the correctly interpolated depth. This gives perspective-correct interpolation of vertex attributes (the effect is most obvious on large textured triangles).

| With Perspective Correction | Without Perspective Correction |
| --- | --- |
| ![](renders/perspective_correct.png) | ![](renders/perspective_incorrect.png) |
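
The interpolation rule above can be written out for one scalar attribute. Here `bary` holds the screen-space barycentric weights and `z` the positive eye-space depths of the three vertices; the function linearly interpolates `1/z` and `attr/z`, then recovers the depth and the attribute. This is a hedged host C++ sketch with hypothetical names, not the kernel code from this PR.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3f = std::array<float, 3>;

// Perspective-correct interpolation of one scalar attribute.
//   bary : screen-space barycentric weights of the fragment (summing to 1)
//   z    : positive eye-space depths of the three vertices
//   attr : attribute value at each vertex (e.g. one texture coordinate)
// Optionally returns the perspective-correct eye-space depth via depthOut.
float perspectiveCorrect(const Vec3f& bary, const Vec3f& z, const Vec3f& attr,
                         float* depthOut = nullptr) {
    float invZ = 0.0f;       // 1/z varies linearly in screen space
    float attrOverZ = 0.0f;  // and so does attr/z
    for (int i = 0; i < 3; ++i) {
        assert(z[i] > 0.0f);
        invZ += bary[i] / z[i];
        attrOverZ += bary[i] * attr[i] / z[i];
    }
    float depth = 1.0f / invZ;
    if (depthOut) *depthOut = depth;
    return attrOverZ * depth;  // undo the division by z
}

// Naive screen-space interpolation, which distorts textures on large triangles.
float affineInterpolate(const Vec3f& bary, const Vec3f& attr) {
    return bary[0] * attr[0] + bary[1] * attr[1] + bary[2] * attr[2];
}
```

For example, on an edge from a vertex at depth 1 (attribute 0) to a vertex at depth 3 (attribute 1), the screen-space midpoint gives depth 1.5 and attribute 0.25, whereas naive screen-space interpolation gives 0.5. When all three depths are equal, the two functions agree.
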


### Credits
Binary file modified external/lib/win/glfw3.lib
Binary file not shown.
Binary file modified external/lib/win/glfw3dll.lib
Binary file not shown.
301 changes: 301 additions & 0 deletions gltfs/checkerboard/checkerboard.gltf
@@ -0,0 +1,301 @@
{
"accessors": {
"accessor_index_0": {
"bufferView": "bufferView_1",
"byteOffset": 0,
"byteStride": 0,
"componentType": 5123,
"count": 6,
"type": "SCALAR",
"min": [
0
],
"max": [
3
]
},
"accessor_position": {
"bufferView": "bufferView_0",
"byteOffset": 0,
"byteStride": 0,
"componentType": 5126,
"count": 4,
"min": [
-1,
0,
-1
],
"max": [
1,
0,
1
],
"type": "VEC3"
},
"accessor_uv": {
"bufferView": "bufferView_0",
"byteOffset": 48,
"byteStride": 0,
"componentType": 5126,
"count": 4,
"type": "VEC2",
"min": [
0.00009999999747378752,
0.00009999999747378752
],
"max": [
0.9998999834060669,
0.9998999834060669
]
},
"accessor_normal_generated": {
"bufferView": "bufferView_0",
"byteOffset": 80,
"byteStride": 0,
"componentType": 5126,
"count": 4,
"type": "VEC3",
"min": [
0,
1,
0
],
"max": [
0,
1,
0
]
}
},
"asset": {
"generator": "OBJ2GLTF",
"premultipliedAlpha": true,
"profile": {
"api": "WebGL",
"version": "1.0"
},
"version": "1.0"
},
"buffers": {
"buffer_0": {
"type": "arraybuffer",
"byteLength": 140,
"uri": "data:application/octet-stream;base64,AACAvwAAAAAAAIA/AACAPwAAAAAAAIA/AACAPwAAAAAAAIC/AACAvwAAAAAAAIC/F7fROHL5fz9y+X8/cvl/P3L5fz8Xt9E4F7fROBe30TgAAAAAAACAPwAAAAAAAAAAAACAPwAAAAAAAAAAAACAPwAAAAAAAAAAAACAPwAAAAAAAAEAAgAAAAIAAwA="
}
},
"bufferViews": {
"bufferView_0": {
"buffer": "buffer_0",
"byteLength": 128,
"byteOffset": 0,
"target": 34962
},
"bufferView_1": {
"buffer": "buffer_0",
"byteLength": 12,
"byteOffset": 128,
"target": 34963
}
},
"images": {
"Checkered": {
"uri": ""
}
},
"materials": {
"material_Material.001": {
"name": "Material.001",
"extensions": {},
"values": {
"ambient": [
0,
0,
0,
1
],
"diffuse": "texture_Checkered",
"emission": [
0,
0,
0,
1
],
"specular": [
0.5,
0.5,
0.5,
1
],
"shininess": 96.078431,
"transparency": 1
},
"technique": "technique0"
}
},
"meshes": {
"mesh_checkerboard": {
"name": "checkerboard",
"primitives": [
{
"attributes": {
"POSITION": "accessor_position",
"TEXCOORD_0": "accessor_uv",
"NORMAL": "accessor_normal_generated"
},
"indices": "accessor_index_0",
"material": "material_Material.001",
"mode": 4
}
]
}
},
"nodes": {
"rootNode": {
"children": [],
"meshes": [
"mesh_checkerboard"
],
"matrix": [
1,
0,
0,
0,
0,
1,
0,
0,
0,
0,
1,
0,
0,
0,
0,
1
]
}
},
"samplers": {
"sampler_0": {
"magFilter": 9729,
"minFilter": 9986,
"wrapS": 10497,
"wrapT": 10497
}
},
"scene": "scene_checkerboard",
"scenes": {
"scene_checkerboard": {
"nodes": [
"rootNode"
]
}
},
"textures": {
"texture_Checkered": {
"format": 6407,
"internalFormat": 6407,
"sampler": "sampler_0",
"source": "Checkered",
"target": 3553,
"type": 5121
}
},
"extensionsUsed": [],
"animations": {},
"cameras": {},
"techniques": {
"technique0": {
"attributes": {
"a_position": "position",
"a_normal": "normal",
"a_texcoord_0": "texcoord_0"
},
"parameters": {
"modelViewMatrix": {
"semantic": "MODELVIEW",
"type": 35676
},
"projectionMatrix": {
"semantic": "PROJECTION",
"type": 35676
},
"normalMatrix": {
"semantic": "MODELVIEWINVERSETRANSPOSE",
"type": 35675
},
"ambient": {
"type": 35666
},
"diffuse": {
"type": 35678
},
"emission": {
"type": 35666
},
"specular": {
"type": 35666
},
"shininess": {
"type": 5126
},
"transparency": {
"type": 5126
},
"position": {
"semantic": "POSITION",
"type": 35665
},
"normal": {
"semantic": "NORMAL",
"type": 35665
},
"texcoord_0": {
"semantic": "TEXCOORD_0",
"type": 35664
}
},
"program": "program0",
"states": {
"enable": [
2884,
2929
]
},
"uniforms": {
"u_modelViewMatrix": "modelViewMatrix",
"u_projectionMatrix": "projectionMatrix",
"u_normalMatrix": "normalMatrix",
"u_ambient": "ambient",
"u_diffuse": "diffuse",
"u_emission": "emission",
"u_specular": "specular",
"u_shininess": "shininess",
"u_transparency": "transparency"
}
}
},
"programs": {
"program0": {
"attributes": [
"a_position",
"a_normal",
"a_texcoord_0"
],
"fragmentShader": "fragmentShader0",
"vertexShader": "vertexShader0"
}
},
"shaders": {
"vertexShader0": {
"type": 35633,
"uri": "data:text/plain;base64,cHJlY2lzaW9uIGhpZ2hwIGZsb2F0Owp1bmlmb3JtIG1hdDQgdV9tb2RlbFZpZXdNYXRyaXg7CnVuaWZvcm0gbWF0NCB1X3Byb2plY3Rpb25NYXRyaXg7CnVuaWZvcm0gbWF0MyB1X25vcm1hbE1hdHJpeDsKYXR0cmlidXRlIHZlYzMgYV9wb3NpdGlvbjsKdmFyeWluZyB2ZWMzIHZfcG9zaXRpb25FQzsKYXR0cmlidXRlIHZlYzMgYV9ub3JtYWw7CnZhcnlpbmcgdmVjMyB2X25vcm1hbDsKYXR0cmlidXRlIHZlYzIgYV90ZXhjb29yZF8wOwp2YXJ5aW5nIHZlYzIgdl90ZXhjb29yZF8wOwp2b2lkIG1haW4odm9pZCkgewogIHZlYzQgcG9zID0gdV9tb2RlbFZpZXdNYXRyaXggKiB2ZWM0KGFfcG9zaXRpb24sMS4wKTsKICB2X3Bvc2l0aW9uRUMgPSBwb3MueHl6OwogIGdsX1Bvc2l0aW9uID0gdV9wcm9qZWN0aW9uTWF0cml4ICogcG9zOwogIHZfbm9ybWFsID0gdV9ub3JtYWxNYXRyaXggKiBhX25vcm1hbDsKICB2X3RleGNvb3JkXzAgPSBhX3RleGNvb3JkXzA7Cn0K"
},
"fragmentShader0": {
"type": 35632,
"uri": "data:text/plain;base64,cHJlY2lzaW9uIGhpZ2hwIGZsb2F0Owp1bmlmb3JtIHZlYzQgdV9hbWJpZW50Owp1bmlmb3JtIHNhbXBsZXIyRCB1X2RpZmZ1c2U7CnVuaWZvcm0gdmVjNCB1X2VtaXNzaW9uOwp1bmlmb3JtIHZlYzQgdV9zcGVjdWxhcjsKdW5pZm9ybSBmbG9hdCB1X3NoaW5pbmVzczsKdW5pZm9ybSBmbG9hdCB1X3RyYW5zcGFyZW5jeTsKdmFyeWluZyB2ZWMzIHZfcG9zaXRpb25FQzsKdmFyeWluZyB2ZWMzIHZfbm9ybWFsOwp2YXJ5aW5nIHZlYzIgdl90ZXhjb29yZF8wOwp2b2lkIG1haW4odm9pZCkgewogIHZlYzMgbm9ybWFsID0gbm9ybWFsaXplKHZfbm9ybWFsKTsKICB2ZWM0IGRpZmZ1c2UgPSB0ZXh0dXJlMkQodV9kaWZmdXNlLCB2X3RleGNvb3JkXzApOwogIHZlYzMgZGlmZnVzZUxpZ2h0ID0gdmVjMygwLjAsIDAuMCwgMC4wKTsKICB2ZWMzIHNwZWN1bGFyID0gdV9zcGVjdWxhci5yZ2I7CiAgdmVjMyBzcGVjdWxhckxpZ2h0ID0gdmVjMygwLjAsIDAuMCwgMC4wKTsKICB2ZWMzIGVtaXNzaW9uID0gdV9lbWlzc2lvbi5yZ2I7CiAgdmVjMyBhbWJpZW50ID0gdV9hbWJpZW50LnJnYjsKICB2ZWMzIHZpZXdEaXIgPSAtbm9ybWFsaXplKHZfcG9zaXRpb25FQyk7CiAgdmVjMyBhbWJpZW50TGlnaHQgPSB2ZWMzKDAuMCwgMC4wLCAwLjApOwogIGFtYmllbnRMaWdodCArPSB2ZWMzKDAuMiwgMC4yLCAwLjIpOwogIHZlYzMgbCA9IHZlYzMoMC4wLCAwLjAsIDEuMCk7CiAgZGlmZnVzZUxpZ2h0ICs9IHZlYzMoMS4wLCAxLjAsIDEuMCkgKiBtYXgoZG90KG5vcm1hbCxsKSwgMC4pOwogIHZlYzMgaCA9IG5vcm1hbGl6ZShsICsgdmlld0Rpcik7CiAgZmxvYXQgc3BlY3VsYXJJbnRlbnNpdHkgPSBtYXgoMC4sIHBvdyhtYXgoZG90KG5vcm1hbCwgaCksIDAuKSwgdV9zaGluaW5lc3MpKTsKICBzcGVjdWxhckxpZ2h0ICs9IHZlYzMoMS4wLCAxLjAsIDEuMCkgKiBzcGVjdWxhckludGVuc2l0eTsKICB2ZWMzIGNvbG9yID0gdmVjMygwLjAsIDAuMCwgMC4wKTsKICBjb2xvciArPSBkaWZmdXNlLnJnYiAqIGRpZmZ1c2VMaWdodDsKICBjb2xvciArPSBzcGVjdWxhciAqIHNwZWN1bGFyTGlnaHQ7CiAgY29sb3IgKz0gZW1pc3Npb247CiAgY29sb3IgKz0gYW1iaWVudCAqIGFtYmllbnRMaWdodDsKICBnbF9GcmFnQ29sb3IgPSB2ZWM0KGNvbG9yICogZGlmZnVzZS5hLCBkaWZmdXNlLmEgKiB1X3RyYW5zcGFyZW5jeSk7Cn0K"
}
},
"skins": {},
"extensions": {}
}
Binary file modified gltfs/flower/flower.gltf
Binary file not shown.
Binary file added renders/bilerp.png
Binary file added renders/demo.gif
Binary file added renders/demo_checker.gif
Binary file added renders/demo_duck.gif
Binary file added renders/demo_engine.gif
Binary file added renders/demo_truck.gif
Binary file added renders/interp_perf.png
Binary file added renders/nn.png
Binary file added renders/occupancy_graph.png
Binary file added renders/perspective_correct.png
Binary file added renders/perspective_incorrect.png
Binary file added renders/proj4_perf_data.xlsx
Binary file not shown.
Binary file added renders/ras_perf.png
Binary file added renders/ssao_demo1.png
Binary file added renders/ssao_demo2.png
Binary file added renders/ssao_di.png
Binary file added renders/ssao_engine.png
Binary file added renders/ssao_perf.png
Binary file added renders/ssao_perf_data.xlsx
Binary file not shown.
Binary file added renders/ts16_occupancy.png
Binary file added renders/ts32_occupancy.png
Binary file added renders/ts4_occupancy.png
Binary file added renders/ts8_occupancy.png
Binary file added renders/ts_perf.png
2 changes: 1 addition & 1 deletion src/CMakeLists.txt
@@ -6,5 +6,5 @@ set(SOURCE_FILES

cuda_add_library(src
${SOURCE_FILES}
OPTIONS -arch=sm_20
OPTIONS -arch=sm_61
)