by Trung Le (website) and David Grosman.
===============
Video demo (click on image) and Presentation
Deferred rendering has gained major popularity in real-time rendering. One of its advantages is that it reduces the complexity of the rendering algorithm from O(numLights*numObjects) to O(numLights + numObjects) by rendering a scene in two passes: it first renders the scene geometry into a G-Buffer, then uses that G-Buffer to calculate the scene lighting in a second pass. It is also easier to maintain, since the lighting stage is entirely decoupled from the geometry stage. Unfortunately, deferred rendering is not the best solution for all cases:
- The G-Buffer can require a lot of memory, especially at higher screen resolutions. Furthermore, material characteristics are stored in separate render targets: the more diverse the materials, the more render targets are needed, which might hit a hardware limit (hence the introduction of the light pre-pass renderer).
- There is no direct support for refractive and reflective materials because the G-Buffer can only retain information for one 'layer'.
- Shadows are typically computed using shadow mapping. This can introduce inaccurate shadows due to aliasing from low-resolution shadow maps. Cascaded shadow mapping can mitigate these issues, but requires more memory for each cascade level.
To address the above issues, we implemented a hybrid raytracer-rasterizer using the explicit graphics API [Vulkan][Vulkan], with the goal of rendering transparent objects for games. This technique is currently used by the PowerVR Raytracing GPU family for real-time applications. Our version is more lightweight and handles less detailed geometry.
The basic concept is to first use rasterization through a deferred-renderer to capture all objects in our scene and then apply a full-screen ray tracing pass by tracing rays initialized from the G-buffer information.
Image taken from Practical techniques for ray-tracing in games
Only three layers are needed in the G-buffer to detect the first bounce: position, normal, and material ID. Unlike a traditional raytracer, in our version the first ray-triangle intersection has already been computed by the G-buffer pass. For comparison, for an 800x800 image of a 1000-triangle scene, casting one ray per pixel and testing every triangle, the first bounce alone would cost:
800 x 800 x 1000 ray-triangle intersections = 640,000,000 ray-triangle intersections
With the help of deferred shading, we eliminate this first-bounce cost entirely and replace it with the cost of rasterization and fragment shading, which is done in hardware.
With raytracing, we are now able to implement shadows and refractive or reflective materials. For this project, we are not implementing PBR materials.
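The arithmetic above can be checked with a small sketch. The function name `firstBounceTests` is hypothetical and not part of our code base; it only models the brute-force case where every primary ray is tested against every triangle, with no acceleration structure.

```cpp
#include <cstdint>

// Brute-force first-bounce cost: each primary ray tests every triangle once.
// Hypothetical helper for illustration only.
constexpr std::uint64_t firstBounceTests(std::uint64_t width,
                                         std::uint64_t height,
                                         std::uint64_t triangles,
                                         std::uint64_t raysPerPixel = 1) {
    return width * height * raysPerPixel * triangles;
}
```

Calling `firstBounceTests(800, 800, 1000)` reproduces the 640,000,000 figure. This is the work that the G-buffer pass replaces.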
Our rendering pipeline is broken down into three stages: deferred, raytracing, and on screen output.
As you can see in the image above, the CPU first computes the BVH structure of our scene. The deferred pass is rendered off-screen and passes the G-buffer to the compute shader, which performs the raytracing. The result is then read from the compute shader's image texture and rendered on-screen.
Vulkan is a great graphics API. It is quite verbose and requires careful planning and code structuring, since everything is recorded before it runs. This project gave us some working experience with building a Vulkan rendering framework.
Our application's overview from the top-down:
We use axis-aligned boxes as bounding volumes over our scene geometry since they are easy to compute, need only a few bytes of storage, and there are already many Efficient and Robust Ray–Box Intersection Algorithms. We build our BVH trees using a top-down approach, where we partition the input set into two subsets along the axis of biggest extent. Our BVH tree construction takes two input parameters: 1) the maximum number of triangles per leaf node and 2) the maximum depth of the tree.
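The top-down construction described above can be sketched roughly as follows. This is a minimal illustration, not our actual builder: the type and function names (`Aabb`, `BvhNode`, `buildBvh`) are hypothetical, and the median split on triangle centroids is an assumption, since the text only fixes the split axis and the two stopping parameters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float v[3]; };
struct Triangle { Vec3 p[3]; };

struct Aabb {
    float lo[3] = { 1e30f,  1e30f,  1e30f};
    float hi[3] = {-1e30f, -1e30f, -1e30f};
    void grow(const Vec3& q) {
        for (int a = 0; a < 3; ++a) {
            lo[a] = std::min(lo[a], q.v[a]);
            hi[a] = std::max(hi[a], q.v[a]);
        }
    }
};

struct BvhNode {
    Aabb box;
    int left = -1, right = -1;         // child indices, -1 for a leaf
    std::size_t first = 0, count = 0;  // triangle range, set on leaves only
};

static float centroid(const Triangle& t, int axis) {
    return (t.p[0].v[axis] + t.p[1].v[axis] + t.p[2].v[axis]) / 3.0f;
}

// Builds the subtree for tris[first, first + count) and returns its root index.
int buildBvh(std::vector<Triangle>& tris, std::vector<BvhNode>& nodes,
             std::size_t first, std::size_t count,
             std::size_t maxTrisPerLeaf, int maxDepth, int depth = 0) {
    assert(count > 0 && maxTrisPerLeaf > 0);
    const int self = static_cast<int>(nodes.size());
    nodes.emplace_back();

    Aabb box;
    for (std::size_t i = first; i < first + count; ++i)
        for (const Vec3& q : tris[i].p) box.grow(q);
    nodes[self].box = box;

    // Stop when the leaf is small enough or the tree is deep enough.
    if (count <= maxTrisPerLeaf || depth >= maxDepth) {
        nodes[self].first = first;
        nodes[self].count = count;
        return self;
    }

    // Pick the axis along which the bounding box is largest.
    int axis = 0;
    for (int a = 1; a < 3; ++a)
        if (box.hi[a] - box.lo[a] > box.hi[axis] - box.lo[axis]) axis = a;

    // Assumed split rule: median of the triangle centroids on that axis.
    const std::size_t half = count / 2;
    auto begin = tris.begin() + static_cast<std::ptrdiff_t>(first);
    std::nth_element(begin, begin + static_cast<std::ptrdiff_t>(half),
                     begin + static_cast<std::ptrdiff_t>(count),
                     [axis](const Triangle& a, const Triangle& b) {
                         return centroid(a, axis) < centroid(b, axis);
                     });

    const int l = buildBvh(tris, nodes, first, half,
                           maxTrisPerLeaf, maxDepth, depth + 1);
    const int r = buildBvh(tris, nodes, first + half, count - half,
                           maxTrisPerLeaf, maxDepth, depth + 1);
    nodes[self].left = l;   // index again: the vector may have reallocated
    nodes[self].right = r;
    return self;
}
```

A median split keeps the tree balanced, which makes the effect of the maximum-depth parameter predictable. Other split rules, such as the spatial midpoint, would fit the same description.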
Note that we have a BVH tree per scene mesh (allowing for better/faster tree construction/traversal) and each tree is composed of a set of BVH Nodes which are encoded as two vec4s in our Vulkan’s raytracing compute shader (.w components are used as indices to right/left child node). Note that Shadows are drawn up to 6 times faster when using our BVH optimization.
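One way to picture the two-vec4 node layout is the sketch below. Which `.w` holds the left child and which holds the right, and the choice to store the indices as float values, are assumptions made for illustration. The names `PackedBvhNode` and `packNode` are hypothetical, and the sketch does not cover how leaves reference their triangles.

```cpp
#include <cassert>

struct Vec4 { float x, y, z, w; };

// Two vec4s per node: xyz hold the box corners, w holds a child index.
struct PackedBvhNode {
    Vec4 minLeft;   // xyz = box minimum, w = left child index (assumed)
    Vec4 maxRight;  // xyz = box maximum, w = right child index (assumed)
};
static_assert(sizeof(PackedBvhNode) == 32, "two tightly packed vec4s");

PackedBvhNode packNode(const float lo[3], const float hi[3],
                       int left, int right) {
    // A 32-bit float represents integers exactly only up to 2^24.
    assert(left < (1 << 24) && right < (1 << 24));
    return { {lo[0], lo[1], lo[2], static_cast<float>(left)},
             {hi[0], hi[1], hi[2], static_cast<float>(right)} };
}

int leftChild(const PackedBvhNode& n)  { return static_cast<int>(n.minLeft.w); }
int rightChild(const PackedBvhNode& n) { return static_cast<int>(n.maxRight.w); }
```

Packing the indices into the otherwise unused `.w` components keeps each node at 32 bytes, so no padding is needed when the node array is read as vec4 pairs on the GPU.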
We used Möller's fast triangle intersection test (the Möller-Trumbore algorithm), which skips computing the plane equation.
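For reference, a CPU-side sketch of the Möller-Trumbore test is shown below. It is a standard textbook formulation, not a copy of our compute shader. The helper names and the epsilon value are assumptions.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Returns true on a hit in front of the ray origin and writes the distance to t.
bool intersectTriangle(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2, float& t) {
    const float kEps = 1e-7f;  // assumed tolerance
    const Vec3 e1 = sub(v1, v0);
    const Vec3 e2 = sub(v2, v0);
    const Vec3 p = cross(dir, e2);
    const float det = dot(e1, p);
    if (std::fabs(det) < kEps) return false;  // ray is parallel to the triangle

    const float inv = 1.0f / det;
    const Vec3 s = sub(orig, v0);
    const float u = dot(s, p) * inv;          // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return false;

    const Vec3 q = cross(s, e1);
    const float v = dot(dir, q) * inv;        // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return false;

    t = dot(e2, q) * inv;                     // distance along the ray
    return t > kEps;
}
```

The test works directly in barycentric coordinates, so the triangle's plane equation never has to be stored or evaluated.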
We only computed shadows for ground-level surfaces to avoid unnecessarily checking surfaces that are high up and unlikely to be in shadows for our test scene. We think this is a reasonable approach, since it's a common practice in game design.
- Material ID is passed into the shader first as the fourth element of indices
- Instead of having an additional G-buffer for the material ID, we store it in the w component of the position layer. This value is normalized by the material count of the entire scene.
- Pass the triangles into the raytracing compute shader as a triangle soup. Since the scene can be quite big, the triangle soup helps reduce the number of vertices we have to pass to our shaders.
- Use uint16_t indices for binding index buffer to the pipeline.
- If geometry's normal == vec3(0), don't raytrace
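The material ID packing from the list above could look roughly like this. The exact normalization (dividing by the material count rather than by count minus one) and the rounding on decode are assumptions, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Value written to position.w in the G-buffer pass.
float encodeMaterialId(int id, int materialCount) {
    assert(materialCount > 0 && id >= 0 && id < materialCount);
    return static_cast<float>(id) / static_cast<float>(materialCount);
}

// Recovers the integer ID in the raytracing pass.
// Rounding guards against small precision losses in the render target.
int decodeMaterialId(float w, int materialCount) {
    assert(materialCount > 0);
    return static_cast<int>(std::lround(w * static_cast<float>(materialCount)));
}
```

With the 7 materials of our test scene, every ID from 0 to 6 survives the round trip through `encodeMaterialId` and `decodeMaterialId`.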
1. Deferred shading layers
This debugging tool can be enabled by pressing the 'F' key.
From top-left, to bottom-right: Position, World Normal, Albedo G-Buffers and Final Image.
2. BVH visualization
This debugging tool can be enabled by pressing the 'G' key.
3. Color by ray bounces
This debugging tool can be enabled by pressing the 'C' key.
4. Toggleable effects
All effects in our renderer can be toggled on and off using the following keys:
- 'F': toggle G-buffer viewing
- 'G': toggle BVH visualization
- 'B': toggle BVH
- 'Y': toggle shadows
- 'T': toggle refraction
- 'R': toggle reflection
- 'L': add more lights
- 'C': toggle coloring by number of ray bounces
The bottleneck of the pipeline is in the ray tracing pass. This has been traditionally quite slow.
To test our performance, we 1) varied the number of moving lights, 2) zoomed the camera in to cover more pixels, and 3) toggled shadows, refraction, and the BVH optimization on and off. Our scene configuration is:
- Image size: 800x800
- Compute shader work groups: 16x16
- 5086 triangles and 15258 vertices
- 7 materials: 3 refractive surfaces and 4 diffuse surfaces
- 3 refractive spheres and 7 diffuse objects
- Framerate is capped at 60FPS for all tests
- Tested on Microsoft Windows 10 Home, Microsoft Visual Studio 2015, target x64, i7-4790 CPU @ 3.60GHz 12GB, GTX 980 Ti
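As a rough illustration of what this configuration means for the compute pass, and assuming that "16x16" refers to the local work group size, the number of work groups per axis passed to `vkCmdDispatch` follows from a ceiling division. The helper name `groupCount` is hypothetical.

```cpp
#include <cstdint>

// Number of work groups needed to cover `pixels` along one axis,
// rounding up so that partial groups at the edge are still dispatched.
constexpr std::uint32_t groupCount(std::uint32_t pixels, std::uint32_t localSize) {
    return (pixels + localSize - 1) / localSize;
}
```

For the 800x800 image this gives 50 groups per axis, i.e. a 50x50 dispatch of 256 invocations each, one invocation per pixel.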
1. Far scene. Camera is at -30.0f Z unit away
Scene | Analysis |
---|---|
- Refraction: Interestingly, refraction doesn't seem to affect performance. We were able to maintain the same frame rate throughout.
- Shadow: this effect takes a big hit. The frame rate dropped below ~15 FPS as soon as shadows were turned on. With the aid of the BVH, we were able to gain back ~10 FPS. The choice and configuration of the acceleration data structure play an important role here. This shows that our renderer isn't quite ready for real-time applications yet.
2. Close scene. Camera is at -15.0f unit away
Scene | Analysis |
---|---|
- Refraction: Up close, more weaknesses of hybrid rendering can be seen. Refraction now causes a significant drop to 20 FPS. However, the frame rate stays the same across varying numbers of light sources. This is because refraction is affected by material types, not by the number of lights in the scene.
- Shadow: Again, shadows aren't in good shape. We need more improvements to increase our frame rate.
A lot of the potential improvement lies in an efficient acceleration data structure and an efficient memory access pattern.
Similarly, we compared the same scene with our raytracing-only renderer, but the frame rate was consistently at 1 FPS, so we concluded that a hybrid renderer is in fact faster than traditional raytracing.
We varied the dimension of the compute shader invocations, but that didn't affect performance.
The project involved a great deal of software engineering in terms of developing a Vulkan graphics engine and collaborating as a team. It was also a great opportunity for us to explore the possibility of using raytracing in real-time applications. Even though we were not able to achieve a frame rate similar to the PowerVR raytracing demo, we gained a great deal of experience.
Refraction gone wrong.
This is just bad attribute stride.
This is when a vertex attribute wasn't initialized correctly.
- Our project uses CMake to build. It requires a Vulkan-capable graphics card, Visual Studio 2013, and an x64 target platform.
- Tested on:
- Microsoft Windows 10 Home, i7-4790 CPU @ 3.60GHz 12GB, GTX 980 Ti (Desktop).
- Microsoft Windows 7 Professional, i7-5600U @ 2.6GHz, 256GB, GeForce 840M (Laptop).
We would like to thank:
- Sascha Willems and Alexander Overvoorde, who greatly inspired our initial code base. Big thanks for their incredible effort to bring Vulkan to the community!
- Morgan McGuire, Gareth Morgan, and Jesper Mortensen for giving us the idea of implementing a hybrid renderer.
- GDCVault14: Practical techniques for ray-tracing in games
- [Vulkan, Industry Forged](https://www.khronos.org/vulkan/)
- Practical techniques for ray-tracing in games
- Imagination PowerVR 6XT GR6500 mobile GPU - Ray Tracing demos vs Nvidia Geforce GTX 980 Ti
- Asynchronous Compute in DX12 & Vulkan: Dispelling Myths & Misconceptions Concurrently
- Doom benchmarks return: Vulkan vs. OpenGL
- Rise of the Tomb Raider async compute update boosts performance on AMD hardware