diff --git a/README.md b/README.md index cad1abd..9bd5e65 100644 --- a/README.md +++ b/README.md @@ -1,20 +1,157 @@ CUDA Rasterizer =============== -[CLICK ME FOR INSTRUCTION OF THIS PROJECT](./INSTRUCTION.md) - **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4** -* (TODO) YOUR NAME HERE -* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab) - -### (TODO: Your README) - -*DO NOT* leave the README to the last minute! It is a crucial part of the -project, and we will not be able to grade you without a good README. +* Rony Edde (redde) +* Tested on: Windows 10, i7-6700k @ 4.00GHz 64GB, GTX 980M 8GB (Personal Laptop) +This is a rasterizer running on the GPU using CUDA. +* ![rasterizer](./renders/rasterizer.gif) ### Credits - * [tinygltfloader](https://github.com/syoyo/tinygltfloader) by [@soyoyo](https://github.com/syoyo) * [glTF Sample Models](https://github.com/KhronosGroup/glTF/blob/master/sampleModels/README.md) + +* Features + * Support for polygonal geometry. + * Texture mapping with perspective correction. + * Shaded polygon, line and point rendering. + * Correct attribute interpolation. + * Gouraud and specular shading. + * MSAA anti aliasing. + * SSAA super sampling anti aliasing. + * Backface culling with removal optimization. + * Depth sorting primitive ordering by depth. + * Axis aligned bounding box optimization. + +* Controls + * Esc to exit + * S to enable / disable specular. + * W to change display mode to polygons/lines/points. + * P to enable / disable perspective correction for textures. + * A to enable / disable antialiasing (MSAA or SSAA depending on mode). + * C to enable / disable backface culling. + * T to enable / disable performance testing. + * B to enable / disable axis aligned bounding box optimization. + * X to enable / disable cheap culling. This is faster but not advised as it can lead to artifacts in renders. 
+ * IMPORTANT NOTE: To enable supersampling, you must change line 22 in main.cpp to this and recompile: + * static bool supersample = true; + +* Polygonal geometry. + * We draw a first triangle to check that the projection, camera, and geometry + are all correct. + * ![firsttriangle](./renders/triangle_test.png) + Extending this to a more complex model: + * ![duck](./renders/image_0001.png) + * Now we can start looking up texture data and apply the texture using the model's UVs. + Reading the UV information and interpolating it with simple barycentric coordinates gives us textures. + However, these textures are not perspective correct due to the projection. + Initial result: + * ![texture_wrong](./renders/image_0006.png) + By dividing by z, we can recompute the correct UV values after projection. We now + have perspective correct textures: + * ![texture_wrong](./renders/image_0007.png) + + * Shaded polygon, line and point rendering. + Now that we have textures, we can apply shading models. We start with a simple Gouraud + shader which shades the surface based on the light angle. In this rendering stage, if we apply + the shading without textures we get the following result: + * ![gouraud](./renders/image_0002.png) + Multiplying by the texture color, we get this: + * ![gouraud_tex](./renders/image_0005.png) + The second stage is to get some specular highlights. We compute the specular term from the + incident vector from the camera and the light vector. Here's the final result with Gouraud shading, + specular and texture combined. + * ![gouraud_spec_tex](./renders/duck_spec.png) + * Correct attribute interpolation is essential for preserving surface normal continuity and texture + mapping. Every attribute is interpolated in order to preserve smooth transitions. + + * Render types. + We can also choose how to render our primitives. Instead of rendering the full polygon, we can choose + to render the edges or the points. We can also specify a width. 
The following renders use a width of + 0.04: + * ![rendermode](./renders/rendermode.gif) + + * MSAA anti aliasing. + In order to achieve MSAA, we must evaluate each pixel multiple times. Each time we compute the color at + a pixel counts as 1 sample. Multisampling means that we take multiple samples at the same pixel. + Each sample is shifted within the pixel area. We use 4 samples to achieve MSAA. Here are the results: + * ![msaa](./renders/msaa.gif) + * ![aa](./renders/aa.gif) ![msaa_tri](./renders/msaa_tri.gif) + + * SSAA anti aliasing. + An alternative to MSAA is to simply render the image with 4 times the number of pixels and downsample the final image. + A simple downsampling would consist of taking the closest pixel and assigning it to the lower resolution image. + This, however, would not take advantage of the additional rendered pixels, so in order to use them, + we enable antialiasing by taking the average of the 4 pixels in the high resolution image and assigning that value + to the target low resolution pixel. This is a costly approach but well worth it. Here's the result: + * ![supersample](./renders/supersample.gif) + The result is a much cleaner image than MSAA at the cost of performance. + + * Backface culling with removal optimization. + At this stage, rendering an image is starting to take a considerable hit in performance. Backface culling can + help reduce this impact by removing faces whose normals point away from the camera. By removing these + faces, we render fewer polygons and improve performance. Here's the result: + * ![backfaceculling](./renders/backfaceculling.gif) + + * Depth sorting. + Further optimization can be done by sorting the polygons by depth just before we compute the color. + This reduces the overhead of overlapping polygons and ensures that we walk the scene front to back, never hitting + a polygon that will have to be eliminated in the loop. 
We could even stop checking polygons once we hit + the first. This can improve performance but could lead to clipping when polygons intersect. Alternatively, we could limit + the intersecting polygons to a certain number and take advantage of that, but it is not always safe to do so, so this approach is not used. + + * Axis aligned bounding box optimization. + Performance is still an issue with the additional shading and post processing. Complex scenes are a bottleneck. + To mitigate this, we check each primitive's bounding box before we compute barycentric coordinates. This is a very + cheap computation since everything is in screen space and we only need to compare with the pixel coordinates. If the polygon's + bounding box contains the pixel we're rendering, then we compute the barycentric coordinates and check intersections. + This eliminates most geometry in the scene right away and gives a considerable boost in performance. + We can now load complex models that were previously problematic. + * ![ninja](./renders/image_0015.png) + * ![cow](./renders/image_0016.png) + * ![engine](./renders/engine.png) + + +### Analysis + * The analysis shows that the most expensive computation is the scanline function. This is to be expected since + this is where most geometric computations happen. + * The benchmarks were averaged over about 30 frames per test, and the functions analysed were the following: + * vertexTransformAndAssembly, referred to as vertexTandA. This is where the geometric data is assembled for pushing to the scanliner. + * scanline, referred to as scanline. This is where the triangle data is fed through every fragment and where the barycentric computations occur. This is also where the normals and other attributes are interpolated and textures assigned. + * anti-aliasing, referred to as aa. 
This is where MSAA 4x is computed. It is not enabled when supersampling is turned on, so it takes no resources when it is off or when supersampling is on. + * render, referred to as render, is where shading occurs. Gouraud and specular shading are computed and multiplied with the texture data, and the framebuffer is then assigned the final color. + * downsample, referred to as downsample, is where the image is downsampled. This is also where anti-aliasing is computed when supersampling is turned on. Not to be confused with the MSAA 4x computation and timing. + + * Results: + * The pie charts give a better overall estimate of the most costly functions, while the final bar chart shows a more elaborate comparison between the different optimizations. + * This is the optimized run without antialiasing. We can see that most functions take a very low percentage since the scanline function is not hindered by anti-aliasing or supersampling. This is using backface culling with zdepth ordering and axis aligned bounding box optimization. Note that aa and downsample take up 0% since they are not being called: + * ![pie0](./renders/piechart_0.png) + * Now with backface culling and zdepth sorting disabled and only AABB enabled, there is a small dip in performance that will be visible in the final graph. Looking at the chart, we see that more time is spent in the scanline function doing more work. This is because the resources previously taken by sorting and culling are freed up. Note that these resources had a minimal impact on performance. + * ![pie1](./renders/piechart_1.png) + * Now with backface culling enabled and AABB disabled, there is a bigger dip in performance. Again, most resources are spent in the scanline function doing more work. + * ![pie2](./renders/piechart_2.png) + * Disabling all optimizations shows that most of the time spent by the GPU is in the scanline function. 
This chart will not show the dip in performance but it clearly shows that the most expensive function is the scanline. + * ![pie3](./renders/piechart_3.png) + * Enabling MSAA 4x with all optimizations shows something interesting, albeit expected. Most of the resources are still in the scanline function but 24% of the resources used by the scanline function are now dedicated to the MSAA function. Sampling sub pixels 4 times is expected to reduce performance. + * ![pie4](./renders/piechart_4.png) + * Enabling MSAA 4x without any optimization was obviously going to slow things down a lot more. We do see a dip in performance, as the render function now takes a slightly smaller share of the resources. Going from 0.4% to 0.1% might not look like much, but this is a share of the total frame time, which means that the loss is considerable. Effectively we see a 4x decrease in render performance. + * ![pie5](./renders/piechart_5.png) + * SSAA is rendering a 4x supersample of the image and downsampling it by averaging the 4 nearest pixels. Here again we see something expected. Similar to the first 2 charts, we can see that almost all resources are spent in scanline due to the 4x resolution increase. + * ![pie6](./renders/piechart_6.png) + * SSAA without any optimization was not going to change much of the resource distribution but we can clearly see that almost all resources are dedicated to the scanline function at 99.5%. + * ![pie7](./renders/piechart_7.png) + + * Finally, for a better visualization of performance with function distribution, we lay out the data in box chart format. + * ![boxchart](./renders/boxchart.png) + * Now we can see the time taken with optimizations enabled: less than 50 milliseconds per frame, as opposed to over 900 milliseconds with supersampling and no optimizations. It's also interesting to note that MSAA uses slightly fewer resources than SSAA. + * Another point is that the axis aligned bounding box offers a considerable benefit for performance. Disabling it roughly doubles the render time. 
+ * Backface culling gave an almost identical improvement to AABB, albeit slightly less noticeable, but that would vary from scene to scene. These benchmarks were run on the same scene view and geometry. + * The combination of backface culling and AABB reduced render times from 220ms to less than 50ms, a more than 4x improvement. + * Another thing that catches the eye is the downsampling and antialiasing hit on performance. We can see the resources being used a lot more when MSAA is enabled. It's less obvious with SSAA: since the scanline function is rendering more than it would if SSAA were disabled, the only hit in performance is visible in the total render time. + + + + + diff --git a/gltfs/checkerboard/checkerboard.gltf b/gltfs/checkerboard/checkerboard.gltf new file mode 100644 index 0000000..a333738 --- /dev/null +++ b/gltfs/checkerboard/checkerboard.gltf @@ -0,0 +1,301 @@ +{ + "accessors": { + "accessor_index_0": { + "bufferView": "bufferView_1", + "byteOffset": 0, + "byteStride": 0, + "componentType": 5123, + "count": 6, + "type": "SCALAR", + "min": [ + 0 + ], + "max": [ + 3 + ] + }, + "accessor_position": { + "bufferView": "bufferView_0", + "byteOffset": 0, + "byteStride": 0, + "componentType": 5126, + "count": 4, + "min": [ + -1, + 0, + -1 + ], + "max": [ + 1, + 0, + 1 + ], + "type": "VEC3" + }, + "accessor_uv": { + "bufferView": "bufferView_0", + "byteOffset": 48, + "byteStride": 0, + "componentType": 5126, + "count": 4, + "type": "VEC2", + "min": [ + 0.00009999999747378752, + 0.00009999999747378752 + ], + "max": [ + 0.9998999834060669, + 0.9998999834060669 + ] + }, + "accessor_normal_generated": { + "bufferView": "bufferView_0", + "byteOffset": 80, + "byteStride": 0, + "componentType": 5126, + "count": 4, + "type": "VEC3", + "min": [ + 0, + 1, + 0 + ], + "max": [ + 0, + 1, + 0 + ] + } + }, + "asset": { + "generator": "OBJ2GLTF", + "premultipliedAlpha": true, + "profile": { + "api": "WebGL", + "version": "1.0" + }, + "version": "1.0" + }, + 
"buffers": { + "buffer_0": { + "type": "arraybuffer", + "byteLength": 140, + "uri": "data:application/octet-stream;base64,AACAvwAAAAAAAIA/AACAPwAAAAAAAIA/AACAPwAAAAAAAIC/AACAvwAAAAAAAIC/F7fROHL5fz9y+X8/cvl/P3L5fz8Xt9E4F7fROBe30TgAAAAAAACAPwAAAAAAAAAAAACAPwAAAAAAAAAAAACAPwAAAAAAAAAAAACAPwAAAAAAAAEAAgAAAAIAAwA=" + } + }, + "bufferViews": { + "bufferView_0": { + "buffer": "buffer_0", + "byteLength": 128, + "byteOffset": 0, + "target": 34962 + }, + "bufferView_1": { + "buffer": "buffer_0", + "byteLength": 12, + "byteOffset": 128, + "target": 34963 + } + }, + "images": { + "Checkered": { + "uri": "" + } + }, + "materials": { + "material_Material.001": { + "name": "Material.001", + "extensions": {}, + "values": { + "ambient": [ + 0, + 0, + 0, + 1 + ], + "diffuse": "texture_Checkered", + "emission": [ + 0, + 0, + 0, + 1 + ], + "specular": [ + 0.5, + 0.5, + 0.5, + 1 + ], + "shininess": 96.078431, + "transparency": 1 + }, + "technique": "technique0" + } + }, + "meshes": { + "mesh_checkerboard": { + "name": "checkerboard", + "primitives": [ + { + "attributes": { + "POSITION": "accessor_position", + "TEXCOORD_0": "accessor_uv", + "NORMAL": "accessor_normal_generated" + }, + "indices": "accessor_index_0", + "material": "material_Material.001", + "mode": 4 + } + ] + } + }, + "nodes": { + "rootNode": { + "children": [], + "meshes": [ + "mesh_checkerboard" + ], + "matrix": [ + 1, + 0, + 0, + 0, + 0, + 1, + 0, + 0, + 0, + 0, + 1, + 0, + 0, + 0, + 0, + 1 + ] + } + }, + "samplers": { + "sampler_0": { + "magFilter": 9729, + "minFilter": 9986, + "wrapS": 10497, + "wrapT": 10497 + } + }, + "scene": "scene_checkerboard", + "scenes": { + "scene_checkerboard": { + "nodes": [ + "rootNode" + ] + } + }, + "textures": { + "texture_Checkered": { + "format": 6407, + "internalFormat": 6407, + "sampler": "sampler_0", + "source": "Checkered", + "target": 3553, + "type": 5121 + } + }, + "extensionsUsed": [], + "animations": {}, + "cameras": {}, + "techniques": { + "technique0": { + "attributes": { 
+ "a_position": "position", + "a_normal": "normal", + "a_texcoord_0": "texcoord_0" + }, + "parameters": { + "modelViewMatrix": { + "semantic": "MODELVIEW", + "type": 35676 + }, + "projectionMatrix": { + "semantic": "PROJECTION", + "type": 35676 + }, + "normalMatrix": { + "semantic": "MODELVIEWINVERSETRANSPOSE", + "type": 35675 + }, + "ambient": { + "type": 35666 + }, + "diffuse": { + "type": 35678 + }, + "emission": { + "type": 35666 + }, + "specular": { + "type": 35666 + }, + "shininess": { + "type": 5126 + }, + "transparency": { + "type": 5126 + }, + "position": { + "semantic": "POSITION", + "type": 35665 + }, + "normal": { + "semantic": "NORMAL", + "type": 35665 + }, + "texcoord_0": { + "semantic": "TEXCOORD_0", + "type": 35664 + } + }, + "program": "program0", + "states": { + "enable": [ + 2884, + 2929 + ] + }, + "uniforms": { + "u_modelViewMatrix": "modelViewMatrix", + "u_projectionMatrix": "projectionMatrix", + "u_normalMatrix": "normalMatrix", + "u_ambient": "ambient", + "u_diffuse": "diffuse", + "u_emission": "emission", + "u_specular": "specular", + "u_shininess": "shininess", + "u_transparency": "transparency" + } + } + }, + "programs": { + "program0": { + "attributes": [ + "a_position", + "a_normal", + "a_texcoord_0" + ], + "fragmentShader": "fragmentShader0", + "vertexShader": "vertexShader0" + } + }, + "shaders": { + "vertexShader0": { + "type": 35633, + "uri": 
"data:text/plain;base64,cHJlY2lzaW9uIGhpZ2hwIGZsb2F0Owp1bmlmb3JtIG1hdDQgdV9tb2RlbFZpZXdNYXRyaXg7CnVuaWZvcm0gbWF0NCB1X3Byb2plY3Rpb25NYXRyaXg7CnVuaWZvcm0gbWF0MyB1X25vcm1hbE1hdHJpeDsKYXR0cmlidXRlIHZlYzMgYV9wb3NpdGlvbjsKdmFyeWluZyB2ZWMzIHZfcG9zaXRpb25FQzsKYXR0cmlidXRlIHZlYzMgYV9ub3JtYWw7CnZhcnlpbmcgdmVjMyB2X25vcm1hbDsKYXR0cmlidXRlIHZlYzIgYV90ZXhjb29yZF8wOwp2YXJ5aW5nIHZlYzIgdl90ZXhjb29yZF8wOwp2b2lkIG1haW4odm9pZCkgewogIHZlYzQgcG9zID0gdV9tb2RlbFZpZXdNYXRyaXggKiB2ZWM0KGFfcG9zaXRpb24sMS4wKTsKICB2X3Bvc2l0aW9uRUMgPSBwb3MueHl6OwogIGdsX1Bvc2l0aW9uID0gdV9wcm9qZWN0aW9uTWF0cml4ICogcG9zOwogIHZfbm9ybWFsID0gdV9ub3JtYWxNYXRyaXggKiBhX25vcm1hbDsKICB2X3RleGNvb3JkXzAgPSBhX3RleGNvb3JkXzA7Cn0K" + }, + "fragmentShader0": { + "type": 35632, + "uri": "data:text/plain;base64,cHJlY2lzaW9uIGhpZ2hwIGZsb2F0Owp1bmlmb3JtIHZlYzQgdV9hbWJpZW50Owp1bmlmb3JtIHNhbXBsZXIyRCB1X2RpZmZ1c2U7CnVuaWZvcm0gdmVjNCB1X2VtaXNzaW9uOwp1bmlmb3JtIHZlYzQgdV9zcGVjdWxhcjsKdW5pZm9ybSBmbG9hdCB1X3NoaW5pbmVzczsKdW5pZm9ybSBmbG9hdCB1X3RyYW5zcGFyZW5jeTsKdmFyeWluZyB2ZWMzIHZfcG9zaXRpb25FQzsKdmFyeWluZyB2ZWMzIHZfbm9ybWFsOwp2YXJ5aW5nIHZlYzIgdl90ZXhjb29yZF8wOwp2b2lkIG1haW4odm9pZCkgewogIHZlYzMgbm9ybWFsID0gbm9ybWFsaXplKHZfbm9ybWFsKTsKICB2ZWM0IGRpZmZ1c2UgPSB0ZXh0dXJlMkQodV9kaWZmdXNlLCB2X3RleGNvb3JkXzApOwogIHZlYzMgZGlmZnVzZUxpZ2h0ID0gdmVjMygwLjAsIDAuMCwgMC4wKTsKICB2ZWMzIHNwZWN1bGFyID0gdV9zcGVjdWxhci5yZ2I7CiAgdmVjMyBzcGVjdWxhckxpZ2h0ID0gdmVjMygwLjAsIDAuMCwgMC4wKTsKICB2ZWMzIGVtaXNzaW9uID0gdV9lbWlzc2lvbi5yZ2I7CiAgdmVjMyBhbWJpZW50ID0gdV9hbWJpZW50LnJnYjsKICB2ZWMzIHZpZXdEaXIgPSAtbm9ybWFsaXplKHZfcG9zaXRpb25FQyk7CiAgdmVjMyBhbWJpZW50TGlnaHQgPSB2ZWMzKDAuMCwgMC4wLCAwLjApOwogIGFtYmllbnRMaWdodCArPSB2ZWMzKDAuMiwgMC4yLCAwLjIpOwogIHZlYzMgbCA9IHZlYzMoMC4wLCAwLjAsIDEuMCk7CiAgZGlmZnVzZUxpZ2h0ICs9IHZlYzMoMS4wLCAxLjAsIDEuMCkgKiBtYXgoZG90KG5vcm1hbCxsKSwgMC4pOwogIHZlYzMgaCA9IG5vcm1hbGl6ZShsICsgdmlld0Rpcik7CiAgZmxvYXQgc3BlY3VsYXJJbnRlbnNpdHkgPSBtYXgoMC4sIHBvdyhtYXgoZG90KG5vcm1hbCwgaCksIDAuKSwgdV9zaGluaW5lc3MpKTsKICBzcGVjdWxhckxpZ2h0ICs9IHZlYzMoMS4wLCAxLjAsIDEuMCkgKiBz
cGVjdWxhckludGVuc2l0eTsKICB2ZWMzIGNvbG9yID0gdmVjMygwLjAsIDAuMCwgMC4wKTsKICBjb2xvciArPSBkaWZmdXNlLnJnYiAqIGRpZmZ1c2VMaWdodDsKICBjb2xvciArPSBzcGVjdWxhciAqIHNwZWN1bGFyTGlnaHQ7CiAgY29sb3IgKz0gZW1pc3Npb247CiAgY29sb3IgKz0gYW1iaWVudCAqIGFtYmllbnRMaWdodDsKICBnbF9GcmFnQ29sb3IgPSB2ZWM0KGNvbG9yICogZGlmZnVzZS5hLCBkaWZmdXNlLmEgKiB1X3RyYW5zcGFyZW5jeSk7Cn0K" + } + }, + "skins": {}, + "extensions": {} +} diff --git a/gltfs/flower/flower.gltf b/gltfs/flower/flower.gltf index a59e3d2..9023b9d 100644 Binary files a/gltfs/flower/flower.gltf and b/gltfs/flower/flower.gltf differ diff --git a/renders/aa.gif b/renders/aa.gif new file mode 100644 index 0000000..23a938b Binary files /dev/null and b/renders/aa.gif differ diff --git a/renders/backfaceculling.gif b/renders/backfaceculling.gif new file mode 100644 index 0000000..33c6bd6 Binary files /dev/null and b/renders/backfaceculling.gif differ diff --git a/renders/benchmark.py b/renders/benchmark.py new file mode 100644 index 0000000..3bc9d4b --- /dev/null +++ b/renders/benchmark.py @@ -0,0 +1,672 @@ +import matplotlib.pyplot as plt +import numpy as np + + +# bbox backface culling and sorting +p1 = [[0.365696, 49.720222, 0.000000, 0.886528, 0.000000], +[0.589344, 49.664959, 0.000000, 0.887264, 0.000000], +[0.334112, 49.678818, 0.000000, 0.885408, 0.000000], +[0.322304, 49.711166, 0.000000, 0.884512, 0.000000], +[0.326880, 49.688255, 0.000000, 0.887008, 0.000000], +[0.333536, 49.709217, 0.000000, 0.890496, 0.000000], +[0.424448, 49.668385, 0.000000, 0.887168, 0.000000], +[0.287072, 49.726593, 0.000000, 0.883040, 0.000000], +[0.464000, 49.656384, 0.000000, 0.888448, 0.000000], +[0.508320, 49.690239, 0.000000, 0.885952, 0.000000], +[0.626976, 49.719646, 0.000000, 0.884128, 0.000000], +[0.483872, 49.734207, 0.000000, 0.886592, 0.000000], +[0.409696, 49.695553, 0.000000, 0.883360, 0.000000], +[0.230752, 49.710850, 0.000000, 0.887904, 0.000000], +[0.502496, 49.713951, 0.000000, 0.895808, 0.000000], +[0.274336, 49.797825, 0.000000, 0.888384, 
0.000000], +[0.248384, 49.725567, 0.000000, 0.888352, 0.000000], +[0.381920, 49.762402, 0.000000, 0.884320, 0.000000], +[0.264640, 49.661377, 0.000000, 0.877184, 0.000000], +[0.231360, 49.693439, 0.000000, 0.888096, 0.000000], +[0.302176, 49.682495, 0.000000, 0.896992, 0.000000], +[0.468256, 49.683968, 0.000000, 0.885152, 0.000000], +[0.363904, 49.646400, 0.000000, 0.885888, 0.000000], +[0.378720, 49.726177, 0.000000, 0.886752, 0.000000], +[0.417888, 49.671391, 0.000000, 0.886240, 0.000000], +[0.302048, 49.849792, 0.000000, 0.885440, 0.000000], +[0.413664, 49.720512, 0.000000, 0.884448, 0.000000], +[0.414592, 49.801407, 0.000000, 0.884352, 0.000000], +[0.298496, 49.673409, 0.000000, 0.889600, 0.000000], +[0.459648, 49.783455, 0.000000, 0.881376, 0.000000], +[0.403200, 49.673695, 0.000000, 0.882592, 0.000000], +[0.227232, 49.711266, 0.000000, 0.884512, 0.000000], +[0.381920, 49.702560, 0.000000, 0.891200, 0.000000], +[0.399616, 49.675423, 0.000000, 0.885024, 0.000000], +[0.617440, 49.689056, 0.000000, 0.886720, 0.000000], +[0.257216, 49.655041, 0.000000, 0.886080, 0.000000], +[0.480320, 49.671906, 0.000000, 0.886784, 0.000000], +[0.232352, 49.676289, 0.000000, 0.886176, 0.000000], +[0.523808, 49.764545, 0.000000, 0.891488, 0.000000], +[0.429792, 49.874561, 0.000000, 0.891072, 0.000000], +[0.250752, 49.698368, 0.000000, 0.881344, 0.000000], +[0.428224, 49.724609, 0.000000, 0.889760, 0.000000], +[0.322016, 49.710785, 0.000000, 0.900992, 0.000000], +[0.216928, 49.826401, 0.000000, 0.885120, 0.000000], +[0.385696, 49.794655, 0.000000, 0.890016, 0.000000], +[0.464192, 49.701088, 0.000000, 0.884480, 0.000000], +[0.290976, 49.764160, 0.000000, 0.884576, 0.000000], +[0.421632, 49.665855, 0.000000, 0.885088, 0.000000], +[0.273824, 49.778336, 0.000000, 0.891232, 0.000000], +[0.225184, 49.841633, 0.000000, 0.887296, 0.000000], +[0.407616, 49.648224, 0.000000, 0.887808, 0.000000], +[0.381312, 49.740032, 0.000000, 0.885728, 0.000000], +[0.682080, 49.680031, 0.000000, 0.888224, 
0.000000], +[0.432288, 49.779808, 0.000000, 0.884992, 0.000000], +[0.278080, 49.816479, 0.000000, 0.886976, 0.000000], +[0.278720, 49.684769, 0.000000, 0.886720, 0.000000], +[0.491936, 49.702591, 0.000000, 0.886016, 0.000000], +[0.441632, 49.701694, 0.000000, 0.886080, 0.000000], +[0.576384, 49.671585, 0.000000, 0.884672, 0.000000], +[0.457408, 49.734367, 0.000000, 0.883104, 0.000000], +[0.466176, 49.655777, 0.000000, 0.885536, 0.000000], +[0.224064, 49.824993, 0.000000, 0.886016, 0.000000], +[0.355840, 49.663265, 0.000000, 0.889504, 0.000000], +[0.534720, 49.680191, 0.000000, 0.885152, 0.000000], +[0.323168, 49.661953, 0.000000, 0.889760, 0.000000], +[0.454144, 49.708321, 0.000000, 0.883360, 0.000000]] + +# bbox no backface culling and sorting +p2 = [[0.329984, 98.175964, 0.000000, 0.885632, 0.000000], +[0.270944, 98.230339, 0.000000, 0.885344, 0.000000], +[0.404352, 98.400162, 0.000000, 0.884800, 0.000000], +[0.733536, 98.013794, 0.000000, 0.885728, 0.000000], +[0.491488, 98.053436, 0.000000, 0.886048, 0.000000], +[0.450656, 98.009186, 0.000000, 0.886528, 0.000000], +[0.413280, 98.290337, 0.000000, 0.886656, 0.000000], +[0.223744, 98.431549, 0.000000, 0.886784, 0.000000], +[0.266880, 98.278275, 0.000000, 0.882912, 0.000000], +[0.266784, 98.012924, 0.000000, 0.890144, 0.000000], +[0.224512, 98.200256, 0.000000, 0.885344, 0.000000], +[0.350080, 98.009438, 0.000000, 0.886496, 0.000000], +[0.450624, 98.369247, 0.000000, 0.891008, 0.000000], +[0.348128, 98.340157, 0.000000, 0.886016, 0.000000], +[0.752832, 98.060959, 0.000000, 0.888960, 0.000000], +[0.250400, 98.046402, 0.000000, 0.884352, 0.000000], +[0.223840, 98.175964, 0.000000, 0.887776, 0.000000], +[0.270976, 98.342178, 0.000000, 0.882688, 0.000000], +[0.252160, 98.353821, 0.000000, 0.884000, 0.000000], +[0.640000, 98.022400, 0.000000, 0.885824, 0.000000], +[0.432160, 98.012161, 0.000000, 0.885280, 0.000000], +[0.367872, 98.089981, 0.000000, 0.886240, 0.000000], +[0.328736, 98.169373, 0.000000, 0.886880, 
0.000000], +[0.270848, 98.307549, 0.000000, 0.887552, 0.000000], +[0.445984, 98.247742, 0.000000, 0.885728, 0.000000], +[0.361440, 98.069885, 0.000000, 0.887456, 0.000000], +[0.256288, 98.182655, 0.000000, 0.887104, 0.000000], +[0.429408, 98.071106, 0.000000, 0.893472, 0.000000], +[0.226624, 98.313919, 0.000000, 0.886656, 0.000000], +[0.408864, 98.068062, 0.000000, 0.886624, 0.000000], +[0.228192, 98.278114, 0.000000, 0.885568, 0.000000], +[0.405216, 98.055328, 0.000000, 0.886080, 0.000000], +[0.292448, 98.250435, 0.000000, 0.884832, 0.000000], +[0.441792, 98.007904, 0.000000, 0.876832, 0.000000], +[0.390944, 98.112450, 0.000000, 0.882720, 0.000000], +[0.353152, 98.051201, 0.000000, 0.888992, 0.000000], +[0.282848, 98.306717, 0.000000, 0.891360, 0.000000], +[0.661440, 98.017822, 0.000000, 0.883104, 0.000000], +[0.227712, 98.205437, 0.000000, 0.884384, 0.000000], +[0.277600, 98.191231, 0.000000, 0.884320, 0.000000], +[0.219776, 98.228798, 0.000000, 0.883264, 0.000000], +[0.229408, 98.393570, 0.000000, 0.893888, 0.000000], +[0.243680, 98.366692, 0.000000, 0.886816, 0.000000], +[0.417536, 98.045692, 0.000000, 0.887776, 0.000000], +[0.359296, 98.151489, 0.000000, 0.885504, 0.000000], +[0.400224, 98.040092, 0.000000, 0.888224, 0.000000], +[0.287456, 98.199615, 0.000000, 0.884512, 0.000000], +[0.567104, 98.043037, 0.000000, 0.885952, 0.000000], +[0.758208, 98.047325, 0.000000, 0.887936, 0.000000], +[0.302784, 98.051872, 0.000000, 0.887264, 0.000000], +[0.251456, 98.086205, 0.000000, 0.894080, 0.000000], +[0.447872, 98.057854, 0.000000, 0.886688, 0.000000], +[0.442432, 98.223907, 0.000000, 0.886688, 0.000000], +[0.683968, 98.024864, 0.000000, 0.886048, 0.000000], +[0.424928, 98.148605, 0.000000, 0.882208, 0.000000], +[0.689920, 98.088608, 0.000000, 0.891200, 0.000000], +[0.257600, 98.188354, 0.000000, 0.886784, 0.000000], +[0.386112, 98.157982, 0.000000, 0.888000, 0.000000], +[0.520256, 98.264320, 0.000000, 0.884128, 0.000000], +[0.621472, 98.054466, 0.000000, 0.886208, 
0.000000], +[0.401856, 98.155357, 0.000000, 0.886272, 0.000000], +[0.406560, 98.068672, 0.000000, 0.887104, 0.000000], +[0.232032, 98.282532, 0.000000, 0.886944, 0.000000], +[0.464864, 98.030403, 0.000000, 0.886400, 0.000000], +[0.431744, 98.085312, 0.000000, 0.889056, 0.000000], +[0.580480, 98.059074, 0.000000, 0.885920, 0.000000], +[0.517152, 98.270721, 0.000000, 0.883840, 0.000000], +[0.418688, 98.052162, 0.000000, 0.888480, 0.000000], +[0.294304, 98.279427, 0.000000, 0.881344, 0.000000], +[0.760256, 98.040611, 0.000000, 0.885120, 0.000000], +[0.319200, 98.050179, 0.000000, 0.883328, 0.000000], +[0.657376, 98.027740, 0.000000, 0.886144, 0.000000], +[0.458912, 98.265823, 0.000000, 0.883584, 0.000000], +[0.791648, 98.080544, 0.000000, 0.888800, 0.000000], +[0.250016, 98.368156, 0.000000, 0.885696, 0.000000]] + +# backface culling and sorting but no bbox +p3 = [[0.589248, 110.232414, 0.000000, 0.890784, 0.000000], +[0.820128, 110.080704, 0.000000, 0.886080, 0.000000], +[0.224224, 110.167107, 0.000000, 0.886144, 0.000000], +[0.389888, 110.178268, 0.000000, 0.884608, 0.000000], +[0.432544, 110.082207, 0.000000, 0.888160, 0.000000], +[0.624512, 110.146652, 0.000000, 0.889664, 0.000000], +[0.382784, 110.191841, 0.000000, 0.878848, 0.000000], +[0.378432, 110.137955, 0.000000, 0.887328, 0.000000], +[0.218880, 110.165085, 0.000000, 0.884704, 0.000000], +[0.420736, 110.231583, 0.000000, 0.887264, 0.000000], +[0.322016, 110.105537, 0.000000, 0.887392, 0.000000], +[0.259072, 110.186722, 0.000000, 0.886016, 0.000000], +[0.393024, 110.111328, 0.000000, 0.887136, 0.000000], +[0.477696, 110.208961, 0.000000, 0.880704, 0.000000], +[0.599424, 110.096733, 0.000000, 0.888800, 0.000000], +[0.362240, 110.284988, 0.000000, 0.884864, 0.000000], +[0.791872, 110.098274, 0.000000, 0.892608, 0.000000], +[0.267136, 110.176414, 0.000000, 0.887680, 0.000000], +[0.442496, 110.108383, 0.000000, 0.889632, 0.000000], +[0.479104, 110.171745, 0.000000, 0.887136, 0.000000], +[0.574304, 110.152702, 
0.000000, 0.883872, 0.000000], +[0.691936, 110.120605, 0.000000, 0.887328, 0.000000], +[0.493600, 110.149345, 0.000000, 0.889600, 0.000000], +[0.266528, 110.123299, 0.000000, 0.887104, 0.000000], +[0.368992, 110.101982, 0.000000, 0.884384, 0.000000], +[0.440416, 110.067390, 0.000000, 0.886944, 0.000000], +[0.271264, 110.185951, 0.000000, 0.883456, 0.000000], +[0.436800, 110.276413, 0.000000, 0.886272, 0.000000], +[0.302496, 110.155106, 0.000000, 0.886368, 0.000000], +[0.257024, 110.206146, 0.000000, 0.882048, 0.000000], +[0.400128, 110.205406, 0.000000, 0.885888, 0.000000], +[0.405888, 110.163391, 0.000000, 0.881472, 0.000000], +[0.638688, 110.089699, 0.000000, 0.885504, 0.000000], +[0.470112, 110.206688, 0.000000, 0.885696, 0.000000], +[0.364512, 110.144386, 0.000000, 0.887168, 0.000000], +[0.227488, 110.144318, 0.000000, 0.885824, 0.000000], +[0.434336, 110.192352, 0.000000, 0.879872, 0.000000], +[0.394528, 110.098717, 0.000000, 0.890816, 0.000000], +[0.251232, 110.081123, 0.000000, 0.887040, 0.000000], +[0.488832, 110.224220, 0.000000, 0.882912, 0.000000], +[0.324448, 110.215614, 0.000000, 0.890880, 0.000000], +[0.227808, 110.177475, 0.000000, 0.885984, 0.000000], +[0.423936, 110.180511, 0.000000, 0.896864, 0.000000], +[0.418624, 110.190399, 0.000000, 0.887616, 0.000000], +[0.690976, 110.148483, 0.000000, 0.888160, 0.000000], +[0.466336, 110.208099, 0.000000, 0.885728, 0.000000], +[0.439936, 110.162941, 0.000000, 0.888768, 0.000000], +[0.349408, 110.181725, 0.000000, 0.883040, 0.000000], +[0.380320, 110.213631, 0.000000, 0.885888, 0.000000], +[0.287456, 110.208572, 0.000000, 0.887072, 0.000000], +[0.452224, 110.148735, 0.000000, 0.888864, 0.000000], +[0.249024, 110.271683, 0.000000, 0.885184, 0.000000], +[0.390432, 110.211906, 0.000000, 0.886304, 0.000000], +[0.486176, 110.181313, 0.000000, 0.888256, 0.000000], +[0.663424, 110.230049, 0.000000, 0.889312, 0.000000], +[0.893152, 110.279137, 0.000000, 0.897856, 0.000000], +[0.246016, 110.163040, 0.000000, 0.886784, 
0.000000], +[0.285120, 110.189697, 0.000000, 0.883328, 0.000000], +[0.411296, 110.238403, 0.000000, 0.890656, 0.000000], +[0.392416, 110.261086, 0.000000, 0.886688, 0.000000], +[0.601216, 110.185822, 0.000000, 0.886560, 0.000000], +[0.472800, 110.222466, 0.000000, 0.884864, 0.000000], +[0.428064, 110.150497, 0.000000, 0.894624, 0.000000], +[0.228096, 110.241440, 0.000000, 0.884992, 0.000000], +[0.421280, 110.091682, 0.000000, 0.887264, 0.000000], +[0.455104, 110.194496, 0.000000, 0.881984, 0.000000], +[0.662464, 110.132545, 0.000000, 0.888544, 0.000000], +[0.464416, 110.161118, 0.000000, 0.873792, 0.000000], +[0.423840, 110.133026, 0.000000, 0.897376, 0.000000], +[0.440672, 110.202019, 0.000000, 0.883840, 0.000000]] + +# no backface culling and sorting and no bbox +p4 = [[0.284640, 217.891769, 0.000000, 0.883968, 0.000000], +[0.388032, 217.745468, 0.000000, 0.887232, 0.000000], +[0.294816, 217.824615, 0.000000, 0.885632, 0.000000], +[0.422592, 217.674911, 0.000000, 0.887328, 0.000000], +[0.422464, 217.887451, 0.000000, 0.885664, 0.000000], +[0.601760, 217.805435, 0.000000, 0.885888, 0.000000], +[0.603520, 217.779297, 0.000000, 0.888704, 0.000000], +[0.511616, 217.748322, 0.000000, 0.887008, 0.000000], +[0.285344, 217.957336, 0.000000, 0.881824, 0.000000], +[0.727904, 217.711365, 0.000000, 0.888128, 0.000000], +[0.295712, 217.995422, 0.000000, 0.881024, 0.000000], +[0.225504, 218.031235, 0.000000, 0.887488, 0.000000], +[0.264864, 217.880508, 0.000000, 0.885792, 0.000000], +[0.530752, 217.729187, 0.000000, 0.886336, 0.000000], +[0.227968, 217.940231, 0.000000, 0.885600, 0.000000], +[0.326528, 217.728775, 0.000000, 0.886368, 0.000000], +[0.218624, 217.972412, 0.000000, 0.887040, 0.000000], +[0.313536, 217.947067, 0.000000, 0.886336, 0.000000], +[0.421824, 218.040283, 0.000000, 0.884032, 0.000000], +[0.432672, 217.812546, 0.000000, 0.888352, 0.000000], +[0.225344, 217.911392, 0.000000, 0.887648, 0.000000], +[0.554240, 217.756668, 0.000000, 0.899616, 0.000000], 
+[0.355904, 218.013824, 0.000000, 0.884480, 0.000000], +[0.625856, 217.788284, 0.000000, 0.886080, 0.000000], +[0.418304, 217.869507, 0.000000, 0.886080, 0.000000], +[0.570592, 217.763168, 0.000000, 0.887296, 0.000000], +[0.342720, 217.886108, 0.000000, 0.886016, 0.000000], +[0.372960, 217.760986, 0.000000, 0.885888, 0.000000], +[0.419712, 217.815720, 0.000000, 0.888000, 0.000000], +[0.252640, 217.765472, 0.000000, 0.886432, 0.000000], +[0.360864, 218.011780, 0.000000, 0.884832, 0.000000], +[0.342240, 217.799835, 0.000000, 0.886720, 0.000000], +[0.226656, 217.896286, 0.000000, 0.885440, 0.000000], +[0.378880, 217.672455, 0.000000, 0.888704, 0.000000], +[0.316736, 217.927002, 0.000000, 0.886080, 0.000000], +[0.612192, 217.807556, 0.000000, 0.887296, 0.000000], +[0.433536, 217.903900, 0.000000, 0.891744, 0.000000], +[0.429984, 217.820953, 0.000000, 0.885664, 0.000000], +[0.220128, 218.021469, 0.000000, 0.886240, 0.000000], +[0.438752, 217.785629, 0.000000, 0.890912, 0.000000], +[0.438528, 218.021820, 0.000000, 0.883168, 0.000000], +[0.308128, 217.730911, 0.000000, 0.885056, 0.000000], +[0.440928, 217.842331, 0.000000, 0.884992, 0.000000], +[0.389504, 217.816833, 0.000000, 0.884544, 0.000000], +[0.294144, 217.949127, 0.000000, 0.885504, 0.000000], +[0.406880, 217.827133, 0.000000, 0.889280, 0.000000], +[0.294112, 217.906876, 0.000000, 0.886944, 0.000000], +[0.300768, 217.738205, 0.000000, 0.886816, 0.000000], +[0.399008, 218.059418, 0.000000, 0.886112, 0.000000], +[0.382624, 217.750778, 0.000000, 0.887424, 0.000000], +[0.225728, 217.918564, 0.000000, 0.883232, 0.000000], +[0.643680, 217.815231, 0.000000, 0.885952, 0.000000], +[0.369632, 217.877625, 0.000000, 0.884864, 0.000000], +[0.613120, 217.810562, 0.000000, 0.887840, 0.000000], +[0.334944, 217.984222, 0.000000, 0.886240, 0.000000], +[0.378912, 217.729919, 0.000000, 0.887584, 0.000000], +[0.300256, 217.944290, 0.000000, 0.884000, 0.000000], +[0.490272, 217.909851, 0.000000, 0.883936, 0.000000], +[0.356192, 
217.856613, 0.000000, 0.885312, 0.000000], +[0.706912, 217.638535, 0.000000, 0.888544, 0.000000], +[0.340640, 217.817093, 0.000000, 0.885792, 0.000000], +[0.312480, 217.785767, 0.000000, 0.886912, 0.000000], +[0.310016, 217.851898, 0.000000, 0.885728, 0.000000], +[0.367360, 217.699356, 0.000000, 0.899968, 0.000000], +[0.267520, 217.957657, 0.000000, 0.887232, 0.000000]] + +# all optimizations MSAA4x +p5 = [[0.330176, 50.216671, 154.838242, 0.888544, 0.000000], +[0.416480, 49.992256, 154.537628, 0.885536, 0.000000], +[0.260704, 49.866367, 154.463867, 0.892544, 0.000000], +[0.269792, 50.268002, 155.127777, 0.887552, 0.000000], +[0.226304, 49.853951, 154.479172, 0.886432, 0.000000], +[0.479904, 49.918400, 154.396027, 0.885696, 0.000000], +[0.727872, 50.495232, 155.497253, 0.888320, 0.000000], +[0.416192, 49.841854, 154.473511, 0.885568, 0.000000], +[0.477216, 49.798496, 154.661057, 0.887936, 0.000000], +[0.225376, 50.540607, 155.692322, 0.888224, 0.000000], +[0.602464, 49.796032, 154.395004, 0.888352, 0.000000], +[0.296992, 50.284607, 155.099106, 0.887008, 0.000000], +[0.674432, 49.810497, 154.436676, 1.140640, 0.000000], +[0.521856, 50.461887, 155.622406, 0.889888, 0.000000], +[0.679200, 50.278687, 155.100357, 0.887328, 0.000000], +[0.316640, 49.854912, 154.469116, 0.886176, 0.000000], +[0.485344, 50.415455, 155.901413, 0.889664, 0.000000], +[0.416640, 50.242783, 155.539749, 0.886688, 0.000000], +[0.721088, 49.859585, 154.460770, 0.889664, 0.000000], +[0.351008, 49.875935, 154.631943, 0.890912, 0.000000], +[0.419584, 50.248001, 155.653244, 0.886112, 0.000000], +[0.477216, 49.821342, 154.363708, 0.887168, 0.000000], +[0.403104, 50.496159, 155.051193, 0.887360, 0.000000], +[0.222048, 50.446815, 154.484039, 0.885728, 0.000000], +[0.595392, 49.787327, 154.505829, 0.886784, 0.000000], +[0.921312, 49.785183, 154.574844, 0.889696, 0.000000], +[0.559424, 50.810818, 154.590149, 0.884800, 0.000000], +[0.301312, 49.924000, 154.556168, 0.886368, 0.000000], +[0.369376, 50.280670, 
155.623672, 0.886048, 0.000000], +[0.425984, 49.821503, 154.374619, 0.885440, 0.000000], +[0.229568, 50.466656, 155.996582, 0.885120, 0.000000], +[0.446048, 49.989506, 154.561722, 0.889824, 0.000000], +[0.592000, 50.494335, 155.382980, 0.884832, 0.000000], +[0.223968, 49.886047, 154.425369, 0.884576, 0.000000], +[0.250048, 50.364704, 154.580765, 0.886112, 0.000000], +[0.413952, 49.898594, 154.724411, 0.887552, 0.000000], +[0.301760, 49.827168, 154.435455, 0.892576, 0.000000], +[0.491488, 50.530239, 154.692825, 0.887904, 0.000000], +[0.641568, 49.823135, 154.480057, 0.892544, 0.000000], +[0.267296, 49.933281, 154.412262, 0.892448, 0.000000], +[0.490560, 50.250656, 155.085831, 0.889984, 0.000000], +[0.428960, 49.877792, 154.565536, 0.883424, 0.000000], +[0.690368, 49.849407, 154.455841, 0.887616, 0.000000], +[0.330976, 50.248608, 155.317978, 0.891584, 0.000000], +[0.499392, 49.775105, 154.465088, 0.887232, 0.000000], +[0.252480, 50.418976, 156.053146, 0.886496, 0.000000], +[0.730112, 50.276928, 155.667877, 0.884128, 0.000000], +[0.440832, 49.859650, 154.622971, 0.891648, 0.000000], +[0.264352, 49.931232, 155.842148, 0.889248, 0.000000], +[0.386848, 50.428577, 155.713821, 0.890528, 0.000000], +[0.219136, 49.926434, 154.511642, 0.884576, 0.000000], +[0.229056, 49.864990, 155.174561, 0.900384, 0.000000], +[0.361824, 50.244034, 156.084091, 0.886560, 0.000000], +[0.310528, 49.921726, 154.592957, 0.892288, 0.000000], +[0.439968, 50.281567, 155.178787, 0.896192, 0.000000], +[0.338432, 50.585728, 154.910400, 0.890496, 0.000000], +[0.517088, 49.868385, 154.561661, 0.884704, 0.000000], +[0.225600, 50.337921, 155.592743, 0.890496, 0.000000], +[0.408832, 50.540222, 155.313477, 0.886112, 0.000000], +[0.457056, 49.856224, 154.611099, 0.887264, 0.000000], +[0.458336, 50.294880, 155.617798, 0.888288, 0.000000], +[0.764896, 50.522945, 155.439041, 0.887200, 0.000000], +[0.261920, 49.808384, 154.604889, 0.887008, 0.000000], +[0.298208, 49.864574, 154.460709, 0.893344, 0.000000], 
+[0.289408, 50.474014, 155.688995, 0.886400, 0.000000], +[0.305728, 50.442242, 155.235992, 0.890208, 0.000000], +[0.584704, 50.298496, 154.812668, 0.886112, 0.000000]] + +# no optimizations MSAA4x +p6 = [[0.320320, 218.566727, 660.388489, 0.896064, 0.000000], +[0.380192, 218.213318, 660.216370, 0.894432, 0.000000], +[0.227904, 218.423264, 659.974548, 0.891296, 0.000000], +[0.419200, 218.241028, 660.166260, 0.891936, 0.000000], +[0.444928, 218.340637, 660.329590, 0.893088, 0.000000], +[0.334656, 218.404709, 660.248047, 0.895232, 0.000000], +[0.493856, 218.518845, 660.270874, 0.897984, 0.000000], +[0.397248, 218.391907, 660.536865, 0.897568, 0.000000], +[0.222144, 218.454971, 660.400391, 0.895744, 0.000000], +[0.464768, 218.434174, 660.274292, 0.891904, 0.000000], +[0.326176, 218.503296, 660.201477, 0.895136, 0.000000], +[0.594976, 218.260193, 660.441650, 0.894560, 0.000000], +[0.374240, 218.528763, 660.428589, 0.896832, 0.000000], +[0.437440, 218.253372, 660.380615, 0.893120, 0.000000], +[0.673280, 218.378265, 660.262024, 0.892672, 0.000000], +[0.570432, 218.258240, 660.261963, 0.895008, 0.000000], +[0.264000, 218.511810, 660.293091, 0.901312, 0.000000], +[0.658688, 218.385498, 660.255310, 0.891584, 0.000000], +[0.341472, 218.508606, 660.451477, 0.890528, 0.000000], +[0.337056, 218.305817, 660.285400, 0.896448, 0.000000], +[0.229312, 218.470367, 660.316589, 0.894272, 0.000000], +[0.281760, 218.409058, 660.113159, 0.891904, 0.000000], +[0.441280, 218.360092, 660.282104, 0.894752, 0.000000], +[0.226432, 218.370087, 660.116394, 0.895808, 0.000000], +[0.379040, 218.347260, 660.239014, 0.895392, 0.000000], +[0.554240, 218.270020, 660.157959, 0.895392, 0.000000], +[0.227584, 218.493958, 660.201843, 0.902976, 0.000000], +[0.492640, 218.449829, 660.314575, 0.893408, 0.000000], +[0.537376, 218.428284, 660.186340, 0.901248, 0.000000], +[0.661504, 218.306244, 660.145264, 0.892320, 0.000000], +[0.251616, 218.484451, 660.609680, 0.894560, 0.000000], +[0.318048, 218.370880, 
660.378784, 0.892576, 0.000000], +[0.255168, 218.391266, 660.399231, 0.896000, 0.000000], +[0.224224, 218.348572, 660.299805, 0.896096, 0.000000], +[0.435104, 218.530045, 660.387695, 0.895392, 0.000000], +[0.647136, 218.368134, 660.564941, 0.893856, 0.000000], +[0.311424, 218.473953, 660.288818, 0.894080, 0.000000], +[0.457824, 218.324646, 660.378174, 0.892032, 0.000000], +[0.252512, 218.556610, 660.402710, 0.896736, 0.000000], +[0.388768, 218.317444, 660.241699, 0.893120, 0.000000], +[0.381216, 218.502014, 660.572571, 0.895392, 0.000000], +[0.659616, 218.308609, 660.231506, 0.897088, 0.000000], +[0.492064, 218.474777, 660.388977, 0.894016, 0.000000], +[0.269216, 218.318817, 660.390381, 0.893600, 0.000000], +[0.228320, 218.425537, 660.367493, 0.897344, 0.000000], +[0.413248, 218.266403, 660.223999, 0.893184, 0.000000]] + + +# all optimization SSAA +p7 = [[0.319936, 200.197723, 0.000000, 3.479488, 0.254752], +[0.261472, 202.024506, 0.000000, 3.476320, 0.255232], +[0.261056, 200.256409, 0.000000, 3.481248, 0.255680], +[0.529088, 200.187805, 0.000000, 3.476736, 0.254784], +[0.399872, 202.294983, 0.000000, 3.475392, 0.255744], +[0.500672, 201.595642, 0.000000, 3.477280, 0.255680], +[0.404352, 200.180130, 0.000000, 3.474528, 0.254240], +[0.294432, 200.282303, 0.000000, 3.471616, 0.254560], +[0.444192, 200.206085, 0.000000, 3.473248, 0.254720], +[0.435456, 201.302887, 0.000000, 3.478208, 0.255712], +[0.572896, 203.124832, 0.000000, 3.472992, 0.255008], +[0.411136, 200.218140, 0.000000, 3.468704, 0.255872], +[0.433536, 200.257828, 0.000000, 3.473600, 0.258688], +[0.259712, 203.637314, 0.000000, 3.480416, 0.254816], +[0.291552, 200.251526, 0.000000, 3.478400, 0.255040], +[0.360640, 200.160263, 0.000000, 3.478336, 0.254880], +[0.308384, 200.221191, 0.000000, 3.475552, 0.254688], +[0.478592, 203.352158, 0.000000, 3.470496, 0.255680], +[0.614560, 200.135559, 0.000000, 3.475520, 0.255040], +[0.296384, 200.305374, 0.000000, 3.475488, 0.253504], +[0.595872, 203.332260, 0.000000, 
3.477600, 0.260640], +[0.366592, 200.307205, 0.000000, 3.475264, 0.253952], +[0.659936, 200.277222, 0.000000, 3.473408, 0.254336], +[0.257280, 201.149216, 0.000000, 3.478080, 0.255104], +[0.293536, 203.067459, 0.000000, 3.480448, 0.257440], +[0.326912, 202.290054, 0.000000, 3.477152, 0.253920], +[0.439232, 200.182144, 0.000000, 3.474624, 0.254848], +[0.371776, 203.654846, 0.000000, 3.478560, 0.254976], +[0.669664, 200.221954, 0.000000, 3.479648, 0.255872], +[0.233568, 200.348221, 0.000000, 3.478528, 0.255328], +[0.424544, 200.920319, 0.000000, 3.479264, 0.254272], +[0.224896, 203.114822, 0.000000, 3.476576, 0.255360], +[0.417536, 200.383270, 0.000000, 3.482272, 0.255648], +[0.275104, 201.377625, 0.000000, 3.475648, 0.256064], +[0.599552, 200.262589, 0.000000, 3.480160, 0.256064], +[0.401600, 202.645050, 0.000000, 3.479168, 0.254336], +[0.482400, 200.309433, 0.000000, 3.481536, 0.254752], +[0.228160, 203.721054, 0.000000, 3.474688, 0.254144], +[0.439552, 200.287903, 0.000000, 3.473952, 0.255392], +[0.461280, 202.359741, 0.000000, 3.477536, 0.256352], +[0.702144, 200.332520, 0.000000, 3.474528, 0.254720], +[0.438592, 200.464996, 0.000000, 3.479264, 0.255840], +[0.494496, 202.470169, 0.000000, 3.482112, 0.255200], +[0.420032, 200.366562, 0.000000, 3.475264, 0.254336], +[0.371360, 203.980316, 0.000000, 3.476544, 0.255808], +[0.422496, 200.433182, 0.000000, 3.478816, 0.255136], +[0.271520, 202.048294, 0.000000, 3.475072, 0.255040], +[0.546240, 200.309479, 0.000000, 3.474048, 0.254176], +[0.528032, 203.205338, 0.000000, 3.478528, 0.255648], +[0.439456, 202.599426, 0.000000, 3.473984, 0.254496], +[0.682656, 200.300674, 0.000000, 3.478400, 0.254592], +[0.279936, 200.292480, 0.000000, 3.475712, 0.253888], +[0.573216, 202.630310, 0.000000, 3.476544, 0.254432], +[0.366336, 201.789658, 0.000000, 3.474400, 0.255136], +[0.803392, 200.341858, 0.000000, 3.474400, 0.255680], +[0.227040, 203.064453, 0.000000, 3.476576, 0.255168], +[0.518976, 200.330429, 0.000000, 3.473088, 
0.253408], +[0.449344, 201.538300, 0.000000, 3.475680, 0.254912], +[0.258400, 203.756058, 0.000000, 3.475680, 0.254304], +[0.490592, 200.229980, 0.000000, 3.736992, 0.258784], +[0.411328, 201.636993, 0.000000, 3.476960, 0.256608], +[0.229920, 200.371902, 0.000000, 3.474432, 0.254112], +[0.360320, 202.941757, 0.000000, 3.476128, 0.254592], +[0.432800, 202.121216, 0.000000, 3.478656, 0.254816], +[0.595168, 200.272644, 0.000000, 3.477504, 0.254464], +[0.460992, 203.372421, 0.000000, 3.479392, 0.254944], +[0.365344, 200.344955, 0.000000, 3.473120, 0.255264], +[0.678144, 201.549026, 0.000000, 3.476704, 0.255168], +[0.514208, 203.586365, 0.000000, 3.484768, 0.255392], +[0.413376, 203.008865, 0.000000, 3.473376, 0.255776], +[0.582368, 201.979202, 0.000000, 3.477888, 0.254656], +[0.367264, 200.388474, 0.000000, 3.473472, 0.254048], +[0.321664, 203.103836, 0.000000, 3.479488, 0.255296], +[0.223904, 202.339005, 0.000000, 3.477728, 0.254400], +[0.437152, 200.332794, 0.000000, 3.475296, 0.255744], +[0.444480, 203.670013, 0.000000, 3.473984, 0.255680], +[0.730912, 200.287399, 0.000000, 3.472416, 0.254816]] + + +# no optimizations SSAA +p8 = [[0.339008, 869.142578, 0.000000, 3.440640, 0.252768], +[0.260800, 869.426453, 0.000000, 3.444960, 0.252768], +[0.225216, 869.487915, 0.000000, 3.440736, 0.253024], +[0.390176, 869.082886, 0.000000, 3.443520, 0.253984], +[0.418048, 869.484680, 0.000000, 3.438656, 0.253408], +[0.635456, 869.182556, 0.000000, 3.443904, 0.252800], +[0.260800, 869.479919, 0.000000, 3.442816, 0.252992], +[0.373888, 869.219116, 0.000000, 3.444096, 0.257920], +[0.232128, 869.142883, 0.000000, 3.440992, 0.252480], +[0.231968, 869.217102, 0.000000, 3.441120, 0.253184], +[0.448384, 869.239868, 0.000000, 3.440096, 0.252160], +[0.257088, 869.053955, 0.000000, 3.446432, 0.255552], +[0.221184, 869.248901, 0.000000, 3.438816, 0.252544], +[0.296160, 869.316956, 0.000000, 3.445440, 0.251392], +[0.326528, 869.232605, 0.000000, 3.440448, 0.255168], +[0.398048, 869.016174, 
0.000000, 3.444032, 0.253504], +[0.470176, 869.163818, 0.000000, 3.437984, 0.253152], +[0.244544, 869.090820, 0.000000, 3.445248, 0.251328], +[0.507328, 869.353027, 0.000000, 3.440672, 0.251872], +[0.724928, 869.088562, 0.000000, 3.444768, 0.253440], +[0.247552, 869.197815, 0.000000, 3.436768, 0.252352], +[0.389824, 869.120911, 0.000000, 3.437536, 0.250912], +[0.467136, 869.173096, 0.000000, 3.447200, 0.252768], +[0.660992, 868.958923, 0.000000, 3.448800, 0.252384], +[0.444160, 869.207092, 0.000000, 3.442368, 0.251936], +[0.367200, 869.233765, 0.000000, 3.442880, 0.253248], +[0.307744, 869.647766, 0.000000, 3.442336, 0.252448], +[0.472928, 869.365356, 0.000000, 3.443936, 0.252192], +[0.449248, 869.312134, 0.000000, 3.442144, 0.252992], +[0.568896, 869.215515, 0.000000, 3.439808, 0.254464], +[0.223360, 869.474548, 0.000000, 3.446624, 0.253824], +[0.422176, 869.053894, 0.000000, 3.446112, 0.254304], +[0.248576, 869.274719, 0.000000, 3.436352, 0.251744], +[0.500192, 869.180725, 0.000000, 3.445120, 0.252576], +[0.465664, 869.346008, 0.000000, 3.441088, 0.252032], +[0.307552, 869.191345, 0.000000, 3.440640, 0.254752], +[0.422752, 869.463501, 0.000000, 3.439872, 0.253504], +[0.472000, 869.236145, 0.000000, 3.446400, 0.252896], +[0.267840, 869.141968, 0.000000, 3.447360, 0.252608], +[0.391008, 869.072083, 0.000000, 3.446240, 0.251968], +[0.403168, 869.467285, 0.000000, 3.438528, 0.252320], +[0.661120, 869.138489, 0.000000, 3.449440, 0.253696], +[0.437984, 869.180115, 0.000000, 3.442688, 0.253184], +[0.399968, 869.233276, 0.000000, 3.447840, 0.254304], +[0.301600, 869.605347, 0.000000, 3.446144, 0.252416], +[0.681792, 869.258362, 0.000000, 3.445792, 0.254592], +[0.362496, 869.286743, 0.000000, 3.441760, 0.253088]] + + + +# averages: +pp1 = np.sum(np.array(p1), axis = 0) / len(p1) +pp2 = np.sum(np.array(p2), axis = 0) / len(p2) +pp3 = np.sum(np.array(p3), axis = 0) / len(p3) +pp4 = np.sum(np.array(p4), axis = 0) / len(p4) +pp5 = np.sum(np.array(p5), axis = 0) / len(p5) +pp6 
= np.sum(np.array(p6), axis = 0) / len(p6) +pp7 = np.sum(np.array(p7), axis = 0) / len(p7) +pp8 = np.sum(np.array(p8), axis = 0) / len(p8) + +print(pp1) +print(pp2) +print(pp3) +print(pp4) +print(pp5) +print(pp6) +print(pp7) +print(pp8) + + + + + +# Data to plot +labels = ['vertexTandA', 'scanline', 'aa', 'render', 'downsample'] +colors = ['#005511', '#002244', '#004477', '#118899', '#990000'] +explode = (0.1, 0.5, 0.2, 0.1, 0.3) + + +P = [pp1, pp2, pp3, pp4, pp5, pp6, pp7, pp8] +titles = ['optimized\nfunctions time percentage', + 'aabb only\nfunctions time percentage', + 'backface cull\nfunctions time percentage', + 'no opt\nfunctions time percentage', + 'MSAA 4x opt\nfunctions time percentage', + 'MSAA4x no opt\nfunctions time percentage', + 'SSAA opt\nfunctions time percentage', + 'SSAA no opt\nfunctions time percentage'] + + +for i in range(8): + sizes = P[i] + + percent = 100.*P[i]/P[i].sum() + + print(percent) + + # Plot + fig = plt.figure(facecolor='black') + #fig.patch.set_alpha(1.0) + fig.patch.set_facecolor('black') + plt.rcParams['text.color'] = 'gray' + plt.rcParams['axes.facecolor'] = 'black' + plt.rcParams['lines.linewidth'] = 4 + + ax = plt.subplot(111, facecolor='black') + + pie = ax.pie(sizes, explode=explode, colors=colors, textprops = {'color':'#aaaaaa', 'fontweight':'bold'}, + autopct='%1.1f%%', shadow=True, startangle=140) + + + + ax.set_ylabel('Test Benchmark Averages', color = '#338888', fontweight='bold') + ax.set_title(titles[i]) + ax.patch.set_facecolor('black') + ax.axis('equal') + + percentlabels = list(labels) + for j in range(len(percentlabels)): + percentlabels[j] += (' %.1f' % percent[j])+'''%''' + + plt.legend(pie[0], percentlabels, loc='best') + + plt.savefig('piechart_%d.png' % i, bbox_inches='tight', facecolor='black') + #plt.show() + + + + + +fig = plt.figure(facecolor='black') +#fig.patch.set_alpha(1.0) +fig.patch.set_facecolor('black') +plt.rcParams['text.color'] = 'gray' +plt.rcParams['axes.facecolor'] = 'black' 
+plt.rcParams['lines.linewidth'] = 4 + +N = 8 +t1 = [pp1[0], pp2[0], pp3[0], pp4[0], pp5[0], pp6[0], pp7[0], pp8[0]] +t2 = [pp1[1], pp2[1], pp3[1], pp4[1], pp5[1], pp6[1], pp7[1], pp8[1]] +t3 = [pp1[2], pp2[2], pp3[2], pp4[2], pp5[2], pp6[2], pp7[2], pp8[2]] +t4 = [pp1[3], pp2[3], pp3[3], pp4[3], pp5[3], pp6[3], pp7[3], pp8[3]] +t5 = [pp1[4], pp2[4], pp3[4], pp4[4], pp5[4], pp6[4], pp7[4], pp8[4]] + +ind = np.arange(N) # the x locations for the groups +width = 0.8 # the width of the bars: can also be len(x) sequence + +# each segment sits on the cumulative sum of the segments below it +pl1 = plt.bar(ind, t1, width, color='#004466') +pl2 = plt.bar(ind, t2, width, color='#006699', bottom=t1 ) +pl3 = plt.bar(ind, t3, width, color='#11aa88', bottom=np.sum([t1, t2], axis=0) ) +pl4 = plt.bar(ind, t4, width, color='#ff5555', bottom=np.sum([t1, t2, t3], axis=0) ) +pl5 = plt.bar(ind, t5, width, color='#00ffaa', bottom=np.sum([t1, t2, t3, t4], axis=0) ) + +plt.ylabel('Time in ms', color = '#338888', fontweight='bold') +plt.title('Optimization and post processing benchmark') +plt.xticks(ind + width/2., ('optimized', + 'aabb\nonly', + 'backface\ncull', + 'no opt', + 'MSAA 4x\nopt', + 'MSAA4x\nno opt', + 'SSAA\nopt', + 'SSAA\nno opt'), + color = '#333333', + fontweight='bold') + +plt.yticks(np.arange(0, 1000, 100), color = '#333333', fontweight='bold') +plt.legend([pl1[0], pl2[0], pl3[0], pl4[0], pl5[0]], labels, loc=0) + +plt.savefig('boxchart.png', bbox_inches='tight', facecolor='black') +#plt.show() + + + + + +# no scanline + + + + + diff --git a/renders/boxchart.png b/renders/boxchart.png new file mode 100644 index 0000000..29baf31 Binary files /dev/null and b/renders/boxchart.png differ diff --git a/renders/duck_spec.png b/renders/duck_spec.png new file mode 100644 index 0000000..ee416a4 Binary files /dev/null and b/renders/duck_spec.png differ diff --git a/renders/engine.png b/renders/engine.png new file mode 100644 index 0000000..78eef82 
Binary files /dev/null and b/renders/engine.png differ diff --git a/renders/image.xcf b/renders/image.xcf new file mode 100644 index 0000000..021657f Binary files /dev/null and b/renders/image.xcf differ diff --git a/renders/image_0001.png b/renders/image_0001.png new file mode 100644 index 0000000..51f3034 Binary files /dev/null and b/renders/image_0001.png differ diff --git a/renders/image_0002.png b/renders/image_0002.png new file mode 100644 index 0000000..c675c4a Binary files /dev/null and b/renders/image_0002.png differ diff --git a/renders/image_0003.png b/renders/image_0003.png new file mode 100644 index 0000000..4986a60 Binary files /dev/null and b/renders/image_0003.png differ diff --git a/renders/image_0004.png b/renders/image_0004.png new file mode 100644 index 0000000..fbd4f1c Binary files /dev/null and b/renders/image_0004.png differ diff --git a/renders/image_0005.png b/renders/image_0005.png new file mode 100644 index 0000000..c271d18 Binary files /dev/null and b/renders/image_0005.png differ diff --git a/renders/image_0006.png b/renders/image_0006.png new file mode 100644 index 0000000..b5fa497 Binary files /dev/null and b/renders/image_0006.png differ diff --git a/renders/image_0007.png b/renders/image_0007.png new file mode 100644 index 0000000..9f7dcd5 Binary files /dev/null and b/renders/image_0007.png differ diff --git a/renders/image_0008.png b/renders/image_0008.png new file mode 100644 index 0000000..33452fb Binary files /dev/null and b/renders/image_0008.png differ diff --git a/renders/image_0009.png b/renders/image_0009.png new file mode 100644 index 0000000..c6fb46f Binary files /dev/null and b/renders/image_0009.png differ diff --git a/renders/image_0010.png b/renders/image_0010.png new file mode 100644 index 0000000..25cd1d4 Binary files /dev/null and b/renders/image_0010.png differ diff --git a/renders/image_0011.png b/renders/image_0011.png new file mode 100644 index 0000000..eee5878 Binary files /dev/null and 
b/renders/image_0011.png differ diff --git a/renders/image_0012.png b/renders/image_0012.png new file mode 100644 index 0000000..f227163 Binary files /dev/null and b/renders/image_0012.png differ diff --git a/renders/image_0013.png b/renders/image_0013.png new file mode 100644 index 0000000..ef6c2b9 Binary files /dev/null and b/renders/image_0013.png differ diff --git a/renders/image_0014.png b/renders/image_0014.png new file mode 100644 index 0000000..9984874 Binary files /dev/null and b/renders/image_0014.png differ diff --git a/renders/image_0015.png b/renders/image_0015.png new file mode 100644 index 0000000..f27b888 Binary files /dev/null and b/renders/image_0015.png differ diff --git a/renders/image_0016.png b/renders/image_0016.png new file mode 100644 index 0000000..deb63fa Binary files /dev/null and b/renders/image_0016.png differ diff --git a/renders/msaa.gif b/renders/msaa.gif new file mode 100644 index 0000000..fd3f05c Binary files /dev/null and b/renders/msaa.gif differ diff --git a/renders/msaa_tri.gif b/renders/msaa_tri.gif new file mode 100644 index 0000000..d35adcd Binary files /dev/null and b/renders/msaa_tri.gif differ diff --git a/renders/piechart_0.png b/renders/piechart_0.png new file mode 100644 index 0000000..848e3bb Binary files /dev/null and b/renders/piechart_0.png differ diff --git a/renders/piechart_1.png b/renders/piechart_1.png new file mode 100644 index 0000000..cc70e70 Binary files /dev/null and b/renders/piechart_1.png differ diff --git a/renders/piechart_2.png b/renders/piechart_2.png new file mode 100644 index 0000000..7f8f996 Binary files /dev/null and b/renders/piechart_2.png differ diff --git a/renders/piechart_3.png b/renders/piechart_3.png new file mode 100644 index 0000000..3d22d73 Binary files /dev/null and b/renders/piechart_3.png differ diff --git a/renders/piechart_4.png b/renders/piechart_4.png new file mode 100644 index 0000000..30ea772 Binary files /dev/null and b/renders/piechart_4.png differ diff --git 
a/renders/piechart_5.png b/renders/piechart_5.png new file mode 100644 index 0000000..1aae780 Binary files /dev/null and b/renders/piechart_5.png differ diff --git a/renders/piechart_6.png b/renders/piechart_6.png new file mode 100644 index 0000000..63e9fa2 Binary files /dev/null and b/renders/piechart_6.png differ diff --git a/renders/piechart_7.png b/renders/piechart_7.png new file mode 100644 index 0000000..d82b50d Binary files /dev/null and b/renders/piechart_7.png differ diff --git a/renders/points.png b/renders/points.png new file mode 100644 index 0000000..74fc185 Binary files /dev/null and b/renders/points.png differ diff --git a/renders/rasterizer.gif b/renders/rasterizer.gif new file mode 100644 index 0000000..a126de9 Binary files /dev/null and b/renders/rasterizer.gif differ diff --git a/renders/rendermode.gif b/renders/rendermode.gif new file mode 100644 index 0000000..dd89776 Binary files /dev/null and b/renders/rendermode.gif differ diff --git a/renders/supersample.gif b/renders/supersample.gif new file mode 100644 index 0000000..99d1438 Binary files /dev/null and b/renders/supersample.gif differ diff --git a/renders/triangle_test.png b/renders/triangle_test.png new file mode 100644 index 0000000..bef90ef Binary files /dev/null and b/renders/triangle_test.png differ diff --git a/renders/wireframe.png b/renders/wireframe.png new file mode 100644 index 0000000..53b1e5b Binary files /dev/null and b/renders/wireframe.png differ diff --git a/src/main.cpp b/src/main.cpp index a36b955..269fe3b 100644 --- a/src/main.cpp +++ b/src/main.cpp @@ -14,6 +14,16 @@ #define TINYGLTF_LOADER_IMPLEMENTATION #include + +static int displaymode = 0; +static bool perspectivecorrect = true; +static bool spec = true; +static bool antialias = false; +static bool supersample = false; +static bool culling = true; +static bool testingmode = false; +static bool aabbcheck = true; +static bool cheapculling = false; //------------------------------- //-------------MAIN-------------- 
//------------------------------- @@ -103,8 +113,14 @@ void runCuda() { // No data is moved (Win & Linux). When mapped to CUDA, OpenGL should not use this buffer dptr = NULL; - glm::mat4 P = glm::frustum(-scale * ((float)width) / ((float)height), - scale * ((float)width / (float)height), + if (!supersample) + { + cwidth = width; + cheight = height; + } + + glm::mat4 P = glm::frustum(-scale * ((float)cwidth) / ((float)cheight), + scale * ((float)cwidth / (float)cheight), -scale, scale, 1.0, 1000.0); glm::mat4 V = glm::mat4(1.0f); @@ -119,7 +135,10 @@ void runCuda() { glm::mat4 MVP = P * MV; cudaGLMapBufferObject((void **)&dptr, pbo); - rasterize(dptr, MVP, MV, MV_normal); + rasterize(dptr, MVP, MV, MV_normal, + displaymode, perspectivecorrect, + spec, antialias, supersample, + culling, testingmode, aabbcheck, cheapculling); cudaGLUnmapBufferObject(pbo); frame++; @@ -137,8 +156,6 @@ bool init(const tinygltf::Scene & scene) { return false; } - width = 800; - height = 800; window = glfwCreateWindow(width, height, "CIS 565 Pathtracer", NULL, NULL); if (!window) { glfwTerminate(); @@ -213,7 +230,13 @@ void initCuda() { // Use device with highest Gflops/s cudaGLSetGLDevice(0); - rasterizeInit(width, height); + if (!supersample) + { + cwidth = width; + cheight = height; + } + + rasterizeInit(cwidth, cheight, width, height); // Clean up on program exit atexit(cleanupCuda); @@ -323,12 +346,46 @@ void errorCallback(int error, const char *description) { fputs(description, stderr); } -void keyCallback(GLFWwindow *window, int key, int scancode, int action, int mods) { - if (key == GLFW_KEY_ESCAPE && action == GLFW_PRESS) { - glfwSetWindowShouldClose(window, GL_TRUE); +void keyCallback(GLFWwindow* window, int key, int scancode, int action, int mods) { + if (action == GLFW_REPEAT || action == GLFW_PRESS) { + if (key == GLFW_KEY_ESCAPE){ + glfwSetWindowShouldClose(window, GL_TRUE); + } + else if (key == GLFW_KEY_S){ + spec = !spec; + printf("\nspec = %d", spec); + } + else if (key 
== GLFW_KEY_W){ + displaymode = (displaymode + 1) % 3; + } + else if (key == GLFW_KEY_P){ + perspectivecorrect = !perspectivecorrect; + printf("\nperspective correct = %d", perspectivecorrect); + } + else if (key == GLFW_KEY_A){ + antialias = !antialias; + printf("\nanti aliasing = %d", antialias); + } + else if (key == GLFW_KEY_C){ + culling = !culling; + printf("\nculling = %d", culling); + } + else if (key == GLFW_KEY_T){ + testingmode = !testingmode; + printf("\ntesting mode = %d", testingmode); + } + else if (key == GLFW_KEY_B){ + aabbcheck = !aabbcheck; + printf("\nbounding box check = %d", aabbcheck); + } + else if (key == GLFW_KEY_X){ + cheapculling = !cheapculling; + printf("\ncheap culling = %d", cheapculling); + } + } } + //---------------------------- //----- util ----------------- //---------------------------- @@ -397,3 +454,4 @@ void mouseWheelCallback(GLFWwindow* window, double xoffset, double yoffset) const double s = 1.0; // sensitivity z_trans += (float)(s * yoffset); } + diff --git a/src/main.hpp b/src/main.hpp index 4816fa1..0e9cdc3 100644 --- a/src/main.hpp +++ b/src/main.hpp @@ -49,6 +49,9 @@ GLFWwindow *window; int width = 800; int height = 800; +int cwidth = 1600; +int cheight = 1600; + //------------------------------- //-------------MAIN-------------- diff --git a/src/rasterize.cu b/src/rasterize.cu index 4e3504b..f8f0040 100644 --- a/src/rasterize.cu +++ b/src/rasterize.cu @@ -1,10 +1,19 @@ -/** - * @file rasterize.cu - * @brief CUDA-accelerated rasterization pipeline. - * @authors Skeleton code: Yining Karl Li, Kai Ninomiya, Shuai Shao (Shrek) - * @date 2012-2016 - * @copyright University of Pennsylvania & STUDENT - */ +/**__global__ void clearVertexBuffer(int n, Fragment* dev_fragmentBuffer, glm::vec3 color) +{ +int index = (blockIdx.x * blockDim.x) + threadIdx.x; + +if (index < n) +{ +dev_fragmentBuffer[index].color = color; +} +} +* @file rasterize.cu +* @brief CUDA-accelerated rasterization pipeline. 
+* @authors Skeleton code: Yining Karl Li, Kai Ninomiya, Shuai Shao (Shrek) +* @date 2012-2016is +* @copyright University of Pennsylvania & STUDENT +*/ + #include #include @@ -17,102 +26,167 @@ #include "rasterize.h" #include #include +#include +#include +#include +#include +#include + +static glm::vec4 lightpos(500.0f, 500.0f, 500.0f, 1.0f); +static glm::vec4 lightcol(0.92f, 0.92f, 0.85f, 1.0f); +static int numlights = 4; +//namespace { + + typedef unsigned short VertexIndex; + typedef glm::vec3 VertexAttributePosition; + typedef glm::vec3 VertexAttributeNormal; + typedef glm::vec2 VertexAttributeTexcoord; + typedef unsigned char TextureData; + + typedef unsigned char BufferByte; + + enum PrimitiveType{ + Point = 1, + Line = 2, + Triangle = 3 + }; + + struct VertexOut { + glm::vec4 pos; + + // TODO: add new attributes to your VertexOut + // The attributes listed below might be useful, + // but always feel free to modify on your own + + glm::vec3 eyePos; // eye space position used for shading + glm::vec3 eyeNor; // eye space normal used for shading, cuz normal will go wrong after perspective transformation + //glm::vec3 col; + glm::vec2 texcoord0; + TextureData* dev_diffuseTex = NULL; + int texWidth, texHeight; + // ... 
+ }; + + struct Primitive { + PrimitiveType primitiveType = Triangle; // C++ 11 init + VertexOut v[3]; + }; + + struct Fragment { + glm::vec3 color; -namespace { - - typedef unsigned short VertexIndex; - typedef glm::vec3 VertexAttributePosition; - typedef glm::vec3 VertexAttributeNormal; - typedef glm::vec2 VertexAttributeTexcoord; - typedef unsigned char TextureData; - - typedef unsigned char BufferByte; - - enum PrimitiveType{ - Point = 1, - Line = 2, - Triangle = 3 - }; - - struct VertexOut { - glm::vec4 pos; - - // TODO: add new attributes to your VertexOut - // The attributes listed below might be useful, - // but always feel free to modify on your own - - glm::vec3 eyePos; // eye space position used for shading - glm::vec3 eyeNor; // eye space normal used for shading, cuz normal will go wrong after perspective transformation - // glm::vec3 col; - glm::vec2 texcoord0; - TextureData* dev_diffuseTex = NULL; - // int texWidth, texHeight; - // ... - }; - - struct Primitive { - PrimitiveType primitiveType = Triangle; // C++ 11 init - VertexOut v[3]; - }; - - struct Fragment { - glm::vec3 color; - - // TODO: add new attributes to your Fragment - // The attributes listed below might be useful, - // but always feel free to modify on your own - - // glm::vec3 eyePos; // eye space position used for shading - // glm::vec3 eyeNor; - // VertexAttributeTexcoord texcoord0; - // TextureData* dev_diffuseTex; - // ... - }; - - struct PrimitiveDevBufPointers { - int primitiveMode; //from tinygltfloader macro - PrimitiveType primitiveType; - int numPrimitives; - int numIndices; - int numVertices; - - // Vertex In, const after loaded - VertexIndex* dev_indices; - VertexAttributePosition* dev_position; - VertexAttributeNormal* dev_normal; - VertexAttributeTexcoord* dev_texcoord0; - - // Materials, add more attributes when needed - TextureData* dev_diffuseTex; - // TextureData* dev_specularTex; - // TextureData* dev_normalTex; - // ... 
- - // Vertex Out, vertex used for rasterization, this is changing every frame - VertexOut* dev_verticesOut; - - // TODO: add more attributes when needed - }; - -} + // TODO: add new attributes to your Fragment + // The attributes listed below might be useful, + // but always feel free to modify on your own + + glm::vec3 eyePos; // eye space position used for shading + glm::vec3 eyeNor; + // VertexAttributeTexcoord texcoord0; + TextureData* dev_diffuseTex; + float depth; + // ... + }; + + struct PrimitiveDevBufPointers { + int primitiveMode; //from tinygltfloader macro + PrimitiveType primitiveType; + int numPrimitives; + int numIndices; + int numVertices; + + // Vertex In, const after loaded + VertexIndex* dev_indices; + VertexAttributePosition* dev_position; + VertexAttributeNormal* dev_normal; + VertexAttributeTexcoord* dev_texcoord0; + + // Materials, add more attributes when needed + TextureData* dev_diffuseTex; + // TextureData* dev_specularTex; + // TextureData* dev_normalTex; + // ... + + // Vertex Out, vertex used for rasterization, this is changing every frame + VertexOut* dev_verticesOut; + + // TODO: add more attributes when needed + int txWidth; + int txHeight; + }; + +//} static std::map> mesh2PrimitivesMap; static int width = 0; static int height = 0; +static int cwidth = 0; +static int cheight = 0; static int totalNumPrimitives = 0; static Primitive *dev_primitives = NULL; +static Primitive *dev_primitivestmp = NULL; static Fragment *dev_fragmentBuffer = NULL; +static Fragment *dev_dsfragmentBuffer = NULL; static glm::vec3 *dev_framebuffer = NULL; +static glm::vec3 *dev_dsframebuffer = NULL; + +//static glm::vec4 *dev_lightspos = NULL; +//static glm::vec4 *dev_lightscol = NULL; + static int * dev_depth = NULL; // you might need this buffer when doing depth test -/** - * Kernel that writes the image to the OpenGL PBO directly. 
- */ -__global__ + +__host__ __device__ bool operator<(const Primitive &lhs, const Primitive &rhs) +{ + return (lhs.v[0].eyePos.z + lhs.v[1].eyePos.z + lhs.v[2].eyePos.z) > (rhs.v[0].eyePos.z + rhs.v[1].eyePos.z + rhs.v[2].eyePos.z); +} + + +struct is_backface +{ + __host__ __device__ + bool operator()(const Primitive p) + { + return (p.v[0].eyeNor.z < 0.0 && p.v[1].eyeNor.z < 0.0 && p.v[2].eyeNor.z < 0.0); + } +}; + +__global__ +void downsample(int w, int h, glm::vec3 *dev_framebuffer, glm::vec3 *dev_dsframebuffer, bool aa) { + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + + int index = x + (y * w); + + if (x < w - 1 && y < h - 1) + { + int index2 = ((x)* 2) + ((y)* 4 * (w)); + int index3 = x * 2 + ((y + 1) * 4 * w); + + if (aa) + { + dev_dsframebuffer[index] = + (dev_framebuffer[index2] + + dev_framebuffer[index2 + 1] + + dev_framebuffer[index3] + + dev_framebuffer[index3 + 1] + ) * 0.25f; + } + else + dev_dsframebuffer[index] = dev_framebuffer[index2]; + } +} + + + + +// +// Kernel that writes the image to the OpenGL PBO directly.
+// +__global__ void sendImageToPBO(uchar4 *pbo, int w, int h, glm::vec3 *image) { int x = (blockIdx.x * blockDim.x) + threadIdx.x; int y = (blockIdx.y * blockDim.y) + threadIdx.y; @@ -131,641 +205,1451 @@ void sendImageToPBO(uchar4 *pbo, int w, int h, glm::vec3 *image) { } } -/** -* Writes fragment colors to the framebuffer -*/ +// +// Writes fragment colors to the framebuffer +// __global__ -void render(int w, int h, Fragment *fragmentBuffer, glm::vec3 *framebuffer) { +void render(int w, int h, Fragment *fragmentBuffer, glm::vec3 *framebuffer, glm::vec4 lightposition, glm::vec4 lightcolor, bool spec) { int x = (blockIdx.x * blockDim.x) + threadIdx.x; int y = (blockIdx.y * blockDim.y) + threadIdx.y; int index = x + (y * w); if (x < w && y < h) { framebuffer[index] = fragmentBuffer[index].color; - - // TODO: add your fragment shader code here - + + + // TODO: add your fragment shader code here + glm::vec3 incidentvec = glm::normalize(fragmentBuffer[index].eyePos - glm::vec3(lightposition)); + + // gouraud + framebuffer[index] *= glm::dot(incidentvec, -fragmentBuffer[index].eyeNor) * glm::vec3(lightcolor); + //if (index == 2500) + // printf("incidentvec = %f %f %f\n", incidentvec.x, incidentvec.y, incidentvec.z); + + // spec + if (spec) + { + float speccontrib = 0.5; + int power = 2; + glm::vec3 incident = glm::normalize(fragmentBuffer[index].eyePos); + glm::vec3 ilight = glm::normalize(glm::vec3(lightposition) - fragmentBuffer[index].eyePos); + glm::vec3 rfl = glm::reflect(incident, fragmentBuffer[index].eyeNor); + framebuffer[index] += powf(glm::clamp(glm::dot(rfl, ilight), 0.0f, 1.0f), power) * speccontrib * glm::vec3(lightcolor); + } + } } -/** - * Called once at the beginning of the program to allocate memory. - */ -void rasterizeInit(int w, int h) { +// +// Called once at the beginning of the program to allocate memory. 
+// +void rasterizeInit(int w, int h, int cw, int ch) { width = w; height = h; - cudaFree(dev_fragmentBuffer); - cudaMalloc(&dev_fragmentBuffer, width * height * sizeof(Fragment)); - cudaMemset(dev_fragmentBuffer, 0, width * height * sizeof(Fragment)); + cwidth = cw; + cheight = ch; + cudaFree(dev_fragmentBuffer); + cudaMalloc(&dev_fragmentBuffer, width * height * sizeof(Fragment)); + cudaMemset(dev_fragmentBuffer, 0, width * height * sizeof(Fragment)); + cudaMalloc(&dev_dsfragmentBuffer, width * height * sizeof(Fragment)); + cudaMemset(dev_dsfragmentBuffer, 0, width * height * sizeof(Fragment)); cudaFree(dev_framebuffer); - cudaMalloc(&dev_framebuffer, width * height * sizeof(glm::vec3)); + cudaMalloc(&dev_framebuffer, width * height * sizeof(glm::vec3)); cudaMemset(dev_framebuffer, 0, width * height * sizeof(glm::vec3)); - - cudaFree(dev_depth); - cudaMalloc(&dev_depth, width * height * sizeof(int)); + cudaFree(dev_dsframebuffer); + cudaMalloc(&dev_dsframebuffer, cwidth * cheight * sizeof(glm::vec3)); + cudaMemset(dev_dsframebuffer, 0, cwidth * cheight * sizeof(glm::vec3)); + + //cudaFree(dev_lightspos); + //cudaMalloc(&dev_lightspos, numlights * sizeof(glm::vec4)); + //cudaMemset(dev_lightspos, 0, numlights * sizeof(glm::vec4)); + + //cudaFree(dev_lightscol); + //cudaMalloc(&dev_lightscol, numlights * sizeof(glm::vec4)); + //cudaMemset(dev_lightscol, 0, numlights * sizeof(glm::vec4)); + + cudaFree(dev_depth); + cudaMalloc(&dev_depth, width * height * sizeof(int)); - checkCUDAError("rasterizeInit"); + checkCUDAError("rasterizeInit"); } __global__ -void initDepth(int w, int h, int * depth) +void initDepth(int w, int h, Fragment* dev_fragmentbuffer) { - int x = (blockIdx.x * blockDim.x) + threadIdx.x; - int y = (blockIdx.y * blockDim.y) + threadIdx.y; - - if (x < w && y < h) - { - int index = x + (y * w); - depth[index] = INT_MAX; - } + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + + if (x < w && y < h) + { + 
int index = x + (y * w); + dev_fragmentbuffer[index].depth = 999999999.9f; + } } -/** -* kern function with support for stride to sometimes replace cudaMemcpy -* One thread is responsible for copying one component -*/ -__global__ +__global__ +void mergeframebuffers(int w, int h, Fragment* dev_out, Fragment* dev_in, float div) +{ + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + + if (x < w && y < h) + { + int index = x + (y * w); + dev_out[index].color = (dev_out[index].color + dev_in[index].color) * div; + } +} + +__global__ +void multframebuffer(int w, int h, Fragment* dev_in, float m) +{ + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + + if (x < w && y < h) + { + int index = x + (y * w); + dev_in[index].color = dev_in[index].color * m; + } +} + +__global__ +void accumframebuffers(int w, int h, Fragment* dev_dst, Fragment* dev_src) +{ + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + + if (x < w && y < h) + { + int index = x + (y * w); + dev_dst[index].color += dev_src[index].color; + //dev_dst[index].dev_diffuseTex = dev_src[index].dev_diffuseTex; + dev_dst[index].eyeNor = (dev_dst[index].eyeNor + dev_src[index].eyeNor)*0.5f; + dev_dst[index].eyePos = (dev_dst[index].eyePos + dev_src[index].eyePos)*0.5f; + dev_dst[index].depth = (dev_dst[index].depth + dev_src[index].depth)*0.5f; + } +} + + +// +// kern function with support for stride to sometimes replace cudaMemcpy +// One thread is responsible for copying one component +// +__global__ void _deviceBufferCopy(int N, BufferByte* dev_dst, const BufferByte* dev_src, int n, int byteStride, int byteOffset, int componentTypeByteSize) { - - // Attribute (vec3 position) - // component (3 * float) - // byte (4 * byte) - - // id of component - int i = (blockIdx.x * blockDim.x) + threadIdx.x; - - if (i < N) { - int count = i / n; - int offset = i - count * n; // 
which component of the attribute - - for (int j = 0; j < componentTypeByteSize; j++) { - - dev_dst[count * componentTypeByteSize * n - + offset * componentTypeByteSize - + j] - - = - - dev_src[byteOffset - + count * (byteStride == 0 ? componentTypeByteSize * n : byteStride) - + offset * componentTypeByteSize - + j]; - } - } - + // Attribute (vec3 position) + // component (3 * float) + // byte (4 * byte) + + // id of component + int i = (blockIdx.x * blockDim.x) + threadIdx.x; + + if (i < N) { + int count = i / n; + int offset = i - count * n; // which component of the attribute + + for (int j = 0; j < componentTypeByteSize; j++) { + + dev_dst[count * componentTypeByteSize * n + + offset * componentTypeByteSize + + j] + + = + + dev_src[byteOffset + + count * (byteStride == 0 ? componentTypeByteSize * n : byteStride) + + offset * componentTypeByteSize + + j]; + } + } } __global__ void _nodeMatrixTransform( - int numVertices, - VertexAttributePosition* position, - VertexAttributeNormal* normal, - glm::mat4 MV, glm::mat3 MV_normal) { - - // vertex id - int vid = (blockIdx.x * blockDim.x) + threadIdx.x; - if (vid < numVertices) { - position[vid] = glm::vec3(MV * glm::vec4(position[vid], 1.0f)); - normal[vid] = glm::normalize(MV_normal * normal[vid]); - } +int numVertices, +VertexAttributePosition* position, +VertexAttributeNormal* normal, +glm::mat4 MV, glm::mat3 MV_normal) { + + // vertex id + int vid = (blockIdx.x * blockDim.x) + threadIdx.x; + if (vid < numVertices) { + position[vid] = glm::vec3(MV * glm::vec4(position[vid], 1.0f)); + normal[vid] = glm::normalize(MV_normal * normal[vid]); + } } glm::mat4 getMatrixFromNodeMatrixVector(const tinygltf::Node & n) { - - glm::mat4 curMatrix(1.0); - - const std::vector &m = n.matrix; - if (m.size() > 0) { - // matrix, copy it - - for (int i = 0; i < 4; i++) { - for (int j = 0; j < 4; j++) { - curMatrix[i][j] = (float)m.at(4 * i + j); - } - } - } else { - // no matrix, use rotation, scale, translation - - if 
(n.translation.size() > 0) { - curMatrix[3][0] = n.translation[0]; - curMatrix[3][1] = n.translation[1]; - curMatrix[3][2] = n.translation[2]; - } - - if (n.rotation.size() > 0) { - glm::mat4 R; - glm::quat q; - q[0] = n.rotation[0]; - q[1] = n.rotation[1]; - q[2] = n.rotation[2]; - - R = glm::mat4_cast(q); - curMatrix = curMatrix * R; - } - - if (n.scale.size() > 0) { - curMatrix = curMatrix * glm::scale(glm::vec3(n.scale[0], n.scale[1], n.scale[2])); - } - } - - return curMatrix; + + glm::mat4 curMatrix(1.0); + + const std::vector &m = n.matrix; + if (m.size() > 0) { + // matrix, copy it + + for (int i = 0; i < 4; i++) { + for (int j = 0; j < 4; j++) { + curMatrix[i][j] = (float)m.at(4 * i + j); + } + } + } + else { + // no matrix, use rotation, scale, translation + + if (n.translation.size() > 0) { + curMatrix[3][0] = n.translation[0]; + curMatrix[3][1] = n.translation[1]; + curMatrix[3][2] = n.translation[2]; + } + + if (n.rotation.size() > 0) { + glm::mat4 R; + glm::quat q; + q[0] = n.rotation[0]; + q[1] = n.rotation[1]; + q[2] = n.rotation[2]; + + R = glm::mat4_cast(q); + curMatrix = curMatrix * R; + } + + if (n.scale.size() > 0) { + curMatrix = curMatrix * glm::scale(glm::vec3(n.scale[0], n.scale[1], n.scale[2])); + } + } + + return curMatrix; } -void traverseNode ( - std::map & n2m, - const tinygltf::Scene & scene, - const std::string & nodeString, - const glm::mat4 & parentMatrix - ) +void traverseNode( + std::map & n2m, + const tinygltf::Scene & scene, + const std::string & nodeString, + const glm::mat4 & parentMatrix + ) { - const tinygltf::Node & n = scene.nodes.at(nodeString); - glm::mat4 M = parentMatrix * getMatrixFromNodeMatrixVector(n); - n2m.insert(std::pair(nodeString, M)); + const tinygltf::Node & n = scene.nodes.at(nodeString); + glm::mat4 M = parentMatrix * getMatrixFromNodeMatrixVector(n); + n2m.insert(std::pair(nodeString, M)); - auto it = n.children.begin(); - auto itEnd = n.children.end(); + auto it = n.children.begin(); + auto itEnd = 
n.children.end(); - for (; it != itEnd; ++it) { - traverseNode(n2m, scene, *it, M); - } + for (; it != itEnd; ++it) { + traverseNode(n2m, scene, *it, M); + } } void rasterizeSetBuffers(const tinygltf::Scene & scene) { - totalNumPrimitives = 0; + totalNumPrimitives = 0; + + std::map bufferViewDevPointers; + + // 1. copy all `bufferViews` to device memory + { + std::map::const_iterator it( + scene.bufferViews.begin()); + std::map::const_iterator itEnd( + scene.bufferViews.end()); + + for (; it != itEnd; it++) { + const std::string key = it->first; + const tinygltf::BufferView &bufferView = it->second; + if (bufferView.target == 0) { + continue; // Unsupported bufferView. + } + + const tinygltf::Buffer &buffer = scene.buffers.at(bufferView.buffer); + + BufferByte* dev_bufferView; + cudaMalloc(&dev_bufferView, bufferView.byteLength); + cudaMemcpy(dev_bufferView, &buffer.data.front() + bufferView.byteOffset, bufferView.byteLength, cudaMemcpyHostToDevice); + + checkCUDAError("Set BufferView Device Mem"); + + bufferViewDevPointers.insert(std::make_pair(key, dev_bufferView)); + } + } - std::map bufferViewDevPointers; - // 1. copy all `bufferViews` to device memory - { - std::map::const_iterator it( - scene.bufferViews.begin()); - std::map::const_iterator itEnd( - scene.bufferViews.end()); - for (; it != itEnd; it++) { - const std::string key = it->first; - const tinygltf::BufferView &bufferView = it->second; - if (bufferView.target == 0) { - continue; // Unsupported bufferView. - } + // 2. 
for each mesh: + // for each primitive: + // build device buffer of indices, materail, and each attributes + // and store these pointers in a map + { + + std::map nodeString2Matrix; + auto rootNodeNamesList = scene.scenes.at(scene.defaultScene); + { + auto it = rootNodeNamesList.begin(); + auto itEnd = rootNodeNamesList.end(); + for (; it != itEnd; ++it) { + traverseNode(nodeString2Matrix, scene, *it, glm::mat4(1.0f)); + } + } + + + // parse through node to access mesh + + auto itNode = nodeString2Matrix.begin(); + auto itEndNode = nodeString2Matrix.end(); + for (; itNode != itEndNode; ++itNode) { + + const tinygltf::Node & N = scene.nodes.at(itNode->first); + const glm::mat4 & matrix = itNode->second; + const glm::mat3 & matrixNormal = glm::transpose(glm::inverse(glm::mat3(matrix))); + + auto itMeshName = N.meshes.begin(); + auto itEndMeshName = N.meshes.end(); + + for (; itMeshName != itEndMeshName; ++itMeshName) { + + const tinygltf::Mesh & mesh = scene.meshes.at(*itMeshName); + + auto res = mesh2PrimitivesMap.insert(std::pair>(mesh.name, std::vector())); + std::vector & primitiveVector = (res.first)->second; + + // for each primitive + for (size_t i = 0; i < mesh.primitives.size(); i++) { + const tinygltf::Primitive &primitive = mesh.primitives[i]; + + if (primitive.indices.empty()) + return; + + // TODO: add new attributes for your PrimitiveDevBufPointers when you add new attributes + VertexIndex* dev_indices = NULL; + VertexAttributePosition* dev_position = NULL; + VertexAttributeNormal* dev_normal = NULL; + VertexAttributeTexcoord* dev_texcoord0 = NULL; + + // ----------Indices------------- + + const tinygltf::Accessor &indexAccessor = scene.accessors.at(primitive.indices); + const tinygltf::BufferView &bufferView = scene.bufferViews.at(indexAccessor.bufferView); + BufferByte* dev_bufferView = bufferViewDevPointers.at(indexAccessor.bufferView); + + // assume type is SCALAR for indices + int n = 1; + int numIndices = indexAccessor.count; + int 
componentTypeByteSize = sizeof(VertexIndex); + int byteLength = numIndices * n * componentTypeByteSize; + + dim3 numThreadsPerBlock(32); + dim3 numBlocks((numIndices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); + cudaMalloc(&dev_indices, byteLength); + _deviceBufferCopy << > > ( + numIndices, + (BufferByte*)dev_indices, + dev_bufferView, + n, + indexAccessor.byteStride, + indexAccessor.byteOffset, + componentTypeByteSize); + + + checkCUDAError("Set Index Buffer"); + + + // ---------Primitive Info------- + + // Warning: LINE_STRIP is not supported in tinygltfloader + int numPrimitives; + PrimitiveType primitiveType; + switch (primitive.mode) { + case TINYGLTF_MODE_TRIANGLES: + primitiveType = PrimitiveType::Triangle; + numPrimitives = numIndices / 3; + break; + case TINYGLTF_MODE_TRIANGLE_STRIP: + primitiveType = PrimitiveType::Triangle; + numPrimitives = numIndices - 2; + break; + case TINYGLTF_MODE_TRIANGLE_FAN: + primitiveType = PrimitiveType::Triangle; + numPrimitives = numIndices - 2; + break; + case TINYGLTF_MODE_LINE: + primitiveType = PrimitiveType::Line; + numPrimitives = numIndices / 2; + break; + case TINYGLTF_MODE_LINE_LOOP: + primitiveType = PrimitiveType::Line; + numPrimitives = numIndices + 1; + break; + case TINYGLTF_MODE_POINTS: + primitiveType = PrimitiveType::Point; + numPrimitives = numIndices; + break; + default: + // output error + break; + }; + + + // ----------Attributes------------- + + auto it(primitive.attributes.begin()); + auto itEnd(primitive.attributes.end()); + + int numVertices = 0; + // for each attribute + for (; it != itEnd; it++) { + const tinygltf::Accessor &accessor = scene.accessors.at(it->second); + const tinygltf::BufferView &bufferView = scene.bufferViews.at(accessor.bufferView); + + int n = 1; + if (accessor.type == TINYGLTF_TYPE_SCALAR) { + n = 1; + } + else if (accessor.type == TINYGLTF_TYPE_VEC2) { + n = 2; + } + else if (accessor.type == TINYGLTF_TYPE_VEC3) { + n = 3; + } + else if (accessor.type == 
TINYGLTF_TYPE_VEC4) { + n = 4; + } + + BufferByte * dev_bufferView = bufferViewDevPointers.at(accessor.bufferView); + BufferByte ** dev_attribute = NULL; + + numVertices = accessor.count; + int componentTypeByteSize; + + // Note: since the type of our attribute array (dev_position) is static (float32) + // We assume the glTF model attribute type are 5126(FLOAT) here + + if (it->first.compare("POSITION") == 0) { + componentTypeByteSize = sizeof(VertexAttributePosition) / n; + dev_attribute = (BufferByte**)&dev_position; + } + else if (it->first.compare("NORMAL") == 0) { + componentTypeByteSize = sizeof(VertexAttributeNormal) / n; + dev_attribute = (BufferByte**)&dev_normal; + } + else if (it->first.compare("TEXCOORD_0") == 0) { + componentTypeByteSize = sizeof(VertexAttributeTexcoord) / n; + dev_attribute = (BufferByte**)&dev_texcoord0; + } + + std::cout << accessor.bufferView << " - " << it->second << " - " << it->first << '\n'; + + dim3 numThreadsPerBlock(32); + dim3 numBlocks((n * numVertices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); + int byteLength = numVertices * n * componentTypeByteSize; + cudaMalloc(dev_attribute, byteLength); + + _deviceBufferCopy << > > ( + n * numVertices, + *dev_attribute, + dev_bufferView, + n, + accessor.byteStride, + accessor.byteOffset, + componentTypeByteSize); + + std::string msg = "Set Attribute Buffer: " + it->first; + checkCUDAError(msg.c_str()); + } + + // malloc for VertexOut + VertexOut* dev_vertexOut; + cudaMalloc(&dev_vertexOut, numVertices * sizeof(VertexOut)); + checkCUDAError("Malloc VertexOut Buffer"); + + // ----------Materials------------- + + int width = 0; + int height = 0; + + // You can only worry about this part once you started to + // implement textures for your rasterizer + TextureData* dev_diffuseTex = NULL; + if (!primitive.material.empty()) { + const tinygltf::Material &mat = scene.materials.at(primitive.material); + printf("material.name = %s\n", mat.name.c_str()); + + if 
(mat.values.find("diffuse") != mat.values.end()) { + std::string diffuseTexName = mat.values.at("diffuse").string_value; + if (scene.textures.find(diffuseTexName) != scene.textures.end()) { + const tinygltf::Texture &tex = scene.textures.at(diffuseTexName); + if (scene.images.find(tex.source) != scene.images.end()) { + const tinygltf::Image &image = scene.images.at(tex.source); + + size_t s = image.image.size() * sizeof(TextureData); + cudaMalloc(&dev_diffuseTex, s); + cudaMemcpy(dev_diffuseTex, &image.image.at(0), s, cudaMemcpyHostToDevice); + + // TODO: store the image size to your PrimitiveDevBufPointers + // image.width; + // image.height; + width = image.width; + height = image.height; + + checkCUDAError("Set Texture Image data"); + } + } + } + + // TODO: write your code for other materails + // You may have to take a look at tinygltfloader + // You can also use the above code loading diffuse material as a start point + } + + + // ---------Node hierarchy transform-------- + cudaDeviceSynchronize(); + + dim3 numBlocksNodeTransform((numVertices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); + _nodeMatrixTransform << > > ( + numVertices, + dev_position, + dev_normal, + matrix, + matrixNormal); + + checkCUDAError("Node hierarchy transformation"); + + // at the end of the for loop of primitive + // push dev pointers to map + primitiveVector.push_back(PrimitiveDevBufPointers{ + primitive.mode, + primitiveType, + numPrimitives, + numIndices, + numVertices, + + dev_indices, + dev_position, + dev_normal, + dev_texcoord0, + + dev_diffuseTex, + + dev_vertexOut, //VertexOut + + width, + height, + }); - const tinygltf::Buffer &buffer = scene.buffers.at(bufferView.buffer); + totalNumPrimitives += numPrimitives; - BufferByte* dev_bufferView; - cudaMalloc(&dev_bufferView, bufferView.byteLength); - cudaMemcpy(dev_bufferView, &buffer.data.front() + bufferView.byteOffset, bufferView.byteLength, cudaMemcpyHostToDevice); + } // for each primitive + + } // for each mesh + + } 
// for each node + + } - checkCUDAError("Set BufferView Device Mem"); - bufferViewDevPointers.insert(std::make_pair(key, dev_bufferView)); + // 3. Malloc for dev_primitives + { + cudaMalloc(&dev_primitives, totalNumPrimitives * sizeof(Primitive)); + cudaMalloc(&dev_primitivestmp, totalNumPrimitives * sizeof(Primitive)); + } - } - } + // Finally, cudaFree raw dev_bufferViews + { + std::map::const_iterator it(bufferViewDevPointers.begin()); + std::map::const_iterator itEnd(bufferViewDevPointers.end()); - // 2. for each mesh: - // for each primitive: - // build device buffer of indices, materail, and each attributes - // and store these pointers in a map - { - - std::map nodeString2Matrix; - auto rootNodeNamesList = scene.scenes.at(scene.defaultScene); - - { - auto it = rootNodeNamesList.begin(); - auto itEnd = rootNodeNamesList.end(); - for (; it != itEnd; ++it) { - traverseNode(nodeString2Matrix, scene, *it, glm::mat4(1.0f)); - } - } + //bufferViewDevPointers + for (; it != itEnd; it++) { + cudaFree(it->second); + } - // parse through node to access mesh - - auto itNode = nodeString2Matrix.begin(); - auto itEndNode = nodeString2Matrix.end(); - for (; itNode != itEndNode; ++itNode) { - - const tinygltf::Node & N = scene.nodes.at(itNode->first); - const glm::mat4 & matrix = itNode->second; - const glm::mat3 & matrixNormal = glm::transpose(glm::inverse(glm::mat3(matrix))); - - auto itMeshName = N.meshes.begin(); - auto itEndMeshName = N.meshes.end(); - - for (; itMeshName != itEndMeshName; ++itMeshName) { - - const tinygltf::Mesh & mesh = scene.meshes.at(*itMeshName); - - auto res = mesh2PrimitivesMap.insert(std::pair>(mesh.name, std::vector())); - std::vector & primitiveVector = (res.first)->second; - - // for each primitive - for (size_t i = 0; i < mesh.primitives.size(); i++) { - const tinygltf::Primitive &primitive = mesh.primitives[i]; - - if (primitive.indices.empty()) - return; - - // TODO: add new attributes for your PrimitiveDevBufPointers when you add new 
attributes - VertexIndex* dev_indices; - VertexAttributePosition* dev_position; - VertexAttributeNormal* dev_normal; - VertexAttributeTexcoord* dev_texcoord0; - - // ----------Indices------------- - - const tinygltf::Accessor &indexAccessor = scene.accessors.at(primitive.indices); - const tinygltf::BufferView &bufferView = scene.bufferViews.at(indexAccessor.bufferView); - BufferByte* dev_bufferView = bufferViewDevPointers.at(indexAccessor.bufferView); - - // assume type is SCALAR for indices - int n = 1; - int numIndices = indexAccessor.count; - int componentTypeByteSize = sizeof(VertexIndex); - int byteLength = numIndices * n * componentTypeByteSize; - - dim3 numThreadsPerBlock(128); - dim3 numBlocks((numIndices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); - cudaMalloc(&dev_indices, byteLength); - _deviceBufferCopy << > > ( - numIndices, - (BufferByte*)dev_indices, - dev_bufferView, - n, - indexAccessor.byteStride, - indexAccessor.byteOffset, - componentTypeByteSize); - - - checkCUDAError("Set Index Buffer"); - - - // ---------Primitive Info------- - - // Warning: LINE_STRIP is not supported in tinygltfloader - int numPrimitives; - PrimitiveType primitiveType; - switch (primitive.mode) { - case TINYGLTF_MODE_TRIANGLES: - primitiveType = PrimitiveType::Triangle; - numPrimitives = numIndices / 3; - break; - case TINYGLTF_MODE_TRIANGLE_STRIP: - primitiveType = PrimitiveType::Triangle; - numPrimitives = numIndices - 2; - break; - case TINYGLTF_MODE_TRIANGLE_FAN: - primitiveType = PrimitiveType::Triangle; - numPrimitives = numIndices - 2; - break; - case TINYGLTF_MODE_LINE: - primitiveType = PrimitiveType::Line; - numPrimitives = numIndices / 2; - break; - case TINYGLTF_MODE_LINE_LOOP: - primitiveType = PrimitiveType::Line; - numPrimitives = numIndices + 1; - break; - case TINYGLTF_MODE_POINTS: - primitiveType = PrimitiveType::Point; - numPrimitives = numIndices; - break; - default: - // output error - break; - }; - - - // ----------Attributes------------- - - 
auto it(primitive.attributes.begin()); - auto itEnd(primitive.attributes.end()); - - int numVertices = 0; - // for each attribute - for (; it != itEnd; it++) { - const tinygltf::Accessor &accessor = scene.accessors.at(it->second); - const tinygltf::BufferView &bufferView = scene.bufferViews.at(accessor.bufferView); - - int n = 1; - if (accessor.type == TINYGLTF_TYPE_SCALAR) { - n = 1; - } - else if (accessor.type == TINYGLTF_TYPE_VEC2) { - n = 2; - } - else if (accessor.type == TINYGLTF_TYPE_VEC3) { - n = 3; - } - else if (accessor.type == TINYGLTF_TYPE_VEC4) { - n = 4; - } - - BufferByte * dev_bufferView = bufferViewDevPointers.at(accessor.bufferView); - BufferByte ** dev_attribute = NULL; - - numVertices = accessor.count; - int componentTypeByteSize; - - // Note: since the type of our attribute array (dev_position) is static (float32) - // We assume the glTF model attribute type are 5126(FLOAT) here - - if (it->first.compare("POSITION") == 0) { - componentTypeByteSize = sizeof(VertexAttributePosition) / n; - dev_attribute = (BufferByte**)&dev_position; - } - else if (it->first.compare("NORMAL") == 0) { - componentTypeByteSize = sizeof(VertexAttributeNormal) / n; - dev_attribute = (BufferByte**)&dev_normal; - } - else if (it->first.compare("TEXCOORD_0") == 0) { - componentTypeByteSize = sizeof(VertexAttributeTexcoord) / n; - dev_attribute = (BufferByte**)&dev_texcoord0; - } - - std::cout << accessor.bufferView << " - " << it->second << " - " << it->first << '\n'; - - dim3 numThreadsPerBlock(128); - dim3 numBlocks((n * numVertices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); - int byteLength = numVertices * n * componentTypeByteSize; - cudaMalloc(dev_attribute, byteLength); - - _deviceBufferCopy << > > ( - n * numVertices, - *dev_attribute, - dev_bufferView, - n, - accessor.byteStride, - accessor.byteOffset, - componentTypeByteSize); - - std::string msg = "Set Attribute Buffer: " + it->first; - checkCUDAError(msg.c_str()); - } - - // malloc for VertexOut - 
VertexOut* dev_vertexOut; - cudaMalloc(&dev_vertexOut, numVertices * sizeof(VertexOut)); - checkCUDAError("Malloc VertexOut Buffer"); - - // ----------Materials------------- - - // You can only worry about this part once you started to - // implement textures for your rasterizer - TextureData* dev_diffuseTex = NULL; - if (!primitive.material.empty()) { - const tinygltf::Material &mat = scene.materials.at(primitive.material); - printf("material.name = %s\n", mat.name.c_str()); - - if (mat.values.find("diffuse") != mat.values.end()) { - std::string diffuseTexName = mat.values.at("diffuse").string_value; - if (scene.textures.find(diffuseTexName) != scene.textures.end()) { - const tinygltf::Texture &tex = scene.textures.at(diffuseTexName); - if (scene.images.find(tex.source) != scene.images.end()) { - const tinygltf::Image &image = scene.images.at(tex.source); - - size_t s = image.image.size() * sizeof(TextureData); - cudaMalloc(&dev_diffuseTex, s); - cudaMemcpy(dev_diffuseTex, &image.image.at(0), s, cudaMemcpyHostToDevice); - - // TODO: store the image size to your PrimitiveDevBufPointers - // image.width; - // image.height; - - checkCUDAError("Set Texture Image data"); - } - } - } - - // TODO: write your code for other materails - // You may have to take a look at tinygltfloader - // You can also use the above code loading diffuse material as a start point - } - - - // ---------Node hierarchy transform-------- - cudaDeviceSynchronize(); - - dim3 numBlocksNodeTransform((numVertices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); - _nodeMatrixTransform << > > ( - numVertices, - dev_position, - dev_normal, - matrix, - matrixNormal); - - checkCUDAError("Node hierarchy transformation"); - - // at the end of the for loop of primitive - // push dev pointers to map - primitiveVector.push_back(PrimitiveDevBufPointers{ - primitive.mode, - primitiveType, - numPrimitives, - numIndices, - numVertices, - - dev_indices, - dev_position, - dev_normal, - dev_texcoord0, - - 
dev_diffuseTex, - - dev_vertexOut //VertexOut - }); - - totalNumPrimitives += numPrimitives; - - } // for each primitive - - } // for each mesh - - } // for each node - - } - - - // 3. Malloc for dev_primitives - { - cudaMalloc(&dev_primitives, totalNumPrimitives * sizeof(Primitive)); - } - - - // Finally, cudaFree raw dev_bufferViews - { - - std::map::const_iterator it(bufferViewDevPointers.begin()); - std::map::const_iterator itEnd(bufferViewDevPointers.end()); - - //bufferViewDevPointers - - for (; it != itEnd; it++) { - cudaFree(it->second); - } - - checkCUDAError("Free BufferView Device Mem"); - } + checkCUDAError("Free BufferView Device Mem"); + } } -__global__ +__global__ void _vertexTransformAndAssembly( - int numVertices, - PrimitiveDevBufPointers primitive, - glm::mat4 MVP, glm::mat4 MV, glm::mat3 MV_normal, - int width, int height) { - - // vertex id - int vid = (blockIdx.x * blockDim.x) + threadIdx.x; - if (vid < numVertices) { - - // TODO: Apply vertex transformation here - // Multiply the MVP matrix for each vertex position, this will transform everything into clipping space - // Then divide the pos by its w element to transform into NDC space - // Finally transform x and y to viewport space - - // TODO: Apply vertex assembly here - // Assemble all attribute arraies into the primitive array - - } +int numVertices, +PrimitiveDevBufPointers primitive, +glm::mat4 MVP, glm::mat4 MV, glm::mat3 MV_normal, +int width, int height) { + + // vertex id + int vid = (blockIdx.x * blockDim.x) + threadIdx.x; + if (vid < numVertices) { + + // TODO: Apply vertex transformation here + // Multiply the MVP matrix for each vertex position, this will transform everything into clipping space + // Then divide the pos by its w element to transform into NDC space + // Finally transform x and y to viewport space + + + glm::vec4 p = MVP * glm::vec4(primitive.dev_position[vid], 1.0f); + glm::vec4 ep = MV * glm::vec4(primitive.dev_position[vid], 1.0f); + + //if (vid % 8 == 0) + // 
printf("\n[%f %f] [%f %f] [%f %f]", p.x, p.y, p.x, p.y, p.x, p.y); + + p /= p.w; + + p.x = 0.5 * width * ((-p.x) * p.w + 1.0f); + p.y = 0.5 * height * ((-p.y) * p.w + 1.0f); + + // TODO: Apply vertex assembly here + // Assemble all attribute arraies into the primitive array + + primitive.dev_verticesOut[vid].pos = p; + + + glm::vec3 n = MV_normal * primitive.dev_normal[vid]; + primitive.dev_verticesOut[vid].eyeNor = n; + primitive.dev_verticesOut[vid].eyePos = glm::vec3(ep); + //primitive.dev_verticesOut[vid].col = primitive.col; + + if (primitive.dev_texcoord0 != NULL) + { + primitive.dev_verticesOut[vid].texcoord0 = primitive.dev_texcoord0[vid]; + primitive.dev_verticesOut[vid].dev_diffuseTex = primitive.dev_diffuseTex; + } + } } +__global__ void clearVertexBuffer(int n, Fragment* dev_fragmentBuffer, glm::vec3 color) +{ + int index = (blockIdx.x * blockDim.x) + threadIdx.x; + if (index < n) + { + dev_fragmentBuffer[index].color = color; + dev_fragmentBuffer[index].depth = 999999999.9; + } +} static int curPrimitiveBeginId = 0; -__global__ +__global__ void _primitiveAssembly(int numIndices, int curPrimitiveBeginId, Primitive* dev_primitives, PrimitiveDevBufPointers primitive) { - // index id - int iid = (blockIdx.x * blockDim.x) + threadIdx.x; + // index id + int iid = (blockIdx.x * blockDim.x) + threadIdx.x; + + if (iid < numIndices) { + + // TODO: uncomment the following code for a start + // This is primitive assembly for triangles + + int pid; // id for cur primitives vector + if (primitive.primitiveMode == TINYGLTF_MODE_TRIANGLES) { + pid = iid / (int)primitive.primitiveType; + dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType] + = primitive.dev_verticesOut[primitive.dev_indices[iid]]; + } - if (iid < numIndices) { - // TODO: uncomment the following code for a start - // This is primitive assembly for triangles + // TODO: other primitive types (point, line) + // processed in scanline function + } + +} - //int pid; // id for cur 
primitives vector - //if (primitive.primitiveMode == TINYGLTF_MODE_TRIANGLES) { - // pid = iid / (int)primitive.primitiveType; - // dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType] - // = primitive.dev_verticesOut[primitive.dev_indices[iid]]; - //} +__global__ +void scanline(int w, int h, Fragment* dev_fragBuffer, int numidx, int idbegin, +Primitive* dev_primitives, PrimitiveDevBufPointers primitive, +int mode, bool perspectivecorrect, float xoffset, float yoffset, bool aabbcheck, bool cheapculling) +{ + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + + if (x < w && y < h) + { + int index = x + (y * w); + //printf("\nnum idx = %d", primitive.numPrimitives); + //primitive.dev_verticesOut[primitive.dev_indices[index]] + + //printf("\n1"); + glm::vec2 coords(x + xoffset, y + yoffset); + + float depth = dev_fragBuffer[index].depth; + for (int i = 0; i < numidx; i ++) + { + //hit = false; + + glm::vec3 p[] = { glm::vec3(dev_primitives[i].v[0].pos), + glm::vec3(dev_primitives[i].v[1].pos), + glm::vec3(dev_primitives[i].v[2].pos) }; + + + if (aabbcheck) + { + // bounding box check + float eps = 0.2f; + float minx = fminf(p[0].x - eps, fminf(p[1].x - eps, p[2].x - eps)); + float maxx = fmaxf(p[0].x + eps, fmaxf(p[1].x + eps, p[2].x + eps)); + float miny = fminf(p[0].y - eps, fminf(p[1].y - eps, p[2].y - eps)); + float maxy = fmaxf(p[0].y + eps, fmaxf(p[1].y + eps, p[2].y + eps)); + + if (minx > x || maxx < x || miny > y || maxy < y) + continue; + // bounding box end + } + /* + // remove backfaces without thrust + if (dev_primitives[i].v[0].eyeNor.z < 0.0f && + dev_primitives[i].v[1].eyeNor.z < 0.0f && + dev_primitives[i].v[2].eyeNor.z < 0.0f) + continue; + */ + + //printf("\npos[0] = %f %f %f", p[0].x, p[0].y, p[0].z); + + //printf("2"); + glm::vec3 bc = calculateBarycentricCoordinate(p, coords); + + //printf("\npos[0] = %f %f %f", bc.x, bc.y, bc.z); + //if (coords.y > 100.0f) + // 
printf("coords = %f %f\n", coords.x, coords.y); + if (bc.x < 0.0 || bc.x > 1.0 || + bc.y < 0.0 || bc.y > 1.0 || + bc.z < 0.0 || bc.z > 1.0) + { + continue; + } + else + { + //printf("\nHELLO WE ARE HERE!"); + if (mode == 0) // polygons + { + // get the position from barycenter + glm::vec3 pp = p[0] * bc.x + p[1] * bc.y + p[2] * bc.z; + + float tmp_depth = pp.z; + if (tmp_depth < depth) + { + glm::vec3 n[] = { dev_primitives[i].v[0].eyeNor, + dev_primitives[i].v[1].eyeNor, + dev_primitives[i].v[2].eyeNor }; + + + glm::vec3 e[] = { dev_primitives[i].v[0].eyePos, + dev_primitives[i].v[1].eyePos, + dev_primitives[i].v[2].eyePos }; + + + glm::vec3 ee = e[0] * bc.x + e[1] * bc.y + e[2] * bc.z; + + + glm::vec3 nn = n[0] * bc.x + n[1] * bc.y + n[2] * bc.z; + nn = glm::normalize(nn); + depth = tmp_depth; + //dev_fragBuffer[index].color = glm::vec3(1.0, 0.0, 0.0); + + + dev_fragBuffer[index].eyeNor = nn; + dev_fragBuffer[index].eyePos = ee; + + + // COLOR TEST -------------------------------------------- + if (dev_primitives[0].v[0].dev_diffuseTex == NULL) + { + dev_fragBuffer[index].color = glm::vec3(0.8, 0.8, 0.8); + dev_fragBuffer[index].depth = depth; + continue; + } + // COLOR TEST -------------------------------------------- + + float dt = glm::dot(glm::normalize(ee - pp), nn); + //dev_fragBuffer[index].color *= dt; + + + // texture test + + glm::vec3 uvs[] = { glm::vec3(dev_primitives[i].v[0].texcoord0, 0.0f), + glm::vec3(dev_primitives[i].v[1].texcoord0, 0.0f), + glm::vec3(dev_primitives[i].v[2].texcoord0, 0.0f) }; + + glm::vec3 uv(0.0f); + if (perspectivecorrect) + { + + float w = (1.0f / p[0].z) * bc[0] + (1.0f / p[1].z) * bc[1] + (1.0f / p[2].z) * bc[2]; + float u = ((uvs[0].x / p[0].z) * bc[0] + (uvs[1].x / p[1].z) * bc[1] + (uvs[2].x / p[2].z) * bc[2]) / w; + float v = ((uvs[0].y / p[0].z) * bc[0] + (uvs[1].y / p[1].z) * bc[1] + (uvs[2].y / p[2].z) * bc[2]) / w; + uv.x = u; + uv.y = v; + + /* + uvs[0] /= p[0].z; + uvs[1] /= p[1].z; + uvs[2] /= p[2].z; + + 
uvs[0].z = 1.0f / p[0].z; + uvs[1].z = 1.0f / p[1].z; + uvs[2].z = 1.0f / p[2].z; + */ + + } + else + { + uv = (uvs[0] * bc.x + uvs[1] * bc.y + uvs[2] * bc.z); + } + + /* + if (perspectivecorrect) + { + float zz = 1.0f / (uvs[0].z * bc.x + uvs[1].z * bc.y + uvs[2].z * bc.z); + uv *= zz; + } + */ + uv *= primitive.txHeight; + + int cix = ((int)uv.y * primitive.txHeight + (int)uv.x) * 3; + cix = cix % (primitive.txHeight * primitive.txWidth * 3); + + + //primitive.dev_verticesOut[primitive.dev_indices[i]].texcoord0; + unsigned char tx1 = dev_primitives[0].v[0].dev_diffuseTex[cix]; + unsigned char tx2 = dev_primitives[0].v[0].dev_diffuseTex[cix + 1]; + unsigned char tx3 = dev_primitives[0].v[0].dev_diffuseTex[cix + 2]; + + unsigned int red = tx1; + unsigned int green = tx2; + unsigned int blue = tx3; + + + //finalcolor = glm::vec3((float)red / 255.0, (float)green / 255.0, (float)blue / 255.0); + dev_fragBuffer[index].color = glm::vec3((float)red / 255.0, (float)green / 255.0, (float)blue / 255.0); + dev_fragBuffer[index].depth = depth; + } + } + else if (mode == 1 && (bc.x <= 0.04f || bc.y <= 0.04f || bc.z <= 0.04f)) + { + // get the position from barycenter + glm::vec3 pp = p[0] * bc.x + p[1] * bc.y + p[2] * bc.z; + + float tmp_depth = pp.z; + if (tmp_depth < depth) + { + glm::vec3 n[] = { dev_primitives[i].v[0].eyeNor, + dev_primitives[i].v[0].eyeNor, + dev_primitives[i].v[0].eyeNor }; + + + glm::vec3 e[] = { dev_primitives[i].v[0].eyePos, + dev_primitives[i].v[0].eyePos, + dev_primitives[i].v[0].eyePos }; + + glm::vec3 nn = n[0] * bc.x + n[1] * bc.y + n[2] * bc.z; + nn = glm::normalize(nn); + + glm::vec3 ee = e[0] * bc.x + e[1] * bc.y + e[2] * bc.z; + + depth = tmp_depth; + dev_fragBuffer[index].color = glm::vec3(1.0, 1.0, 1.0); + dev_fragBuffer[index].eyeNor = nn; + dev_fragBuffer[index].eyePos = ee; + + float dt = glm::dot(glm::normalize(ee - pp), nn); + dev_fragBuffer[index].color *= dt; + dev_fragBuffer[index].depth = depth; + } + } + + else if ((bc.x <= 0.04f 
&& bc.y <= 0.04f) || + (bc.x <= 0.04f && bc.z <= 0.04f) || + (bc.y <= 0.04f && bc.z <= 0.04f)) + { + // get the position from barycenter + glm::vec3 pp = p[0] * bc.x + p[1] * bc.y + p[2] * bc.z; + + float tmp_depth = pp.z; + if (tmp_depth < depth) + { + glm::vec3 n[] = { dev_primitives[i].v[0].eyeNor, + dev_primitives[i].v[0].eyeNor, + dev_primitives[i].v[0].eyeNor }; + + + glm::vec3 e[] = { dev_primitives[i].v[0].eyePos, + dev_primitives[i].v[0].eyePos, + dev_primitives[i].v[0].eyePos }; + + glm::vec3 nn = n[0] * bc.x + n[1] * bc.y + n[2] * bc.z; + nn = glm::normalize(nn); + + glm::vec3 ee = e[0] * bc.x + e[1] * bc.y + e[2] * bc.z; + + + depth = tmp_depth; + dev_fragBuffer[index].color = glm::vec3(1.0, 1.0, 1.0); + dev_fragBuffer[index].eyeNor = nn; + dev_fragBuffer[index].eyePos = ee; + + float dt = glm::dot(glm::normalize(ee - pp), nn); + dev_fragBuffer[index].color *= dt; + dev_fragBuffer[index].depth = depth; + + } + } + if (cheapculling) // not advised + break; // dangerous bold move used after sorting that assumes that no bigger polygons intersect + } + //printf("3"); + } + } +} + + +__global__ +void scanline_bak(int w, int h, Fragment* dev_fragBuffer, int numidx, int idbegin, +Primitive* dev_primitives, PrimitiveDevBufPointers primitive, +int mode, bool perspectivecorrect, float xoffset, float yoffset) +{ + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + if (x < w && y < h) + { + int index = x + (y * w); + //printf("\nnum idx = %d", primitive.numPrimitives); + //primitive.dev_verticesOut[primitive.dev_indices[index]] + + //printf("\n1"); + glm::vec2 coords(x + xoffset, y + yoffset); + + float depth = dev_fragBuffer[index].depth; + for (int i = 0; i < primitive.numIndices; i += 3) + { + //hit = false; + glm::vec3 p[] = { (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i]].pos, + (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i + 1]].pos, + 
(glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i + 2]].pos }; + + //printf("2"); + glm::vec3 bc = calculateBarycentricCoordinate(p, coords); + + if (bc.x < 0.0 || bc.x > 1.0 || + bc.y < 0.0 || bc.y > 1.0 || + bc.z < 0.0 || bc.z > 1.0) + { + continue; + } + else + { + if (mode == 0) + { + // get the position from barycenter + glm::vec3 pp = p[0] * bc.x + p[1] * bc.y + p[2] * bc.z; + + float tmp_depth = pp.z; + if (tmp_depth < depth) + { + glm::vec3 n[] = { (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i]].eyeNor, + (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i + 1]].eyeNor, + (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i + 2]].eyeNor }; + + + glm::vec3 e[] = { (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i]].eyePos, + (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i + 1]].eyePos, + (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i + 2]].eyePos }; + + glm::vec3 nn = n[0] * bc.x + n[1] * bc.y + n[2] * bc.z; + nn = glm::normalize(nn); + + glm::vec3 ee = e[0] * bc.x + e[1] * bc.y + e[2] * bc.z; + + + depth = tmp_depth; + //dev_fragBuffer[index].color = glm::vec3(1.0, 0.0, 0.0); + dev_fragBuffer[index].eyeNor = nn; + dev_fragBuffer[index].eyePos = ee; + + float dt = glm::dot(glm::normalize(ee - pp), nn); + //dev_fragBuffer[index].color *= dt; + + + // texture test + + glm::vec3 uvs[] = { glm::vec3(primitive.dev_texcoord0[primitive.dev_indices[i]], 0.0f), + glm::vec3(primitive.dev_texcoord0[primitive.dev_indices[i + 1]], 0.0f), + glm::vec3(primitive.dev_texcoord0[primitive.dev_indices[i + 2]], 0.0f) }; + + glm::vec3 uv(0.0f); + if (perspectivecorrect) + { + + float w = (1.0f / p[0].z) * bc[0] + (1.0f / p[1].z) * bc[1] + (1.0f / p[2].z) * bc[2]; + float u = ((uvs[0].x / p[0].z) * bc[0] + (uvs[1].x / p[1].z) * bc[1] + (uvs[2].x / p[2].z) * bc[2]) / w; + float v = ((uvs[0].y / p[0].z) * bc[0] + (uvs[1].y / p[1].z) * bc[1] + (uvs[2].y / p[2].z) * bc[2]) / w; + uv.x = u; + uv.y = v; + + 
/* + uvs[0] /= p[0].z; + uvs[1] /= p[1].z; + uvs[2] /= p[2].z; + + uvs[0].z = 1.0f / p[0].z; + uvs[1].z = 1.0f / p[1].z; + uvs[2].z = 1.0f / p[2].z; + */ + + } + else + { + uv = (uvs[0] * bc.x + uvs[1] * bc.y + uvs[2] * bc.z); + } + + /* + if (perspectivecorrect) + { + float zz = 1.0f / (uvs[0].z * bc.x + uvs[1].z * bc.y + uvs[2].z * bc.z); + uv *= zz; + } + */ + uv *= primitive.txHeight; + + int cix = ((int)uv.y * primitive.txHeight + (int)uv.x) * 3; + cix = cix % (primitive.txHeight * primitive.txWidth * 3); + + + //primitive.dev_verticesOut[primitive.dev_indices[i]].texcoord0; + unsigned char tx1 = primitive.dev_diffuseTex[cix]; + unsigned char tx2 = primitive.dev_diffuseTex[cix + 1]; + unsigned char tx3 = primitive.dev_diffuseTex[cix + 2]; + + unsigned int red = tx1; + unsigned int green = tx2; + unsigned int blue = tx3; + + + //finalcolor = glm::vec3((float)red / 255.0, (float)green / 255.0, (float)blue / 255.0); + dev_fragBuffer[index].color = glm::vec3((float)red / 255.0, (float)green / 255.0, (float)blue / 255.0); + dev_fragBuffer[index].depth = depth; + } + } + else if (mode == 1 && (bc.x <= 0.01 || bc.y <= 0.01 || bc.z <= 0.01)) + { + // get the position from barycenter + glm::vec3 pp = p[0] * bc.x + p[1] * bc.y + p[2] * bc.z; + + float tmp_depth = pp.z; + if (tmp_depth < depth) + { + glm::vec3 n[] = { (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i]].eyeNor, + (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i + 1]].eyeNor, + (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i + 2]].eyeNor }; + + + glm::vec3 e[] = { (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i]].eyePos, + (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i + 1]].eyePos, + (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i + 2]].eyePos }; + + glm::vec3 nn = n[0] * bc.x + n[1] * bc.y + n[2] * bc.z; + nn = glm::normalize(nn); - // TODO: other primitive types (point, line) - } - + glm::vec3 ee = e[0] * bc.x + e[1] * bc.y + 
e[2] * bc.z; + + depth = tmp_depth; + dev_fragBuffer[index].color = glm::vec3(1.0, 0.0, 0.0); + dev_fragBuffer[index].eyeNor = nn; + dev_fragBuffer[index].eyePos = ee; + + float dt = glm::dot(glm::normalize(ee - pp), nn); + dev_fragBuffer[index].color *= dt; + dev_fragBuffer[index].depth = depth; + + } + } + else if ((bc.x <= 0.02 && bc.y <= 0.02) || + (bc.x <= 0.02 && bc.z <= 0.02) || + (bc.y <= 0.02 && bc.z <= 0.02)) + { + // get the position from barycenter + glm::vec3 pp = p[0] * bc.x + p[1] * bc.y + p[2] * bc.z; + + float tmp_depth = pp.z; + if (tmp_depth < depth) + { + glm::vec3 n[] = { (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i]].eyeNor, + (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i + 1]].eyeNor, + (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i + 2]].eyeNor }; + + + glm::vec3 e[] = { (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i]].eyePos, + (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i + 1]].eyePos, + (glm::vec3)primitive.dev_verticesOut[primitive.dev_indices[i + 2]].eyePos }; + + glm::vec3 nn = n[0] * bc.x + n[1] * bc.y + n[2] * bc.z; + nn = glm::normalize(nn); + + glm::vec3 ee = e[0] * bc.x + e[1] * bc.y + e[2] * bc.z; + + + depth = tmp_depth; + dev_fragBuffer[index].color = glm::vec3(1.0, 0.0, 0.0); + dev_fragBuffer[index].eyeNor = nn; + dev_fragBuffer[index].eyePos = ee; + + //float dt = glm::dot(glm::normalize(ee - pp), nn); + //dev_fragBuffer[index].color *= dt; + dev_fragBuffer[index].depth = depth; + + } + } + } + //printf("3"); + } + } } -/** - * Perform rasterization. - */ -void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const glm::mat3 MV_normal) { +// +// Perform rasterization. 
+// +void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const glm::mat3 MV_normal, + int displaymode, bool perepectivecorrect, bool spec, bool aa, bool supersample, + bool culling, bool testingmode, bool aabbcheck, bool cheapculling) { int sideLength2d = 8; dim3 blockSize2d(sideLength2d, sideLength2d); - dim3 blockCount2d((width - 1) / blockSize2d.x + 1, - (height - 1) / blockSize2d.y + 1); - - // Execute your rasterization pipeline here - // (See README for rasterization pipeline outline.) - - // Vertex Process & primitive assembly - { - curPrimitiveBeginId = 0; - dim3 numThreadsPerBlock(128); - - auto it = mesh2PrimitivesMap.begin(); - auto itEnd = mesh2PrimitivesMap.end(); - - for (; it != itEnd; ++it) { - auto p = (it->second).begin(); // each primitive - auto pEnd = (it->second).end(); - for (; p != pEnd; ++p) { - dim3 numBlocksForVertices((p->numVertices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); - dim3 numBlocksForIndices((p->numIndices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); - - _vertexTransformAndAssembly << < numBlocksForVertices, numThreadsPerBlock >> >(p->numVertices, *p, MVP, MV, MV_normal, width, height); - checkCUDAError("Vertex Processing"); - cudaDeviceSynchronize(); - _primitiveAssembly << < numBlocksForIndices, numThreadsPerBlock >> > - (p->numIndices, - curPrimitiveBeginId, - dev_primitives, - *p); - checkCUDAError("Primitive Assembly"); - - curPrimitiveBeginId += p->numPrimitives; - } - } - - checkCUDAError("Vertex Processing and Primitive Assembly"); - } - - cudaMemset(dev_fragmentBuffer, 0, width * height * sizeof(Fragment)); - initDepth << > >(width, height, dev_depth); - - // TODO: rasterize + dim3 blockCount2d((width - 1) / blockSize2d.x + 1, + (height - 1) / blockSize2d.y + 1); + + // Execute your rasterization pipeline here + // (See README for rasterization pipeline outline.) 
+ + + + cudaEvent_t start_vertexTransformAndAssembly, stop_vertexTransformAndAssembly; + cudaEvent_t start_scanline, stop_scanline; + cudaEvent_t start_render, stop_render; + cudaEvent_t start_aa, stop_aa; + cudaEvent_t start_downsample, stop_downsample; + + float ms_vertexTransformAndAssembly = 0.0f; + float ms_scanline = 0.0f; + float ms_render = 0.0f; + float ms_aa = 0.0f; + float ms_downsample = 0.0f; + + float ms1 = 0.0f; + float ms2 = 0.0f; + float ms3 = 0.0f; + float ms4 = 0.0f; + float ms5 = 0.0f; + + + // update light transform + glm::vec4 lightposition = MV * lightpos; + lightposition /= lightposition.w; + + + if (testingmode) + { + cudaEventCreate(&start_vertexTransformAndAssembly); + cudaEventCreate(&stop_vertexTransformAndAssembly); + cudaEventRecord(start_vertexTransformAndAssembly); + } + + // Vertex Process & primitive assembly + { + curPrimitiveBeginId = 0; + dim3 numThreadsPerBlock(32); + + auto it = mesh2PrimitivesMap.begin(); + auto itEnd = mesh2PrimitivesMap.end(); + + for (; it != itEnd; ++it) { + auto p = (it->second).begin(); // each primitive + auto pEnd = (it->second).end(); + for (; p != pEnd; ++p) { + dim3 numBlocksForVertices((p->numVertices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); + dim3 numBlocksForIndices((p->numIndices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); + + _vertexTransformAndAssembly << < numBlocksForVertices, numThreadsPerBlock >> >(p->numVertices, *p, MVP, MV, MV_normal, width, height); + checkCUDAError("Vertex Processing"); + cudaDeviceSynchronize(); + _primitiveAssembly << < numBlocksForIndices, numThreadsPerBlock >> > + (p->numIndices, + curPrimitiveBeginId, + dev_primitives, + *p); + checkCUDAError("Primitive Assembly"); + + curPrimitiveBeginId += p->numPrimitives; + } + } + + checkCUDAError("Vertex Processing and Primitive Assembly"); + } + + if (testingmode) + { + cudaEventRecord(stop_vertexTransformAndAssembly); cudaEventSynchronize(stop_vertexTransformAndAssembly); + ms1 = 0; + 
cudaEventElapsedTime(&ms1, start_vertexTransformAndAssembly, stop_vertexTransformAndAssembly); + ms_vertexTransformAndAssembly = ms1; + cudaEventDestroy(start_vertexTransformAndAssembly); + cudaEventDestroy(stop_vertexTransformAndAssembly); + } + + cudaMemset(dev_fragmentBuffer, 0, width * height * sizeof(Fragment)); + initDepth << > >(width, height, dev_fragmentBuffer); + cudaMemset(dev_dsfragmentBuffer, 0, width * height * sizeof(Fragment)); + initDepth << > >(width, height, dev_dsfragmentBuffer); + + // TODO: rasterize + auto it = mesh2PrimitivesMap.begin(); + auto itEnd = mesh2PrimitivesMap.end(); + + //for (; it != itEnd; it++) { + auto p = (it->second).begin(); // each primitive + auto pEnd = (it->second).end(); + //for (; p != pEnd; ++p) { + + + cudaMemcpy(dev_primitivestmp, dev_primitives, totalNumPrimitives*sizeof(Primitive), cudaMemcpyDeviceToDevice); + int totalNumPrimitives_tmp = totalNumPrimitives; + if (culling) + { + // remove backfaces ------------------------------------------------- + thrust::device_ptr thrust_prims(dev_primitivestmp); + thrust::sort(thrust_prims, thrust_prims + totalNumPrimitives); + //thrust::device_ptr thrust_prims2(dev_primitivestmp); + thrust::device_ptr P = thrust::remove_if(thrust_prims, thrust_prims + totalNumPrimitives, is_backface()); + totalNumPrimitives_tmp = P - thrust_prims; + // remove backfaces ------------------------------------------------- + } + + + if (testingmode) + { + cudaEventCreate(&start_scanline); + cudaEventCreate(&stop_scanline); + cudaEventRecord(start_scanline); + } + + //printf("totalNumPrimitives = %d\n", totalNumPrimitives); + scanline << > >(width, height, + dev_fragmentBuffer, + totalNumPrimitives_tmp, //p->numIndices, + curPrimitiveBeginId, + dev_primitivestmp, + *p, + displaymode, + perepectivecorrect, 0, 0, + aabbcheck, cheapculling); + checkCUDAError("scanline"); + //accumframebuffers << > >(width, height, dev_fragmentBuffer, dev_dsfragmentBuffer); + //checkCUDAError("accumframebuffers"); + 
+ if (testingmode) + { + cudaEventRecord(stop_scanline); cudaEventSynchronize(stop_scanline); + ms2 = 0; + cudaEventElapsedTime(&ms2, start_scanline, stop_scanline); + ms_scanline = ms2; + cudaEventDestroy(start_scanline); + cudaEventDestroy(stop_scanline); + } + + // antialias + // shift + if (aa && !supersample) + { + if (testingmode) + { + cudaEventCreate(&start_aa); + cudaEventCreate(&stop_aa); + cudaEventRecord(start_aa); + } + + glm::vec3 black(0.0f); + clearVertexBuffer << > >(width * height, dev_dsfragmentBuffer, black); + checkCUDAError("clearVertexBuffer"); + + scanline << > >(width, height, + dev_dsfragmentBuffer, + totalNumPrimitives_tmp, //p->numIndices, + curPrimitiveBeginId, + dev_primitivestmp, + *p, + displaymode, + perepectivecorrect, 0.2, 0, + aabbcheck, cheapculling); + + checkCUDAError("Scanline"); + accumframebuffers << > >(width, height, dev_fragmentBuffer, dev_dsfragmentBuffer); + checkCUDAError("accumframebuffers"); + + clearVertexBuffer << > >(width * height, dev_dsfragmentBuffer, black); + checkCUDAError("clearVertexBuffer"); + scanline << > >(width, height, + dev_dsfragmentBuffer, + totalNumPrimitives_tmp, //p->numIndices, + curPrimitiveBeginId, + dev_primitivestmp, + *p, + displaymode, + perepectivecorrect, 0, 0.25, + aabbcheck, cheapculling); + + checkCUDAError("Scanline"); + accumframebuffers << > >(width, height, dev_fragmentBuffer, dev_dsfragmentBuffer); + checkCUDAError("accumframebuffers"); + + clearVertexBuffer << > >(width * height, dev_dsfragmentBuffer, black); + checkCUDAError("clearVertexBuffer"); + scanline << > >(width, height, + dev_dsfragmentBuffer, + totalNumPrimitives_tmp, //p->numIndices, + curPrimitiveBeginId, + dev_primitivestmp, + *p, + displaymode, + perepectivecorrect, 0.25, 0.2, + aabbcheck, cheapculling); + + checkCUDAError("Scanline"); + accumframebuffers << > >(width, height, dev_fragmentBuffer, dev_dsfragmentBuffer); + checkCUDAError("accumframebuffers"); + + + + multframebuffer << > >(width, height, 
dev_fragmentBuffer, 0.25f); + checkCUDAError("mergeframebuffer"); + //} + + //curPrimitiveBeginId += p->numPrimitives; + //} + + if (testingmode) + { + cudaEventRecord(stop_aa); cudaEventSynchronize(stop_aa); + ms3 = 0; + cudaEventElapsedTime(&ms3, start_aa, stop_aa); + ms_aa = ms3; + cudaEventDestroy(start_aa); + cudaEventDestroy(stop_aa); + } + } + + + if (testingmode) + { + cudaEventCreate(&start_render); + cudaEventCreate(&stop_render); + cudaEventRecord(start_render); + } // Copy depthbuffer colors into framebuffer - render << > >(width, height, dev_fragmentBuffer, dev_framebuffer); - checkCUDAError("fragment shader"); - // Copy framebuffer into OpenGL buffer for OpenGL previewing - sendImageToPBO<<>>(pbo, width, height, dev_framebuffer); - checkCUDAError("copy render result to pbo"); + render << > >(width, height, dev_fragmentBuffer, dev_framebuffer, lightposition, lightcol, spec); + checkCUDAError("fragment shader"); + + if (testingmode) + { + cudaEventRecord(stop_render); cudaEventSynchronize(stop_render); + ms4 = 0; + cudaEventElapsedTime(&ms4, start_render, stop_render); + ms_render = ms4; + cudaEventDestroy(start_render); + cudaEventDestroy(stop_render); + } + + + if (supersample) + { + if (testingmode) + { + cudaEventCreate(&start_downsample); + cudaEventCreate(&stop_downsample); + cudaEventRecord(start_downsample); + } + + downsample << > >(cwidth, cheight, dev_framebuffer, dev_dsframebuffer, aa); + checkCUDAError("downsample"); + + if (testingmode) + { + cudaEventRecord(stop_downsample); cudaEventSynchronize(stop_downsample); + ms5 = 0; + cudaEventElapsedTime(&ms5, start_downsample, stop_downsample); + ms_downsample = ms5; + cudaEventDestroy(start_downsample); + cudaEventDestroy(stop_downsample); + } + + // Copy framebuffer into OpenGL buffer for OpenGL previewing + sendImageToPBO << > >(pbo, cwidth, cheight, dev_dsframebuffer); + checkCUDAError("copy render result to pbo"); + } + else + { + // Copy framebuffer into OpenGL buffer for OpenGL previewing 
+ sendImageToPBO << > >(pbo, width, height, dev_framebuffer); + checkCUDAError("copy render result to pbo"); + } + + if (testingmode) + { + printf("[%f, %f, %f, %f, %f],\n", ms_vertexTransformAndAssembly, + ms_scanline, + ms_aa, + ms_render, + ms_downsample); + } } -/** - * Called once at the end of the program to free CUDA memory. - */ +// +// Called once at the end of the program to free CUDA memory. +// void rasterizeFree() { // deconstruct primitives attribute/indices device buffer - auto it(mesh2PrimitivesMap.begin()); - auto itEnd(mesh2PrimitivesMap.end()); - for (; it != itEnd; ++it) { - for (auto p = it->second.begin(); p != it->second.end(); ++p) { - cudaFree(p->dev_indices); - cudaFree(p->dev_position); - cudaFree(p->dev_normal); - cudaFree(p->dev_texcoord0); - cudaFree(p->dev_diffuseTex); + auto it(mesh2PrimitivesMap.begin()); + auto itEnd(mesh2PrimitivesMap.end()); + for (; it != itEnd; ++it) { + for (auto p = it->second.begin(); p != it->second.end(); ++p) { + cudaFree(p->dev_indices); + cudaFree(p->dev_position); + cudaFree(p->dev_normal); + cudaFree(p->dev_texcoord0); + cudaFree(p->dev_diffuseTex); - cudaFree(p->dev_verticesOut); + cudaFree(p->dev_verticesOut); - - //TODO: release other attributes and materials - } - } - //////////// + //TODO: release other attributes and materials + } + } + + //////////// cudaFree(dev_primitives); dev_primitives = NULL; - cudaFree(dev_fragmentBuffer); - dev_fragmentBuffer = NULL; + cudaFree(dev_fragmentBuffer); + dev_fragmentBuffer = NULL; cudaFree(dev_framebuffer); dev_framebuffer = NULL; - cudaFree(dev_depth); - dev_depth = NULL; + cudaFree(dev_dsframebuffer); + dev_dsframebuffer = NULL; + + cudaFree(dev_depth); + dev_depth = NULL; checkCUDAError("rasterize Free"); } diff --git a/src/rasterize.h b/src/rasterize.h index 560aae9..8478a10 100644 --- a/src/rasterize.h +++ b/src/rasterize.h @@ -17,8 +17,10 @@ namespace tinygltf{ } -void rasterizeInit(int width, int height); +void rasterizeInit(int width, int height, 
int cwidth, int cheight);
 void rasterizeSetBuffers(const tinygltf::Scene & scene);
-void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const glm::mat3 MV_normal);
+void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const glm::mat3 MV_normal,
+    int displaymode, bool perepectivecorrect,
+    bool spec, bool aa, bool supersample, bool culling, bool testingmode, bool aabbcheck, bool cheapculling);
 void rasterizeFree();
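
The perspective-correct UV lookup performed in the `scanline` kernel can be illustrated with a small host-side sketch. This is not the project's code: `Vec2`, `interpolateUV`, and `texelIndex` are hypothetical names introduced here, the per-vertex depths are assumed to be non-zero, and the texture is assumed to be tightly packed 8-bit RGB. One deliberate difference from the kernel: the row stride below uses the texture width rather than `txHeight`, which only matters for non-square textures.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical 2D vector type standing in for glm::vec2.
struct Vec2 { float x, y; };

// Perspective-correct interpolation of per-vertex UVs.
//   uv[i] : texture coordinate of vertex i
//   z[i]  : depth of vertex i (assumed non-zero)
//   bc[i] : screen-space barycentric weight of vertex i
// Each attribute is divided by depth, interpolated linearly in screen space,
// then divided by the interpolated 1/z.
Vec2 interpolateUV(const Vec2 uv[3], const float z[3], const float bc[3]) {
    float invZ = bc[0] / z[0] + bc[1] / z[1] + bc[2] / z[2];
    float u = (uv[0].x / z[0] * bc[0] +
               uv[1].x / z[1] * bc[1] +
               uv[2].x / z[2] * bc[2]) / invZ;
    float v = (uv[0].y / z[0] * bc[0] +
               uv[1].y / z[1] * bc[1] +
               uv[2].y / z[2] * bc[2]) / invZ;
    return Vec2{u, v};
}

// Byte offset of the red channel for a UV in [0, 1], assuming a tightly
// packed RGB texture. Coordinates are clamped so the index stays in bounds.
int texelIndex(Vec2 uv, int texWidth, int texHeight) {
    int tx = std::min(std::max(static_cast<int>(uv.x * texWidth), 0), texWidth - 1);
    int ty = std::min(std::max(static_cast<int>(uv.y * texHeight), 0), texHeight - 1);
    return (ty * texWidth + tx) * 3;
}
```

When all three depths are equal, `interpolateUV` reduces to plain barycentric interpolation, which is why the uncorrected path looks right only for screen-aligned triangles. The result can then be passed to `texelIndex` to locate the three colour bytes of the texel.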