CIS565-Fall-2016 · gabenaghi · Nov 10, 2016
diff --git a/README.md b/README.md
@@ -1,16 +1,3 @@
-Vulkan Flocking: compute and shading in one pipeline!
-======================
+Vulkan Flocking: compute and shading in one pipeline!
+======================
+
+**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 6**
+
+* Gabriel Naghi
+* Windows 7, Xeon E5-1630 @ 3.70GHz 32GB, GeForce GTX 1070 4095MB (SIG Lab MOR103-56)
+
+![](img/boids.gif)
+
+## Vulkan Boids
+
+Vulkan is the Khronos Group's new explicit API powering the latest generation of games. One of the interesting features of the Vulkan API is that it provides an interface for compute work as well as graphics. Simultaneous graphics and compute workloads are becoming increasingly important; it is believed that this is one of the main advantages held by AMD GPUs over those of NVIDIA.
+
+In this project, I demonstrate the power of side-by-side graphics and compute with a very simple Boid Flocking re-implementation. This is the same algorithm as the one implemented in [Project 1](https://github.com/gabenaghi/Project1-CUDA-Flocking), except in 2D.
+
+## Food for Thought
+
+#### Why do you think Vulkan expects explicit descriptors for things like generating pipelines and commands?
+
+Command buffers often live in pre-allocated GPU memory, which is allocated in command pools. These command pools can be highly optimized, depending on the type of commands that will live within them. Vulkan thus expects explicit descriptors, so that the GPU can heavily optimize the memory they will live in.
+
+#### Describe a situation besides flip-flop buffers in which you may need multiple descriptor sets to fit one descriptor layout.
+
+Another scenario where you might need multiple descriptor sets with a single layout is when you have several data pools. For example, an application might keep several data pools and, depending on the execution state, draw from any one of them. In this case, it can define one descriptor set per pool, all sharing the same layout, and bind whichever set points to the data it needs.
+
+#### What are some problems to keep in mind when using multiple Vulkan queues?
+
+When using multiple Vulkan queues, there are a couple of problems to keep in mind. For one, since the queues are backed by physical GPU hardware, it is worth thinking about where precisely the queue lives versus where it is used. For example, if a queue is stored on one GPU and accessed from a second networked GPU, it might lead to poor performance. Additionally, one must beware of race conditions with the explicit threading capabilities provided by Vulkan. In particular, since it is possible to submit a single buffer for work in multiple queues, undefined outcomes will result if the buffers are not protected by some mutual exclusion mechanism.
+
+#### What is one advantage of using compute commands that can share data with a rendering pipeline?
+
+One huge advantage of compute commands sharing data with rendering pipelines is locality. Assuming the compiler and/or scheduler is intelligent enough, data should nearly always be local to any program operating on it, which accelerates memory access and mitigates bandwidth bottlenecks.
+
+## Bloopers
 
-**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 6**
-
-* (TODO) YOUR NAME HERE
-  Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
-
-  ### (TODO: Your README)
-
-  Include screenshots, analysis, etc. (Remember, this is public, so don't put
-  anything here that you don't want to share with the world.)
-
-### Credits
-
-* [Vulkan examples and demos](https://github.com/SaschaWillems/Vulkan) by [@SaschaWillems](https://github.com/SaschaWillems)
+It seems that I either did not learn my lesson or otherwise neglected to correct some aspect of my flocking algorithm from Project 1. I again created black-hole boids that sucked in every other boid within some event horizon. Watch the video `black-hole-boids.flv` in the `img/` directory to see what that looked like.
+
+### Credits
+
+* [Vulkan examples and demos](https://github.com/SaschaWillems/Vulkan) by [@SaschaWillems](https://github.com/SaschaWillems)
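
Review note on the README answer about multiple descriptor sets sharing one layout: the idea can be modeled on the CPU without any Vulkan calls. The sketch below is only an illustration under stated assumptions. `Layout`, `DescriptorSet`, `makeSet`, and `selectSet` are hypothetical stand-ins, not Vulkan API names, and resources are plain strings instead of buffer handles.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class BindingType { StorageBuffer, UniformBuffer };

// Hypothetical stand-in for a descriptor set layout: only the "shape" of the bindings.
struct Layout {
    std::vector<BindingType> bindings;
};

// Hypothetical stand-in for a descriptor set: one concrete resource per binding.
struct DescriptorSet {
    const Layout* layout;
    std::vector<std::string> resources;
};

// Builds a set for `layout`; the resource count must match the layout's binding count.
DescriptorSet makeSet(const Layout& layout, std::vector<std::string> resources) {
    assert(resources.size() == layout.bindings.size());
    return DescriptorSet{&layout, std::move(resources)};
}

// Picks the set to bind for the current execution state (here just an index).
const DescriptorSet& selectSet(const std::vector<DescriptorSet>& sets, std::size_t state) {
    assert(state < sets.size());
    return sets[state];
}
```

Every set points at the same layout, so a pipeline built against that layout could consume any of them; only the resources behind the bindings differ.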

diff --git a/data/shaders/computeparticles/flockingParticle.comp b/data/shaders/computeparticles/flockingParticle.comp
@@ -0,0 +1,115 @@
+#version 450
+
+#extension GL_ARB_separate_shader_objects : enable
+#extension GL_ARB_shading_language_420pack : enable
+
+struct Particle
+{
+	vec2 pos;
+	vec2 vel;
+};
+
+// LOOK: These bindings correspond to the DescriptorSetLayouts and
+// the DescriptorSets from prepareCompute()!
+
+// Binding 0 : Particle storage buffer (read)
+layout(std140, binding = 0) buffer ParticlesA
+{
+   Particle particlesA[ ];
+};
+
+// Binding 1 : Particle storage buffer (write)
+layout(std140, binding = 1) buffer ParticlesB
+{
+   Particle particlesB[ ];
+};
+
+layout (local_size_x = 16, local_size_y = 16) in; // NOTE: main() only uses the x index, so the 16 y-invocations per x repeat the same work.
+
+// LOOK: rule weights and distances, as well as particle count, based off uniforms.
+// The deltaT here has to be updated every frame to account for changes in
+// frame rate.
+layout (binding = 2) uniform UBO
+{
+	float deltaT;
+	float rule1Distance;
+	float rule2Distance;
+	float rule3Distance;
+	float rule1Scale;
+	float rule2Scale;
+	float rule3Scale;
+	int particleCount;
+} ubo;
+
+void main()
+{
+    // Current SSBO index
+    uint index = gl_GlobalInvocationID.x;
+	// Don't try to write beyond particle count
+    if (index >= ubo.particleCount)
+		return;
+
+    // Read position and velocity
+	vec2 vPos = particlesA[index].pos.xy;
+    vec2 vVel = particlesA[index].vel.xy;
+
+    vec2 centerOfMass = vec2(0.0f, 0.0f); //rule 1
+	vec2 keepAway     = vec2(0.0f, 0.0f); //rule 2
+	vec2 neighborVels = vec2(0.0f, 0.0f); //rule 3
+
+	float cnt1 = 0.0f;
+	float cnt3 = 0.0f;
+
+	for (int neighborIndex = 0; neighborIndex < ubo.particleCount; ++neighborIndex)
+	{
+		if (neighborIndex == index) continue;
+
+		vec2 neighborPos = particlesA[neighborIndex].pos.xy;
+		vec2 neighborVel = particlesA[neighborIndex].vel.xy;
+
+		// Rule 1: boids fly towards their local perceived center of mass, which excludes themselves
+		if (length(neighborPos - vPos) < ubo.rule1Distance)
+		{
+			centerOfMass = centerOfMass + neighborPos;
+			cnt1 = cnt1 + 1.0f;
+		}
+
+		// Rule 2: boids try to stay a distance d away from each other
+		if (length(neighborPos - vPos) < ubo.rule2Distance)
+			keepAway = keepAway - (neighborPos - vPos);
+
+		// Rule 3: boids try to match the speed of surrounding boids
+		if (length(neighborPos - vPos) < ubo.rule3Distance)
+		{
+			neighborVels = neighborVels + neighborVel;
+			cnt3 = cnt3 + 1.0f;
+		}
+	}
+
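+	// Average the accumulated sums. Rules 1 and 3 are skipped when no neighbor was
+	// in range; otherwise rule 1 would pull an isolated boid toward the origin.
+	if (cnt1 > 0.0f)
+		centerOfMass = (centerOfMass / cnt1 - vPos) * ubo.rule1Scale;
+	keepAway = keepAway * ubo.rule2Scale;
+	// Assumed fix: cnt3 was counted but never used, so average the velocities here.
+	neighborVels = (cnt3 > 0.0f) ? (neighborVels / cnt3) * ubo.rule3Scale : vec2(0.0f);
+
+	vVel = vVel + centerOfMass + keepAway + neighborVels;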
+
+	// Clamp speed to 0.1. Scaling only when too fast avoids normalize() on a zero vector (NaN).
+	vVel *= min(1.0, 0.1 / max(length(vVel), 1e-6));
+
+	// kinematic update
+	vPos += vVel * ubo.deltaT;
+
+    // Wrap around boundary
+	if (vPos.x < -1.0) vPos.x = 1.0;
+	if (vPos.x > 1.0) vPos.x = -1.0;
+	if (vPos.y < -1.0) vPos.y = 1.0;
+	if (vPos.y > 1.0) vPos.y = -1.0;
+
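+    // Write the new state to the output buffer. The read buffer (particlesA) stays
+    // untouched, so other invocations still see last frame's values.
+    particlesB[index].pos.xy = vPos;
+    particlesB[index].vel.xy = vVel;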
+}
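
Review note: the per-boid update in `flockingParticle.comp` can be checked on the CPU before debugging it on the GPU. The sketch below is a hypothetical C++ mirror of the shader's logic, not code from this PR. `Vec2`, `Boid`, `FlockParams`, and `stepBoid` are illustrative names. It assumes that rules 1 and 3 contribute nothing when no neighbor is in range, that rule 3 averages neighbor velocities, and that speed is capped at 0.1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU mirrors of the shader's Particle struct and UBO fields.
struct Vec2 { float x, y; };
struct Boid { Vec2 pos, vel; };
struct FlockParams {
    float deltaT;
    float rule1Distance, rule2Distance, rule3Distance;
    float rule1Scale, rule2Scale, rule3Scale;
};

static Vec2 add(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }
static Vec2 sub(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }
static Vec2 mul(Vec2 a, float s) { return {a.x * s, a.y * s}; }
static float len(Vec2 a) { return std::sqrt(a.x * a.x + a.y * a.y); }

// Computes the next state of boid `i`, reading only from the input buffer `in`.
Boid stepBoid(const std::vector<Boid>& in, std::size_t i, const FlockParams& p) {
    assert(i < in.size());
    const Boid self = in[i];
    Vec2 center{0.0f, 0.0f}, away{0.0f, 0.0f}, velSum{0.0f, 0.0f};
    float cnt1 = 0.0f, cnt3 = 0.0f;

    for (std::size_t j = 0; j < in.size(); ++j) {
        if (j == i) continue;
        const Vec2 d = sub(in[j].pos, self.pos);
        const float dist = len(d);
        if (dist < p.rule1Distance) { center = add(center, in[j].pos); cnt1 += 1.0f; }
        if (dist < p.rule2Distance) { away = sub(away, d); }
        if (dist < p.rule3Distance) { velSum = add(velSum, in[j].vel); cnt3 += 1.0f; }
    }

    Vec2 vel = self.vel;
    // Rule 1 (cohesion): steer toward the average neighbor position, if any.
    if (cnt1 > 0.0f)
        vel = add(vel, mul(sub(mul(center, 1.0f / cnt1), self.pos), p.rule1Scale));
    // Rule 2 (separation): push away from neighbors that are too close.
    vel = add(vel, mul(away, p.rule2Scale));
    // Rule 3 (alignment): match the average neighbor velocity, if any.
    if (cnt3 > 0.0f)
        vel = add(vel, mul(velSum, p.rule3Scale / cnt3));

    // Cap the speed without normalizing a possibly zero vector.
    const float maxSpeed = 0.1f;
    const float speed = len(vel);
    if (speed > maxSpeed) vel = mul(vel, maxSpeed / speed);

    // Kinematic update, then wrap around the [-1, 1] square.
    Vec2 pos = add(self.pos, mul(vel, p.deltaT));
    if (pos.x < -1.0f) pos.x = 1.0f;
    if (pos.x > 1.0f) pos.x = -1.0f;
    if (pos.y < -1.0f) pos.y = 1.0f;
    if (pos.y > 1.0f) pos.y = -1.0f;
    return Boid{pos, vel};
}
```

Running `stepBoid` over a small hand-built flock and comparing against values read back from the storage buffer is one way to narrow down where behavior such as the black-hole boids comes from.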
diff --git a/data/shaders/computeparticles/generate-spirv.bat b/data/shaders/computeparticles/generate-spirv.bat
@@ -1,5 +1,4 @@
 glslangvalidator -V particle.frag -o particle.frag.spv
 glslangvalidator -V particle.vert -o particle.vert.spv
 glslangvalidator -V particle.comp -o particle.comp.spv
-
-
+glslangvalidator -V flockingParticle.comp -o flockingParticle.comp.spv
diff --git a/img/2016-11-10-2018-28.flv b/img/2016-11-10-2018-28.flv
diff --git a/img/black-hole-boids.flv b/img/black-hole-boids.flv
diff --git a/img/boids.gif b/img/boids.gif
diff --git a/vulkanBoids/vulkanBoids.cpp b/vulkanBoids/vulkanBoids.cpp
@@ -151,13 +151,14 @@ class VulkanExample : public VulkanExampleBase
 
 		std::mt19937 rGenerator;
 		std::uniform_real_distribution<float> rDistribution(-1.0f, 1.0f);
+		float vel_scalar = 0.1f;
 
 		// Initial particle positions
 		std::vector<Particle> particleBuffer(PARTICLE_COUNT);
 		for (auto& particle : particleBuffer)
 		{
 			particle.pos = glm::vec2(rDistribution(rGenerator), rDistribution(rGenerator));
-			// TODO: add randomized velocities with a slight scale here, something like 0.1f.
+			particle.vel = vel_scalar * glm::vec2(rDistribution(rGenerator), rDistribution(rGenerator));
 		}
 
 		VkDeviceSize storageBufferSize = particleBuffer.size() * sizeof(Particle);
@@ -244,7 +245,7 @@ class VulkanExample : public VulkanExampleBase
 			VERTEX_BUFFER_BIND_ID,
 			1,
 			VK_FORMAT_R32G32_SFLOAT,
-			offsetof(Particle, pos)); // TODO: change this so that we can color the particles based on velocity.
+			offsetof(Particle, vel)); // Color the particles based on velocity.
 
 		// vertices.inputState encapsulates everything we need for these particular buffers to
 		// interface with the graphics pipeline.
@@ -464,7 +465,7 @@ class VulkanExample : public VulkanExampleBase
 
 		// Create pipeline on the GPU - load shader, attach the pipeline layout we just made.
 		VkComputePipelineCreateInfo computePipelineCreateInfo = vkTools::initializers::computePipelineCreateInfo(compute.pipelineLayout, 0);
-		computePipelineCreateInfo.stage = loadShader(getAssetPath() + "shaders/computeparticles/particle.comp.spv", VK_SHADER_STAGE_COMPUTE_BIT);
+		computePipelineCreateInfo.stage = loadShader(getAssetPath() + "shaders/computeparticles/flockingParticle.comp.spv", VK_SHADER_STAGE_COMPUTE_BIT);
 		VK_CHECK_RESULT(vkCreateComputePipelines(device, pipelineCache, 1, &computePipelineCreateInfo, nullptr, &compute.pipeline));
 
 		//////// Create command pool and command buffer for compute commands ////////
@@ -515,9 +516,8 @@ class VulkanExample : public VulkanExampleBase
 
 		std::vector<VkWriteDescriptorSet> computeWriteDescriptorSets =
 		{
-			// LOOK
 			// WriteDescriptorSet writes each of these descriptors into the specified descriptorSet.
-			// THese first few are written into compute.descriptorSet[0].
+			// These first few are written into compute.descriptorSet[0].
 			// Each of these corresponds to a layout binding in the descriptor set layout,
 			// which in turn corresponds with something like `layout(std140, binding = 0)` in `particle.comp`.
 
@@ -540,13 +540,32 @@ class VulkanExample : public VulkanExampleBase
 			compute.descriptorSets[0],
 			VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
 			2,
-			&compute.uniformBuffer.descriptor)
+			&compute.uniformBuffer.descriptor),
 
-			// TODO: write the second descriptorSet, using the top for reference.
 			// We want the descriptorSets to be used for flip-flopping:
 			// on one frame, we use one descriptorSet with the compute pass,
 			// on the next frame, we use the other.
-			// What has to be different about how the second descriptorSet is written here?
+
+			// Binding 0 : Particle storage buffer (read)
+			vkTools::initializers::writeDescriptorSet(
+			compute.descriptorSets[1], // which descriptor set to write to?
+			VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
+			0, // which binding in the descriptor set Layout?
+			&compute.storageBufferB.descriptor), // which SSBO?
+
+			// Binding 1 : Particle storage buffer (write)
+			vkTools::initializers::writeDescriptorSet(
+			compute.descriptorSets[1],
+			VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,
+			1,
+			&compute.storageBufferA.descriptor),
+
+			// Binding 2 : Uniform buffer
+			vkTools::initializers::writeDescriptorSet(
+			compute.descriptorSets[1],
+			VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
+			2,
+			&compute.uniformBuffer.descriptor)
 		};
 
 		vkUpdateDescriptorSets(device, static_cast<uint32_t>(computeWriteDescriptorSets.size()), computeWriteDescriptorSets.data(), 0, NULL);
@@ -583,13 +602,15 @@ class VulkanExample : public VulkanExampleBase
 		// are done executing.
 		VK_CHECK_RESULT(vkQueueSubmit(compute.queue, 1, &computeSubmitInfo, compute.fence));
 
-		// TODO: handle flip-flop logic. We want the next iteration to
+		// Handle flip-flop logic. We want the next iteration to
 		// run the compute pipeline with flipped SSBOs, so we have to
 		// swap the descriptorSets, which each allow access to the SSBOs
 		// in one configuration.
 		// We also want to flip what SSBO we draw with in the next
 		// pass through the graphics pipeline.
 		// Feel free to use std::swap here. You should need it twice.
+		std::swap(compute.descriptorSets[0], compute.descriptorSets[1]);
+		std::swap(compute.storageBufferA, compute.storageBufferB);
 	}
 
 	// Record command buffers for drawing using the graphics pipeline
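
Review note: the flip-flop scheme in the last hunk can be modeled without Vulkan to see why swapping the two descriptor sets after each pass is enough. The sketch below is a minimal illustration, not the PR's code. `BindingSet` and `PingPong` are hypothetical names, buffers are plain `std::vector<float>`, and the "compute pass" just adds 1 to every element.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a descriptor set: which buffer each binding refers to.
struct BindingSet {
    int readBuffer;   // plays the role of binding 0
    int writeBuffer;  // plays the role of binding 1
};

struct PingPong {
    std::array<std::vector<float>, 2> buffers;
    std::array<BindingSet, 2> sets;

    explicit PingPong(const std::vector<float>& initial) {
        buffers[0] = initial;
        buffers[1].assign(initial.size(), 0.0f);
        sets[0] = {0, 1};  // first pass: read buffer 0, write buffer 1
        sets[1] = {1, 0};  // next pass: read buffer 1, write buffer 0
    }

    // Runs one pass with sets[0], then swaps so the next pass reads what was just written.
    void step() {
        const BindingSet s = sets[0];
        const std::vector<float>& in = buffers[s.readBuffer];
        std::vector<float>& out = buffers[s.writeBuffer];
        assert(in.size() == out.size());
        for (std::size_t i = 0; i < in.size(); ++i) {
            out[i] = in[i] + 1.0f;  // placeholder for the real simulation step
        }
        std::swap(sets[0], sets[1]);
    }

    // After a step, the freshest data is in the buffer the next pass will read.
    const std::vector<float>& latest() const { return buffers[sets[0].readBuffer]; }
};
```

Because each pass reads one buffer and writes the other, no element is overwritten while another invocation may still need its previous value.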