The new Shader, GraphicsPipeline and DescriptorSets API usage

Przemog1 edited this page Nov 2, 2020 · 2 revisions

General overview

Prologue

Imagine that you would like to use a raw API like OpenGL or Vulkan to get an image into a shader and use it to draw a mesh, for instance. You could also use the shader to compute lighting, refraction, etc., using all the data passed in. The question is: how do you do all of this with the Nabla API? Several steps are required.

Quick overview of preparing for rendering

First of all, we have to provide an ICPUImageView asset, so if you only have an ICPUImage asset, you have to create an ICPUImageView of that image. An ICPUShader is also required; you create one either from a GLSL string or from an existing shader saved as SPIR-V binary code. From the ICPUShader you create an ICPUSpecializedShader by specifying an entry point and a stage, and by filling in the values used for specialization constants.

The next step is to create a pipeline. There are two kinds to choose from: ICPUComputePipeline for compute shaders and ICPURenderpassIndependentPipeline for graphics shaders, where you can use stages such as the Vertex Shader, Tessellation Shader, Geometry Shader or Fragment Shader. To create a pipeline you have to provide its layout, an ICPUPipelineLayout. The layout is created the same way for graphics and compute pipelines. The pipeline layout consists of descriptor set layouts and push constant ranges (SPushConstantRange). Remember that in our engine there can be up to 4 descriptor sets, so the pipeline layout has 4 descriptor set layout slots as well.

To make your textures (ICPUImageView) or other descriptors available to a shader, you have to bind them in the appropriate way, and to be able to bind them you need a descriptor set, an ICPUDescriptorSet. To create an ICPUDescriptorSet you first need to create its layout, an ICPUDescriptorSetLayout. The layout tells you which descriptor types sit at which bindings in a set, but not which particular descriptors they are, because that is the descriptor set's job. Remember that when you bind a descriptor set at set index N, its layout has to be compatible with the descriptor set layout at index N of the pipeline layout used by the currently bound pipeline.
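The compatibility rule for set index N can be sketched in plain C++. This is only an illustration: `Binding`, `SetLayout`, `PipelineLayout` and `canBind` are hypothetical stand-ins and not the Nabla classes, and "compatible" is simplified here to "identically defined bindings listed in the same order".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins, NOT the Nabla types: just enough data to show
// what a "compatible layout" check at set index N could look like.
enum class DescriptorType { UBO, SSBO, CombinedImageSampler, StorageImage };

struct Binding {
    uint32_t       binding; // M in layout(set = N, binding = M)
    DescriptorType type;    // kind of resource expected at this binding
    uint32_t       count;   // array size K

    bool operator==(const Binding& o) const {
        return binding == o.binding && type == o.type && count == o.count;
    }
};

using SetLayout = std::vector<Binding>; // one descriptor set layout
constexpr uint32_t kMaxSets = 4;        // the 4 descriptor set slots

struct PipelineLayout {
    // A null entry models an empty slot.
    const SetLayout* sets[kMaxSets] = {nullptr, nullptr, nullptr, nullptr};
};

// Simplified rule (an assumption of this sketch): a descriptor set may be
// bound at setIndex only if its layout equals the layout the pipeline
// layout declares for that same index. Bindings are compared in order.
bool canBind(const PipelineLayout& pl, uint32_t setIndex, const SetLayout& dsLayout) {
    assert(setIndex < kMaxSets);
    const SetLayout* expected = pl.sets[setIndex];
    return expected != nullptr && *expected == dsLayout;
}
```

A descriptor set whose layout matches slot 3 of the pipeline layout passes the check for index 3 only; the same set offered at index 0, where the pipeline layout declares nothing, is rejected.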

Having done that, we have to create the GPU versions of those objects to finally obtain at least a GPU mesh buffer, a GPU pipeline and, if we want to do something interesting in our shaders, GPU descriptor sets. We then pass them to the calls that update, bind and draw.

Everything you need to render your first image with a camera is covered in another tutorial. For more details, see: https://github.com/Devsh-Graphics-Programming/Nabla/wiki/Nabla-API-displaying-an-image-with-texture-in-a-pill

Graphs - creating GPU objects from CPU objects

Notes

GPU objects don't have to be created from their CPU versions, because you can also create a GPU object directly through the driver, independently of any CPU object. You should imagine 2 trees/DAGs: one filled completely with CPU objects and one filled with GPU objects. They mirror each other perfectly. At any stage you can replace an entire branch of GPU objects in the tree with the result of converting the matching CPU branch using the cpu2gpu converter. So you are also able to create and fill the entire tree with GPU objects without ever touching the asset namespace (CPU).

Precise terminology description

Nabla objects

Each VkObject exists in the Nabla engine as the combination of:

  • ICPU{Object}
  • IGPU{Object}

I, ICPU and IGPU prefix:

  • I without a CPU or GPU suffix - abstract interface for various objects
  • ICPU - CPU version of an object. Generally used by loaders and writers. It exists because of the cpu2gpu conversion capability, but nothing prevents you from using it directly. It stores all the data needed to transform it into the GPU version of itself. A CPU object isn't entirely immutable; it has some mutable functionality.
  • IGPU - GPU version of an object. Such objects are used by a driver. Each time you need the graphics card to do something, such as updating descriptor sets, drawing a mesh or binding a pipeline, you have to pass the GPU version of an object to such calls. Furthermore, a GPU object is immutable, so it cannot be changed. It often doesn't live in system RAM: its top-level data takes a few bytes and the rest is held by the driver as inaccessible data, often in Video RAM.

Layouts

To provide an analogy, you should think of Layouts:

  • PipelineLayout
  • DescriptorSetLayout

as C++ data type declarations, and the objects created from layouts:

  • RenderpassIndependentPipeline
  • ComputePipeline
  • DescriptorSet

as particular objects of said type created in memory.
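The analogy can be written down literally in C++. In this minimal sketch, `MaterialParams`, `makeMaterial` and `sameLayout` are made-up names for illustration and have nothing to do with the Nabla API: the struct plays the role of a layout, and each value created from it plays the role of a descriptor set or pipeline.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Plays the role of a layout: it declares which slots exist and what kind
// of value each one holds, without holding any actual resource itself.
struct MaterialParams {      // hypothetical type, for illustration only
    std::string textureName; // slot for "some texture"
    float       roughness;   // slot for "some scalar"
};

// Plays the role of creating an object from a layout: every call yields a
// distinct object in memory, all of them shaped by the same declaration.
inline MaterialParams makeMaterial(std::string textureName, float roughness) {
    assert(roughness >= 0.0f && roughness <= 1.0f); // sanity check for the sketch
    return MaterialParams{std::move(textureName), roughness};
}

// Two objects are "layout compatible" in this analogy when they share a type.
template <class A, class B>
constexpr bool sameLayout(const A&, const B&) {
    return std::is_same<A, B>::value;
}
```

Two materials made by `makeMaterial` hold different data yet share one "layout", which is the property that lets objects created from the same layout be swapped for one another.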

Pipeline

The pipeline object gathers all the information about the resources (types) and settings you'll be using while rendering, so that there are no nasty surprises in the "hot" render-loop.

There are two types of pipelines in Vulkan, a compute pipeline and a graphics pipeline.

They specify all the settings (programmable and fixed pipeline state) necessary to execute a {Multi}{Indirect}Draw or {Indirect}Dispatch, but none of the specific inputs like:

  • buffers
  • images
  • samplers

or outputs:

  • buffers
  • images

to use.

We introduced a RenderpassIndependentPipeline because we disliked how Vulkan forced the user to know about the types and formats of Framebuffer Attachments (Render Targets in DirectX parlance) that will be used when creating the pipeline. This would have made it impossible to load models in a "screen agnostic way".

Given a pipeline which is bound to a "pipeline bind point" (graphics or compute) you can bind Descriptor Sets to different set indices for different "pipeline bind points" (4 sets times 2 pipeline bind points) and that will make the shader use the Descriptors in the Set as shader inputs:

  • {dynamic} UBO
  • {dynamic} SSBO
  • Uniform Texel Buffer
  • Storage Texel Buffer
  • Combined Image Samplers
  • Storage Images

or outputs:

  • {dynamic} SSBO
  • Storage Texel Buffer
  • Storage Images

Pipeline layout

Pipeline layout stores all the state (options) common to multiple draw calls (meshes) as an aggregate. It exists because the same object exists in the Vulkan API.

The Pipeline Layout specifies all 4 templates of resource types (a null descriptor layout is an empty template) that will be used by all the shaders used in the draw or compute dispatch. This is to allow for faster validation and execution of the hot loop, as most of the validation of shader GPU inputs and program CPU outputs has now been hoisted to pipeline creation time.

The pipeline layout is separated from the pipeline object to allow the creation of multiple pipelines that may have different shader code but which we want to remain compatible with each other so that switching between them is less expensive than between unrelated pipelines.

Extra knowledge

Skimming quickly over the topic, we can say that in OpenGL and D3D11 changing a Raster State (a fixed-function setting such as the alpha blending equation or the depth comparison mode) often caused a shader recompile and a hiccup while rendering, because most GPUs are not 100% as fixed-function as the Graphics API specification suggests (on AMD there are no hardware units for pixel shader interpolation, on mobile the blending is programmable).

Descriptor Set

A DescriptorSet is a collection of resources conforming to the template given by a DescriptorSetLayout, and it has to have the exact same number and type of resources as specified by the layout. Descriptor Sets do not provide the vertex shader inputs or fragment shader outputs (or subpass inputs). They are an improvement over the way legacy APIs (OpenGL, DirectX 11) bind resources: there, resources had to be bound one by one, whereas a descriptor set binds a whole bundle at once.

However because certain resources have a different frequency of binding (per mesh texture vs. a shadowmap for an entire scene) there are up to 4 separate slots for 4 separate descriptor sets so that all the resources can be broken down into descriptor sets which allow them to be bound somewhat more independently (but not on a 1 by 1 granularity).
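As a rough illustration of binding frequency, the sketch below counts how many times two sets would be bound for a list of draws. `Draw`, `BindCounts` and `countBinds` are hypothetical names, and which set index holds which kind of resource is an assumption of this example, not a Nabla convention.

```cpp
#include <cassert>
#include <vector>

struct Draw { int materialId; }; // hypothetical per-draw record

struct BindCounts {
    int sceneSetBinds;    // set with scene-wide resources, e.g. a shadow map
    int materialSetBinds; // set with per-material resources, e.g. textures
};

// Walk the draws in submission order. The scene-wide set is bound once,
// the per-material set only when the material actually changes.
BindCounts countBinds(const std::vector<Draw>& draws) {
    BindCounts counts{0, 0};
    if (draws.empty()) return counts;

    counts.sceneSetBinds = 1;
    int boundMaterial = -1; // nothing bound yet
    for (const Draw& d : draws) {
        assert(d.materialId >= 0); // -1 is reserved for "nothing bound"
        if (d.materialId != boundMaterial) {
            ++counts.materialSetBinds;
            boundMaterial = d.materialId;
        }
    }
    return counts;
}
```

For six draws sorted by material with ids 0, 0, 1, 1, 1, 2 this gives one bind of the scene-wide set and three binds of the per-material set, instead of rebinding every resource for every draw.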

Buffers, Images and Samplers all derive from IDescriptor and can be bound under different bindings in a DescriptorSet, so that they can be bound to the GPU API together in a single API call with efficiency.

How does it relate to shader?

A GLSL shader declares resources with:

layout(set = N, binding = M) TYPE name[K];

The DescriptorSetLayout shall at least contain a binding declaration specifying each resource for the same value of N with the correct M, TYPE, and K for that particular resource.

There should be as many DescriptorSetLayout objects as there are different values of N. Some DescriptorSetLayout objects may well have the exact same contents/definition, so those can be shared (no need to create duplicate objects).

The application has complete freedom in how it associates/binds DescriptorSets to the GLSL set= bindings (not so much for the binding or array count).
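The "shall at least contain" rule above can be expressed as a small check. This is a sketch with hypothetical types (`ShaderResource`, `LayoutBinding`, `layoutCoversShader`) and not Nabla's introspection API; resource types are plain strings here purely for readability, and K is required to match exactly.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of one GLSL declaration:
//   layout(set = N, binding = M) TYPE name[K];
struct ShaderResource { uint32_t set; uint32_t binding; std::string type; uint32_t count; };

// Hypothetical record of one binding inside a descriptor set layout.
struct LayoutBinding { uint32_t binding; std::string type; uint32_t count; };
using SetLayout = std::vector<LayoutBinding>;

// Returns true if the layout used for set index N declares every resource
// the shader uses in set N with the same M, TYPE and K. Extra bindings in
// the layout are allowed: the layout must contain *at least* these.
bool layoutCoversShader(uint32_t setIndex, const SetLayout& layout,
                        const std::vector<ShaderResource>& shaderResources) {
    assert(setIndex < 4); // 4 descriptor set slots
    for (const ShaderResource& res : shaderResources) {
        if (res.set != setIndex) continue; // belongs to another set's layout
        bool found = false;
        for (const LayoutBinding& b : layout) {
            if (b.binding == res.binding && b.type == res.type && b.count == res.count) {
                found = true;
                break;
            }
        }
        if (!found) return false;
    }
    return true;
}
```

The check is run once per set index, each time against the layout chosen for that index, which is also why two set indices with identical contents can share one layout object.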

Descriptor Set Layout

The descriptor set layout is a template: it specifies the bindings (in the shader GLSL interfaces), counts and types of resources:

  • UBO
  • SSBO
  • combined image samplers
  • storage images

that will be used by the shader stages (the shader stage flag, vertex, fragment, etc.). This template is used to create descriptor sets (handle to a bundle of resources) which specify the actual resources (not the types) that will be used by a shader at execution time.

Shader

A Shader is a program executed on the GPU, either to make something appear on screen or just to perform some computation. Shaders have to be attached to a pipeline in the form of a SpecializedShader (GPU or CPU, depending on how you create it). A SpecializedShader consists of an unspecialized Shader containing the GLSL code, plus creation parameters such as the shader's entry point and stage.

Push Constants

Push Constants serve a similar purpose to a Uniform Buffer Object, but they are a fast path for both data upload from the CPU and data access from the GPU. They are implemented by taking some GPU core registers and pre-loading them with the appropriate values before the shader starts. The register count is very limited, so one needs to tightly specify the ranges of the "virtual memory" that will actually be used. The virtual memory range is also very small, just 128 bytes (the minimum size Vulkan guarantees; some implementations expose more).
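One way to keep a push constant block inside that budget is to check it at compile time. The sketch below assumes the 128-byte figure from the text; `PerDrawPushConstants`, `Range`, `fitsBudget` and `makeRange` are hypothetical names (in particular, `Range` is not SPushConstantRange), and the multiple-of-4 rule mirrors Vulkan's requirement on push constant range offsets and sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Budget taken from the text above: 128 bytes of push constant space.
constexpr std::size_t kPushConstantBudget = 128;

// Hypothetical per-draw block, mirroring the example uses of push constants.
struct PerDrawPushConstants {
    uint32_t objectID;      // global object ID
    uint32_t materialFlags; // material/mesh flags packed as bits
    uint32_t dataOffset;    // per-draw index into a large UBO or SSBO
};
static_assert(sizeof(PerDrawPushConstants) <= kPushConstantBudget,
              "push constant block exceeds the 128-byte budget");
static_assert(sizeof(PerDrawPushConstants) % 4 == 0,
              "push constant block size should be a multiple of 4 bytes");

// Hypothetical byte range inside the push constant space.
struct Range { uint32_t offset; uint32_t size; };

// A range is usable if it is non-empty, 4-byte aligned and inside the budget.
constexpr bool fitsBudget(Range r) {
    return r.size > 0 && r.offset % 4 == 0 && r.size % 4 == 0 &&
           r.offset <= kPushConstantBudget &&
           r.size <= kPushConstantBudget - r.offset;
}

// Builds a checked range covering `size` bytes starting at `offset`.
inline Range makeRange(std::size_t offset, std::size_t size) {
    Range r{static_cast<uint32_t>(offset), static_cast<uint32_t>(size)};
    assert(fitsBudget(r));
    return r;
}
```

A range for a single member could then be built with `makeRange(offsetof(PerDrawPushConstants, materialFlags), sizeof(uint32_t))`, so that only the bytes a shader stage actually reads are declared.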

Push Constants are an alternative to a UBO in the cases where a UBO performs really poorly, which mostly means very small and very frequent updates. Examples are:

  • Global Object ID
  • Material/Mesh flags implemented as bits
  • Unique per DrawCall indices or bit-flags

To get a value from a UBO the following needs to happen:

  1. The shader invocation needs to start
  2. The location corresponding to the GPU memory holding the UBO must be determined
  3. This memory needs to be fetched
  4. On the first run it won't be in the cache, so a latency of roughly 1000 cycles is incurred; otherwise it is just about 30

To get a value from a Push Constant the following needs to happen:

  1. Before the Draw is recorded in the CPU-side command buffer the push constant update is recorded
  2. When the Draw is recorded in the CPU-side command buffer, a register update is scheduled if the push constants were updated
  3. Command Buffer is executed by GPU-side (driver or hardware) scheduler and dispatcher
  4. The registers mapped to push constant ranges are updated with new values
  5. Shader Invocation starts
  6. Value already in registers, no additional operations on the GPU core are necessary

Push Constants are especially useful for specifying a per-draw-call uint offset into a large UBO or SSBO that holds all the data for multiple draw calls. There is no way to get per-draw-call data from a large array without either:

  • sending the index into the array as a push constant
  • knowing the index into the array as a constant
  • uploading the index into a UBO (very slow and stalling)
  • using 1 descriptor set per each draw with same UBO bound at slightly different offsets
  • using 1 descriptor set with "dynamic UBO" but this set needs to be rebound with new dynamic offsets before every draw-call

The last two options are impossible when the "UBO binding offset alignment requirement" doesn't hold. For example, you may have an array of tightly packed 196-byte structs of per-mesh-buffer data while the Graphics API only allows UBOs to be bound at offsets aligned to 256 bytes. So, as you can see, a push constant is the only robust way to do this.
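The arithmetic behind the 196 vs. 256 example can be checked with a few lines of C++. The constants come from the example above; the helper names are made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kStructSize      = 196; // tightly packed per-mesh-buffer struct
constexpr uint64_t kOffsetAlignment = 256; // required UBO binding offset alignment

// Element i of a tightly packed array starts at byte i * 196. It can be used
// as a UBO binding offset only if that byte offset is a multiple of 256.
constexpr bool canBindAtElement(uint64_t index) {
    return (index * kStructSize) % kOffsetAlignment == 0;
}

// Rounds `value` up to the next multiple of `alignment`: the stride each
// element would need if every element had to be bindable on its own.
inline uint64_t roundUp(uint64_t value, uint64_t alignment) {
    assert(alignment != 0);
    return ((value + alignment - 1) / alignment) * alignment;
}
```

Since 196 = 4 × 49 and 256 = 4 × 64, only every 64th element (index 0, 64, 128, ...) starts at a valid offset. Making every element bindable would mean padding each struct to `roundUp(196, 256)` = 256 bytes, that is 60 wasted bytes per element, whereas a push constant carrying the element index works with the tightly packed array as it is.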