
Ideal GUI system architecture for Any 3D Engine

AnastaZIuk edited this page Oct 26, 2020 · 1 revision

There's one thing I admire in the UNIX philosophy: "do one thing and do it well". In the same spirit, IrrlichtBAW does not try to re-invent the wheel by rolling its own GUI system, nor does it encourage anyone else to do so.

So IrrlichtBAW will never have a GUI system. You can, however, still submit your 3rdparty GUI library bindings or self-made GUI frameworks as pull requests into the ext/gui folder.

There are plenty of libraries that do it, and do it well, and many come with nice GUI tools to design the UI as well as generate the underlying C++ code.

The following describes the ideal way to draw GUI elements and/or override the drawing of a 3rd party GUI solution.

GUI can and should be drawn in one (or two) API call(s).

What needs to be done for this to be achieved:

  1. Pool all GUI element types' bitmaps/images into one 2D texture array and use a single shader to draw
  2. Keep a semi-static list (an SSBO) of GUI element type specifications
  3. Keep a dynamic list (a streaming-buffer SSBO/UBO) of GUI element instances to draw
  4. Pool all GUI element geometry (vertices and indices) in a single IGPUBuffer
  5. Assign a Z-value to every GUI element instance

Now let's go through the list in reverse order (no Z-order blending pun intended).

How is this possible in 1 (2) API Calls? (Assigning a Z-value)

Trivially: the GPU will blend transparent triangles in the order they are drawn, so all you really need is to draw all the triangles making up your GUI from background to foreground.

To not completely kill your blending unit, I'd advise drawing the elements foreground-to-background in a pre-pass with Z-buffering enabled and an alpha test (discard in the pixel shader if the to-be-output alpha <= 244.0/255.0), using a 1-channel alpha-only texture and a simpler shader.

Then, for the final pass, you can draw your GUI back-to-front with the full texture, Z-testing only (no Z-writes), using the proper shader with early fragment tests explicitly forced.

This is why assigning a Z-value helps: it gives you a "key" to sort on, so you don't need to remember and dance around the GUI element draw-submission order (like you do in Irrlicht or other GUI systems with no concept of a Z-layer).

Pool all GUI element geometry in a single IGPUBuffer

To be able to draw a number of objects in a single OpenGL draw call (or a single Vulkan command-buffer submission, for that matter) you need to keep the following identical for all objects:

  1. The shader they are drawn with
  2. The vertex format and the source buffers of vertices and indices (VAO spec)
  3. The descriptor set (textures, UBOs, SSBOs)
  4. The IGraphicsPipeline, which is all of the above plus material states

For this reason you need to make an IGPUBuffer large enough to contain all of the polygon data of your GUI elements. You can suballocate whatever ranges you like, and there's no need to fill the empty spaces. Just make sure all your offsets are aligned to the size of the thing you are storing (2 or 4 bytes for indices, and sizeof(YourGUIVertexStruct) for your vertices, so that you can use baseVertex to offset).

All of the metadata for this (offsets into the buffer for the GUI element meshbuffers) can be kept in the list of GUI element type specifications, which will be discussed in the following sections.

Keep a Dynamic List of GUI element instances

When you traverse your GUI elements for drawing, you most probably want to output a list of elements with the following attributes:

struct GuiElement
{
    mat2 rotationAndScale;
    vec2 translation;
    float zLayer;
    uint32_t globalElementID; //can stuff extra flags in the top bits here
};

Then sort them by zLayer and draw them in that order (optionally in reverse first for the prepass).

Your vertex shader can access this list as an SSBO or UBO using gl_DrawID, and use the first three members to transform the GUI element like this:

gl_Position = vec4(drawList[gl_DrawID].rotationAndScale*meshCurrentVertexPos+drawList[gl_DrawID].translation,drawList[gl_DrawID].zLayer,1.0);

It can also forward the globalElementID to the pixel shader as a native uint with a flat interpolation qualifier.

Because you should really be grouping your GUI elements into bigger sets, this array will have either too few entries (to justify the overhead of a dispatch and a barrier) or too many (past the 256-invocations-per-group Compute Shader limit) to justify sorting it on the GPU; sort it on the CPU instead.

You then (or, most probably, at the same time) want to turn that sorted list into a MultiDrawIndirect buffer for an indirect draw (you can get the element counts, baseVertex offsets, index byte offsets, etc. from the list of GUI element type specifications)!

You want that indirect draw buffer to live in a persistently mapped, coherent buffer in host memory.

Keep a semi-static list of GUI element type specifications

Now that you've pooled your GUI element meshbuffers into a single mesh as sub-sections, you should most probably store the information about these sections in a semi-static buffer, in two copies: one on the CPU in plain memory (so you can prepare updates) and one on the GPU in device-local memory.

You need to store, at least:

uint16_t originalTextureSize[2]; // in pixels
uint16_t texturePoolingXYOffset[2]; // the xy offset (in pixels) in the texture array of the pooled element texture
uint16_t texturePoolingLayer; // the z offset (in layers) in the texture array of the pooled element
uint16_t texturePoolingMipOffset; // high level hack for pooling across mip-maps
SMeshBuffer pooledMeshSpec; // a reduced form of IMeshBuffer storing only the information needed to create the associated indirect draw, such as baseVertex, element (triangle x3) count, and index buffer offset

This SSBO (or UBO, if you can manage under 4k GUI element types) should be accessed by the fragment shader to get the correct portion of the GUI texture array; it could be done like this:

vec2 texCoord = vec2(elementSpec[elementType].originalTextureSize)*originalVertexTexCoord+vec2(elementSpec[elementType].texturePoolingXYOffset);
vec4 texColor = texture(texID,vec3(texCoord/vec2(textureSize(texID,0).xy),elementSpec[elementType].texturePoolingLayer),float(elementSpec[elementType].texturePoolingMipOffset));

You will want to create "virtual" GUI element types which would be used to combine hordes of other elements into batches corresponding to entire windows, paragraphs of text, etc.

Take text, for example: if you generate text from font glyph atlases, then you would most probably have a GUI element type per letter. Now, if you wanted to render over 9000 characters of text, you would not want over 9000 separate GUI elements. Ideally you'd pre-bake that into a virtual GUI element with its own ID, with all the vertices pre-transformed and combined relative to the text-box.

You could create these on the CPU or the GPU (a compute shader and some atomics); it really doesn't matter.

Pool all GUI element types' bitmaps/images into one 2D texture array

Obviously, because of what I mentioned before about needing to keep the IGraphicsPipeline object identical for all GUI elements, you should expect to atlas everything into a texture array.

However, note that I propose keeping the per-vertex texture coordinates independent of, and orthogonal to, the actual placement of the texture in the texture array pool, via the first few uint16_t variables in the aforementioned GUI element type specifications.

The cool thing about keeping this list semi-static and keeping the texture offsets separate is that you can change the location of a texture in the texture array pool without having to modify every vertex texture coordinate in the pooled vertices; this gives you the option of defragmenting the texture pool should you create or remove GUI element types.

Or, more interestingly, keeping the offsets in texel coordinates (not the usual 0-to-1 range) helps if you resize the texture array (which serves as the pool) in the X and Y directions.

Since GUI doesn't deal with minification or huge scaling, mipmapping is really redundant. Moreover, mip-mapping usually looks bad (blurry), so you usually prepare your GUI bitmaps to match display scale 1:1. This is why, instead of placing smaller textures at the base level of the texture array pool, you can place them higher up in the mip-chain to save space and gain some flexibility in your pooling (which is why I explicitly mentioned texturePoolingMipOffset).

BONUS: I don't have MultiDrawIndirect (OpenGL 4+) or SSBO

Call your local museum; maybe they have some AMD Radeon HD 5000 series GPUs they will donate to you. Or, if you're really desperate, emulate MultiDrawIndirect in software (the ARB_multi_draw_indirect extension spec page even tells you how, with pseudocode!).