
Batching, instancing, and rendering performance stuff #765

Merged: 14 commits into bevyengine:release-0.12.0, Nov 1, 2023

Conversation

superdump
Contributor

No description provided.

@rparrett
Contributor

It's really cool to read through the explanation of some of these PRs that were a bit over my head :)

Some feedback on this section:

What is next for batching/instancing and beyond?

"This is implemented on a branch." This phrasing seems potentially confusing. I know you mean that you or someone else has demonstrated this as a proof of concept (either privately or publicly), but I'm not sure if the blog-reader might assume this code is in the main branch, or there's an existing PR, or what.

Maybe consider splitting the bullet points into "stuff that has been demonstrated as a proof of concept, but needs work" and things that are possible but are still in the private / imagination stage, with some sort of clarifying text between?

Also, there was a great effort in previous sections to lay out the webgl2 story, and maybe it would be nice to mention that here as well -- which of these potential improvements are applicable there? How much are we just banking on webgpu saving us before those land? How is webgpu availability coming along, anyway? I have those sorts of questions after reading the section.

Rob Swain (@superdump) intends to implement an alternative method similar to what is done in rend3.

Absolutely no idea what rend3 is or how it works and I suspect the average blog-reader is in a similar position.

@superdump
Contributor Author

It's really cool to read through the explanation of some of these PRs that were a bit over my head :)

Are these descriptions easier to understand? Or is it still over your head? Does it feel like too much detail?

Some feedback on this section:

What is next for batching/instancing and beyond?

"This is implemented on a branch." This phrasing seems potentially confusing. I know you mean that you or someone else has demonstrated this as a proof of concept (either privately or publicly), but I'm not sure if the blog-reader might assume this code is in the main branch, or there's an existing PR, or what.

I could reword that to something like "a proof of concept implementation shows promising results" or similar.

Maybe consider splitting the bullet points into "stuff that has been demonstrated as a proof of concept, but needs work" and things that are possible but are still in the private / imagination stage, with some sort of clarifying text between?

Good idea.

Also, there was a great effort in previous sections to lay out the webgl2 story, and maybe it would be nice to mention that here as well -- which of these potential improvements are applicable there? How much are we just banking on webgpu saving us before those land? How is webgpu availability coming along, anyway? I have those sorts of questions after reading the section.

I will clarify this, but noting the answer here: all of this is designed with WebGL2 support in mind, and all the changes to avoid rebinding benefit WebGL2 as well. That said, WebGL2 stops at CPU-driven rendering.

WebGPU and native can benefit from GPU-driven rendering, although WebGPU supports neither bindless texture arrays nor 'multi-draw indirect'. Bindless texture arrays can be 'replaced' to some degree by texture atlases in array textures; I will add a note about this. Multi-draw indirect can be replaced by many single-draw indirect calls, and indeed wgpu has to emulate it that way on some platforms (Metal) anyway. An indirect draw is one where the vertex offset/count and instance offset/count information is written to a GPU buffer (which is how it can be GPU-driven: a compute shader can write the values into a storage buffer) and then the CPU just says "draw whatever is at this offset in this buffer".

WebGPU is already supported, by the way. All I am talking about is optimisations and better leveraging what is available, whether CPU-driven or enabling GPU-driven rendering.
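To make the indirect draw idea concrete, here is a minimal wgpu-style sketch (illustrative only, not the actual Bevy code): the struct mirrors the standard indexed indirect argument layout, and the hypothetical `record_indirect_draw` function shows the CPU side of the call.

```rust
// Illustrative sketch only. This mirrors the standard layout of indexed
// indirect draw arguments that live in the indirect buffer.
#[repr(C)]
struct DrawIndexedIndirectArgs {
    index_count: u32,    // number of indices to draw
    instance_count: u32, // number of instances to draw
    first_index: u32,    // offset into the index buffer
    base_vertex: i32,    // value added to each index before fetching vertices
    first_instance: u32, // offset into the per-instance data
}

// CPU-driven: the CPU writes the args into `indirect_buffer` before rendering.
// GPU-driven: a compute shader writes them into a storage buffer instead.
fn record_indirect_draw<'a>(
    render_pass: &mut wgpu::RenderPass<'a>,
    indirect_buffer: &'a wgpu::Buffer,
) {
    // The CPU only says: "draw whatever the args at this buffer offset say".
    render_pass.draw_indexed_indirect(indirect_buffer, 0);
}
```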

Rob Swain (@superdump) intends to implement an alternative method similar to what is done in rend3.

Absolutely no idea what rend3 is or how it works and I suspect the average blog-reader is in a similar position.

Fair point. I'll have a think about how to very very briefly express it.

@rparrett
Contributor

Are these descriptions easier to understand? Or is it still over your head? Does it feel like too much detail?

Maybe all of the above, haha!

In particular, I feel like (haven't gone back and looked to check if my recollection matches reality) the necessity / results of reordering of the render sets was explained much better here than in that PR.

And the case for alternative render stuff storage was laid out much more compellingly.

I'd say that the hashmap section was the one my eyes glossed over the most. Maybe a bit too much detail there for me personally. But in general, I really appreciate laying out the case for users for these sorts of big changes so I don't really mind.

The third paragraph of the sprite instancing section is still pretty unclear to me. I think it would be a good idea to define "instance-rate buffer" a bit better somewhere.

@superdump
Contributor Author

Are these descriptions easier to understand? Or is it still over your head? Does it feel like too much detail?

Maybe all of the above, haha!

:)

In particular, I feel like (haven't gone back and looked to check if my recollection matches reality) the necessity / results of reordering of the render sets was explained much better here than in that PR.

Great!

And the case for alternative render stuff storage was laid out much more compellingly.

Hopefully, images that illustrate the rebinding frequency difference will come too.

I'd say that the hashmap section was the one my eyes glossed over the most. Maybe a bit too much detail there for me personally. But in general, I really appreciate laying out the case for users for these sorts of big changes so I don't really mind.

I very strongly suspected that as I wrote it. I could probably cut that one down to focus on just avoiding the table moves and this is the solution, rather than discussing all the details of how EntityHashMap was chosen. What do you think?

The third paragraph of the sprite instancing section is still pretty unclear to me. I think it would be a good idea to define "instance-rate buffer" a bit better somewhere.

I could remove some of that as too much information and just tl;dr it faster.
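For readers unfamiliar with the term: an instance-rate buffer is a vertex buffer whose attributes advance once per drawn instance rather than once per vertex, so each instance reads its own data (transform, color, etc.). A minimal wgpu-style sketch (illustrative only, not the actual sprite pipeline; the color attribute and shader location are just placeholders):

```rust
// Illustrative sketch: a vertex buffer layout that steps once per *instance*
// rather than once per vertex.
const INSTANCE_LAYOUT: wgpu::VertexBufferLayout<'static> = wgpu::VertexBufferLayout {
    // Size of one instance's data (a single color here, as a placeholder).
    array_stride: std::mem::size_of::<[f32; 4]>() as wgpu::BufferAddress,
    // The key part: advance this buffer once per instance, not per vertex.
    step_mode: wgpu::VertexStepMode::Instance,
    attributes: &[wgpu::VertexAttribute {
        format: wgpu::VertexFormat::Float32x4,
        offset: 0,
        shader_location: 1,
    }],
};
```

A single draw call with instance count N then renders N sprites, each pulling its own per-instance data from that buffer.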

@rparrett
Contributor

rparrett commented Oct 16, 2023

I very strongly suspected that as I wrote it. I could probably cut that one down to focus on just avoiding the table moves and this is the solution, rather than discussing all the details of how EntityHashMap was chosen. What do you think?

Hmm yeah, maybe another opinion is in order, but I think you could leave out a lot of the Vec<T> content and summarize it in a "future work" paragraph or something.

edit: Although it is sometimes nice to get in an early hedge against the "why don't you just ____" crowd.

@superdump
Contributor Author

I very strongly suspected that as I wrote it. I could probably cut that one down to focus on just avoiding the table moves and this is the solution, rather than discussing all the details of how EntityHashMap was chosen. What do you think?

Hmm yeah, maybe another opinion is in order, but I think you could leave out a lot of the Vec<T> content and summarize it in a "future work" paragraph or something.

Good suggestion to remove it. I mostly put Vec<T> in as a sort of "What is the best we can hope for? And why can't we use it?" But I'll skip all that and just say that some options were evaluated and we landed on EntityHashMap, which is a HashMap with a very fast hash function and good worst-case behaviour.
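To illustrate the idea, here is a rough sketch assuming entity ids are represented as plain u64 keys and using a hypothetical trivial hasher (the real EntityHashMap in Bevy differs in the details): the point is a HashMap whose hash is essentially a single multiply rather than a general-purpose hash.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Sketch of a very cheap hasher for entity-like u64 keys: it just mixes the
/// bits of the key instead of running a general-purpose hash function.
#[derive(Default)]
struct CheapEntityHasher(u64);

impl Hasher for CheapEntityHasher {
    fn write(&mut self, _bytes: &[u8]) {
        // Only u64 keys are expected in this sketch, so this path is unused.
        unimplemented!("only u64 keys are hashed in this sketch");
    }
    fn write_u64(&mut self, value: u64) {
        // A trivial multiplicative mix; a real hasher would be more careful.
        self.0 = value.wrapping_mul(0x9E37_79B9_7F4A_7C15);
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

/// A map keyed by an entity id (represented here as u64) with the cheap hasher.
type EntityKeyedMap<T> = HashMap<u64, T, BuildHasherDefault<CheapEntityHasher>>;

fn example() {
    let mut map: EntityKeyedMap<&str> = EntityKeyedMap::default();
    map.insert(42, "mesh uniform");
    assert_eq!(map.get(&42), Some(&"mesh uniform"));
}
```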

edit: Although it is sometimes nice to get in an early hedge against the "why don't you just ____" crowd.

No arrogance intended - I think the amount of time and thought it took me to figure out how to rearchitect things for how Bevy works to be able to get to Vec<T> makes it somewhat unlikely that anyone would flippantly say "why don't you just." :) It's possible, but in that case I would probably not put too much weight in their perception of its obviousness. :)

@JMS55 JMS55 self-requested a review October 17, 2023 20:03
Contributor

@JMS55 JMS55 left a comment

Overall, mostly fine? I'm just worried it's a bit too technical/disorganized. The split between the "general renderer perf improvements" and the batching-related perf improvements is a bit confusing; it feels like we're talking about the same thing twice. I feel like people will read the first section, and then get to the second and go "wait, I thought we already finished with the renderer perf improvements". Not sure what a better approach would be, and ultimately writing is hard, so /shrug.

Maybe a high level overview, just a bulleted list of perf improvements at the start, would be a good idea, before diving into specific changes.

content/news/2023-10-21-bevy-0.12/index.md: 8 review threads (all outdated/resolved)
@cart
Member

cart commented Nov 1, 2023

Just pushed two commits with the goals of:

  1. Slimming down, especially on the technical details and "details about future work".
  2. Moving relevant bits to the "What's Next" section.
  3. Reordering by "precedence / interest"
  4. Adding clarity + easing people into the details
  5. Adding documentation links

@cart
Member

cart commented Nov 1, 2023

I also removed the "total 0.11 vs 0.12 performance difference" section and other placeholder content. We can follow up later with those if we find time.

If I find time I also might re-do the graphs to use frame time instead of FPS (and to include values in the graphs). But for now I think this should be mergeable.

@cart cart marked this pull request as ready for review November 1, 2023 19:10
@cart cart merged commit 32ef6b0 into bevyengine:release-0.12.0 Nov 1, 2023
5 of 6 checks passed