-
Notifications
You must be signed in to change notification settings - Fork 352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Batching, instancing, and rendering performance stuff #765
Batching, instancing, and rendering performance stuff #765
Conversation
It's really cool to read through the explanation of some of these PRs that were a bit over my head :) Some feedback on this section:
Maybe consider splitting the bullet points into "stuff that has been demonstrated as a proof of concept, but needs work" and things that are possible but are still in the private / imagination stage, with some sort of clarifying text between? Also, there was a great effort in previous sections to lay out the webgl2 story, and maybe it would be nice to mention that here as well -- which of these potential improvements are applicable there? How much are we just banking on webgpu saving us before those land? How is webgpu availability coming along, anyway? I have those sorts of questions after reading the section.
Absolutely no idea what rend3 is or how it works and I suspect the average blog-reader is in a similar position. |
Are these descriptions easier to understand? Or is it still over your head? Does it feel like too much detail?
I could reword that to something like "a proof of concept implementation shows promising results" or similar.
Good idea.
I will clarify this, but noting the answer here: All of this is designed with WebGL2 support in mind and all the changes to avoid rebinding benefit WebGL2 as well. That said, WebGL2 stops at CPU-driven rendering. WebGPU and native can benefit from GPU-driven, although WebGPU does not have support for bindless texture arrays, nor does it have support for 'multi-draw indirect'. Bindless texture arrays can be 'replaced' to some degree by texture atlases in array textures. I will add a note about this. Multi-draw indirect can be replaced by many single-draw indirect, and indeed wgpu has to emulate this on some platforms (Metal) anyway. An indirect draw is one where the vertex offset/count, instance offset/count information is written to a GPU buffer (which is how it can be GPU-driven - a compute shader can write the values into a storage buffer) and then the CPU just says draw what it says at this offset in this buffer. WebGPU is already supported by the way. All I am talking about is optimisations and better leveraging what is available whether CPU-driven or enabling GPU-driven.
Fair point. I'll have a think about how to very very briefly express it. |
Maybe all of the above, haha! In particular, I feel like (haven't gone back and looked to check if my recollection matches reality) the necessity / results of reordering of the render sets was explained much better here than in that PR. And the case for alternative render stuff storage was laid out much more compellingly. I'd say that the hashmap section was the one my eyes glossed over the most. Maybe a bit too much detail there for me personally. But in general, I really appreciate laying out the case for users for these sorts of big changes so I don't really mind. The third paragraph of the sprite instancing section is still pretty unclear to me. I think it would be a good idea to define "instance-rate buffer" a bit better somewhere. |
:)
Great!
Images will come too, hopefully, that illustrate the rebinding frequency difference.
I very strongly suspected that as I wrote it. I could probably cut that one down to focus on just avoiding the table moves and this is the solution, rather than discussing all the details of how EntityHashMap was chosen. What do you think?
I could remove the some of that as too much information and just tl;dr it faster. |
Hmm yeah, maybe another opinion is in order, but I think you could leave out a lot of the edit: Although it is sometimes nice to get in an early hedge against the "why don't you just ____" crowd. |
Good suggestion to remove it. I mostly put
No arrogance intended - I think the amount of time and thought it took me to figure out how to rearchitect things for how Bevy works to be able to get to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, mostly fine? I'm just worried it's a bit too technical/disorganized. The differences split between the "general renderer perf improvements" and batching related perf improvements are a bit confusing, it feels like we're talking about the same thing twice. I feel like people will read the first section, and then get to the second and go "wait, I thought we already finished with the renderer perf improvements". Not sure what a better approach would be, and ultimately writing is hard, so /shrug.
Maybe a high level overview, just a bulleted list of perf improvements at the start, would be a good idea, before diving into specific changes.
Co-authored-by: Alice Cecile <[email protected]>
Co-authored-by: Alice Cecile <[email protected]>
Just pushed two commits with the goals of:
|
I also removed the "total 0.11 vs 0.12 performance difference" section and other placeholder content. We can follow up later with those if we find time. If I find time I also might re-do the graphs to use frame time instead of FPS (and to include values in the graphs). But for now I think this should be mergeable. |
No description provided.