
Batching, instancing, and rendering performance stuff #765

Merged: 14 commits into bevyengine:release-0.12.0, Nov 1, 2023

Conversation

superdump
Contributor

No description provided.

@rparrett
Contributor

It's really cool to read through the explanation of some of these PRs that were a bit over my head :)

Some feedback on this section:

What is next for batching/instancing and beyond?

"This is implemented on a branch." This phrasing seems potentially confusing. I know you mean that you or someone else has demonstrated this as a proof of concept (either privately or publicly), but I'm not sure if the blog-reader might assume this code is in the main branch, or there's an existing PR, or what.

Maybe consider splitting the bullet points into "stuff that has been demonstrated as a proof of concept, but needs work" and things that are possible but are still in the private / imagination stage, with some sort of clarifying text between?

Also, there was a great effort in previous sections to lay out the webgl2 story, and maybe it would be nice to mention that here as well -- which of these potential improvements are applicable there? How much are we just banking on webgpu saving us before those land? How is webgpu availability coming along, anyway? I have those sorts of questions after reading the section.

Rob Swain (@superdump) intends to implement an alternative method similar to what is done in rend3.

Absolutely no idea what rend3 is or how it works and I suspect the average blog-reader is in a similar position.

@superdump
Contributor Author

It's really cool to read through the explanation of some of these PRs that were a bit over my head :)

Are these descriptions easier to understand? Or is it still over your head? Does it feel like too much detail?

Some feedback on this section:

What is next for batching/instancing and beyond?

"This is implemented on a branch." This phrasing seems potentially confusing. I know you mean that you or someone else has demonstrated this as a proof of concept (either privately or publicly), but I'm not sure if the blog-reader might assume this code is in the main branch, or there's an existing PR, or what.

I could reword that to something like "a proof of concept implementation shows promising results" or similar.

Maybe consider splitting the bullet points into "stuff that has been demonstrated as a proof of concept, but needs work" and things that are possible but are still in the private / imagination stage, with some sort of clarifying text between?

Good idea.

Also, there was a great effort in previous sections to lay out the webgl2 story, and maybe it would be nice to mention that here as well -- which of these potential improvements are applicable there? How much are we just banking on webgpu saving us before those land? How is webgpu availability coming along, anyway? I have those sorts of questions after reading the section.

I will clarify this, but noting the answer here: all of this is designed with WebGL2 support in mind, and all the changes to avoid rebinding benefit WebGL2 as well. That said, WebGL2 stops at CPU-driven rendering.

WebGPU and native can benefit from GPU-driven rendering, although WebGPU supports neither bindless texture arrays nor 'multi-draw indirect'. Bindless texture arrays can be 'replaced' to some degree by texture atlases in array textures; I will add a note about this. Multi-draw indirect can be replaced by many single-draw indirect calls, and indeed wgpu has to emulate it that way on some platforms (Metal) anyway. An indirect draw is one where the vertex offset/count and instance offset/count information is written to a GPU buffer (which is how it can be GPU-driven: a compute shader can write the values into a storage buffer) and then the CPU just says "draw whatever is at this offset in this buffer".

WebGPU is already supported, by the way. All I am talking about is optimisations and better leveraging what is available, whether CPU-driven or enabling GPU-driven rendering.
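To make the indirect draw idea concrete, here is a minimal wgpu-style sketch (illustrative only, not the actual Bevy code): the struct mirrors the standard indexed indirect argument layout, and the hypothetical `record_indirect_draw` function shows the CPU side of the call.

```rust
// Illustrative sketch only. This mirrors the standard layout of indexed
// indirect draw arguments that live in the indirect buffer.
#[repr(C)]
struct DrawIndexedIndirectArgs {
    index_count: u32,    // number of indices to draw
    instance_count: u32, // number of instances to draw
    first_index: u32,    // offset into the index buffer
    base_vertex: i32,    // value added to each index before fetching vertices
    first_instance: u32, // offset into the per-instance data
}

// CPU-driven: the CPU writes the args into `indirect_buffer` before rendering.
// GPU-driven: a compute shader writes them into a storage buffer instead.
fn record_indirect_draw<'a>(
    render_pass: &mut wgpu::RenderPass<'a>,
    indirect_buffer: &'a wgpu::Buffer,
) {
    // The CPU only says: "draw whatever the args at this buffer offset say".
    render_pass.draw_indexed_indirect(indirect_buffer, 0);
}
```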

Rob Swain (@superdump) intends to implement an alternative method similar to what is done in rend3.

Absolutely no idea what rend3 is or how it works and I suspect the average blog-reader is in a similar position.

Fair point. I'll have a think about how to very very briefly express it.

@rparrett
Contributor

Are these descriptions easier to understand? Or is it still over your head? Does it feel like too much detail?

Maybe all of the above, haha!

In particular, I feel like (haven't gone back and looked to check if my recollection matches reality) the necessity / results of reordering of the render sets was explained much better here than in that PR.

And the case for alternative render stuff storage was laid out much more compellingly.

I'd say that the hashmap section was the one my eyes glossed over the most. Maybe a bit too much detail there for me personally. But in general, I really appreciate laying out the case for users for these sorts of big changes so I don't really mind.

The third paragraph of the sprite instancing section is still pretty unclear to me. I think it would be a good idea to define "instance-rate buffer" a bit better somewhere.

@superdump
Contributor Author

Are these descriptions easier to understand? Or is it still over your head? Does it feel like too much detail?

Maybe all of the above, haha!

:)

In particular, I feel like (haven't gone back and looked to check if my recollection matches reality) the necessity / results of reordering of the render sets was explained much better here than in that PR.

Great!

And the case for alternative render stuff storage was laid out much more compellingly.

Hopefully, images that illustrate the rebinding frequency difference will come too.

I'd say that the hashmap section was the one my eyes glossed over the most. Maybe a bit too much detail there for me personally. But in general, I really appreciate laying out the case for users for these sorts of big changes so I don't really mind.

I very strongly suspected that as I wrote it. I could probably cut that one down to focus on just avoiding the table moves and this is the solution, rather than discussing all the details of how EntityHashMap was chosen. What do you think?

The third paragraph of the sprite instancing section is still pretty unclear to me. I think it would be a good idea to define "instance-rate buffer" a bit better somewhere.

I could remove some of that as too much information and just tl;dr it faster.
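For readers unfamiliar with the term: an instance-rate buffer is a vertex buffer whose attributes advance once per drawn instance rather than once per vertex, so each instance reads its own data (transform, color, etc.). A minimal wgpu-style sketch (illustrative only, not the actual sprite pipeline; the color attribute and shader location are just placeholders):

```rust
// Illustrative sketch: a vertex buffer layout that steps once per *instance*
// rather than once per vertex.
const INSTANCE_LAYOUT: wgpu::VertexBufferLayout<'static> = wgpu::VertexBufferLayout {
    // Size of one instance's data (a single color here, as a placeholder).
    array_stride: std::mem::size_of::<[f32; 4]>() as wgpu::BufferAddress,
    // The key part: advance this buffer once per instance, not per vertex.
    step_mode: wgpu::VertexStepMode::Instance,
    attributes: &[wgpu::VertexAttribute {
        format: wgpu::VertexFormat::Float32x4,
        offset: 0,
        shader_location: 1,
    }],
};
```

A single draw call with instance count N then renders N sprites, each pulling its own per-instance data from that buffer.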

@rparrett
Contributor

rparrett commented Oct 16, 2023

I very strongly suspected that as I wrote it. I could probably cut that one down to focus on just avoiding the table moves and this is the solution, rather than discussing all the details of how EntityHashMap was chosen. What do you think?

Hmm yeah, maybe another opinion is in order, but I think you could leave out a lot of the Vec<T> content and summarize it in a "future work" paragraph or something.

edit: Although it is sometimes nice to get in an early hedge against the "why don't you just ____" crowd.

@superdump
Contributor Author

I very strongly suspected that as I wrote it. I could probably cut that one down to focus on just avoiding the table moves and this is the solution, rather than discussing all the details of how EntityHashMap was chosen. What do you think?

Hmm yeah, maybe another opinion is in order, but I think you could leave out a lot of the Vec<T> content and summarize it in a "future work" paragraph or something.

Good suggestion to remove it. I mostly put Vec<T> in as a sort of "What is the best we can hope for? And why can't we use it?" But I'll skip all that and just say that some options were evaluated and we landed on EntityHashMap, which is a HashMap with a very fast hash function and good worst-case behaviour.
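To illustrate the idea, here is a rough sketch assuming entity ids are represented as plain u64 keys and using a hypothetical trivial hasher (the real EntityHashMap in Bevy differs in the details): the point is a HashMap whose hash is essentially a single multiply rather than a general-purpose hash.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Sketch of a very cheap hasher for entity-like u64 keys: it just mixes the
/// bits of the key instead of running a general-purpose hash function.
#[derive(Default)]
struct CheapEntityHasher(u64);

impl Hasher for CheapEntityHasher {
    fn write(&mut self, _bytes: &[u8]) {
        // Only u64 keys are expected in this sketch, so this path is unused.
        unimplemented!("only u64 keys are hashed in this sketch");
    }
    fn write_u64(&mut self, value: u64) {
        // A trivial multiplicative mix; a real hasher would be more careful.
        self.0 = value.wrapping_mul(0x9E37_79B9_7F4A_7C15);
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

/// A map keyed by an entity id (represented here as u64) with the cheap hasher.
type EntityKeyedMap<T> = HashMap<u64, T, BuildHasherDefault<CheapEntityHasher>>;

fn example() {
    let mut map: EntityKeyedMap<&str> = EntityKeyedMap::default();
    map.insert(42, "mesh uniform");
    assert_eq!(map.get(&42), Some(&"mesh uniform"));
}
```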

edit: Although it is sometimes nice to get in an early hedge against the "why don't you just ____" crowd.

No arrogance intended - I think the amount of time and thought it took me to figure out how to rearchitect things for how Bevy works to be able to get to Vec<T> makes it somewhat unlikely that anyone would flippantly say "why don't you just." :) It's possible, but in that case I would probably not put too much weight in their perception of its obviousness. :)

@JMS55 JMS55 self-requested a review October 17, 2023 20:03
Contributor

@JMS55 JMS55 left a comment

Overall, mostly fine? I'm just worried it's a bit too technical/disorganized. The split between the "general renderer perf improvements" and the batching-related perf improvements is a bit confusing; it feels like we're talking about the same thing twice. I feel like people will read the first section, and then get to the second and go "wait, I thought we already finished with the renderer perf improvements". Not sure what a better approach would be, and ultimately writing is hard, so /shrug.

Maybe a high level overview, just a bulleted list of perf improvements at the start, would be a good idea, before diving into specific changes.

content/news/2023-10-21-bevy-0.12/index.md: 8 review threads (all outdated/resolved)
@cart
Member

cart commented Nov 1, 2023

Just pushed two commits with the goals of:

  1. Slimming down, especially on the technical details and "details about future work".
  2. Moving relevant bits to the "What's Next" section.
  3. Reordering by "precedence / interest"
  4. Adding clarity + easing people into the details
  5. Adding documentation links

@cart
Member

cart commented Nov 1, 2023

I also removed the "total 0.11 vs 0.12 performance difference" section and other placeholder content. We can follow up later with those if we find time.

If I find time I also might re-do the graphs to use frame time instead of FPS (and to include values in the graphs). But for now I think this should be mergeable.

@cart cart marked this pull request as ready for review November 1, 2023 19:10
@cart cart merged commit 32ef6b0 into bevyengine:release-0.12.0 Nov 1, 2023
5 of 6 checks passed