Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Micro-optimize queue_material_meshes, primarily to remove bit manipulation. #12791

Merged

Conversation

pcwalton
Copy link
Contributor

@pcwalton pcwalton commented Mar 29, 2024

This commit makes the following optimizations:

MeshPipelineKey/BaseMeshPipelineKey split

MeshPipelineKey has been split into BaseMeshPipelineKey, which lives in bevy_render and MeshPipelineKey, which lives in bevy_pbr. Conceptually, BaseMeshPipelineKey is a superclass of MeshPipelineKey. For BaseMeshPipelineKey, the bits start at the highest (most significant) bit and grow downward toward the lowest bit; for MeshPipelineKey, the bits start at the lowest bit and grow upward toward the highest bit. This prevents them from colliding.

The goal of this is to avoid having to reassemble bits of the pipeline key for every mesh every frame. Instead, we can just use a bitwise or operation to combine the pieces that make up a MeshPipelineKey.

specialize_slow

Previously, all of specialize() was marked as #[inline]. This bloated queue_material_meshes unnecessarily, as a large chunk of it ended up being a slow path that was rarely hit. This commit refactors the function to move the slow path to specialize_slow().

Together, these two changes shave about 5% off queue_material_meshes:

Screenshot 2024-03-29 130002

Migration Guide

  • The primitive_topology field on GpuMesh is now an accessor method: GpuMesh::primitive_topology().
  • For performance reasons, MeshPipelineKey has been split into BaseMeshPipelineKey, which lives in bevy_render, and MeshPipelineKey, which lives in bevy_pbr. These two should be combined with bitwise-or to produce the final MeshPipelineKey.

manipulation.

This commit makes the following optimizations:

`MeshPipelineKey` has been split into `BaseMeshPipelineKey`, which lives
in `bevy_render` and `MeshPipelineKey`, which lives in `bevy_pbr`.
Conceptually, `BaseMeshPipelineKey` is a superclass of
`MeshPipelineKey`. For `BaseMeshPipelineKey`, the bits start at the
highest (most significant) bit and grow downward toward the lowest bit;
for `MeshPipelineKey`, the bits start at the lowest bit and grow upward
toward the highest bit. This prevents them from colliding.

The goal of this is to avoid having to reassemble bits of the pipeline
key for every mesh every frame. Instead, we can just use a bitwise or
operation to combine the pieces that make up a `MeshPipelineKey`.

Previously, all of `specialize()` was marked as `#[inline]`. This
bloated `queue_material_meshes` unnecessarily, as most of it is a slow
path that's rarely hit. This commit refactors the function to move the
slow path to `specialize_slow()`.

Together, these two changes shave about 5% off `queue_material_meshes`.
@pcwalton pcwalton requested review from superdump and james7132 March 29, 2024 20:03
@ItsDoot ItsDoot added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times labels Mar 29, 2024
///
/// These are precalculated so that we can just "or" them together in
/// [`queue_material_meshes`].
pub mesh_pipeline_key_bits: MeshPipelineKey,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this hard-tie the material system to meshes? Or should we generalize MeshPipelineKey?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There would be a lot of untangling that would need to happen if we wanted to have meshes that didn't have materials, or materials that didn't have meshes. This doesn't really make things worse.

Copy link
Member

@james7132 james7132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than the aforementioned comments.

crates/bevy_render/src/mesh/mesh/mod.rs Outdated Show resolved Hide resolved
@pcwalton pcwalton added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Apr 1, 2024
@alice-i-cecile alice-i-cecile added this pull request to the merge queue Apr 1, 2024
Merged via the queue into bevyengine:main with commit 37522fd Apr 1, 2024
32 checks passed
chompaa pushed a commit to chompaa/bevy that referenced this pull request Apr 5, 2024
…ulation. (bevyengine#12791)

This commit makes the following optimizations:

## `MeshPipelineKey`/`BaseMeshPipelineKey` split

`MeshPipelineKey` has been split into `BaseMeshPipelineKey`, which lives
in `bevy_render` and `MeshPipelineKey`, which lives in `bevy_pbr`.
Conceptually, `BaseMeshPipelineKey` is a superclass of
`MeshPipelineKey`. For `BaseMeshPipelineKey`, the bits start at the
highest (most significant) bit and grow downward toward the lowest bit;
for `MeshPipelineKey`, the bits start at the lowest bit and grow upward
toward the highest bit. This prevents them from colliding.

The goal of this is to avoid having to reassemble bits of the pipeline
key for every mesh every frame. Instead, we can just use a bitwise or
operation to combine the pieces that make up a `MeshPipelineKey`.

## `specialize_slow`

Previously, all of `specialize()` was marked as `#[inline]`. This
bloated `queue_material_meshes` unnecessarily, as a large chunk of it
ended up being a slow path that was rarely hit. This commit refactors
the function to move the slow path to `specialize_slow()`.

Together, these two changes shave about 5% off `queue_material_meshes`:

![Screenshot 2024-03-29
130002](https://github.com/bevyengine/bevy/assets/157897/a7e5a994-a807-4328-b314-9003429dcdd2)

## Migration Guide

- The `primitive_topology` field on `GpuMesh` is now an accessor method:
`GpuMesh::primitive_topology()`.
- For performance reasons, `MeshPipelineKey` has been split into
`BaseMeshPipelineKey`, which lives in `bevy_render`, and
`MeshPipelineKey`, which lives in `bevy_pbr`. These two should be
combined with bitwise-or to produce the final `MeshPipelineKey`.
github-merge-queue bot pushed a commit that referenced this pull request Apr 5, 2024
# Objective

- #12791 broke example `irradiance_volumes`
- Fixes #12876 

```
wgpu error: Validation Error

Caused by:
    In Device::create_render_pipeline
      note: label = `pbr_opaque_mesh_pipeline`
    Color state [0] is invalid
    Sample count 8 is not supported by format Rgba8UnormSrgb on this device. The WebGPU spec guarentees [1, 4] samples are supported by this format. With the TEXTURE_ADAPTER_SPECIFIC_FORMAT_FEATURES feature your device supports [1, 2, 4].
```

## Solution

- Shift bits a bit more
ChristopherBiscardi added a commit to ChristopherBiscardi/bevy_ecs_tilemap that referenced this pull request Jun 7, 2024
[12791](bevyengine/bevy#12791) the `primitive_topology` field on `GpuMesh` was removed in favor of `key_bits` which can be constructed using `BaseMeshPipeline::from_primitive_topology`
rparrett added a commit to StarArawn/bevy_ecs_tilemap that referenced this pull request Jul 5, 2024
* Update to 0.14.0-rc.2

* [12997](bevyengine/bevy#12997): rename `multi-threaded` to `multi_threaded`

* RenderAssets<Image> is now RenderAssets<GpuImage>

Implemented in [12827](bevyengine/bevy#12827)

* FloatOrd is now in bevy_math

implemented in [12732](bevyengine/bevy#12732)

* convert Transparent2d::dynamic_offset to extra_index

[12889](bevyengine/bevy#12889) Gpu Frustum Culling removed the dynamic_offset of Transparent2d and it became `extra_index` with the special value `PhaseItemExtraIndex::NONE`, which indicates the `None` that was here previously

* RenderPhase<Transparent2d> -> ViewSortedRenderPhases<Transparent2d>

[12453](https://github.com/StarArawn/bevy_ecs_tilemap/pull/bevyengine/bevy#12453): Render phases are now binned or sorted.

Following the changes in the `mesh2d_manual` [example](https://github.com/bevyengine/bevy/blob/ecdd1624f302c5f71aaed95b0984cbbecf8880b7/examples/2d/mesh2d_manual.rs#L357-L358): use the `ViewSortedRenderPhases` resource.

* get_sub_app_mut is now an Option

in [9202](https://github.com/StarArawn/bevy_ecs_tilemap/pull/bevyengine/bevy/pull/9202) SubApp access has changed

* GpuImage::size f32 -> u32 via UVec2

[11698](bevyengine/bevy#11698) changed `GpuImage::size` to `UVec2`.

Right above this, `Extent3d` does the same thing, so I'm taking a small leap and assuming can `as`.

* GpuMesh::primitive_topology -> key_bits/BaseMeshPipeline

[12791](bevyengine/bevy#12791) the `primitive_topology` field on `GpuMesh` was removed in favor of `key_bits` which can be constructed using `BaseMeshPipeline::from_primitive_topology`

* RenderChunk2d::prepare requires &mut MeshVertexBufferLayouts now

[12216](bevyengine/bevy#12216) introduced an argument `&mut MeshVertexBufferLayouts` to `get_mesh_vertex_buffer_layout`, which bevy_ecs_tilemap calls in `RenderChunk2d::prepare`

* into_linear_f32 -> color.0.linear().to_f32_array(),

[12163](bevyengine/bevy#12163) bevy_color was created and Color handling has changed. Specifically Color::as_linear_rgba_f32 has been removed.

LinearRgba is now its own type that can be accessed via [`linear()`](https://docs.rs/bevy/0.14.0-rc.2/bevy/color/enum.Color.html#method.linear) and then converted.

* Must specify type of VisibleEntities when accessing

[12582](bevyengine/bevy#12582) divided `VisibleEntities` into separate lists. So now we have to specify which kind of entity we want. I think we want the Mesh here, and I think we can get rid of the `.index` calls on Entity since Entity [already compares bits](https://docs.rs/bevy_ecs/0.14.0-rc.2/src/bevy_ecs/entity/mod.rs.html#173) for optimized codegen purposes. Waiting to do that until the other changes are in though so as to not change functionality until post-upgrade.

* app.world access is functions now

- [9202](bevyengine/bevy#9202) changed world access to functions. [relevent line](https://github.com/bevyengine/bevy/pull/9202/files#diff-b2fba3a0c86e496085ce7f0e3f1de5960cb754c7d215ed0f087aa556e529f97fR640)
- This also surfaced [12655](bevyengine/bevy#12655) which removed `Into<AssetId<T>>` for `Handle<T>`. using a reference or .id() is the solution here.

* We don't need `World::cell`, and it doesn't exist anymore

In [12551](bevyengine/bevy#12551) `WorldCell` was removed.

...but it turns out we don't need it or its replacement anyway.

* examples error out unless this bevy bug is addressed with these features being added

bevyengine/bevy#13728

* check_visibility is required for the entity that is renderable

As a result of [12582](bevyengine/bevy#12582) `check_visibility` must be implemented for the "renderable" tilemap entities. Doing this is trivial by taking advantage of the
existing `check_visibility` type arguments, which accept a [`QF: QueryFilter + 'static`](https://docs.rs/bevy/0.14.0-rc.2/bevy/render/view/fn.check_visibility.html).

The same `QueryFilter`` is used when checking `VisibleEntities`. I've chosen `With<TilemapRenderSettings` because presumably if the entity doesn't have a `TilemapRenderSettings` then it will not be rendering, but this could be as sophisticated or simple as we want.

For example `WithLight` is currently implemented as

```rust
pub type WithLight = Or<(With<PointLight>, With<SpotLight>, With<DirectionalLight>)>;
```

* view.view_proj -> view.clip_from_world

[13289](bevyengine/bevy#13489) introduced matrix naming changes, including `view_proj` which becomes `clip_from_world`

* color changes to make tests runnable

* clippy fix

* Update Cargo.toml

Co-authored-by: Rob Parrett <[email protected]>

* Update Cargo.toml

Co-authored-by: Rob Parrett <[email protected]>

* final clippy fixes

* Update Cargo.toml

Co-authored-by: Rob Parrett <[email protected]>

* Simplify async loading in ldtk/tiled helpers

See Bevy #12550

* remove second allow lint

* rc.3 bump

* bump version for major release

* remove unused features

---------

Co-authored-by: Rob Parrett <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants