Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cluster small table/archetype into single Task in parallel iteration #12846

Merged
merged 10 commits into from
Apr 4, 2024

Conversation

re0312
Copy link
Contributor

@re0312 re0312 commented Apr 2, 2024

Objective

Solution

  • collect small storage into one task

@pablo-lua pablo-lua added A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times labels Apr 2, 2024
@alice-i-cecile alice-i-cecile requested a review from james7132 April 2, 2024 13:22
@alice-i-cecile
Copy link
Member

Looks sensible, but I'd really like to see benchmarks on the impact of this change.

@alice-i-cecile alice-i-cecile added the S-Needs-Benchmarking This set of changes needs performance benchmarking to double-check that they help label Apr 2, 2024
@re0312
Copy link
Contributor Author

re0312 commented Apr 2, 2024

Looks sensible, but I'd really like to see benchmarks on the impact of this change.

Added benchmark for fragment par-iter
image

have slight loss without fragment for par_iter , but a huge gain when fragments exceeds available threads.

I also test many_foxes (which is not friendly for this PR's changes), it still showed a slight performance improvement. , the most significant performance gains came from the check_visibility
image

@alice-i-cecile alice-i-cecile added this to the 0.14 milestone Apr 2, 2024
@alice-i-cecile alice-i-cecile removed the S-Needs-Benchmarking This set of changes needs performance benchmarking to double-check that they help label Apr 2, 2024
Copy link
Member

@james7132 james7132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned what this will do with hierarchical parallel queries (i.e. transform propagation), where the top level parallel query has only a few values.

Otherwise looks good! Just one bit about allocation.

let table = &tables[table_id];
if table.is_empty() {
continue;
let mut batch_queue = vec![];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not the biggest fan of needing to allocate a growable vec repeatedly. Could you try using arrayvec, with a reasonable max queue size (128 seems more than reasonable).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

benchmark using arrayvec

image

crates/bevy_ecs/Cargo.toml Outdated Show resolved Hide resolved
@james7132 james7132 requested a review from SkiFire13 April 4, 2024 04:19
@james7132 james7132 added this pull request to the merge queue Apr 4, 2024
Merged via the queue into bevyengine:main with commit 4ca8cf5 Apr 4, 2024
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cluster small archetypes in parallel iteration
5 participants