Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove petgraph from bevy_ecs #15519

Merged

Conversation

bushrat011899
Copy link
Contributor

Objective

Solution

  • Removed petgraph as a dependency from the bevy_ecs crate.
  • Replaced TarjanScc and GraphMap with specialised in-tree alternatives.

Testing

  • Ran CI locally.
  • Added new unit tests to check ordering invariants.
  • Confirmed petgraph is no longer present in cargo tree -p bevy_ecs

Migration Guide

The Dag::graph method no longer returns a petgraph DiGraph and instead returns the new DiGraph type within bevy_ecs. Edge and node iteration methods are provided so conversion to the petgraph type should be trivial if required.

Notes

  • indexmap was already in the dependency graph for bevy_ecs, so its inclusion here makes no difference to compilation time for Bevy.
  • The implementation for Graph is heavily inspired from the petgraph original, with specialisations added to simplify and improve the type.
  • petgraph does have public plans for no_std support, however there is no timeframe on if or when that functionality will be available. Moving to an in-house solution in the interim allows Bevy to continue developing its no_std offerings and further explore alternate graphing options.

Replaced with specialised solution based on original types.
@bushrat011899 bushrat011899 added C-Feature A new feature, making something new possible A-ECS Entities, components, systems, and events X-Contentious There are nontrivial implications that should be thought through D-Modest A "normal" level of difficulty; suitable for simple features or challenging fixes S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Sep 29, 2024
@clarfonthey
Copy link
Contributor

You probably want to verify that petgraph isn't implicitly included by bevy_reflect for any crates either, assuming that ECS was the primary use case. I think that keeping the reflect impls there would be valuable (and maybe worth adding new ones for these new graph types), but it should remain an optional feature unused anywhere else.

@bushrat011899
Copy link
Contributor Author

You probably want to verify that petgraph isn't implicitly included by bevy_reflect for any crates either, assuming that ECS was the primary use case. I think that keeping the reflect impls there would be valuable (and maybe worth adding new ones for these new graph types), but it should remain an optional feature unused anywhere else.

I do agree, bevy_reflect will need to be updated to feature-gate the inclusion of petgraph. However, I'm not including that in this PR, since petgraph is still included explicitly by bevy_animation (and implicitly via naga). Since reflection is an optional feature my focus is on making the required parts of Bevy work in a no_std context.

Additionally, adding a Reflect implementation for Graph would be largely pointless, since NodeId doesn't implement it, and none of the types that contain a Graph (e.g., Dag) do either. I would personally like to restrict Graph to even be a private type, but Dag exposes it via a read-only method Dag::graph, so to avoid more substantive breaking changes I'm leaving the immutable methods public.

Copy link
Member

@alice-i-cecile alice-i-cecile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very pleased to see this trimmed down. I think there's a good chance we want a more prinicipled bevy_graph at some point (or just use systems-as-entities), but this is an excellent ECS code quality / maintainability change in its own right, independent of the no_std implications.

@alice-i-cecile
Copy link
Member

@DasLixou if you're around, I'd appreciate your eyes here :)

Copy link
Contributor

@DasLixou DasLixou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting change, I wonder how it performs and how much is really stripped away.
Also good to know that a bevy graph crate could need such fine grained specialization, that's something I had in mind, but I'm currently a bit "frustrated" at coding, no bevy graph in near time so this is cool to have :D

crates/bevy_ecs/src/schedule/graph/graph_map.rs Outdated Show resolved Hide resolved
crates/bevy_ecs/src/schedule/graph/graph_map.rs Outdated Show resolved Hide resolved

iter.copied()
.map(CompactNodeIdAndDirection::load)
.filter_map(|(n, dir)| (!DIRECTED || dir == Outgoing).then_some(n))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we only use Outgoing edges for directed graphs? aren't neighbors also incoming edges?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hangover from petgraph, but I believe the rationale is that a neighbour is something you can go to from your current node. So in a <-> b, both a and b are neighbours. Whereas, in a --> b, b is a neighbour of a, but not the other way around. I could change this, but I wanted to keep the API here as-close-as possible to petgraph to avoid subtle translation errors for contributors moving to the new type.

crates/bevy_ecs/src/schedule/graph/mod.rs Outdated Show resolved Hide resolved
crates/bevy_ecs/src/schedule/graph/graph_map.rs Outdated Show resolved Hide resolved
crates/bevy_ecs/src/schedule/schedule.rs Outdated Show resolved Hide resolved
bushrat011899 and others added 2 commits October 1, 2024 08:39
- Removed instances of `&NodeId` in function arguments since `NodeId` is copy and only two `usize`'s wide.
- Changed documentation references to `O(V)` to `O(N)` (nodes instead of verticies)

Co-Authored-By: Lixou <[email protected]>
@bushrat011899
Copy link
Contributor Author

Interesting change, I wonder how it performs and how much is really stripped away.

I haven't tested the performance myself (currently only have access to a pretty mediocre laptop!) but I suspect a marginal performance improvement at runtime, and a negligible compilation boost (I observed bevy_ecs compiling in 54 seconds instead of 57).

Packing the NodeId pairs and NodeId + Direction pairs might help with caching, since it saves 8 bytes per pair, but if it did then I'd imagine switching to SmallVec would help even more.

Also good to know that a bevy graph crate could need such fine grained specialization, that's something I had in mind, but I'm currently a bit "frustrated" at coding, no bevy graph in near time so this is cool to have :D

I am hoping that bringing this type in-tree and specialising to just empty edges and NodeId nodes inspires the ECS people to further optimise it for their uses. For example, I saw that topological sorting was fairly important to certain Schedule algorithms, perhaps embedding that sorting into the Graph type might help? Who knows!

Allows for a minimal-allocation iterator interface. Also switched to `SmallVec` to improve performance in sparse graphs (the more likely use-case)
@bushrat011899
Copy link
Contributor Author

Ok so I did some benchmarking and it appears that this change might've been more impactful than I had planned!

cargo bench -- schedule --load-baseline NoPetgraph --baseline Main
schedule/base           time:   [43.312 µs 44.248 µs 45.381 µs]
                        change: [+5.8590% +8.9380% +12.480%] (p = 0.00 < 0.05)
                        Performance has regressed.

build_schedule/100_schedule_noconstraints
                        time:   [593.91 µs 608.04 µs 623.50 µs]
                        change: [-10.527% -8.0888% -5.4824%] (p = 0.00 < 0.05)
                        Performance has improved.

build_schedule/100_schedule
                        time:   [7.9308 ms 8.0336 ms 8.1467 ms]
                        change: [-28.847% -27.477% -26.112%] (p = 0.00 < 0.05)
                        Performance has improved.
                        
build_schedule/500_schedule_noconstraints
                        time:   [8.8864 ms 9.1508 ms 9.4297 ms]
                        change: [-6.0768% -1.7064% +2.8474%] (p = 0.45 > 0.05)
                        No change in performance detected.
                        
build_schedule/500_schedule
                        time:   [309.55 ms 315.09 ms 320.96 ms]
                        change: [-34.776% -33.453% -32.085%] (p = 0.00 < 0.05)
                        Performance has improved.
                        
build_schedule/1000_schedule_noconstraints
                        time:   [35.563 ms 36.884 ms 38.279 ms]
                        change: [-0.7079% +4.0610% +8.9812%] (p = 0.11 > 0.05)
                        No change in performance detected.
                        
build_schedule/1000_schedule
                        time:   [1.7366 s 1.7451 s 1.7546 s]
                        change: [-37.218% -36.879% -36.460%] (p = 0.00 < 0.05)
                        Performance has improved.

run_empty_schedule/SingleThreaded
                        time:   [8.1788 ns 8.2399 ns 8.3024 ns]
                        change: [+10.090% +11.954% +13.710%] (p = 0.00 < 0.05)
                        Performance has regressed.
                        
run_empty_schedule/MultiThreaded
                        time:   [3.3104 ns 3.3396 ns 3.3715 ns]
                        change: [-1.0874% +0.5337% +1.9344%] (p = 0.49 > 0.05)
                        No change in performance detected.

run_empty_schedule/Simple
                        time:   [4.2122 ns 4.2494 ns 4.2901 ns]
                        change: [+5.3116% +6.4836% +7.6806%] (p = 0.00 < 0.05)
                        Performance has regressed.

For benchmarks that run in less a few milliseconds, I'd call that a wash (plus or minus 10% is within variance that I'd expect from the binary layout changing). However, for the build_schedule benchmarks (where the new Graph type would be most stress-tested), there's a 25% - 35% performance improvement. I tested this on a laptop so I wouldn't take the 30% number as gospel, but it certainly looks like there is a performance win here on top of the no_std benefits.

@bushrat011899 bushrat011899 added the C-Performance A change motivated by improving speed, memory usage or compile times label Oct 1, 2024
@bushrat011899
Copy link
Contributor Author

Did some additional testing on my desktop (Ryzen 5950X, Windows 10, 64GB or RAM) to see if the benchmark results were repeatable:

cargo bench -- schedule --load-baseline NoPetgraph --baseline Main
schedule/base           time:   [26.106 µs 26.267 µs 26.440 µs]
                        change: [-25.186% -22.469% -19.743%] (p = 0.00 < 0.05)
                        Performance has improved.

build_schedule/100_schedule_noconstraints
                        time:   [386.27 µs 388.67 µs 390.90 µs]
                        change: [-36.845% -35.554% -34.235%] (p = 0.00 < 0.05)
                        Performance has improved.
                        
build_schedule/100_schedule
                        time:   [8.2987 ms 8.3541 ms 8.4162 ms]
                        change: [-47.137% -46.149% -45.162%] (p = 0.00 < 0.05)
                        Performance has improved.
                        
build_schedule/500_schedule_noconstraints
                        time:   [8.2515 ms 8.3350 ms 8.4234 ms]
                        change: [-9.9522% -8.5799% -7.1712%] (p = 0.00 < 0.05)
                        Performance has improved.
                        
build_schedule/500_schedule
                        time:   [371.61 ms 374.18 ms 376.73 ms]
                        change: [-37.420% -36.675% -35.913%] (p = 0.00 < 0.05)
                        Performance has improved.
                        
build_schedule/1000_schedule_noconstraints
                        time:   [33.189 ms 33.378 ms 33.564 ms]
                        change: [+5.6402% +6.7365% +7.7713%] (p = 0.00 < 0.05)
                        Performance has regressed.
                        
build_schedule/1000_schedule
                        time:   [1.8785 s 1.8916 s 1.9049 s]
                        change: [-40.756% -39.568% -38.449%] (p = 0.00 < 0.05)
                        Performance has improved.

run_empty_schedule/SingleThreaded
                        time:   [8.6623 ns 8.6738 ns 8.6861 ns]
                        change: [-10.623% -8.9102% -7.5576%] (p = 0.00 < 0.05)
                        Performance has improved.
                        
run_empty_schedule/MultiThreaded
                        time:   [4.9534 ns 4.9645 ns 4.9754 ns]
                        change: [-3.7699% -2.9636% -2.2897%] (p = 0.00 < 0.05)
                        Performance has improved.

run_empty_schedule/Simple
                        time:   [4.9958 ns 5.0017 ns 5.0080 ns]
                        change: [-10.705% -9.5771% -8.3263%] (p = 0.00 < 0.05)
                        Performance has improved.

Looks like that ~30% performance uplift is repeatable (in this case up to 45% even!), so I'm more confident in saying there's a fairly significant performance boost here. Looking at the build_schedule/x_schedule plots in particular:

build_schedule/100_schedule

relative_pdf_small

build_schedule/500_schedule

relative_pdf_small

build_schedule/1000_schedule

relative_pdf_small

As to why there's a performance uplift, I can think of a couple changes I made which might be the reason:

  1. Switched to storing SCCs in SmallVec<[NodeId; 4]> instead of Vec. Might help, but I don't think so since in these benchmarks the schedule graph is overly connected, so the small vector optimisation won't be used.
  2. Changed the Tarjan SCC algorithm to use looping rather than recursion. Might help since I'm explicitly storing only the stack variables needed for the algorithm rather than the entire function's scope?
  3. The new Graph type uses a HashSet rather than an IndexMap for the edges. Edge ordering isn't required for the algorithms the scheduling systems employs. I imagine it's easier to manage a HashSet when order isn't explicitly required.
  4. In the Graph type I store compacted versions of (NodeId, Direction) and (NodeId, NodeId) which save 8 bytes each. Could help with cache locality?

Aside from the above, I'm not sure what else could really affect performance this drastically. Aside from the TarjanScc algorithm, I haven't changed much in how this type works vs petgraph.

@hymm
Copy link
Contributor

hymm commented Oct 1, 2024

The new Graph type uses a HashSet rather than an IndexMap for the edges.

Will this effect the toposort? The current toposort isn't stable, but we do eventually want to make the topo sort stable. So users don't get their game randomly behaving differently in different cargo run's.

@bushrat011899
Copy link
Contributor Author

Will this effect the toposort? The current toposort isn't stable, but we do eventually want to make the topo sort stable. So users don't get their game randomly behaving differently in different cargo run's.

No, actually! The reason is the nodes are all sorted in an IndexMap, and their adjacency caches are still just Vec's. Methods that iterate over edges use the cached adjacency list rather than the master list of all edges. In fact, the only thing the edges HashSet is actually used for right now is checking if an edge exists in constant time.

@bushrat011899
Copy link
Contributor Author

And for completeness, there's no meaningful frame-time difference. Tested using the alien_cake_addict example:
image

crates/bevy_ecs/src/schedule/graph/tarjan_scc.rs Outdated Show resolved Hide resolved
crates/bevy_ecs/src/schedule/graph/tarjan_scc.rs Outdated Show resolved Hide resolved
crates/bevy_ecs/src/schedule/graph/tarjan_scc.rs Outdated Show resolved Hide resolved
bushrat011899 and others added 2 commits October 2, 2024 11:04
The `nocase` naming was a hangover from the `petgraph` migration.

Co-Authored-By: vero <[email protected]>
@alice-i-cecile alice-i-cecile added this to the 0.16 milestone Oct 9, 2024
Copy link
Contributor

@hymm hymm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do a super close review of the graph algorithms, but otherwise looks good.

@bushrat011899 bushrat011899 added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Oct 9, 2024
@alice-i-cecile
Copy link
Member

Waiting until 0.16 for stability reasons, but I will merge this.

@alice-i-cecile alice-i-cecile added this pull request to the merge queue Dec 3, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 3, 2024
@alice-i-cecile alice-i-cecile added this pull request to the merge queue Dec 3, 2024
Merged via the queue into bevyengine:main with commit c9fa975 Dec 3, 2024
31 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events C-Feature A new feature, making something new possible C-Performance A change motivated by improving speed, memory usage or compile times D-Modest A "normal" level of difficulty; suitable for simple features or challenging fixes S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it X-Contentious There are nontrivial implications that should be thought through
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants