Add bench for backtracking #281
Conversation
I look forward to seeing these numbers improve!
Force-pushed from bed4e71 to c6db4bc
Can we improve prioritization to avoid going into the pathological case? The following bad hack already gives a 1.5x speedup; could we avoid reassigning all the other packages after (bad) solana versions by improving `fn prioritize`?

```rust
fn prioritize(
    &mut self,
    package: Package,
    range: VersionSet,
    arena: &PackageArena<Self::P>,
) -> Self::Priority {
    let mut x = ((range.count() as u64) << 32) + package.get() as u64;
    if !arena.pkg(package).unwrap().to_string().contains("solana") {
        // Careful: `x = x + 1 << 62` would parse as `(x + 1) << 62` because
        // `+` binds tighter than `<<`; the shift must happen first so that
        // only the high bit is set.
        x += 1 << 62;
    }
    Reverse(x)
}
```
Re Eh2406/pubgrub-crates-benchmark#6 (comment): In the backtracking itself, why don't we learn the incompatibilities? Shouldn't we be able to memoize that only a certain range works and skip over it in the next attempt?

For Eh2406/pubgrub-crates-benchmark#6 (comment): what is the representation for features and virtual packages, and can we improve it? In uv, we found a backtracking bug that was fixed by improving the ordering and bounds around feature packages.

For adding large, generated files to the repository, we should add information on how they were generated and how they should be interpreted: as a non-cargo dev, I currently see only a large tree of numbers, and can't tell how they should be resolved or what it means if we're taking a certain path through the packages and versions.
Yes, it should be possible, but we need a generic method that works without knowing the shape of the dependency tree. Maybe we can compute the current tree depth and prioritize packages with smaller depths?
These examples are using the current resolver of cargo, which doesn't use pubgrub yet.
I updated the benchmark, which is now synthetic and not based on cargo anymore (see #281 (comment)).
There's a big space of possibilities here! For inspiration, here's uv's heuristic: https://github.com/astral-sh/uv/blob/14507a1793f12bf884786be49d716ca8f8322340/crates/uv-resolver/src/pubgrub/priority.rs. We pick more tightly constrained packages first. Another option is to prioritize packages that caused conflicts earlier (by counting conflicts). I'm sure @Eh2406 has good ideas specifically for cargo, too.
Given that the benchmark is synthetic, can we generate it on the fly in the bench code? That would also make it easier to document, together with how it was extracted from the real solana crate case. (Large generated files in the repo, especially when we have different syntaxes for different benchmarks, tend to become a maintenance burden.) Generally, with more context on the change and more documentation on the feature, I could give you a more helpful review here.
I updated the PR: the bench data is now generated in the code. |
Force-pushed from cc1a9dc to 1d01141
Thanks, that's much easier to follow! Can you add some docstrings explaining the benchmark functions? Both what package structure they build and how it relates to the real-world case we are modelling here, i.e. telling the story of how we got from trying to fix the slowest cargo-pubgrub cases to this small function. As a maintainer, it's important to have context on changes so we can reason about future changes to this code. Imagine someone with no cargo background who sees these benchmarks shift after their pubgrub changes and tries to make sense of what that means. Or, the other way round: if someone finds a way to speed up these benchmarks, can they be confident that this indeed improves solana, or have they accidentally just gamed the benchmark?
These new benchmarks are not specific to the cargo implementation, nor to a specific crate resolution: they describe general patterns which can cause backtracking, so they should be useful even for other users of pubgrub.
I think it's good to have such benchmarks (they are definitely way more realistic than sudoku dummies), but in general we should aim for benchmarks that represent real-world use cases. While pubgrub is technically a SAT solver, packaging often follows very specific patterns, and there are usually only a few hard patterns per ecosystem, which we should try to capture in benchmarks. See e.g. this talk about the perf-oriented JS package manager orogene: https://youtu.be/eh3VME3opnE?si=vuUuRkJJVsoSOqWZ&t=323.

In uv, this was notably #135; other representative slow examples were boto3/botocore/urllib (backtracking over two packages, largely a prefetching problem, plus also #135), apache-airflow (here package prioritization mattered a lot), and transformers with all extras (a huge dep tree). We optimized on bio-embeddings, too, which was a somewhat artificial case but exercised a lot of the problems in resolving scientific/ML projects, including transformers. It allowed profiling for #177, which in turn unblocked astral-sh/uv#2452.

In cargo, solana is the exceptionally difficult case, which seems to follow a pattern we don't observe in uv: even the pathological bio-embeddings benchmark now takes only 3s on my machine, and I don't know any real package that would take even 1s with cached metadata on my machine.

Again, I'm not trying to say that these benchmarks are bad (we should have diverse benchmark coverage!), but I want to explain why sharing context and adding documentation around benchmarks is important.
I added some docstrings to explain how these benchmarks were derived from a real cargo resolution. |
These look like reasonable things for our project to keep an eye on.
Part of #274.
Add a new bench to track backtracking performance.