Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bench for backtracking #281

Merged
merged 1 commit into from
Nov 18, 2024
Merged

Add bench for backtracking #281

merged 1 commit into from
Nov 18, 2024

Conversation

x-hgg-x
Copy link
Contributor

@x-hgg-x x-hgg-x commented Nov 8, 2024

Part of #274.

Add a new bench to follow backtracking performance.

Copy link
Member

@Eh2406 Eh2406 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I look forward to seeing these numbers improve!

benches/backtracking.rs Outdated Show resolved Hide resolved
benches/backtracking.rs Outdated Show resolved Hide resolved
test-examples/backtracking.ron Outdated Show resolved Hide resolved
@x-hgg-x x-hgg-x force-pushed the new-bench branch 3 times, most recently from bed4e71 to c6db4bc Compare November 10, 2024 08:59
@konstin
Copy link
Member

konstin commented Nov 10, 2024

Can we improve prioritization to avoid going into the pathological case? The following bad hack already gives a 1.5x speedup, could we avoid reassigning all the other packages after (bad) solana versions by improving prioritize?

    fn prioritize(
        &mut self,
        package: Package,
        range: VersionSet,
        arena: &PackageArena<Self::P>,
    ) -> Self::Priority {
        let mut x = ((range.count() as u64) << 32) + package.get() as u64;
        if !arena.pkg(package).unwrap().to_string().contains("solana") {
            x = x + 1 << 62
        }
        Reverse(x)
    }
hyperfine --runs 3 "./pubgrub-bench-base" "./pubgrub-bench-prio"
Benchmark 1: ./pubgrub-bench-base
  Time (mean ± σ):      4.013 s ±  0.003 s    [User: 3.907 s, System: 0.104 s]
  Range (min … max):    4.011 s …  4.017 s    3 runs
 
Benchmark 2: ./pubgrub-bench-prio
  Time (mean ± σ):      2.643 s ±  0.004 s    [User: 2.500 s, System: 0.142 s]
  Range (min … max):    2.639 s …  2.647 s    3 runs
 
Summary
  ./pubgrub-bench-prio ran
    1.52 ± 0.00 times faster than ./pubgrub-bench-base

Re Eh2406/pubgrub-crates-benchmark#6 (comment): In the backtracking itself, why don't we learn the incompatibilities, shouldn't we be able to memoize that only a certain range works and skip over it in the next attempt? For Eh2406/pubgrub-crates-benchmark#6 (comment), what's the representation for features and virtual packages, can we improve that? In uv, we found a backtracking bug that was fixed by improving the ordering and bounds around feature packages.

For adding large, generated files to the repository, we should add information on how they were generated and how they should be interpreted: As a non-cargo dev, I currently see only a large tree of numbers, and can't tell how they should be resolved or what it means if we're taking a certain path through the packages and versions.

@x-hgg-x
Copy link
Contributor Author

x-hgg-x commented Nov 10, 2024

Can we improve prioritization to avoid going into the pathological case?

Yes it should be possible, but we need to find a generic method without knowing the shape of the dependency tree.

Maybe we can compute the current tree depth and prioritize packages with smaller depths ?

Re Eh2406/pubgrub-crates-benchmark#6 (comment): In the backtracking itself, why don't we learn the incompatibilities, shouldn't we be able to memoize that only a certain range works and skip over it in the next attempt? For Eh2406/pubgrub-crates-benchmark#6 (comment), what's the representation for features and virtual packages, can we improve that?

These examples are using the current resolver of cargo, which doesn't use pubgrub yet.

For adding large, generated files to the repository, we should add information on how they were generated and how they should be interpreted.

I updated the benchmark which is now synthetic and not based on cargo anymore (see #281 (comment)).

@konstin
Copy link
Member

konstin commented Nov 10, 2024

Yes it should be possible, but we need to find a generic method without knowing the shape of the dependency tree.

Maybe we can compute the current tree depth and prioritize packages with smaller depths ?

There's a big space of possibilities here! For inspiration, here's uv's heuristic: https://github.com/astral-sh/uv/blob/14507a1793f12bf884786be49d716ca8f8322340/crates/uv-resolver/src/pubgrub/priority.rs. We pick more tightly constrained packages first. Another option is to prioritize package that caused conflicts earlier (by counting conflicts). I'm sure @Eh2406 has good ideas specifically for cargo, too.

@konstin
Copy link
Member

konstin commented Nov 12, 2024

Given that the benchmark is synthetic, can we generate it on-the-fly in the bench code? This would also make documenting it easier, together with how this was extracted from the real solana crate case. (Large generated files in the repo, especially when we have different types of syntaxes for benchmarks, tend to become a maintenance burden).

Generally, with more context on the change and more documentation on the feature, i could give you a more helpful review here.

@x-hgg-x
Copy link
Contributor Author

x-hgg-x commented Nov 14, 2024

I updated the PR: the bench data is now generated in the code.

benches/backtracking.rs Outdated Show resolved Hide resolved
benches/backtracking.rs Outdated Show resolved Hide resolved
@x-hgg-x x-hgg-x force-pushed the new-bench branch 7 times, most recently from cc1a9dc to 1d01141 Compare November 16, 2024 19:25
@konstin
Copy link
Member

konstin commented Nov 18, 2024

I updated the PR: the bench data is now generated in the code.

Thanks, that's much easier to follow!

Can you add some docstrings explaining the benchmarks functions? Both what package structure they build, and how this relates to the real world case we are modelling here, like telling they story around how we get from trying to fix the slowest cargo-pubgrub cases to this small function. As a maintainer, it's important to have context on the changes, so we can reason about future change to this code. Imagine someone with no cargo background sees changes in these benchmarks from their pubgrub changes, and tries to make sense of what they mean. Or the other way round, someone finds a way to speed up these benchmarks, can they be confident that this does indeed improve solana or have they accidentally just hacked the benchmark?

@x-hgg-x
Copy link
Contributor Author

x-hgg-x commented Nov 18, 2024

These new benchmarks are not specific to the cargo implementation nor a specific crate resolution: they describe general patterns which can cause backtracking, so they should be useful even for uv.

@konstin
Copy link
Member

konstin commented Nov 18, 2024

I think it's good to have such benchmarks (they are definitely way more realistic than sudoku dummies), but in general we should aim at benchmarks that represent real world use cases. While pubgrub is a SAT solver technically, packaging often follows very specific patterns and often there are only a few hard patterns per ecosystem, which we should try to capture in benchmarks. See e.g. this talk about the perf oriented js package manager oregene: https://youtu.be/eh3VME3opnE?si=vuUuRkJJVsoSOqWZ&t=323.

In uv, this was notably #135, other representative slow example these were boto3/botocore/urllib (backtracking over two packages, largely a prefetching problem plus also #135), apache-airflow (here package prioritization mattered a lot) and transformers with all extras (a huge dep tree). We optimized on bio-embeddings, too, which was a somewhat artificial case, but exercised a lot of the problems in resolving scientic/ML projects, including transformers. It allowed profiling for #177, which in turn unblocked astral-sh/uv#2452.

In cargo, solana is the exceptionally difficult case, which seems to follow a pattern we don't observe in uv: Even the pathological bio-embeddings benchmark now takes only 3s on my machine, and i don't know any real package that would even take 1s with cached metadata on my machine.

Again, I'm not trying to say that these benchmarks are bad (we should have diverse benchmark coverage!), but I want to explain why sharing context and adding documentation around benchmarks is important.

@x-hgg-x
Copy link
Contributor Author

x-hgg-x commented Nov 18, 2024

I added some docstrings to explain how these benchmarks were derived from a real cargo resolution.

Copy link
Member

@Eh2406 Eh2406 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look like reasonable things for our project to keep an eye on.

@Eh2406 Eh2406 added this pull request to the merge queue Nov 18, 2024
Merged via the queue into pubgrub-rs:dev with commit 8a29d57 Nov 18, 2024
6 checks passed
@x-hgg-x x-hgg-x deleted the new-bench branch November 18, 2024 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants