Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve resolver speed again #14665

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

x-hgg-x
Copy link
Contributor

@x-hgg-x x-hgg-x commented Oct 10, 2024

What does this PR try to resolve?

Follow-up of #14663 which has been splitted.

How should we test and review this PR?

Commit 1 enables the clippy::derive_ord_xor_partial_ord lint to be sure that Ord and PartialOrd are implemented correctly everywhere in the repository.

Commit 2 moves the ActivationKey type in core.

Commit 3 adds interning for the resolver activation keys so they can be used with nohash-hasher, decreasing resolving time by another 30%. This is possible since the interning is implemented with a static HashSet, so each interned element can be reduced to its address.

Performance comparison with the test added in the previous PR by setting LAST_CRATE_VERSION_COUNT = 100:

commit duration
master 16s
with rustc-hash 6.7s
with nohash 4.6s

Firefox profiles, can be read with https://profiler.firefox.com:
perf.tar.gz

r? Eh2406

@rustbot rustbot added A-dependency-resolution Area: dependency resolution and the resolver S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 10, 2024
@x-hgg-x x-hgg-x mentioned this pull request Oct 10, 2024
@Eh2406
Copy link
Contributor

Eh2406 commented Oct 10, 2024

I benchmarked against an earlier version of this code. Running across all of crates.io (including solana) went from 235.88min to 218.03min that is a solid 7.5% improvement. solana-transaction-status v1.3.10 improve the most going from 49.2s to 35.1s a 14s improvement! On the other hand solana-core v1.0.5 went from 99s to 114s that is 14s worse. Here is the disaggregated data for crates that took >=0.02 to build.
sort_join.csv

@x-hgg-x x-hgg-x force-pushed the resolver-perf-2 branch 2 times, most recently from 8979c6c to 0ff9d29 Compare October 10, 2024 18:15
@x-hgg-x
Copy link
Contributor Author

x-hgg-x commented Oct 10, 2024

I tested solana-core v1.0.5 on my machine, it resolved in exactly the same time before and after this PR.

@Eh2406
Copy link
Contributor

Eh2406 commented Oct 10, 2024

Good to know! I will have to rerun when I get a chance. The most likely explanation is a random hiccup on my computer, that's pretty likely when carefully measuring time of a process that runs for an hour.

@x-hgg-x
Copy link
Contributor Author

x-hgg-x commented Oct 10, 2024

I tested solana-core v1.0.5 on my machine, it resolved in exactly the same time before and after this PR.

I ran solana-core v1.0.5 on another machine where I can do a perf, and I observe a 30% improvement with this PR going from 6m25 to 4m40. I may have some throttling problems on the first machine because the timings are not stable.

@x-hgg-x
Copy link
Contributor Author

x-hgg-x commented Oct 10, 2024

For solana-core v1.0.5, cargo spent more than 30s dropping the ResolverContext:
flamegraph.svg.tar.gz

@Eh2406
Copy link
Contributor

Eh2406 commented Oct 10, 2024

A more targeted run, that only looked at crates containing the word solana went from 180.61min from master to 162.47min on this PR. With this run solana-core v1.0.5 went from 139s to 112s! So that one may have been a fluke on my part.
out_join.csv

type Activations = im_rc::HashMap<
ActivationKey,
(Summary, ContextAge),
nohash_hasher::BuildNoHashHasher<ActivationKey>,
Copy link
Contributor

@Eh2406 Eh2406 Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: This PR does a number of different things, and I'm trying to understand which ones are critical to the performance win. I tried switching this one nohash_hasher::BuildNoHashHasher back to rustc_hash::FxBuildHasher, and the Solana only benchmark I was running yesterday runs in 162.53min (or 164.63min there seems to be a surprising amount of run to run variability).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it seems that rustc-hash is already very optimized when hashing a single u64:
https://github.com/rust-lang/rustc-hash/blob/master/src/lib.rs#L99.

In this case the performance increase may only be due to fact that we don't hash the full ActivationKeyInner but only its address.

@Eh2406
Copy link
Contributor

Eh2406 commented Oct 12, 2024

I wanted to spend some time teasing apart exactly what part of the old hash was slow. The old hash but modified to hash the pointer of the InternedString instead of its contents got me 25% of the savings. It would be hard to believe that the SemverCompatibility is slow, it's basically two numbers. Hashing for SourceId is surprisingly complicated. (note to self, using SourceId::full_hash is fast but gets the wrong answer.) But I kept getting distracted. Not hashing the SourceId (with the InternedString fix) got about 80% of the savings. If keys only differ by their SourceId a hash map becomes O(n), but n is the number of packages with the same name and major version that only differ by source. Makes me wonder if hashing only by the pointer of the InternedString would also be effective.

With the holiday I will probably not work on this till Monday. Sorry for the delay.

@x-hgg-x
Copy link
Contributor Author

x-hgg-x commented Oct 12, 2024

I think we should also define a custom Hash for Dependency, as it is used in the conflict cache and hashing it amounts to 5-10% of the total resolving time.

@Eh2406
Copy link
Contributor

Eh2406 commented Oct 14, 2024

So I was able to get about 50% of benefit was much less churn. I change the order so that we only do the Eq test for SourceId after the others are determined to be equal, and I overrode Hash to look at the InternedString address and not look at SourceId.

#[derive(Clone, PartialEq, Eq, Debug)]
pub struct ActivationsKey(InternedString, SemverCompatibility, SourceId);

impl std::hash::Hash for ActivationsKey {
    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
        std::ptr::NonNull::from(self.0.as_str()).hash(state);
        self.1.hash(state);
        // self.2.hash(state); // Packages that only differ by SourceId are rare enough to not be worth hashing
    }
}

So the other half the performance is because despite being interned SourceId::Eq is not a ptr::eq. Which in turn is because SourceIds concept of identity is a complete mess. The code relies on having to IDs that are Eq but do not have the same values for their fields.

So that leads to two questions about maintainability. Pinging @epage for strong and useful opinions about cleaning up our code base:

  1. Any ideas about how to clean up the definition of "what is a SourceId" so that the structs we've defined actually match the semantics were using? The only thing I can think of is splitting it up into a CanonicalSource that just has the kind and canonical URL, and a SourceInfo that has the other fields. Then struct SourceId {canonical: &'static CanonicalSource, info: &'static SourceInfo}. Then EQ and fast_hash (a new method for us to use in ActivationsKey) can look at the ptr to canonical. But I'm open to other suggestions.
  2. Is the remaining performance benefit worth merging this PR as is, specifically involving adding core::activation_key::ActivationKey with the bike shedding and documentation work? I'm leaning toward no for now. @x-hgg-x has come across a number of good leads for making resolution faster and we should reevaluate this after we followed up on those. Ed had some initial feedback in Improve resolver speed #14663, so I wanted his opinion here.

@bors
Copy link
Contributor

bors commented Oct 15, 2024

☔ The latest upstream changes (presumably #14692) made this pull request unmergeable. Please resolve the merge conflicts.

@epage
Copy link
Contributor

epage commented Nov 8, 2024

If we can get the performance benefit without storing the activation key inside of a PackageId, we shouldn't. The activation key is specific to the resolver and should be scoped to it.

If there are cheap things we can do, like re-order fields, that seems like an easy win.

Cleaning up SourceId would be nice and if it means we can then hash / eq on a cheaper value, then great!

github-merge-queue bot pushed a commit that referenced this pull request Dec 10, 2024
This is a perf improvement I suggested
#14665 (comment)

I mostly want this landed to make it easier to compare the cost and
benefits of more complicated changes. As this is the thing to compare
against.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-dependency-resolution Area: dependency resolution and the resolver S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants