-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve resolver speed again #14665
base: master
Are you sure you want to change the base?
Improve resolver speed again #14665
Conversation
I benchmarked against an earlier version of this code. Running across all of crates.io (including solana) went from |
8979c6c
to
0ff9d29
Compare
I tested |
Good to know! I will have to rerun when I get a chance. The most likely explanation is a random hiccup on my computer, that's pretty likely when carefully measuring time of a process that runs for an hour. |
I ran |
0ff9d29
to
ee74105
Compare
For |
A more targeted run, that only looked at crates containing the word |
type Activations = im_rc::HashMap< | ||
ActivationKey, | ||
(Summary, ContextAge), | ||
nohash_hasher::BuildNoHashHasher<ActivationKey>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: This PR does a number of different things, and I'm trying to understand which ones are critical to the performance win. I tried switching this one nohash_hasher::BuildNoHashHasher
back to rustc_hash::FxBuildHasher
, and the Solana only benchmark I was running yesterday runs in 162.53min (or 164.63min there seems to be a surprising amount of run to run variability).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it seems that rustc-hash
is already very optimized when hashing a single u64
:
https://github.com/rust-lang/rustc-hash/blob/master/src/lib.rs#L99.
In this case the performance increase may only be due to fact that we don't hash the full ActivationKeyInner
but only its address.
I wanted to spend some time teasing apart exactly what part of the old hash was slow. The old hash but modified to hash the pointer of the With the holiday I will probably not work on this till Monday. Sorry for the delay. |
I think we should also define a custom |
So I was able to get about 50% of benefit was much less churn. I change the order so that we only do the #[derive(Clone, PartialEq, Eq, Debug)]
pub struct ActivationsKey(InternedString, SemverCompatibility, SourceId);
impl std::hash::Hash for ActivationsKey {
fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
std::ptr::NonNull::from(self.0.as_str()).hash(state);
self.1.hash(state);
// self.2.hash(state); // Packages that only differ by SourceId are rare enough to not be worth hashing
}
} So the other half the performance is because despite being interned So that leads to two questions about maintainability. Pinging @epage for strong and useful opinions about cleaning up our code base:
|
☔ The latest upstream changes (presumably #14692) made this pull request unmergeable. Please resolve the merge conflicts. |
If we can get the performance benefit without storing the activation key inside of a If there are cheap things we can do, like re-order fields, that seems like an easy win. Cleaning up |
This is a perf improvement I suggested #14665 (comment) I mostly want this landed to make it easier to compare the cost and benefits of more complicated changes. As this is the thing to compare against.
What does this PR try to resolve?
Follow-up of #14663 which has been splitted.
How should we test and review this PR?
Commit 1 enables the
clippy::derive_ord_xor_partial_ord
lint to be sure thatOrd
andPartialOrd
are implemented correctly everywhere in the repository.Commit 2 moves the
ActivationKey
type incore
.Commit 3 adds interning for the resolver activation keys so they can be used with
nohash-hasher
, decreasing resolving time by another 30%. This is possible since the interning is implemented with a staticHashSet
, so each interned element can be reduced to its address.Performance comparison with the test added in the previous PR by setting
LAST_CRATE_VERSION_COUNT = 100
:Firefox profiles, can be read with https://profiler.firefox.com:
perf.tar.gz
r? Eh2406