Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(fingerprint): Don't throwaway the cache on RUSTFLAGS changes #14830

Merged
merged 9 commits into from
Dec 6, 2024
153 changes: 106 additions & 47 deletions src/cargo/core/compiler/build_runner/compilation_files.rs
epage marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
Expand Up @@ -41,24 +41,25 @@ impl fmt::Debug for UnitHash {
}
}

/// The `Metadata` is a hash used to make unique file names for each unit in a
/// build. It is also used for symbol mangling.
/// [`Metadata`] tracks several [`UnitHash`]s, including
/// [`Metadata::unit_id`], [`Metadata::c_metadata`], and [`Metadata::c_extra_filename`].
///
/// For example:
/// We use a hash because it is an easy way to guarantee
/// that all the inputs can be converted to a valid path.
///
/// [`Metadata::unit_id`] is used to uniquely identify a unit in the build graph.
/// This serves as a similar role as [`Metadata::c_extra_filename`] in that it uniquely identifies output
/// on the filesystem except that its always present.
///
/// [`Metadata::c_extra_filename`] is needed for cases like:
/// - A project may depend on crate `A` and crate `B`, so the package name must be in the file name.
/// - Similarly a project may depend on two versions of `A`, so the version must be in the file name.
///
/// In general this must include all things that need to be distinguished in different parts of
/// This also acts as the main layer of caching provided by Cargo
/// so this must include all things that need to be distinguished in different parts of
/// the same build. This is absolutely required or we override things before
/// we get chance to use them.
///
/// It is also used for symbol mangling, because if you have two versions of
/// the same crate linked together, their symbols need to be differentiated.
///
/// We use a hash because it is an easy way to guarantee
/// that all the inputs can be converted to a valid path.
///
/// This also acts as the main layer of caching provided by Cargo.
/// For example, we want to cache `cargo build` and `cargo doc` separately, so that running one
/// does not invalidate the artifacts for the other. We do this by including [`CompileMode`] in the
/// hash, thus the artifacts go in different folders and do not override each other.
Expand All @@ -73,37 +74,41 @@ impl fmt::Debug for UnitHash {
/// more space than needed. This makes not including something in `Metadata`
/// a form of cache invalidation.
///
/// You should also avoid anything that would interfere with reproducible
/// Note that the `Fingerprint` is in charge of tracking everything needed to determine if a
/// rebuild is needed.
///
/// [`Metadata::c_metadata`] is used for symbol mangling, because if you have two versions of
/// the same crate linked together, their symbols need to be differentiated.
///
/// You should avoid anything that would interfere with reproducible
/// builds. For example, *any* absolute path should be avoided. This is one
/// reason that `RUSTFLAGS` is not in `Metadata`, because it often has
/// reason that `RUSTFLAGS` is not in [`Metadata::c_metadata`], because it often has
/// absolute paths (like `--remap-path-prefix` which is fundamentally used for
/// reproducible builds and has absolute paths in it). Also, in some cases the
/// mangled symbols need to be stable between different builds with different
/// settings. For example, profile-guided optimizations need to swap
/// `RUSTFLAGS` between runs, but needs to keep the same symbol names.
///
/// Note that the `Fingerprint` is in charge of tracking everything needed to determine if a
/// rebuild is needed.
#[derive(Copy, Clone, Debug)]
pub struct Metadata {
meta_hash: UnitHash,
use_extra_filename: bool,
unit_id: UnitHash,
c_metadata: UnitHash,
c_extra_filename: Option<UnitHash>,
}

impl Metadata {
/// A hash to identify a given [`Unit`] in the build graph
pub fn unit_id(&self) -> UnitHash {
self.meta_hash
self.unit_id
}

/// A hash to add to symbol naming through `-C metadata`
pub fn c_metadata(&self) -> UnitHash {
self.meta_hash
self.c_metadata
}

/// A hash to add to file names through `-C extra-filename`
pub fn c_extra_filename(&self) -> Option<UnitHash> {
self.use_extra_filename.then_some(self.meta_hash)
self.c_extra_filename
}
}

Expand Down Expand Up @@ -588,56 +593,54 @@ fn compute_metadata(
metas: &mut HashMap<Unit, Metadata>,
) -> Metadata {
let bcx = &build_runner.bcx;
let mut hasher = StableHasher::new();
let deps_metadata = build_runner
.unit_deps(unit)
.iter()
.map(|dep| *metadata_of(&dep.unit, build_runner, metas))
.collect::<Vec<_>>();
let use_extra_filename = use_extra_filename(bcx, unit);

METADATA_VERSION.hash(&mut hasher);
let mut shared_hasher = StableHasher::new();

METADATA_VERSION.hash(&mut shared_hasher);

// Unique metadata per (name, source, version) triple. This'll allow us
// to pull crates from anywhere without worrying about conflicts.
unit.pkg
.package_id()
.stable_hash(bcx.ws.root())
.hash(&mut hasher);
.hash(&mut shared_hasher);

// Also mix in enabled features to our metadata. This'll ensure that
// when changing feature sets each lib is separately cached.
unit.features.hash(&mut hasher);

// Mix in the target-metadata of all the dependencies of this target.
let mut deps_metadata = build_runner
.unit_deps(unit)
.iter()
.map(|dep| metadata_of(&dep.unit, build_runner, metas).meta_hash)
.collect::<Vec<_>>();
deps_metadata.sort();
deps_metadata.hash(&mut hasher);
unit.features.hash(&mut shared_hasher);

// Throw in the profile we're compiling with. This helps caching
// `panic=abort` and `panic=unwind` artifacts, additionally with various
// settings like debuginfo and whatnot.
unit.profile.hash(&mut hasher);
unit.mode.hash(&mut hasher);
build_runner.lto[unit].hash(&mut hasher);
unit.profile.hash(&mut shared_hasher);
unit.mode.hash(&mut shared_hasher);
build_runner.lto[unit].hash(&mut shared_hasher);

// Artifacts compiled for the host should have a different
// metadata piece than those compiled for the target, so make sure
// we throw in the unit's `kind` as well. Use `fingerprint_hash`
// so that the StableHash doesn't change based on the pathnames
// of the custom target JSON spec files.
unit.kind.fingerprint_hash().hash(&mut hasher);
unit.kind.fingerprint_hash().hash(&mut shared_hasher);

// Finally throw in the target name/kind. This ensures that concurrent
// compiles of targets in the same crate don't collide.
unit.target.name().hash(&mut hasher);
unit.target.kind().hash(&mut hasher);
unit.target.name().hash(&mut shared_hasher);
unit.target.kind().hash(&mut shared_hasher);

hash_rustc_version(bcx, &mut hasher, unit);
hash_rustc_version(bcx, &mut shared_hasher, unit);

if build_runner.bcx.ws.is_member(&unit.pkg) {
// This is primarily here for clippy. This ensures that the clippy
// artifacts are separate from the `check` ones.
if let Some(path) = &build_runner.bcx.rustc().workspace_wrapper {
path.hash(&mut hasher);
path.hash(&mut shared_hasher);
}
}

Expand All @@ -648,7 +651,7 @@ fn compute_metadata(
.gctx
.get_env("__CARGO_DEFAULT_LIB_METADATA")
{
channel.hash(&mut hasher);
channel.hash(&mut shared_hasher);
}

// std units need to be kept separate from user dependencies. std crates
Expand All @@ -658,7 +661,7 @@ fn compute_metadata(
// don't need unstable support. A future experiment might be to set
// `is_std` to false for build dependencies so that they can be shared
// with user dependencies.
unit.is_std.hash(&mut hasher);
unit.is_std.hash(&mut shared_hasher);

// While we don't hash RUSTFLAGS because it may contain absolute paths that
// hurts reproducibility, we track whether a unit's RUSTFLAGS is from host
Expand All @@ -677,15 +680,71 @@ fn compute_metadata(
.target_config(CompileKind::Host)
.links_overrides
!= unit.links_overrides;
target_configs_are_different.hash(&mut hasher);
target_configs_are_different.hash(&mut shared_hasher);
}

let mut c_metadata_hasher = shared_hasher.clone();
// Mix in the target-metadata of all the dependencies of this target.
let mut dep_c_metadata_hashes = deps_metadata
.iter()
.map(|m| m.c_metadata)
.collect::<Vec<_>>();
dep_c_metadata_hashes.sort();
dep_c_metadata_hashes.hash(&mut c_metadata_hasher);

let mut c_extra_filename_hasher = shared_hasher.clone();
// Mix in the target-metadata of all the dependencies of this target.
let mut dep_c_extra_filename_hashes = deps_metadata
.iter()
.map(|m| m.c_extra_filename)
.collect::<Vec<_>>();
dep_c_extra_filename_hashes.sort();
dep_c_extra_filename_hashes.hash(&mut c_extra_filename_hasher);
// Avoid trashing the caches on RUSTFLAGS changing via `c_extra_filename`
//
// Limited to `c_extra_filename` to help with reproducible build / PGO issues.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should (probably) avoid the PGO problems, since the symbols won't change. However, this could still cause problems with reproducible builds if the rlib filename somehow leaks into the resulting binary (I don't have a reason to think it does, but it seems risky).

I believe this does because the filenames of object files show up as the codegen unit names in dwarf debug info (and probably PDB?). I suspect that everything else in the binary will be the same, but builds differing only in extra-filename are likely to still produce different binaries.

Have we checked if it doesn't affect PGO?
We would like to check if anything is leaking rlib filename in debug info files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea how to go about that? I have no experience with PGO and we have no tests focusing on it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do some experiments this week and see if we can have tests written down!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posted #14859

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test passes with this PR for me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the meeting notes, it says we were going to move forward with the hack where we conditionally add RUSTFLAGS to -Cextra-filename

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I am forgetting things more often recently…
Given that, this PR is ready to go!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this from the queue as I'd like to add that hack before this gets merged so we don't accidentally break people.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will move forward with ed's hack.

I am confused by the meeting note. I assumed Ed's hack is this PR. The “hack” you are going to add is #6966, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#6966 filters out --remap-path-prefix from being hashed.

The hack I proposed is that we skip hashing all RUSTFLAGS if something that looks like --remap-path-prefix is present.

let default = Vec::new();
let extra_args = build_runner.bcx.extra_args_for(unit).unwrap_or(&default);
if !has_remap_path_prefix(&extra_args) {
extra_args.hash(&mut c_extra_filename_hasher);
}
if unit.mode.is_doc() || unit.mode.is_doc_scrape() {
if !has_remap_path_prefix(&unit.rustdocflags) {
unit.rustdocflags.hash(&mut c_extra_filename_hasher);
}
} else {
if !has_remap_path_prefix(&unit.rustflags) {
unit.rustflags.hash(&mut c_extra_filename_hasher);
}
}

let c_metadata = UnitHash(c_metadata_hasher.finish());
let c_extra_filename = UnitHash(c_extra_filename_hasher.finish());
let unit_id = c_extra_filename;

let c_extra_filename = use_extra_filename.then_some(c_extra_filename);

Metadata {
meta_hash: UnitHash(hasher.finish()),
use_extra_filename: use_extra_filename(bcx, unit),
unit_id,
c_metadata,
c_extra_filename,
}
}

/// HACK: Detect the *potential* presence of `--remap-path-prefix`
///
/// As CLI parsing is contextual and dependent on the CLI definition to understand the context, we
/// can't say for sure whether `--remap-path-prefix` is present, so we guess if anything looks like
/// it.
/// If we could, we'd strip it out for hashing.
/// Instead, we use this to avoid hashing rustflags if it might be present to avoid the risk of taking
/// a flag that is trying to make things reproducible and making things less reproducible by the
/// `-Cextra-filename` showing up in the rlib, even with `split-debuginfo`.
fn has_remap_path_prefix(args: &[String]) -> bool {
args.iter()
.any(|s| s.starts_with("--remap-path-prefix=") || s == "--remap-path-prefix")
}

/// Hash the version of rustc being used during the build process.
fn hash_rustc_version(bcx: &BuildContext<'_, '_>, hasher: &mut StableHasher, unit: &Unit) {
let vers = &bcx.rustc().version;
Expand Down
69 changes: 39 additions & 30 deletions src/cargo/core/compiler/fingerprint/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -47,46 +47,48 @@
//! reliable and reproducible builds at the cost of being complex, slow, and
//! platform-dependent.
//!
//! ## Fingerprints and Metadata
//! ## Fingerprints and [`UnitHash`]s
//!
//! [`Metadata`] tracks several [`UnitHash`]s, including
//! [`Metadata::unit_id`], [`Metadata::c_metadata`], and [`Metadata::c_extra_filename`].
//! See its documentation for more details.
//!
//! The [`Metadata`] hash is a hash added to the output filenames to isolate
//! each unit. See its documentationfor more details.
//! NOTE: Not all output files are isolated via filename hashes (like dylibs).
//! The fingerprint directory uses a hash, but sometimes units share the same
//! fingerprint directory (when they don't have Metadata) so care should be
//! taken to handle this!
//!
//! Fingerprints and Metadata are similar, and track some of the same things.
//! The Metadata contains information that is required to keep Units separate.
//! Fingerprints and [`UnitHash`]s are similar, and track some of the same things.
//! [`UnitHash`]s contains information that is required to keep Units separate.
//! The Fingerprint includes additional information that should cause a
//! recompile, but it is desired to reuse the same filenames. A comparison
//! of what is tracked:
//!
//! Value | Fingerprint | Metadata
//! -------------------------------------------|-------------|----------
//! rustc | ✓ | ✓
//! [`Profile`] | ✓ | ✓
//! `cargo rustc` extra args | ✓ |
//! [`CompileMode`] | ✓ | ✓
//! Target Name | ✓ | ✓
//! `TargetKind` (bin/lib/etc.) | ✓ | ✓
//! Enabled Features | ✓ | ✓
//! Declared Features | ✓ |
//! Immediate dependency’s hashes | ✓[^1] | ✓
//! [`CompileKind`] (host/target) | ✓ | ✓
//! `__CARGO_DEFAULT_LIB_METADATA`[^4] | | ✓
//! `package_id` | | ✓
//! authors, description, homepage, repo | ✓ |
//! Target src path relative to ws | ✓ |
//! Target flags (test/bench/for_host/edition) | ✓ |
//! -C incremental=… flag | ✓ |
//! mtime of sources | ✓[^3] |
//! RUSTFLAGS/RUSTDOCFLAGS | ✓ |
//! [`Lto`] flags | ✓ | ✓
//! config settings[^5] | ✓ |
//! `is_std` | | ✓
//! `[lints]` table[^6] | ✓ |
//! `[lints.rust.unexpected_cfgs.check-cfg]` | ✓ |
//! Value | Fingerprint | `Metadata::unit_id` | `Metadata::c_metadata` | `Metadata::c_extra_filename`
//! -------------------------------------------|-------------|---------------------|------------------------|----------
//! rustc | ✓ | ✓ | ✓ | ✓
//! [`Profile`] | ✓ | ✓ | ✓ | ✓
//! `cargo rustc` extra args | ✓ | ✓[^7] | | ✓[^7]
//! [`CompileMode`] | ✓ | ✓ | ✓ | ✓
//! Target Name | ✓ | ✓ | ✓ | ✓
//! `TargetKind` (bin/lib/etc.) | ✓ | ✓ | ✓ | ✓
//! Enabled Features | ✓ | ✓ | ✓ | ✓
//! Declared Features | ✓ | | |
//! Immediate dependency’s hashes | ✓[^1] | ✓ | ✓ | ✓
//! [`CompileKind`] (host/target) | ✓ | ✓ | ✓ | ✓
//! `__CARGO_DEFAULT_LIB_METADATA`[^4] | | ✓ | ✓ | ✓
//! `package_id` | | ✓ | ✓ | ✓
//! authors, description, homepage, repo | ✓ | | |
//! Target src path relative to ws | ✓ | | |
//! Target flags (test/bench/for_host/edition) | ✓ | | |
//! -C incremental=… flag | ✓ | | |
//! mtime of sources | ✓[^3] | | |
//! RUSTFLAGS/RUSTDOCFLAGS | ✓ | ✓[^7] | | ✓[^7]
//! [`Lto`] flags | ✓ | ✓ | ✓ | ✓
//! config settings[^5] | ✓ | | |
//! `is_std` | | ✓ | ✓ | ✓
//! `[lints]` table[^6] | ✓ | | |
//! `[lints.rust.unexpected_cfgs.check-cfg]` | ✓ | | |
//!
//! [^1]: Build script and bin dependencies are not included.
//!
Expand All @@ -100,6 +102,9 @@
//!
//! [^6]: Via [`Manifest::lint_rustflags`][crate::core::Manifest::lint_rustflags]
//!
//! [^7]: extra-flags and RUSTFLAGS are conditionally excluded when `--remap-path-prefix` is
//! present to avoid breaking build reproducibility while we wait for trim-paths
//!
//! When deciding what should go in the Metadata vs the Fingerprint, consider
//! that some files (like dylibs) do not have a hash in their filename. Thus,
//! if a value changes, only the fingerprint will detect the change (consider,
Expand Down Expand Up @@ -348,6 +353,10 @@
//!
//! [`check_filesystem`]: Fingerprint::check_filesystem
//! [`Metadata`]: crate::core::compiler::Metadata
//! [`Metadata::unit_id`]: crate::core::compiler::Metadata::unit_id
//! [`Metadata::c_metadata`]: crate::core::compiler::Metadata::c_metadata
//! [`Metadata::c_extra_filename`]: crate::core::compiler::Metadata::c_extra_filename
//! [`UnitHash`]: crate::core::compiler::UnitHash
//! [`Profile`]: crate::core::profiles::Profile
//! [`CompileMode`]: crate::core::compiler::CompileMode
//! [`Lto`]: crate::core::compiler::Lto
Expand Down
1 change: 1 addition & 0 deletions src/cargo/util/hasher.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@

use std::hash::{Hasher, SipHasher};

#[derive(Clone)]
pub struct StableHasher(SipHasher);

impl StableHasher {
Expand Down
1 change: 0 additions & 1 deletion tests/testsuite/config_include.rs
Original file line number Diff line number Diff line change
Expand Up @@ -244,7 +244,6 @@ fn works_with_cli() {
p.cargo("check -v -Z config-include")
.masquerade_as_nightly_cargo(&["config-include"])
.with_stderr_data(str![[r#"
[DIRTY] foo v0.0.1 ([ROOT]/foo): the rustflags changed
[CHECKING] foo v0.0.1 ([ROOT]/foo)
[RUNNING] `rustc [..]-W unsafe-code -W unused`
[FINISHED] `dev` profile [unoptimized + debuginfo] target(s) in [ELAPSED]s
Expand Down
Loading
Loading