feat: implement RFC 3553 to add SBOM support #13709
base: master
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @ehuss (or someone else) some time within the next two weeks. Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (
Force-pushed from 74dafa0 to 190682e.
Much respect for your contribution. As a friendly reminder, it seems appropriate to update the documentation of the corresponding sections, e.g. Configuration and Environment Variables.
Thanks for the reminder, @heisen-li. Would love to see a doc update, though we should probably focus on the design discussion first, as the location of the configuration is not yet decided. (See rust-lang/rfcs#3553 (comment)).
One approach for the docs (if this is looking to be merged) is to put the env and config documentation fragments in the Unstable docs.
Force-pushed from 190682e to ae0881c.
tests/testsuite/sbom.rs
Outdated
assert!(p.bin("foo").with_extension("cargo-sbom.json").is_file());
assert_eq!(
    1,
    p.glob(p.target_debug_dir().join("libfoo.cargo-sbom.json"))
I guess we might need to deal with different naming conventions on different platforms. (Windows specifically?)
Maybe the `glob` call can be simplified in a way that it works for all platforms.
Just note that I reviewed this as-is, didn't really think too much for the design itself. Thank you for working on this!
☔ The latest upstream changes (presumably #13571) made this pull request unmergeable. Please resolve the merge conflicts.
Force-pushed from 1cfd71a to 376fe1e.
Force-pushed from 67332d6 to 0aa10e9.
Now I like the idea of having this PR to explore SBOM format. I'll post back issues we've found so far to the RFC. Thank you :)
}

#[derive(Serialize)]
struct Sbom {
Do we need a dependencies field for this top-level `Sbom`?
(Just a question. I don't really know if other SBOM formats need it to recover the dependency graph)
Ideally, yes. Copying my comment from the RFC:
Note that there are two ways of looking at dependencies: what each package needs, and the final resolved graph.
For example, if one package depends on `rand` with `features = ["std", "getrandom"]`, and another with `features = ["std", "simd_support"]`, the final resolved features will be `["std", "getrandom", "simd_support"]`. Depending on the use case you may need either or both representations (direct package dependencies and the resolved graph).
`cargo metadata` exposes both (under "packages" and "resolve" fields), but inaccurately:
- cargo-metadata always resolves features at the workspace level #7754
- Under resolver v2, it conflates the normal and build-dependency trees
I think it would be best for the SBOM to also expose both, accurately this time.
So what I would like to see is two resolved dependency trees: one for normal dependencies and one for build dependencies, matching the way feature resolver v2 works.
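To make the difference between the two views concrete, here is a minimal sketch of feature unification for the `rand` example above. The helper `unify_features` is hypothetical and is not part of Cargo; it only illustrates that the resolved set is the union of what each dependent requests.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: the resolved feature set is the union of the
/// features requested by every package that depends on the crate.
fn unify_features<'a>(requests: &[&[&'a str]]) -> BTreeSet<&'a str> {
    requests
        .iter()
        .flat_map(|request| request.iter().copied())
        .collect()
}

fn main() {
    // What each dependent asks for (the per-package view).
    let package_a: &[&str] = &["std", "getrandom"];
    let package_b: &[&str] = &["std", "simd_support"];

    // What actually gets compiled (the resolved view).
    let resolved = unify_features(&[package_a, package_b]);

    let expected: BTreeSet<&str> = ["getrandom", "simd_support", "std"]
        .iter()
        .copied()
        .collect();
    assert_eq!(resolved, expected);
    println!("{:?}", resolved);
}
```

The `BTreeSet` keeps the result sorted and free of duplicates, so the order differs from the listing in the comment, but the set of features is the same.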
Force-pushed from c8e1bc8 to 8d5fa4d.
Force-pushed from adf4f19 to 0c7f60f.
Force-pushed from 496d62e to bc95299.
☔ The latest upstream changes (presumably #14576) made this pull request unmergeable. Please resolve the merge conflicts.
Similar to the generation of `depinfo` files, a function named `output_sbom` is called to generate the SBOM precursor file. It takes the `BuildRunner` & the current `Unit`. The `sbom` flag can be specified as a cargo build option, but it's currently not configured fully. To test the generation the flag is set to `true`.

* use SBOM types to serialize data

Output source, profile & dependencies

Trying to fetch all dependencies

This ignores dependencies for custom build scripts. The output should be similar to what `cargo tree` reports.

Output package dependencies

This is similar to what the `cargo metadata` command outputs.

Extract logic to fetch sbom output files

This extracts the logic to get the list of SBOM output file paths into its own function in `BuildRunner` for a given Unit.

Add test file to check sbom output

* add test to check project with bin & lib
* extract sbom config into helper function

Add build type to dependency

Add test to read JSON

Still needs to check output.

Guard sbom logic behind unstable feature

Add test with custom build script

Integrate review feedback

* disable `sbom` config when `-Zsbom` is not passed as unstable option
* refactor tests
* add test

Expand end-to-end tests

This expands the tests to reflect end-to-end tests by comparing the generated JSON output files with expected strings.

* add test helper to compare actual & expected JSON content
* refactor setup of packages in test

Add 'sbom' section to unstable features doc

Append SBOM file suffix instead of replacing

Instead of replacing the file extension, the `.cargo-sbom.json` suffix is appended to the output file. This is to keep existing file extensions in place.

* refactor logic to set `sbom` property from build config
* expand build script related test to check JSON output

Integrate review feedback

* use `PackageIdSpec` instead of only `PackageId` in SBOM output
* change `version` of a dependency to `Option<Version>`
* output `Vec<CrateType>` instead of only the first found crate type
* output rustc workspace wrapper
* update 'warning' string in test using `[WARNING]`
* use `serde_json::to_writer` to serialize SBOM
* set sbom suffix in tests explicitly, instead of using `with_extension`

Output additional fields to JSON

In case a unit's profile differs from the profile information on root level, it's added to the package information in the JSON output. The verbose output of `rustc -vV` is also written to the `rustc` field in the SBOM.

* rename `fetch_packages` to `collect_packages`
* update JSON in tests to include profile information

Add test to check multiple crate types

Add test to check artifact name conflict

Use SbomProfile to wrap Profile type

This adds `SbomProfile`, which the existing `Profile` is converted into, to expose relevant fields. For now it removes the `strip` field, while serializing all other fields. It should keep the output consistent, even when fields in the `Profile` change, e.g. a new field is added.

Document package profile

* only export `profile` field in case it differs from root profile

Add test to check different features

The added test uses a crate with multiple features. The main crate uses the dependency in the normal build & the custom build script with different features.

Refactor storing of package dependencies

All dependencies for a package are indices into the `packages` list now. This sets the correct association between a dependency & its associated package.

* remove `SbomDependency` struct

Refactor tests to use snapbox
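The difference between replacing the extension and appending the suffix can be sketched with the standard library alone. The function name `append_sbom_suffix` is an assumption made for illustration; it is not necessarily how the PR implements it.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Hypothetical helper: append `.cargo-sbom.json` to the full file name,
/// leaving any existing extension (`.rlib`, `.exe`, ...) in place.
fn append_sbom_suffix(path: &Path) -> PathBuf {
    let mut name: OsString = path.as_os_str().to_os_string();
    name.push(".cargo-sbom.json");
    PathBuf::from(name)
}

fn main() {
    let artifact = Path::new("target/debug/libfoo.rlib");

    // `with_extension` drops the original `.rlib` extension.
    assert_eq!(
        artifact.with_extension("cargo-sbom.json"),
        PathBuf::from("target/debug/libfoo.cargo-sbom.json")
    );

    // Appending keeps it, so artifacts that differ only by extension
    // do not end up sharing one SBOM file name.
    assert_eq!(
        append_sbom_suffix(artifact),
        PathBuf::from("target/debug/libfoo.rlib.cargo-sbom.json")
    );
}
```

Appending also sidesteps the platform question raised earlier in the review: a Windows binary such as `foo.exe` maps to `foo.exe.cargo-sbom.json` without any special casing.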
Force-pushed from bc95299 to 848cc32.
#[derive(Serialize, Clone, Debug, Copy)]
#[serde(rename_all = "kebab-case")]
enum SbomBuildType {
    /// A package dependency
    Normal,
    /// A build script dependency
    Build,
}
Should we be consistent with `cargo metadata` wrt the schema for this?
#[derive(Serialize, Clone, Debug)]
struct SbomProfile {
No case is put on this. Is `snake_case` intentional? Looks like that's what we use for `cargo metadata`.
impl From<&Profile> for SbomProfile {
    fn from(profile: &Profile) -> Self {
        let rustflags = profile
            .rustflags
            .iter()
            .map(|x| x.to_string())
            .collect_vec();

        Self {
            name: profile.name.to_string(),
            opt_level: profile.opt_level.to_string(),
            lto: profile.lto,
            codegen_backend: profile.codegen_backend.map(|x| x.to_string()),
            codegen_units: profile.codegen_units.clone(),
            debuginfo: profile.debuginfo.clone(),
            split_debuginfo: profile.split_debuginfo.map(|x| x.to_string()),
            debug_assertions: profile.debug_assertions,
            overflow_checks: profile.overflow_checks,
            rpath: profile.rpath,
            incremental: profile.incremental,
            panic: profile.panic.to_string(),
            rustflags,
        }
    }
}
A part of me wants to have us `profile.clone()` and destructure `Profile` so we make sure we update this whenever a new `Profile` field is added (and to make it easier to review to make sure all fields are present)
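The destructuring idea can be sketched as follows. The `Profile` and `SbomProfile` types here are cut-down stand-ins with made-up fields, not Cargo's real definitions; the point is only that a pattern without `..` stops compiling as soon as a field is added to `Profile`.

```rust
// Simplified stand-in for Cargo's `Profile` (assumed fields, for illustration).
#[derive(Clone, Debug)]
struct Profile {
    name: String,
    opt_level: String,
    debug_assertions: bool,
    strip: bool,
}

#[derive(Debug, PartialEq)]
struct SbomProfile {
    name: String,
    opt_level: String,
    debug_assertions: bool,
}

impl From<&Profile> for SbomProfile {
    fn from(profile: &Profile) -> Self {
        // Exhaustive pattern: every field is named and there is no `..`,
        // so adding a field to `Profile` is a compile error here until
        // someone decides whether the SBOM should include it.
        let Profile {
            name,
            opt_level,
            debug_assertions,
            strip: _, // deliberately left out of the SBOM
        } = profile.clone();

        Self {
            name,
            opt_level,
            debug_assertions,
        }
    }
}

fn main() {
    let profile = Profile {
        name: "dev".to_string(),
        opt_level: "0".to_string(),
        debug_assertions: true,
        strip: false,
    };
    let sbom_profile = SbomProfile::from(&profile);
    assert_eq!(sbom_profile.name, "dev");
    println!("{:?}", sbom_profile);
}
```

Skipped fields such as `strip` stay visible in the pattern, which also makes it easier to check in review that nothing was forgotten.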
version: Option<Version>,
features: Vec<String>,
build_type: SbomBuildType,
extern_crate_name: String,
Why do we need this?
package_id: PackageIdSpec,
package: String,
What's `package`?
I guess it's the name, but why include it (and `version`) when it's in `package_id`?
let build_type = if unit_dep.unit.mode.is_run_custom_build() {
    SbomBuildType::Build
} else {
    SbomBuildType::Normal
};
The documentation for `is_run_custom_build` says
/// Returns `true` if this is the *execution* of a `build.rs` script.
So this isn't saying whether this is a normal or build instance but whether this will be run as a build script.
as in build scripts show up twice, once for the Build mode and once for the Run mode
/// Describes a package dependency
#[derive(Serialize, Clone, Debug)]
struct SbomPackage {
Did we decide whether the sbom will track packages instead of crates?
#[derive(Serialize)]
struct Sbom {
    format_version: SbomFormatVersion<1>,
Not-blocking: eventually we should move this out into `cargo-util-schemas` and provide `Deserialize` impls so people can use that to read this file. `SbomFormatVersion` isn't compatible with that.
package_id: PackageIdSpec,
name: String,
version: String,
source: String,
target: SbomTarget,
profile: SbomProfile,
packages: Vec<SbomPackage>,
features: Vec<String>,
Isn't most of this redundant with `packages`, is there a reason we don't just have a `root_package` field?
build_type: SbomBuildType,
extern_crate_name: String,
/// Indices into the `packages` array
dependencies: Vec<usize>,
Should we newtype the `usize`? On one hand, it will make it easy to find all `usize`s that fill that role. On the other hand, it will make indexing more annoying.
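A rough sketch of the newtype trade-off, using hypothetical names (`PackageIndex`, `Packages`) that do not come from the PR. Implementing `Index` for the newtype recovers most of the indexing ergonomics, at the cost of a wrapper around the `Vec`.

```rust
use std::ops::Index;

/// Hypothetical newtype: an index into the SBOM's `packages` list.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct PackageIndex(usize);

#[derive(Debug)]
struct SbomPackage {
    name: String,
    /// Indices into the `packages` list.
    dependencies: Vec<PackageIndex>,
}

/// Wrapper so that only a `PackageIndex` can be used to index packages.
struct Packages(Vec<SbomPackage>);

impl Index<PackageIndex> for Packages {
    type Output = SbomPackage;

    fn index(&self, index: PackageIndex) -> &SbomPackage {
        &self.0[index.0]
    }
}

/// Resolve the dependency indices of one package back to package names.
fn dependency_names(packages: &Packages, index: PackageIndex) -> Vec<&str> {
    packages[index]
        .dependencies
        .iter()
        .map(|&dep| packages[dep].name.as_str())
        .collect()
}

fn main() {
    let packages = Packages(vec![
        SbomPackage {
            name: "foo".to_string(),
            dependencies: vec![PackageIndex(1)],
        },
        SbomPackage {
            name: "bar".to_string(),
            dependencies: vec![],
        },
    ]);
    assert_eq!(dependency_names(&packages, PackageIndex(0)), vec!["bar"]);
}
```

If the type were serialized, it would presumably need a transparent representation so the JSON still contains plain numbers; that part is left out here.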
for (index, package) in visited_dependencies.iter().enumerate() {
    let dependencies: BTreeSet<&UnitDep> = unit_graph[&package.unit].iter().collect();

    let mut indices = dependencies
        .iter()
        .filter_map(|dep| {
            visited_dependencies
                .iter()
                .position(|unit_dep| dep == unit_dep)
        })
        .collect::<Vec<_>>();

    if let Some(package) = packages.get_mut(index) {
        package.dependencies.append(&mut indices);
This assumes that the order we iterate through `visited_dependencies` (`BTreeSet`) is the same order things appear in `packages` (`Vec`).
Should `visited_dependencies` be an `IndexSet`?
for (index, package) in visited_dependencies.iter().enumerate() {
    let dependencies: BTreeSet<&UnitDep> = unit_graph[&package.unit].iter().collect();

    let mut indices = dependencies
        .iter()
        .filter_map(|dep| {
            visited_dependencies
                .iter()
                .position(|unit_dep| dep == unit_dep)
Is this `O(n^3)`? Is the equality check doing a straight pointer comparison or a deep check?
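One possible way to avoid the repeated `position` scans is to build a position lookup once and reuse it. The sketch below uses plain string names instead of Cargo's `UnitDep`, and `dependency_indices` is a hypothetical function, so it shows only the shape of the idea rather than a drop-in change.

```rust
use std::collections::HashMap;

/// For each unit, return the positions of its dependencies within `units`.
/// `units` plays the role of the ordered package list, `edges` the unit graph.
fn dependency_indices<'a>(
    units: &[&'a str],
    edges: &HashMap<&'a str, Vec<&'a str>>,
) -> Vec<Vec<usize>> {
    // Build the lookup once, from the same sequence that defines the
    // output order, so indices and positions cannot drift apart.
    let position: HashMap<&str, usize> = units
        .iter()
        .enumerate()
        .map(|(index, unit)| (*unit, index))
        .collect();

    units
        .iter()
        .map(|unit| {
            let mut indices: Vec<usize> = edges
                .get(*unit)
                .into_iter()
                .flatten()
                // A hash lookup replaces the linear `position` scan.
                .filter_map(|dep| position.get(*dep).copied())
                .collect();
            indices.sort_unstable();
            indices
        })
        .collect()
}

fn main() {
    let units = ["foo", "bar", "baz"];
    let mut edges = HashMap::new();
    edges.insert("foo", vec!["baz", "bar"]);
    edges.insert("bar", vec!["baz"]);

    let indices = dependency_indices(&units, &edges);
    assert_eq!(indices, vec![vec![1, 2], vec![2], vec![]]);
}
```

Because the lookup is derived from the list that defines the output order, this also touches on the ordering concern from the previous comment. The cost of hashing and comparing real `UnitDep` values would still depend on how their equality is implemented.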
@@ -396,6 +397,32 @@ It's possible to update `my-dependency` to a pre-release with `update -Zunstable
This is because `0.1.2-pre.0` is considered compatible with `0.1.1`.
It would not be possible to upgrade to `0.2.0-pre.0` from `0.1.1` in the same way.

## sbom
Should we document the schema in here? Yes, the RFC will have it, but that will then be a snapshot in time and eventually we'll want to have it documented.
let file = with_sbom_suffix(&p.bin("foo"));
let output = std::fs::read_to_string(file).unwrap();
assert_e2e().eq(output, snapbox::file!["build_sbom_using_cargo_config.json"]);
Hmm, having external snapshots will make it a lot harder to review these tests because I have to find the right test and right file and jump back and forth between them
assert_eq!(
    2,
    p.glob(p.target_debug_dir().join("*.cargo-sbom.json"))
        .count()
);
Under what conditions is an rlib sbom created?
fn configured_project() -> ProjectBuilder {
    project().file(
        ".cargo/config.toml",
        r#"
            [build]
            sbom = true
        "#,
    )
}
Personally, I find helpers like this not too helpful
- They obfuscate what the test is doing (it should at least say what the config is)
- They don't scale if you then want to do something else with the config
#[cargo_test]
fn build_sbom_with_artifact_name_conflict() {
    Package::new("deps", "0.1.0")
        .file(
            "Cargo.toml",
            r#"
                [package]
                name = "deps" # name conflict
                version = "0.1.0"
What is this test showing?
let profile = if &unit_dep.unit.profile != root_profile {
    Some((&unit_dep.unit.profile).into())
} else {
    None
};
Not seeing a test that demonstrates this conditional profile
What does this PR try to resolve?
This PR is an implementation of RFC 3553, adding support for generating precursor SBOM files for compiled artifacts in Cargo.
How should we test and review this PR?
RFC 3553 adds a new option to Cargo to emit SBOM precursor files. A project can be configured either by the new Cargo config field `sbom` or by using the environment variable `CARGO_BUILD_SBOM=true`. The `sbom` option is an unstable feature and requires the `-Zsbom` flag to enable it.

Check out this branch & compile Cargo. Pick a Cargo project to test it on, then run:

All generated `*.cargo-sbom.json` files are located in the `target` folder alongside their artifacts. To list all generated files use:

then check their content. To see the current output format, see these examples.
What does the PR not solve?
The PR leaves tasks open that are either out of scope or should be done in follow-up PRs.
Additional information
There are a few things that I would like to get feedback on, in particular the generated JSON format is not final. Currently it holds the information listed in the RFC 3553, but it could be further enriched with information only available during builds.
During the implementation a number of questions arose.
- `UnitGraph` the right structure to determine all dependencies?
- `compile` method to generate the SBOM files appropriate?
- `testsuite`, are useful tests missing?

Thanks @arlosi, @RobJellinghaus and @lfrancke for initial guidance & feedback.
`serde_json` & `axum-core` crates)