RFC: cargo-sbom #3553
text/0000-cargo-sbom.md
Outdated
- Name
- Version
- Source (registry / git / path etc.)
- Checksum
Do we need the checksum? For third-party SBOM formats, I would instead encourage them to own the checksum generation and worry people will reuse this and put their own expectations on what this means
Checksum is an optional field, since only crates from registries have checksums. If a checksum is needed for a crate that comes from a path dependency for example, it will be up to the post-processing tool to produce an appropriate value.
This text makes me wonder if "Checksum" is trying to capture version information for dependencies taken from a repository. Maybe "Checksum" and "Version" could be merged, so "Version" is the git sha when using git as your source for a crate?
I think the challenge with replacing "version" entirely with "checksum" is that a `RUSTC_WORKSPACE_WRAPPER` might want to consume the version information to do their own logic, so it's not just useful for validation purposes (which is what the checksum provides).
For the git case the sha is already part of the `source`; an example from `cargo metadata` after `cargo add futures --git=https://github.com/rust-lang/futures-rs`:
"name": "futures",
"version": "0.4.0-alpha.0",
"id": "futures 0.4.0-alpha.0 (git+https://github.com/rust-lang/futures-rs#f9f8e690504529c2813caadabd85506756f8dc67)",
"source": "git+https://github.com/rust-lang/futures-rs#f9f8e690504529c2813caadabd85506756f8dc67",
text/0000-cargo-sbom.md
Outdated
- ID (opaque identifier)
- Name
- Version
How do we uniquely identify one of several crates / build units within a package?
The main situations for this are:
- Top-level build unit is a bin and it needs to depend on its lib
- A bin or lib that needs to depend on its build script
text/0000-cargo-sbom.md
Outdated
- Rust toolchain version
- `RUSTFLAGS`
- Current build profile name
- Selected profile values
Is any other config relevant to include?
Or should we defer out all config to a future possibility so long as we make sure the format can support it?
text/0000-cargo-sbom.md
Outdated
- Checksum
- Dependencies (list of IDs)
- Type (normal, build)
- Activated features
Should we include env variables (I assume rustc reports to us what it read for us to fingerprint) or file paths (again, I assume rustc tells us what it read to fingerprint)?
rustc lists all env vars it read in the dep-info file. This does not include env vars read by proc macros through `std::env::var`.
The dep-info file also contains the files that were read, but as far as I'm aware, the standard library sources aren't included yet (`-Zbinary-dep-depinfo` enables that).
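To make the dep-info point concrete, here is a minimal sketch of how a post-processing tool might pull the recorded env vars out of a `.d` file. It assumes entries appear as comment lines of the form `# env-dep:NAME=value` (or `# env-dep:NAME` when the variable was unset), and it ignores any escaping rustc may apply to values. The function name `env_deps` and the sample input are illustrative only.

```rust
/// Collect `# env-dep:` entries from the text of a rustc dep-info (`.d`) file.
/// Each entry is (variable name, value seen at build time if it was set).
/// Assumption: values are taken verbatim; escaping is not handled here.
fn env_deps(dep_info: &str) -> Vec<(String, Option<String>)> {
    dep_info
        .lines()
        // Keep only the comment lines that record an env dependency.
        .filter_map(|line| line.strip_prefix("# env-dep:"))
        .map(|entry| match entry.split_once('=') {
            Some((name, value)) => (name.to_string(), Some(value.to_string())),
            None => (entry.to_string(), None),
        })
        .collect()
}

fn main() {
    // Hypothetical dep-info content for illustration.
    let sample = "target/debug/app: src/main.rs\n\nsrc/main.rs:\n\n# env-dep:APP_CHANNEL=stable\n# env-dep:APP_OPTIONAL\n";
    for (name, value) in env_deps(sample) {
        println!("{name} -> {value:?}");
    }
}
```

As noted in the comment above, env vars read by proc macros through `std::env::var` would not show up in this list.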
Note this can also be moved to a future possibility so long as we ensure the format can support this.
text/0000-cargo-sbom.md
Outdated
# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

Since there is no consensus on a single SBOM format within the software industry, and existing formats are still evolving, Cargo should not pick an existing SBOM format. If Cargo were to use existing SBOM formats, multiple formats (and multiple versions of each format) would need to be supported. The task of generating a specific SBOM format is best left to applications outside Cargo or Cargo extension.
We should call out that it's not just about the SBOM format but also about being compliant with internal and external regulations.
I'm not seeing what text resolved this, so I'm unresolving it.
text/0000-cargo-sbom.md
Outdated
- Name
- Version
- Source (registry / git / path etc.)
- Checksum
Do these details need to be written to the SBOM, or could they just be queried from `cargo metadata` based on the ID, as is proposed for other information like the license?
Actually, there is one detail here that is not available from `cargo metadata`: the checksum. (But as has been mentioned elsewhere, this checksum may not be that useful; tools may instead want to follow the `manifest_path` from the metadata and generate their own source file hash from that.)
text/0000-cargo-sbom.md
Outdated
```toml
[build]
sbom = true
```

Or use the environment variable `CARGO_BUILD_SBOM=true`.
What should we actually call this?
And is this a build param or a profile param?
I thought a `build` parameter made the most sense, since there's one place to easily control it.
- A CI environment wants to collect SBOM information on all binaries produced, so it sets `CARGO_BUILD_SBOM=true`.
- A Cargo subcommand wants to enable SBOM generation when re-invoking Cargo, so it sets the flag.

If we use profiles, then it becomes harder for tooling wrapping Cargo to unconditionally enable it for the current run.
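As a rough illustration of the second case, a Cargo subcommand could enable the setting for a single re-invocation by passing the environment variable to the child process. This is only a sketch: `CARGO_BUILD_SBOM` is the name proposed in the RFC text, and the helper `cargo_build_with_sbom` is a made-up name. The example prints the configured command rather than spawning it.

```rust
use std::process::Command;

/// Build the `cargo build` invocation a wrapping tool might use.
/// `CARGO_BUILD_SBOM` is the variable proposed in the RFC, not a stable Cargo feature.
fn cargo_build_with_sbom() -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("build").env("CARGO_BUILD_SBOM", "true");
    cmd
}

fn main() {
    let cmd = cargo_build_with_sbom();
    // Show what would be run instead of spawning Cargo.
    println!("program: {:?}", cmd.get_program());
    for (key, value) in cmd.get_envs() {
        println!("env: {:?} = {:?}", key, value);
    }
}
```

Because the variable is set only on the child process, the user's own configuration files are left untouched.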
Profile does not preclude environment variables because the manifest profile is layered with the config profile.
$ CARGO_PROFILE_DEV_OPT_LEVEL=10 cargo check
Checking utf8parse v0.2.1
Checking anstyle-query v1.0.0
Checking colorchoice v1.0.0
Checking anstyle v1.0.2
Checking strsim v0.10.0
Checking clap_lex v0.6.0 (/home/epage/src/personal/clap/clap_lex)
error: optimization level needs to be between 0-3, s or z (instead was `10`)
error: could not compile `anstyle-query` (lib) due to previous error
warning: build failed, waiting for other jobs to finish...
error: could not compile `clap_lex` (lib) due to previous error
error: could not compile `strsim` (lib) due to previous error
error: could not compile `utf8parse` (lib) due to previous error
error: could not compile `anstyle` (lib) due to previous error
error: could not compile `colorchoice` (lib) due to previous error
But it looks like the layering is all-or-nothing so setting one value might be ignored or cause other values to be ignored. That might be too disruptive.
We should capture this reasoning within the RFC's rationale section
👍 for having this in a profile, because I very much expect people will want to have this in some profiles and not in others.
For now, a profile is a set of compiler settings. I am not sure we want to expand its meaning to include SBOM. Also, given that an SBOM is only meaningful for final artifacts, if we put it in profiles we need to document that `profile.debug.sbom` does not work for dependencies. We already have some confusion around `lto` and `abort` (rust-lang/cargo#9330).
If people want to switch build configurations, should we work on stabilizing `config-include` instead?
text/0000-cargo-sbom.md
Outdated
A SBOM (software bill of materials) is a list of all components and dependencies used to build a piece of software. The two leading SBOM formats being adopted by industry are SPDX and CycloneDX. Both are still evolving and have multiple specification versions & data formats (JSON, XML).

New government initiatives aimed at improving the security of the software supply chain such as the US "Executive Order on Improving the Nation's Cybersecurity" or the EU "Cyber Resilience Act" require a Software Bill of Materials. Generating accurate SBOMs with Cargo is currently difficult because, depending on target selection or activated features, the dependencies may be different.
If we aim to participate in such schemes, does that impose any new burdens on the project? E.g., would inaccurate reports be considered a critical issue because they could let security issues go undetected?
If individual EU countries implement directives at a national level that use more expansive wording and impose additional requirements that Cargo does not fulfill, would that become a priority because we committed to providing "useful" SBOMs?
> If individual EU countries implement directives at a national level that use more expansive wording and impose additional requirements that Cargo does not fulfill, would that become a priority because we committed to providing "useful" SBOMs?
As-is, we aren't providing the final SBOM artifact but information that can feed into it. This intentionally leaves a lot of that information to the caller to get.
I expect the mix of end user and regulatory requirements to contradict each other (they already did in the Pre-RFC thread), which is why I'd want the future possibility of providing a final SBOM struck from the RFC. We likely can't keep up, we likely can't maintain the compatibility requirements, and we likely can't satisfy them without knobs for everything.
> If we aim to participate in such schemes, does that impose any new burdens on the project? E.g., would inaccurate reports be considered a critical issue because they could let security issues go undetected?
The fun of "fit for use".
We'll be providing a report of what information we have. There is more information, like from build scripts linking external libraries, that we can't provide. The usefulness of any of this is dependent on all parties involved cooperating.
That said, for what we do provide, if there is a bug, does it need a CVE? Unsure? I'd personally just consult the security folks when it happens. This is less about direct attacks and more about the quality of monitoring.
I do wonder if this would be useful as a more general unit-graph report (which it isn't far from). For example, watch tools could use this information to know what changes to watch for for future builds. That might ease some of the pressure on this.
> The fun of "fit for use".
> That said, for what we do provide, if there is a bug, does it need a CVE? Unsure? I'd personally just consult the security folks when it happens. This is less about direct attacks and more about the quality of monitoring.
I think it would be good to clarify in advance how much we're promising here. I suspect that down the road some large institutional users will start relying heavily on SBOMs and make noises when it's not working as they want it to.
text/0000-cargo-sbom.md
Outdated
The SBOM file generated by Cargo is *not* intended as a final SBOM artifact, but rather a precursor. Post-processing tooling can use the information produced here as part of building a final SBOM.

The SBOM file will be written to disk before `rustc` is executed for each artifact. This enables [`RUSTC_WORKSPACE_WRAPPER`](https://doc.rust-lang.org/cargo/reference/config.html#buildrustc-workspace-wrapper) to point at a program that can utilize the SBOM file to embed information into the binary if desired.
Since the SBOM precursor file will be written first, is there an intention to remove it if the production of the artifact, including perhaps the execution of any `RUSTC_WORKSPACE_WRAPPER`, fails?
I don't have a strong opinion on this. Do you think it needs to be specified in the RFC?
text/0000-cargo-sbom.md
Outdated
- Name
- Version
- Source (registry / git / path etc.)
- Checksum
> I think the challenge with replacing "version" entirely with "checksum" is that a `RUSTC_WORKSPACE_WRAPPER` might want to consume the version information to do their own logic, so it's not just useful for validation purposes (which is what the checksum provides).

I am wondering what this is worth at all. If there is the slightest chance to fake this, it's less meaningful. E.g., say there's an exploitable dependency and it takes a newer version to fix that, but an artifact wants to stay exploitable while pretending not to be. That is easy if it fakes the SBOM to declare the newer version while Cargo.toml uses the older one. On the other hand, if I am wrong and the SBOM can be trusted: features matter! The crates used can be vulnerable or not, depending on which features are activated.
text/0000-cargo-sbom.md
Outdated
- Checksum (if available)
- Dependencies
- Type (normal, build)
- Activated features
The NTIA released Minimum Elements for a SBOM. Currently, mainstream SBOM formats (e.g. `SPDX`, `CycloneDX`, `SWID`) all extend these minimum elements and add some optional fields.
Maybe we can also align the format with the NTIA minimum elements, for example by adding an `author` field.
RFC 3052 intentionally made the author field optional.
Any SBOM initiative that can't cope with anonymously/pseudonymously authored code is overreach imo. The code can be perfectly viable. And it doesn't change the fact that authors can vanish or give false contact information anyway.
As is mentioned below this section, you can query other metadata by looking up the dependency in `cargo metadata` based on the id; if the crate has published author metadata, it will be available there.
> RFC 3052 intentionally made the author field optional.
> Any SBOM initiative that can't cope with anonymously/pseudonymously authored code is overreach imo. The code can be perfectly viable. And it doesn't change the fact that authors can vanish or give false contact information anyway.
I know what you mean. Maybe we're talking about two dimensions, and you're right from a security point of view. Simply from the point of view of this being a tool, providing the author field may be a better way for developers to use this tool to generate SBOM. :)
At some point, you have to trust something. You also have to deal with the chain of trust after this file is written. For tests, I at least assume people are most likely to read this file in dedicated "production build" jobs which, in my experience, do not run tests. For |
The question is whether this is only for validating one’s own artifacts’ dependencies by the likes of Blackduck. Then it might be helpful by giving developers and maintainers a vulnerabilities overview. Which still leaves room for “the enemy within” attacks. Or is it to accompany binaries on the internet, to prove something about them? How could it? Either way, a chain of proof is hard to establish. If the SBOM production doesn’t have hard guarantees, it’s the weak link…
If what you are looking for is a guarantee that a generated dependency list is the one generated for the binary, from inception to your system, then this is not that feature. #2801 isn't even that feature (atm).
This RFC is intended to expose accurate dependency information for other tools to consume. It's not intended to guard against malicious crates or build scripts. SBOMs are only part of the solution to software supply chain security.
Is the premise here that It just seems like something that should be an installable cargo command (like |
I feel like this is covered in Alternatives, particularly:
This is basically a dump of cargo's unit graph at the end of the build so other people can build their own tools on top of this. There is no other way to get information like this at this time. This also opens the door for |
I think some of the questions around possible adversarial manipulation of SBOM data are basically asking for a threat model. Do we expect packages to be malicious, do we do anything to protect against malicious action by them? It's probably worth writing down explicitly in the RFC (I'm interested in helping if help is desired), at least so there's clarity.
We could start with something like:
|
In the Pre-RFC I saw this referred to as an "SBOM fragment". Would using that language help with some of the worries and confusion between regulatory SBOM, existing SBOM formats, and SBOM data that is easiest to access from Cargo?
text/0000-cargo-sbom.md
Outdated
The SBOM file will be written to disk before `rustc` is executed for each artifact. This enables [`RUSTC_WORKSPACE_WRAPPER`](https://doc.rust-lang.org/cargo/reference/config.html#buildrustc-workspace-wrapper) to point at a program that can utilize the SBOM file to embed information into the binary if desired.

## Format
The format will use JSON, but the exact format is not specified in this RFC. Additional fields can be added as needed. The JSON will include a `format-version` in case breaking changes are necessary.
Does this mean the config needs to support specifying the format version so people can opt in to the new version, or do we leave that as a future possibility? Do we tie the default (when it's `true`) to the edition?
This gets a hearty 👍 from me as the author of `cargo auditable` and the current maintainer of `cargo cyclonedx`.
Nearly everything needed for these tools to work in a precise and robust manner is covered, and it would resolve long-standing issues in both tools. Off the top of my head, this would resolve:
- Cargo Resolver V2 (different feature sets for build and runtime dependencies) is not supported rust-secure-code/cargo-auditable#38
- Features are always resolved at workspace level rust-secure-code/cargo-auditable#66
- Can't build recent `gitoxide` versions rust-secure-code/cargo-auditable#124
- cargo metadata tries to collect dev dependencies rust-secure-code/cargo-auditable#128
- Capture data only available during the build process CycloneDX/cyclonedx-rust-cargo#532
All of which are impractical or infeasible to fix without this RFC.
My only concern is that the RFC doesn't describe a mechanism for the SBOM file to be discovered by `RUSTC_WORKSPACE_WRAPPER`. If `cargo auditable` were made to consume these SBOMs, it would require some way to discover the SBOM file.
An environment variable with a path to the SBOM, set by `cargo` when it executes `RUSTC_WORKSPACE_WRAPPER`, would be ideal.
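A minimal sketch of what the wrapper side of such a mechanism might look like follows. The variable name `CARGO_SBOM_PATH` is purely hypothetical (the RFC does not define one), and the lookup is passed in as a closure so the logic can be exercised without touching the process environment.

```rust
use std::env;
use std::path::PathBuf;

// Hypothetical variable name; the RFC does not specify one.
const SBOM_PATH_VAR: &str = "CARGO_SBOM_PATH";

/// Resolve the SBOM location through a variable lookup function.
/// Returns `None` when the variable is missing or empty.
fn sbom_path(lookup: impl Fn(&str) -> Option<String>) -> Option<PathBuf> {
    lookup(SBOM_PATH_VAR)
        .filter(|p| !p.is_empty())
        .map(PathBuf::from)
}

fn main() {
    match sbom_path(|key| env::var(key).ok()) {
        Some(path) => println!("would embed SBOM from {}", path.display()),
        None => println!("no SBOM path provided; compiling without embedding"),
    }
    // A real wrapper would now run rustc (Cargo passes its path as the
    // first argument) with the remaining arguments.
}
```

Falling back gracefully when the variable is absent would let the same wrapper keep working with Cargo versions that do not emit SBOM files.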
# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

The generation of SBOM information is controlled by Cargo's configuration. To enable SBOM generation, set the following configuration:
I might suggest changing the name SBOM. The current text is a bit misleading, for example “enable SBOM generation”. Though, I don't have a better name in mind :(
## Build scripts
Build scripts could communicate back to Cargo to inject additional dependencies into the SBOM. For example, if a crate builds `C` code and then links with it, it would emit a build script instruction that causes Cargo to read in a file describing the `C` dependency.
```
cargo::sbom=<PATH>
```
Given Cargo is not able to handle multiple values under the same instruction key, should we explicitly call out that multiple paths must be joined via `std::env::join_paths`?
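For illustration, a build script could join several fragment paths like this. The sketch assumes the `cargo::sbom` instruction proposed in the RFC text and uses made-up file names; `join_sbom_paths` is a hypothetical helper around the real `std::env::join_paths`.

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Join several SBOM fragment paths into one platform-specific value
/// (`:`-separated on Unix, `;`-separated on Windows).
fn join_sbom_paths(paths: &[PathBuf]) -> Result<OsString, env::JoinPathsError> {
    env::join_paths(paths)
}

fn main() {
    // Hypothetical fragment files describing native dependencies.
    let paths = [
        PathBuf::from("native/zlib.sbom.json"),
        PathBuf::from("native/openssl.sbom.json"),
    ];
    let joined = join_sbom_paths(&paths).expect("paths must not contain the separator");
    // `cargo::sbom` is the instruction name proposed in the RFC under review.
    println!("cargo::sbom={}", joined.to_string_lossy());

    // The consumer can recover the individual paths with `split_paths`.
    let recovered: Vec<PathBuf> = env::split_paths(&joined).collect();
    assert_eq!(recovered, paths);
}
```

`join_paths` returns an error if a path itself contains the separator character, which is one reason to spell the joining rule out explicitly.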
Or use the environment variable `CARGO_BUILD_SBOM=true`.

If enabled, an SBOM file will be placed next to each compiled artifact for `bin`, `staticlib`, `cdylib` crate types in the `target` directory with the name `<artifact>.cargo-sbom.json`. The SBOM will contain information about dependencies used to build the compiled artifact.
We might need to deal with duplicate artifact names. Cargo doesn't really handle the issue at the moment. For SBOMs it is even less acceptable and must be resolved. See rust-lang/cargo#13709 (comment)
The generated SBOM could link back to the artifact that it corresponds to in some way.
To be useful for the `cargo auditable` use case, it needs to be generated before the final binary, so things like a hash of the binary aren't workable. I think a field in the JSON indicating which file name it corresponds to is best.
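The `<artifact>.cargo-sbom.json` naming rule from the RFC text could be sketched as below. This assumes the suffix is appended to the artifact's full file name, extension included, which the RFC does not state explicitly; `sbom_path_for` is a hypothetical helper.

```rust
use std::path::{Path, PathBuf};

/// Compute the SBOM path that sits next to a compiled artifact.
/// Assumption: the suffix is appended to the whole file name.
fn sbom_path_for(artifact: &Path) -> Option<PathBuf> {
    // Paths such as `/` or `..` have no file name to extend.
    let name = artifact.file_name()?;
    let mut sbom_name = name.to_os_string();
    sbom_name.push(".cargo-sbom.json");
    Some(artifact.with_file_name(sbom_name))
}

fn main() {
    let artifact = Path::new("target/debug/app");
    if let Some(path) = sbom_path_for(artifact) {
        println!("{}", path.display());
    }
}
```

Under this rule, two artifacts that share a file name in the same directory map to the same SBOM path, which is the collision raised in this thread.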
- Source (registry / git / path etc.)
- Checksum (if available)
- Dependencies
- Type (normal, build)
Why do we need to distinguish build dependencies from normal ones?
The intent here is so that dependencies that were only used to build build scripts could be easily filtered out. However, it's also possible that this field could be removed if it's easy enough to build dependencies based on crate-type.
It is important to know whether e.g. OpenSSL was used at build time to compute something once, or is actually included in the generated binary. That determines whether you need to patch it or even take it offline ASAP because of a new critical CVE or not.
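A small sketch of the filtering this enables: given a dependency list tagged with the proposed "Type" field, a tool can pick out what ends up in the shipped binary. The `Dep` and `DepKind` types are hypothetical stand-ins, since the RFC leaves the exact JSON schema unspecified.

```rust
/// Hypothetical mirror of the proposed "Type (normal, build)" field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DepKind {
    Normal,
    Build,
}

/// Hypothetical, simplified dependency record.
#[derive(Debug)]
struct Dep {
    name: &'static str,
    kind: DepKind,
}

/// Names of dependencies that are part of the final artifact,
/// i.e. everything that was not only used at build time.
fn shipped<'a>(deps: &'a [Dep]) -> Vec<&'a str> {
    deps.iter()
        .filter(|d| d.kind == DepKind::Normal)
        .map(|d| d.name)
        .collect()
}

fn main() {
    let deps = [
        Dep { name: "openssl", kind: DepKind::Normal },
        Dep { name: "cc", kind: DepKind::Build },
    ];
    println!("needs patching if vulnerable: {:?}", shipped(&deps));
}
```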
I've added some notes on the representation of the dependency tree, PTAL.
- Version
- Source (registry / git / path etc.)
- Checksum (if available)
- Dependencies
Note that there are two ways of looking at dependencies: what each package needs, and the final resolved graph.
For example, if one package depends on `rand` with `features = ["std", "getrandom"]`, and another with `features = ["std", "simd_support"]`, the final resolved features will be `["std", "getrandom", "simd_support"]`. Depending on the use case you may need either or both representations (direct package dependencies and the resolved graph).
`cargo metadata` exposes both (under "packages" and "resolve" fields), but inaccurately:
- cargo-metadata always resolves features at the workspace level cargo#7754
- Under resolver v2, it conflates the normal and build-dependency trees
I think it would be best for the SBOM to also expose both, accurately this time.
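The difference between the two views can be sketched with the `rand` example from this comment: each package's request is kept as-is, while the resolved view is the union of all requests. This is a simplified illustration of feature unification, not Cargo's resolver; `unify_features` is a made-up helper.

```rust
use std::collections::BTreeSet;

/// Union the feature lists requested by individual packages into the
/// resolved feature set for one crate.
fn unify_features(requests: &[&[&str]]) -> BTreeSet<String> {
    requests
        .iter()
        .flat_map(|request| request.iter())
        .map(|feature| feature.to_string())
        .collect()
}

fn main() {
    // Per-package view: what each dependent asked for.
    let first: &[&str] = &["std", "getrandom"];
    let second: &[&str] = &["std", "simd_support"];

    // Resolved view: the union, with duplicates removed.
    let resolved = unify_features(&[first, second]);
    println!("{resolved:?}");
}
```

A `BTreeSet` is used here only to get deduplication and a stable order; an SBOM consumer may need both the per-package lists and the unified set.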
- Selected profile values
- Selected profile values
- Two resolved dependency trees: one for normal dependencies, another for build dependencies, matching the behavior of [cargo features v2](https://rust-lang.github.io/rfcs/2957-cargo-features2.html)
Checksum is an optional field, since only crates from registries have checksums. If a checksum is needed for a crate that comes from a path dependency for example, it will be up to the post-processing tool to produce an appropriate value.

If further information is needed (such as license), then the post-processing tool can use `cargo metadata` or another mechanism to find it.
While not required for the MVP, I think it would be best if the SBOM file could contain all the info from `cargo metadata` eventually. It is desirable for SBOMs to list the licenses of the dependencies. And there are situations where you can run the build and get the SBOM file, but cannot run `cargo metadata`: rust-secure-code/cargo-auditable#128
Alternatively this could be addressed by evolving `cargo metadata`, but having to only deal with one input file and one data format would be easier on the post-processing tools.
What are the next steps and approximate timeline for this to be merged?
This RFC adds an option to Cargo that emits a Software Bill of Materials (SBOM) alongside compiled artifacts. Similar to how Cargo emits split debug info or "dep-info" (.d) files, this change emits an SBOM in a Cargo-specific format alongside outputs in the target directory. External tooling or Cargo subcommands can consume this Cargo SBOM file and transform it into other SBOM formats such as SPDX or CycloneDX.
Originally posted on internals as a pre-RFC, now moved to an RFC.