Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

specification first cut #9

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
266 changes: 265 additions & 1 deletion specification.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,279 @@
# Specification

This document serves as the 0.1 specification for SBOMit (SBOM on in-toto).
The specification should not be considered stable at this time.

## Introduction

### Motivation
A major problem with the state of an SBOM today is that the documents are
often inaccurate. This is largely because the documents are derived by looking
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but "inaccurate" in what sense? When practitioners like @hosseinsia say that an SBOM is "inaccurate," they mean that the list of dependencies may be incomplete or even inaccurate. The meaning written here is quite different, and not yet what people who need SBOMs worry about in practice. I think we need a stronger motivation with a better story.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @trishankatdatadog mentioned, at the moment inaccurate means that different tooling might produce SBOM with different list of dependencies, some of which do not really exist or are not used in a software, and some others might not be detected due to them being added dynamically later or through chains of dependencies.

However, one thing that is lacking in almost all of SBOM reports (and I believe in-toto can help) is a mechanism to verify their claims. At the moment, you have to reproduce SBOM using the tooling or other's tooling and compare two SBOM files (semantically, as the formats and order of them might be different). In my ideal world, if SBOMit could provide a proof/reasoning why a SBOM tool believes that this dependency should be included would be really useful. Basically that is my interpretation of 'verifiable SBOM'

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, would you like to take a cut? I don't have enough context on how people are using SBOMs in different steps.

at the resulting software and trying to understand what happened in the past.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are different places that SBOM is generated. One could be sourcecode-based SBOM. In that case, this statement does not really fit. I think the only place this makes sense is when SBOM is generated after build (like the way Software Composition Analysis (SCA tooling) works). There are also other tooling that can extract SBOMs at runtime, and that have the capability of extracting list of dynamic dependencies.


This specification proposes a means to generate metadata for an SBOM while the
software is being created. Furthermore, the means by which this information

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is called build-time SBOM. Do you want to limit SBOMit to that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we want to cover all cases, as in-toto does.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Then the phrase SBOM while the software is being created might need to be revised to prevent others implying build-time SBOM.

is captured uses (in-toto)[https://in-toto.io] attestations and layouts. This
provides cryptograpic validation that this information is correct, handles
key distribution and management to indicate which parties should be trusted
for each step, and captures information about the environment in which the
steps are run.

As a result, using SBOMit provides a more accurate SBOM when parties are
honest. When malicious parties interfere in the process, SBOMit provides
a mix of traceability (knowing which party was malicious) and prevention
(blocking malicious software from being trusted), depending on how the
in-toto steps are configured..
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also depends on what the attacker has compromised. A compromised key/malicious user could be traced, while an unauthorized user could be prevented, etc



### Definitions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not crazy about these names. They are not immediately self-descriptive, and I think we can do better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to changing them, especially SIT. I just threw that together on the flight.


SIT -- An SBOM which is derived from a SBOMit document. It can
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting: by "derived," do you mean that an SBOMIT document is something like a high-level intermediate representation from SBOMs that can then be used to generate low-level "machine code," so to speak?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you can think of it this way. It is like the complete "source" document which contains things like in-toto metadata which will not be in the SBOM (SIT).

be in any SBOM format. It points back to the SBOMit document, which was used
to generate it.

SBOMit document -- A long form document which contains sufficient information
to verify the supply chain process is correct. It may be used to generate one
or more SIT.

## Threat model

### Root of trust for layouts
One key goal of our design is to provide as much security as possible in all
cases. Security should degrade gracefully when the attacker gains new
capabilities. Hence, we will consider attacker models where the attacker is
exceedingly powerful and try to restrict the damage they can do even in those
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"exceedingly powerful" is still a little bit wishy-washy. Can they, for example, compromise all of the keys used to sign root layouts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I list that case below.

cases.

There are two types of protection SBOMit commonly provides:
* traceability -- this enables a party with an SBOMit document and the supply
chain to later determine which parties acted in a malicious manner. Note
that this may require some further manual analysis (e.g., to determine if
an emitted binary is actually a correct output of a build server given
a set of source code). However, importantly, properties like non-repudiation
hold, so that a party cannot appear to be innocent given a re-execution of a
supply chain with all honest participants.

* prevention -- this is when an attack is blocked from having any impact. This

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this include attacks such as dependency confusion attacks?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talk about this below with the OSS risks...

often occurs because an attack causes some aspect of validation to fail,
resulting in no impact to the end user. This can include situations like an
inability to inject malicious metadata into the supply chain, a policy check
in an in-toto layout causing a resulting SBOMit document to be rejected,
or an inability to sign because of a lack of access to a private key trusted
for that signature.

Note that a party verifying a SIT (only) does not gain the same security
guarantees. They instead gain the same sort of guarantees that signed SBOMs
have today. However, the SIT contains a reference to the SBOM document, so
it may be used to obtain the necessary information to perform more complete
verification.


The **actors** of the system are the following, each of whom possess a unique
private key:
* a series of in-toto functionaries that perform the actual steps of the
software supply chain. For example, a build server is functionary.

* a SIT mutator (per SIT), which is a process that adds supplemental information
that will appear in an SIT. This is likely performed by a combination
of human actions (e.g., listing supplemental information) and automated
tooling (e.g., adding information for a specific extended SBOM format).

* a SIT generator (per SIT), which generates the actual SIT file.

* an in-toto layout creator, who specifies the keys used by other parties
and writes the policy (in-toto layout) that indicates how different steps
of the software supply chain interrelate. This party also serves as the
SBOMit root-of-trust.

* a (possibly empty) set of in-toto sub-layout creators. These are identical
in action to the in-toto layout creator, but they are only trusted for the
subset of the layout which the layout creator delegates.



We are able to limit attacker damage and/or trace the location of the
compromise(s) in cases where:
* an attacker may possess cryptographic keys for any of the functionaries (the
parties performing the software supply chain steps) in the system.
**Impact:** Prevents modification of items not allowed by the in-toto layout.
Provides traceability in all cases. No impact without the ability to
interject this metadata into the supply chain.
* an attacker may tamper with one or more steps of the software supply chain.
For example, the build process, testing, packaging, etc.
**Impact:** Identical to the prior case.
* an attacker obtains a sub-layout key. Note that this also requires the
ability to inject sublayout metadata into the supply chain such that other
parties include it. **Impact:** The impact could be prevented depending on
the restrictions placed upon the delegated sub-layout. However, traceability
always exists. This is also similar to a functionary compromise.
* an attacker may become a man-in-the-middle between any steps of the
system. **Impact:** Without further capabilities, this has no impact.
* an attacker may possess the key used to sign the SBOM resulting from the
SBOMit process. **Impact:** Prevention for parties performing full SBOMit
verification. Traceability otherwise.
* an attacker may compromise an SBOM mutator key or an SBOM mutator may act
maliciously. **Impact:** Traceability in all cases. Depending on the
changes made to the SBOM, this may be detected by review of the resulting
SBOM. Changes that override in-toto derived information are specifically
flagged and unlikely to be accepted. XXXXXX
* an attacker is able to compromise a SIT generator key or a SIT generator
directly. **Impact:** Protection for clients who obtain the SBOMit document.
Traceability for other clients. Note that the SIT will contain a refernce to
the SBOMit document, but this also may be modified in this case. However,
the client should have the in-toto layout key and so will notice this
action if retrieving the SBOMit document from the the SIT.
* an attacker is able to compromise the in-toto layout key, which serves as
the root-of-trust for the system. **Impact:** Traceability. Later analysis
can show this was the cause, but users will trust a new, maliciously
generated layout which replaces signing keys.

From (Endor Lab's top 10 OSS risks)[https://www.endorlabs.com/blog/introducing-the-top-10-open-source-software-oss-risks],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should come much earlier. I really don't understand what this threat model is trying to say. What sort of attacker goals are we most worried about? Those tampering with just the list of dependencies, or does the tampering extend to the software artefacts themselves?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This covers both cases. I don't want to describe someone else's work here, but wanted to give some pointers to an external list to show we're handling a broad set of cases.

our design largely addresses:

* OSS-RISK-1
* OSS-RISK-2
* OSS-RISK-5
* OSS-RISK-9


Orthogonal systems that can be used along with SBOMit:

* A software supply chain tool which attests to the quality of steps, such
as FRSCA and SLSA. This helps with knowing that an individual action is
actually a good security practice, could judge the quality of implementation
for different tools, determine if the code base metrics are indicative of
quality, understand if the licence is appropriate, etc. This helps to address
OSS-RISK-4, OSS-RISK-8, and OSS-RISK-10. OSS-RISK-7 may also be addressed
either by this or by end user diligence.

* Some means like Sigstore's root of trust is needed to Handle how users know
the correct name / root of trust for the software they are installing. This
relates to OSS-RISK-3.

* Recursing into components like the packages inside of a container image when
the build process does not otherwise do so. This relates to OSS-RISK-6. As
both tooling and in-toto adoption increase, this issue should naturally be
addressed as more metadata becomes available.

There is also (a more detailed analysis which shows why these properties
hold)[TODO]

## Format/required fields

### SBOMit Document

#### Overview

There are four items in a SBOMit document:

* An in-toto layout. This specifies the policy that the supply chain followed.
This includes information about the key used to sign each SIT and
corresponding the SIT generation process for each. It must be signed by the
layout key trusted for the SBOMit document.

* A collection of in-toto metadata. This includes sub-layouts, link metadata,
and attestations. This must validate according to the in-toto layout and
so must be signed by the appropriate sub-layout creators and functionaries.
JustinCappos marked this conversation as resolved.
Show resolved Hide resolved

* The SBOM mutation. This metadata describes how to mutate the SBOM
information derived from the in-toto metadata. It essentially serves to
flesh out the derived SBOM to make it more complete that what can be
strictly verified. This is in ?ISO 1234 JSON patch? format.

* Zero or more addendums. These are represented as diffs to the prior sections
of the document. It enables one to easily append or modify information while
retaining historical information. These are also in JSON patch format.

#### Detailed format

TODO

### SIT

#### Overview

A SIT is a valid SBOM (of any type), with the folloeing two constraints:
JustinCappos marked this conversation as resolved.
Show resolved Hide resolved

* it has a field indicating the SBOMit document that it was derived from.

* it is signed by the functionary key listed in the related step of the
in-toto layout

#### Detailed format

TODO...



## Workflows

There are two main workflows for SBOMit documents and SITs: generation and
verification.

### Generation

#### SBOMit document
To generate a SBOMit document, one must generate the consituent parts. For the
in-toto layouts, sub-layouts, attestations, and links, this process is
described in the in-toto project's documentation. Note however, that for the
purposes of SBOMit, the layout is required to include certain information
so that this information may be used to populate the SIT.

Generating the mutator by hand may done by starting with a null mutator and
then using tooling to generate a SIT in the correct format. The SIT file may
be modified and the JSON diff may be computed.

Similarly, if a SBOMit document is modified, one may elect to add an addendum
to modify it instead of generating a fresh document. This is easily done by
generating the new SBOMit document and then creating a JSON diff.

#### SIT

The SIT is generated by first deriving common information from in-toto and
placing it in an intermediate format called X. This format is effectively
a SBOM itself which matches the NTIA format....

Then the tool specified in the layout may be used to derive the SIT in the
appropriate end format. Note also, that multiple SITs may be generated from
the same SBOMit document. So, for example, one SBOMit document could generate
both a SPDX SIT and a CycloneDX SIT.

An important part of generating the SIT is applying the mutator to the output
of the tool used to derive the SIT. This adds or modifies information in the
resulting SIT as instructed. This works as though applying a JSON patch to
the SIT document.

It is recommended that the SIT generation tool provides a warning if the
mutator modifies (e.g., replaces or removes) portions of the SIT derived from
the in-toto information. This is becuase this will be flagged as a security
risk when verification of the SBOMit document is performed.

### Verification

#### SBOMit document

SBOMit document verification proceeds by the following steps:

* verifying the in-toto layout over the provided in-toto metadata.
* checking each SIT to ensure they are signed by the correct SIT generator
* generate each SIT using the relevant tooling to examine the mutator
behavior. Any parts of the SIT that the mutator has chosen to overwrite or
remove represent a substantial security risk. These are specifically
flagged during validation. The user must specify a flag like
--insecure-allow-in-toto-override or respond to a similarly scary alert to
install the software.

If this matches the generated SITs, then the SBOMit document verification
has succeeded.

#### SIT

A SIT may be verified in two ways.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who verifies an SIT vs an SBOMit document? Should the SIT verifier get some assurance that someone checked the SBOMit document?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is possible to do. A SIT is an SBOM (CycloneDX, SPDX, etc.) and I don't think we can add custom verification (as I understand it).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could at least add an attestation that someone verified the full SBOMit document (maybe as a wrapper around the SBOM verification). Otherwise the end user is only getting a signed SBOM

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine but if they have to add functionality to their SBOM verification, then it seems like they could just do the SBOMit document verification. Or do you think that people will want to just check the attestation and then say it is okay?

I see checking an attestation as roughly equivalent to signing the SBOM with the attestation key.


First, the SBOMit document may be downloaded and then verified. This provides
strong security guarantees, but is a fairly heavyweight operation.

Second, the SIT may have its signature checked, much like any signed SBOM.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the key for this distributed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm imagining the same as you would usually for an SBOM. I'm actually not completely clear how this process works in every case today. I imagine it is like in-toto layout keys, where they are just assumed to exist on the client, but I would love to learn the "correct" way, if I'm wrong.

This provides a much weaker set of security guarantees. However, assuming the
SIT generator performs verification correctly and is uncompromised, this does
provide a meaningful level of security.