-
Notifications
You must be signed in to change notification settings - Fork 383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support layer deltas #902
base: main
Are you sure you want to change the base?
Support layer deltas #902
Conversation
To test this I've been using "skopeo copy", it works fine if you just https://github.com/alexlarsson/skopeo/tree/wip/deltas I also create a set of test images on docker hub: This has a few different versions of fedora 32, including deltas between then, which were created like this:
As an example of using this is:
There are some outstanding questions. I put tar-diff on my github. Should we move it to the containers The names of the media type and annotations now contain "redhat", The use of tags for the deltas makes it necessary to manually garbage Also, currently the code is always using a regular OCI image config |
Force-pushed with sign-off and adding the test scaffolding for the new interface methods. |
Hi @alexlarsson, thanks! I am quite excited about your work. At the moment, I am deeply buried in Podman work, so I'll have a hard time to find enough time to review such a fairly complex PR. I'll try to allocate a couple of hours next week. @giuseppe @mtrmac @nalind PTAL CC @fatherlinux @mrunalp @rhatdan as it's super rad. |
Also, @josephschorr, you might want to take a look at this as you're involved with both quay and OCI artifacts which are related to this. |
Maybe we can use a single tag |
Also, CC @owtaylor |
CC @vbatts |
ideally it should be part of the OCI specs, but since it is not a breaking change and if there are no deltas we just fallback to the existing code path, we could probably carry it without blocking on OCI. @vbatts is https://github.com/opencontainers/artifacts the proper place to discuss and develop such stuff? |
The artifacts spec is more about detailing how in general you would store something other than pure docker images on a registry, they are not involved in standardizing the format of individual artifacts (other than possibly to ratify/list known media types). If anything this should be related to the image spec itself, but as you say it is quite independent of it and it doesn't necessarily have to be affected at all by this, which is nice as getting such changes in would be a very slow work... |
I pushed a new commit (which hasn't shown up here yet, whats up with github?) that changes it to use a single branch "deltas" which has an OCI index and then for each image in that a The one issue with this is that updating the deltas tag seems a bit racy. We download the old one, update it and put it back, and if someone else does this at the same time we would lose one of the updates. Is there any way to do atomic updates of oci images? Like, set the tag to this new blob, but fail if the current version is not $digest. Then we could loop it until we successfully updated it. |
2d26699
to
1f8944d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I need to handle a few bugs in the coming days, so for now here just a few quick thoughts on things that caught my eye.
// It may use a remote (= slow) service. | ||
// If instanceDigest is not nil, it contains a digest of the specific manifest instance to retrieve deltas for (when the primary manifest is a manifest list); | ||
// this never happens if the primary manifest is not a manifest list (e.g. if the source never returns manifest lists). | ||
GetDeltaManifest(ctx context.Context, instanceDigest *digest.Digest) ([]byte, string, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, an API break.
I suppose we’ll now have to figure out how to extend these interfaces without breaking API, it had to come one day.
(Let’s figure out the right approach first and deal with the API later, just highlighting that this will have to happen.)
copy/copy.go
Outdated
var matchingLayers []*types.BlobInfo | ||
for i := range deltaLayers { | ||
deltaLayer := &deltaLayers[i] | ||
to := deltaLayer.Annotations["com.redhat.deltaTo"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The annotation name, and the concept of “RootFS”, feels like something that should be a part of image.genericManifest
/types.Image
instead of assumed to be generally applicable. I’m not immediately sure how practical that is.
@alexlarsson The referenced blobs in the |
Gah! Please don't design calls that need to loop against registries! If you need atomic updates, then it likely needs to be done as an extension to the registry protocol. |
The "from" blobs need not be served at all. The setup is like this:
Here, |
It seems like a standard way to do an atomic update though. You get the current version of the index, update it locally, and the PUT it with an If-Match header containing the current ETAG. The PUT will then succeed only if (atomically) the current version of the tag is the same as the one in the header. If it is not it will fail, and we loop. Its not like we're busy looping, we will only loop in the case where there is an actual conflict. |
It could, however, result in a lot of clients looping because they are all hitting against one another. This is a real concern with any design consideration for an API that will likely be used by millions of clients.
Yeah, be aware Quay will (correctly) fail to validate this. Also, it feels like if this "delta" manifest is a manifest representing the delta, then the information about
Please don't fallback. If you do, someone is invariably going to push the content with the "wrong" media type to a registry that does not do validation, then they'll try to mirror that content into a registry that does do validation, it will fail, and the mirroring registry will get blamed. We should not be abusing the content types simply because registries do not yet support artifacts.
I think using an OCI Index here is the way to go: The index could contain the "normal" manifest as an entry, as well as the delta manifest as a secondary entry. That way, the delta+normal manifest can be tagged as a unit, and it'll also hold the references to things as a single GCable unit. If a new delta needs to be generated, the previous should be contained in the new index as well (up to some limit, after which point you could compact the deltas or somesuch). |
(That does not quite work in full generality; OCI in theory supports many possible shapes of index trees, but in practice clients are not going to be interperable by doing a recursive search and combining the metadata from various levels of the tree using completely undefined criteria. There are going to be more restrictive “profiles” of the standard. As far as c/image is concerned, OCI use mirrors the Docker schema2 use: index ~ manifest list, a single-level lookup for multi-arch images. Sure, adding delta pseudo-architectures or something like that to the existing single level of an OCI index would be possible, and it would contain registry-recognizable references to deltas.) But… Ultimately “garbage collection” is going to be problematic either way. If the registry supports deletion at all (many don’t), it won’t by default delete untagged manifests (because digest references exist, and if nothing else there is a race between looking up an index by tag and reading the component manifest by digest vs. deleting the index tag). So, manifests that reference deltas are going to keep piling up and prevent deletion of the layer deltas, unless maybe the user explicitly opts in into deleting untagged manifests, and we shouldn’t assume such an opt in. So:
|
It doesn't seem likely to me to have millions, or a handful constantly updating a single repository. Two clients trying to update a single repository at once is a rare situation - and the registry load of an extra rejected HTTP request - trivial.
If I understand you, you are suggestion that the deltas appear in the image index that tags point to - This isn't ideal for a couple of reasons:
Maintenance of deltas might also be also be something that is offered as a service on some registries - a user could define a policy (deltas from the last 5 versions of the 'latest' tag) and let the registry or an external service handle creating the deltas. In this case, you wouldn't want the registry creating a new index and changing where the tag points. Those are reasons why we have generally thinking of deltas as being side-car information to an image rather than being part of the tagged image/image-manifest. |
The way I see it there are three ways to do an atomic update API
The last one seems vastly more complex to even specify. I think CAS would perform better in the non-contended case, and locking if you have contention. However, I doubt you'll actually see contention in updating a specific tag, so imho CAS is the better choice. But anyway that is beyond this discussion.
If we claim it is of type
It could easily be in the config (only, or in both), and maybe that is conceptually more correct. However, in practice all that means is that we have to load and update two files (manifest and config) instead of just one.
What is the current status of artifacts in quay? I'm working on this because redhat uses OCI instead of ofstree for the fedora flatpak runtime and the performance (vs ostree) is horrible due to the lack of deltas. Lots of people are complaining about this, and if the solution is going to have to wait until everyone implements oci artifacts that may be a problem.
Yeah, this would definately work. I'm not really aware of how oci indexes are used in the wild. In ostree, a delta is a completely independent optimization and doesn't affect the digest of the content (which in ostree parlance would be the commit id). So, If at some point we upload a version of an image the tag will point to some digest, and then later we add a delta, this will change the digest of the index that the tag points to (even thought the final image digest is the same). Is this a problem to typical OCI use? (I honestly don't know) |
Not a breaking problem, but something to consider: Cluster orchestrators like OpenShift may well resolve tags to digests “early”, so that multi-node software deployments all use exactly the same image even if the tag moves. In that case, deltas added “later” would not be found by nodes that are pinned to a specific digest. |
Yeah, this seems bad to me, deltas are an independent transport thing an not part of the image itself. The original images are reconstructed perfectly, so nobody need to even be aware of the deltas. So, I think considering them a sidecar thing is the correct thing to do. Additionally, as mentioned before, referencing the deltas from the main manifests makes it very hard to later remove deltas, and deltas really only are useful for the latest versions of things. You rarely pull older versions, so its not worth wasting space on them. |
What we could do is have the |
@mtrmac We absolutely do delete untagged manifests. In fact, in Quay, if you have a manifest that is not referenced by a tag (either directly or indirectly via a manifest list), the manifest immediately becomes unpullable and will eventually be GCed.
@mtrmac Based on numerous discussions, it seems that this will not last once we have artifacts supported. Many individuals I've spoke to have stated they intend to place, for example, signatures at the top level along images, or source references, or other even Helm charts, etc, all found in the same index, at the same level; client tooling would then choose to make use of whatever it supports.
@owtaylor Sure... until you have automated tooling hitting 10,000 repositories on the same multi-tenant registry service, backed by the same database: then locking across indexes can become a major concern. We know this, because this has already been a concern for Quay, and I've argued strenuously against polling-based APIs as a result.
@owtaylor Yes, since they represent an alternative "view" of the same manifest layers. Otherwise, if we just use another specially-named tag, there is the major concern about overwriting, consistency and scaling. Basically, if I were generating deltas for a manifest, it would make sense to pull that manifest, compute deltas, then push the manifest back with the deltas "added", for lack of a better term.
@alexlarsson Experimentally, but not in production today. Artifacts spec is, after all, not yet standardized (heck, neither is OCI distribution, for that matter)
@alexlarsson Yeah, hence why I said its better spec-wise, but not a requirement in my mind.
So, ultimately, I think this discussion is showing why storing deltas-as-artifacts is conceptually not appealing. If the intention is to have this delta support be generic, then we should not be using deltas-in-manifests at all, but rather creating an extension to the OCI image distribution protocol that provides delta computation automatically based on the API call's parameters. That would have the benefit of working for all existing manifests, it would allow for far less storage of duplicative information, and it could be added to existing registries without too much overhead; any registry that didn't want to support it, simply wouldn't. |
That’s good to know; still, docker/distribution doesn’t by default (although there is an option), so the design has to account for that.
Sure; c/image is a library for handling container images, not arbitrary payloads (and deltas of this kind are clearly not applicable to arbitrary payloads). It may well not make any sense to support Helm charts in that codebase. It all depends on how much they end up having in common and what contributors are interested in.
It doesn’t make much sense to me to make delta computation a registry responsibility: the publisher of the image is at least as much invested in making images quick to pull, and can invest the resources (delta creation is fairly resource-intensive) instead of the registry. Registries computing deltas completely on-demand seems fairly unlikely given the resource requirements anyway. (Registries may want to publish deltas even if the image publisher didn’t, because they would benefit from the deltas — but that would be a better fit as a data-based targeted process.) It is also definitely easier and simpler to deploy deltas to and make them easy to mirror if that doesn’t require server-side support. Sure, that argument is treacherous because it can motivate all kinds of nasty hacks, but in this case I’m not sure that integrated server-side support would give us benefits that are not easily obtainable otherwise, and outweigh the limited/delayed support. |
That does seem possible in the API. |
How would i do this in containers/image? There is a ImageReference.DeleteImage() operation, but it works on named references, and that would in this case be the |
This is based on the OCI delta work at: containers/image#902
I merged the flatpak support for this, hope to get it into an unstable release so we can start using it. |
@alexlarsson Needs a rebase. |
@lumjjb PTAL |
Deltas are a way to avoid downloading a full copy if a layer tar file if you have a previous version of the layer available locally. In testing these deltas have been shown to be around 10x to 100x smaller than the .tar.gz files for typical Linux base images. In the typical client-side case we have some previous version of the image stored in container-storage somewhere, which means that we have an uncompressed files available, but not the actual tarball (compressed or not). This means we can use github.com/containers/tar-diff which takes two tar files and produces a delta file which when applied on the untar:ed content of the first tarfile produces the (bitwise identical) content of the uncompressed second tarfile. It just happens that the uncompressed tarfile is exactly what we need to reproduce, because that is how the layers are refered to in the image config (the DiffIDs). How this works is that we use OCI artifacts to store, for each regular image a manifest with information about the available deltas for the image. This image looks like a regular manifest, except each layer contains a tar-diff (as a blob) an uses the existing annotations key to record which DiffIDs the layer applies to. For example, a manifest would look like this: ``` { "schemaVersion": 2, "config": { "mediaType": "application/vnd.oci.image.config.v1+json", "digest": "sha256:<config file hash>", "size": 3 }, "annotations": { "io.github.containers.delta.target": "sha256:<image_manifest_hash>", }, "layers": [ { "mediaType": "application/vnd.tar-diff", "digest": "sha256:<tar-diff delta file hash>", "size": 7059734, "annotations": { "io.github.containers.delta.from": "sha256:<old layer diffid>", "io.github.containers.delta.to": "sha256:<new layer diffid>" } } ] } ``` The config blob is just empty. Ideally it should not be of type application/vnd.oci.image.config.v1+json, because that is reserved for docker-style images. However, docker hub (and other registries) currently don't support any other type. For registries that support OCI artifacts we should instead use some other type so that tooling can know that this is not a regular image. The way we attach the delta manifest to the image is that we store it in the same repo and the we use a single tag named `_deltaindex` pointing to an index with all the delta manifest in the repository, with the digest of each target image in the `io.github.containers.delta.target` annotation key. The delta layers record which DiffID they apply to, which is what we want to use to look up the pre-existing layers to use as delta source material, and it is what the delta apply will generate. This means however that using the deltas only works if we're allowed to substitute blobs, but this doesn't seem to be an issue in the typical case. Signed-off-by: Alexander Larsson <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexlarsson This is a nice feature to have! I'm trying to think about how this would play with the OCI layer suffixes. My understanding that it is a diff on the tar level, so it will work on compressed/encrypted layers and that will be handled on the client side from the build perspective.
I would imagine that this would work very well with encryption as well... that the deltas themselves could be encrypted as well... So from a user perspective they could use deltas even on encrypted images.
@rhatdan @mtrmac I think there may be some interactions between the two flows that need to be worked out (possibly a separate PR in the future?). I am wondering if for now we add a couple comments on encryption interactions, and maybe add a few checks to make sure the features are not used together yet.
I've put in a few comments where I think we could check on some of these interactions.
// Convert deltaStream to uncompressed tar layer stream | ||
pr, pw := io.Pipe() | ||
go func() { | ||
if err := tar_patch.Apply(wrappedDeltaStream, deltaDataSource, pw); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the deltas always uncompressed/unencrypted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deltas are inherently compressed. I.e. they are not wrapped in a compression layer but instead compressed on the parts inside that need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, they are always unencrypted, so layering wise we might want to wrap some encryption around them.
switch srcInfo.MediaType { | ||
case manifest.DockerV2Schema2LayerMediaType, manifest.DockerV2SchemaLayerMediaTypeUncompressed: | ||
return true, manifest.DockerV2SchemaLayerMediaTypeUncompressed | ||
case imgspecv1.MediaTypeImageLayer, imgspecv1.MediaTypeImageLayerGzip, imgspecv1.MediaTypeImageLayerZstd: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interaction for encrypting path with decryption routine
Encryption mediatypes https://github.com/containers/ocicrypt/blob/master/spec/spec.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
canUseDeltas() currently returns false (due to the above media checks) for encrypted manifests, because for now we don't support encrypted deltas. However, once we do these checks need to be widened, yes.
@@ -1064,6 +1145,52 @@ func (ic *imageCopier) copyLayer(ctx context.Context, srcInfo types.BlobInfo, to | |||
} | |||
} | |||
|
|||
// First look for a delta that matches this layer and substitute the result of that | |||
if ok, deltaResultMediaType := ic.canUseDeltas(srcInfo); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interaction for encrypting path with toEncrypt
flag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean here, it seem alright to me. Applying deltas naturally create the uncompressed layers, so we get the diffid etc. That said, I'm not well versed in how crypto works here, so I might be missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... so the toEncrypt
flag tells the copy process to encrypt the blob when passed into copyBlobFromStream
. My concern was if any deltas are created by this process, that they would need to support encryption. If they are not, then it should probably be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy never generates deltas, it just applies them to regenerate the tar. Creating deltas is currently done by a separate patch to skopeo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of encrypted images, skopeo generate-delta
will have to take encryption key as an additional argument to encrypt the deltas. skopeo copy
already supports taking decryption key as an argument to decrypt the image layers, so that can be re-used to decrypt deltas too.
@alexlarsson Thanks for the responses! Agreed on the points and design, just kind of going through the thought exercise.. In general, it looks like there are no design conflicts. I am fine with maybe having a comment or line in a docs to make this more apparent that encrypted layers are not currently supported with deltas. And once this is in, I can open up an issue to keep track of some of the TODOs for support once both technologies become mature and there is an ask. |
@alexlarsson I wasn't able to test https://github.com/alexlarsson/skopeo/tree/wip/deltas using local registry. ./skopeo generate-delta docker://localhost:5000/nginx:1.17.7 docker://localhost:5000/nginx:1.17.8
FATA[0000] Error creating image: error pinging docker registry localhost:5000: Get "https://localhost:5000/v2/": http: server gave HTTP response to HTTPS client Just like skopeo $ ./skopeo generate-delta --src-tls-verify=false docker://localhost:5000/nginx:1.17.7 docker://localhost:5000/nginx:1.17.8
FATA[0000] unknown flag: --src-tls-verify
[harshal@localhost skopeo]$ ./skopeo generate-delta --dest-tls-verify=false docker://localhost:5000/nginx:1.17.7 docker://localhost:5000/nginx:1.17.8
FATA[0000] unknown flag: --dest-tls-verify |
@harche Added a commit to the skopeo branch that does this. |
I'm off on PTO from some time next week, so might not see comments here. |
Implements the logic in containers/image#902 by adapting it for Image-Serve local registry. Use custom mediaTypes for TripleO tar-diff blob and media/index OCI artifacts. This indicates that layer deltas are only for local UC-to-OC containers images distribution purposes and have no plans to support fetching deltas from external registries, like dockerhub. Change-Id: Id50bb4fc91e1bcb8f3eb53c3601c358b51b1a468 Signed-off-by: Bogdan Dobrelya <[email protected]>
I noticed that containerd 1.4 is out with stargz support. Would have been cool to get delta support in before that... |
Could this have some attention please? @mtrmac |
This came up again when discussing ostreedev/ostree-rs-ext#69 I'll just write this here briefly; there are 3 distinct things which make sense separately, and can be combined as they're conceptually orthogonal:
The way I'd say this is that "split reproducible blobs" aids both "zstd dynamic" and "pre-computed deltas" - it means less work for deltas. "zstd dynamic" and "pre-computed deltas" more heavily overlap, but they have somewhat opposite advantages/disadvantages, so it depends on the situation. |
Stumbled on this, and it occurs to me that this may be easier to do using the diff blobs as artifacts, now that OCI referrer/"artifact" support exists (or is very close to existing). |
Ah, rad. I was briefly concerned for how people would expect this to interact with Clair, but if the invariant that "a client can always pull the original blob" is maintained, it shouldn't be an issue. |
Deltas are a way to avoid downloading a full copy if a layer tar file
if you have a previous version of the layer available locally. In
testing these deltas have been shown to be around 10x to 100x smaller
than the .tar.gz files for typical Linux base images.
In the typical client-side case we have some previous version of the
image stored in container-storage somewhere, which means that we have
an uncompressed files available, but not the actual tarball
(compressed or not).
This means we can use github.com/alexlarsson/tar-diff which takes
two tar files and produces a delta file which when applied on the
untar:ed content of the first tarfile produces the (bitwise identical)
content of the uncompressed second tarfile. It just happens that the
uncompressed tarfile is exactly what we need to reproduce, because that
is how the layers are refered to in the image config (the DiffIDs).
How this works is that we use OCI artifacts to store, for each regular
image a manifest with information about the available deltas for the
image. This image looks like a regular manifest, except each layer
contains a tar-diff (as a blob) an uses the existing annotations key
to record which DiffIDs the layer applies to.
For example, a manifest would look like this:
The config blob is just an json file containing "{}". Ideally it
should not be of type application/vnd.oci.image.config.v1+json,
because that is reserved for docker-style images. However, as
explained in oras-project/oras#129, docker hub
doesn't currently support any other type. For registries that support
OCI artifacts we should instead use some other type so that tooling
can know that this is not a regular image.
The way we attach the delta manifest to the image is that we store it
in the same repo and a tag name based on the manifest digest like
"delta-${shortid}".
The delta layers record which DiffID they apply to, which is what we
want to use to look up the pre-existing layers to use as delta source
material, and it is what the delta apply will generate. This means
however that using the deltas only works if we're allowed to
substitute blobs, but this doesn't seem to be an issue in the typical
case.