Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance OCI Artifact support #1247

Open
errordeveloper opened this issue Sep 29, 2023 · 9 comments
Open

Enhance OCI Artifact support #1247

errordeveloper opened this issue Sep 29, 2023 · 9 comments
Labels
area/oci OCI related issues and pull requests

Comments

@errordeveloper
Copy link
Contributor

errordeveloper commented Sep 29, 2023

There is limited support for OCI artifacts that hadn't been built with Flux CLI.

One can use spec.layerSelector.mediaType, however it doesn't handle searching images with an index.
There is also spec.layerSelector.operation that can be set to copy or extract.

There are also some non-configurable assumptions:

  • it's only possible to select either just the first layer or the first one that matches the given media type
  • spec.layerSelector.operation: copy writes files with .tgz extension, layer content be any kind of format, e.g. JSON

Proposal:

  • allow serach from index, with support for nested indecies also
  • change ops constant names to Copy and Extract to match Kubernetes style
  • provide some way of setting name of the destination files for Copy, by default it could be derrived from the media type name, e.g. tar+gz vs json+gz
  • allow specifying layer offset with default being 0
  • allow exacting multiple layers, perhaps this could be accomplished with operation: MergeDir and operation: LayerPerDir

Some of these changes may be more complex than others, just opening this for a discussion.

@errordeveloper
Copy link
Contributor Author

allow serach from index, with support for nested indecies also

This may need additional parameters, e.g. it's possible that an image has two components that have the same media types, we will need a way to decide which one to pick. We might have to just error out.
Using an offset like with layers won't suffice in this case, as order is not an inherent property of an index, unlike with layers. Annotation could be used, but that requires modifying the image, which may not be possible, so perhaps an error message would be the best start. That error message could suggest the user set a reference to an exact digest instead.

@errordeveloper
Copy link
Contributor Author

provide some way of setting name of the destination files for Copy

The key question to me here is how names and formats of files are surfaced in the interface with other controllers. Is kustomize-controller expecting <revision>.tgz files?

@souleb
Copy link
Member

souleb commented Sep 29, 2023

This may need additional parameters, e.g. it's possible that an image has two components that have the same media types, we will need a way to decide which one to pick. We might have to just error out.

Care to explain a bit more? I don't understand this.

allow serach from index, with support for nested indecies also

This could help where several artifacts form a product to be deployed. Operators could provide an index with all artifacts. With the other proposed, we could then have a ready directory to be passed to downstream consumers.

@stefanprodan
Copy link
Member

stefanprodan commented Sep 29, 2023

I think first we should optimize the pulling, instead of downloading the whole artifact like we do here

img, err := remote.Image(ref, opts...)

We should get the manifest, and download only the selected layer.

@errordeveloper
Copy link
Contributor Author

errordeveloper commented Sep 29, 2023

I think first we should optimize the pulling, instead of downloading the whole artifact...

@stefanprodan it's possibe that just using remote.Image instead of crane.Pull (per #1244) might have solved that issue already, let's verify that. What I inteded with this issue is to start a discussion about the functionality the API could offer for supporting wide range of images.

@stefanprodan
Copy link
Member

stefanprodan commented Sep 29, 2023

@errordeveloper you're right, the comment was left in place from when we used crane, but it doesn't pull anything else but the manifest, following the code from remote.Image it ends up doing:

	u := f.url("manifests", ref.Identifier())
	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, nil, err
	}

@errordeveloper
Copy link
Contributor Author

This may need additional parameters, e.g. it's possible that an image has two components that have the same media types, we will need a way to decide which one to pick. We might have to just error out.

Care to explain a bit more? I don't understand this.

It's in the case of what do you do if index points at several artefacts that all match selection criteria. Using a directory per your suggestions

allow serach from index, with support for nested indecies also

This could help where several artifacts form a product to be deployed. Operators could provide an index with all artifacts. With the other proposed, we could then have a ready directory to be passed to downstream consumers.

My primary use-case for this one was image with attestations stored along with configs. But it can be what you descirbed also, for sure, but I am not sure what sort of directory structure could be derived. I guess this is also similar to the idea of merge operation for multiple layers...
Maybe we could use annotations for directory names, but that will mean images will need to built in a very specific ways with those annotations.

On the other hand, it might be wise to put composition out of scope, maybe there is room for another abstractions like RepositoryComposition?

@stefanprodan had there been requests about composing Git repos? Are submodules the go-to recommendation?

I guess that there is fundamental difference with OCI, in the way that you can staff anything into it without meanigful path keys where uniqueness is guaranteed. With Git, every object has a meaningful and unique paths. With OCI it's just hashes and paths are only defined within a tarball, which is below OCI spec, so there can be paths conflicts, if you have an image with multiple tarballs.

@stefanprodan
Copy link
Member

had there been requests about composing Git repos? Are submodules the go-to recommendation?

Yes, see my answer here: #854 (comment)

@stefanprodan
Copy link
Member

stefanprodan commented Sep 29, 2023

operation: MergeDir

I vote for operation: ExtractAll

With OCI it's just hashes and paths are only defined within a tarball, which is below OCI spec, so there can be paths conflicts, if you have an image with multiple tarballs.

For Timoni modules, I opted to use the same media type for all content layers, and at pull time, the filesystem gets unified.

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.timoni.config.v1+json",
    "size": 342,
    "digest": "sha256:4808b5d01cb574e2d5c823e3af98f41af2e4965e87ddb90a3df311a47e302b0a"
  },
  "layers": [
    {
      "mediaType": "application/vnd.timoni.content.v1.tar+gzip",
      "size": 189009,
      "digest": "sha256:d865c7899bccf1bce740ccfbfa4513f9f97acbdb7c03fbc60dd2153ee9157a68",
      "annotations": {
        "sh.timoni.content.type": "module/vendor"
      }
    },
    {
      "mediaType": "application/vnd.timoni.content.v1.tar+gzip",
      "size": 4490,
      "digest": "sha256:1ccb6accebd348dee5047de0316e229156525487b86611945c148fda9e5dfc22",
      "annotations": {
        "sh.timoni.content.type": "module"
      }
    }
  ],
  "annotations": {
    "org.opencontainers.image.created": "2023-09-28T23:58:44+03:00",
    "org.opencontainers.image.revision": "1.1.0",
    "org.opencontainers.image.source": "https://github.com/stefanprodan/timoni.git"
  }
}

If source-controller would unpack all the application/vnd.timoni.content.v1.tar+gzip layers in the same dir, then it would do the same thing as Timoni and the resulting Flux artifact would be Ok. The layer annotations here are just hints to humans, but the actual files inside will never overlap, the sha256:d865c7899bccf1bce740ccfbfa4513f9f97acbdb7c03fbc60dd2153ee9157a68 layer contains only the cue.mod dir, and the sha256:1ccb6accebd348dee5047de0316e229156525487b86611945c148fda9e5dfc22 layer the templates, values, etc.

I would really like for Flux to be able to download and extract all the layers based on the media type selector, in this case it would be application/vnd.timoni.content.v1.tar+gzip.

Push code is here https://github.com/stefanprodan/timoni/blob/a27dada5ab6e3bb356f343edd1c675ff75f34ca4/internal/oci/push_module.go#L42

@stefanprodan stefanprodan changed the title enhance OCI artficat support Enhance OCI Artifact support Sep 30, 2023
@stefanprodan stefanprodan added the area/oci OCI related issues and pull requests label Oct 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/oci OCI related issues and pull requests
Projects
None yet
Development

No branches or pull requests

3 participants