Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new spec for go package URLs #338

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions PURL-TYPES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -297,6 +297,22 @@ github
pkg:github/package-url/purl-spec@244fd47e07d1004
pkg:github/package-url/purl-spec@244fd47e07d1004#everybody/loves/dogs

go
------
``go`` for Go modules:

- The ``namespace`` field is empty and implies the go mod proxy.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the field empty or does it imply the go mod proxy? It can't be both.

- The ``name`` will be the full module path.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably specify that it is case sensitive. pkg:golang incorrectly states that it is not case sensitive and must be lowercased.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. this is what the whole #308 is about.
Please don't repeat the mistakes from the past.

Copy link
Member

@jkowalleck jkowalleck Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The ``name`` will be the full module path.
- The ``name`` is the full module path. It MUST be unmodified, and follow the `Go Module Reference <https://go.dev/ref/mod#go-mod-file-ident>`_.

this change would close #308

Copy link
Contributor

@matt-phylum matt-phylum Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- - The ``name`` will be the full module path.
+ - The ``name`` is the full module path. In case of an URL: protocol MUST be lowercased; host-part MUST be lowercased; path-part MUSTbe unmodified, as it is case-sensitive.

this change would close #308

I don't think this is correct.

  1. I don't think it's legal to include a protocol in the module path. Go makes some HTTPS requests to resolve a VCS URL to download the package from (usually this is delegated to the proxy).
  2. The host part is also part of the case sensitive module path. It should not be lowercased. Uppercase characters are currently forbidden by Go for modules. I don't think it's worthwhile or really correct for the PURL spec to be specifying how to convert an invalid module path into a valid module path, I don't think it's worthwhile for the PURL spec to be specifying how to validate Go module paths, this doesn't cover all the restrictions, and this may cause problems if Go ever changes the restrictions for some reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re 1: I see. i was wrong there. Adjusted my suggestion for the protocol.
re 2: the host-part is, per URL-spec case-insensitive, and is normalized to lowercase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as Go is concerned, it's usually a host-part but it has additional restrictions and it is case sensitive: https://go.dev/ref/mod#go-mod-file-ident

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I will modify my change-suggestion accordingly. does it fit better, now?

- The ``subpath`` will represent the package path within a module.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The ``subpath`` will represent the package path within a module.
- The ``subpath`` is the unmodified package path within a module.

- The ``version`` will be a valid go version or pseudoversion, or empty.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The ``version`` will be a valid go version or pseudoversion, or empty.
- The ``version`` may be a valid go version or pseudoversion, omitted when empty.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why may here?

- Additional Build information for binaries can be included as ``qualifiers`` (i.e VCS info, go version info, GoArch/GoOS info etc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional information should be explicitly defined here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactlty. be specific in the spec, so we all are on the same page.

- Examples::

pkg:go/google.golang.org%2Fgenproto#googleapis/api/annotations
pkg:go/github.com%2Fjmorion%[email protected]#api
pkg:go/golang.org%2Fx%2Fvuln?goversion=1.23.2&vcs=git&vcs_modified=true#cmd/govulncheck
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a likely problem with the use of subpath: there is no way to determine where the module ends and the package starts in the general case, is there?
For instance, in the path google.golang.org/genproto/googleapis/api/annotations how can I determine safely that google.golang.org/genproto is a module and that googleapis/api/annotations is a package inside this module? I need either a go proxy lookup or a full filesystem to locate a go.mod/go.sum file, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a way, if the module's code is available to you, to determine from a package import where module path ends and the package path begins by making HTTP requests.

I think the use of the subpath here is good because it puts the burden of determining this on whatever generates the PURL, which is likely aware of Go and either has the module paths or is most likely to be able to find the module path from the full package path. Then if you want to use a tool that checks PURLs against a database of information about modules (eg vulnerabilities), the tool already has all the information it needs. Otherwise, either the tool would need to make external API calls to figure out the module path of the PURL or the database would need to have an entry for every package in the module.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to add to @matt-phylum's comment. If a tool is producing a PURL for a Go artifact, then it can use go version, Debug.BuildInfo, or packages.Load to get information about the package and its corresponding module. The encoding proposed here then makes it clear what the modules and packages are.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this is true for the general use case of PURLs. E.g. we do static analysis of binaries and while we can get information about linked packages, there's no indication of which part of the paths correspond to modules

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, if you are looking at a Go symbol from the symbol table, you can get its package. You can get the module correctly by prefix-matching it with module information from debug.Buildinfo of the binary, unless there are several modules that are prefixes of the package. My inclination is that it should not affect what is proposed here. (Arguably, there should be a way to get module information for a symbol in the binary, just the way one can do it for the source analysis.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan to include all the buildinfo structure as qualifiers?
If so, this would only apply in a built binary?

Copy link
Member

@jkowalleck jkowalleck Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point.
If so, all the qualifiers MUST be documented in the type-spec.

currently it reads:

Additional Build information for binaries can be included as qualifiers (i.e VCS info, go version info, GoArch/GoOS info etc)

I am afraid this documentation is insufficient.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will expand on this.

pkg:go/golang.org%2Fx%[email protected]?goversion=1.23.2#cmd/govulncheck
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the Go module versions always to be prefixed with a v?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A version identifies an immutable snapshot of a module, which may be either a release or a pre-release. Each version starts with the letter v, followed by a semantic version.
-- https://go.dev/ref/mod#versions

version could also be a pseudo-version -- a git-tag, a git-commit-hash, or something like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A pseudoversion is a special kind of version that also starts with a v: https://go.dev/doc/modules/version-numbers#pseudo-version-number

I think for Go modules, including when using the Go module system to refer to something that predates modules, the version always starts with a v. In which case, versions that don't start with v would only be used with older tools like Dep?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a version exists, it should be a valid Go module version. It should start with a v.

Note that hashes should not be permitted, they are not a valid Go version (resolution of hash commits in go tooling is a convenience feature).


golang
------
``golang`` for Go packages:
Expand Down