-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new spec for go
package URLs
#338
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change that affects all software utilizing PURL for Go. Personally, I don't think there's anything fundamentally wrong with pkg:golang
except that the description is outdated, and I'm sure it can be fixed without making this level of breaking change. Maintaining the separation of namespace and name and putting the entire Go package ID into the PURL name makes PURLs difficult for human users to work with.
------ | ||
``go`` for Go modules: | ||
|
||
- The ``namespace`` field is empty and implies the go mod proxy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the field empty or does it imply the go mod proxy? It can't be both.
- The ``name`` will be the full module path. | ||
- The ``subpath`` will represent the package path within a module. | ||
- The ``version`` will be a valid go version or pseudoversion, or empty. | ||
- Additional Build information for binaries can be included as ``qualifiers`` (i.e VCS info, go version info, GoArch/GoOS info etc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The additional information should be explicitly defined here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exactlty. be specific in the spec, so we all are on the same page.
``go`` for Go modules: | ||
|
||
- The ``namespace`` field is empty and implies the go mod proxy. | ||
- The ``name`` will be the full module path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably specify that it is case sensitive. pkg:golang
incorrectly states that it is not case sensitive and must be lowercased.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. this is what the whole #308 is about.
Please don't repeat the mistakes from the past.
Thanks for pointing at these! This is essentially a combination of #196 and #308 (with the addition of Say you have a module with the path |
I think the better solution to this problem is that However, "a go module cannot be represented by different PURLs" is not generally the case:
|
I'd disagree. In fact, it is non-breaking, as it adds a completely new purl type. Therefore, no breaking changes are introduced. |
It is breaking because no existing PURL software expects |
this is true to every newly proposed PURL Type :-) this PR is trying to add a new type |
The problem is that this is not a new type. The |
it is not? Could you point me to the existing
I wonder how you come to this conclusion. |
From the PR description:
|
which is a behavioural change in a downstream application. This is out of scope of this spec, and not in our hands at all - we have no authority there.
exactly this paragraph makes it clear: this is a non-breaking change. Causing no breaking change is the whole point of introducing a new purl type, instead of modifying an exising one. |
i don't think so. #308 makes this clear: the existing spec has flaws that require breaking changes to fix them The only way to fix
|
Introducing a new type for an existing type is a breaking change to the PURL ecosystem. Implementations that use If you start writing SBOMs that have Keeping You cannot just fix a PURL type by introducing a new type. Even if PURL libraries are updated to support transparently upgrading the old type into the new type on read, any software that is comparing pre-canonicalized PURL strings will need updates.
What are the flaws that require breaking changes? #308 is about the path being incorrectly converted to lowercase, which is much more easily fixed by just not doing that. |
how? If a tool that produced purls would change it's behaviour by using the new purl-type, where they've used the other one before - this would be a breaking change in that very tool.
So?
A PR tells a story, and the effective patch gets updated along with the discussions on a PR. (PS: I review the content of the PR. and at the time of review, I saw no breaking change.
how comes?
the curerent |
``go`` for Go modules: | ||
|
||
- The ``namespace`` field is empty and implies the go mod proxy. | ||
- The ``name`` will be the full module path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The ``name`` will be the full module path. | |
- The ``name`` is the full module path. It MUST be unmodified, and follow the `Go Module Reference <https://go.dev/ref/mod#go-mod-file-ident>`_. |
this change would close #308
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- - The ``name`` will be the full module path. + - The ``name`` is the full module path. In case of an URL: protocol MUST be lowercased; host-part MUST be lowercased; path-part MUSTbe unmodified, as it is case-sensitive.this change would close #308
I don't think this is correct.
- I don't think it's legal to include a protocol in the module path. Go makes some HTTPS requests to resolve a VCS URL to download the package from (usually this is delegated to the proxy).
- The host part is also part of the case sensitive module path. It should not be lowercased. Uppercase characters are currently forbidden by Go for modules. I don't think it's worthwhile or really correct for the PURL spec to be specifying how to convert an invalid module path into a valid module path, I don't think it's worthwhile for the PURL spec to be specifying how to validate Go module paths, this doesn't cover all the restrictions, and this may cause problems if Go ever changes the restrictions for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re 1: I see. i was wrong there. Adjusted my suggestion for the protocol.
re 2: the host-part is, per URL-spec case-insensitive, and is normalized to lowercase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as Go is concerned, it's usually a host-part but it has additional restrictions and it is case sensitive: https://go.dev/ref/mod#go-mod-file-ident
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I will modify my change-suggestion accordingly. does it fit better, now?
|
||
- The ``namespace`` field is empty and implies the go mod proxy. | ||
- The ``name`` will be the full module path. | ||
- The ``subpath`` will represent the package path within a module. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The ``subpath`` will represent the package path within a module. | |
- The ``subpath`` is the unmodified package path within a module. |
- The ``namespace`` field is empty and implies the go mod proxy. | ||
- The ``name`` will be the full module path. | ||
- The ``subpath`` will represent the package path within a module. | ||
- The ``version`` will be a valid go version or pseudoversion, or empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The ``version`` will be a valid go version or pseudoversion, or empty. | |
- The ``version`` may be a valid go version or pseudoversion, omitted when empty. |
Adding a new type for a new type is much different than adding a new type for an existing type. An old tool not recognizing a truly new type is expected, but an old tool not recognizing Go PURLs anymore because a tool producing the data says that Changing "MUST be lowercased" to "MUST NOT be lowercased" is a much less impactful change than this. From what I've seen, names with uppercase characters are uncommon, and an outdated implementation that is incorrectly lowercasing is still working correctly for all names that do not contain uppercase characters to lowercase. I would even say that on a larger scale it is not a breaking change because:
In both cases, the PURL is still parsed successfully and the meaning of the PURL is unchanged with respect to the current "MUST be lowercased" spec. The only differences would be that the canonical form changes¹ and a new consumer receiving a PURL from an old producer might be more likely to expect that the ID refers to the correct package, but since there is no good way for an outdated consumer to recover the correct ID after an outdated producer lowercases it, any consumer that relies on getting the correct ID (eg to resolve the package files) is likely already broken and not lowercasing the name can only improve the behavior in that situation. This causes the same alignment problems as introducing a ¹ Due to underspecification in the text and tests, I wouldn't trust incoming PURLs to be in the canonical form as my implementation understands it. There are numerous minor differences in which characters are escaped when (and sometimes how), so if you're accepting PURLs from an external source, even if you don't expect user-entered, non-canonical PURLs in that source, you should be canonicalizing those PURLs yourself if your application depends on them all being canonical for the same definition of canonical. |
Go isn't the only ecosystem that has this problem of incorrect name normalization rules in this repo. I'm also aware of:
|
If this is indeed true, then there is something really wrong with PURL: it does not allow for evolution. On the one hand, we cannot add modifications to the existing specification that could introduce breaking changes. On the other hand, we cannot introduce a new type because somehow that is a breaking change as well. So one is pretty much stuck with slight variations of the initial spec. Specs should be allowed to evolve just the way the software does. There should really be a way to add versioning on top of PURL itself. What is being proposed here might in essence be just that for the go spec. |
@maceonthompson Thanks for putting this together! this makes a lot sense, and we have an issue with Go alright. Let me look at the comments in details and come back with my 2 cents! |
@matt-phylum re:
I am not sure that's hte case, but a new type vs. updating the existing type demands some careful thinking :) |
|
||
pkg:go/google.golang.org%2Fgenproto#googleapis/api/annotations | ||
pkg:go/github.com%2Fjmorion%[email protected]#api | ||
pkg:go/golang.org%2Fx%2Fvuln?goversion=1.23.2&vcs=git&vcs_modified=true#cmd/govulncheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a likely problem with the use of subpath: there is no way to determine where the module ends and the package starts in the general case, is there?
For instance, in the path google.golang.org/genproto/googleapis/api/annotations
how can I determine safely that google.golang.org/genproto
is a module and that googleapis/api/annotations
is a package inside this module? I need either a go proxy lookup or a full filesystem to locate a go.mod/go.sum file, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a way, if the module's code is available to you, to determine from a package import where module path ends and the package path begins by making HTTP requests.
I think the use of the subpath here is good because it puts the burden of determining this on whatever generates the PURL, which is likely aware of Go and either has the module paths or is most likely to be able to find the module path from the full package path. Then if you want to use a tool that checks PURLs against a database of information about modules (eg vulnerabilities), the tool already has all the information it needs. Otherwise, either the tool would need to make external API calls to figure out the module path of the PURL or the database would need to have an entry for every package in the module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to add to @matt-phylum's comment. If a tool is producing a PURL for a Go artifact, then it can use go version
, Debug.BuildInfo, or packages.Load to get information about the package and its corresponding module. The encoding proposed here then makes it clear what the modules and packages are.
BTW, an elephant in the room is whether the distinction between a namespace and name makes sense not only here, but also in the whole spec, globally. I found myself using a variable with a "namespace/name" substring more often than not.
With this the whole It could have a minimal impact on the spec. |
|
||
pkg:go/google.golang.org%2Fgenproto#googleapis/api/annotations | ||
pkg:go/github.com%2Fjmorion%[email protected]#api | ||
pkg:go/golang.org%2Fx%2Fvuln?goversion=1.23.2&vcs=git&vcs_modified=true#cmd/govulncheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the plan to include all the buildinfo structure as qualifiers?
If so, this would only apply in a built binary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point.
If so, all the qualifiers MUST be documented in the type-spec.
currently it reads:
Additional Build information for binaries can be included as
qualifiers
(i.e VCS info, go version info, GoArch/GoOS info etc)
I am afraid this documentation is insufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will expand on this.
pkg:go/google.golang.org%2Fgenproto#googleapis/api/annotations | ||
pkg:go/github.com%2Fjmorion%[email protected]#api | ||
pkg:go/golang.org%2Fx%2Fvuln?goversion=1.23.2&vcs=git&vcs_modified=true#cmd/govulncheck | ||
pkg:go/golang.org%2Fx%[email protected]?goversion=1.23.2#cmd/govulncheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the Go module versions always to be prefixed with a v
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A version identifies an immutable snapshot of a module, which may be either a release or a pre-release. Each version starts with the letter v, followed by a semantic version.
-- https://go.dev/ref/mod#versions
version
could also be a pseudo-version -- a git-tag, a git-commit-hash, or something like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A pseudoversion is a special kind of version that also starts with a v: https://go.dev/doc/modules/version-numbers#pseudo-version-number
I think for Go modules, including when using the Go module system to refer to something that predates modules, the version always starts with a v. In which case, versions that don't start with v would only be used with older tools like Dep?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a version exists, it should be a valid Go module version. It should start with a v
.
Note that hashes should not be permitted, they are not a valid Go version (resolution of hash commits in go tooling is a convenience feature).
@matt-phylum you wrote
Thanks for the links! I tend to think along the same lines, and we can likely salvage the golang type.
I need to pounder this. See my other comment wrt. the namespace/name above in #338 (comment) |
It is fine that PURL spec allows for more flexibility, but there should be only one way the Go module and package information is encoded. This simplifies the work for clients. It is easy to drop qualifiers from a PURL. It is annoying to generate multiple module+package encodings to see if the incoming PURL applies to your code. In general, this proposal tries to make it simple and clear to generate and accurately check against PURLs. It might not be the most user-friendly solution, but tools that render PURLs can easily prettify the output. We believe this is worth the sacrifice. |
Could it also be clarified how standard library packages are to be represented? Go has special handling for these, and the 'module' is never explicitly required when using them. But the module does exist for Go uses but the exact module name would make more sense we believe. |
OSV is already using |
I'm not sure I follow how it would make it easier for tools? |
Are they modules? They have go.mod files in their source code, but they aren't included in your go.mod and when you import their packages the compiler recognizes that you're trying to use the standard library and provides its own copy of the code from |
The I think we can safely use |
I believe I get the argument, and I will defer to your judgement as the more knowledgeable about Go concepts and internals. However, I do still feel that it would make sense for PURLs to use the "modules". PURLs are primarily unique identifiers so does it really matter what Go tooling does and how the internals work?
But I'm guessing that would cause issues, and it also doesn't follow the current convention for PURLs. As I said, I'll defer to better judgements, just giving a mostly uneducated opinion. :) |
The current PURL specification for Go was created before Go 1.11 modules and thus has namespace inconsistencies and lacks semantic versioning.
Although in many cases a module path corresponds directly to the URL of the hosting repository, that is not always true. The URL formed from the module path may be an endpoint that serves a redirect to the true host. This indirection protects projects that for whatever reason must change their hosting provider: their module names will continue to work. Consequently, it is undesirable to encode any aspect of the underlying hosting system as part of the PURL.
In essence, all Go modules form a single namespace. Since it is used by the majority of Go programmers, we propose to represent this namespace by the empty string. Though not included in this commit, other namespaces could be possible and would represent package managers and/or build tools that are alternatives to the go command.
The
go
type proposed here fixes the current issues by removing the namespace, using valid Go module versions (including pseudoversions), and adds some extra functionality to encode optional information about specific builds (GOOS, GOARCH, etc).If accepted, all tools maintained by the Go project (such as govulncheck and pkg.go.dev) that surface PURLs will use this new type to provide canonical PURLs for Go modules and packages