Package management rfc #1983

Merged
merged 11 commits into from
Oct 24, 2024
243 changes: 243 additions & 0 deletions rfcs/006-package-management.md
@@ -0,0 +1,243 @@
---
feature: package management
start-date: 2024-06-28
author: Joe Neeman
---

# Package management

People want to reuse code, but Nickel doesn't currently have a good way to do
it. We should have a way to fetch packages and make them easily available to
Nickel code. The mechanism needs to be predictable (it should fetch the code
that the user expects to fetch) and reliable (if it works on my machine it
should work on your machine).

## The manifest file

We will require a manifest file in order to import packages. Manifest files must
be named `package.ncl`, and they are found by searching up from the file being
evaluated. That is, when the user invokes `nickel export path/to/foo.ncl`, we
look for a manifest at `path/to/package.ncl` and then at `path/package.ncl`, and
so on.
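
The upward search could look roughly like the following Python sketch. The function name `find_manifest` is made up for illustration and is not part of Nickel; the only behaviour taken from the text above is the `package.ncl` file name and the walk from the evaluated file's directory towards the filesystem root.

```python
from pathlib import Path
from typing import Optional

MANIFEST_NAME = "package.ncl"


def find_manifest(evaluated_file: str) -> Optional[Path]:
    """Return the nearest manifest at or above the evaluated file's directory.

    Hypothetical helper: it checks the directory containing the file first,
    then each parent directory in turn, and returns None if no manifest exists.
    """
    start = Path(evaluated_file).resolve().parent
    for directory in [start, *start.parents]:
        candidate = directory / MANIFEST_NAME
        if candidate.is_file():
            return candidate
    return None
```

With this sketch, the closest manifest wins: if both `path/to/package.ncl` and `path/package.ncl` exist, evaluating `path/to/foo.ncl` picks the former.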

The manifest file format is defined by the contract `std.package.Manifest`,
which is defined as

```nickel
{
name | String,
version | Semver,
nickel-version | Semver,
Contributor

Why is it needed?
Should this be a version constraint? e.g. >=1, ^1, >1.1.3,<2 etc?

Member Author

Ah, right. I guess version should be SemverConstraint (or VersionConstraint). nickel-version is the minimum nickel interpreter version required by the package. Maybe min-nickel-version is clearer?


dependencies
| { _:
[|
'Path String,
'Git { url | String, branch | optional | String, rev | optional | String },
'Index { name | String, version | Semver },
|]
}
| default
= {},
}
```

So an example manifest might look like

```nickel
{
# ...
dependencies = {
foo = 'Index { package = "github/tweag/foo", version = "1.2.0" },
bar = 'Path "../my-bar",
}
} | std.package.Manifest
```

### Alternative: inline dependencies

Nix and Dhall allow for importing dependencies dynamically, using things like
`fetchGit`. There was some discussion
[here](https://github.com/tweag/nickel/issues/329#issuecomment-967372858)
on the advantages and disadvantages of inline imports.

### Alternative: toml manifest

Maybe the manifest should be in some plain-data format like toml. This would
Member

I didn't see it mentioned explicitly, so I wanted to raise it. Will import be forbidden in the manifest itself? I would imagine that that would be a good idea. But I haven't thought about it very deeply.

Member Author

Good question, I had originally thought that the manifest wouldn't support package management, but it would be allowed to import normal paths. But maybe let's start by forbidding imports altogether.

be easier to modify programmatically.

### Alternative: shorthand for registry imports

Cargo allows a shorthand like

```toml
"github/tweag/foo" = "1.2.0"
```

instead of

```nickel
foo = 'Index { package = "github/tweag/foo", version = "1.2.0" }
```

Since we expect registry imports to be the common case, maybe it's worth having
a shorthand?
Comment on lines +106 to +107
Member

It probably is, to be honest. But on the other hand, you'll have to be careful to avoid making it “weird”. The shortcut and the full method should look decently similar. Or something.


### Question: manifest file name

The name `package.ncl` was chosen to be similar to npm's `package.json` or
stack's `package.yaml`. One problem with this is that it could be confused for
a nickel source file. Another possibility would be to use an extension-based
name like `<package-name>.cabal`. Or just a stranger name that's less likely to
conflict with something real.

## Import statements

The manifest file assigns a name to each dependency; to import the dependency
named `foo` you simply write `import foo`. That is, an `import` statement either
takes a string in quotes -- in which case it imports a path -- or an identifier
without quotes -- in which case it imports a package.

The `import foo` expression evaluates to the contents of `main.ncl` in `foo`'s
root directory.
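
As a rough illustration of the two import forms, here is a Python sketch. Everything in it is hypothetical: `resolve_import`, its parameters, and the `package_roots` mapping (dependency name to the directory where that package was fetched) are invented for the example. It also assumes that quoted path imports are resolved relative to the importing file.

```python
from pathlib import Path
from typing import Dict

ENTRY_POINT = "main.ncl"  # the hardcoded entry point described above


def resolve_import(
    target: str,
    quoted: bool,
    importing_file: Path,
    package_roots: Dict[str, Path],
) -> Path:
    """Map an import to the file it would load (illustrative only)."""
    if quoted:
        # `import "some/path.ncl"`: a plain path import.
        return importing_file.parent / target
    # `import foo`: look the name up among the manifest's dependencies.
    if target not in package_roots:
        raise LookupError(f"no dependency named {target!r} in the manifest")
    return package_roots[target] / ENTRY_POINT
```

The point of the sketch is that the quoting alone decides which lookup is used, so a package name can never be confused with a file path.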

### Question: other entry points?

We've hardcoded `main.ncl` as the entry point of every package, but what if
a package wants to expose multiple entry points? For example, node allows a package's
manifest to specify the entry point(s). This is probably not very important to
support, as you can just put

```nickel
{
other = import "./other.ncl",
blah = import "./blah.ncl",
}
```

in your package's `main.ncl`, to provide "other" and "blah" as other entry points.

## Kinds of dependencies

Where can dependencies come from? Dhall allows imports from arbitrary urls. Nix
supports fetching from a variety of VCSs, paths, and archive formats.

We'll support dependencies from

- a central registry, that can identify packages by name and version number.
This should be the most common method of importing packages, like `crates.io`
in rust.
- git repositories (either from HEAD, or from branches, tags, and revisions
specified by hashes). This allows for easy use of unpublished packages,
including in-development versions.
- paths (relative or absolute). This allows for easy use of different packages
within the same repository, or for temporary patching of published packages.

We require packages to have their own manifest file (at the package root), even
if they don't import dependencies.

## Lock files

In order to ensure reproducibility across time and across machines, we build
a lock-file (if there isn't yet one) when running `nickel eval` or `nickel
export`. The lock-file specifies the exact versions of all (transitive)
dependencies, allowing those identical versions to be used every time.

- For a git dependency, the manifest might not specify the exact revision (it
might specify a branch or tag, or just default to HEAD). The lock-file will
record the exact revision.
- For a registry dependency, the version specifier might allow for a range of
versions. The lock-file will record the exact version used.
- For a path dependency, the lock-file will record that there was a path
dependency, but it won't record anything about it and it will ignore recursive
dependencies. This is because path dependencies can change at any time, so
they can't be meaningfully locked.
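
The three cases above can be sketched in Python as follows. This is only an illustration: the lock-file format is not specified here, so the dictionary shapes, the `kind` field, and the `lock_entry` function are all assumptions made for the example.

```python
from typing import Optional


def lock_entry(dep: dict, resolved: Optional[str] = None) -> dict:
    """Build a hypothetical lock-file entry for one manifest dependency.

    `resolved` is the exact git revision (for git dependencies) or the exact
    version (for registry dependencies) chosen during resolution.
    """
    kind = dep["kind"]
    if kind not in ("path", "git", "index"):
        raise ValueError(f"unknown dependency kind: {kind!r}")
    if kind == "path":
        # Path dependencies can change at any time: record only that one exists.
        return {"kind": "path"}
    if resolved is None:
        raise ValueError(f"a {kind} dependency needs a resolved revision or version")
    if kind == "git":
        # The manifest may name a branch or tag; the lock pins the revision.
        return {"kind": "git", "url": dep["url"], "rev": resolved}
    # Registry dependency: the manifest may allow a range; the lock pins one version.
    return {"kind": "index", "name": dep["name"], "version": resolved}
```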

What happens if we have a lock-file, but we modify the manifest? We don't want
to be too strict about requiring the exact versions in the lock-file, or we'll
end up forcing the user to re-create the lock-file from scratch. In this case,
we treat the lock-file as a suggestion instead of a hard constraint: during
resolution, when choosing the next package version to try, the resolver picks the
locked version first. But if the locked version leads to a conflict, it tries
another version without complaining. If nothing has changed since the lock-file
was created, resolution should always produce the same versions.
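
One way to picture "lock-file as a suggestion" is as an ordering of candidate versions. The Python sketch below is not the prototype's resolver; `candidate_order` is a made-up name, versions are assumed to be plain `major.minor.patch` strings, and trying the remaining versions newest-first is an assumption.

```python
from typing import List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "1.10.0" into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def candidate_order(available: List[str], locked: Optional[str]) -> List[str]:
    """Order the versions a resolver would try for one package.

    The locked version (if it is still available) comes first; the rest
    follow newest-first as fallbacks in case the locked one conflicts.
    """
    newest_first = sorted(available, key=parse_version, reverse=True)
    if locked in newest_first:
        newest_first.remove(locked)
        return [locked] + newest_first
    return newest_first
```

Because the locked version is only moved to the front and never made mandatory, an edited manifest that rules it out simply falls through to the next candidate.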

## Version compatibility and resolution

How do we handle a package that gets imported multiple times in the dependency
tree?

For path and git dependencies, there isn't much choice. Dependencies
from the registry are the most interesting. Fortunately, there are fairly
well-established conventions for specifying ranges of versions (like ">=1.0
<3.0", or "^1.2"). What's less clear is how to handle multiple packages with
overlapping ranges. Some languages (e.g. python) insist that each package
resolves to a single version across the whole dependency tree. Other languages
allow multiple versions, keeping track of which package in the dependency tree
needs to import which version of a package.

I think we want to allow multiple versions of a package; the alternative can be
fragile and annoying. But then we need to figure out how many different versions
to allow. There's a trade-off: if we allow pulling in a different version
every time a package gets imported, solving the dependency graph is easy.
But it increases the chance of getting incompatibilities at runtime: we might
accidentally get a value from one version of a package and try to pass it to an
incompatible function defined in another version of that package. Pulling in too many different versions also
increases the total number of packages in the dependency graph.
Comment on lines +309 to +316
Member

Just for reference, the Bazel bzlmod solution is to use Go's MVS and fail if they encounter incompatible version requests for the same package. But, the root package can declare single or multi version overrides.

Member Author

That's a very interesting read, thanks! This approach and the one of allowing duplicate semver-compatible versions seem to be pretty much mutually exclusive, so I guess we need to pick one or the other?

Member

I guess the user controlled override is compatible with either approach. But, otherwise, yes, I think so.


The current prototype uses a strategy similar to cargo: it divides package
versions into semver-delimited "bins" and allows resolution to choose at most
one version from each bin. That is, two versions of a package that fall into
different bins can coexist in the same dependency tree, but two versions from the same bin cannot.
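
To make the "bins" idea concrete, here is a small Python sketch. It assumes cargo's compatibility rule (the leftmost non-zero component of `major.minor.patch` decides the bin); the prototype's exact rule may differ, and the names `semver_bin` and `bin_conflicts` as well as the version numbers in the usage note are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Bin = Tuple[int, ...]


def semver_bin(version: str) -> Bin:
    """Return the compatibility bin of a plain major.minor.patch version."""
    major, minor, patch = (int(part) for part in version.split("."))
    if major > 0:
        return (major,)
    if minor > 0:
        return (0, minor)
    return (0, 0, patch)


def bin_conflicts(resolved: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, Bin], List[str]]:
    """Find (package, bin) pairs for which more than one version was chosen."""
    by_bin = defaultdict(set)
    for name, version in resolved:
        by_bin[(name, semver_bin(version))].add(version)
    return {key: sorted(versions) for key, versions in by_bin.items() if len(versions) > 1}
```

Under this assumed rule, a tree containing a hypothetical `foo` at `1.2.0` and `2.0.0` has no conflicts, while `1.2.0` together with `1.3.0` is reported because both fall into bin `(1,)`.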

## The registry

How should we manage the global registry? There's a potential for incurring
substantial maintenance costs here, so we should be careful.
Comment on lines +411 to +412
Member

Are you considering support for custom registries?

When Bazel introduced its new dependency manager bzlmod they also allowed users to define custom registries and even use multiple registries. Commercial users often like this because they can place proprietary code into their own private registry.

Member Author

Good point, I'll look into that.


We will provide a git repo, at a hard-coded location, to serve as the registry.
This repo will contain the "index", but not the actual package contents. It
will contain one file per package, each of which contains a line per version.
Each entry specifies the location of the package (currently required to be on
github) and its git tree hash. This ensures that packages are immutable, but it
doesn't stop them from disappearing: we don't keep a copy of the actual package
contents.
Comment on lines +416 to +421
Member

This representation strategy means that the manifest files (hence the packages' metadata) aren't kept in the registry. It means that querying the registry can be quite expensive since it requires many calls to many Git repositories. Is this reasonable? (genuine question, I don't really have an opinion on this)

Member Author

This is also related to the thing I left out, that the registry index has enough information for dependency resolution. We don't store the whole manifest in the registry, but enough that we can do version resolution just by querying the index (and manifests from git/path dependencies, of course).

Member

But maybe we don't just want dependency resolution? What if we want to query package descriptions, for instance? What meta-data should be in the repo is something we definitely ought to answer, oughtn't we?

PS: I'm not even sure “oughtn't we” is the right way to end this sentence. But it sounds super cool, so I went with it anyway!

Member Author

Fair enough. I guess the index ought to include whatever metadata we want to be easily queryable.


The registry entries are named like "github/\<org\>/\<package\>" (where in the
future we might support places other than github). This allows the package
registry to automatically discover new package versions: to find the latest
versions of "github/tweag/json-schema-lib", we simply fetch the repository at
`github.com/tweag/json-schema-lib` and look for tags that look like version
numbers. Initially, we will scrape packages daily in a cron job. Eventually we
will allow people to automatically request re-scrapes of specific packages.
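
What counts as a tag that "looks like a version number" is not pinned down here. The Python sketch below assumes one possible rule: an optional `v` prefix followed by exactly three numeric components. The pattern and the `versions_from_tags` name are illustrative assumptions, not the scraper's actual behaviour.

```python
import re
from typing import Iterable, List, Tuple

# Assumed tag shape: "1.2.3" or "v1.2.3", nothing else.
VERSION_TAG = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def versions_from_tags(tags: Iterable[str]) -> List[Tuple[int, int, int]]:
    """Keep only the tags that parse as versions, sorted oldest to newest."""
    found = set()
    for tag in tags:
        match = VERSION_TAG.fullmatch(tag)
        if match:
            found.add(tuple(int(group) for group in match.groups()))
    return sorted(found)
```

Tags such as `nightly`, `1.2`, or `v1.2.0-rc1` are ignored under this rule; whether pre-release tags should be picked up is a separate decision.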

Once a package version is stored in the index, it will never be overwritten.
If a future scrape sees that a previously existing version tag is pointing at a
different commit, we will make a note (maybe warn someone somehow?) and keep the
old version.
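
The never-overwrite rule can be sketched as a merge step in Python. The data shape (a mapping from version string to git tree hash for a single package) and the `merge_scrape` name are assumptions for the example; the real index stores one line per version in a per-package file.

```python
from typing import Dict, List, Tuple


def merge_scrape(
    index: Dict[str, str], scraped: Dict[str, str]
) -> Tuple[Dict[str, str], List[Tuple[str, str, str]]]:
    """Merge freshly scraped versions into an existing index entry.

    New versions are added. Versions already in the index keep their old
    hash, and any mismatch is returned as (version, old_hash, new_hash) so
    that it can be noted or reported.
    """
    merged = dict(index)
    moved_tags = []
    for version, tree_hash in scraped.items():
        if version not in merged:
            merged[version] = tree_hash
        elif merged[version] != tree_hash:
            moved_tags.append((version, merged[version], tree_hash))
    return merged, moved_tags
```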

### Question: should we store a content hash too?

We're storing a git tree hash in the index, but if we ever want to store package
contents in the future, maybe we should also store a hash of the git tree
contents? This would allow verification of package tarballs, without needing the
whole git repo.
Comment on lines +437 to +442
Member

Storing a content hash is a good idea. But there's more than one way to do so, so you may end up not getting much more compatibility than you planned.

That being said, there is one content-hash standard, namely SWHID from our good friends at Software Heritage. It's specified here https://www.swhid.org/specification/v1.1/ . Maybe worth looking into it?

Member Author

It looks like SWHID is very easy to support. As far as file and tree identifiers go (which I think are the ones we care about; the registry only needs to identify the tree, not the whole history), it's 100% git-compatible. That is, if the tree id is d198bc9d7a6bcf6db04f476d29314f157507d505 then the SWHID is
swh:1:dir:d198bc9d7a6bcf6db04f476d29314f157507d505
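
A quick Python sketch of that mapping, for illustration only. The function names are invented; the file case relies on git's blob hashing (SHA-1 over a `blob <size>\0` header followed by the content), which is what the `cnt` identifier reuses.

```python
import hashlib
import re

GIT_SHA1 = re.compile(r"[0-9a-f]{40}")


def swhid_for_tree(tree_hash: str) -> str:
    """Wrap an existing git tree hash as a SWHID directory identifier."""
    if not GIT_SHA1.fullmatch(tree_hash):
        raise ValueError("expected a 40-character lowercase hex git hash")
    return f"swh:1:dir:{tree_hash}"


def git_blob_hash(content: bytes) -> str:
    """Compute the hash git assigns to a file with the given content."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def swhid_for_file(content: bytes) -> str:
    """Build a SWHID content identifier from raw file bytes."""
    return f"swh:1:cnt:{git_blob_hash(content)}"
```

Since the registry already stores the tree hash, producing the directory identifier needs no extra hashing at all.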


## CLI support

We'll need some CLI commands for handling common package-management tasks. The
current prototype has

- a `nickel package generate-lockfile` command that updates the lock-file
- a `nickel package debug-resolution` command that prints the full recursive
dependency tree

We probably also want

- a command for adding a new dependency to the manifest (checking if it exists,
and picking the most recent version)
- a command for downloading the dependency tree (for use in build systems that
expect different "fetch" and "build" phases)
- a command that checks for new dependency versions and updates the manifest

Anything else?