RFC for support of alternate registries in cargo #2006

Closed
cswindle wants to merge 7 commits

@cswindle cswindle commented May 23, 2017

This RFC proposes adding support for alternative crates.io servers which could be used alongside the public crates.io server.

Rendered

@Mark-Simulacrum
Member

Could you change the RFC file name so every line is "new"? Otherwise it's a little hard to read in the diff format.

@cswindle
Author

@Mark-Simulacrum, I have sorted that now.

@phrohdoh

phrohdoh commented May 23, 2017

In my opinion (please don't take this as a personal slight) you as a user do not want to (and should not have to) define which registry a dependency comes from (for the common case) so I feel this RFC should be expanded a bit.

IMO ideally you would do something like this in a global location (such as ~/.cargo/config):

[registry.'crates-io']
url = "..."
token = "..."

[registry.'companyname-private']
url = "..." # This could be a network-internal URL or a hosted instance
token = "..."

If the above is done and a registry is not explicitly provided for the your-private-package dep then, after not finding your package on crates.io (if you have it setup as a registry) we could get the niceness of:

$ cargo build
Downloading your-private-package 0.1.7 found in 'companyname-private' registry
   Building your-private-package 0.1.7

But if a registry is provided (using the named registry (or a url if we don't go with named registries)):

[dependencies]
your-private-package = { version = "=0.1.7", registry = "companyname-private" }

$ cargo build
Downloading your-private-package 0.1.7
   Building your-private-package 0.1.7

Sorry I feel I have somewhat derailed this and have attempted to push it in the direction I think is best, but I do think we're at a point of making an important decision.

Unfortunately I don't have much time this morning to discuss this but should be available on IRC (#rust, maybe #rust-infra and #cargo) closer to 6pm UTC-5.

@cswindle
Author

@phrohdoh, thanks for the feedback on the RFC. I do, however, have concerns about your approach. If you want to use an alternate registry for crates, I think you would want to specify that explicitly in the dependencies. Without that, you would run into problems when a crate with the same name happens to exist in both registries and cargo may choose the wrong one.

I can recognise some use of having a global config file defining the URL that you want to use for the crates, but I think that the registry to use belongs in the crate that you are defining, not in the separate config on the users machine.

I am going to update the RFC to add that an alternate crates.io registry would be updated to include the registry in the fragment of Cargo.toml displayed for each package, thus allowing users just to copy and paste the fragment. In the instance where they wish to upload to the server it seems essential that they provide a server to upload to.

For reference, below is an example of the output when using cargo build using the private registries:

cargo build
Compiling unicode-normalization v0.1.4
Compiling matches v0.1.4
Compiling private-crate v0.3.1 (registry https://private.repo/)
Compiling unicode-bidi v0.2.5
Compiling idna v0.1.1
Compiling url v1.4.0
Compiling test v0.1.0 (file:///my-great-project/test)
Finished dev [unoptimized + debuginfo] target(s) in 7.99 secs

@phrohdoh

> if you are wanting to use an alternate registry for crates then you would want to explicitly specify that in the dependencies

Why do this if you don't have to?

If there is a conflict then it is of course up to the human to resolve (possibly with a --source cargo flag) and cargo should not be making any choices for the user (unless we say the order of registry entries also provides rank (where the lowest rank is chosen)).

> I think that the registry to use belongs in the crate that you are defining

I would like to explore and discuss this further.

The large organizations I've worked at are mostly .NET based, which means they use NuGet, so this is where I draw a lot of my inspiration from (the nuget client being a pain is another story and is separate).

It is extremely easy to spin up a private/public/personal/org NuGet server because of some choices they made (but I do think there is room for improvement there too).

@cswindle
Author

I do not like the idea of needing the human to resolve conflicts by passing a flag to cargo, as it will get very complicated very quickly if you need to specify the source for two different crates. I also do not like having an ordered list of registries: if a crate with the same name is available on both the public and the private registry, in some cases you may want it one way and in others the other. Providing this information in Cargo.toml and checking it in with the project seems most sensible to me, as the registries that you want to use are part of the data which belongs to the project in my eyes.
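
The name-collision concern can be made concrete with a small sketch. This is not Cargo code: `Registry`, `Lookup` and `find_sources` are hypothetical names, and each registry is modelled as nothing more than a list of crate names. It only shows that, without an explicit `registry` key, a lookup across several registries has three possible outcomes, one of which needs a tie-breaking rule.

```rust
/// Hypothetical model of a registry: a name plus the crates it hosts.
struct Registry<'a> {
    name: &'a str,
    crates: &'a [&'a str],
}

/// Outcome of looking a crate up without an explicit `registry` key.
#[derive(Debug, PartialEq)]
enum Lookup<'a> {
    NotFound,
    Unique(&'a str),
    /// More than one registry hosts the name, so something has to choose.
    Ambiguous(Vec<&'a str>),
}

/// Collects the names of all registries that host `krate`.
fn find_sources<'a>(registries: &[Registry<'a>], krate: &str) -> Lookup<'a> {
    let hits: Vec<&'a str> = registries
        .iter()
        .filter(|r| r.crates.iter().any(|c| *c == krate))
        .map(|r| r.name)
        .collect();
    match hits.len() {
        0 => Lookup::NotFound,
        1 => Lookup::Unique(hits[0]),
        _ => Lookup::Ambiguous(hits),
    }
}
```

An ordered-list scheme would resolve `Ambiguous` by taking the first entry; a per-dependency `registry` field avoids reaching that case at all.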

@Blacktiger

Many organizations that want this feature will also want to make sure they can control exactly how the crates get into the organization. This usually means they want to know that when someone downloads the dependencies for a project they only come from their repository manager (Nexus, Artifactory, etc) and nowhere else. So developers need a way to make sure their local cargo only reaches out to the repository manager. Maybe that just means that they don't provide a specific registry at the project level?

@phrohdoh

@Blacktiger indeed, this is why I want a registry list in a global place so crates.io can be taken out of the picture entirely if necessary.

At my current place of work we host everything we need on-prem and don't use nuget.org for anything.

If we need a package from there we ask for a specific version (down to patch number) and that gets pulled, inspected, and if approved pushed to our private registry.

@cswindle
Author

@phrohdoh, what you are suggesting there is just to provide a replacement for crates.io, rather than something that works alongside a public registry. There are no cargo changes that would be required to support such a thing, all that would be required is only allowing a specific group of people to upload crates to your private registry, then update cargo/config to override using your new registry in place of crates.io.

@phrohdoh

phrohdoh commented May 23, 2017

I am not suggesting that; I merely stated that we use that process at work.

An alternate crates.io (mirror) and an alternate registry altogether are different things IMO.

Now I am not sure which one you are proposing here (your PR title says alternate registry but the RFC says alternate crates.io).

Replacing crates.io with an on-prem / private registry naturally falls out of supporting non-crates.io registries (assuming the APIs are the same of course) but doesn't mean you have to replace crates.io.

@alexcrichton alexcrichton added the T-cargo Relevant to the Cargo team, which will review and decide on the RFC. label May 23, 2017
@@ -102,6 +107,12 @@ plus has the ability to have crates pushed to it, however this has the following
* It requires crates.io to be able to combine two registries, or requires a radical change to the way crates.io works
* The current proposal could be extended to support this, if a caching server is added at a later stage

## Including registry definitions in a global location
We considered using a global configuration file (eg ~/.cargo/config) to allow a registry to

I don't think broad "we believe" statements like this are as truthful as they could be.

Future readers will no doubt read this as something everyone in this discussion agreed on which isn't the case.

Author

Sorry, that was a reflection of discussions that I had within our company prior to submitting the RFC, not intended as a full reflection of the discussion here as I thought it would be useful to include.

@alexcrichton
Member

cc @rust-lang/cargo

@phrohdoh

Thinking about this a bit more, I have come to the conclusion that named registries are not as simple as I would like to think they are, and perhaps living in each Cargo.toml is what we actually want (instead of pushing that off on a global config, which is just the 'easy' route).

@cswindle what do you think of this?

[package]
name = "example"

[dependencies]
libc = "=0.1.7"
foobar = { registry = "internal", version = "^0.1" }

[registries]
default = "<url to crates.io>" # maybe this isn't necessary
internal = "<url to internal / on-prem / private / etc registry host>"

@cswindle
Author

I did originally consider having the registries pulled out into a separate registries section in Cargo.toml, but thought that it was added complexity for very little gain. Also with my proposal to include the registry in the Cargo.toml fragment on the web server, this would then not be a simple copy/paste to use the library and I agree with the point you made about making it easier for the users. There is scope though for both methods to be used, but I would not be planning on adding that as part of this RFC, maybe it is something that could be proposed as a subsequent RFC if you feel there is enough justification for it.

@phrohdoh

That's totally fair, I'll cede the registries table in favor of { registry = "<url>" }.

@mark-i-m
Member

> We still want to support private crates having dependencies on the public crates.io server, so we propose relaxing a check which ensures that the source for a dependency matches the registry. We propose that this only performs the check only if the dependency is not the default registry, thus allowing private crates to reference public crates on crates.io.

IIUC, this RFC makes it possible to have two versions of a library with the same name and version that are not the same... For example, suppose I have something like this:

[dependencies]
libc = { version = "*", registry = "https://github.com/my_awesome_fork/crates.io-index" }
foo = "*"

where foo depends on libc = "*"... Which libc does foo get? If it gets libc from crates.io then it is possible for libc (crates.io) to see types from the other libc or vice versa. But if foo gets libc from my registry, in theory it is possible for my libc to be different from what foo expects (so it doesn't compile or misbehaves).

One solution is to ensure that in the entire dependency graph all crates with the same name and version have the same registry too.
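
A minimal sketch of that check, assuming the resolved dependency graph is available as a flat list. `Resolved` and `find_registry_conflict` are illustrative names, not Cargo's own types, and registries are compared as plain strings.

```rust
use std::collections::HashMap;

/// One resolved package: name, version, and the registry it was taken from.
#[derive(Debug, Clone, PartialEq)]
struct Resolved {
    name: String,
    version: String,
    registry: String,
}

/// Returns the first pair of packages that share a name and version but
/// come from different registries, or `None` if the graph is consistent.
fn find_registry_conflict(graph: &[Resolved]) -> Option<(&Resolved, &Resolved)> {
    let mut seen: HashMap<(&str, &str), &Resolved> = HashMap::new();
    for pkg in graph {
        let key = (pkg.name.as_str(), pkg.version.as_str());
        // Remember the first package seen for each (name, version) pair.
        let first = *seen.entry(key).or_insert(pkg);
        if first.registry != pkg.registry {
            return Some((first, pkg));
        }
    }
    None
}
```

With the example above, `libc` resolved once from crates.io and once from the fork at the same version would be reported as a conflict instead of silently producing two incompatible copies.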

## Cargo.toml config changes
The following changes for Cargo.toml are proposed as part of this RFC:
* Add an optional 'registry' field to be specified as part of the package dependencies
* Add an optional 'registry' field to be specified in the package definition

Member

What if I want to be able to publish a crate to both a private registry and crates.io? For example, what if the workflow is that I publish prerelease versions to the private registry for internal testing, and then I publish polished versions to crates.io? I don't know if this is something that's wanted/needed, but it seems like it would be made inconvenient, if not impossible, by this implementation.

Member

Hmm... that's true. I can imagine such a feature being useful in CI environment.

Author

It would be possible to do such a thing by not including the registry field and using a flag on your local crates.io server to allow a blank registry. I would not be intending to make such a change with this RFC though, I am trying to make this the minimum to unblock further work in this area.

IMO the simple solution here is to supply a --source (or --registry) flag which tells cargo which registry to publish to.

Member

> supply a --source (or --registry) flag which tells cargo which registry to publish to

👍, in the given example the package could not specify a registry as the primary release channel is the default (or it could explicitly specify crates.io if it wanted), then setup the CI to do cargo publish --registry https://internal.registry/ to override it if the tag is a prerelease version.
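
The precedence implied by this workflow can be sketched as follows. It assumes the `--registry` flag suggested in this thread and the package-level `registry` field from the RFC; `publish_target` is a made-up helper name, and the default is the crates.io index URL.

```rust
/// URL of the default registry index, used when nothing else is specified.
const DEFAULT_REGISTRY: &str = "https://github.com/rust-lang/crates.io-index";

/// Picks where a publish would go in this sketch: a command-line override
/// wins, then the manifest's `registry` field, then the default registry.
fn publish_target<'a>(
    cli_registry: Option<&'a str>,
    manifest_registry: Option<&'a str>,
) -> &'a str {
    cli_registry
        .or(manifest_registry)
        .unwrap_or(DEFAULT_REGISTRY)
}
```

In the prerelease scenario, CI would pass the internal URL as the override, while a normal release with no flag and no manifest entry falls through to the default.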

## Changes for alternative registries for dependencies
This boils down to a very simple change, where we previously setup the crate source for the
crates.io registry, we now just need to check if a registry is provided, if it has the crate
source is created using the registry URL, otherwise the crates.io server is used.
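
As a rough illustration of the rule in this excerpt (not taken from the Cargo source), the choice of source for one dependency could look like this. `Dependency`, `Source` and `source_for` are hypothetical names introduced only for the sketch.

```rust
/// A dependency entry as it might look after parsing Cargo.toml.
struct Dependency {
    name: String,
    version_req: String,
    /// The optional `registry` field proposed by the RFC.
    registry: Option<String>,
}

/// Where the crate source for a dependency is created from.
#[derive(Debug, PartialEq)]
enum Source {
    CratesIo,
    Registry(String),
}

/// Uses the dependency's registry URL if one is given, otherwise crates.io.
fn source_for(dep: &Dependency) -> Source {
    match &dep.registry {
        Some(url) => Source::Registry(url.clone()),
        None => Source::CratesIo,
    }
}
```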

Member

If I'm installing all crates from a private registry, it seems annoying to have to specify that registry for every dependency. I think it would be better if there was a way to provide a default registry and then override that default per-crate if necessary.

Author

My position is that there is a default registry and that is crates.io. I think keeping the registry with the dependency is a sensible approach as that makes it easier to copy dependencies from one Cargo.toml to another, but there is no stopping this from being done in a later stage.

Member

I am also in favor of "per dependency" config because of the "optimize for the reader" advice. When reading Cargo.toml, I would like to know from which registry each of the dependencies comes, and per-dep config seems to be the easiest way to achieve that.

Adding a "default" registry would be backwards compatible, so we can skip it anyway at this stage :)

Member

I agree that the default is something you'll want to override on a per-dependency basis, but the current syntax seems pretty verbose and prone to typos. Something like this might be nicer?

[registries]
some_name = "https://my_internal_registry.foobar"

[dependencies]
"some_name/foo" = "0.1"
libc = "0.3"
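
To show how such a prefixed key might be interpreted, here is a small sketch. The `registry/crate` key syntax is only the suggestion made in this comment, and `split_dep_key` is a hypothetical helper, not an existing Cargo function.

```rust
/// Splits a dependency key of the form "registry_name/crate" into its parts.
/// A key without a slash is treated as coming from the default registry.
fn split_dep_key(key: &str) -> (Option<&str>, &str) {
    match key.split_once('/') {
        Some((registry, krate)) => (Some(registry), krate),
        None => (None, key),
    }
}
```

The registry name returned here would then be looked up in the `[registries]` table to obtain the URL.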

There are two parts to this, the first is a change to Cargo which checks if the registry
provided in the registry matches the host for the publish, if it does not it gets rejected.
The second part is a change to crates.io which will just reject the request to publish the
crate if the configured repository on the crates.io server does not match the registry

Member

This proposed change to crates.io means that the crates.io codebase needs to know the host where it is running, which it does not currently. That might be something you were implying here by "change to crates.io", but I just wanted to make sure you knew that currently the codebase doesn't have knowledge of where it's running.

Author

It needs to know the git repository and compare that, which it already has, so I don't think that this is an issue.


Currently this design requires that when you want to push to the private crates.io server
you need to override the host and token, it would be possible to update cargo to support
multiple registries tokens which can be used to login.

Member

rust-lang/cargo#3365 is relevant here :)

Author

I agree and that is something which I would like as well, but I wanted to try and get something in place, which can then be built on to add support for things like this.

Member

FYI, we may get this rather soon because we are moving credentials anyway: rust-lang/cargo#3978

Author

Nice to know there is progress on that front as well :)

* The current proposal could be extended to support this, if a caching server is added at a later stage

## Including registry definitions in a global location
We considered using a global configuration file (eg ~/.cargo/config) to allow a registry to

Member

Cargo will look up the directory tree for .cargo/config files, so it is possible to have, for example, ~/some/path/my-project/.cargo/config rather than globally.

I think Cargo.toml is a better place for this information since it's intended to be checked in and shared while .cargo/config files are meant to be ignored and personal, but I wanted to make sure you knew about .cargo/config being an option.
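
The directory walk described above can be sketched like this. It only computes the candidate paths, nearest directory first; it does not read any files, and it leaves out the home-directory config and the way Cargo merges values. `config_candidates` is a made-up name.

```rust
use std::path::{Path, PathBuf};

/// Lists every `.cargo/config` location that would be probed when starting
/// from `start`, walking up through each parent directory to the root.
fn config_candidates(start: &Path) -> Vec<PathBuf> {
    start
        .ancestors()
        .map(|dir| dir.join(".cargo").join("config"))
        .collect()
}
```

Starting from a project directory, the first candidate is the project-local file, which is why a registry override could be scoped to one checkout instead of the whole machine.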

Author

I know they can, but I just think that logically they belong in the Cargo.toml for the use cases I am trying to solve.

@withoutboats
Contributor

I'm concerned about the implications this RFC would have for the evolvability of cargo. Right now, we control the source for both the client and the server (cargo and crates.io). Importantly, this means that we can roll out features to cargo with the knowledge that the server already supports them. We can't assume that all clients are up to date, but we can assume that crates.io is.

With multiple registries, most of which won't be run by the Rust project, we will need to worry about what happens when you make a request involving new feature X to a registry which has not been updated to support that feature. Today we have a lot more flexibility.

I'm also a bit concerned about the increased possibility of ecosystem incoherence. Someone running a public secondary registry could start introducing new features to it & release new cargo subcommands based on those features (which only work on their registry). Essentially this would enable motivated subgroups to ignore the consensus-building RFC process & partially fork the ecosystem without the extremely high cost of hard forking the ecosystem.

Both of these boil down to this: we currently have a "single server, many clients" model, and get all the benefits of that. Moving to a "many servers, many clients" model introduces significant consistency issues. This is a large drawback of this RFC.


Is the use case you describe of setting up a private registry hypothetical for you @cswindle, or are you currently trying to integrate cargo in a closed source environment & it's not meeting your needs? Making it easier to use cargo in environments like this, where you have your own requirements, is a roadmap goal for this year, so if that's the case, we want to figure out what's needed to support your use case entirely. Feel free to leave comments on the roadmap tracking issue rust-lang/rust-roadmap-2017#12.

I think there is a more moderate approach that still treats crates.io as ultimate, but allows for a kind of 'lesser/intermediate' source. Ruby's gemstash seems like possibly a model for this. We already have a concept of source replacement, though it's mainly used for local vendoring, not for setting up another server like this proposal would. I think an approach along those lines is worth pursuing.

@withoutboats
Contributor

withoutboats commented May 24, 2017

Actually, reading the source replacement documentation again, I see that it does support non-local registries:

> A "registry source" is one that is the same as crates.io itself. That is, it has an index served in a git repository which matches the format of the crates.io index. That repository then has configuration indicating where to download crates from.
>
> Currently there is not an already-available project for setting up a mirror of crates.io. Stay tuned though!

This seems like it supports the use case described in the motivation. I suppose the concern is about making these "registry replacements" support including crates that aren't in the crates.io index (currently every source in the replacement needs to be present in the replaced source).

@cswindle
Author

@withoutboats, I am not sure that I agree that you currently have full control of the server. It is already possible to set up mirrors and to override cargo to use those, so I think that you already have the scenario where you should care about other servers. I would agree that this is a step further along that road, but I do not see any alternative for enterprises which wish to have closed source integrated with open source that is a nice experience for the user. One of the great benefits of rust is that crates.io/cargo is really good at discovering crates; without that, closed source projects make rust pretty annoying. I agree that there are ways to work around it by using git repositories for everything, however that becomes very tedious to use.

Regarding using source replacement, you could update it to allow replacement when they aren't present on crates.io, but that feels like the wrong thing to me. I would personally like rust to be as nice an experience for closed source software development as it is for open source software and have all the same tools available and a normal rust way to use them. I accept that in doing that you will then lose some control over the server component.

@internetionals

In our corporate environment we deal with a few external repositories and maintain custom repositories on the side. Here are some of our motivations.

In our current dealings with remote repositories for other languages, we often want to:

  • Add custom version of specific packages (for non-maintainer updates)
  • Force certain packages to always come from our repositories (for packages that we need to patch)
  • Provide internal packages (these must always come from our repositories)
  • Provide pre-release versions that are only accessible for certain environments

So in 3 of these cases, we explicitly want to receive the packages from our own repository only. Should our own repository, for whatever reason (malicious, human error, ..) serve the wrong replies, then we don't want to suddenly receive some other package or version.

The other case, where we just want to release a hot fix version of some package, we want to be able to do that for all possible packages available from upstream.

We currently handle different repositories and toolchains. The solutions we currently use:

  • Provide selective repositories and configure the toolchains to prefer our packages above the others and to only lookup specially named packages in our repositories (mostly based on a prefix).
  • Mirror the entire upstream repository index and apply our custom packages on top of that. The toolchains only ever talk to our version of the repositories.

The first one is generally easier to setup and maintain. We always try to use some form of tracking the changes in the upstream repositories, eg. through mirroring. We often setup multiple internal repositories with released versions and testing/CI versions.

For cargo to fit this use case, we'd want to be able to specify where the preferred repository is, where the fallback repositories are, and a way to explicitly set certain dependencies to always come from one of those.

The second is a lot more intrusive, in that you need better tools to maintain it and go with the changes in the upstream repository management.

In this case we'd only need cargo to specify which upstream repository it must use.

The choice between those two solutions is mainly dominated by the setup and maintenance effort:

  • The first is often the easiest to implement as many toolchains support that and the repository management tools are often lacking. This is often the "cheap but pragmatic" solution and used as long as the toolchain supports it.
  • The second is the most preferred, but only if there is readily available software for the repository management. In that case the policy is fully enforced at the repository level and tracking updates is generally also easy. It also has the benefit of allowing you to have full control over the availability and the amount of information you "leak".

One thing that helps in both cases: Have an option to say, preferably system wide, what the default upstream repository is. This allows for the environment to control where to fetch the packages from when they are not explicitly pulled from a specific repository. This can be used to prevent unintended "leakage" and allows environment specific repositories to be used.

Hopefully this gives some insights into some of the use cases users might have.

@carols10cents
Member

I'm interested in hearing from @alexcrichton @wycats @matklad about the technical implications of this proposal-- is this proposal compatible with how cargo works today? Is this proposal compatible with the future use cases mentioned in the proposal? What other considerations are there that we might be missing?

@matklad
Member

matklad commented May 24, 2017

@mark-i-m raised a very interesting concern: #2006 (comment)

I think we need to cover in the RFC how dependencies resolution would work in the presence of registry key. The core question seems to be:

If crate X depends on crate Y from the default registry, could this dependency be satisfied by the crate Y from a private registry?

I think the safe answer is "No". But there is perhaps an interesting case where you want to patch some dependency across the ecosystem.

@cswindle
Author

@matklad, my position would be that if a dependency does not include a registry it comes from the default registry. For the case of overriding I would prefer for the source replacement to be used as that seems a cleaner way to do it. I will look at updating the rfc to provide further details on this area (although it may be a few days before I get a chance to do so).

@cswindle
Author

I would not be keen on a situation where DNS is used to locate packages as I think that is going to have the following major impacts:

  • external servers can go down, then it would not be possible to re-compile older code - this is major for a company that needs to be able to build patches for programs in 10yrs
  • discoverability of crates goes down, you will effectively end up in a situation like C, where you need to try and find each and every library you need, plus work out how/where to get them from

On the basis of that, I do not want to go down that route with this PR as I do not think it would be a good direction for rust to go in. You are welcome to raise your proposal and see if enough people wish that to be the direction, however I would now like this thread to get back to the RFC as proposed in this PR.

@SamWhited

SamWhited commented Jun 21, 2017

> also take away the single most important guarantee that Cargo in combination with crates.io brings: Continued availability of dependencies.

It would not take this guarantee away; it would move it to the crates.io service. If you want that guarantee, only use the original service or other services that you trust that provide similar guarantees.

> I think most companies would prefer full local hosting of all crates that are being used inside their company.

> external servers can go down, then it would not be possible to re-compile older code - this is major for a company that needs to be able to build patches for programs in 10yrs

I agree, I am not suggesting that we get rid of this requirement. Companies would still probably set a global blessed package repo as the default and pull everything from there and would not use the DNS discovery features at all.

Thanks for your feedback.

@alexcrichton
Member

@cswindle yes for the index changes that looks sufficient

@cswindle
Author

@carols10cents, I believe that with the latest updates there is nothing else that is blocking this RFC, do you agree?

@carols10cents
Member

carols10cents commented Jun 25, 2017

One thing that I mentioned to @cswindle in IRC is that this week is the Mozilla All Hands, so a bunch of us are going to all be in one place for the week and will be discussing lots of things. I plan to bring this up in order to move this forward.

Rereading the text of the RFC, I think my main issues are (listing them here rather than on comments in the RFC so that I can better track when all of these have been resolved, because I think some of my comments are getting lost):

  • I think we want to move towards specifying host rather than index location everywhere. I'd rather have to put https://my-awesome-fork.com in my Cargo.tomls instead of https://github.com/my_awesome_fork/crates.io-index, for example. So I'd like to see that work get done before implementing this.

  • Regarding the "Crate naming on alternate servers" section, I think I'd also like to see support in Cargo.toml for renaming a dependency, much like we can alias things with use. For example:

    [dependencies]
    libc = { version = "*" } # crates.io libc
    libc = { as="awesomefork-libc", version = "*", registry = "https://github.com/my_awesome_fork/crates.io-index" }
    

    Then everywhere within my crate, I could refer to extern crate awesomefork-libc. I know @alexcrichton brought this up recently as something that would be useful for something else, but I can't remember where or why. Feature and crate namespaces colliding maybe?

    I think we need to have a plan for supporting the case of people who aren't caching their dependencies on their local server, and who do want to allow publishing of crates without a prefix as you've proposed, but also want to be able to depend on crates on crates.io that might have conflicting names.

  • Regarding the "Index files changes" section, this part:

    As Cargo requires the index file to include all the dependencies, the crates.io index file format is updated to include the registry in the dependency. The registry is an optional field, where by default it is None, and will only be set when using an alternate crate server.

    sounds like you're saying people will be able to publish crates to crates.io that depend on crates that come from other registries. I don't think this is something we want to allow; we want to guarantee that all crates on crates.io will build from only the information that crates.io hosts. Is that what you're saying or am I misunderstanding?

  • Regarding the Blocking requests to push to a registry section, this part:

    the first is a change to Cargo which checks if the registry provided in the registry matches the host for the publish, if it does not it gets rejected

    In your proposed change to the Cargo.toml, you have a registry specified for the current crate. Couldn't cargo publish then just use that registry value and only publish to that registry? I think it would be convenient if I didn't have to specify an --index when running cargo publish if I've already specified the index in the Cargo.toml. Are you saying that this rejection would only happen if I explicitly specified an index when publishing?

    Also in this comment thread, you suggested allowing private registries to have a setting to allow publishing a crate without a registry set? This seems like unnecessary complexity on the server side.... I think this needs to be worked out in a different way. Why not allow multiple registries to be specified in Cargo.toml that this crate is allowed to be published to?

  • Regarding the Allow publishing when referencing external dependencies section, it says:

    We still want to support private crates having dependencies on the public crates.io server, so we propose relaxing a check which ensures that the source for a dependency matches the registry.

    I understand that you're trying to separate this proposal from companies having cached versions of crates.io crates, but I think they interact here. I could see a company wanting to disallow publishes to their internal registry if the dependencies haven't been cached from crates.io to their internal registry yet.

  • Regarding the "Making it easier for users using an alternate crates.io registry" section, it took me a few readings to figure out that you were talking about the fragment that gets shown on a crate's page in the web UI of the registry. Could you clarify this section please, perhaps including a screenshot with your proposed changes, to pre-empt confusion?

@adalinesimonian

adalinesimonian commented Jun 26, 2017

@carols10cents

Regarding the "Crate naming on alternate servers" section, I think I'd also like to see support in Cargo.toml for renaming a dependency, much like we can alias things with use. For example:

[dependencies]
libc = { version = "*" } # crates.io libc
libc = { as="awesomefork-libc", version = "*", registry = "https://github.com/my_awesome_fork/crates.io-index" }

Then everywhere within my crate, I could refer to extern crate awesomefork-libc.

I ❤️ this idea! I think it can be made even better by abstracting the registry URL away from the individual dependency and to its own registries section. This way you don't get a bunch of URLs floating around in the same Cargo.toml when you could instead have just one (especially useful if you use multiple packages from the same private registry).

For example:

[registries]
bizcorp = "https://code.example.com/DefaultCollection/_git/private-crates-index"

[dependencies]
libc = { version = "*" } # crates.io libc
libc = { as = "bizcorp-libc", version = "*", registry = "bizcorp" }
coffeepot-integrations = { version = "*", registry = "bizcorp" }
boss-key = { version = "*", registry = "bizcorp" }

Or, alternately:

[registries]
bizcorp = "https://code.example.com/DefaultCollection/_git/private-crates-index"

[dependencies]
libc = { version = "*" } # crates.io libc
"bizcorp/libc" = { as = "bizcorp-libc", version = "*" }
"bizcorp/coffeepot-integrations" = { version = "*" }
"bizcorp/boss-key" = { version = "*" }

And an example with multiple private registries:

[registries]
# Internal libraries
bizcorp = "https://code.example.com/DefaultCollection/_git/private-crates-index"
# Private libraries licensed to us by 3rd party vendor
software-vendor = "https://libraries.vendor.com/rust-libraries/index"

[dependencies]
libc = { version = "*" } # crates.io libc
libc = { as = "bizcorp-libc", version = "*", registry = "bizcorp" }
coffeepot-integrations = { version = "*", registry = "bizcorp" }
boss-key = { version = "*", registry = "bizcorp" }
proprietary-db-reader = { version = "*", registry = "software-vendor" }
proprietary-ui-toolkit = { version = "*", registry = "software-vendor" }

This keeps the TOML super clean and cute, and also makes it a whole lot easier to update the URL should the path to the repository change (hopefully a rare event, but still!).
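The lookup implied by a `[registries]` table can be sketched in Rust. This is only an illustration of the idea, not Cargo's implementation: the `Dependency` type, the `resolve_registry` function, and the `DEFAULT_REGISTRY_URL` value are all hypothetical names chosen for the example.

```rust
use std::collections::HashMap;

/// Hypothetical model of one `[dependencies]` entry; not a real Cargo type.
#[derive(Debug, Clone, PartialEq)]
pub struct Dependency {
    pub name: String,
    /// Name of an entry in `[registries]`, or `None` for the default registry.
    pub registry: Option<String>,
}

/// Assumed identifier for the default registry (an assumption for this sketch).
pub const DEFAULT_REGISTRY_URL: &str = "https://crates.io";

/// Resolve the URL a dependency should be fetched from.
/// Fails if the dependency names a registry that was never declared.
pub fn resolve_registry<'a>(
    dep: &Dependency,
    registries: &'a HashMap<String, String>,
) -> Result<&'a str, String> {
    match &dep.registry {
        // No registry named: fall back to the default.
        None => Ok(DEFAULT_REGISTRY_URL),
        // Registry named: it must appear in the `[registries]` table.
        Some(name) => registries
            .get(name)
            .map(String::as_str)
            .ok_or_else(|| {
                format!("unknown registry `{}` for dependency `{}`", name, dep.name)
            }),
    }
}
```

With this shape, changing the repository path means editing one entry in the table rather than every dependency line, which is the maintenance benefit described above.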

@cswindle
Author

cswindle commented Jun 26, 2017

@carols10cents, thanks for the detailed response. Below I have gone through each of your points one by one. If you could let me know whether you are happy with the responses, I will update the document to reflect them.

Rereading the text of the RFC, I think my main issues are (listing them here rather than on comments in the RFC so that I can better track when all of these have been resolved, because I think some of my comments are getting lost):
• I think we want to move towards specifying host rather than index location everywhere. I'd rather have to put https://my-awesome-fork.com in my Cargo.tomls instead of https://github.com/my_awesome_fork/crates.io-index, for example. So I'd like to see that work get done before implementing this.

When is this likely to be implemented? If it is going to be a while (which I can imagine it may be), then an alternative could be to update to use the host in the registry fields, then do a query to see what the crates.io index is when required. That way the interface will be correct, even if the backend won’t be. I do not envisage that this will create much (if any) additional work for whoever fixes the issue you linked to. Would you like me to make the change to the RFC?

I have now updated the document to switch to use the server URL.

• Regarding the "Crate naming on alternate servers" section, I think I'd also like to see support in Cargo.toml for renaming a dependency, much like we can alias things with use. For example:
[dependencies]
libc = { version = "*" } # crates.io libc
libc = { as="awesomefork-libc", version = "*", registry = "https://github.com/my_awesome_fork/crates.io-index" }
Then everywhere within my crate, I could refer to extern crate awesomefork-libc. I know @alexcrichton brought this up recently as something that would be useful for something else, but I can't remember where or why. Feature and crate namespaces colliding maybe?
I think we need to have a plan for supporting the case of people who aren't caching their dependencies on their local server, and who do want to allow publishing of crates without a prefix as you've proposed, but also want to be able to depend on crates on crates.io that might have conflicting names.

This was brought up by @alexcrichton in this thread (#2006 (comment)). I responded that I think this is worthy of its own RFC, as there will be people who would find it useful (for git repos, for example) but who would not notice this RFC. I am willing to write a new RFC if nobody else proposes it; however, I will not be able to get around to it until around October due to work commitments. Does that seem reasonable?

• Regarding the "Index files changes" section, this part:
As Cargo requires the index file to include all the dependencies, the crates.io index file format is updated to include the registry in the dependency. The registry is an optional field, where by default it is None, and will only be set when using an alternate crate server.
sounds like you're saying people will be able to publish crates to crates.io that depend on crates that come from other registries. I don't think this is something we want to allow; we want to guarantee that all crates on crates.io will build from only the information that crates.io hosts. Is that what you're saying or am I misunderstanding?

No, I meant that a private crates.io server would allow external dependencies, but crates.io itself would not. I agree that a crate on crates.io should never reference an external dependency and that this should be blocked on the server. I will update the RFC to clarify.

I have now updated the document to include this change.

• Regarding the Blocking requests to push to a registry section, this part:
the first is a change to Cargo which checks if the registry provided in the registry matches the host for the publish, if it does not it gets rejected
In your proposed change to the Cargo.toml, you have a registry specified for the current crate. Couldn't cargo publish then just use that registry value and only publish to that registry? I think it would be convenient if I didn't have to specify an --index when running cargo publish if I've already specified the index in the Cargo.toml. Are you saying that this rejection would only happen if I explicitly specified an index when publishing?

Good point, I will update to get the host automatically from Cargo.toml (although the token would still need to be provided until we have support for multiple tokens stored).

I have now updated the document to include this change.

Also in this comment thread, you suggested allowing private registries to have a setting to allow publishing a crate without a registry set? This seems like unnecessary complexity on the server side.... I think this needs to be worked out in a different way. Why not allow multiple registries to be specified in Cargo.toml that this crate is allowed to be published to?

This would just be allowing the private registry to use the same code path that is used for crates.io (which does allow a crate to be published with no registry). Also, in this case I am pretty sure that this can be done today by just overriding the registry. I am not keen on specifying multiple registries: I think a single registry is more sensible, and I personally feel that allowing multiple registries would lead to more confusion.

• Regarding the Allow publishing when referencing external dependencies section, it says:
We still want to support private crates having dependencies on the public crates.io server, so we propose relaxing a check which ensures that the source for a dependency matches the registry.
I understand that you're trying to separate this proposal from companies having cached versions of crates.io crates, but I think they interact here. I could see a company wanting to disallow publishes to their internal registry if the dependencies haven't been cached from crates.io to their internal registry yet.

Maybe this could be a config flag which is set for crates.io by default and which private registries can override to allow external dependencies. That way the code is common to crates.io and to private registries that just want to look like an internal crates.io. Would you be happy with that change?

• Regarding the "Making it easier for users using an alternate crates.io registry" section, it took me a few readings to figure out that you were talking about the fragment that gets shown on a crate's page in the web UI of the registry. Could you clarify this section please, perhaps including a screenshot with your proposed changes, to pre-empt confusion?

I will update the RFC to include a diagram to clarify.

I have now updated the document to include this change.

Hopefully that deals with all of your queries.
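The two publish-time checks discussed above (the crate's declared registry must match the publish target, and a per-registry flag decides whether external dependencies are accepted) can be sketched as follows. This is a rough illustration under stated assumptions, not crates.io or Cargo code: `RegistryPolicy`, `PublishError`, `check_publish`, and the `allow_external_dependencies` field are hypothetical names, and registries are compared by plain URL string.

```rust
/// Hypothetical per-registry server policy; the field names are assumptions.
pub struct RegistryPolicy {
    /// URL identifying this registry.
    pub url: String,
    /// Whether published crates may depend on crates from other registries.
    /// Under the proposal above this would be `false` for crates.io.
    pub allow_external_dependencies: bool,
}

#[derive(Debug, PartialEq)]
pub enum PublishError {
    /// The crate's manifest names a different registry than the publish target.
    RegistryMismatch { declared: String, target: String },
    /// A dependency comes from another registry and the policy forbids that.
    ExternalDependency { registry: String },
}

/// Validate a publish request against the target registry's policy.
/// `manifest_registry` is the registry declared in the crate's Cargo.toml, if any;
/// `dependency_registries` holds the registry URL of each dependency.
pub fn check_publish(
    policy: &RegistryPolicy,
    manifest_registry: Option<&str>,
    dependency_registries: &[&str],
) -> Result<(), PublishError> {
    // Check 1: a crate that declares a registry may only be published there.
    // A crate with no declared registry is accepted, as on crates.io today.
    if let Some(declared) = manifest_registry {
        if declared != policy.url.as_str() {
            return Err(PublishError::RegistryMismatch {
                declared: declared.to_string(),
                target: policy.url.clone(),
            });
        }
    }

    // Check 2: reject dependencies from other registries unless allowed.
    if !policy.allow_external_dependencies {
        if let Some(external) = dependency_registries
            .iter()
            .find(|registry| **registry != policy.url.as_str())
        {
            return Err(PublishError::ExternalDependency {
                registry: (*external).to_string(),
            });
        }
    }

    Ok(())
}
```

Keeping both checks in one function reflects the point made above that crates.io and private registries could share a code path and differ only in configuration.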

@cswindle
Author

@vsimonian, this is meant to be the minimal work needed to allow progress on private registries. Having a registries section was already discussed, and I proposed that it should not be part of this RFC but should instead be proposed as an extension in a subsequent RFC, as I think the notation you propose can happily work alongside the one I propose.

@adalinesimonian

@cswindle Thanks for the heads up! I spent last night reading all I could find related to this discussion, but I guess I missed that.

@alexcrichton
Member

I personally feel like there are two crucial points to fix before landing this RFC:

  • Today there's no ability for unstable features to work in Cargo. This is such a big feature I think we need to finally bite the bullet and implement the ability to have unstable features in Cargo. This itself may be its own RFC or its own discussion thread (doesn't need to happen here), but I wouldn't want to land an implementation of this today unless it were unstable.

  • Next, I find the usage of URLs here to be "the wrong URL". The index of a registry is an internal implementation detail that I don't think should ever be surfaced. The fact that it's "exposed" today in a few commands is one that'll likely get fixed/removed in later versions of Cargo. In my mind the "correct" URL for crates.io is 'https://crates.io' or something like that. Basically I don't think we've got a great way to identify registries right now. Using an index is an ergonomic nightmare and otherwise we don't have an option on the table. I think this'll need some thinking to figure out the best option here.

@cswindle
Author

cswindle commented Jul 3, 2017

@alexcrichton, I have updated the document to include switching to URL. Regarding the unstable feature, is that because you might want to tweak the interface once people start experimenting with it, and so want to discourage widespread use of the feature? Is the discussion/RFC something that you or someone on the Rust team would be willing to propose? I will not have the time required to commit to getting agreement on how it should be implemented.

@carols10cents, there are a few of your comments addressed in the changes I have just pushed. I will update the comment above to indicate which are updated to make it easier for you to see what feedback is still requiring some discussions.

@alexcrichton
Member

Thanks for the updates @cswindle. The method of switching URLs to the "right one" specified locally makes sense and is a plausible way forward. It's a pretty big change to how Cargo works internally and can add latency when updating the registry, so we'll want to discuss the exact implementation over time (doesn't need to be 100% hammered out here though).

And yes, we may be able to work on unstable features. I mostly wanted to point out that I would be uncomfortable landing this in Cargo before we have unstable features. Having this be unstable allows us flexibility to tweak the design as necessary without worrying about breaking changes. This is such a large addition that we're inevitably going to learn something during an implementation and deployment that we didn't think about before.

@cswindle
Author

cswindle commented Jul 5, 2017

@alexcrichton, does that mean that you think that the proposal is fine to go to FCP, albeit the implementation after that is blocked based on your desire to have unstable features?

@alexcrichton
Member

I believe that @carols10cents was going to shepherd this RFC, so I'll leave that up to her.

@wycats
Contributor

wycats commented Jul 6, 2017

I'd like to raise a concern that hasn't been raised yet (as far as I can tell).

I personally really want multiple registries (and mirrors) as a public API. I've been motivated about this from Day 1 (which is why the internals are so amenable to this kind of change).

However, in order to make alternate registries a public API, we really do need to make the registry API and format, as used by Cargo, a public API. This means versioning the git repository format, and formalizing its contents in a compatible way.

In particular, while the registry has been scaling well so far, it's unclear that the current sharding strategy will work in perpetuity as registries get much larger. I personally feel very strongly that incremental updates and purely local dependency resolution are important features, which is why I went with a git repository in the first place. That said, I (in the early days) and we (more recently) never worked on stabilizing the current format.

I'm not saying this is extremely difficult, but rather that the work of stabilizing the registry format (rather than just assuming the current version is de facto public, as this RFC does) is the bulk of what I expect to be difficult about this RFC.

@cswindle are you interested in doing the work to expand this RFC to formalize the details of how Cargo interacts with registries?

@cswindle
Author

cswindle commented Jul 7, 2017

@wycats, I can see how having the interfaces be public would make sense; however, I am unable to commit to that as part of this RFC. The whole point of this RFC is to get something in place which can be built upon. Given that @alexcrichton wants this to be unstable for a period of time, maybe a subsequent RFC could deal with making the API public, and you could have that as one of the criteria for the feature to become stable. Does that sound ok to you?

@bbatha

bbatha commented Jul 16, 2017

I just wanted to voice my concern about this part of the proposed change:

I think we want to move towards specifying host rather than index location everywhere. I'd rather have to put https://my-awesome-fork.com in my Cargo.tomls instead of https://github.com/my_awesome_fork/crates.io-index, for example. So I'd like to see that work get done before implementing this.

For multi-repository hosting solutions like Nexus and Artifactory, it needs to be possible to specify a path as well. For instance, Artifactory hosts npm repos at https://host.company.com/api/npm/private-repo so you can host multiple repo types and multiple repos for the same language. Specifying just the host should have a good default, but it should be overridable.
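To illustrate why the path matters, here is a minimal sketch that splits a registry URL into its origin and path, so that a tool could default the path when only a host is given and preserve it otherwise. The function name `split_registry_url` and the `/` default are assumptions for this example only; real code would more likely use a proper URL parser.

```rust
/// Split a registry URL into (origin, path).
/// Returns `None` if the string has no `scheme://host` prefix.
/// When no path is present, "/" is returned as a stand-in default
/// (the actual default path is not specified anywhere in this thread).
pub fn split_registry_url(url: &str) -> Option<(&str, &str)> {
    let (scheme, rest) = url.split_once("://")?;
    if scheme.is_empty() || rest.is_empty() {
        return None;
    }
    match rest.find('/') {
        // A slash right after "://" means the host is missing.
        Some(0) => None,
        // Host followed by a path: keep both parts.
        Some(i) => Some((&url[..scheme.len() + 3 + i], &rest[i..])),
        // Host only: fall back to the default path.
        None => Some((url, "/")),
    }
}
```

For the Artifactory-style URL above, this yields the origin `https://host.company.com` and the path `/api/npm/private-repo`; identifying a registry by host alone would lose the second part.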

@carols10cents
Member

@bbatha could you add that use case to rust-lang/cargo#4208 please?

@bbatha

bbatha commented Jul 17, 2017

@carols10cents added there as well.

@joshtriplett
Member

@wycats While I do see value in stabilizing the repository format itself, would it also help to define a library for both reading and writing it?

@cswindle
Author

@wycats, I have updated the RFC to include that a public API is required in order to be used in a stable build, as discussed on IRC. Does that cover what you were wanting?

@aturon
Member

aturon commented Aug 23, 2017

Nominating for discussion at the next Cargo team meeting.

@carols10cents
Member

Just realized we never posted the decision made in the Cargo team meeting here: we decided the specification of the index format was important, so we created a new RFC that contains that spec.

Labels
T-cargo Relevant to the Cargo team, which will review and decide on the RFC.