-
-
Notifications
You must be signed in to change notification settings - Fork 710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework the way updates work to fix #3584 #3603
Conversation
Rather than having the update monitor itself exec the newly downloaded version of sandstorm, it simply exits (with status 0), and its parent process takes this as an indictation that it should re-spawn it (with the new version). This avoids the need to map ourselves to UID 0 in the user namespace, which avoids the EPERM error we were getting when later hiding the real UID from a grain. Per discussion on the issue, the old design no longer works as of Linux 5.12, after which spawning the grain we would need CAP_SETFCAP, which we do not want to have.
These (found in your Action results) existed prior to the large number of account failures, so I think we are good there, at least in getting back to where we used to be. ✖ tests/account #3239 You can see there are two other tests which also fail intermittently, all tracked here: https://github.com/sandstorm-io/sandstorm/milestone/19 |
Ok, yeah, it looks like we're good then. I was still seeing the appHooks failure locally and I couldn't remember if that one was one of the existing ones. |
Might make sense to add this/#3584 to that milestone. |
Done. |
I can't seem to reproduce the error on stop anymore; maybe I fixed it and forgot, or maybe I was running a stale version when I last tested. In any case, it appears to work. I still want to do a bit more testing; I patched the URLs used for updates and checked a bunch of stuff manually, both for a system wide install and just running the test configuration as an unprivileged user:
I still want to check:
I'll need to futz around with the signing code to check this, and then it's mostly a matter of backgrounding it and checking back. I'm being way more careful about this patch than I normally would be; I'm pretty confident it's fine, but breaking automatic updates would be very very bad, so I'm being really pedantic about this. At some point I'd like to have some actual automated tests for this stuff, but that will have to wait. |
I encountered this while testing sandstorm-io#3603; as written, the `update-tool add` subcommand does not print out the correct public key for the keypair that it generates; instead, it prints out the seed used to generate the keypair. This patch changes the code to actually generate the keypair from the seed, and then print out the public key that should be added to update-tool.capnp I assume this code was not used to generate the current contents of that file, since if it was loading the keyring would fail (which is the problem I encountered).
Alright, finished with the above checks and everything seems to work; marking ready for review. |
Trying to make sure I understand what's going on... So previously:
However, now you effectively cannot set yourself to UID 0 in a userns anymore. Your solution is to create yet another level of process hierarchy: A parent process to the update monitor itself. This parent process stays entirely outside the user namespace and PID namespace. When it sees the update monitor exit, it restarts it. A major problem with this approach is that the parent process itself never updates. So, we can't ever change anything about it. But also, I think it presents a problem on the first update: Existing sandstorm servers today don't have this parent process running. When they update to this logic the first time, they will end up with an update monitor that expects a parent process but doesn't have one. So the next update after that will, I think, cause the server to exit. Here are two other solutions to consider:
I'm inclined to say we should try the extra-userns approach. If it works, it's a much narrower change. In fact, in this approach, we don't have to change anything for the common case where the server runs as root -- we only add some extra new code to the start of the server monitor for the non-root case. Additionally, this approach doesn't touch the update monitor at all, so in the case that it's terribly broken, we can put another update to fix it -- no servers will need human intervention to fix. So, that feels much less risky. Thoughts? |
(2) does seem much simpler. Implemented in #3609, closing. |
Rather than having the update monitor itself exec the newly downloaded
version of sandstorm, it simply exits (with status 0), and its parent
process takes this as an indictation that it should re-spawn it (with
the new version). This avoids the need to map ourselves to UID 0 in the
user namespace, which avoids the EPERM error we were getting when
later hiding the real UID from a grain.
Per discussion on the issue, the old design no longer works as of Linux
5.12, after which spawning the grain we would need CAP_SETFCAP, which we
do not want to have.
This seems to basically work, but I'm marking it as a draft for now for a couple reasons: