umbrella: sled-agent and GZ must support read-only root #4292

Open
wesolows opened this issue Oct 18, 2023 · 1 comment
Labels
enhancement New feature or request. mvp security Related to security. Sled Agent Related to the Per-Sled Configuration and Management
Milestone
MVP

Comments

@wesolows

This ticket covers a collection of work, probably mainly in sled-agent, that is required to support, and operate correctly with, a root filesystem (the ZFS dataset mounted at /) that is read-only at runtime. Currently, the root filesystem is immutable across boots but can be modified at runtime: on each boot, the fixed contents created at build time are restored, but the filesystem is then mounted read-write and can be modified by software such as sled-agent and svc.configd in the usual manner.

It is highly desirable to have the root filesystem be read-only at runtime as well. One of the most significant reasons is that one then needn't worry about the rootfs running out of space, which can cause various services to fail, lose log and error data vital to debugging, and be very difficult to diagnose without manual intervention at the Unix shell (a hard no-no in a quality revenue product). A read-only root also provides some additional security and reliability: not only will we know the contents at boot, we will have greater confidence that the contents stay the same at all times.

There are several classes of work required here:

  1. In the vein of #4211 ("Persist fault management data across reboots"), some subtrees should simply have persistent mountpoints. This is especially true for locations that are primarily or exclusively written to, such as log and error data.
  2. Multiple pieces of SMF rely on a read-write repository. The repository database can be moved to an ephemeral but read-write backing store with svcadm _smf_repository_switch.
  3. Logic in service start methods or in the daemons themselves may expect to modify configuration files in place or to recreate them. There are various ways to handle this. The best is to do away with the configuration file entirely (or at least with any need to modify it other than at build time) and instead have consumers use the SMF repository directly. In some cases, mainly for older or third-party tools, this may be impractical or expensive to maintain. One option there is to keep the configuration file on a read-write filesystem and create a lofs mount of it over the location the software expects. That might need to be augmented by delivering a new start method that consumes service properties, generates the configuration, and mounts it at the proper location before starting the service itself. The Fishworks stack included a general mechanism for this that was used for a significant number of services; we could do something similar if we find many services that require this approach. Note that since we're dealing here only with services that must run in the GZ, there may not be enough of them to justify a complicated general solution.
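As a rough illustration of the generate-then-mount approach in item 3, here is a minimal Rust sketch. Everything in it is hypothetical: the property names, the `key = value` output format, and the functions `render_config`, `write_config`, and `lofs_mount_args` are made up for illustration, and a real start method would read the properties from the SMF repository rather than from an in-memory map. It only builds the illumos `mount -F lofs` argument list; it does not run the mount.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Render service properties into a simple `key = value` file.
/// The format is hypothetical; a real service would emit whatever syntax
/// its daemon expects. BTreeMap keeps the output order deterministic.
fn render_config(props: &BTreeMap<String, String>) -> String {
    let mut out = String::from("# generated at service start; do not edit\n");
    for (key, value) in props {
        out.push_str(key);
        out.push_str(" = ");
        out.push_str(value);
        out.push('\n');
    }
    out
}

/// Write the rendered file into a directory on a read-write filesystem
/// (an ephemeral or persistent dataset), never into the read-only root.
fn write_config(
    writable_dir: &Path,
    file_name: &str,
    props: &BTreeMap<String, String>,
) -> io::Result<PathBuf> {
    fs::create_dir_all(writable_dir)?;
    let path = writable_dir.join(file_name);
    // Write under a temporary name, then rename, so a reader never sees
    // a half-written file.
    let tmp = writable_dir.join(format!(".{}.tmp", file_name));
    fs::write(&tmp, render_config(props))?;
    fs::rename(&tmp, &path)?;
    Ok(path)
}

/// Arguments for an illumos lofs mount of the generated file over the
/// path the daemon expects. Only constructed here, not executed.
fn lofs_mount_args(generated: &Path, expected: &Path) -> Vec<String> {
    vec![
        "-F".to_string(),
        "lofs".to_string(),
        generated.display().to_string(),
        expected.display().to_string(),
    ]
}

fn main() -> io::Result<()> {
    // Hypothetical property values standing in for SMF service properties.
    let mut props = BTreeMap::new();
    props.insert("listen_port".to_string(), "8080".to_string());
    props.insert("log_dir".to_string(), "/var/log/example".to_string());

    let dir = std::env::temp_dir().join("example-svc-config");
    let path = write_config(&dir, "example.conf", &props)?;
    print!("{}", fs::read_to_string(&path)?);
    println!(
        "mount {}",
        lofs_mount_args(&path, Path::new("/etc/example.conf")).join(" ")
    );
    Ok(())
}
```

Writing to a temporary name and renaming is a design choice, not a requirement; it only matters if something might read the file while the start method is still producing it.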

There are probably some additional classes that will make themselves known in specific instances. This is somewhat finicky to work on because some software may write to files in the root filesystem only on certain code paths that are not especially easy to find. It can be made easier by minimising the set of software that ever runs in the GZ, which has many other benefits besides.
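One way to make the hunt for stray writes a little more systematic is to check each path a piece of software writes to against the set of writable mounts. The Rust sketch below does only that comparison. The mountpoint list is an assumed example, not the real set this issue has yet to determine, and `classify` and `Backing` are hypothetical names; the list of written paths would still have to be gathered separately, for example by tracing a service in a test environment.

```rust
use std::path::Path;

/// Where a write to a given path would land.
#[derive(Debug, PartialEq, Eq)]
enum Backing<'a> {
    /// Covered by a writable mount (persistent or ephemeral), named by its mountpoint.
    Writable(&'a str),
    /// Falls through to the read-only root filesystem, so the write would fail.
    ReadOnlyRoot,
}

/// Find the most specific writable mountpoint covering `path`.
/// `Path::starts_with` compares whole components, so `/var/logs` is not
/// treated as being under `/var/log`.
fn classify<'a>(path: &Path, writable_mounts: &[&'a str]) -> Backing<'a> {
    writable_mounts
        .iter()
        .copied()
        .filter(|m| path.starts_with(m))
        .max_by_key(|m| Path::new(m).components().count())
        .map(Backing::Writable)
        .unwrap_or(Backing::ReadOnlyRoot)
}

fn main() {
    // Hypothetical set of writable mountpoints in the GZ.
    let mounts = ["/var/log", "/var/fm", "/etc/svc/volatile", "/tmp"];
    let written = [
        "/var/log/sled-agent.log",
        "/var/fm/fmd/errlog",
        "/etc/inet/hosts",
        "/var/logs/x",
    ];
    for p in written {
        println!("{}: {:?}", p, classify(Path::new(p), &mounts));
    }
}
```

Picking the longest matching mountpoint mirrors how nested mounts shadow their parents, so a path under a more specific mount is attributed to that mount.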

Additional specific tickets should probably be filed, and more tags will probably be needed as the scope of the work is better understood.

@wesolows wesolows added enhancement New feature or request. security Related to security. Sled Agent Related to the Per-Sled Configuration and Management mvp labels Oct 18, 2023
@wesolows wesolows added this to the MVP milestone Oct 18, 2023
@davepacheco
Collaborator

Thanks for filing this. RFD 418 discusses some of this, too.

While it would take some up-front investment to smoke out the software that expects a writable root filesystem, I think it would pay off. In my past experience, unintended (or forgotten) divergence from a clean state has caused or exacerbated a lot of incidents. We've seen this at Oxide too, with some of the cold boot bugs @smklein fixed (like #3018). Making the root filesystems read-only ensures that we stop introducing new bugs of that kind, rather than only discovering them much later, when the system happens to be rebooted during testing.

To be clear, a read-only root filesystem in the GZ is separate from a read-only filesystem in non-global Omicron component zones. I think both are valuable for the same reasons.
