Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add Unit.reboot for machine charms #1041

Merged
merged 31 commits into from
Oct 20, 2023

Conversation

tonyandrewmeyer
Copy link
Contributor

@tonyandrewmeyer tonyandrewmeyer commented Oct 10, 2023

Adds a new reboot method to Unit, that's a wrapper around the juju-reboot command.

juju-reboot is only available for machine charms. When it's run for a K8s charm, Juju logs an error message but doesn't return an response with the error (or have a non-zero exit), which means that we have to fail silently.

Adds a corresponding reboot() method to the harness backend.

From the point of view of the harness, we assume that everything is essentially the same after rebooting without --now, so this is a no-op. The most likely change that would impact the harness would be a leadership change, but it seems better to let the Charmer model that explicitly in their tests.

When rebooting with --now, what should happen is that everything after that point in the event handler is skipped, there's a simulated reboot (again, roughly a no-op), and then the event is re-emitted. I went down a long rabbit-hole trying to get the harness to do this, but it ended up way too messy for minimal benefit. Instead, a SystemExit exception is raised and the Charmer is expected to handle it in their tests, for example:

def test_installation(self):
    self.harness.begin()
    with self.assertRaises(SystemExit):
        self.harness.charm.on.install.emit()
        # More asserts here that the first part of installation was done.
    self.harness.charm.on.install.emit()
    # More asserts here that the installation continued appropriately.

Fixes #929

@tonyandrewmeyer tonyandrewmeyer added the feature New feature or request label Oct 10, 2023
@tonyandrewmeyer tonyandrewmeyer changed the title Expose juju-reboot [--now]. feat: add Unit.reboot for machine charms Oct 10, 2023
ops/model.py Outdated Show resolved Hide resolved
ops/model.py Outdated Show resolved Hide resolved
ops/model.py Outdated Show resolved Hide resolved
ops/testing.py Outdated Show resolved Hide resolved
ops/testing.py Outdated Show resolved Hide resolved
ops/testing.py Outdated Show resolved Hide resolved
test/test_testing.py Outdated Show resolved Hide resolved
test/test_testing.py Outdated Show resolved Hide resolved
test/test_model.py Show resolved Hide resolved
@benhoyt
Copy link
Collaborator

benhoyt commented Oct 10, 2023

Oh, looks like tox -e docs is failing right now.

Copy link
Collaborator

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks! Let's wait for @jameinel's review too.

ops/testing.py Outdated Show resolved Hide resolved
@benhoyt benhoyt requested a review from NucciTheBoss October 11, 2023 03:25
@jameinel
Copy link
Member

When it's run for a K8s charm, Juju logs an error message but doesn't return an response with the error (or have a non-zero exit), which means that we have to fail silently.

If we can tell that you're in k8s, can't we fail inside ops itself?
I do think Juju should be failing more appropriately (and I don't think there is a reason why we couldn't force the pod to terminate and restart). I don't think people have been asking for that ability, but it would be more consistent to have it do something.

@benhoyt
Copy link
Collaborator

benhoyt commented Oct 17, 2023

If we can tell that you're in k8s, can't we fail inside ops itself?

We don't have an easy way to do that right now. You can look for containers in metadata.yaml, but charms with no containers can be deployed on K8s too. Harry mentioned that JUJU_CONTAINER_NAMES will be set on K8s (it may be empty, but it'll be set).

So maybe it's time to add a helper method is_k8s_sidecar. We've needed this before too. Or maybe charm.type (though something more specific than "type") or something which is an enum, one of machine, k8s-podspec, k8s-sidecar.

@jameinel
Copy link
Member

jameinel commented Oct 17, 2023 via email

Copy link
Member

@jameinel jameinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having "if harness.last_rebooted is not None" is fine as a way to test that it actually did trigger a reboot if there was some other thing that you were looking for.

I also think we probably want some form of interrupt in python code. I've tested at least:

  echo hello
  reboot
  echo goodbye

In bash, and you will occasionally see the "goodbye" but not always. I don't know whether juju-reboot --now guarantees you will/won't get your process back.
But likely charm authors will assume that if they pass now=True they won't execute any more code.

test/test_testing.py Outdated Show resolved Hide resolved
@tonyandrewmeyer
Copy link
Contributor Author

I think just having reboot do something on k8s is also fine, and we wouldn't want to work around the workaround in the future.

If Juju adds support for doing something to k8s when juju-reboot is run, and we just have the note in the docs, then updating your Juju version but without ops changing reboot will just start working.

If we have something like:

if container_type == "machine":
    raise ModelError("reboot() is only available when running on machine controllers")

Then you'd need to update ops as well (to the version where we presumably strip that out). That would be less surprising, which seems good.

So maybe it's time to add a helper method is_k8s_sidecar. We've needed this before too. Or maybe charm.type (though something more specific than "type") or something which is an enum, one of machine, k8s-podspec, k8s-sidecar.

charm.platform? (To match charmhub.io's Platform filter).

@tonyandrewmeyer
Copy link
Contributor Author

I don't know whether juju-reboot --now guarantees you will/won't get your process back. But likely charm authors will assume that if they pass now=True they won't execute any more code.

In my tests, I couldn't get any (Python) code to run after the call to reboot (which was done with subprocess.run as in the implementation, so Python would be waiting for the subprocess to exit). I guess we could sys.exit(0) in the now=True case - the behaviour would essentially be the same as now, but provide more of a guarantee that you didn't get stray lines of code executed.

Copy link
Collaborator

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like it! Approved, with one nit comment on the docstring.

ops/testing.py Outdated Show resolved Hide resolved
@jameinel
Copy link
Member

FWIW, I raised the question more so that we wouldn't have a flaky test suite, than because I was worried about the reboot time not being precise enough.
But I'm also happy to go with reboot count instead, as the actual datetime isn't something you could really test.

@benhoyt benhoyt merged commit 7904739 into canonical:main Oct 20, 2023
19 checks passed
@tonyandrewmeyer tonyandrewmeyer deleted the juju-reboot-929 branch October 20, 2023 04:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No interface to juju-reboot hook tool in ops
3 participants