docs: go full Diátaxis, ingesting the relevant juju.is/docs/sdk documentation (#1481)

[**Preview on
ReadTheDocs**](https://ops--1481.org.readthedocs.build/en/1481/)

This PR adds all the juju.is/docs content featuring Ops.

The content was originally formatted for Discourse; this PR reformats it
to serve the RTD project the docs are destined for.

The PR also retitles all how-to guides as "Manage x", often with
sections of the form "Implement the feature" and "Test the feature".
This extends a pattern we already had in juju docs, which will continue
in the juju RTD docs, and which we've also implemented in the new
charmcraft RTD docs. Whereas in the past we tended to associate "manage"
with a juju admin, here it's used as a catch-all for "how to do all the
things that it is possible to do about x in juju/charmcraft/ops". I
believe the pattern is no clumsier than what we had before and having
the same terminology across juju/charmcraft/ops, along with
cross-referencing, will make it easier for people to piece together the
story. (E.g., in Juju, "manage relations" will include `juju integrate`,
while in charmcraft it'll mean declaring a particular endpoint and in
ops it'll mean reacting to certain events and using or implementing an
interface.)

Finally, the PR trims down the Reference and Explanation docs to just
content directly relevant to ops. E.g., the charm taxonomy, events /
lifecycle bits, charm maturity, charm best practices are all things that
belong at a higher level of abstraction, directly in juju docs. E.g., in
the charm best practices doc the first best practice we speak about is
using ops. (A lot of that content should also be a lot leaner -- e.g.,
I'm working to incorporate all the charm SDK event content into the juju
doc on hooks, and the charm maturity and charm best practices could
really use a complete rethink as well, as they are way too verbose,
not always understandable, and often duplicate content that's better
expressed at a more specific level.)

Some things that I think should be addressed in subsequent PRs:

- Everywhere: (1) The docs reference juju and charmcraft pages. As those
pages are not yet live, the links are empty. Once the pages are live,
which should be very soon, the links should be updated. Note: This does
create a temporary crisis in the ops RTD docs which, unlike the juju and
the charmcraft RTD docs, are actually already live. I believe that's
acceptable, especially since (a) at present people still only look at
the ops RTD docs for reference and (b) the crisis will be resolved very
soon. (2) The testing docs, from the tutorial through the how-to guides
and all the way to reference, should be thoroughly revisited to reflect
(a) the fact that scenario is now the de facto unit testing tool and
that harness is legacy and (b) the fact that most of our "how to manage
x" docs have sections on testing; in short, I anticipate that a lot of
that stuff will end up being removed altogether, the result being much
leaner and more relevant docs.
- Tutorial: The Kubernetes tutorial should use `charmcraft init`
(instead of creating all the files by hand) and should only focus on
features that are aided by ops (i.e., the publish section should be made
part of the planned charmcraft tutorial featuring a regular,
non-12-factor app, charm). Also, it's a good idea to make it quite a bit
shorter. (When we first created it, it was trying to make up for the
fact that our how-to guides were not very helpful. At present, however,
I believe it can be much leaner.)
- How-to guides: The "Manage x" formula, especially as extended just now
in the new charmcraft RTD docs, where it includes a "Manage charms",
suggests we should perhaps have a "Manage charms" how-to in the ops docs
too. If in Juju managing charms means, e.g., `juju find`, `juju
download`, `juju deploy`, and `juju refresh`, and if in charmcraft it
means `charmcraft init`, ... `charmcraft pack`, `charmcraft upload`,
etc., in ops it might be the place where we give the recipe for the
general charm logic from the point of view of ops -- import ops,
logging, creating a charm class that inherits from CharmBase, the init
method, etc. PS I anticipate some of the content from "Run workloads
with..." and the logs how-to might find their home here.

---------

Co-authored-by: Ben Hoyt <[email protected]>
Co-authored-by: Tony Meyer <[email protected]>
Co-authored-by: Dave Wilding <[email protected]>
Co-authored-by: Tony Meyer <[email protected]>
5 people authored Dec 18, 2024
1 parent eb80926 commit d2508f8
Showing 58 changed files with 9,523 additions and 77 deletions.
44 changes: 44 additions & 0 deletions docs/explanation/charm-relation-interfaces.md
@@ -0,0 +1,44 @@
(charm-relation-interfaces)=
# Charm-relation-interfaces

> See also: {ref}`manage-interfaces`

[`charm-relation-interfaces`](https://github.com/canonical/charm-relation-interfaces) is a repository containing specifications, databag schemas and interface tests for Juju relation interfaces. In other words, it is the source of truth for data and behavior of providers and requirers of integrations.

The purpose of this project is to provide uniformity in the landscape of all possible integrations and promote charm interoperability.

Juju interfaces are untyped, which means that for juju to think two charms can be integrated all it looks at is whether the interface names of the two endpoints you're trying to connect are the same string. But it might be that the two charms have different, incompatible implementations of two different integrations that happen to have the same name.
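
To make the problem concrete, here is a minimal sketch of name-only matching. It does not use any Juju API: the `Endpoint` class, the `can_integrate` function, and the databag keys are all hypothetical and exist only to illustrate the point.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Hypothetical stand-in for one side of an integration."""

    charm: str
    interface: str  # the only thing that is compared when integrating
    databag_keys: set = field(default_factory=set)


def can_integrate(a: Endpoint, b: Endpoint) -> bool:
    # Name-only matching: the databag contents are never inspected.
    return a.interface == b.interface


provider = Endpoint('charm-a', 'db', {'host', 'port'})
requirer = Endpoint('charm-b', 'db', {'connection-string'})

# The interface names match, so the integration is allowed...
print(can_integrate(provider, requirer))
# ...even though the two sides have no databag keys in common.
print(provider.databag_keys & requirer.databag_keys)
```

A shared specification closes this gap by pinning down what an interface name means, not just how it is spelled.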

In order to prevent two separate charms from rolling their own integration with the same name, and prevent a sprawl of many subtly different interfaces with similar semantics and similar purposes, we introduced `charm-relation-interfaces`.

## Using `charm-relation-interfaces`

If you have a charm that provides a service, you should search `charm-relation-interfaces` (or, in the future, Charmhub directly) to see whether an interface for that service already exists, or whether a similar one exists that lacks the semantics you need and can be extended to support them.

Conversely, if the charm you are developing needs some service (a database, an ingress URL, an authentication endpoint...) you should search `charm-relation-interfaces` to see if there is an interface you can use, and to find existing charms that provide it.

There are three actors in play:

* **the owner of the specification** of the interface, which also owns the tests that can be used to verify "does charm X 'really' support this interface?". This is the `charm-relation-interfaces` repo.
* **the owner of the implementation** of an interface. In practice, this often is the charm that owns the charm library with the reference implementation for an interface.
* **the interface user**: a charm that wants to use the interface (either as requirer or as provider).

The interface user needs the implementation (typically, the provider also happens to be the owner and so it already has the implementation). This is addressed by `charmcraft fetch-lib`.

The owner of the implementation needs the specification, to help check that the implementation is in fact compliant.

## Repository structure

For each interface, the charm-relation-interfaces repository hosts:
- the **specification**: a semi-formal definition of what the semantics of the interface is, and what its implementations are expected to do in terms of both the provider and the requirer
- a list of **reference charms**: the charms that implement this interface; typically, this is the owner of the charm library that provides the original implementation.
- the **schema**: pydantic models unambiguously defining the accepted unit and application databag contents for provider and requirer.
- the **interface tests**: python tests that can be run to verify that a charm complies with the interface specification.
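
As a rough illustration of what a databag schema checks, the sketch below validates an application databag using only the standard library. The repository itself uses pydantic models; the field names, the `validate_databag` helper, and the assumption that values are JSON-encoded strings are all hypothetical here.

```python
import json

# Hypothetical field names and types for an imaginary interface.
APP_DATABAG_SCHEMA = {'name': str, 'port': int}


def validate_databag(databag: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the databag is valid.

    Values are assumed to be JSON-encoded strings.
    """
    problems = []
    for key, expected_type in schema.items():
        if key not in databag:
            problems.append(f'missing key: {key}')
            continue
        try:
            value = json.loads(databag[key])
        except json.JSONDecodeError:
            problems.append(f'{key}: not valid JSON')
            continue
        if not isinstance(value, expected_type):
            problems.append(f'{key}: expected {expected_type.__name__}')
    return problems


print(validate_databag({'name': '"remote"', 'port': '42'}, APP_DATABAG_SCHEMA))
print(validate_databag({'name': '"remote"', 'port': '"42"'}, APP_DATABAG_SCHEMA))
```

A real pydantic model would express the same constraints declaratively and give richer error messages; the point here is only that the schema makes "valid databag" a checkable property.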


## Charm relation interfaces in Charmhub
In the future, Charmhub will have a searchable collection of integration interfaces.
Charmhub will, for all charms using the interface, verify that they implement it correctly (regardless of whether they use the 'official' implementation or they roll their own) in order to give the charm a happy checkmark on `charmhub.io`. In order to do that it will need to fetch the specification (from `charm-relation-interfaces`) *and* the charm repo, because we can't know what implementation they are using: we need the source code.



66 changes: 66 additions & 0 deletions docs/explanation/holistic-vs-delta-charms.md
@@ -0,0 +1,66 @@
(holistic-vs-delta-charms)=
# Holistic vs delta charms


Charm developers have had many discussions about "holistic" charms compared to "delta" charms, and which approach is better. First, let's define those terms:

* A *delta-based* charm handles each kind of Juju hook with a separate handler function, which does the minimum necessary to process that kind of event.
* A *holistic* charm handles some or all Juju hooks using a common code path such as `_update_charm`, which queries the charm config and relation data and "rewrites the world", that is, rewrites application configuration and restarts necessary services.

Juju itself nudges charm authors in the direction of delta-based charms, because it provides specific event kinds that signal that one "thing" changed: `config-changed` says that a config value changed, `relation-changed` says that relation data has changed, `pebble-ready` signals that the Pebble container is ready, and so on.

However, this only goes so far: `config-changed` doesn't tell the charm which config keys changed, and `relation-changed` doesn't tell the charm how the relation data changed.
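
A charm that wants a true delta therefore has to compute it itself, for example by comparing the current config with a copy it saved earlier. The sketch below shows only the comparison step; `changed_keys` is a hypothetical helper, and where the previous copy is kept is deliberately left out.

```python
def changed_keys(previous: dict, current: dict) -> set:
    """Return the keys that were added, removed, or given a new value."""
    missing = object()  # sentinel so that a stored None is not mistaken for "absent"
    return {
        key
        for key in previous.keys() | current.keys()
        if previous.get(key, missing) != current.get(key, missing)
    }


previous = {'redis-port': 6379, 'log-level': 'info'}
current = {'redis-port': 6380, 'log-level': 'info', 'tls': True}
print(sorted(changed_keys(previous, current)))
```

Keeping that saved copy accurate across hooks is extra state to maintain, which is part of why the delta approach only goes so far.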

In addition, the charm may receive an event like `config-changed` before it's ready to handle it, for example, if the container is not yet ready (`pebble-ready` has not yet been triggered). In such cases, charms could try to wait for both events to occur, possibly storing state to track which events have occurred -- but that is error-prone.

Alternatively, a charm can use a holistic approach and handle both `config-changed` and `pebble-ready` with a single code path, as in this example:

```python
import ops


class MyCharm(ops.CharmBase):
    def __init__(self, framework: ops.Framework):
        super().__init__(framework)
        framework.observe(self.on.config_changed, self._update_charm)
        framework.observe(self.on['redis'].pebble_ready, self._update_charm)

    def _update_charm(self, _: ops.EventBase):  # event parameter isn't used
        redis_port = self.config.get('redis-port')
        if not redis_port:
            # pebble-ready happened first, wait for config-changed
            return

        # If both the Pebble container and config are ready, rewrite the
        # container's config file and restart Redis if needed.
        container = self.unit.get_container('redis')
        try:
            # _update_redis_config is assumed to be defined elsewhere in the charm.
            self._update_redis_config(container, redis_port)
        except ops.pebble.ConnectionError:
            # config-changed happened first, wait for pebble-ready
            return
```


## When to use the holistic approach

If a charm is waiting for a collection of events, as in the example above, it makes sense to group those events together and handle them holistically, with a single code path.

In other words, when writing a charm, it's not so much "should the *charm* be holistic?" as "does it make sense for *these events* to be handled holistically?"

Using the holistic approach is normally centred around configuring an application. Various events that affect configuration use a common handler, to simplify writing an application config file and restarting the application. This is common for events like `config-changed`, `relation-changed`, `secret-changed`, and `pebble-ready`.

Many existing charms use holistic event handling. A few examples are:

- [`alertmanager-k8s` uses a `_common_exit_hook` method to unify several event handlers](https://github.com/canonical/alertmanager-k8s-operator/blob/561f1d8eb1dc6e4511c1c0b3cba444a3ec399464/src/charm.py#L390)
- [`hello-kubecon` is a simple charm that handles `config-changed` and `pebble-ready` holistically](https://github.com/jnsgruk/hello-kubecon/blob/dbd133466dde59ee64f20a732a8f3d2e560ec3b8/src/charm.py#L32-L33)
- [`prometheus-k8s` uses a common `_configure` method to handle various events](https://github.com/canonical/prometheus-k8s-operator/blob/84c6a406ed585cdb7ba40e01a258864987d6f67f/src/charm.py#L221-L230)
- [`sdcore-gnbsim-k8s` also uses a common `_configure` method](https://github.com/canonical/sdcore-gnbsim-k8s-operator/blob/ea2afe069346757b1eb6c02de5b4f50f90e81698/src/charm.py#L84-L92)


## Which events can be handled holistically?

Only some events make sense to handle holistically. For example, `remove` is triggered when a unit is about to be terminated, so it doesn't make sense to handle it holistically.

Similarly, events like `secret-expired` and `secret-rotate` don't make sense to handle holistically, because the charm must do something specific in response to the event. For example, Juju will keep triggering `secret-expired` until the charm creates a new secret revision by calling [`event.secret.set_content()`](https://ops.readthedocs.io/en/latest/#ops.Secret.set_content).

This is very closely related to [which events can be `defer`red](https://juju.is/docs/sdk/how-and-when-to-defer-events). A good rule of thumb is this: if an event can be deferred, it may make sense to handle it holistically.

On the other hand, if an event cannot be deferred, the charm cannot handle it holistically. This applies to action "events", `stop`, `remove`, `secret-expired`, `secret-rotate`, and Ops-emitted events such as `collect-status`.
48 changes: 48 additions & 0 deletions docs/explanation/how-and-when-to-defer-events.md
@@ -0,0 +1,48 @@
(how-and-when-to-defer-events)=
# How, and when, to defer events

Deferring an event is a common pattern, and when used appropriately is a convenient tool for charmers. However, there are limitations to `defer()` - in particular, that the charm has no way to specify when the handler will be re-run, and that event ordering and context move away from the expected pattern. Our advice is that `defer()` is a good solution for some problems, but is best avoided for others.

## Good: retrying on temporary failure

If the charm encounters a temporary failure (such as working with a container or an external API), and expects that the failure may be very short lived, our recommendation is to retry several times for up to a second. If the failure continues, but the charm still expects that it will be resolved without any intervention from a human, then deferring the event handler is often a good choice - along with placing the unit or app in waiting status.
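
The "retry briefly, then defer" advice can be sketched as a small helper. This is an illustration under stated assumptions, not part of Ops: `retry_briefly` is a hypothetical name, and the built-in `ConnectionError` stands in for whatever exception the real operation raises (with Pebble, that would likely be `ops.pebble.ConnectionError`).

```python
import time
from typing import Callable


def retry_briefly(
    operation: Callable[[], None],
    timeout: float = 1.0,
    interval: float = 0.1,
    exceptions: tuple = (ConnectionError,),
) -> bool:
    """Call operation until it succeeds or roughly `timeout` seconds pass.

    Returns True on success and False if every attempt failed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            operation()
            return True
        except exceptions:
            # Give up if another sleep would take us past the deadline.
            if time.monotonic() + interval > deadline:
                return False
            time.sleep(interval)


# Inside an event handler, the result decides whether to defer
# (self._push_config is a hypothetical method of the charm):
#
#     if not retry_briefly(lambda: self._push_config(container)):
#         self.unit.status = ops.WaitingStatus('waiting for workload')
#         event.defer()
#         return
```

Bounding the retries by elapsed time rather than by attempt count keeps the hook from blocking for long when each attempt is itself slow.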

Note that it’s important to consider that when the deferred handler is run again, the Juju context may not be exactly the same as it was when the event was first emitted, so the charm code needs to be aware of this.

If the temporary failure is because the workload is busy, and the charm is deployed to a Kubernetes sidecar controller, you might be able to avoid the defer using a [Pebble custom notice](https://juju.is/docs/sdk/interact-with-pebble#heading--use-custom-notices-from-the-workload-container). For example, if the code can’t continue because the workload is currently restarting, and you can add a post-completion hook for the restart that executes `pebble notify`, then you can ensure that the charm is ‘woken up’ at the right time to handle the work.

In the future, we hope to see a Juju ‘request re-emit event’ feature that will let the charm tell Juju when it expects the problem to be resolved.

## Reconsider: sequencing

There are some situations where sequencing of units needs to be arranged - for example, to restart replicas before a primary is restarted. Deferring a handler can be used to manage this situation. However, sequencing can also be arranged using a peer relation, and there’s a convenient [rolling-ops charm lib](https://github.com/canonical/charm-rolling-ops) that implements this for you, and we recommend using that approach first.

Using a peer relation to orchestrate the rolling operation allows for more fine-grained control than a simple defer, and avoids the issue of not having control over when the deferred handler will be re-run.

## Reconsider: waiting for a collection of events

It’s common for charms to need a collection of information in order to configure the application (for example, to write a configuration file). For example, the configuration might require a user-set config value, a secret provided by a relation, and a Kubernetes sidecar container to be ready.

Rather than having the handlers for each of these events (`config-changed`, `secret-changed` and/or `relation-changed`, `pebble-ready`) defer if other parts of the configuration are not yet available, it’s best to have the charm observe all three events and set the unit or app state to waiting, maintenance, or blocked status (or have the `collect-status` handler do this) and return. When the last piece of information is available, the handler that notifies the charm of that will complete the work. This is commonly called the "holistic" event handling pattern.

Avoiding defer means that there isn’t a queue of deferred handlers that all do the same work - for example, if `config-changed`, `relation-changed`, and `pebble-ready` were all deferred then when they were all ready, they would all run successfully, each repeating the same work. This is particularly important when the work is expensive - such as an application restart after writing the configuration - and so should not be done unnecessarily.
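
The flow described in this section can be sketched without any Ops machinery. In the illustration below, `Inputs` and `configure` are hypothetical names, and each assignment stands in for one event arriving; in a real charm the returned strings would be status objects and the final branch would write the config file and restart the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Inputs:
    """Hypothetical snapshot of what the charm can currently see."""

    config_value: Optional[str] = None
    secret: Optional[str] = None
    container_ready: bool = False


def configure(inputs: Inputs) -> str:
    """Common handler body: report what is missing, or do the work."""
    if inputs.config_value is None:
        return 'blocked: config value not set'
    if inputs.secret is None:
        return 'waiting: secret not yet provided'
    if not inputs.container_ready:
        return 'waiting: container not ready'
    # Everything is available: write the config and restart (omitted here).
    return 'active'


inputs = Inputs()
statuses = []
inputs.container_ready = True    # stands in for pebble-ready
statuses.append(configure(inputs))
inputs.config_value = 'example'  # stands in for config-changed
statuses.append(configure(inputs))
inputs.secret = 's3cret'         # stands in for secret-changed
statuses.append(configure(inputs))
print(statuses)
```

Whatever order the events arrive in, only the call that sees the last missing piece reaches the expensive branch, so the work runs once and nothing is left in a defer queue.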

## OK: waiting without expecting a follow-up event

In some situations, the charm is waiting for a system to be ready, but it’s not one that will trigger a Juju event (as in the case above). For example, the charm might need the workload application to be fully started up, and that might happen after all of the initial start, `config-changed`, `relation-joined`, `pebble-ready`, etc events.

Deferring the work here is ok, but it’s important to consider the delay between deferring the event and its eventual re-emitting - it’s not safe to assume that this will be a small period of time, unless you know that another event can be expected.

For a Kubernetes charm, if the charm is waiting on the workload and it’s possible to have the workload execute a command when it’s ready, then using a [Pebble custom notice](https://juju.is/docs/sdk/interact-with-pebble#heading--use-custom-notices-from-the-workload-container) is much better than deferring. This then becomes another example of “waiting for a collection of events”, described above.

## Not possible: actions, shutting down, framework generated events, secrets

In some situations, it’s not possible to defer an event, and attempting to do so will raise a `RuntimeError`.

In some cases, this is because the events are run with every Juju hook event, such as `pre-commit`, `commit`, and `collect-status`. In others, it’s because Juju provides a built-in retry mechanism, such as `secret-expired` and `secret-rotate`.

With actions, there’s an expectation that the action either succeeds or fails immediately, and there are mechanisms for communicating directly with the user that initiated the action (`event.log` and `event.set_results`). This means that deferring an action event doesn’t make sense.

Finally, when doing cleanup during the shutdown phase of a charm’s lifecycle, deferring isn’t practical with the current implementation, where it’s tied to future events. For `remove`, for example, the unit will no longer exist after the event, so there will not be any future events that can trigger the deferred one - if there’s work that has to be done before the unit is gone, then you’ll need to enter an error state instead. The stop event is followed by remove, and possibly a few other events, but likewise has few chances to be re-emitted.

Note that all deferred events vanish when the unit is removed, so the charm code needs to take this into consideration.
14 changes: 14 additions & 0 deletions docs/explanation/index.md
@@ -0,0 +1,14 @@
(explanation)=
# Explanation

```{toctree}
:maxdepth: 1
charm-relation-interfaces
testing
interface-tests
holistic-vs-delta-charms
how-and-when-to-defer-events
storedstate-uses-limitations
```

39 changes: 39 additions & 0 deletions docs/explanation/interface-tests.md
@@ -0,0 +1,39 @@
(interface-tests)=
# Interface tests

> See also: {ref}`manage-interfaces`

Interface tests are tests that verify the compliance of a charm with an interface specification.
Interface specifications, stored in {ref}`charm-relation-interfaces <charm-relation-interfaces>`, are contract definitions that mandate how a charm should behave when integrated with another charm over a registered interface.

Interface tests will allow `charmhub` to validate the integrations of a charm and verify that your charm indeed supports "the" `ingress` interface and not just an interface called "ingress", which happens to be the same name as "the official `ingress` interface v2" as registered in charm-relation-interfaces (see [here](https://github.com/canonical/charm-relation-interfaces/tree/main/interfaces/ingress/v2)).

Also, they allow alternative implementations of an interface to validate themselves against the contractual specification stored in charm-relation-interfaces, and they help verify compliance with multiple versions of an interface.

An interface test is a contract test powered by {ref}`Scenario <scenario>` and a pytest plugin called [`pytest-interface-tester`](https://github.com/canonical/pytest-interface-tester). An interface test has the following pattern:
1) **GIVEN** an initial state of the relation over the interface under test
2) **WHEN** a specific relation event fires
3) **THEN** the state of the databags is valid (e.g. it satisfies an expected pydantic schema)

On top of databag state validity, one can check for more elaborate conditions.

A typical interface test will look like:

```python
from interface_tester import Tester
from scenario import Relation, State


def test_data_published_on_changed_remote_valid():
    """This test verifies that if the remote end has published valid data and we receive a db-relation-changed event, then the schema is satisfied."""
    # GIVEN that we have a relation over "db" and the remote end has published valid data
    relation = Relation(
        endpoint='db',
        interface='db',
        remote_app_data={'model': '"bar"', 'port': '42', 'name': '"remote"'},
        remote_units_data={0: {'host': '"0.0.0.42"'}},
    )
    t = Tester(State(relations=[relation]))
    # WHEN the charm receives a db-relation-changed event
    state_out = t.run(relation.changed_event)
    # THEN the schema is valid
    t.assert_schema_valid()
```

This allows us to, independently from what charm we are testing, determine if the behavioural specification of this interface is complied with.

