canonical · tonyandrewmeyer · Dec 18, 2024 · Nov 29, 2024 · Dec 3, 2024 · Dec 3, 2024
diff --git a/docs/explanation/charm-relation-interfaces.md b/docs/explanation/charm-relation-interfaces.md
@@ -0,0 +1,44 @@
+(charm-relation-interfaces)=
+# Charm-relation-interfaces
+
+> See also: {ref}`manage-interfaces`
+
+[`charm-relation-interfaces`](https://github.com/canonical/charm-relation-interfaces) is a repository containing specifications, databag schemas and interface tests for Juju relation interfaces. In other words, it is the source of truth for data and behavior of providers and requirers of integrations.
+
+The purpose of this project is to provide uniformity in the landscape of all possible integrations and promote charm interoperability.
+
+Juju interfaces are untyped: to decide whether two charms can be integrated, Juju only checks that the interface names of the two endpoints you're trying to connect are the same string. However, the two charms might have different, incompatible implementations of two different integrations that happen to share a name.
+
+In order to prevent two separate charms from rolling their own integration with the same name, and prevent a sprawl of many subtly different interfaces with similar semantics and similar purposes, we introduced `charm-relation-interfaces`.
+
+## Using `charm-relation-interfaces`
+
+If you have a charm that provides a service, you should search `charm-relation-interfaces` (or, in the future, Charmhub directly) to see whether a suitable interface already exists, or whether a similar one exists that lacks the semantics you need and can be extended to support them.
+
+Conversely, if the charm you are developing needs some service (a database, an ingress URL, an authentication endpoint...), you should search `charm-relation-interfaces` to see if there is an interface you can use, and to find existing charms that provide it.
+
+There are three actors in play:
+
+* **the owner of the specification** of the interface, which also owns the tests that can be used to verify "does charm X 'really' support this interface?". This is the `charm-relation-interfaces` repo.
+* **the owner of the implementation** of an interface. In practice, this often is the charm that owns the charm library with the reference implementation for an interface.
+* **the interface user**: a charm that wants to use the interface (either as requirer or as provider).
+
+The interface user needs the implementation (typically, the provider also happens to be the owner and so it already has the implementation). This is addressed by `charmcraft fetch-lib`.
+
+The owner of the implementation needs the specification, to help check that the implementation is in fact compliant.
+
+## Repository structure
+
+For each interface, the charm-relation-interfaces repository hosts:
+- the **specification**: a semi-formal definition of the semantics of the interface, and of what its implementations are expected to do on both the provider and the requirer side.
+- a list of **reference charms**: the charms that implement this interface, typically the charm that owns the charm library providing the original implementation.
+- the **schema**: Pydantic models unambiguously defining the accepted unit and application databag contents for provider and requirer.
+- the **interface tests**: Python tests that can be run to verify that a charm complies with the interface specification.
+
+
+## Charm relation interfaces in Charmhub
+In the future, Charmhub will have a searchable collection of integration interfaces. 
+Charmhub will, for all charms using the interface, verify that they implement it correctly (regardless of whether they use the 'official' implementation or they roll their own) in order to give the charm a happy checkmark on `charmhub.io`. In order to do that it will need to fetch the specification (from `charm-relation-interfaces`) *and* the charm repo, because we can't know what implementation they are using: we need the source code.
+
+
+
diff --git a/docs/explanation/holistic-vs-delta-charms.md b/docs/explanation/holistic-vs-delta-charms.md
@@ -0,0 +1,66 @@
+(holistic-vs-delta-charms)=
+# Holistic vs delta charms
+
+
+Charm developers have had many discussions about "holistic" charms compared to "delta" charms, and which approach is better. First, let's define those terms:
+
+* A *delta-based* charm handles each kind of Juju hook with a separate handler function, which does the minimum necessary to process that kind of event.
+* A *holistic* charm handles some or all Juju hooks using a common code path such as `_update_charm`, which queries the charm config and relation data and "rewrites the world", that is, rewrites application configuration and restarts necessary services.
+
+Juju itself nudges charm authors in the direction of delta-based charms, because it provides specific event kinds that signal that one "thing" changed: `config-changed` says that a config value changed, `relation-changed` says that relation data has changed, `pebble-ready` signals that the Pebble container is ready, and so on.
+
+However, this only goes so far: `config-changed` doesn't tell the charm which config keys changed, and `relation-changed` doesn't tell the charm how the relation data changed.
+
+In addition, the charm may receive an event like `config-changed` before it's ready to handle it, for example, if the container is not yet ready (`pebble-ready` has not yet been triggered). In such cases, charms could try to wait for both events to occur, possibly storing state to track which events have occurred -- but that is error-prone.
+
+Alternatively, a charm can use a holistic approach and handle both `config-changed` and `pebble-ready` with a single code path, as in this example:
+
+```python
+class MyCharm(ops.CharmBase):
+    def __init__(self, framework: ops.Framework):
+        super().__init__(framework)
+        framework.observe(self.on.config_changed, self._update_charm)
+        framework.observe(self.on['redis'].pebble_ready, self._update_charm)
+
+    def _update_charm(self, _: ops.EventBase):  # event parameter isn't used
+        redis_port = self.config.get('redis-port')
+        if not redis_port:
+            # pebble-ready happened first, wait for config-changed
+            return
+
+        # If both the Pebble container and config are ready, rewrite the
+        # container's config file and restart Redis if needed.
+        container = self.unit.get_container('redis')
+        try:
+            self._update_redis_config(container, redis_port)
+        except ops.pebble.ConnectionError:
+            # config-changed happened first, wait for pebble-ready
+            return
+```
+
+
+## When to use the holistic approach
+
+If a charm is waiting for a collection of events, as in the example above, it makes sense to group those events together and handle them holistically, with a single code path.
+
+In other words, when writing a charm, it's not so much "should the *charm* be holistic?" as "does it make sense for *these events* to be handled holistically?"
+
+Using the holistic approach is normally centred around configuring an application. Various events that affect configuration use a common handler, to simplify writing an application config file and restarting the application.  This is common for events like `config-changed`, `relation-changed`, `secret-changed`, and `pebble-ready`.
+
+Many existing charms use holistic event handling. A few examples are:
+
+- [`alertmanager-k8s` uses a `_common_exit_hook` method to unify several event handlers](https://github.com/canonical/alertmanager-k8s-operator/blob/561f1d8eb1dc6e4511c1c0b3cba444a3ec399464/src/charm.py#L390)
+- [`hello-kubecon` is a simple charm that handles `config-changed` and `pebble-ready` holistically](https://github.com/jnsgruk/hello-kubecon/blob/dbd133466dde59ee64f20a732a8f3d2e560ec3b8/src/charm.py#L32-L33)
+- [`prometheus-k8s` uses a common `_configure` method to handle various events](https://github.com/canonical/prometheus-k8s-operator/blob/84c6a406ed585cdb7ba40e01a258864987d6f67f/src/charm.py#L221-L230)
+- [`sdcore-gnbsim-k8s` also uses a common `_configure` method](https://github.com/canonical/sdcore-gnbsim-k8s-operator/blob/ea2afe069346757b1eb6c02de5b4f50f90e81698/src/charm.py#L84-L92)
+
+
+## Which events can be handled holistically?
+
+Only some events make sense to handle holistically. For example, `remove` is triggered when a unit is about to be terminated, so it doesn't make sense to handle it holistically.
+
+Similarly, events like `secret-expired` and `secret-rotate` don't make sense to handle holistically, because the charm must do something specific in response to the event. For example, Juju will keep triggering `secret-expired` until the charm creates a new secret revision by calling [`event.secret.set_content()`](https://ops.readthedocs.io/en/latest/#ops.Secret.set_content).
+
+This is very closely related to [which events can be `defer`red](https://juju.is/docs/sdk/how-and-when-to-defer-events). A good rule of thumb is this: if an event can be deferred, it may make sense to handle it holistically.
+
+On the other hand, if an event cannot be deferred, the charm cannot handle it holistically. This applies to action "events", `stop`, `remove`, `secret-expired`, `secret-rotate`, and Ops-emitted events such as `collect-status`.
diff --git a/docs/explanation/how-and-when-to-defer-events.md b/docs/explanation/how-and-when-to-defer-events.md
@@ -0,0 +1,48 @@
+(how-and-when-to-defer-events)=
+# How, and when, to defer events
+
+Deferring an event is a common pattern, and when used appropriately is a convenient tool for charmers. However, there are limitations to `defer()` - in particular, that the charm has no way to specify when the handler will be re-run, and that event ordering and context move away from the expected pattern. Our advice is that `defer()` is a good solution for some problems, but is best avoided for others.
+
+## Good: retrying on temporary failure
+
+If the charm encounters a temporary failure (for example, when working with a container or an external API), and expects that the failure may be very short lived, our recommendation is to retry several times for up to a second. If the failure continues, but the charm still expects that it will be resolved without any intervention from a human, then deferring the event handler is often a good choice - along with placing the unit or app in waiting status.
+
+Note that when the deferred handler is run again, the Juju context may not be exactly the same as it was when the event was first emitted, so the charm code needs to account for this.
+
+If the temporary failure is because the workload is busy, and the charm is deployed to a Kubernetes sidecar controller, you might be able to avoid the defer using a [Pebble custom notice](https://juju.is/docs/sdk/interact-with-pebble#heading--use-custom-notices-from-the-workload-container). For example, if the code can’t continue because the workload is currently restarting, and you can add a post-completion hook for the restart that executes `pebble notify`, then you can ensure that the charm is ‘woken up’ at the right time to handle the work.
+
+In the future, we hope to see a Juju ‘request re-emit event’ feature that will let the charm tell Juju when it expects the problem to be resolved.
+
+## Reconsider: sequencing
+
+There are some situations where sequencing of units needs to be arranged - for example, to restart replicas before a primary is restarted. Deferring a handler can be used to manage this situation. However, sequencing can also be arranged using a peer relation, and there’s a convenient [rolling-ops charm lib](https://github.com/canonical/charm-rolling-ops) that implements this for you, and we recommend using that approach first.
+
+Using a peer relation to orchestrate the rolling operation allows for more fine-grained control than a simple defer, and avoids the issue of not having control over when the deferred handler will be re-run.
+
+## Reconsider: waiting for a collection of events
+
+It’s common for charms to need a collection of information in order to configure the application (for example, to write a configuration file). For example, the configuration might require a user-set config value, a secret provided by a relation, and a Kubernetes sidecar container to be ready.
+
+Rather than having the handlers for each of these events (`config-changed`, `secret-changed` and/or `relation-changed`, `pebble-ready`) defer if other parts of the configuration are not yet available, it’s best to have the charm observe all of these events and set the unit or app to waiting, maintenance, or blocked status (or have the `collect-status` handler do this) and return. When the last piece of information is available, the handler that notifies the charm of that will complete the work. This is commonly called the "holistic" event handling pattern.
+
+Avoiding defer means that there isn’t a queue of deferred handlers that all do the same work - for example, if `config-changed`, `relation-changed`, and `pebble-ready` were all deferred, then once everything was ready they would all run successfully, each repeating the same work. This is particularly important when the work is expensive, such as an application restart after writing the configuration, which should not be done unnecessarily.
+
+## OK: waiting without expecting a follow-up event
+
+In some situations, the charm is waiting for a system to be ready but, unlike in the case above, that system will not trigger a Juju event. For example, the charm might need the workload application to be fully started up, and that might happen after all of the initial `start`, `config-changed`, `relation-joined`, `pebble-ready`, and similar events.
+
+Deferring the work here is ok, but it’s important to consider the delay between deferring the event and its eventual re-emitting - it’s not safe to assume that this will be a small period of time, unless you know that another event can be expected.
+
+For a Kubernetes charm, if the charm is waiting on the workload and it’s possible to have the workload execute a command when it’s ready, then using a [Pebble custom notice](https://juju.is/docs/sdk/interact-with-pebble#heading--use-custom-notices-from-the-workload-container) is much better than deferring. This then becomes another example of “waiting for a collection of events”, described above.
+
+## Not possible: actions, shutting down, framework generated events, secrets
+
+In some situations, it’s not possible to defer an event, and attempting to do so will raise a `RuntimeError`.
+
+In some cases, this is because the events are run with every Juju hook event, such as `pre-commit`, `commit`, and `collect-status`. In others, it’s because Juju provides a built-in retry mechanism, such as `secret-expired` and `secret-rotate`.
+
+With actions, there’s an expectation that the action either succeeds or fails immediately, and there are mechanisms for communicating directly with the user that initiated the action (`event.log` and `event.set_results`). This means that deferring an action event doesn’t make sense.
+
+Finally, when doing cleanup during the shutdown phase of a charm’s lifecycle, deferring isn’t practical with the current implementation, where it’s tied to future events. For `remove`, for example, the unit will no longer exist after the event, so there will not be any future events that can trigger the deferred one - if there’s work that has to be done before the unit is gone, then you’ll need to enter an error state instead. The `stop` event is followed by `remove`, and possibly a few other events, but likewise has few chances to be re-emitted.
+
+Note that all deferred events vanish when the unit is removed, so the charm code needs to take this into consideration.
diff --git a/docs/explanation/index.md b/docs/explanation/index.md
@@ -0,0 +1,14 @@
+(explanation)=
+# Explanation
+
+```{toctree}
+:maxdepth: 1
+
+charm-relation-interfaces
+testing
+interface-tests
+holistic-vs-delta-charms
+how-and-when-to-defer-events
+storedstate-uses-limitations
+```
+
diff --git a/docs/explanation/interface-tests.md b/docs/explanation/interface-tests.md
@@ -0,0 +1,39 @@
+(interface-tests)=
+# Interface tests
+> See also: {ref}`manage-interfaces`
+
+Interface tests are tests that verify the compliance of a charm with an interface specification.
+Interface specifications, stored in {ref}`charm-relation-interfaces <charm-relation-interfaces>`, are contract definitions that mandate how a charm should behave when integrated with another charm over a registered interface.
+
+Interface tests will allow Charmhub to validate the integrations of a charm and verify that your charm indeed supports "the" `ingress` interface and not just an interface called "ingress", which happens to have the same name as "the official `ingress` interface v2" as registered in charm-relation-interfaces (see [here](https://github.com/canonical/charm-relation-interfaces/tree/main/interfaces/ingress/v2)).
+
+Also, they allow alternative implementations of an interface to validate themselves against the contractual specification stored in charm-relation-interfaces, and they help verify compliance with multiple versions of an interface.
+
+An interface test is a contract test powered by {ref}`Scenario <scenario>` and a pytest plugin called [`pytest-interface-tester`](https://github.com/canonical/pytest-interface-tester). An interface test has the following pattern:
+1) **GIVEN** an initial state of the relation over the interface under test
+2) **WHEN** a specific relation event fires
+3) **THEN** the state of the databags is valid (e.g. it satisfies an expected pydantic schema)
+
+On top of databag state validity, one can check for more elaborate conditions.
+
+A typical interface test will look like:
+
+```python
+from interface_tester import Tester
+from scenario import Relation, State  # assumed import path for Scenario's state classes
+
+def test_data_published_on_changed_remote_valid():
+    """Verify that, given valid remote data and a db-relation-changed event, the schema is satisfied."""
+    # GIVEN that we have a relation over "db" and the remote end has published valid data
+    relation = Relation(endpoint='db', interface='db', remote_units_data={0: {'host': '"0.0.0.42"'}},
+                        remote_app_data={'model': '"bar"', 'port': '42', 'name': '"remote"'})
+    t = Tester(State(relations=[relation]))
+    # WHEN the charm receives a db-relation-changed event
+    state_out = t.run(relation.changed_event)
+    # THEN the schema is valid
+    t.assert_schema_valid()
+```
+
+This allows us to determine, independently of which charm we are testing, whether the behavioural specification of this interface is complied with.
+
+