add some high-level Reconfigurator docs #6207

Merged: davepacheco merged 4 commits into main from dap/docs-reconfigurator on Aug 2, 2024

Conversation

davepacheco (Collaborator):

This is probably still a bit rough, but maybe better than nothing?

I'm torn between whether this stuff should just be in Big Theory Statements (module-level comments) vs. separate docs like this. But in the limit it'd be great to have tutorial-like content here too, and that'd probably not belong in the source.

davepacheco marked this pull request as ready for review on August 1, 2024 at 23:34.
jgallagher (Contributor) left a comment:


Thanks for writing this out! Most of my comments are from conversations we've had where there have been points of confusion, although I worry a little that I'm suggesting adding too much nitpicky detail.

(Three resolved review threads on docs/reconfigurator.adoc.)
Quoted from docs/reconfigurator.adoc:

// TODO mermaid diagram showing concurrent execution
--

This approach can be applied to many other areas like DNS configuration, too. Other areas (e.g., the process of updating database state to reflect internal IP allocations) sometimes require different, _ad hoc_ mechanisms. In all cases, though, the goals are what we said above: attempting to execute a stale blueprint must never move the system backwards and as long as _something_ is executing the newer blueprint, the system should eventually get to the new target.
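To make "never move the system backwards" concrete: the mechanism this thread later discusses for zones and DNS config is a generation number that the receiving side uses to reject stale updates. A minimal sketch with hypothetical types (not Omicron's actual sled-agent API):

```rust
/// Hypothetical receiver of generation-stamped configuration, sketching
/// why executing a stale blueprint cannot move the system backwards.
struct ConfigReceiver {
    generation: u64,
    config: String,
}

impl ConfigReceiver {
    /// Accept `new_config` only if it is at least as new as what we
    /// already hold; a Nexus executing a stale blueprint presents an
    /// old generation and is simply rejected.
    fn apply(&mut self, generation: u64, new_config: String) -> Result<(), String> {
        if generation < self.generation {
            return Err(format!(
                "stale config rejected: have generation {}, got {}",
                self.generation, generation
            ));
        }
        self.generation = generation;
        self.config = new_config;
        Ok(())
    }
}
```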
jgallagher (Contributor):


Reading this, I'm now concerned our current executor implementation isn't correct for IP address allocation 😬. I need to go back and look at this; the specific thing that came to mind is:

  • Nexus A is executing an old blueprint where zone Z has an external IP address
  • Nexus B is executing a new blueprint where zone Z has been expunged

I think we could get in a situation where Nexus B deallocates the IP, then Nexus A reallocates the IP, which might cause Nexus B to fail somewhere later in execution (e.g., if it wanted to assign that IP to a different zone) or might not (if that IP is not being reassigned).

This would eventually resolve itself when Nexus A caught up to the correct blueprint, but maybe the intermediate failures here are bad and this needs some rework?

andrewjstone (Contributor):


It sounds like the update to the IP allocations should be conditional on a generation number. That should fix this issue if it exists.
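A minimal sketch of that suggestion, using an in-memory map and hypothetical types where the real implementation would be CockroachDB rows:

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical, simplified store: zone id -> (generation, external IP).
type IpTable = HashMap<u64, (u64, IpAddr)>;

/// Apply an allocation only if it carries a generation at least as new
/// as the stored one, so an executor working from an older blueprint
/// cannot clobber a newer one's changes. Returns whether it applied.
fn allocate_ip(table: &mut IpTable, zone_id: u64, generation: u64, ip: IpAddr) -> bool {
    match table.get(&zone_id) {
        Some(&(current_gen, _)) if current_gen > generation => false,
        _ => {
            table.insert(zone_id, (generation, ip));
            true
        }
    }
}
```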

jgallagher (Contributor):


I think that may be very difficult given how the current IP allocation stuff works (it's all mixed in with instance IP allocation). Maybe worth chatting live; off the top of my head, a couple of ideas:

  • Make the executor's IP allocation changes conditional on "the blueprint I'm executing is still the target" (within a transaction)
  • Separate service external IPs from instance external IPs more fundamentally in CRDB (much more work, but maybe better long term? or maybe not if most of the external IP machinery on the networking side is the same for both?)

jgallagher (Contributor):


I guess a hybrid third idea is to create separate, generational config for service external IPs, plus a separate RPW that keeps the general, shared service/instance IP machinery updated to reflect the latest generation?

(Another resolved review thread on docs/reconfigurator.adoc.)
andrewjstone (Contributor) left a comment:


Thanks for doing this @davepacheco. I actually much prefer having this in docs rather than in the code, for a few reasons. As a user of the system or a new developer, it's easier to find what you need by going directly to the docs folder. Updating is also easier to do directly in docs, without comment syntax in the way. And docs lend themselves better to links and images.


Quoted from docs/reconfigurator.adoc:

You might notice that the only step that has anything to do with adding a sled is the first one. This is where a user communicates a change to the **intended state** of the world. Everything Reconfigurator does follows a pattern just like this:

1. A user changes the intended state of the world.
andrewjstone (Contributor):


Potentially there should be a second step where the inventory is reloaded to reflect the latest state of the world. The whole system acts as a feedback loop around planning input (mostly inventory, db state, and policy).
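In code-shaped terms, that feedback loop might look like the sketch below; every name is a hypothetical stand-in for Reconfigurator's real components:

```rust
// Hypothetical stand-ins for inventory, policy, and blueprint types.
#[derive(Clone)]
struct Inventory;
#[derive(Clone)]
struct Policy;
struct PlanningInput {
    inventory: Inventory,
    policy: Policy,
}
struct Blueprint;

fn collect_inventory() -> Inventory { Inventory }
fn plan(_input: &PlanningInput) -> Blueprint { Blueprint }
fn execute(_blueprint: &Blueprint) { /* push intended state outward */ }

/// One turn of the loop: observe, plan, execute.
fn reconcile_once(policy: &Policy) {
    // 1. Observe: refresh our view of the actual state of the world.
    let inventory = collect_inventory();
    // 2. Plan: compute a new intended state from inventory + policy.
    let blueprint = plan(&PlanningInput { inventory, policy: policy.clone() });
    // 3. Execute: drive the world toward the intended state.
    execute(&blueprint);
}
```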

davepacheco (Collaborator, author):


In dba4f80 I added a step before this just to say that Reconfigurator is always updating its view of the world.

I'm not sure if it's clearer or more confusing, to be honest. In practice, an inventory collection might not happen immediately before, during, or after this sequence. And I feel like inventories are less relevant than they might seem (since they're always out of date and potentially incomplete anyway). We don't use them most of the time during planning or execution -- instead we propagate the full intended state everywhere, every time (to sled agents, DNS servers, etc.).

It's a small difference either way so I just kept this in.

(Two more resolved review threads on docs/reconfigurator.adoc.)
davepacheco (Collaborator, author):

Thanks @jgallagher and @andrewjstone for taking a look! Sounds like this is at least not a step backwards so I'll plan to land it.

davepacheco enabled auto-merge (squash) on August 2, 2024 at 22:23.
davepacheco merged commit e6fb6df into main on Aug 2, 2024.
22 checks passed.
davepacheco deleted the dap/docs-reconfigurator branch on August 2, 2024 at 23:30.
jgallagher added a commit that referenced this pull request on Aug 8, 2024:
…action conditional on the current blueprint (#6240)

See #6239 and #6207 (comment) for additional context, but briefly summarizing:

* Prior to this PR, blueprint execution would deallocate and allocate
external IP and NIC records independently
* If two Nexus instances are trying to realize two different blueprints,
this could result in them stomping on each other

Andrew suggested that we put the external IP resource set under a
generation number, similar to how we mark generations for zones and DNS
config. However, in both of those cases, there's an external entity
(sled-agent or the DNS zone, respectively) that's ultimately responsible
for rejecting updates with an old generation number. In this external
networking case, there is no other entity - other parts of Nexus consume
this information directly from CRDB. In chat, we discussed that we think
it's equivalent to putting the set of allocation/deallocation operations
inside a transaction that's conditional on the blueprint being executed
still being the current target.

Most of this PR is moving the `external_networking` module from
`nexus-reconfigurator-execution` into `nexus-db-queries` where it can be
executed inside the new
`blueprint_ensure_external_networking_resources()` method. This method
opens a transaction, performs a `SELECT ... FOR UPDATE` to confirm that
the blueprint we're ensuring is the current target, and then does the
entire set of deallocation / allocation operations in that transaction.
I added a test with some kinda gnarly `#[cfg(test)]` bits to control
execution flow to show how the `SELECT ... FOR UPDATE` works: if the
blueprint isn't the target, we immediately fail; if we are the target,
any attempt to _change_ the target will be queued by CRDB until we've
finished our transaction.

Fixes #6239
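For readers who want the shape of the fix without opening #6240: below is a sketch of the conditional-execution pattern, with a hypothetical transaction API standing in for the real nexus-db-queries code:

```rust
// Hypothetical database and transaction handles; the real code uses
// Diesel against CockroachDB.
struct Db;
struct Txn;
#[derive(PartialEq)]
struct BlueprintId(u64);

impl Db {
    fn begin(&self) -> Txn { Txn }
}
impl Txn {
    /// Stands in for `SELECT ... FOR UPDATE` on the target-blueprint
    /// row: returns the current target and blocks concurrent attempts
    /// to change it until this transaction commits or aborts.
    fn lock_current_target(&self) -> BlueprintId { BlueprintId(0) }
    fn apply_networking_changes(&self) { /* dealloc + alloc records */ }
    fn commit(self) {}
    fn abort(self) {}
}

fn ensure_external_networking(db: &Db, executing: BlueprintId) -> Result<(), &'static str> {
    let txn = db.begin();
    // If our blueprint is no longer the target, bail before touching
    // anything; otherwise the row lock holds off target changes while
    // the whole deallocation/allocation set runs atomically.
    if txn.lock_current_target() != executing {
        txn.abort();
        return Err("blueprint is no longer the current target");
    }
    txn.apply_networking_changes();
    txn.commit();
    Ok(())
}
```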