add some high-level Reconfigurator docs #6207

Merged: davepacheco merged 4 commits into main from dap/docs-reconfigurator on Aug 2, 2024

Conversation

davepacheco (Collaborator):

This is probably still a bit rough, but maybe better than nothing?

I'm torn between whether this stuff should just be in Big Theory Statements (module-level comments) vs. separate docs like this. But in the limit it'd be great to have tutorial-like content here too, and that'd probably not belong in the source.

davepacheco marked this pull request as ready for review on August 1, 2024 at 23:34.
jgallagher (Contributor) left a comment:


Thanks for writing this out! Most of my comments are from conversations we've had where there have been points of confusion, although I worry a little that I'm suggesting adding too much nitpicky detail.

(Three resolved review threads on docs/reconfigurator.adoc.)
Quoted from docs/reconfigurator.adoc:

// TODO mermaid diagram showing concurrent execution
--

This approach can be applied to many other areas like DNS configuration, too. Other areas (e.g., the process of updating database state to reflect internal IP allocations) sometimes require different, _ad hoc_ mechanisms. In all cases, though, the goals are what we said above: attempting to execute a stale blueprint must never move the system backwards and as long as _something_ is executing the newer blueprint, the system should eventually get to the new target.
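To make "never move the system backwards" concrete: the mechanism this thread later discusses for zones and DNS config is a generation number that the receiving side uses to reject stale updates. A minimal sketch with hypothetical types (not Omicron's actual sled-agent API):

```rust
/// Hypothetical receiver of generation-stamped configuration, sketching
/// why executing a stale blueprint cannot move the system backwards.
struct ConfigReceiver {
    generation: u64,
    config: String,
}

impl ConfigReceiver {
    /// Accept `new_config` only if it is at least as new as what we
    /// already hold; a Nexus executing a stale blueprint presents an
    /// old generation and is simply rejected.
    fn apply(&mut self, generation: u64, new_config: String) -> Result<(), String> {
        if generation < self.generation {
            return Err(format!(
                "stale config rejected: have generation {}, got {}",
                self.generation, generation
            ));
        }
        self.generation = generation;
        self.config = new_config;
        Ok(())
    }
}
```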
jgallagher (Contributor):


Reading this, I'm now concerned our current executor implementation isn't correct for IP address allocation 😬. I need to go back and look at this; the specific thing that came to mind is:

  • Nexus A is executing an old blueprint where zone Z has an external IP address
  • Nexus B is executing a new blueprint where zone Z has been expunged

I think we could get in a situation where Nexus B deallocates the IP, then Nexus A reallocates the IP, which might cause Nexus B to fail somewhere later in execution (e.g., if it wanted to assign that IP to a different zone) or might not (if that IP is not being reassigned).

This would eventually resolve itself when Nexus A caught up to the correct blueprint, but maybe the intermediate failures here are bad and this needs some rework?

andrewjstone (Contributor):


It sounds like the update to the IP allocations should be conditional on a generation number. That should fix this issue if it exists.
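A minimal sketch of that suggestion, using an in-memory map and hypothetical types where the real implementation would be CockroachDB rows:

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical, simplified store: zone id -> (generation, external IP).
type IpTable = HashMap<u64, (u64, IpAddr)>;

/// Apply an allocation only if it carries a generation at least as new
/// as the stored one, so an executor working from an older blueprint
/// cannot clobber a newer one's changes. Returns whether it applied.
fn allocate_ip(table: &mut IpTable, zone_id: u64, generation: u64, ip: IpAddr) -> bool {
    match table.get(&zone_id) {
        Some(&(current_gen, _)) if current_gen > generation => false,
        _ => {
            table.insert(zone_id, (generation, ip));
            true
        }
    }
}
```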

jgallagher (Contributor):


I think that may be very difficult given how the current IP allocation stuff works (it's all mixed in with instance IP allocation). Maybe worth chatting live; off the top of my head, a couple of ideas:

  • Make the executor's IP allocation changes conditional on "the blueprint I'm executing is still the target" (within a transaction)
  • Separate service external IPs from instance external IPs more fundamentally in CRDB (much more work, but maybe better long term? or maybe not if most of the external IP machinery on the networking side is the same for both?)

jgallagher (Contributor):


I guess a hybrid third idea is to create separate, generational config for service external IPs, plus a separate RPW that keeps the general, shared service/instance IP machinery updated to reflect the latest generation?

(Another resolved review thread on docs/reconfigurator.adoc.)
andrewjstone (Contributor) left a comment:


Thanks for doing this @davepacheco. I actually much prefer having this in docs rather than in the code, for a few reasons. As a user of the system or a new developer, it's easier to find what you need by going directly to the docs folder. Updating is also easier to do directly in docs, without comment syntax in the way. And docs lend themselves better to links and images.


Quoted from docs/reconfigurator.adoc:

You might notice that the only step that has anything to do with adding a sled is the first one. This is where a user communicates a change to the **intended state** of the world. Everything Reconfigurator does follows a pattern just like this:

1. A user changes the intended state of the world.
andrewjstone (Contributor):


Potentially there should be a second step where the inventory is reloaded to reflect the latest state of the world. The whole system acts as a feedback loop around planning input (mostly inventory, db state, and policy).
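In code-shaped terms, that feedback loop might look like the sketch below; every name is a hypothetical stand-in for Reconfigurator's real components:

```rust
// Hypothetical stand-ins for inventory, policy, and blueprint types.
#[derive(Clone)]
struct Inventory;
#[derive(Clone)]
struct Policy;
struct PlanningInput {
    inventory: Inventory,
    policy: Policy,
}
struct Blueprint;

fn collect_inventory() -> Inventory { Inventory }
fn plan(_input: &PlanningInput) -> Blueprint { Blueprint }
fn execute(_blueprint: &Blueprint) { /* push intended state outward */ }

/// One turn of the loop: observe, plan, execute.
fn reconcile_once(policy: &Policy) {
    // 1. Observe: refresh our view of the actual state of the world.
    let inventory = collect_inventory();
    // 2. Plan: compute a new intended state from inventory + policy.
    let blueprint = plan(&PlanningInput { inventory, policy: policy.clone() });
    // 3. Execute: drive the world toward the intended state.
    execute(&blueprint);
}
```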

davepacheco (Collaborator, author):


In dba4f80 I added a step before this just to say that Reconfigurator is always updating its view of the world.

I'm not sure if it's clearer or more confusing, to be honest. In practice, an inventory collection might not happen immediately before, during, or after this sequence. And I feel like inventories are less relevant than they might seem (since they're always out of date and potentially incomplete anyway). We don't use them most of the time during planning or execution -- instead we propagate the full intended state everywhere, every time (to sled agents, DNS servers, etc.).

It's a small difference either way so I just kept this in.

(Two more resolved review threads on docs/reconfigurator.adoc.)
davepacheco (Collaborator, author):

Thanks @jgallagher and @andrewjstone for taking a look! Sounds like this is at least not a step backwards so I'll plan to land it.

davepacheco enabled auto-merge (squash) on August 2, 2024 at 22:23.
davepacheco merged commit e6fb6df into main on Aug 2, 2024.
22 checks passed.
davepacheco deleted the dap/docs-reconfigurator branch on August 2, 2024 at 23:30.
jgallagher added a commit that referenced this pull request on Aug 8, 2024:
…action conditional on the current blueprint (#6240)

See #6239 and #6207 (comment) for additional context, but briefly summarizing:

* Prior to this PR, blueprint execution would deallocate and allocate
external IP and NIC records independently
* If two Nexus instances are trying to realize two different blueprints,
this could result in them stomping on each other

Andrew suggested that we put the external IP resource set under a
generation number, similar to how we mark generations for zones and DNS
config. However, in both of those cases, there's an external entity
(sled-agent or the DNS zone, respectively) that's ultimately responsible
for rejecting updates with an old generation number. In this external
networking case, there is no other entity - other parts of Nexus consume
this information directly from CRDB. In chat, we discussed that we think
it's equivalent to putting the set of allocation/deallocation operations
inside a transaction that's conditional on the blueprint being executed
still being the current target.

Most of this PR is moving the `external_networking` module from
`nexus-reconfigurator-execution` into `nexus-db-queries` where it can be
executed inside the new
`blueprint_ensure_external_networking_resources()` method. This method
opens a transaction, performs a `SELECT ... FOR UPDATE` to confirm that
the blueprint we're ensuring is the current target, and then does the
entire set of deallocation / allocation operations in that transaction.
I added a test with some kinda gnarly `#[cfg(test)]` bits to control
execution flow to show how the `SELECT ... FOR UPDATE` works: if the
blueprint isn't the target, we immediately fail; if we are the target,
any attempt to _change_ the target will be queued by CRDB until we've
finished our transaction.

Fixes #6239
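For readers who want the shape of the fix without opening #6240: below is a sketch of the conditional-execution pattern, with a hypothetical transaction API standing in for the real nexus-db-queries code:

```rust
// Hypothetical database and transaction handles; the real code uses
// Diesel against CockroachDB.
struct Db;
struct Txn;
#[derive(PartialEq)]
struct BlueprintId(u64);

impl Db {
    fn begin(&self) -> Txn { Txn }
}
impl Txn {
    /// Stands in for `SELECT ... FOR UPDATE` on the target-blueprint
    /// row: returns the current target and blocks concurrent attempts
    /// to change it until this transaction commits or aborts.
    fn lock_current_target(&self) -> BlueprintId { BlueprintId(0) }
    fn apply_networking_changes(&self) { /* dealloc + alloc records */ }
    fn commit(self) {}
    fn abort(self) {}
}

fn ensure_external_networking(db: &Db, executing: BlueprintId) -> Result<(), &'static str> {
    let txn = db.begin();
    // If our blueprint is no longer the target, bail before touching
    // anything; otherwise the row lock holds off target changes while
    // the whole deallocation/allocation set runs atomically.
    if txn.lock_current_target() != executing {
        txn.abort();
        return Err("blueprint is no longer the current target");
    }
    txn.apply_networking_changes();
    txn.commit();
    Ok(())
}
```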