add some high-level Reconfigurator docs #6207
Conversation
Thanks for writing this out! Most of my comments are from conversations we've had where there have been points of confusion, although I worry a little I'm suggesting adding too much nitpicky detail.
// TODO mermaid diagram showing concurrent execution

This approach can be applied to many other areas like DNS configuration, too. Other areas (e.g., the process of updating database state to reflect internal IP allocations) sometimes require different, _ad hoc_ mechanisms. In all cases, though, the goals are what we said above: attempting to execute a stale blueprint must never move the system backwards and as long as _something_ is executing the newer blueprint, the system should eventually get to the new target.
Reading this, I'm now concerned our current executor implementation isn't correct for IP address allocation 😬. I need to go back and look at this; the specific thing that came to mind is:
- Nexus A is executing an old blueprint where zone Z has an external IP address
- Nexus B is executing a new blueprint where zone Z has been expunged
I think we could get in a situation where Nexus B deallocates the IP, then Nexus A reallocates the IP, which might cause Nexus B to fail somewhere later in execution (e.g., if it wanted to assign that IP to a different zone) or might not (if that IP is not being reassigned).
This would eventually resolve itself when Nexus A caught up to the correct blueprint, but maybe the intermediate failures here are bad and this needs some rework?
It sounds like the update to the IP allocations should be conditional on a generation number. That should fix this issue if it exists.
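As a rough illustration of the generation-number idea (hypothetical types and names, not the actual omicron schema or code): a write carries a generation and is applied only if it is newer than what is stored, so an executor working from a stale blueprint gets rejected instead of clobbering newer state.

```rust
// Illustrative sketch only; not the real omicron data model.
#[derive(Clone, Debug)]
struct ExternalIpConfig {
    generation: u64,
    ips: Vec<std::net::IpAddr>,
}

struct Store {
    current: ExternalIpConfig,
}

impl Store {
    /// Apply `new` only if it carries a newer generation than what we have.
    /// A Nexus executing a stale blueprint presents an old generation and
    /// gets rejected rather than overwriting newer state.
    fn conditional_put(&mut self, new: ExternalIpConfig) -> Result<(), String> {
        if new.generation <= self.current.generation {
            return Err(format!(
                "stale update rejected: generation {} <= current {}",
                new.generation, self.current.generation
            ));
        }
        self.current = new;
        Ok(())
    }
}
```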
I think that may be very difficult given how the current IP allocation stuff works (it's all mixed in with instance IP allocation). Maybe worth chatting live; off the top of my head, a couple of ideas:
- Make the executor's IP allocation changes conditional on "the blueprint I'm executing is still the target" (within a transaction)
- Separate service external IPs from instance external IPs more fundamentally in CRDB (much more work, but maybe better long term? or maybe not if most of the external IP machinery on the networking side is the same for both?)
I guess a hybrid third idea is to create separate, generational config for service external IPs, plus a separate RPW that always tries to update the general, shared service/instance IP machinery to reflect the latest generation?
Thanks for doing this @davepacheco. I actually much prefer having this in docs than in the code for a few reasons. As a user of the system or a new developer, it's easier to find what you need by going directly to the docs folder. Updating is also easier to do directly in docs than in code comments. It also lends itself to using links and images better.
docs/reconfigurator.adoc (Outdated)
You might notice that the only step that has anything to do with adding a sled is the first one. This is where a user communicated a change to the **intended state** of the world. Everything Reconfigurator does follows a pattern just like this:

1. A user changes the intended state of the world.
Potentially there should be a second step where the inventory is reloaded to reflect the latest state of the world. The whole system acts as a feedback loop around planning input (mostly inventory, db state, and policy).
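As a rough illustration of that feedback loop (stand-in types and hypothetical function names, not the actual omicron code), the overall shape is something like:

```rust
// Illustrative only: the planning/execution feedback loop described above.
struct PlanningInput; // inventory + db state + policy
struct Blueprint;     // a complete description of intended state

fn collect_planning_input() -> PlanningInput { PlanningInput }
fn plan(_input: &PlanningInput) -> Blueprint { Blueprint }
fn make_target(_candidate: Blueprint) { /* operator/automation decision */ }
fn execute_current_target() { /* idempotently push the target out */ }

fn reconfigurator_loop() {
    loop {
        // Refresh the view of the world (inventory, db state, policy) ...
        let input = collect_planning_input();
        // ... plan a candidate blueprint from that input ...
        let candidate = plan(&input);
        // ... possibly make it the new target ...
        make_target(candidate);
        // ... and execute whatever the current target blueprint is.
        execute_current_target();
    }
}
```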
In dba4f80 I added a step before this just to say that Reconfigurator is always updating its view of the world.
I'm not sure if it's clearer or more confusing, to be honest. In practice, an inventory collection might not happen immediately before, during, or after this sequence. And I feel like inventories are less relevant than it seems like they would be (since they're always out of date and potentially incomplete anyway). We don't use them most of the time during planning or execution -- instead we propagate the full intended state everywhere, every time (to sled agents, DNS servers, etc.).
It's a small difference either way so I just kept this in.
Thanks @jgallagher and @andrewjstone for taking a look! Sounds like this is at least not a step backwards so I'll plan to land it.
…action conditional on the current blueprint (#6240)

See #6239 and #6207 (comment) for additional context, but briefly summarizing:

* Prior to this PR, blueprint execution would deallocate and allocate external IP and NIC records independently
* If two Nexus instances are trying to realize two different blueprints, this could result in them stomping on each other

Andrew suggested that we put the external IP resource set under a generation number, similar to how we mark generations for zones and DNS config. However, in both of those cases, there's an external entity (sled-agent or the DNS zone, respectively) that's ultimately responsible for rejecting updates with an old generation number. In this external networking case, there is no other entity - other parts of Nexus consume this information directly from CRDB. In chat, we discussed that we think it's equivalent to putting the set of allocation/deallocation operations inside a transaction that's conditional on the blueprint being executed still being the current target.

Most of this PR is moving the `external_networking` module from `nexus-reconfigurator-execution` into `nexus-db-queries`, where it can be executed inside the new `blueprint_ensure_external_networking_resources()` method. This method opens a transaction, performs a `SELECT ... FOR UPDATE` to confirm that the blueprint we're ensuring is the current target, and then does the entire set of deallocation / allocation operations in that transaction.

I added a test with some kinda gnarly `#[cfg(test)]` bits to control execution flow to show how the `SELECT ... FOR UPDATE` works: if the blueprint isn't the target, we immediately fail; if we are the target, any attempt to _change_ the target will be queued by CRDB until we've finished our transaction.

Fixes #6239
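To illustrate the shape of that check, here is an in-memory analogue with hypothetical names (not the actual CRDB transaction in omicron): holding the target exclusively for the duration of the operation plays the role of the transaction plus `SELECT ... FOR UPDATE`, so a concurrent attempt to change the target has to wait, and an executor whose blueprint is no longer the target bails out before touching any records.

```rust
use std::sync::Mutex;

// Stand-ins for the real types; illustrative only.
type BlueprintId = u64;

struct NetworkingState {
    current_target: BlueprintId,
    // external IP / NIC records would live here in the real system
}

/// In-memory analogue of the "conditional on the current target" pattern:
/// lock the target (the analogue of the transaction + `SELECT ... FOR UPDATE`),
/// fail fast if our blueprint is no longer the target, otherwise apply the
/// whole set of deallocations/allocations while still holding the lock.
fn ensure_external_networking(
    state: &Mutex<NetworkingState>,
    executing: BlueprintId,
    apply: impl FnOnce(&mut NetworkingState),
) -> Result<(), String> {
    let mut guard = state.lock().unwrap();
    if guard.current_target != executing {
        return Err(format!(
            "blueprint {executing} is no longer the current target"
        ));
    }
    apply(&mut guard);
    Ok(())
}
```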
This is probably still a bit rough, but maybe better than nothing?
I'm torn on whether this stuff should just be in Big Theory Statements (module-level comments) vs. separate docs like this. But in the limit it'd be great to have tutorial-like content here too, and that'd probably not be in the source.