-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reconfigurator: Move external IP allocation/deallocation into a transaction conditional on the current blueprint #6240
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff @jgallagher
@@ -961,13 +1103,23 @@ impl DataStore { | |||
async fn blueprint_current_target_only( | |||
&self, | |||
conn: &async_bb8_diesel::Connection<DbConnection>, | |||
select_for_update: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The booleans at the call sites are somewhat confusing. I think it's probably overkill to make this two separate functions, even though that's an option. How about doing what we do elsewhere, and instead of passing true
or false
directly, create a local variable named select_for_update
and pass that in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was already borderline on the call site booleans; thanks for the nudge. I don't love the local variable thing, so I replaced this with a two variant enum in 989c5db - how does that read?
@@ -2061,6 +2219,172 @@ mod tests { | |||
logctx.cleanup_successful(); | |||
} | |||
|
|||
#[tokio::test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great test!
I usually do not like to add test only code to production methods. However, in this case, I think ensuring that CRDB transaction serialization works as we expect warrants it. This also serves as a useful regression in case that behavior changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if select_for_update
, which should take a write lock on the target table, or at least target row, also blocks reading from that table. I'm not sure if that would be a problem, and don't think it warrants a test. I'm just kinda curious at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also curious, and it was easy enough to expand this test to see! It does indeed block reads: 7fee264
See #6239 and #6207 (comment) for additional context, but briefly summarizing:
Andrew suggested that we put the external IP resource set under a generation number, similar to how we mark generations for zones and DNS config. However, in both of those cases, there's an external entity (sled-agent or the DNS zone, respectively) that's ultimately responsible for rejecting updates with an old generation number. In this external networking case, there is no other entity - other parts of Nexus consume this information directly from CRDB. In chat, we discussed that we think it's equivalent to putting the set of allocation/deallocation operations inside a transaction that's conditional on the blueprint being executing still being the current target.
Most of this PR is moving the
external_networking
module fromnexus-reconfigurator-execution
intonexus-db-queries
where it can be executed inside the newblueprint_ensure_external_networking_resources()
method. This method opens a transaction, performs aSELECT ... FOR UPDATE
to confirm that the blueprint we're ensuring is the current target, and then does the entire set of deallocation / allocation operations in that transaction. I added a test with some kinda gnarly#[cfg(test)]
bits to control execution flow to show how theSELECT ... FOR UPDATE
works: if the blueprint isn't the target, we immediately fail; if we are the target, any attempt to change the target will be queued by CRDB until we've finished our transaction.Fixes #6239