-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[nexus] add basic support for expunged sled policy and decommissioned sled state #5032
[nexus] add basic support for expunged sled policy and decommissioned sled state #5032
Conversation
Created using spr 1.3.5
Created using spr 1.3.6-beta.1
Almost done here, just need to finish up the region allocation tests. (I'll do the db migrations after acceptance.) |
Created using spr 1.3.6-beta.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fantastic @sunshowers! Great work. Once you add the DB migrations I'll take a quick look and approve.
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1
All right, finally ready for review. I had to break up the migration into two, where the second one just drops the column and deletes the type. I believe this is required for idempotence reasons. Would love it if folks had thoughts on the questions as well, but those aren't critical, and otherwise this should be ready to go. Thanks for your patience, I ended up spending a lot of time refactoring and chasing down bugs. |
Created using spr 1.3.6-beta.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just some minor questions and nits.
Created using spr 1.3.6-beta.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Thanks for doing this. It's a big chunk of work.
@@ -13,7 +13,7 @@ use omicron_common::api::external::SemverVersion; | |||
/// | |||
/// This should be updated whenever the schema is changed. For more details, | |||
/// refer to: schema/crdb/README.adoc | |||
pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(36, 0, 0); | |||
pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(37, 0, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have two schema versions in this PR?
Why do we use the micro number? I thought schema versions were always supposed to bump the major.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we need two schema versions is that we have to make two separate transactions to achieve an idempotent result.
The order of operations we need to perform is:
- Add the new
sled_policy
andsled_state
enum types. - Add the new
sled_policy
andsled_state
columns. - Update
sled_policy
to the new value based on theprovision_state
column. - Remove the
provision_state
column. - Remove the
sled_provision_state
enum types.
1 and 2 are inherently idempotent, so those are fine.
3 by itself is also fine to repeat.
But consider what happens if step 4 occurs, then the process fails, and we go back to 1 again. We'd repeat 3, but the corresponding column no longer exists so it will fail.
Solving this can be done in one of two ways:
- Find a way to make 3 only perform the update if the
provision_state
column exists. This is possible, but somewhat complicated. - Add a transaction commit between 3 and 4. This is the approach I ended up using.
The reason it's 37.0.0 and 37.0.1 is because both updates are part of the same PR. This follows the pattern in #4261.
Created using spr 1.3.6-beta.1
This PR does a few things:
sled_provision_state
to a newsled_policy
enum, which has more information (per RFD 457). This PR implements the expunged state, not the graceful removal state.sled_state
enum, which describes Nexus's view of the sled. This PR adds theactive
anddecommissioned
states.Not done here, but in future PRs (to try and keep this PR a manageable size):
time_deleted
because it has a lifecycle too complicated to be described that way -- instead, we'll add atime_decommissioned
field: Replace sled's time_deleted with time_decommissioned #5131Questions
sled_list
hide decommissioned sleds by default?TODO: