Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#3886 2/4] Region replacement omdb commands #5820

Merged
merged 3 commits into from
May 29, 2024

Conversation

jmpesp
Copy link
Contributor

@jmpesp jmpesp commented May 24, 2024

This commit adds some commands to omdb related to the new region replacement logic:

$ ./target/debug/omdb db region-replacement
Query for information about region replacements, optionally manually triggering one

Usage: omdb db region-replacement [OPTIONS] <COMMAND>

Commands:
  list     List region replacement requests
  status   Show current region replacements and their status
  info     Show detailed information for a region replacement
  request  Manually request a region replacement
  help     Print this message or the help of the given subcommand(s)

list will list all region replacement requests, along with their request time and state.

status will show a summary of all non-Complete region replacements, along with their state and progress.

info will show a detailed view of a region replacement, starting with the details that the status summary shows, then showing all related notifications and steps taken to drive the replacement forward.

Finally, request will request that a region be replaced, and return the ID of the replacement.

@jmpesp jmpesp requested a review from leftwo May 24, 2024 16:16
This commit adds some commands to omdb related to the new region
replacement logic:

    $ ./target/debug/omdb db region-replacement
    Query for information about region replacements, optionally manually triggering one

    Usage: omdb db region-replacement [OPTIONS] <COMMAND>

    Commands:
      list     List region replacement requests
      status   Show current region replacements and their status
      info     Show detailed information for a region replacement
      request  Manually request a region replacement
      help     Print this message or the help of the given subcommand(s)

`list` will list all region replacement requests, along with their
request time and state.

`status` will show a summary of all non-Complete region replacements,
along with their state and progress.

`info` will show a detailed view of a region replacement, starting with
the details that the `status` summary shows, then showing all related
notifications and steps taken to drive the replacement forward.

Finally, `request` will request that a region be replaced, and return
the ID of the replacement.
Copy link
Contributor

@leftwo leftwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits.. just nits.

println!();

if let Some(new_region_id) = request.new_region_id {
println!();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the if let Some(repair) .. fails beloe, i.e. the maybe_repair does not find anything, will we
still print an extra blank line here?

Maybe we don't need two blank lines, and instead we print space after each request?
Right now it looks like this:

EVT22200005 # omdb db region-replacement status
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::7]:32221,[fd00:1122:3344:101::3]:32221/omicron?sslmode=disable
note: database schema version matches expected (56.0.0)
af905bec-acb8-4d91-a439-af310d3f6d63:

      started: 2024-05-28 20:44:57.042456 UTC
        state: Running
old region id: 367d0aab-a3d5-470f-9ffa-214de71b6a5b
new region id: Some(7ae0833c-7ab2-4fa1-adb1-e1d72e7a1415)


progress:        ██████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ [    233/   1024]
bb75e969-ecc6-40f9-bab0-9e5520ae4f2e:

      started: 2024-05-28 20:44:01.013417 UTC
        state: Running
old region id: ff9d23b4-9d39-4be2-86fd-d8806f53ebd2
new region id: Some(90d3fd2f-582e-4e75-af05-74f5a90eda46)


progress:        ███████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ [    310/    800]

So, the grouping is a little off. If you had many replacements, it might not be clear which progress
bar is for what.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this println to after the bar.abandon call in 9970286, I think that abandon doesn't print a newline?

time: notification.time,
repair_id: notification.repair_id.to_string(),
repair_type: format!("{:?}", notification.repair_type),
upstairs_id: notification.upstairs_id.to_string(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is upstairs_id here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's from whatever Upstairs is sending the notification.


bar.abandon();

println!();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The progress bar does not get to the end?

EVT22200005 # omdb db region-replacement info 3a4b04be-4bf9-43d6-a890-e5946d241084
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::7]:32221,[fd00:1122:3344:101::3]:32221/omicron?sslmode=disable
note: database schema version matches expected (56.0.0)
      started: 2024-05-28 21:02:19.949975 UTC
        state: Complete
old region id: ff1e4d49-8959-42e1-9fcc-ca219a68ef8e
new region id: Some(897b3307-36a8-4d0f-8ea2-9ba1562e0898)

Repair notifications                                                                                                                                                        
TIME                           REPAIR_ID                            REPAIR_TYPE UPSTAIRS_ID                          SESSION_ID                           NOTIFICATION_TYPE 
2024-05-28 21:02:57.786939 UTC a661e5b9-0735-4ff6-a318-693349460b4b Live        23776670-5d21-449f-9552-efad0fe32b2a dd90808a-ff3b-41d0-a273-6a64e820458c Started           
2024-05-28 21:08:21.759356 UTC a661e5b9-0735-4ff6-a318-693349460b4b Live        23776670-5d21-449f-9552-efad0fe32b2a dd90808a-ff3b-41d0-a273-6a64e820458c Succeeded         

progress:        █████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████░ [    799/    800]

Repair steps                                                                                                                                
TIME                           STEP_TYPE DETAILS                                                                                            
2024-05-28 21:02:47.663609 UTC Propolis  instance Some(94122867-721b-40a2-aa85-518139367113) vmm Some(735975a0-26dd-4fe0-bb87-80e5770b16b4) 

Is there a way to short-circuit it if it is done? Instead of printing the incomplete bar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is a bug in Crucible's notification logic, in that it doesn't send the progress update notification when it's done? I'd rather fix that bug than have omdb print the complete bar. Opened oxidecomputer/crucible#1319

let requests: Vec<RegionReplacement> = datastore
.pool_connection_for_tests()
.await?
.transaction_async(|conn| async move {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of interest, why is this a transaction? IIUC only a single query is performed...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no answer, removed those transaction calls in a155af9

/// Show detailed information for a region replacement
Info(RegionReplacementInfoArgs),
/// Manually request a region replacement
Request(RegionReplacementRequestArgs),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably make this more explicit: Perhaps name it Start instead of Request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the Request command name as the first state things end up in is Requested.
omdb can't actually start it, it can just put the request into the database that
someone else has to then go and make happen.

let limit = fetch_opts.fetch_limit;

let requests: Vec<RegionReplacement> = datastore
.pool_connection_for_tests()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should almost certainly use the opctx argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pool_connection_authorized is a private method and can't be accessed from omdb. Short of moving these into DataStore I'm not sure what to do here except that, so that's done in a155af9, except when the fetch opts limit needs to be honoured.

let limit = fetch_opts.fetch_limit;

let requests: Vec<RegionReplacement> = datastore
.pool_connection_for_tests()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above wrt opctx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in a155af9


/// Show details for a single region replacement
async fn cmd_db_region_replacement_info(
_opctx: &OpContext,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in a155af9

async fn cmd_db_region_replacement_info(
_opctx: &OpContext,
datastore: &DataStore,
_fetch_opts: &DbFetchOptions,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't need these, we should probably get rid of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in a155af9

Copy link
Contributor

@andrewjstone andrewjstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! My only required fix is to use the real opctx. Take or leave other suggestions.

@jmpesp jmpesp enabled auto-merge (squash) May 29, 2024 21:03
@jmpesp jmpesp merged commit 7633d17 into oxidecomputer:main May 29, 2024
15 checks passed
@jmpesp jmpesp deleted the region_replacement_part_2 branch May 29, 2024 21:40
jmpesp added 2 commits May 30, 2024 01:57
remove unused arguments

don't use a transaction when only one query is performed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants