omdb: Add sled state to blueprint displays and diffs #6545

Merged: 4 commits into main from john/omdb-sled-state, Sep 13, 2024


jgallagher

On main, omdb shows this for the confusing blueprint on dogfood from oxidecomputer/product-assurance#52:

root@oxz_switch0:/var/tmp/john# omdb nexus blueprints diff 430f5c6b-3156-4921-8ddc-74560989c8f4 3eb67393-bdbc-4957-98c2-36cc60e3e901
... snip unchanged sleds ...

 MODIFIED SLEDS:

  sled 1efda86b-caef-489f-9792-589d7677e59a:

    physical disks from generation 1:
    -----------------------------------
    vendor   model             serial
    -----------------------------------
-   1b96     WUS4C6432DSP3X3   A079DDFD
-   1b96     WUS4C6432DSP3X3   A079DE08
-   1b96     WUS4C6432DSP3X3   A079DE11
-   1b96     WUS4C6432DSP3X3   A079DEA7
-   1b96     WUS4C6432DSP3X3   A079DEAF
-   1b96     WUS4C6432DSP3X3   A079DF11
-   1b96     WUS4C6432DSP3X3   A079DFA7
-   1b96     WUS4C6432DSP3X3   A079DFCA
-   1b96     WUS4C6432DSP3X3   A079E02E
-   1b96     WUS4C6432DSP3X3   A079E076


    omicron zones generation 3 -> 4:
    -------------------------------------------------------------------------------------------
    zone type      zone id                                disposition    underlay IP
    -------------------------------------------------------------------------------------------
*   crucible       09f14045-df78-447a-b7d8-217e0ca8ee09   - in service   fd00:1122:3344:124::24
     └─                                                   + expunged
*   crucible       3ec0e848-39de-495e-be0e-88241e11d0fb   - in service   fd00:1122:3344:124::22
     └─                                                   + expunged
*   crucible       562c58dc-2408-415e-a005-eb80d1769d10   - in service   fd00:1122:3344:124::28
     └─                                                   + expunged
*   crucible       568f842d-3cd3-4901-97f8-94991c2e9938   - in service   fd00:1122:3344:124::25
     └─                                                   + expunged
*   crucible       5f3fbd1c-5513-4527-88b5-d07c8fbf71e0   - in service   fd00:1122:3344:124::29
     └─                                                   + expunged
*   crucible       6632cd6f-ade4-415f-ad65-b510d4ead12d   - in service   fd00:1122:3344:124::23
     └─                                                   + expunged
*   crucible       6ca6aa76-c32b-402c-a1ac-751f12d5bdd9   - in service   fd00:1122:3344:124::2b
     └─                                                   + expunged
*   crucible       9a475569-439c-4749-b78f-eba2096b2131   - in service   fd00:1122:3344:124::26
     └─                                                   + expunged
*   crucible       bac40327-4eeb-429d-94e1-d3c0525266a2   - in service   fd00:1122:3344:124::2a
     └─                                                   + expunged
*   crucible       eb71bb55-37fb-4fc4-bdee-a5382a480271   - in service   fd00:1122:3344:124::27
     └─                                                   + expunged
*   internal_ntp   bfea30f1-9ea6-496d-aeb9-ae126ea4f686   - in service   fd00:1122:3344:124::21
     └─                                                   + expunged


 ADDED SLEDS:

  sled 05652dc1-b811-4cac-95e1-d32633f2ba75:

 COCKROACHDB SETTINGS:
    state fingerprint:::::::::::::::::   d4d87aa2ad877a4cc2fddd0573952362739110de (unchanged)
    cluster.preserve_downgrade_option:   "22.1" (unchanged)

 METADATA:
*   internal DNS version:   5 -> 6
    external DNS version:   31 (unchanged)

As of this branch, each "sled $SLED_ID:" line carries a parenthetical showing the sled's state (or its state transition, for modified sleds):

 MODIFIED SLEDS:

  sled 1efda86b-caef-489f-9792-589d7677e59a (active -> decommissioned):

    physical disks from generation 1:
    -----------------------------------
    vendor   model             serial
    -----------------------------------
-   1b96     WUS4C6432DSP3X3   A079DDFD
-   1b96     WUS4C6432DSP3X3   A079DE08
-   1b96     WUS4C6432DSP3X3   A079DE11
-   1b96     WUS4C6432DSP3X3   A079DEA7
-   1b96     WUS4C6432DSP3X3   A079DEAF
-   1b96     WUS4C6432DSP3X3   A079DF11
-   1b96     WUS4C6432DSP3X3   A079DFA7
-   1b96     WUS4C6432DSP3X3   A079DFCA
-   1b96     WUS4C6432DSP3X3   A079E02E
-   1b96     WUS4C6432DSP3X3   A079E076


    omicron zones generation 3 -> 4:
    -------------------------------------------------------------------------------------------
    zone type      zone id                                disposition    underlay IP
    -------------------------------------------------------------------------------------------
*   crucible       09f14045-df78-447a-b7d8-217e0ca8ee09   - in service   fd00:1122:3344:124::24
     └─                                                   + expunged
*   crucible       3ec0e848-39de-495e-be0e-88241e11d0fb   - in service   fd00:1122:3344:124::22
     └─                                                   + expunged
*   crucible       562c58dc-2408-415e-a005-eb80d1769d10   - in service   fd00:1122:3344:124::28
     └─                                                   + expunged
*   crucible       568f842d-3cd3-4901-97f8-94991c2e9938   - in service   fd00:1122:3344:124::25
     └─                                                   + expunged
*   crucible       5f3fbd1c-5513-4527-88b5-d07c8fbf71e0   - in service   fd00:1122:3344:124::29
     └─                                                   + expunged
*   crucible       6632cd6f-ade4-415f-ad65-b510d4ead12d   - in service   fd00:1122:3344:124::23
     └─                                                   + expunged
*   crucible       6ca6aa76-c32b-402c-a1ac-751f12d5bdd9   - in service   fd00:1122:3344:124::2b
     └─                                                   + expunged
*   crucible       9a475569-439c-4749-b78f-eba2096b2131   - in service   fd00:1122:3344:124::26
     └─                                                   + expunged
*   crucible       bac40327-4eeb-429d-94e1-d3c0525266a2   - in service   fd00:1122:3344:124::2a
     └─                                                   + expunged
*   crucible       eb71bb55-37fb-4fc4-bdee-a5382a480271   - in service   fd00:1122:3344:124::27
     └─                                                   + expunged
*   internal_ntp   bfea30f1-9ea6-496d-aeb9-ae126ea4f686   - in service   fd00:1122:3344:124::21
     └─                                                   + expunged


 ADDED SLEDS:

  sled 05652dc1-b811-4cac-95e1-d32633f2ba75 (decommissioned):

 COCKROACHDB SETTINGS:
    state fingerprint:::::::::::::::::   d4d87aa2ad877a4cc2fddd0573952362739110de (unchanged)
    cluster.preserve_downgrade_option:   "22.1" (unchanged)

 METADATA:
*   internal DNS version:   5 -> 6
    external DNS version:   31 (unchanged)

The empty sled added block is still kinda confusing, but I think the note that the added sled is starting out in the decommissioned state is at least a reasonable pointer to an explanation of what's going on.

Fixes #6544.
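
For readers who want to see the shape of the new header lines, here is a minimal sketch of how the parenthetical could be built. The `SledState` enum and the `sled_header` function are hypothetical stand-ins, not the actual omicron types or omdb code. Showing a single state when a modified sled's state is unchanged is an assumption based on the `(active)` header that appears in the test output later in this thread.

```rust
use std::fmt;

/// Hypothetical stand-in for the sled state recorded in a blueprint.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SledState {
    Active,
    Decommissioned,
}

impl fmt::Display for SledState {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SledState::Active => write!(f, "active"),
            SledState::Decommissioned => write!(f, "decommissioned"),
        }
    }
}

/// Builds a `sled $SLED_ID (...):` header line (hypothetical helper).
///
/// - state changed between blueprints: `(before -> after)`
/// - state known on only one side, or unchanged: `(state)`
/// - no state known at all: no parenthetical
fn sled_header(
    sled_id: &str,
    before: Option<SledState>,
    after: Option<SledState>,
) -> String {
    match (before, after) {
        (Some(b), Some(a)) if b != a => format!("sled {sled_id} ({b} -> {a}):"),
        (_, Some(s)) | (Some(s), None) => format!("sled {sled_id} ({s}):"),
        (None, None) => format!("sled {sled_id}:"),
    }
}
```

Under these assumptions, a sled that goes from active to decommissioned gets the `(active -> decommissioned)` form, and a sled that first appears already decommissioned gets `(decommissioned)`, matching the two headers in the output above.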


@andrewjstone andrewjstone left a comment


Awesome. Thanks for the quick fix on this @jgallagher.

before_zones.keys().chain(before_disks.keys()).collect();
let after_sleds: BTreeSet<_> =
after_zones.keys().chain(after_disks.keys()).collect();
// Work around a quirk of sled decommissioning. If a sled has a before

Nice!

@@ -165,6 +142,27 @@ to: blueprint 9f71f5d3-a272-4382-9154-6ea2e171a6c6
- nexus 67622d61-2df4-414d-aa0e-d1277265f405 expunged fd00:1122:3344:103::22


sled 68d24ac5-f341-49ea-a92a-0381b52ab387 (active):

Why did this sled move from "REMOVED" to "MODIFIED"?

Contributor Author


This test manually messes around with blueprint_zones:

let expunged_zones =
    blueprint2a.blueprint_zones.get_mut(&expunged_sled_id).unwrap();
expunged_zones.zones.clear();
expunged_zones.generation = expunged_zones.generation.next();
blueprint2a.blueprint_zones.remove(&decommissioned_sled_id);

Prior to this PR, "sled missing from zones and disks" was enough for it to be REMOVED. After this PR, it also has to be missing from sled_state, which the test doesn't touch. I thought that was fine, but on rereading the test, it explicitly wants to exercise diff output that includes a removed sled. I changed the test to also remove the decommissioned sled from sled_state, and now it's back to REMOVED: fab82fc
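
The rule described above can be sketched roughly as follows. This is a simplified illustration, not the omicron diff code: the `Blueprint` struct, its `zones`, `disks`, and `sled_state` fields, and the `classify` function are hypothetical, and the map values are placeholders. The point is only that a sled counts as removed when it is absent from all three maps in the "after" blueprint.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified, hypothetical blueprint: every map is keyed by sled ID.
#[derive(Default)]
struct Blueprint {
    zones: BTreeMap<String, Vec<String>>,
    disks: BTreeMap<String, Vec<String>>,
    sled_state: BTreeMap<String, String>,
}

impl Blueprint {
    /// Every sled mentioned by zones, disks, or sled_state.
    fn sleds(&self) -> BTreeSet<&String> {
        self.zones
            .keys()
            .chain(self.disks.keys())
            .chain(self.sled_state.keys())
            .collect()
    }
}

/// Returns (added, removed, common) sled IDs between two blueprints.
/// Sleds in `common` would then be compared further to decide whether
/// they are modified or unchanged.
fn classify<'a>(
    before: &'a Blueprint,
    after: &'a Blueprint,
) -> (Vec<&'a String>, Vec<&'a String>, Vec<&'a String>) {
    let b = before.sleds();
    let a = after.sleds();
    let added: Vec<&String> = a.difference(&b).copied().collect();
    let removed: Vec<&String> = b.difference(&a).copied().collect();
    let common: Vec<&String> = b.intersection(&a).copied().collect();
    (added, removed, common)
}
```

With this sketch, removing a sled from `zones` alone leaves it in `common`, because `sled_state` still mentions it. Removing it from `sled_state` as well moves it to `removed`, which mirrors the test change described above.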


Makes sense. Thanks for the details.

@jgallagher jgallagher merged commit 4c72357 into main Sep 13, 2024
16 checks passed
@jgallagher jgallagher deleted the john/omdb-sled-state branch September 13, 2024 13:48
Successfully merging this pull request may close these issues.

omdb blueprint diffs and displays should show sled state