Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[nexus] Add zpool to inventory #5249

Merged
merged 27 commits into from
Mar 15, 2024
Merged

[nexus] Add zpool to inventory #5249

merged 27 commits into from
Mar 15, 2024

Conversation

smklein
Copy link
Collaborator

@smklein smklein commented Mar 13, 2024

This PR adds zpools to the inventory, and updates the existing omicron.public.zpool table to only include aspects of the zpool which are fully controlled by Nexus.

This PR is intended to solve a problem introduced by #5172 . In that PR, physical disk and zpool provisioning will be decided by Nexus, and sent to the Sled Agent as a request. As a result, the total_size value of zpools is not known when a pool is provisioned. Here, we add zpools to the inventory, and JOIN with that table to access the "total_size" of the zpool if it is known.

Additionally, this PR adds a test to validate that the database (appropriately) throws "Insufficient capacity" errors if the Zpool exists, but has not appeared in the inventory.

nexus/db-model/src/zpool.rs Outdated Show resolved Hide resolved
@smklein
Copy link
Collaborator Author

smklein commented Mar 13, 2024

This PR basically implements the following:
image

In which Nexus can "PUT" zpools to the sled before knowing their size, and relies on the inventory (+ this background task) to update zpool information via scanning.

But I'm wondering if we'd prefer to just do the following instead:
image

In which we avoid the background task altogether, and simply await "all relevant zpool info" in the response to omicron_physical_disks PUT.

@smklein
Copy link
Collaborator Author

smklein commented Mar 13, 2024

Chatted with @davepacheco a little about this PR, here's our consensus:

  • Keep "control plane zpool info" separate from "reported by inventory zpool info". This means no need for a background task to synchronize them.
  • Make sure the inventory representation of "zpool info" has a timestamp, and an index by that timestamp.
  • Anyone interested in observing the inventory info (like the region allocation query) can look for "the latest value of that zpool info from the inventory". It's okay if this is slightly stale.

Base automatically changed from disk-in-inventory to main March 13, 2024 21:05
@smklein smklein changed the title [wip] Add zpool in inventory, update size via background task [nexus] Add zpool to inventory Mar 14, 2024
@@ -316,7 +316,11 @@ impl CandidateZpools {
// into a BigInt, not a Numeric. I welcome a better solution.
let it_will_fit = (old_zpool_usage::dsl::size_used
+ diesel::dsl::sql(&zpool_size_delta.to_string()))
.le(diesel::dsl::sql(zpool_dsl::total_size::NAME));
.le(diesel::dsl::sql(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hours spent trying to make this work with Diesel, getting pushed away from subqueries, fighting aliases, and then fumbling with full-table scans as I tried to split this into a separate query:

~4

I am still motivated to make #5063 work after getting through this disk expungement stuff.

@@ -351,6 +353,17 @@ impl nexus_test_interface::NexusServer for Server {
.unwrap();
}

async fn inventory_collect_and_get_latest_collection(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API is part of the nexus_test_interface API, and is basically punched through so tests which create synthetic disks can also "cause inventory collection to happen". They need it, to get the inv_zpool table populated.

@smklein smklein marked this pull request as ready for review March 14, 2024 03:00
@smklein
Copy link
Collaborator Author

smklein commented Mar 14, 2024

I need to take a lap through the instructions here for fixing the reconfigurator "test_complex" test (see: #5145 (comment)), but this is otherwise ready for review

@smklein
Copy link
Collaborator Author

smklein commented Mar 14, 2024

Also, for completeness, we ended up with an implementation like this, rather than the sequence diagrams above:

image

@@ -1156,6 +1198,103 @@ mod test {
logctx.cleanup_successful();
}

#[tokio::test]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very readable test!

schema/crdb/dbinit.sql Outdated Show resolved Hide resolved
Copy link
Contributor

@sunshowers sunshowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fantastic, thanks!

nexus/db-queries/src/db/datastore/mod.rs Outdated Show resolved Hide resolved
@smklein smklein merged commit f63707a into main Mar 15, 2024
21 checks passed
@smklein smklein deleted the zpool-in-inventory branch March 15, 2024 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants