Skip to content

Commit

Permalink
[nexus] Add a schema change to fix instance counter underflow (#5838)
Browse files Browse the repository at this point in the history
This is a corollary PR to
#5830 , which fixed the
root cause.

Due to a bug in the virtual provisioning query, it was possible to
undercount virtual provisioning information for
instances, which would result in an integer underflow for "total CPU/RAM
provisioned" for a {project, silo, fleet}.

Although #5830 fixed the root cause, it's possible that in-field systems
have an invalid value if they experienced this bug.

This PR uses a schema change, exploiting the fact that schema changes
occur with instances offline, to reset these values to a known value.
  • Loading branch information
smklein authored Jun 6, 2024
1 parent 01bc9ad commit 4fd9dd2
Show file tree
Hide file tree
Showing 3 changed files with 32 additions and 2 deletions.
3 changes: 2 additions & 1 deletion nexus/db-model/src/schema_versions.rs
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ use std::collections::BTreeMap;
///
/// This must be updated when you change the database schema. Refer to
/// schema/crdb/README.adoc in the root of this repository for details.
pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(71, 0, 0);
pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(72, 0, 0);

/// List of all past database schema versions, in *reverse* order
///
Expand All @@ -29,6 +29,7 @@ static KNOWN_VERSIONS: Lazy<Vec<KnownVersion>> = Lazy::new(|| {
// | leaving the first copy as an example for the next person.
// v
// KnownVersion::new(next_int, "unique-dirname-with-the-sql-files"),
KnownVersion::new(72, "fix-provisioning-counters"),
KnownVersion::new(71, "add-saga-unwound-vmm-state"),
KnownVersion::new(70, "separate-instance-and-vmm-states"),
KnownVersion::new(69, "expose-stage0"),
Expand Down
2 changes: 1 addition & 1 deletion schema/crdb/dbinit.sql
Original file line number Diff line number Diff line change
Expand Up @@ -4076,7 +4076,7 @@ INSERT INTO omicron.public.db_metadata (
version,
target_version
) VALUES
(TRUE, NOW(), NOW(), '71.0.0', NULL)
(TRUE, NOW(), NOW(), '72.0.0', NULL)
ON CONFLICT DO NOTHING;

COMMIT;
29 changes: 29 additions & 0 deletions schema/crdb/fix-provisioning-counters/up.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
-- This change fixes provisioning counters, alongside the
-- underflow fix provided in https://github.com/oxidecomputer/omicron/pull/5830.
-- Although this underflow has been fixed, it could have resulted
-- in invalid accounting, which is mitigated by this schema change.
--
-- This update is currently occurring offline, so we exploit
-- that fact to identify that all instances *should* be terminated
-- before racks are updated. If they aren't, and an instance is in the
-- "running" state when an update occurs, the propolis zone would be
-- terminated, while the running database record remains. In this case,
-- the only action we could take on the VMM would be to delete it,
-- which would attempt to delete the "vritual provisioning resource"
-- record anyway. This case is already idempotent, and would be a safe
-- operation even if the "virtual_provisioning_resource" has already
-- been removed.

SET LOCAL disallow_full_table_scans = OFF;

-- First, ensure that no instance records exist.
DELETE FROM omicron.public.virtual_provisioning_resource
WHERE resource_type='instance';

-- Next, update the collections to identify that there
-- are no instances running.
UPDATE omicron.public.virtual_provisioning_collection
SET
cpus_provisioned = 0,
ram_provisioned = 0;

0 comments on commit 4fd9dd2

Please sign in to comment.