[nexus] Add a schema change to fix instance counter underflow (#5838)
This is a corollary PR to #5830, which fixed the root cause. Due to a bug in the virtual provisioning query, it was possible to undercount virtual provisioning information for instances, which would result in an integer underflow for "total CPU/RAM provisioned" for a {project, silo, fleet}. Although #5830 fixed the root cause, in-field systems that experienced this bug may still hold an invalid value. This PR uses a schema change, exploiting the fact that schema changes occur with instances offline, to reset these values to a known value.
Showing 3 changed files with 32 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
-- This change fixes provisioning counters, alongside the | ||
-- underflow fix provided in https://github.com/oxidecomputer/omicron/pull/5830. | ||
-- Although this underflow has been fixed, it could have resulted | ||
-- in invalid accounting, which is mitigated by this schema change. | ||
-- | ||
-- This update is currently occurring offline, so we exploit | ||
-- that fact to identify that all instances *should* be terminated | ||
-- before racks are updated. If they aren't, and an instance is in the | ||
-- "running" state when an update occurs, the propolis zone would be | ||
-- terminated, while the running database record remains. In this case, | ||
-- the only action we could take on the VMM would be to delete it, | ||
-- which would attempt to delete the "virtual provisioning resource" | ||
-- record anyway. This case is already idempotent, and would be a safe | ||
-- operation even if the "virtual_provisioning_resource" has already | ||
-- been removed. | ||
|
||
SET LOCAL disallow_full_table_scans = OFF; | ||
|
||
-- First, ensure that no instance records exist. | ||
DELETE FROM omicron.public.virtual_provisioning_resource | ||
WHERE resource_type='instance'; | ||
|
||
-- Next, update the collections to identify that there | ||
-- are no instances running. | ||
UPDATE omicron.public.virtual_provisioning_collection | ||
SET | ||
cpus_provisioned = 0, | ||
ram_provisioned = 0; | ||
|
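The effect of the two statements above can be tried locally with a minimal sketch. It uses Python's built-in `sqlite3` module in place of CockroachDB, so the `omicron.public.` schema prefix and the `SET LOCAL disallow_full_table_scans` line are dropped. The table layouts, the `collection_type` and `virtual_disk_bytes_provisioned` columns, and the seed rows are simplified assumptions made for illustration, not the real Omicron schema. Negative counters stand in for the invalid values a deployed system might hold.

```python
import sqlite3

# The two statements from the migration, minus the CockroachDB-specific
# session setting and schema prefix (SQLite has neither).
MIGRATION = """
DELETE FROM virtual_provisioning_resource
WHERE resource_type = 'instance';

UPDATE virtual_provisioning_collection
SET cpus_provisioned = 0,
    ram_provisioned = 0;
"""


def make_db():
    """Build an in-memory database with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE virtual_provisioning_resource (
            id INTEGER PRIMARY KEY,
            resource_type TEXT NOT NULL,
            cpus_provisioned INTEGER NOT NULL,
            ram_provisioned INTEGER NOT NULL
        );
        CREATE TABLE virtual_provisioning_collection (
            id INTEGER PRIMARY KEY,
            collection_type TEXT NOT NULL,
            virtual_disk_bytes_provisioned INTEGER NOT NULL,
            cpus_provisioned INTEGER NOT NULL,
            ram_provisioned INTEGER NOT NULL
        );
        """
    )
    # One leftover instance record and one disk record (hypothetical data).
    conn.executemany(
        "INSERT INTO virtual_provisioning_resource VALUES (?, ?, ?, ?)",
        [(1, "instance", 4, 8 << 30), (2, "disk", 0, 0)],
    )
    # Collections whose CPU/RAM counters are invalid (hypothetical data).
    conn.executemany(
        "INSERT INTO virtual_provisioning_collection VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Project", 1 << 30, -4, -(8 << 30)),
            (2, "Silo", 1 << 30, -4, -(8 << 30)),
            (3, "Fleet", 1 << 30, -4, -(8 << 30)),
        ],
    )
    conn.commit()
    return conn


def apply_migration(conn):
    """Run the migration statements; safe to call more than once."""
    conn.executescript(MIGRATION)


def snapshot(conn):
    """Return (remaining resource types, per-collection counters)."""
    resource_types = [
        row[0]
        for row in conn.execute(
            "SELECT resource_type FROM virtual_provisioning_resource ORDER BY id"
        )
    ]
    collections = conn.execute(
        "SELECT cpus_provisioned, ram_provisioned, "
        "virtual_disk_bytes_provisioned "
        "FROM virtual_provisioning_collection ORDER BY id"
    ).fetchall()
    return resource_types, collections


if __name__ == "__main__":
    db = make_db()
    apply_migration(db)
    print(snapshot(db))
```

In this sketch the disk record and the disk byte counters are left alone, since the migration only touches instance records and the CPU/RAM columns. Applying it a second time produces the same state.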