-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix IP pools data migration #4903
Conversation
|
||
// fleet-level pool that was not default becomes non-default on all silos | ||
assert_row(&ip_pool_resources[4].values, POOL4, SILO1, false); | ||
assert_row(&ip_pool_resources[5].values, POOL4, SILO2, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These asserts are the same as before, there are just a couple more cases, and I made them denser so I could stare at them more easily for 5 hours.
FROM omicron.public.ip_pool p0 | ||
WHERE p0.silo_id = s.id | ||
AND p0.is_default | ||
AND p0.time_deleted IS NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and time_deleted is null
is the fix.
@@ -975,9 +977,11 @@ fn before_23_0_0(client: &Client) -> BoxFuture<'_, ()> { | |||
// no corresponding silo. | |||
client.batch_execute(&format!("INSERT INTO ip_pool | |||
(id, name, description, time_created, time_modified, time_deleted, rcgen, silo_id, is_default) VALUES | |||
('{POOL0}', 'pool2', '', now(), now(), now(), 1, '{SILO2}', true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this deleted pool added to the test data but the fix commented out, the test fails in the way we expect, which matches the problem on dogfood: the (pool3
, silo2
) link gets is_default=false
instead of is_default=true
.
Closes #4875
Problem
After the IP pools migrations on the dogfood rack, the
default
pool was not markedis_default=true
for theoxide
silo when it should have been.Diagnosis
When checking for silo-scoped default pools overriding a fleet-scoped default, I neglected to require that the silo-scoped defaults in question were non-deleted. This means that if there was a deleted pool with
silo_id=<oxide silo id>
andis_default=true
, that would be considered an overriding default and leave us withis_default=false
on thedefault
pool.Well, I can't check
silo_id
andis_default
on the pools because those columns have been dropped, but there is a deleted pool calledoxide-default
that says in the description it was meant as the default pool for only theoxide
silo.I think we can be pretty confident this is what got us.
Fix
Add
AND time_deleted IS NULL
to the subquery.Mitigation in existing systems
Already done. Dogfood is the only long-running system where the bad migration ran, and all I had to do there was use the API to set
is_default=true
for the (default
pool,oxide
silo) link.