This bit of data migration from #4261 did not behave as expected on dogfood. The `default` pool was a fleet-level default, and there was a second pool, `oxide-pool`, associated directly with the `oxide` silo but as non-default. In this situation, as the comment describes, the migration should end up associating both `default` and `oxide-pool` with the `oxide` silo, with `default` keeping `is_default = true` and `oxide-pool` getting `is_default = false`. When we ran the update, however, both had `is_default = false`, leaving the `oxide` silo without a default pool.
omicron/schema/crdb/23.0.0/up4.sql, lines 14 to 28 at 624fbba

This is despite our lovely test for this very scenario. Indeed, this is the change these tests were added for.

omicron/nexus/tests/integration_tests/schema.rs, lines 1022 to 1026 at 624fbba:

```
// pool3 did not previously have a corresponding silo, so now it's associated
// with both silos as a new resource in each.
//
// Additionally, silo1 already had a default pool (pool1), but silo2 did
// not have one. As a result, pool3 becomes the new default pool for silo2.
```
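For concreteness, here is a hedged sketch of how the symptom could be observed in CRDB after the migration ran. The association table and its columns (`ip_pool_resource`, `ip_pool_id`, `resource_id`, `is_default`) are assumptions for illustration, not names taken from this issue:

```
-- Sketch only: list the pools linked to the oxide silo and their is_default
-- flags. Expected: default -> true, oxide-pool -> false.
-- Observed on dogfood: both false.
SELECT p.name AS pool, r.is_default
  FROM omicron.public.ip_pool_resource AS r
  JOIN omicron.public.ip_pool AS p ON p.id = r.ip_pool_id
  JOIN omicron.public.silo AS s ON s.id = r.resource_id
 WHERE s.name = 'oxide'
   AND p.time_deleted IS NULL;
```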
Closes #4875
## Problem
After the IP pools migrations on the dogfood rack, the `default` pool
was not marked `is_default=true` for the `oxide` silo when it should
have been.
## Diagnosis
When checking for silo-scoped default pools overriding a fleet-scoped
default, I neglected to require that the silo-scoped defaults in
question were non-deleted. This means that if there was a deleted pool
with `silo_id=<oxide silo id>` and `is_default=true`, that would be
considered an overriding default and leave us with `is_default=false` on
the `default` pool.
Well, I can't check `silo_id` and `is_default` on the pools because
those columns have been dropped, but there is a deleted pool called
`oxide-default` that says in the description it was meant as the default
pool for only the `oxide` silo.
```
root@[fd00:1122:3344:105::3]:32221/omicron> select * from omicron.public.ip_pool;
id | name | description | time_created | time_modified | time_deleted | rcgen
---------------------------------------+--------------------+--------------------------------+-------------------------------+-------------------------------+-------------------------------+--------
1efa49a2-3f3a-43ab-97ac-d38658069c39 | oxide-default | oxide silo-only pool - default | 2023-08-31 05:33:00.11079+00 | 2023-08-31 05:33:00.11079+00 | 2023-08-31 06:03:22.426488+00 | 1
```
I think we can be pretty confident this is what got us.
## Fix
Add `AND time_deleted IS NULL` to the subquery.
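For illustration, a minimal sketch of the corrected check, assuming the subquery is roughly an `EXISTS` over `ip_pool`; the actual text lives in `omicron/schema/crdb/23.0.0/up4.sql`, and the placeholder silo id is hypothetical. At the point the migration runs, `ip_pool` presumably still carries its old `silo_id` and `is_default` columns, since the migration reads them before they are dropped:

```
-- Hedged sketch of the subquery shape, not the actual migration SQL:
-- a silo-scoped default should only override the fleet-level default
-- if that pool is still live.
SELECT EXISTS (
    SELECT 1
      FROM omicron.public.ip_pool
     WHERE silo_id = '<oxide silo id>'   -- placeholder
       AND is_default = true
       AND time_deleted IS NULL          -- the fix: deleted pools don't count
) AS silo_has_live_default;
```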
## Mitigation in existing systems
Already done. Dogfood is the only long-running system where the bad
migration ran, and all I had to do there was use the API to set
`is_default=true` for the (`default` pool, `oxide` silo) link.