-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include enums in dbinit_equal_sum_of_all_up
test
#4472
Conversation
sum_of_all_up
check
sum_of_all_up
checkdbinit_equal_sum_of_all_up
test
@@ -168,13 +168,14 @@ async fn query_crdb_schema_version(crdb: &CockroachInstance) -> String { | |||
// | |||
// Note that for the purposes of schema comparisons, we don't care about parsing | |||
// the contents of the database, merely the schema and equality of contained data. | |||
#[derive(Eq, PartialEq, Clone, Debug)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to get rid of Eq
to make this work because floats don't impl Eq
. Fortunately it seems our equality checks only need PartialEq
anyway.
e7845ff
to
5d670bb
Compare
5d670bb
to
f731961
Compare
I spent some time reading through Cockroach's RFC for adding enums to get a little bit of a better understanding of how they're implemented. Inserts at the beginning or the middle of a enum are supported and work fine w/ alter. In the cases that were updated in the db init, the migration itself was wrong. For the -ALTER TYPE omicron.public.service_kind ADD VALUE IF NOT EXISTS 'clickhouse_keeper';
+ALTER TYPE omicron.public.service_kind ADD VALUE IF NOT EXISTS 'clickhouse_keeper' AFTER 'clickhouse'; I think this justifies the test if nothing else. Running dbinit on an empty db should result in the same enum as running the migrations from the first schema version. I don't know what bugs, if any, we might run into here, but this does seem like a correctness case that the test does a good job of solving. |
schema/crdb/dbinit.sql
Outdated
@@ -191,7 +191,6 @@ CREATE UNIQUE INDEX IF NOT EXISTS lookup_switch_by_rack ON omicron.public.switch | |||
|
|||
CREATE TYPE IF NOT EXISTS omicron.public.service_kind AS ENUM ( | |||
'clickhouse', | |||
'clickhouse_keeper', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure what I'm about to say is absolutely taboo, but I think we should actually update the migration and not this. The specific update would be an AFTER 'clickhouse'
being added to the alter clause of the enum. For enums where clickhouse
already exists that migration step won't be re-run, but for new systems it'll be the same as what we have today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's very cool. I didn't know you could insert it into a particular spot. We might have issues with the order number comparison if we do that. I think when you insert in the middle of two existing members, it's going to give it an decimal order number halfway between them. Then again I think we will also have this problem if we ever delete a variant. Important not-really-edge cases to consider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't be an issue because the enum isn't stored as a decimal. You can find an overview of the format here. Postgres stores enums as floats so you'd [1.0, 2.0]
and inserting a value between them would end up with [1.0, 1.5, 2.0]
. CRDB just does some fancy byte index padding which essentially ends up being a similar concept. So this should still work as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m saying those order numbers won’t match what’s generated from scratch in dbinit.sql, which would be [1.0, 2.0, 3.0]
, so the test will fail as currently written.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope! To my surprise, this still passes. I don't think we can change the migrations that have already been applied on customer racks, but I agree the AFTER
trick is what we should do in the future.
diff --git a/schema/crdb/4.0.0/up1.sql b/schema/crdb/4.0.0/up1.sql
index b26b646d6..ba67d8394 100644
--- a/schema/crdb/4.0.0/up1.sql
+++ b/schema/crdb/4.0.0/up1.sql
@@ -1 +1 @@
-ALTER TYPE omicron.public.service_kind ADD VALUE IF NOT EXISTS 'clickhouse_keeper';
+ALTER TYPE omicron.public.service_kind ADD VALUE IF NOT EXISTS 'clickhouse_keeper' AFTER 'clickhouse';
diff --git a/schema/crdb/4.0.0/up2.sql b/schema/crdb/4.0.0/up2.sql
index 018d39c13..e71cb22f3 100644
--- a/schema/crdb/4.0.0/up2.sql
+++ b/schema/crdb/4.0.0/up2.sql
@@ -1 +1 @@
-ALTER TYPE omicron.public.dataset_kind ADD VALUE IF NOT EXISTS 'clickhouse_keeper';
+ALTER TYPE omicron.public.dataset_kind ADD VALUE IF NOT EXISTS 'clickhouse_keeper' AFTER 'clickhouse';
diff --git a/schema/crdb/dbinit.sql b/schema/crdb/dbinit.sql
index 6e8ab0e39..da842cbfe 100644
--- a/schema/crdb/dbinit.sql
+++ b/schema/crdb/dbinit.sql
@@ -191,6 +191,7 @@ CREATE UNIQUE INDEX IF NOT EXISTS lookup_switch_by_rack ON omicron.public.switch
CREATE TYPE IF NOT EXISTS omicron.public.service_kind AS ENUM (
'clickhouse',
+ 'clickhouse_keeper',
'cockroach',
'crucible',
'crucible_pantry',
@@ -201,7 +202,6 @@ CREATE TYPE IF NOT EXISTS omicron.public.service_kind AS ENUM (
'ntp',
'oximeter',
'tfport',
- 'clickhouse_keeper',
'mgd'
);
@@ -402,9 +402,9 @@ CREATE TYPE IF NOT EXISTS omicron.public.dataset_kind AS ENUM (
'crucible',
'cockroach',
'clickhouse',
+ 'clickhouse_keeper',
'external_dns',
- 'internal_dns',
- 'clickhouse_keeper'
+ 'internal_dns'
);
/*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have to leave the migrations as is we should at least comment on why the enum arms are out of order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this - I feel bad that the initial db comparison tooling wasn't comprehensive enough (ugh why would CRDB silently drop user-defined types from InformationSchema) but I really appreciate you handling this case.
similar_asserts::assert_eq!( | ||
self.enums, | ||
other.enums, | ||
"Enums did not match. Members must have the same order in dbinit.sql and migrations. If a migration adds a member, it should be added at the end of the definition in dbinit.sql." |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
schema/crdb/dbinit.sql
Outdated
@@ -202,6 +201,7 @@ CREATE TYPE IF NOT EXISTS omicron.public.service_kind AS ENUM ( | |||
'ntp', | |||
'oximeter', | |||
'tfport', | |||
'clickhouse_keeper', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zephraph asked the following extremely relevant question:
is it unsafe to update the migrations to add the AFTER clause? That migration won't rerun on customer deployments, right? Even if it did, the enum is already present so it wouldn't try to readd it
So there are a few cases we need to consider:
- Someone hasn't deployed yet. In this case, whatever dbinit.sql says goes, so it "matches" trivially during formatting.
- Someone ran with an old version of dbinit.sql , and applied the schema migrations to get to this order.
Before this PR:clickhouse_keeper
appears at the end of the enum in the deployed DB, which doesn't match dbinit.sql
After this PR:dbinit.sql
matches the order applied by schema changes. - Someone ran with a newish version of dbinit.sql, and hasn't applied any schema changes.
Before this PR:clickhouse_keeper
does not appear at the end of the enum in the deployed DB.
After this PR: the enum order is no longer correct! In the deployed DB,clickhouse_keeper
is early in the order, but this doesn't matchdbinit.sql
, which thinks it shows up later.
By making this change, we "fix" case (2), but we "break" case (3).
I point this out to identify: if you rely on enum order after this change, there are still deployments that exist where we do not have such a guarantee.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this is to say -- we can make this change, and if you want, but I think it's still unsafe to rely on order of our enums with possible deployments unless we add a "schema change" that validates order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also, to be explicit, the changes that you've added to the integration tests are extremely useful still, in either situation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a pain-in-the-ass, but we could:
- Add a new enum variant to service_kind -- at the end -- called clickhouse_keeper2
- Update omicron.public.service_kind usage, anywhere that clickhouse_keeper is used, update the index to clickhouse_keeper2
- Drop the clickhouse_keeper variant (see: https://www.cockroachlabs.com/docs/v23.1/alter-type#drop-a-value-in-a-user-defined-type)
This is a "full table scan" operation but the data sizes are really small so I don't think it's so bad
Then we add that as a schema change while updating dbinit.sql at the same time
New deployments -> Use dbinit.sql, new order
Old deployments -> Use the schema change, land on the same order as dbinit.sql
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a minor note... it doesn't have to be at the end. It'll still work being inserted after clickhouse
as we intended clickhouse_keeper
to originally be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this approach (even if it makes me slightly nervous b/c we have to do a small amount of data migration). There are two primary justifications that lead me to believe this is a good idea
- Given that enums are ordered, having this integration test is valuable. It's caught this case where migrations result in an effectively different enum shape than what we intended in dbinit
- We have an active case of database bifurcation. Installations which were or will be provisioned from a schema version greater 4.0.0 will effectively have different values for the
service_kind
anddataset_kind
than those provisioned with an earlier version that had to run that migration. I don't have a full understanding of what challenges this could cause outside of ordering issues but it seems safer to sand down this sharp edge before we cut ourselves on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I did it in 338828a. Rather than keep clickhouse_keeper2
around to confuse people in perpetuity, I decided to do the same thing again to change it back to clickhouse_keeper
after fixing the order.
See the readme I added for the migration summarizing what it does: https://github.com/oxidecomputer/omicron/blob/338828a24837790f7b96a30dde7aecf7963748d0/schema/crdb/10.0.0/README.md
Still working on the above comments, but the
|
e5f8e46
to
338828a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes make sense to me -- good catch to not leave clickhouse_keeper2
lying around!
Thanks for doing this!
Only the “nexus applies update on boot” test fails on our friend the table scan prohibition. Easiest fix would be to temporarily turn that flag off.
|
I believe that we can narrowly enable the FTS for only the impacted query: omicron/nexus/db-queries/src/db/queries/mod.rs Lines 21 to 41 in 4331828
|
While working on #4261 I
ran intocreated a situation where the order of items in a DB enum matters because I want to sort by it. In this case it's about specificity — the type issilo
orfleet
and I want to sort by it to ensure I get the most "specific" thing first. @zephraph noticed I had an order mismatch between my migration adding the enum and the copy indbinit.sql
, and it wasn't caught by the tests.So, here I've added the list of user-defined enums to the set of things we compare when we ask whether the result of our migrations matches the contents of
dbinit.sql
. There were two existing enums that failed the test because the copy indbinit.sql
was sorted alphabetically, but a migration added a new item at the end of the list. Very easy to fix, though now are enums aren't sorted nicely, which is sad.The question is: should we enforce enum order? Seems like we might as well if it doesn't cost us much. But should we write application logic like #4261 that relies on enum order? In that case, at least, I have a reasonable alternative because the list of things I'm sorting is at most 2 long, so I can easily get the whole list and pick the element I want in app code instead of doing an
order by
+limit 1
in SQL. It is made quite clear in a comment, but it still feels a little dicey.