Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize blueprints in the database #4899

Merged
merged 25 commits into from
Jan 26, 2024
Merged

Conversation

jgallagher
Copy link
Contributor

This replaces the in-memory blueprint storage added as a placeholder in #4804 with cockroachdb-backed tables. Both the tables and related queries are heavily derived from the similar tables in the inventory system (particularly serializing omicron zones and their related properties). The tables are effectively identical as of this PR, but we opted to keep the separate because we expect them to diverge some over time (e.g., inventory might start collecting additional per-zone properties that don't exist for blueprints, such as uptime).

The big exception to "basically the same as inventory" is the bp_target table which tracks the current (and past) target blueprint. Inserting into this table has some subtleties, and we use a CTE to check and enforce the invariants. This is the first diesel/CTE I've written; it's based on other similar CTEs in Nexus, but I'd still appreciate a particularly careful look there.

Fixes #4793.

Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I checked everything here except the function that writes out the SQL directly. I can come back to this but wanted to get this in now.

nexus/src/app/deployment.rs Show resolved Hide resolved
.get(&blueprint_id)
.cloned()
.ok_or_else(|| blueprint.not_found())
self.db_datastore.blueprint_read(opctx, &blueprint).await
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could consider making BlueprintMetadata a resource with its own authz type and lookup type so that people could use the usual LookupPath::new(..).blueprint_id(blueprint_id).fetch().await API instead of a function like this. I'm not sure if that's a good idea and it's definitely not important now.

nexus/src/app/deployment.rs Show resolved Hide resolved
schema/crdb/dbinit.sql Show resolved Hide resolved
schema/crdb/dbinit.sql Outdated Show resolved Hide resolved
schema/crdb/dbinit.sql Outdated Show resolved Hide resolved
schema/crdb/dbinit.sql Outdated Show resolved Hide resolved
#[derive(Queryable, Clone, Debug, Selectable, Insertable)]
#[diesel(table_name = bp_target)]
pub struct BpTarget {
pub version: i64, // i64 only for db serialization; should never be negative
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use one of the SqlU* types instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, switched to SqlU32. I wasn't 100% sure that was big enough to never worry about exhausting it, but I think I've convinced myself it's fine. Even if runaway automation set a new target every second (which seems pretty unlikely, since that first requires generating and inserting a new blueprint in its entirety), u32 is fine for 100+ years.

nexus/db-model/src/deployment.rs Outdated Show resolved Hide resolved
Comment on lines +879 to +885
// Error messages generated from the above sentinel values.
const NO_SUCH_BLUEPRINT_ERROR_MESSAGE: &str =
"could not parse \"no-such-blueprint\" as type uuid: \
uuid: incorrect UUID length: no-such-blueprint";
const PARENT_NOT_TARGET_ERROR_MESSAGE: &str =
"could not parse \"parent-not-target\" as type uuid: \
uuid: incorrect UUID length: parent-not-target";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure these are stable messages. But I guess as long as we test the behavior, we should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on both counts. FWIW this is not a new technique; I cribbed it from

// Error message generated when we're attempting to operate on an instance,
// either inserting or deleting an interface, and that instance does not exist
// at all or has been destroyed. These are the same thing from the point of view
// of the client's API call.
const NO_INSTANCE_ERROR_MESSAGE: &'static str =
"could not parse \"no-instance\" as type uuid: uuid: incorrect UUID length: no-instance";
, which also has tests covering it.

Copy link
Contributor

@andrewjstone andrewjstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Epic work @jgallagher. I particularly like the tests!

pub bp_nic_id: Option<Uuid>,
pub dns_gz_address: Option<ipv6::Ipv6Addr>,
pub dns_gz_address_index: Option<SqlU32>,
pub ntp_ntp_servers: Option<Vec<String>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ntp_ntp intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is - these are NTP zone's NTP servers.

schema/crdb/dbinit.sql Outdated Show resolved Hide resolved
use uuid::Uuid;

#[derive(Debug)]
pub(crate) struct OmicronZone {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question here, nothing to do in response.

It looks like this is essentially flattening the OmicronZoneType enum into a struct full of options. Is this because it's not really possible to represent ZoneType in the DB without flattening?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not that it's not possible, it's just that it's a lot of work with dubious payoff. At one point I think Dave had 3 different impls at least partly done / sketched out: fully (or mostly) normalized, this one, and "just stuff JSON in there". This seemed like the most reasonable one: we get almost all the benefits of putting this into the DB schema, without all the diesel pain of working with a dozen tables.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, got it. This is what you guys were discussing the other day. Thanks.

ensure!(expected_id == nic_row.id, "caller provided wrong NIC");
Ok(nic_row.into_network_interface_for_zone(self.id)?)
}
(None, None) => Err(anyhow!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long rambling nit below:

I was confused by the (None, None) case, even though it's documented well in the block comment. I think this was partially my brain not functioning when reviewing yesterday and it became clear when I read the rest of the function.

However, while I like the pattern I still find it somewhat confusing to mix an early return of a result with an error that will be returned later. I think this could be made clearer if you inlined a comment for the (None, None) case. Maybe that's just moving the last paragraph of the block comment and slightly rewording it. While I typically dislike inlined comments like this for small match statements it may work better here. I'm interested in other's opinions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an inline comment in 0c14431

nexus/db-queries/src/db/datastore/deployment.rs Outdated Show resolved Hide resolved
nexus/db-queries/src/db/datastore/deployment.rs Outdated Show resolved Hide resolved
nexus/db-queries/src/db/datastore/deployment.rs Outdated Show resolved Hide resolved
) {
let conn = datastore.pool_connection_for_tests().await.unwrap();

macro_rules! query_count {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From embedded closures to embedded macros, the John Gallagher story.

@jgallagher jgallagher merged commit 80cc001 into main Jan 26, 2024
22 checks passed
@jgallagher jgallagher deleted the john/blueprint-database-1 branch January 26, 2024 21:44
andrewjstone added a commit that referenced this pull request Jan 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DB Serialization for Blueprints
3 participants