# DB Serialization for Blueprints #4793
Some dimensions to consider:

### How many tables do we need to represent the blueprint?

At a base level, I'm assuming we have a blueprint structure like the following:

```sql
CREATE TABLE omicron.public.blueprint (
    id UUID NOT NULL,
    parent_id UUID,
    ...
);
```

But there's a question of "how do we represent the blueprint targets, like the set of active sleds?"

**1. The entirely in-DB approach**

We create a new table like this:

```sql
CREATE TABLE omicron.public.blueprint_sled_record (
    -- References `omicron.public.sled`
    sled_id UUID NOT NULL,
    -- References `omicron.public.blueprint`
    blueprint_id UUID NOT NULL
);
```

We create these records anew for each blueprint; creating a new blueprint means inserting one such record for every sled it includes.
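As a sketch of what this buys us (using only the tables above plus the existing `omicron.public.sled` table), membership questions can be answered entirely by the database:

```sql
-- Which sleds belong to a particular blueprint?
SELECT sled_id
FROM blueprint_sled_record
WHERE blueprint_id = $blueprint_id;

-- Which sleds exist but are NOT part of that blueprint?
SELECT id
FROM omicron.public.sled
WHERE id NOT IN (
    SELECT sled_id
    FROM blueprint_sled_record
    WHERE blueprint_id = $blueprint_id
);
```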
Pros: Easy + fast to index using the DB alone when determining "which sleds do/don't belong to a particular blueprint".

**2. The mostly dense blueprint, more in-memory approach**

We add a `sleds` column (an array of sled UUIDs) directly to the blueprint table. This would look something like the following:

```sql
CREATE TABLE omicron.public.blueprint (
    id UUID NOT NULL,
    parent_id UUID,
    sleds UUID[],
    ...
);
```
Pros: Much cheaper to create / maintain blueprints. We can use Rust to do set management rather than SQL, which is kinda nice.
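For completeness, a minimal sketch of how the dense layout might still answer simple membership questions in SQL via array operators, assuming the `sleds UUID[]` column above:

```sql
-- Fetch the whole sled set; diffing / set management then happens in Rust.
SELECT sleds FROM blueprint WHERE id = $blueprint_id;

-- Or check a single sled's membership directly with an array operator.
SELECT id FROM blueprint WHERE $sled_id = ANY (sleds);
```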
---

### How do we manage concurrency control with the blueprint?

If we use the latest blueprint as a "goalpost", there are a lot of downstream tasks that would like to consume this information (e.g., creating DNS records for the allocated services / sleds, deciding which disks are active/inactive, deciding which sleds are valid targets for instance allocation).

**1. Store a marker on the blueprint to signify it's the latest; check it in downstream tasks**

For example, this proposes the following structure:

```sql
CREATE TABLE omicron.public.blueprint (
    id UUID NOT NULL,
    parent_id UUID,
    ...
);

-- We do a similar thing for storing the "DB version"; the
-- singleton lets everyone access the "implied latest thing".
--
-- If we had a "fleet UUID", we could use that as the primary
-- key here too.
CREATE TABLE omicron.public.blueprint_metadata (
    singleton BOOL NOT NULL PRIMARY KEY,
    blueprint_id UUID UNIQUE REFERENCES blueprint (id),
    CHECK (singleton = true)
);
```

If an operation wants to read information about the latest blueprint, it can do so by accessing a cached, in-memory copy, and then validate that the blueprint has not changed by comparing against the metadata. For example:
```sql
-- Do the new work based on the blueprint...
UPDATE <some auxiliary table> SET <new values>
-- ...validating that the old blueprint has not changed. This could be
-- modified to return an explicit error instead, if that would be clearer.
WHERE $old_blueprint_id = (SELECT blueprint_id FROM blueprint_metadata
                           WHERE singleton = TRUE);
```

The idea here is optimistic concurrency control, similar to our usage of "rcgen" elsewhere (and we could insert generation numbers explicitly into the `blueprint` table too). We use a similar concept for concurrency control from Nexus -> external services (DNS servers take a generation number as input, as do sled agents when accepting new zone configurations). Within Nexus, there are many RPWs that may want to consume the contents of the blueprint. By using optimistic concurrency control, they can safely mutate the state of the database, conditional on the assumption that the blueprint has not changed while their calculations took place.

**2. Operate entirely in the DB**

This requires "the entirely in-DB approach" from the comment above: #4793 (comment). Basically: when acting on some information from the blueprint, perform all the lookups and validation entirely within the context of a SQL operation. For example, to select the set of active sleds:
```sql
SELECT sled_id
FROM blueprint_sled_record
INNER JOIN blueprint ON blueprint_sled_record.blueprint_id = blueprint.id
WHERE blueprint_id = (SELECT blueprint_id FROM blueprint_metadata
                      WHERE singleton = TRUE);
```

Different from option (1), this requires no "in-memory caching" of blueprint information, since the records can be acted upon entirely within SQL. This option is arguably the "least likely to generate retries", but also the least flexible, as it requires all execution to be done purely in SQL, which may impose usage of CTEs more heavily.

**3. Cache a "pretty recent" blueprint in-memory, hope for the best**

This option omits concurrency control altogether. We read a "pretty recent" copy of the blueprint into memory, act on it, and keep it up-to-date periodically. Downstream tasks may operate on old versions of the blueprint, and we just trust that they'll be "eventually consistent". This is definitely the simplest option, but it has drawbacks: many different Nexuses can act with different blueprints as "targets" simultaneously, which can cause problems for downstream tasks.

(Admission from Sean: I don't really understand how this option would be viable, but @davepacheco mentioned it to me in chat: "Also, the whole idea is phrasing planning+execution such that it's generally okay to attempt to execute an older blueprint".)
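To make option (2) concrete, here's a minimal sketch of a downstream write performed entirely in SQL; `provision_state` is a hypothetical column standing in for whatever per-sled state a downstream task maintains:

```sql
-- Hypothetical: mark sleds absent from the latest blueprint as
-- ineligible for new instance placement, in a single statement that
-- re-reads the target blueprint as it executes.
UPDATE omicron.public.sled
SET provision_state = 'non_provisionable'
WHERE id NOT IN (
    SELECT sled_id
    FROM blueprint_sled_record
    WHERE blueprint_id = (SELECT blueprint_id FROM blueprint_metadata
                          WHERE singleton = TRUE)
);
```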
---

Follow-up from a discussion with @jgallagher: one more idea for "concurrency control" with the blueprint would be to store one or more generation numbers in the blueprint structure itself, and for downstream consumers to use those.

**4. Create generation numbers within the blueprint**

For example: the DNS system currently has a generation number column. One difficulty with blueprints is the issue of "deciding when and how to update this column".

Example data flow:

NOTE: The "Generation Number as External Service concurrency control" mechanism already exists today, and is also part of our API for requesting new services. However, the usage of "Generation Number as CRDB concurrency control" would be somewhat new -- in particular, the choice to have it be generated by the blueprint would be a novel behavior.
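A minimal sketch of what option (4) might look like in the schema; the `dns_generation` column name is an assumption for illustration:

```sql
CREATE TABLE omicron.public.blueprint (
    id UUID NOT NULL,
    parent_id UUID,
    -- Hypothetical: bumped relative to the parent blueprint whenever
    -- this blueprint implies a DNS change. Downstream consumers pass it
    -- to DNS servers, which reject configs older than the latest
    -- generation they have already seen.
    dns_generation INT8 NOT NULL,
    ...
);
```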
---

#4804 calls this the "target" blueprint. A previous version did call it "goal" (and let me know if I missed a spot!). I don't really care which we use, but I think we should use consistent terminology.
I think it may be important to distinguish uses of the blueprint that are part of execution vs. those that aren't. So far, we've designed execution mechanisms that can correctly ignore attempts to use an older version (the DNS version and the OmicronZonesConfig version). That may not be possible for something like "deciding which sleds are valid targets for instance allocation". But that choice is always racy -- a sled might catch fire immediately after we decide to put an instance there. So how much does it matter if we make a slightly stale choice, if the system detects that quickly and corrects it? I think we want to treat each of these uses of blueprint data individually until we have enough to generalize.
I think I've been assuming the "entirely in-DB" approach. I think that's pretty straightforward for the Omicron zones -- we've already done the same thing on the inventory side. I'm not sure if sleds need any representation in the blueprint yet. I've been thinking a bit about the sled lifecycle, going through states like "in-service", "draining", and "decommissioned". I think we could keep storing that in the `sled` table.
I have been assuming that blueprints are immutable and that if we wanted to change the intended state of the system, we'd create a new blueprint. If blueprints could change and had a generation number or something, when would you bump that vs. generate a new blueprint?
Option 2 (John's idea) sounds similar to a proposal that @smklein and I discussed in chat a few weeks ago. That was basically:
I think this is not quite the same, but I don't think I follow John's idea yet.
---

Closing the loop a little bit -- I wrote up a lot of ideas in this issue as an attempt to understand "how should downstream tasks consume the blueprint, to decide what work needs to be done?". I originally did this writing with the idea that downstream tasks would want to read the blueprint, in some form, to understand what work they need to do. However, as I'm better understanding the executor, it sounds like the blueprint executor is responsible for reading the blueprint and writing state (in the form of records, updated state, etc.) that triggers these background tasks to do work. This means that the executor (optionally) forms a layer of indirection between "the blueprint" and "tasks which consume the blueprint". Whether the "versions" or "generation numbers" live in the blueprint, in some "queued operations table", or directly in the state field of a table being acted upon is kind of an implementation detail that I suppose we'll figure out as we create more of these tasks acting downstream from the blueprint.
---

This replaces the in-memory blueprint storage added as a placeholder in #4804 with CockroachDB-backed tables. Both the tables and related queries are _heavily_ derived from the similar tables in the inventory system (particularly for serializing Omicron zones and their related properties). The tables are effectively identical as of this PR, but we opted to keep them separate because we expect them to diverge somewhat over time (e.g., inventory might start collecting additional per-zone properties that don't exist for blueprints, such as uptime). The big exception to "basically the same as inventory" is the `bp_target` table, which tracks the current (and past) target blueprint. Inserting into this table has some subtleties, and we use a CTE to check and enforce the invariants. This is the first Diesel CTE I've written; it's based on other similar CTEs in Nexus, but I'd still appreciate a particularly careful look there. Fixes #4793.
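As a rough illustration of that invariant-checking insert (plain SQL rather than the actual Diesel-generated CTE; the column names and the exact set of checks here are assumptions):

```sql
-- Make a blueprint the new target only if its version is exactly one
-- greater than the current target's, so targets form a linear history.
-- (The real query enforces more, and the first-ever target needs
-- special casing, elided here.)
WITH current_target AS (
    SELECT version FROM bp_target ORDER BY version DESC LIMIT 1
)
INSERT INTO bp_target (version, blueprint_id, enabled, time_made_target)
SELECT $new_version, $blueprint_id, true, now()
WHERE $new_version = (SELECT version FROM current_target) + 1;
```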
A "blueprint" is a concept arriving in main...dap/update-control-2 , which describes the set of software that should be running on hardware components, including versions and configurations.
The update planner will end up evaluating blueprints and generating new ones, but the "goal" blueprint describes the intended state of the system. This is a description of fleet-wide configuration intent, which has overlap with topics like sled addition and removal (See: #4787, #4719). During the update sync, we discussed that the "goal blueprint" could be the source of configuration information, such as "What DNS config should we deploy?"
This has been kind of a back-and-forth from a design perspective: is the state of sleds, disks, services, etc. "attached to the object" (e.g., in the form of a "state" column in the DB table), or is it "part of the blueprint" (e.g., if it's in the blueprint it's in-use; otherwise it is not in-use, and the state can be inferred from other factors)?
A lot of this discussion depends on the form of the blueprint when it's serialized to the database. This issue tracks that serialization specifically.
I'm basically filing this issue because I want to work on some other downstream tasks that depend on this serialization.