Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blueprint builder: Guard against unknown external networking inputs #5776

Merged
merged 4 commits into from
May 16, 2024

Conversation

jgallagher
Copy link
Contributor

This started as a small cleanup to make builder.rs a little less noisy (the first commit, that just moves all the external networking "what IPs are in use and what IP should I pick next" logic into a separate module). As I was doing that, though, I noticed we only look at the parent blueprint when deciding what IPs are used, and we completely ignore PlanningInput (which tells us what IPs were in use in the database "recently").

I think we should never have an IP or NIC in the database that isn't also in the parent blueprint: blueprints are the source of how IPs/NICs are created, and we shouldn't prune zones from the blueprint until we've confirmed that external networking records for those zones have been deleted. I made this a planning error: if we get an input that reports external IPs or NICs that the parent blueprint doesn't know about, we bail out and fail to plan.

Assuming all of the above is correct (please correct me if any of it seems wrong!), it is still possible for this to fail, I think, if we had a sequence like this:

  1. Nexus A generates a PlanningInput by reading from CRDB
  2. Nexus B executes on a target blueprint that removes IPs/NICs from CRDB
  3. Nexus B regenerates a new blueprint and prunes the zone(s) associated with the IPs/NICs from step 2
  4. Nexus B makes this new blueprint the target
  5. Nexus A attempts to generate a new blueprint with its PlanningInput from step 1

This isn't possible today for at least a couple reasons: we don't do prune zones as in step 3 at all, and the timing for steps 3-5 probably require automated blueprint regeneration. I think it could eventually be possible, but seems fine? Nexus A would fail to plan in step 5, but it would succeed the next time it tried (because it would generate a new PlanningInput that reflected the changes from step 2).

Copy link
Contributor

@andrewjstone andrewjstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're commit message makes sense to me.

For an expunged zone, we'd first remove the network resources from the DB, then from the blueprint. The next blueprint could add a new zone reusing those resources as necessary. That resource wouldn't be in the DB because it was already removed. This is the converse of what you mention, but tracks with all that IIRC.

@jgallagher jgallagher merged commit 5d5aab5 into main May 16, 2024
18 checks passed
@jgallagher jgallagher deleted the john/blueprint-builder-external-networking branch May 16, 2024 20:31
@jgallagher
Copy link
Contributor Author

Relaying some offline chat for posterity: @sunshowers pointed out that the change here exactly matches case 2 from this comment:

    // [2]: This should never happen. The blueprint is meant to be a complete
    //      description of the state. This is a bug -- warn about it (and maybe
    //      consider erroring out?)

and this PR did make the decision to error out on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants