blueprints: Nexus needs to update external DNS #5068
Comments
This is trickier than I thought because we still have not automated Reconfigurator. So if Reconfigurator fully owns writing to the DNS tables, then we'd regress functionality: when you create a Silo, you wouldn't get DNS names for it unless you ran the planner/executor by hand. And I don't think we're quite ready to fully automate Reconfigurator yet. So if we start from the constraint that we don't change the "silo create"/"silo delete" paths so that they continue working as-is, then suppose we adopt a scheme similar to what we do for internal DNS:
If this succeeds, then we know there have been no changes to external DNS since the blueprint was planned, so our view of what external DNS should look like is up-to-date, and we have successfully made DNS reflect the current reality. If this fails because external DNS has changed in the meantime, we have to abort and cannot fix this until a new blueprint is planned. (This is the same behavior as for internal DNS.) What if a "silo create" operation happens between planning and execution? This operation commits changes to the Silo table and the DNS tables in one transaction. So one of these will be true:
Just making sure I follow the bad case here, you're describing this sequence, right?
At this point the blueprint executor will be stuck constantly failing, right? When it lists all Silos, it will see the new one, see that it needs a DNS diff, attempt to apply it, and fail, because the external DNS version stored in the plan is now stale. Then rerunning the planner to get the latest DNS version will unwedge it and allow it to fix the new silo's incorrect DNS entries.
That's all correct.
@davepacheco Your new plan seems reasonable to me. It's also very unlikely, in the current state, that an operator would be creating a silo while we were running Reconfigurator.
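The stale-version sequence discussed in the comments above can be sketched roughly as follows. This is a toy in-memory model in Python, not the actual Nexus code: `ExternalDns`, `Blueprint`, `plan`, `execute`, `create_silo`, and `StaleDnsVersion` are all hypothetical names, and the records are keyed by bare Silo name for simplicity.

```python
from dataclasses import dataclass, field


class StaleDnsVersion(Exception):
    """External DNS changed after the blueprint was planned."""


@dataclass
class ExternalDns:
    # Hypothetical stand-in for the DNS tables: a version plus the records.
    version: int = 1
    records: dict = field(default_factory=dict)

    def update_if_version(self, expected_version, new_records):
        # Replace all records only if nothing was written since planning.
        if self.version != expected_version:
            raise StaleDnsVersion(
                f"planned against v{expected_version}, found v{self.version}"
            )
        self.records = dict(new_records)
        self.version += 1


@dataclass
class Blueprint:
    external_dns_version: int  # DNS version observed at planning time
    nexus_ips: list            # Nexus external IPs that should be in service


def plan(dns, nexus_ips):
    return Blueprint(dns.version, sorted(nexus_ips))


def create_silo(dns, silos, name, current_nexus_ips):
    # Models "silo create": Silo and DNS changes commit together,
    # which bumps the DNS version.
    silos.append(name)
    dns.records = {**dns.records, name: sorted(current_nexus_ips)}
    dns.version += 1


def execute(blueprint, dns, silos):
    # Build the full desired contents, then apply only if there is a diff.
    desired = {silo: list(blueprint.nexus_ips) for silo in silos}
    if desired == dns.records:
        return "unchanged"
    dns.update_if_version(blueprint.external_dns_version, desired)
    return "applied"


def demo():
    silos = ["default"]
    dns = ExternalDns(records={"default": ["192.0.2.1"]})
    blueprint = plan(dns, ["192.0.2.1", "192.0.2.2"])  # planned at v1
    create_silo(dns, silos, "eng", ["192.0.2.1"])       # DNS is now v2
    try:
        first = execute(blueprint, dns, silos)
    except StaleDnsVersion:
        first = "stale"                                 # executor is wedged
    replanned = plan(dns, ["192.0.2.1", "192.0.2.2"])  # planned at v2
    second = execute(replanned, dns, silos)
    return first, second, dns.version, dns.records
```

In this model the first execution fails on the version check and keeps failing until the planner runs again; the replanned blueprint then applies cleanly and also corrects the new Silo's records.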
The rack operates external DNS servers containing DNS names for each Silo. These DNS names resolve to the set of Nexus external IPs. Thus, external DNS needs to be updated whenever a Silo is created or destroyed or a Nexus is brought into or out of service.
Reconfigurator probably needs to take responsibility for this. Right now, we add and remove external DNS names when Silos are created or destroyed. I think it'll be unnecessarily complicated to try to keep doing that while having Reconfigurator be responsible for adding and removing DNS names when Nexus instances come and go. Simpler would be to do what we're doing with internal DNS in #4989, which is that the blueprint contains enough information that the executor can construct the complete contents of external DNS and then just make that the reality.
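As a rough illustration of "construct the complete contents and make that the reality", here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, and the `<silo>.sys` name format in `silo_dns_name` is a placeholder, not something this issue specifies.

```python
def silo_dns_name(silo_name):
    # Assumption: placeholder naming scheme, for illustration only.
    return f"{silo_name}.sys"


def build_external_dns(silo_names, nexus_external_ips):
    """Every Silo name resolves to the full set of Nexus external IPs."""
    ips = sorted(set(nexus_external_ips))
    return {silo_dns_name(name): list(ips) for name in sorted(silo_names)}


def diff_dns(current, desired):
    """Return (added, removed, changed) needed to turn current into desired."""
    added = {n: desired[n] for n in desired if n not in current}
    removed = sorted(n for n in current if n not in desired)
    changed = {
        n: desired[n]
        for n in desired
        if n in current and current[n] != desired[n]
    }
    return added, removed, changed
```

Because the desired contents are derived from scratch each time, the same code path covers both Silo changes and Nexus changes: a new Silo appears in `added`, a deleted one in `removed`, and a change to the Nexus set appears in `changed` for every existing name.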
We could go further: after doing this, the only thing that ever changes DNS would be the blueprint system. It works by just writing the correct records to the database and letting the existing DNS propagation background tasks take care of the rest. We could consider ripping out the DNS database and propagation stuff altogether and instead just propagate to DNS servers directly from blueprint execution. I'm not sure it's really worth making this change now.