-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sled agent: index running VMMs by VMM ID, not instance ID #6429
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Even setting aside that this is a prerequisite for implementing RFD 486, I think that, in a post-#5749 world, it feels a lot more conceptually correct for sled-agents to refer to things by their VMM IDs rather than their instance IDs. And, thanks so much for cleaning up some of the weird naming and such that I left behind in #5749! :)
I left some suggestions for some small improvements we could make, but I don't have any high-level concerns about the change overall. It would be nice to get rid of the code in sled-agent that loops over the BTreeMap
trying to find a Propolis UUID that looks the same as a Propolis zone name when we can just look up the UUID now, and it might be worth eliminating the VMM refetch by having vmm_and_migration_update_runtime
return the instance ID? But, for the most part, I love this change.
This is mostly renaming and cleanup; the main functional change is to the instance zone bundle collection routine, which now parses the input zone name to get a Propolis ID instead of converting all registered Propolis IDs to zone names to look for a match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this all looks good to me whenever you're happy with it!
Thanks as always for the review, @hawkw -- I'd like to check manually that instance zone bundles still work as intended; provided that they do I'll merge once main is open again. |
Change sled agent's instance lookup tables so that Propolis jobs are indexed by Propolis/VMM IDs instead of instance IDs.
This is a prerequisite to revisiting how the Failed instance state works. See RFD 486 section 6.1 for all the details of why this is needed, but very broadly: when an instance's VMM is Destroyed, we'd like sled agent to tell Nexus that before the agent deregisters the instance from the sled, for reasons described in the RFD; but if we do that with no other changes, there's a race where Nexus may try to restart the instance on the same sled before sled agent can update its instance table, causing instance start to fail.
To achieve this:
InstanceManagerRunner
's instance map to aBTreeMap<PropolisUuid, Instance>
, then clean up all the compilation errors.cpapi_instances_put
to take a Propolis ID instead of an instance ID. Regular sled agent still has both IDs, but with these changes, simulated sled agents only have a Propolis ID to work with, and plumbing an instance ID down to them requires significant code changes.poke
endpoints to use Propolis IDs.Tests: cargo nextest.
Related: #4226 and #4872, among others.