
Improve node removal by validating departed|broken relations #80

Merged
merged 8 commits into main from akd/improve-node-removal
Apr 26, 2024

Conversation

addyess
Contributor

@addyess addyess commented Apr 25, 2024

Summary

Attempts to lengthen the time a leaving node exists, giving the lead control-plane charm time to remove the unit.

Changes

  • revoke tokens only when k8s-cluster and cluster relations experience a RelationDepartedEvent
  • detect departing unit and elect for removal
  • update_status event now checks if the node itself is ready with kubectl
  • last_gasp now requires the node to be reported not Ready at least 3 times before ending the delay
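The last_gasp change above can be sketched roughly as follows. This is an illustrative standalone version, not the charm's actual code; the function name matches the PR description, but the parameters and polling interval are assumptions:

```python
import time


def last_gasp(is_node_ready, max_not_ready=3, interval=1.0, timeout=60.0):
    """Delay teardown until the node has been observed not Ready
    at least `max_not_ready` consecutive times, or until `timeout`.

    Returns True if the node was confirmed not Ready, False on timeout.
    """
    not_ready_count = 0
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_node_ready():
            not_ready_count = 0  # reset on any Ready report
        else:
            not_ready_count += 1
            if not_ready_count >= max_not_ready:
                return True  # node confirmed gone; safe to end the delay
        time.sleep(interval)
    return False
```

Requiring several consecutive not-Ready reports guards against ending the delay on a single transient API-server blip.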

@addyess addyess requested a review from a team as a code owner April 25, 2024 04:19
Member

@mateoflorido mateoflorido left a comment


LGTM!

Comment on lines +91 to +93
isinstance(event, ops.RelationDepartedEvent)
and event.relation.name in ["k8s-cluster", "cluster"]
and event.departing_unit
Member


Nice use of the and operator 😄

Contributor Author

@addyess addyess Apr 25, 2024


>>> 1 and 2 and 3 and 4
4
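To spell the behavior out (a quick standalone demo, not charm code): `and` short-circuits on the first falsy operand and otherwise returns its last operand, so the three-line guard above evaluates to `event.departing_unit` when every check passes and to a falsy value otherwise:

```python
# `x and y` returns x if x is falsy, otherwise y; chains extend this
# left to right, so the whole chain yields the first falsy operand,
# or the final operand when everything is truthy.
assert (1 and 2 and 3 and 4) == 4  # all truthy -> last operand
assert (1 and 0 and 3) == 0        # stops at the first falsy value
assert ("" and "x") == ""          # empty string is falsy
```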

Comment on lines +598 to +600
if not self._is_node_ready():
status.add(ops.WaitingStatus("Node not yet Ready"))
return
Contributor Author


This is actually a nice little check for update-status: it keeps the unit from showing "Ready" when the node actually can't host any workloads yet because it's not Kubernetes Ready.
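A helper like `_is_node_ready` could be sketched as below. This is a hypothetical standalone version (the charm's real implementation may query the API differently); it shells out to `kubectl` and checks the node's Ready condition:

```python
import json
import subprocess


def node_has_ready_condition(node: dict) -> bool:
    """Return True if the node object's Ready condition is True."""
    conditions = node.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def is_node_ready(node_name: str) -> bool:
    """Hypothetical sketch: fetch the node as JSON and inspect it."""
    out = subprocess.check_output(
        ["kubectl", "get", "node", node_name, "-o", "json"], text=True
    )
    return node_has_ready_condition(json.loads(out))
```

Splitting the parsing from the `kubectl` call keeps the condition logic unit-testable without a live cluster.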

Contributor

@bschimke95 bschimke95 left a comment


LGTM, great work.
Always fun to learn some new Python magic. (e.g. that and chains return the last element)

@@ -618,6 +670,39 @@ def _on_update_status(self, _event: ops.UpdateStatusEvent):
except status.ReconcilerError:
log.exception("Can't update_status")

def kubectl(self, *args) -> str:
Contributor


The charm file starts to get quite large. Maybe it makes sense to split it up in the future, e.g. by moving such utils into their own classes.

Contributor Author


it is -- and i got REALLY close to pulling the trigger on that here.
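For context, a standalone version of such a `kubectl` helper might look like this. It is a sketch: the `binary` parameter is added here for illustration (handy for substituting a stub in tests) and is not part of the charm's method, which is bound to `self`:

```python
import shlex
import subprocess


def kubectl(*args: str, binary: str = "kubectl") -> str:
    """Run a kubectl subcommand and return its stdout.

    `binary` is a hypothetical parameter for this sketch only;
    the charm hardcodes the kubectl executable.
    """
    cmd = [binary, *args]
    try:
        return subprocess.check_output(cmd, text=True)
    except subprocess.CalledProcessError as e:
        raise RuntimeError(f"{shlex.join(cmd)} failed: {e.output}") from e
```

Keeping the wrapper small like this is what makes it a natural candidate for extraction into a utils module, as suggested above.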

@addyess addyess merged commit 66b4af0 into main Apr 26, 2024
34 checks passed
@addyess addyess deleted the akd/improve-node-removal branch April 26, 2024 10:32