Automatic replacement of MC NotReady nodes #1541

Closed
whites11 opened this issue Oct 20, 2022 · 9 comments

@whites11

whites11 commented Oct 20, 2022

Similar to what we do on WCs, we should automatically replace MC nodes when they become NotReady.

@whites11 changed the title from "Automatic replacement of NotReady nodes" to "Automatic replacement of MC NotReady nodes" on Oct 20, 2022
@T-Kukawka
Contributor

The ideal approach would be an operator that recycles nodes regardless of the provider. It could then be used for CAPI as well. The operator would have to be HA, since the node it runs on can itself be affected.
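
As a rough sketch of the decision such an operator might make (not an existing implementation): given each node's Ready condition, pick the nodes that have been not ready for longer than a grace period, and back off when too many qualify at once. The `NodeStatus` type, the `grace` and `max_unhealthy` parameters, and the node names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class NodeStatus:
    """Minimal view of a node's Ready condition (hypothetical type)."""
    name: str
    ready: str  # "True", "False" or "Unknown", as in the Kubernetes Ready condition
    last_transition: datetime  # when the Ready condition last changed


def nodes_to_replace(nodes, now, grace=timedelta(minutes=10), max_unhealthy=1):
    """Return names of nodes that have been not ready for at least `grace`.

    Returns an empty list when more than `max_unhealthy` nodes qualify, so a
    cluster-wide outage does not trigger mass replacement.
    """
    stale = [
        n.name
        for n in nodes
        if n.ready != "True" and now - n.last_transition >= grace
    ]
    return stale if len(stale) <= max_unhealthy else []


now = datetime(2022, 10, 20, 12, 0, tzinfo=timezone.utc)
nodes = [
    NodeStatus("master-0", "True", now - timedelta(hours=5)),
    NodeStatus("master-1", "Unknown", now - timedelta(minutes=15)),
    NodeStatus("master-2", "False", now - timedelta(minutes=2)),
]
print(nodes_to_replace(nodes, now))  # ['master-1']
```

The actual replacement step (for example terminating the instance) is provider-specific and deliberately left out here.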

@whites11
Author

@giantswarm/team-hydra is there any feature like this implemented in the CAPI world?

@alex-dabija

CAPI does have the concept of health checks for machines. Unfortunately, it only works with machine deployments and not with machine pools because a machine resource CR (or machine set CR) needs to be present.

@T-Kukawka
Contributor

Waiting for CAPA to stabilize its implementation of MachinePools/MachineDeployments.

@whites11
Author

whites11 commented Apr 6, 2023

Made some progress in giantnetes-terraform 14.12.0.
We now run a script on the masters that takes the node down if services are down.
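
For illustration only, a check of this kind could look roughly like the sketch below. It is not the script shipped in giantnetes-terraform: the unit names, the failure threshold, and the helper functions are assumptions, and the step that actually takes the node down is left to the caller.

```python
import subprocess


def service_is_active(unit):
    """Return True if systemd reports the unit as active."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0


def failed_services(units, is_active=service_is_active):
    """Return the units that are currently not active."""
    return [u for u in units if not is_active(u)]


def should_take_node_down(history, threshold=3):
    """True when the last `threshold` checks all found failed services."""
    recent = history[-threshold:]
    return len(recent) == threshold and all(recent)


# Demo with a fake checker so the sketch runs without systemd.
# The unit names are hypothetical.
fake_state = {"kubelet.service": True, "etcd.service": False}
history = []
for _ in range(3):
    failed = failed_services(list(fake_state), is_active=fake_state.get)
    history.append(bool(failed))
print(should_take_node_down(history))  # True
```

Requiring several consecutive failures avoids taking a node down because of a single transient restart.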

@nprokopic

In CAPI, all MCs will also be WCs at the same time (with all CAPI CRs, and managed by CAPI), so it will be possible to use MachineHealthChecks (however well or badly they work).

Currently it should be possible to use MachineHealthChecks for control plane nodes and for workers created with machine deployments (CAPG and CAPZ, and I think all on-prem providers).
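
To make that concrete, here is a minimal sketch that builds a MachineHealthCheck manifest for machines from one machine deployment, assuming the `cluster.x-k8s.io/v1beta1` API. The cluster name, object names, timeout, and `maxUnhealthy` value are placeholders, and the `machine_health_check` helper is hypothetical. The output is JSON, which `kubectl apply -f` accepts.

```python
import json


def machine_health_check(cluster, name, selector_labels,
                         timeout="300s", max_unhealthy="40%"):
    """Build a MachineHealthCheck manifest as a plain dict."""
    return {
        "apiVersion": "cluster.x-k8s.io/v1beta1",
        "kind": "MachineHealthCheck",
        "metadata": {"name": name},
        "spec": {
            "clusterName": cluster,
            # Stop remediating when more than this share of machines is unhealthy.
            "maxUnhealthy": max_unhealthy,
            "selector": {"matchLabels": selector_labels},
            # A machine is unhealthy when its node is NotReady (False) or
            # unreachable (Unknown) for longer than the timeout.
            "unhealthyConditions": [
                {"type": "Ready", "status": "False", "timeout": timeout},
                {"type": "Ready", "status": "Unknown", "timeout": timeout},
            ],
        },
    }


mhc = machine_health_check(
    cluster="example-mc",  # placeholder cluster name
    name="example-mc-workers",
    selector_labels={"cluster.x-k8s.io/deployment-name": "example-mc-md-0"},
)
print(json.dumps(mhc, indent=2))
```

For control plane machines the selector would need to match the control plane label instead of the deployment name.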

When it comes to MachinePool support (currently used in CAPA), there is an open PR to implement MachinePoolMachine, which should then make it possible to implement MachineHealthChecks for MachinePool as well.

@fiunchinho
Member

Nowadays I think this is a duplicate of https://github.com/giantswarm/giantswarm/issues/28006. I think I'd close this one @T-Kukawka

@T-Kukawka
Contributor

agreed, closing
