Recover controlled items after crash #26
Ideally we would make this dependent on the component. Examples:
This can be implemented nicely as a reconciliation loop, in which the controller attempts to synchronize its internal state with the actual state of the external components (see also https://cloud.redhat.com/blog/kubernetes-operators-best-practices). Persisting the state in a dedicated database owned only by the controller just asks for trouble, because that database can (and will) get out of sync with the real world, e.g. when a Kubernetes Pod dies or a user stops OPAL-RT while the controller is not running.
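A minimal sketch of such a loop in Python, assuming a hypothetical `list_actual` callback that asks the external system which components currently exist and what state they are in. The controller keeps no private database; on every pass it compares what it believed with what it observes and then adopts the observed state:

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Reconciler:
    """Keeps the controller's view of its components in sync with reality.

    `list_actual` is a hypothetical callback that queries the external
    system (e.g. the Kubernetes API) and returns {component_id: state}.
    Nothing is persisted locally.
    """
    list_actual: Callable[[], Dict[str, dict]]
    known: Dict[str, dict] = field(default_factory=dict)

    def reconcile(self) -> Dict[str, List[str]]:
        actual = self.list_actual()
        added = sorted(actual.keys() - self.known.keys())
        removed = sorted(self.known.keys() - actual.keys())
        changed = sorted(k for k in actual.keys() & self.known.keys()
                         if actual[k] != self.known[k])
        # The external system is the source of truth: adopt what it reports.
        self.known = dict(actual)
        return {"added": added, "removed": removed, "changed": changed}

    def run(self, interval: float, iterations: int) -> None:
        # A real controller would loop forever; bounded here for illustration.
        for _ in range(iterations):
            self.reconcile()
            time.sleep(interval)
```

Because the state is re-derived from the external system on every pass, a restarted controller with an empty `known` table converges after its first `reconcile()` call.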
In GitLab by @skolen on Nov 25, 2021, 16:11
Thanks for the quick comment. @iripiri and I were actually asking: how does a restarted controller know (or learn) which external components it controlled before the crash? Currently this information is not persisted. If the controller knows which components it controlled before the crash, it can reconcile their status through the respective APIs.
I assume this would be the task of the manager components. Currently, we always hard-code the manager components in the configuration file (e.g. a ConfigMap). The reconciliation loop should then be implemented in each manager component, which queries its API to sync its managed components with the components reported by that API. I have already implemented this for the VILLASnode and VILLASrelay manager components here with a periodically called
Cheers,
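A rough sketch of what startup recovery could look like under this scheme, assuming (hypothetically) that each manager type provides a discovery function that asks its API which components currently exist. Only the managers come from the configuration file; the manager names, types, and endpoint below are illustrative, not the project's actual configuration:

```python
from typing import Callable, Dict, List

# Hypothetical controller configuration (e.g. loaded from a ConfigMap).
# Only the managers are configured; their components are not persisted.
CONFIG = {
    "managers": [
        {"name": "node-mgr", "type": "villas-node", "endpoint": "http://villas-node:80"},
        {"name": "k8s-mgr", "type": "kubernetes", "namespace": "villas"},
    ]
}

# A discoverer takes a manager's config entry and returns the IDs of the
# components that the manager's API currently reports.
Discoverer = Callable[[dict], List[str]]


def recover_components(config: dict,
                       discoverers: Dict[str, Discoverer]) -> Dict[str, List[str]]:
    """Rebuild {manager name: [component ids]} after a restart."""
    recovered: Dict[str, List[str]] = {}
    for manager in config["managers"]:
        discover = discoverers[manager["type"]]
        recovered[manager["name"]] = sorted(discover(manager))
    return recovered
```

Calling `recover_components` once at startup, and then periodically, lets each manager rediscover its components without any controller-side checkpoint.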
For the Kubernetes simulators/manager, we already store meta information such as:
in the Kubernetes metadata associated with the Kubernetes Job resources. If more information is required, we could also add it to the Job metadata.
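As an illustration of recovering state from that metadata, the sketch below rebuilds a controller's component table from Job objects in the JSON form returned by `kubectl get jobs -o json` (the `items` list). The label and annotation keys are hypothetical placeholders, not the keys the Kubernetes manager actually uses:

```python
import json
from typing import Dict, List

OWNER_LABEL = "example.org/controller"       # hypothetical label key
PROPS_ANNOTATION = "example.org/component"   # hypothetical annotation key


def components_from_jobs(jobs: List[dict], controller_id: str) -> Dict[str, dict]:
    """Rebuild {job name: component properties} from Job metadata."""
    result: Dict[str, dict] = {}
    for job in jobs:
        meta = job.get("metadata", {})
        labels = meta.get("labels") or {}
        if labels.get(OWNER_LABEL) != controller_id:
            continue  # Job belongs to another controller (or to none)
        raw = (meta.get("annotations") or {}).get(PROPS_ANNOTATION)
        # Annotation values are strings, so structured data is stored as JSON.
        result[meta["name"]] = json.loads(raw) if raw else {}
    return result
```

The ownership filter could also be applied server-side with a label selector (`kubectl get jobs -l example.org/controller=<id>`), which keeps the client-side loop small.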
In GitLab by @skolen on Nov 25, 2021, 15:18
After a crash, a VILLAScontroller should be able to recover all the items that it controlled before (not only the default items).
Simple approach: checkpoint the controlled items to a file or database whenever the set of controlled items changes, e.g. when a new simulator is created. This checkpoint can then be used to recover from a crash.
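A minimal sketch of the file-based variant, assuming the controlled items can be serialized as JSON. The checkpoint is written to a temporary file in the same directory and then moved into place with `os.replace`, so a crash mid-write leaves either the old or the new checkpoint, never a truncated one:

```python
import json
import os
import tempfile
from typing import Dict


def save_checkpoint(path: str, items: Dict[str, dict]) -> None:
    """Atomically replace the checkpoint file with the current item set."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".checkpoint-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(items, f, sort_keys=True)
            f.flush()
            os.fsync(f.fileno())  # make sure the data hits the disk first
        os.replace(tmp, path)     # atomic rename on POSIX (same filesystem)
    except BaseException:
        os.unlink(tmp)
        raise


def load_checkpoint(path: str) -> Dict[str, dict]:
    """Return the last checkpoint, or an empty set on first start."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

The controller would call `save_checkpoint` after every change to its item set and `load_checkpoint` once at startup. The file itself still has to live on storage that outlives the controller, e.g. a persistent volume.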
Question: where can the checkpoint be stored safely so that it is not lost in a crash? A separate checkpoint ConfigMap in Kubernetes for each controller?
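If the checkpoint went into a per-controller ConfigMap, the manifest could be built as sketched below. The `<controller>-checkpoint` naming scheme and the `items.json` key are hypothetical choices; the controller ID must be a valid Kubernetes resource name, and ConfigMap data is limited to 1 MiB in total. Such a checkpoint can still drift from reality, which is the concern raised about a controller-owned database earlier in this thread:

```python
import json
from typing import Dict


def checkpoint_configmap(controller_id: str, namespace: str,
                         items: Dict[str, dict]) -> dict:
    """Build a ConfigMap manifest holding one controller's checkpoint."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            # Hypothetical naming scheme: one ConfigMap per controller.
            "name": f"{controller_id}-checkpoint",
            "namespace": namespace,
            "labels": {"app.kubernetes.io/managed-by": controller_id},
        },
        # ConfigMap values must be strings, so the item set is stored as JSON.
        "data": {"items.json": json.dumps(items, sort_keys=True)},
    }


def items_from_configmap(cm: dict) -> Dict[str, dict]:
    """Recover the item set from a ConfigMap in the same form."""
    return json.loads((cm.get("data") or {}).get("items.json", "{}"))
```

The manifest can be written with `kubectl apply -f` or the Kubernetes API; because it lives in the cluster rather than in the controller's Pod, it survives a controller crash.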