Integrate runtime updates for etcd #71
Conversation
Hopefully some suggestions to help the unit tests pass.
Looking really good -- one suggested change
client_key=etcd_config.get("client_key", ""),
)

self.api_manager.update_cluster_config(update_request)
I continue to encounter this error when the third etcd unit tries to post its update:
charms.k8s.v0.k8sd_api_manager.InvalidResponseError: Error status 500
method=PUT
endpoint=/1.0/k8sd/cluster/config
reason=Internal Server Error
body={"type":"error","status":"","status_code":0,"operation":"","error_code":500,"error":"failed to reconcile network: failed to refresh network: failed to refresh network component: failed to upgrade component 'network': another operation (install/upgrade/rollback) is in progress","metadata":null}
Here are the k8sd and charm logs from around the moment of this failure:
https://pastebin.canonical.com/p/njZJqx55yX/
API server logs from around the same moment:
https://pastebin.canonical.com/p/QzMpFCmQCV/
Is it possible that adding extra etcd units in rapid succession caused the API server to restart when the first datastore came online, so that the config wouldn't take when we published the second datastore URL?
Hey Adam,
Thanks for the review. I think the issue you were facing predates the merge of canonical/k8s-snap#356.
I cannot reproduce it, and the CI also seems to be happy.
Co-authored-by: Adam Dyess <[email protected]>
Overview
With canonical/k8s-snap#334, the snap allows configuring the external datastore at runtime, which makes it possible to change the etcd configuration after the cluster is bootstrapped.
The snap only performs steps (restarting services, etc.) if the configuration actually changed. Hence, we can simply send the charm's current datastore configuration to the endpoint on every event; if nothing changed, nothing happens.
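As a rough illustration (not the exact code in this PR), pushing the current etcd configuration on every event could look like the sketch below. The relation keys and the `servers`/`ca_crt`/`client_crt` field names are assumptions; `client_key` and `update_cluster_config` appear in the diff above, and the request types are the ones listed under Changes.

```python
def ensure_datastore_config(self, etcd_config: dict):
    """Sketch only: push the charm's current etcd settings to k8sd."""
    connection = etcd_config.get("connection_string", "")
    datastore = UserFacingDatastoreConfig(
        type="external",
        servers=connection.split(",") if connection else [],
        ca_crt=etcd_config.get("client_ca", ""),
        client_crt=etcd_config.get("client_cert", ""),
        client_key=etcd_config.get("client_key", ""),
    )
    update_request = UpdateClusterConfigRequest(datastore=datastore)
    # The snap compares this against its current config and only restarts
    # services when something actually changed, so calling it from every
    # hook is idempotent.
    self.api_manager.update_cluster_config(update_request)
```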
Changes
- Add the `UserFacingDatastoreConfig` type as defined in the `k8s-snap`
- Add the `APIDatastore` status type
- Add `datastore` to the `UpdateClusterConfigRequest`
- Send the current datastore configuration as part of `ensure_cluster_config`
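For context, a sketch of roughly what the new request/config types could look like, assuming pydantic models in the style of the existing `k8sd_api_manager` library; the exact field names and optionality are assumptions, not the merged definitions (the `APIDatastore` status type is omitted here).

```python
from typing import List, Optional

from pydantic import BaseModel


class UserFacingDatastoreConfig(BaseModel):
    """External datastore settings as exposed by the k8s-snap API."""

    type: Optional[str] = None
    servers: Optional[List[str]] = None
    ca_crt: Optional[str] = None
    client_crt: Optional[str] = None
    client_key: Optional[str] = None


class UpdateClusterConfigRequest(BaseModel):
    """Request body for PUT /1.0/k8sd/cluster/config."""

    datastore: Optional[UserFacingDatastoreConfig] = None
```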