
Integrate runtime updates for etcd #71

Merged
bschimke95 merged 11 commits into main from KU-530/updates-for-external-datastore on Apr 22, 2024

Conversation

bschimke95 (Contributor)

Overview

With canonical/k8s-snap#334 the snap allows configuring the external datastore at runtime, so the etcd configuration can be changed after the cluster is bootstrapped.
The snap only performs steps (restarting services, etc.) if the configuration changed. Hence, we can simply send the charm's current datastore configuration to the endpoint on every event; if there is no change, nothing happens.

Changes

  • Add UserFacingDatastoreConfig type as defined in the k8s-snap API
  • Update Datastore status type
  • Add datastore to the UpdateClusterConfigRequest
  • Add new reconcile step ensure_cluster_config
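To illustrate the flow described above, here is a minimal sketch of what the new ensure_cluster_config reconcile step could look like. Only UserFacingDatastoreConfig, UpdateClusterConfigRequest, and update_cluster_config are named in this PR; the function signature, the field names on UserFacingDatastoreConfig, and the etcd_config dictionary keys are assumptions for illustration, not the charm's actual implementation.

    # Hedged sketch: only the two imported types and update_cluster_config are
    # named in this PR; all other names and fields are illustrative assumptions.
    from charms.k8s.v0.k8sd_api_manager import (
        UpdateClusterConfigRequest,
        UserFacingDatastoreConfig,
    )

    def ensure_cluster_config(api_manager, etcd_servers: list, etcd_config: dict) -> None:
        """Send the charm's current datastore configuration to k8sd.

        k8sd only restarts services when the submitted configuration differs
        from what is already applied, so calling this on every event is a
        cheap no-op when nothing has changed.
        """
        datastore = UserFacingDatastoreConfig(
            type="external",                                # assumed field name
            servers=etcd_servers,                           # assumed field name
            ca_crt=etcd_config.get("client_ca", ""),        # assumed key
            client_crt=etcd_config.get("client_cert", ""),  # assumed key
            client_key=etcd_config.get("client_key", ""),   # matches the review snippet below
        )
        update_request = UpdateClusterConfigRequest(datastore=datastore)
        api_manager.update_cluster_config(update_request)

Because k8sd de-duplicates the configuration server-side, the charm does not need to track whether the etcd endpoints or certificates actually changed between events.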

@addyess (Contributor) left a comment:

Hopefully these suggestions help the unit tests pass.

charms/worker/k8s/lib/charms/k8s/v0/k8sd_api_manager.py (review thread; outdated, resolved)
@addyess (Contributor) left a comment:

Looking really good -- one suggested change

tests/integration/test_etcd.py (review thread; outdated, resolved)
Inline review thread on the code that submits the cluster config update:

        client_key=etcd_config.get("client_key", ""),
    )

    self.api_manager.update_cluster_config(update_request)
addyess (Contributor) commented:

I continue to encounter this error when the third etcd unit tries to post its update:

charms.k8s.v0.k8sd_api_manager.InvalidResponseError: Error status 500
	method=PUT
	endpoint=/1.0/k8sd/cluster/config
	reason=Internal Server Error
	body={"type":"error","status":"","status_code":0,"operation":"","error_code":500,"error":"failed to reconcile network: failed to refresh network: failed to refresh network component: failed to upgrade component 'network': another operation (install/upgrade/rollback) is in progress","metadata":null}

@addyess (Contributor) commented on Apr 20, 2024:

Here are the k8sd and charm logs around the same moment of this failure:
https://pastebin.canonical.com/p/njZJqx55yX/

addyess (Contributor) added:

API server logs around the same moment:
https://pastebin.canonical.com/p/QzMpFCmQCV/

Is it possible that adding extra etcd units in rapid succession caused the API server to restart when the first datastore came online, so that when we published the second datastore URL the config wouldn't take?

bschimke95 (Contributor, Author) replied:

Hey Adam,
Thanks for the review. I think the issue you were facing was before canonical/k8s-snap#356 was merged.

I cannot reproduce this issue and the CI also seems to be happy.

@bschimke95 marked this pull request as ready for review on April 22, 2024, 09:41
@bschimke95 requested a review from a team as a code owner on April 22, 2024, 09:41
@bschimke95 merged commit 653378b into main on April 22, 2024
34 checks passed
@bschimke95 deleted the KU-530/updates-for-external-datastore branch on April 22, 2024, 10:17