
Implement upgrade-relation for control plane nodes #200

Merged 4 commits into main (Dec 4, 2024)

Conversation

@mateoflorido (Member):

Overview

Introduce upgrade orchestration for control plane nodes.

Rationale

To simplify the upgrade process for charms, this pull request adds orchestration logic for upgrading the control plane nodes, specifically the charm core (the k8s snap). It does not cover worker node upgrade orchestration, which will be addressed in a future pull request.

Changes

  • Added the on_upgrade_granted handler to manage the upgrade process for nodes in the cluster (a hedged sketch follows this list).
  • Integrated the upgrade instance into the status update handler so the charm's status reflects the current upgrade state.
  • Enhanced the snap and charm modules to unblock snap installations once the upgrade logic confirms all conditions are met.
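
To make the first bullet concrete, here is a minimal sketch of what an upgrade-granted handler can look like in the ops framework. The class name K8sUpgrade and the helper unblock_and_refresh_snap are hypothetical stand-ins, and the built-in upgrade_charm event substitutes for the PR's upgrade-granted event; the actual handler and wiring in the charm may differ.

```python
# Minimal sketch of an upgrade-granted handler (ops framework).
# `K8sUpgrade` and `unblock_and_refresh_snap` are hypothetical names;
# the built-in `upgrade_charm` event stands in for the PR's custom
# upgrade-granted event.
import logging
import subprocess

import ops

log = logging.getLogger(__name__)


def unblock_and_refresh_snap(charm: ops.CharmBase) -> None:
    """Hypothetical stand-in for the charm's snap module logic."""
    # The real charm would lift its snap-installation block first, then
    # refresh via its snap module; a bare `snap refresh k8s` stands in here.
    subprocess.run(["snap", "refresh", "k8s"], check=True)


class K8sUpgrade(ops.Object):
    """Orchestrate the k8s snap upgrade on a control plane node."""

    def __init__(self, charm: ops.CharmBase):
        super().__init__(charm, "upgrade")
        self.charm = charm
        self.framework.observe(charm.on.upgrade_charm, self._on_upgrade_granted)

    def _on_upgrade_granted(self, event: ops.EventBase) -> None:
        """Refresh the k8s snap once this unit is cleared to upgrade."""
        self.charm.unit.status = ops.MaintenanceStatus("Upgrading the k8s snap")
        try:
            unblock_and_refresh_snap(self.charm)
        except subprocess.CalledProcessError:
            log.exception("k8s snap upgrade failed")
            self.charm.unit.status = ops.BlockedStatus("k8s snap upgrade failed")
            return
        self.charm.unit.status = ops.ActiveStatus()
```

Setting MaintenanceStatus before the refresh and BlockedStatus on failure mirrors the second bullet: the status update handler can then surface the current upgrade state to the operator.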

@mateoflorido requested a review from a team as a code owner on December 2, 2024 at 04:48.
@addyess (Contributor) left a comment:

Upgrades are obviously hard. We want to guide the user rather than road-block them at every turn. In the end, the Juju admin knows best what they want their cluster to do and how to get it to the revision they want; we shouldn't put up endless hurdles for them to cross. This process should guide them, not block them.

I say this as someone who has performed these upgrades before: putting up too many guardrails saves the casual user but can impede the person who has accepted the risk and just wants the cluster upgraded.

Resolved review threads:
  • charms/worker/k8s/src/charm.py (outdated)
  • charms/worker/k8s/src/charm.py (outdated)
Comment on lines -697 to +727:

```python
status.add(ops.BlockedStatus(f"Version mismatch with {unit.name}"))
raise ReconcilerError(f"Version mismatch with {unit.name}")
# NOTE: Add a check to validate if we are doing an upgrade
status.add(ops.WaitingStatus("Upgrading the cluster"))
return
```
A Contributor commented:

This method, _announce_kubernetes_version, feels similar to get_worker_version above in that it reads the version field from the k8s-cluster or cluster relation.

So is a version mismatch now a waiting situation because an upgrade is in progress? Is that why there's a NOTE here?

The Contributor followed up:

Ahh, I see now: it's because raising the ReconcilerError prevented the upgrade events from engaging. Whew. We still need a check to let folks know they're running an out-of-spec version of the applications.

For now, _announce_kubernetes_version is only run on the lead CP. Say you deployed and related a kw 1.35 to a 1.31 cluster; I imagine the 1.35 workers may not join. Should they join? Is the k8s-cp the right place to gripe about it? You're right that we should at least make sure we're not in an upgrade scenario before we raise the ReconcilerError.
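
To pin down the idea being discussed, here is a hedged sketch of an upgrade-aware mismatch check: tolerate mixed versions while an upgrade is in flight instead of raising ReconcilerError. The upgrade_in_progress helper and the simplified function signature are illustrative assumptions; only the status and ReconcilerError lines mirror the diff shown earlier.

```python
# Sketch of the upgrade-aware version check discussed above.
# `upgrade_in_progress` and the simplified signature are assumptions,
# not the PR's exact code.
import ops


class ReconcilerError(Exception):
    """Halts reconciliation, as in the charm's reconciler."""


def upgrade_in_progress(charm: ops.CharmBase) -> bool:
    """Hypothetical helper: report whether a cluster upgrade is underway."""
    # The real charm would consult its upgrade orchestration state here.
    return False


def announce_kubernetes_version(charm, status, unit, unit_version, expected):
    """Flag version mismatches, but tolerate them mid-upgrade."""
    if unit_version != expected:
        if upgrade_in_progress(charm):
            # Mixed versions are expected mid-upgrade; wait instead of
            # blocking, so the upgrade events can keep engaging.
            status.add(ops.WaitingStatus("Upgrading the cluster"))
            return
        status.add(ops.BlockedStatus(f"Version mismatch with {unit.name}"))
        raise ReconcilerError(f"Version mismatch with {unit.name}")
```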

Resolved review threads:
  • charms/worker/k8s/src/protocols.py (outdated)
  • charms/worker/k8s/src/upgrade.py
  • charms/worker/k8s/src/upgrade.py (outdated)
  • charms/worker/k8s/src/upgrade.py
  • charms/worker/k8s/src/charm.py (outdated)
@addyess (Contributor) left a comment:

Awesome work here. Thanks, Mateo!

@eaudetcobello (Contributor) left a comment:

Nice work! Left a few comments.

github-actions bot commented on Dec 4, 2024:

Test coverage for 7e7e480

```
coverage-report: install_deps /home/runner/work/k8s-operator/k8s-operator/charms/worker/k8s> python -I -m pip install 'coverage[toml]'
coverage-report: commands[0] /home/runner/work/k8s-operator/k8s-operator/charms/worker/k8s> coverage report
Name                                    Stmts   Miss  Cover
-----------------------------------------------------------
lib/charms/k8s/v0/k8sd_api_manager.py     278     29    90%
src/charm.py                              491    242    51%
src/cloud_integration.py                   80      3    96%
src/config/extra_args.py                   27      1    96%
src/containerd.py                         140     16    89%
src/cos_integration.py                     33     12    64%
src/events/update_status.py                51     10    80%
src/inspector.py                           40      4    90%
src/kube_control.py                        39     31    21%
src/literals.py                             6      0   100%
src/protocols.py                           26      5    81%
src/reschedule.py                          77      4    95%
src/snap.py                               185     26    86%
src/token_distributor.py                  181    109    40%
src/upgrade.py                            105     48    54%
-----------------------------------------------------------
TOTAL                                    1759    540    69%
coverage-report: OK (1.22=setup[1.01]+cmd[0.21] seconds)
congratulations :) (1.27 seconds)
```

Static code analysis report

```
Run started: 2024-12-04 01:55:45.296556

Test results:
  No issues identified.

Code scanned:
  Total lines of code: 3760
  Total lines skipped (#nosec): 3
  Total potential issues skipped due to specifically being disabled (e.g., #nosec BXXX): 0

Run metrics:
  Total issues (by severity):
    Undefined: 0
    Low: 0
    Medium: 0
    High: 0
  Total issues (by confidence):
    Undefined: 0
    Low: 0
    Medium: 0
    High: 0
Files skipped (0):
```

@eaudetcobello (Contributor) left a comment:

Thanks!

@mateoflorido merged commit 5d6af4b into main on Dec 4, 2024 (61 checks passed).
@mateoflorido deleted the KU-2116/upgrade-control-plane branch on December 4, 2024 at 14:08.