Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of tls config #524

Merged
merged 9 commits into from
Sep 13, 2023
Merged

Fix handling of tls config #524

merged 9 commits into from
Sep 13, 2023

Conversation

sed-i
Copy link
Contributor

@sed-i sed-i commented Sep 8, 2023

Issue

TLS config is written and workload restarted, before the cert is written to disk.

Solution

  • Use rock from Add ca-certificates prometheus-rock#30
    • Use the update-ca-certificates machinery
  • Instead of stop(), have _scheme return http and _is_tls_enabled return False until the cert is in place.

Depends on:

Fixes #521

Testing Instructions

bundle: kubernetes
applications:
  ca:
    charm: self-signed-certificates
    channel: edge
    scale: 1
  prom:
    charm: ./prometheus-k8s_ubuntu-20.04-amd64.charm
    series: focal
    scale: 1
    trust: true
    resources:
        prometheus-image: ghcr.io/canonical/prometheus:dev
relations:
- - ca:certificates
  - prom:certificates
bundle: kubernetes
applications:
  ca:
    charm: self-signed-certificates
    channel: edge
    scale: 1
  cat:
    charm: catalogue-k8s
    channel: edge
    series: focal
    scale: 1
  prom:
    charm: ./prometheus-k8s_ubuntu-20.04-amd64.charm
    series: focal
    scale: 1
    trust: true
    resources:
        prometheus-image: ghcr.io/canonical/prometheus:dev
  trfk:
    charm: traefik-k8s
    channel: edge
    series: focal
    scale: 1
relations:
- - cat:catalogue
  - prom:catalogue
- - ca:certificates
  - prom:certificates
- - prom:ingress
  - trfk:ingress-per-unit
- - trfk:certificates
  - ca:certificates

Release Notes

Fix handling of tls config.

@sed-i
Copy link
Contributor Author

sed-i commented Sep 8, 2023

@PietroPasotti, check-libs complains:

Library charms.tempo_k8s.v0.charm_tracing has local changes, cannot be updated.

Haven't checked but might be a LIBPATCH bump issue.

@PietroPasotti
Copy link
Contributor

PietroPasotti commented Sep 11, 2023

@PietroPasotti, check-libs complains:

Library charms.tempo_k8s.v0.charm_tracing has local changes, cannot be updated.

Haven't checked but might be a LIBPATCH bump issue.

given #523 , the lib should be up to date? If the issue persists we could try deleting it manually and re-fetching it? Maybe the linter did some minor change to it?

src/charm.py Outdated Show resolved Hide resolved
src/charm.py Show resolved Hide resolved
src/charm.py Outdated
@@ -539,7 +554,11 @@ def _configure(self, _):

self.remote_write_provider.update_endpoint()
self.grafana_source_provider.update_source(self.external_url)
self.unit.status = ActiveStatus()

if self._is_tls_enabled() and not self.container.exists(CERT_PATH):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more clear if we abstracted this to a self._is_tls_ready() method.
Enabled but not ready -> push cert to container, then restart. Regardless of how we got there, this should be the reasonable course of action. Or what am I missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of writing the cert only on cert-changed event, and:

Relation present Cert file on disk Hypothetical is_tls_ready() -> bool
Isolated charm No No No
tls-cert rel just joined Yes No No
on-cert-changed (after rel join) Yes Yes Yes
cert revoked/expired Yes Yes (keep on disk even if expired) Yes
relation departed / broken Yes ? No (how? code ordering...)
any event after rel-broken No No No

Wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems reasonable. I think departed/broken should immediately remove the cert from disk and from that moment onwards is_tls_ready should return false (assuming it checks the file presence synchronously).
Or we refactor to use manifests, but I'm not convinced that will make code ordering less of a headache

@@ -77,4 +77,4 @@ resources:
prometheus-image:
type: oci-image
description: Container image for Prometheus
upstream-source: ghcr.io/canonical/prometheus:latest
upstream-source: ghcr.io/canonical/prometheus:dev
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed until canonical/oci-factory#58 is applied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a local change that I reverted:

Library charms.tempo_k8s.v0.charm_tracing has local changes, cannot be updated.

@sed-i sed-i merged commit aad107c into main Sep 13, 2023
@sed-i sed-i deleted the feature/fix-tls branch September 13, 2023 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

prometheus fails to restart after loki upgrade
3 participants