
prometheus-k8s [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate #642

Closed
mcfly722 opened this issue Oct 9, 2024 · 4 comments

Comments

mcfly722 commented Oct 9, 2024

### Bug Description

Hello,
after some period of time we hit an exception in the prometheus-k8s Juju charm. The prometheus-k8s application cannot start, and it retries again and again with the same exception.

The stacktrace below shows that the failure happens during an API request from the prometheus-k8s container to the Kubernetes cluster (to obtain PV/PVC capacity). It looks like the request fails because the Kubernetes API server certificate is not trusted.

We suspect that MicroK8s rotated the Kubernetes API certificate, but we can't figure out how to resolve the issue. The simplest workaround would be a "skip TLS verify" option for all Kubernetes API requests, but this charm currently has no such configuration parameter.

Could you please tell us how to update the Kubernetes API certificate in this charm, and how to avoid this problem in the future?
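For context, the error boils down to standard certificate verification in Python's `ssl` module (the charm's HTTP stack, httpx via lightkube, builds a default `SSLContext`). A minimal sketch of what a hypothetical "skip TLS verify" option would mean at that layer — this is not charm or lightkube API, purely illustrative, and insecure outside of debugging:

```python
import ssl

# The default context verifies the server certificate against a CA bundle;
# a rotated apiserver CA that is absent from the bundle produces exactly
# "CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate".
ctx = ssl.create_default_context()
assert ctx.verify_mode == ssl.CERT_REQUIRED

# A hypothetical skip-verify option would flip these two flags
# (check_hostname must be disabled before verify_mode can be CERT_NONE):
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
assert ctx.verify_mode == ssl.CERT_NONE
```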

### To Reproduce

We currently don't know how to reproduce this issue. We suspect it happens when MicroK8s rotates its API certificates.

### Environment

root@jjclient1:~# juju status --relations
Model  Controller            Cloud/Region            Version  SLA          Timestamp
cos    controller-openstack  k8s-microk8s/localhost  3.4.5    unsupported  19:25:42Z

SAAS      Status  Store                 URL
ceph-mon  active  controller-openstack  se-moshkarin/ceph-ldc-1.ceph-mon

App                           Version  Status   Scale  Charm                         Channel        Rev  Address         Exposed  Message
alertmanager                           unknown      0  alertmanager-k8s              latest/stable  125  10.152.183.61   no
catalogue                              active       1  catalogue-k8s                 latest/stable   59  10.152.183.130  no
grafana                       9.5.3    active       1  grafana-k8s                   latest/stable  117  10.152.183.76   no
loki                          2.9.6    active       1  loki-k8s                      latest/stable  160  10.152.183.60   no
prometheus                    2.52.0   waiting      1  prometheus-k8s                latest/stable  209  10.152.183.234  no       waiting for units to settle down
prometheus-scrape-target-k8s  n/a      active       1  prometheus-scrape-target-k8s  latest/stable   34  10.152.183.120  no
traefik                       v2.11.0  waiting    1/0  traefik-k8s                   latest/stable  194  10.152.183.86   no       installing agent

Unit                             Workload  Agent  Address      Ports  Message
catalogue/0*                     active    idle   10.1.33.100
grafana/0*                       active    idle   10.1.33.91
loki/0*                          active    idle   10.1.33.90
prometheus-scrape-target-k8s/0*  active    idle   10.1.33.82
prometheus/0*                    error     idle   10.1.33.104         hook failed: "upgrade-charm"
traefik/0*                       error     idle   10.1.33.88          hook failed: "ingress-relation-broken" for catalogue:ingress

Offer       Application  Charm           Rev  Connected  Endpoint              Interface                Role
grafana     grafana      grafana-k8s     117  2/2        grafana-dashboard     grafana_dashboard        requirer
loki        loki         loki-k8s        160  2/2        logging               loki_push_api            provider
prometheus  prometheus   prometheus-k8s  209  2/2        metrics-endpoint      prometheus_scrape        requirer
                                                         receive-remote-write  prometheus_remote_write  provider

Integration provider                           Requirer                     Interface              Type     Message
alertmanager:alerting                          loki:alertmanager            alertmanager_dispatch  regular
alertmanager:alerting                          prometheus:alertmanager      alertmanager_dispatch  regular
alertmanager:grafana-dashboard                 grafana:grafana-dashboard    grafana_dashboard      regular
alertmanager:grafana-source                    grafana:grafana-source       grafana_datasource     regular
alertmanager:replicas                          alertmanager:replicas        alertmanager_replica   peer
alertmanager:self-metrics-endpoint             prometheus:metrics-endpoint  prometheus_scrape      regular
catalogue:catalogue                            alertmanager:catalogue       catalogue              regular
catalogue:catalogue                            grafana:catalogue            catalogue              regular
catalogue:catalogue                            prometheus:catalogue         catalogue              regular
catalogue:replicas                             catalogue:replicas           catalogue_replica      peer
ceph-mon:metrics-endpoint                      prometheus:metrics-endpoint  prometheus_scrape      regular
grafana:grafana                                grafana:grafana              grafana_peers          peer
grafana:metrics-endpoint                       prometheus:metrics-endpoint  prometheus_scrape      regular
grafana:replicas                               grafana:replicas             grafana_replicas       peer
loki:grafana-dashboard                         grafana:grafana-dashboard    grafana_dashboard      regular
loki:grafana-source                            grafana:grafana-source       grafana_datasource     regular
loki:metrics-endpoint                          prometheus:metrics-endpoint  prometheus_scrape      regular
loki:replicas                                  loki:replicas                loki_replica           peer
prometheus-scrape-target-k8s:metrics-endpoint  prometheus:metrics-endpoint  prometheus_scrape      regular
prometheus:grafana-dashboard                   grafana:grafana-dashboard    grafana_dashboard      regular
prometheus:grafana-source                      grafana:grafana-source       grafana_datasource     regular
prometheus:prometheus-peers                    prometheus:prometheus-peers  prometheus_peers       peer
traefik:ingress                                alertmanager:ingress         ingress                regular
traefik:ingress                                catalogue:ingress            ingress                regular
traefik:ingress-per-unit                       loki:ingress                 ingress_per_unit       regular
traefik:ingress-per-unit                       prometheus:ingress           ingress_per_unit       regular
traefik:metrics-endpoint                       prometheus:metrics-endpoint  prometheus_scrape      regular
traefik:peers                                  traefik:peers                traefik_peers          peer
traefik:traefik-route                          grafana:ingress              traefik_route          regular
root@microk8s-0:~# microk8s.kubectl get all -A
NAMESPACE        NAME                                           READY   STATUS    RESTARTS       AGE
cos              pod/catalogue-0                                2/2     Running   0              9h
cos              pod/grafana-0                                  3/3     Running   0              9h
cos              pod/loki-0                                     3/3     Running   0              9h
cos              pod/modeloperator-c47c94774-xg76v              1/1     Running   4 (21d ago)    41d
cos              pod/prometheus-0                               1/2     Running   0              8h
cos              pod/prometheus-scrape-target-k8s-0             1/1     Running   6 (21d ago)    40d
cos              pod/traefik-0                                  2/2     Running   10 (21d ago)   41d
kube-system      pod/calico-kube-controllers-6588bbfbdb-nbtlh   1/1     Running   0              14d
kube-system      pod/calico-node-2q74k                          1/1     Running   0              14d
kube-system      pod/calico-node-89nr7                          1/1     Running   0              14d
kube-system      pod/calico-node-f2psb                          1/1     Running   0              14d
kube-system      pod/coredns-864597b5fd-h7vlh                   1/1     Running   2 (21d ago)    48d
kube-system      pod/hostpath-provisioner-7df77bc496-xrd26      1/1     Running   1 (21d ago)    22d
metallb-system   pod/controller-5f7bb57799-mjltn                1/1     Running   1 (21d ago)    22d
metallb-system   pod/speaker-bbm9f                              1/1     Running   1 (21d ago)    22d
metallb-system   pod/speaker-hg5h8                              1/1     Running   9 (21d ago)    49d
metallb-system   pod/speaker-p5hzj                              1/1     Running   4 (21d ago)    49d

NAMESPACE        NAME                                             TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
cos              service/alertmanager                             ClusterIP      10.152.183.61    <none>        65535/TCP                    41d
cos              service/alertmanager-endpoints                   ClusterIP      None             <none>        <none>                       41d
cos              service/catalogue                                ClusterIP      10.152.183.130   <none>        65535/TCP,80/TCP             41d
cos              service/catalogue-endpoints                      ClusterIP      None             <none>        <none>                       41d
cos              service/grafana                                  ClusterIP      10.152.183.76    <none>        65535/TCP,3000/TCP           41d
cos              service/grafana-endpoints                        ClusterIP      None             <none>        <none>                       41d
cos              service/loki                                     ClusterIP      10.152.183.60    <none>        65535/TCP,3100/TCP           41d
cos              service/loki-endpoints                           ClusterIP      None             <none>        <none>                       41d
cos              service/modeloperator                            ClusterIP      10.152.183.67    <none>        17071/TCP                    41d
cos              service/prometheus                               ClusterIP      10.152.183.234   <none>        65535/TCP,9090/TCP           41d
cos              service/prometheus-endpoints                     ClusterIP      None             <none>        <none>                       41d
cos              service/prometheus-scrape-target-k8s             ClusterIP      10.152.183.120   <none>        65535/TCP                    40d
cos              service/prometheus-scrape-target-k8s-endpoints   ClusterIP      None             <none>        <none>                       40d
cos              service/traefik                                  ClusterIP      10.152.183.86    <none>        65535/TCP                    41d
cos              service/traefik-endpoints                        ClusterIP      None             <none>        <none>                       41d
cos              service/traefik-lb                               LoadBalancer   10.152.183.141   10.88.56.26   80:32432/TCP,443:30964/TCP   22d
default          service/kubernetes                               ClusterIP      10.152.183.1     <none>        443/TCP                      49d
kube-system      service/kube-dns                                 ClusterIP      10.152.183.10    <none>        53/UDP,53/TCP,9153/TCP       49d
metallb-system   service/webhook-service                          ClusterIP      10.152.183.123   <none>        443/TCP                      49d

NAMESPACE        NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-system      daemonset.apps/calico-node   3         3         3       3            3           kubernetes.io/os=linux   49d
metallb-system   daemonset.apps/speaker       3         3         3       3            3           kubernetes.io/os=linux   49d

NAMESPACE        NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
cos              deployment.apps/modeloperator             1/1     1            1           41d
kube-system      deployment.apps/calico-kube-controllers   1/1     1            1           49d
kube-system      deployment.apps/coredns                   1/1     1            1           49d
kube-system      deployment.apps/hostpath-provisioner      1/1     1            1           49d
metallb-system   deployment.apps/controller                1/1     1            1           49d

NAMESPACE        NAME                                                 DESIRED   CURRENT   READY   AGE
cos              replicaset.apps/modeloperator-c47c94774              1         1         1       41d
kube-system      replicaset.apps/calico-kube-controllers-55fff87cdf   0         0         0       43d
kube-system      replicaset.apps/calico-kube-controllers-59654dbbf    0         0         0       14d
kube-system      replicaset.apps/calico-kube-controllers-5b8d7465f6   0         0         0       30d
kube-system      replicaset.apps/calico-kube-controllers-5d86d446cf   0         0         0       44d
kube-system      replicaset.apps/calico-kube-controllers-6588bbfbdb   1         1         1       14d
kube-system      replicaset.apps/calico-kube-controllers-66c5c6884d   0         0         0       14d
kube-system      replicaset.apps/calico-kube-controllers-684f9474b5   0         0         0       43d
kube-system      replicaset.apps/calico-kube-controllers-6cbb8946d5   0         0         0       49d
kube-system      replicaset.apps/calico-kube-controllers-74567f7d84   0         0         0       30d
kube-system      replicaset.apps/calico-kube-controllers-75cdd899b7   0         0         0       48d
kube-system      replicaset.apps/calico-kube-controllers-8658c8f5d7   0         0         0       30d
kube-system      replicaset.apps/coredns-864597b5fd                   1         1         1       49d
kube-system      replicaset.apps/hostpath-provisioner-7df77bc496      1         1         1       49d
metallb-system   replicaset.apps/controller-5f7bb57799                1         1         1       49d

NAMESPACE   NAME                                            READY   AGE
cos         statefulset.apps/alertmanager                   0/0     41d
cos         statefulset.apps/catalogue                      1/1     41d
cos         statefulset.apps/grafana                        1/1     41d
cos         statefulset.apps/loki                           1/1     41d
cos         statefulset.apps/prometheus                     0/1     41d
cos         statefulset.apps/prometheus-scrape-target-k8s   1/1     40d
cos         statefulset.apps/traefik                        1/1     41d

### Relevant log output

Exception:

juju debug-log --include=prometheus/0

unit-prometheus-0: 19:20:19 ERROR unit.prometheus/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/httpx/_transports/default.py", line 69, in map_httpcore_exceptions
    yield
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/httpx/_transports/default.py", line 233, in handle_request
    resp = self._pool.handle_request(req)
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/httpcore/_sync/connection_pool.py", line 216, in handle_request
    raise exc from None
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/httpcore/_sync/connection_pool.py", line 196, in handle_request
    response = connection.handle_request(
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/httpcore/_sync/connection.py", line 99, in handle_request
    raise exc
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/httpcore/_sync/connection.py", line 76, in handle_request
    stream = self._connect(request)
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/httpcore/_sync/connection.py", line 154, in _connect
    stream = stream.start_tls(**kwargs)
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/httpcore/_backends/sync.py", line 168, in start_tls
    raise exc
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ConnectError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1131)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./src/charm.py", line 1083, in <module>
    main(PrometheusCharm)
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/ops/main.py", line 548, in main
    manager.run()
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/ops/main.py", line 527, in run
    self._emit()
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/ops/main.py", line 513, in _emit
    self.framework.reemit()
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/ops/framework.py", line 870, in reemit
    self._reemit()
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-prometheus-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 546, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "./src/charm.py", line 530, in _configure
    if self.resources_patch.is_ready():
  File "/var/lib/juju/agents/unit-prometheus-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 546, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-prometheus-0/charm/lib/charms/observability_libs/v0/kubernetes_compute_resources_patch.py", line 550, in is_ready
    return self.patcher.is_ready(self._pod, resource_reqs)
  File "/var/lib/juju/agents/unit-prometheus-0/charm/lib/charms/observability_libs/v0/kubernetes_compute_resources_patch.py", line 397, in is_ready
    self.get_templated(),
  File "/var/lib/juju/agents/unit-prometheus-0/charm/lib/charms/observability_libs/v0/kubernetes_compute_resources_patch.py", line 371, in get_templated
    statefulset = self.client.get(
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/lightkube/core/client.py", line 140, in get
    return self._client.request("get", res=res, name=name, namespace=namespace)
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/lightkube/core/generic_client.py", line 244, in request
    resp = self.send(req)
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/lightkube/core/generic_client.py", line 216, in send
    return self._client.send(req, stream=stream)
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/httpx/_client.py", line 914, in send
    response = self._send_handling_auth(
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/httpx/_client.py", line 942, in _send_handling_auth
    response = self._send_handling_redirects(
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/httpx/_client.py", line 979, in _send_handling_redirects
    response = self._send_single_request(request)
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/httpx/_client.py", line 1015, in _send_single_request
    response = transport.handle_request(request)
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/httpx/_transports/default.py", line 233, in handle_request
    resp = self._pool.handle_request(req)
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/var/lib/juju/agents/unit-prometheus-0/charm/venv/httpx/_transports/default.py", line 86, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1131)
unit-prometheus-0: 19:20:20 ERROR juju.worker.uniter.operation hook "upgrade-charm" (via hook dispatching script: dispatch) failed: exit status 1


### Additional context

_No response_
Abuelodelanada (Contributor) commented:
Feels like it is a Juju issue: https://bugs.launchpad.net/juju/+bug/2087557

From a charm perspective it is probably ok to go into error state: https://discourse.charmhub.io/t/its-probably-ok-for-a-unit-to-go-into-error-state/13022

I know @slapcat has already faced this issue in the past. Perhaps he can help here.

slapcat commented Nov 8, 2024

Yes, we did encounter the same issue and, as @mcfly722 suspects, the problem was the certificate on the kube-apiserver. MicroK8s has a built-in command to refresh the certs; you'll need to run it for ca.crt.

Be sure to read the notes at the bottom of the command's documentation carefully: this operation is not safe to run with active workloads, and it will require you to rejoin any other MicroK8s nodes to the cluster.
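A sketch of the refresh procedure (the cert path is the MicroK8s default and the flag syntax varies by MicroK8s version, so check `microk8s refresh-certs --help` on your install first):

```shell
# Check when the current CA expires (default MicroK8s cert path; adjust if needed)
openssl x509 -in /var/snap/microk8s/current/certs/ca.crt -noout -enddate

# Refresh the CA certificate. Disruptive: this restarts cluster services,
# and any other MicroK8s nodes must rejoin the cluster afterwards.
sudo microk8s refresh-certs --cert ca.crt
```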

jack-w-shaw (Member) commented Nov 11, 2024

Hi there, Jack from the Juju team

Could you try running `juju update-k8s microk8s`?

That may resolve your issue

Abuelodelanada (Contributor) commented:
Closing this, because it is a Juju issue that needs to be fixed on the Juju side:

https://bugs.launchpad.net/juju/+bug/2087557/comments/3
