Missing metrics, namely, the # of clusters. #21460

esn89 · 2025-01-12T03:20:56Z

Checklist:

I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
I've included steps to reproduce the bug.
I've pasted the output of argocd version.

Describe the bug

Enabling metrics on the Argo CD Helm chart, combined with a scrape config, does not give a full view of the clusters connected to Argo CD.

To Reproduce

Enable metrics in your ArgoCD (helm chart v7.6.12, ArgoCD v2.12.6) deployment:

  source:
    chart: argo-cd
    repoURL: https://argoproj.github.io/argo-helm
    targetRevision: 7.6.12
    helm:
      params:
      - name: controller.metrics.enabled
        value: "true"
      - name: server.metrics.enabled
        value: "true"
      - name: repoServer.metrics.enabled
        value: "true"
      - name: applicationSet.metrics.enabled
        value: "true"

Observe that all the metrics services come up:

kubectl get svc -n argocd | grep metrics
argocd-application-controller-metrics      ClusterIP   172.21.42.224   <none>        8082/TCP            3d6h
argocd-applicationset-controller-metrics   ClusterIP   172.21.35.4     <none>        8080/TCP            3d5h
argocd-repo-server-metrics                 ClusterIP   172.21.42.97    <none>        8084/TCP            3d5h
argocd-server-metrics                      ClusterIP   172.21.40.68    <none>        8083/TCP            3d5h

Create a scrape config job for either VictoriaMetrics or Grafana:

apiVersion: v1
data:
  scrape.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
    - job_name: argocd-metrics
      metrics_path: /metrics
      static_configs:
      - targets:
        - argocd-application-controller-metrics.argocd.svc.cluster.local:8082
    - job_name: argocd-repo-server
      metrics_path: /metrics
      static_configs:
      - targets:
        - argocd-repo-server-metrics.argocd.svc.cluster.local:8084
    - job_name: argocd-server-metrics
      metrics_path: /metrics
      static_configs:
      - targets:
        - argocd-server-metrics.argocd.svc.cluster.local:8083
    - job_name: argocd-applicationset
      metrics_path: /metrics
      static_configs:
      - targets:
        - argocd-applicationset-controller-metrics.argocd.svc.cluster.local:8080
kind: ConfigMap
metadata:
  name: victoria-metrics-single-server-scrapeconfig
  namespace: vm

From here, I expect argocd_cluster_info to report all my clusters (around 60); however, it only shows 10!

I even port-forwarded the service:

kubectl port-forward svc/argocd-application-controller-metrics -n argocd 8082:8082

and grepped the output:

curl localhost:8082/metrics | grep argocd_cluster_info | wc -l
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  416k    0  416k    0     0   9.8M      0 --:--:-- --:--:-- --:--:--  9.9M
24

It shows 24, which is still much less than 60.
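
One detail worth checking in the count above: grep argocd_cluster_info | wc -l also matches the "# HELP" and "# TYPE" comment lines that the Prometheus text format emits for a metric, so 24 matching lines probably means 22 series from whichever pod answered. A minimal sketch of a stricter count follows; the helper name count_series is made up for illustration.

```shell
# Count only sample lines for one metric in Prometheus text format read
# from stdin, skipping the "# HELP" / "# TYPE" comment lines.
# $1: metric name (hypothetical helper, not part of Argo CD or kubectl)
# Usage: curl -s localhost:8082/metrics | count_series argocd_cluster_info
count_series() {
  # A sample line starts with the metric name followed by "{" or a space.
  # grep -c exits non-zero when the count is 0, hence the "|| true".
  grep -c -E "^$1(\{| )" || true
}
```

The anchored pattern also avoids counting other metrics whose names merely begin with the same prefix.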

Expected behavior

I expect to see the correct number of clusters, which is around 60, but I only see 10.

Screenshots

Screenshot: https://imgur.com/a/59yfCav
The kicker is that a few days ago it apparently detected 56 more clusters, but then the count somehow dropped by 56, and now I am left with 10.

Version

{
    "Version": "v2.12.6+4dab5bd",
    "BuildDate": "2024-10-18T17:39:26Z",
    "GitCommit": "4dab5bd6a60adea12e084ad23519e35b710060a2",
    "GitTreeState": "clean",
    "GoVersion": "go1.22.4",
    "Compiler": "gc",
    "Platform": "linux/amd64",
    "KustomizeVersion": "v5.4.2 2024-05-22T15:19:38Z",
    "HelmVersion": "v3.15.2+g1a500d5",
    "KubectlVersion": "v0.29.6",
    "JsonnetVersion": "v0.20.0"
}

agaudreault · 2025-01-15T16:53:16Z

argocd_cluster_info is exposed by the application-controller. Because of sharding, each pod reports metrics only for the clusters it manages. How many replicas do you have for the application-controller?

When using a system like Prometheus or VictoriaMetrics, you should see the aggregation of the metrics from all pods.

It seems that the scrape config uses the load-balanced endpoint to scrape the metrics, which returns metrics from only a single pod. You should look for a scrape config that lets you scrape every pod.

esn89 · 2025-01-17T04:39:59Z

Correct, it is part of the application-controller.

And I agree that with the scrape config, whether it is Prometheus or VictoriaMetrics, I should see all of them.

The scrape config is indeed scraping the load balancer, which in this case is a Service.
However, the service, which selects pods based on labels, should encompass ALL the application controllers, no?

kubectl get ep -n argocd argocd-application-controller-metrics -o wide
NAME                                    ENDPOINTS                                                          AGE
argocd-application-controller-metrics   172.21.74.39:8082,172.21.75.42:8082,172.21.78.7:8082 + 2 more...   8d

As shown here, there are 5 endpoints behind it, so I would expect 5 pods' (shards') worth of data. However, that is not what I see.
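
The Service does select all five pods, but each request sent to its address is routed to just one of them, so a single scrape of the Service name returns one shard's metrics. One way to check this is to count argocd_cluster_info series pod by pod. The sketch below is an illustration under assumptions, not an Argo CD tool: the function names are made up, it reads pod names from the Endpoints object, fetches each pod's /metrics through the Kubernetes API server's pod proxy with kubectl get --raw, and assumes your account is allowed to use pods/proxy in the argocd namespace.

```shell
# List the pods backing a Service's Endpoints object, one name per line.
# $1: namespace, $2: Service/Endpoints name
endpoint_pods() {
  kubectl get endpoints "$2" -n "$1" \
    -o jsonpath='{range .subsets[*].addresses[*]}{.targetRef.name}{"\n"}{end}'
}

# Print "<pod> <series count>" for every pod behind the Service.
# $1: namespace, $2: Service/Endpoints name, $3: metrics port, $4: metric name
# Usage:
#   per_pod_series argocd argocd-application-controller-metrics 8082 argocd_cluster_info
per_pod_series() {
  for pod in $(endpoint_pods "$1" "$2"); do
    # Fetch this pod's /metrics via the API server proxy and count only
    # sample lines (not "# HELP" / "# TYPE"); "|| true" covers a count of 0.
    count=$(kubectl get --raw "/api/v1/namespaces/$1/pods/$pod:$3/proxy/metrics" \
      | grep -c -E "^$4(\{| )" || true)
    printf '%s %s\n' "$pod" "$count"
  done
}
```

If the per-pod counts add up to roughly 60, the metrics are all there and the scrape config needs to target each pod individually rather than the Service name.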

esn89 added the bug (Something isn't working) label · Jan 12, 2025

agaudreault added the user-issue (An issue caused by the user ecosystem, misconfiguration or misuse of Argo CD) and component:metrics labels and removed the bug (Something isn't working) label · Jan 15, 2025
