Missing metrics, namely, the # of clusters. #21460

Open
3 tasks done
esn89 opened this issue Jan 12, 2025 · 2 comments
Labels
component:metrics user-issue An issue caused by the user ecosystem, misconfiguration or misuse of Argo CD

Comments

esn89 commented Jan 12, 2025

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

Enabling metrics in the Argo CD Helm chart, combined with a scrape config, does not give a full view of the clusters connected to Argo CD.

To Reproduce

Enable metrics in your ArgoCD (helm chart v7.6.12, ArgoCD v2.12.6) deployment:

  source:
    chart: argo-cd
    repoURL: https://argoproj.github.io/argo-helm
    targetRevision: 7.6.12
    helm:
      params:
      - name: controller.metrics.enabled
        value: "true"
      - name: server.metrics.enabled
        value: "true"
      - name: repoServer.metrics.enabled
        value: "true"
      - name: applicationSet.metrics.enabled
        value: "true"

Observe that all the metrics services come up:

kubectl get svc -n argocd | grep metrics
argocd-application-controller-metrics      ClusterIP   172.21.42.224   <none>        8082/TCP            3d6h
argocd-applicationset-controller-metrics   ClusterIP   172.21.35.4     <none>        8080/TCP            3d5h
argocd-repo-server-metrics                 ClusterIP   172.21.42.97    <none>        8084/TCP            3d5h
argocd-server-metrics                      ClusterIP   172.21.40.68    <none>        8083/TCP            3d5h

Create a scrape config job for either VictoriaMetrics or Grafana:

apiVersion: v1
data:
  scrape.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
    - job_name: argocd-metrics
      metrics_path: /metrics
      static_configs:
      - targets:
        - argocd-application-controller-metrics.argocd.svc.cluster.local:8082
    - job_name: argocd-repo-server
      metrics_path: /metrics
      static_configs:
      - targets:
        - argocd-repo-server-metrics.argocd.svc.cluster.local:8084
    - job_name: argocd-server-metrics
      metrics_path: /metrics
      static_configs:
      - targets:
        - argocd-server-metrics.argocd.svc.cluster.local:8083
    - job_name: argocd-applicationset
      metrics_path: /metrics
      static_configs:
      - targets:
        - argocd-applicationset-controller-metrics.argocd.svc.cluster.local:8080
kind: ConfigMap
metadata:
  name: victoria-metrics-single-server-scrapeconfig
  namespace: vm

From here, I expect argocd_cluster_info to give me all my clusters (around 60), but it only shows 10!

I even port-forwarded the service:

kubectl port-forward svc/argocd-application-controller-metrics -n argocd 8082:8082

and did a grep:

curl localhost:8082/metrics | grep argocd_cluster_info | wc -l
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  416k    0  416k    0     0   9.8M      0 --:--:-- --:--:-- --:--:--  9.9M
24

It shows 24, which is still much less than 60.

Expected behavior

I expect to see the correct number of clusters, which is around 60, but I only see 10.

Screenshots

[image](https://imgur.com/a/59yfCav)
The kicker here is that a few days ago it apparently detected 56 more clusters, but then it somehow dropped by 56, and now I am left with 10.

Version

{
    "Version": "v2.12.6+4dab5bd",
    "BuildDate": "2024-10-18T17:39:26Z",
    "GitCommit": "4dab5bd6a60adea12e084ad23519e35b710060a2",
    "GitTreeState": "clean",
    "GoVersion": "go1.22.4",
    "Compiler": "gc",
    "Platform": "linux/amd64",
    "KustomizeVersion": "v5.4.2 2024-05-22T15:19:38Z",
    "HelmVersion": "v3.15.2+g1a500d5",
    "KubectlVersion": "v0.29.6",
    "JsonnetVersion": "v0.20.0"
}

Logs

Paste any relevant application logs here.
esn89 added the bug label Jan 12, 2025
@agaudreault
Member

argocd_cluster_info is part of the application-controller. Because of sharding, each pod reports metrics only for the clusters it manages. How many replicas do you have for the application-controller?

When using a system like prometheus or victoria metrics, you should see the aggregation of the metrics from all pods.

It seems that the scrape config uses the load balancer endpoint to scrape the metrics, which returns metrics from only a single pod. You should look for a scrape config that allows you to scrape every pod.
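
For illustration, here is a minimal sketch of what such a scrape config could look like, assuming the scraper accepts Prometheus-style kubernetes_sd_configs and its service account is allowed to list and watch endpoints, services, and pods in the argocd namespace. A static target that points at the Service name reaches only one backend pod per scrape, whereas endpoints discovery creates one target per pod behind the Service. The job name and the `pod` target label are illustrative choices, not taken from the original config.

```yaml
scrape_configs:
- job_name: argocd-application-controller   # illustrative job name
  metrics_path: /metrics
  kubernetes_sd_configs:
  - role: endpoints                          # one target per endpoint address (pod) instead of the Service IP
    namespaces:
      names:
      - argocd
  relabel_configs:
  # Keep only the endpoints that belong to the controller metrics Service.
  - source_labels: [__meta_kubernetes_service_name]
    action: keep
    regex: argocd-application-controller-metrics
  # Record which pod (shard) each series came from; the label name is an assumption.
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod
```

With something along these lines, each application-controller pod should show up as its own target, so argocd_cluster_info from every shard can be aggregated on the query side.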

agaudreault added the user-issue and component:metrics labels and removed the bug label Jan 15, 2025

esn89 commented Jan 17, 2025

Correct, it is part of the application-controller.

And I agree with you that with the scrape config, whether it is prom/vm, I should see all of them.

The scrape config is indeed scraping the loadbalancer, which in this case is a service.
However, the service, which selects pods based on labels, should encompass ALL the application controllers, no?

kubectl get ep -n argocd argocd-application-controller-metrics -o wide
NAME                                    ENDPOINTS                                                          AGE
argocd-application-controller-metrics   172.21.74.39:8082,172.21.75.42:8082,172.21.78.7:8082 + 2 more...   8d

See here, there are 5 endpoints behind this, so I would expect to see 5 pods' (shards') worth of data. However, I am not seeing that.
