
otelcol.processor.k8sattributes - Does not appear to provide proper metrics in AWS EKS #6899

Open
jseiser opened this issue May 10, 2024 · 3 comments
Labels: bug (Something isn't working), needs-attention (An issue or PR has been sitting around and needs attention)

Comments

jseiser commented May 10, 2024

What's wrong?

When running Grafana Agent in EKS, the resource attributes assigned to every trace are those of the Grafana Agent pod that processed the trace, not those of the pod that generated the trace.

Steps to reproduce

  1. Default Helm install on EKS
  2. Configure otelcol.processor.k8sattributes

System information

EKS 1.28

Software version

v0.40.4

Configuration

    logging {
      level  = "debug"
      format = "json"
    }

    otelcol.exporter.otlp "to_tempo" {
      client {
        endpoint = "tempo-distributed-distributor.tempo.svc.cluster.local:4317"
        tls {
            insecure             = true
            insecure_skip_verify = true
        }
      }
    }

    otelcol.receiver.otlp "default" {
      debug_metrics {
        disable_high_cardinality_metrics = true
      }
      grpc {
        endpoint = "0.0.0.0:4317"
        include_metadata = true
      }

      http {
        endpoint = "0.0.0.0:4318"
        include_metadata = true
      }
      output {
        traces = [otelcol.processor.resourcedetection.default.input]
      }
    }

    otelcol.receiver.opencensus "default" {
      debug_metrics {
        disable_high_cardinality_metrics = true
      }
      endpoint  = "0.0.0.0:55678"
      transport = "tcp"
      output {
        traces = [otelcol.processor.resourcedetection.default.input]
      }
    }

    otelcol.processor.resourcedetection "default" {
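      // Add environment- and EKS-derived resource attributes before pod association.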
      detectors = ["env", "eks"]

      output {
        traces = [otelcol.processor.k8sattributes.default.input]
      }
    }

    otelcol.processor.k8sattributes "default" {
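      // Associate each span with a pod using the peer IP of the incoming OTLP connection.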
      pod_association {
        source {
          from = "connection"
        }
      }
      extract {
        label {
          from      = "pod"
          key_regex = "(.*)/(.*)"
          tag_name  = "$1.$2"
        }
        metadata = [
          "k8s.namespace.name",
          "k8s.deployment.name",
          "k8s.statefulset.name",
          "k8s.daemonset.name",
          "k8s.cronjob.name",
          "k8s.job.name",
          "k8s.node.name",
          "k8s.pod.name",
          "k8s.pod.uid",
          "k8s.pod.start_time",
          "container.id",
          "k8s.container.name",
          "container.image.name",
          "container.image.tag",
        ]
      }
      output {
        traces  = [otelcol.processor.memory_limiter.default.input]
      }
    }

    otelcol.processor.memory_limiter "default" {
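      // Refuse incoming data when the agent's memory usage exceeds the limit below.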
      check_interval = "5s"

      limit = "512MiB"

      output {
        traces = [otelcol.processor.tail_sampling.default.input]
      }
    }

    otelcol.processor.tail_sampling "default" {
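      // Policies are ORed together: a trace is kept if any policy decides to sample it.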
      policy {
        name = "ignore-health"
        type = "string_attribute"

        string_attribute {
          key                    = "http.url"
          values                 = ["/health", "/metrics", "/healthz", "/loki/api/v1/push"]
          enabled_regex_matching = true
          invert_match           = true
        }
      }

      policy {
        name = "ignore-health-target"
        type = "string_attribute"

        string_attribute {
          key                    = "http.target"
          values                 = ["/health", "/metrics", "/healthz", "/loki/api/v1/push"]
          enabled_regex_matching = true
          invert_match           = true
        }
      }


      policy {
        name = "ignore-health-path"
        type = "string_attribute"

        string_attribute {
          key                    = "http.path"
          values                 = ["/health", "/metrics", "/healthz", "/loki/api/v1/push"]
          enabled_regex_matching = true
          invert_match           = true
        }
      }

      policy {
        name = "all-errors"
        type = "status_code"

        status_code {
          status_codes = ["ERROR"]
        }
      }

      policy {
        name = "sample-percent"
        type = "probabilistic"

        probabilistic {
          sampling_percentage = 50
        }
      }

      output {
        traces =  [otelcol.processor.batch.default.input]
      }
    }


    otelcol.processor.batch "default" {
      send_batch_size = 16384
      send_batch_max_size = 0
      timeout = "2s"

      output {
        traces = [otelcol.exporter.otlp.to_tempo.input]
      }
    }

Logs

{
  "ts": "2024-05-10T16:34:47.759964509Z",
  "level": "debug",
  "msg": "evaluating pod identifier",
  "component": "otelcol.processor.k8sattributes.default",
  "value": [
    {
      "Source": {
        "From": "connection",
        "Name": ""
      },
      "Value": "10.2.11.242"
    },
    {
      "Source": {
        "From": "",
        "Name": ""
      },
      "Value": ""
    },
    {
      "Source": {
        "From": "",
        "Name": ""
      },
      "Value": ""
    },
    {
      "Source": {
        "From": "",
        "Name": ""
      },
      "Value": ""
    }
  ]
}

{
  "ts": "2024-05-10T16:34:47.760017653Z",
  "level": "debug",
  "msg": "getting the pod",
  "component": "otelcol.processor.k8sattributes.default",
  "pod": {
    "Name": "grafana-agent-q9mln",
    "Address": "10.2.11.242",
    "PodUID": "7436418b-2ce8-4be5-a18b-5e27a21699dc",
    "Attributes": {
      "app.kubernetes.io.instance": "grafana-agent",
      "app.kubernetes.io.name": "grafana-agent",
      "k8s.daemonset.name": "grafana-agent",
      "k8s.namespace.name": "grafana-agent",
      "k8s.node.name": "i-04b3dab30856c5a83.us-gov-west-1.compute.internal",
      "k8s.pod.name": "grafana-agent-q9mln",
      "k8s.pod.start_time": "2024-05-10 13:40:33 +0000 UTC",
      "k8s.pod.uid": "7436418b-2ce8-4be5-a18b-5e27a21699dc",
      "linkerd.io.control-plane-ns": "linkerd",
      "linkerd.io.proxy-daemonset": "grafana-agent",
      "linkerd.io.workload-ns": "grafana-agent"
    },
    "StartTime": "2024-05-10T13:40:32Z",
    "Ignore": false,
    "Namespace": "grafana-agent",
    "HostNetwork": false,
    "Containers": {
      "ByID": {
        "01562d68846aff25e8e960139c0eb29c1c33c8c31ef9e94077e4556af0d4b268": {
          "Name": "linkerd-proxy",
          "ImageName": "cache.dev.trex.network/cr.l5d.io/linkerd/proxy",
          "ImageTag": "stable-2.14.10",
          "Statuses": {
            "0": {
              "ContainerID": "01562d68846aff25e8e960139c0eb29c1c33c8c31ef9e94077e4556af0d4b268"
            }
          }
        },
        "69329d069f99f6dfec76e60347d67c6ed1aadb45e41f7c7005f0f2f3a0c4a9fa": {
          "Name": "config-reloader",
          "ImageName": "cache.dev.trex.network/ghcr.io/jimmidyson/configmap-reload",
          "ImageTag": "v0.12.0",
          "Statuses": {
            "0": {
              "ContainerID": "69329d069f99f6dfec76e60347d67c6ed1aadb45e41f7c7005f0f2f3a0c4a9fa"
            }
          }
        },
        "b62f3cc440fb28d41c7e5502ae95f0cb11458996f08a29d110acf042efc6a499": {
          "Name": "grafana-agent",
          "ImageName": "cache.dev.trex.network/docker.io/grafana/agent",
          "ImageTag": "v0.40.4",
          "Statuses": {
            "0": {
              "ContainerID": "b62f3cc440fb28d41c7e5502ae95f0cb11458996f08a29d110acf042efc6a499"
            }
          }
        },
        "bded4fb82cf01750708c35f4f2a602cc53dcdcb23b5f2b5d3628ec971cbc729a": {
          "Name": "linkerd-init",
          "ImageName": "cache.dev.trex.network/cr.l5d.io/linkerd/proxy-init",
          "ImageTag": "v2.2.3",
          "Statuses": {
            "0": {
              "ContainerID": "bded4fb82cf01750708c35f4f2a602cc53dcdcb23b5f2b5d3628ec971cbc729a"
            }
          }
        }
      },
      "ByName": {
        "config-reloader": {
          "Name": "config-reloader",
          "ImageName": "cache.dev.trex.network/ghcr.io/jimmidyson/configmap-reload",
          "ImageTag": "v0.12.0",
          "Statuses": {
            "0": {
              "ContainerID": "69329d069f99f6dfec76e60347d67c6ed1aadb45e41f7c7005f0f2f3a0c4a9fa"
            }
          }
        },
        "grafana-agent": {
          "Name": "grafana-agent",
          "ImageName": "cache.dev.trex.network/docker.io/grafana/agent",
          "ImageTag": "v0.40.4",
          "Statuses": {
            "0": {
              "ContainerID": "b62f3cc440fb28d41c7e5502ae95f0cb11458996f08a29d110acf042efc6a499"
            }
          }
        },
        "linkerd-init": {
          "Name": "linkerd-init",
          "ImageName": "cache.dev.trex.network/cr.l5d.io/linkerd/proxy-init",
          "ImageTag": "v2.2.3",
          "Statuses": {
            "0": {
              "ContainerID": "bded4fb82cf01750708c35f4f2a602cc53dcdcb23b5f2b5d3628ec971cbc729a"
            }
          }
        },
        "linkerd-proxy": {
          "Name": "linkerd-proxy",
          "ImageName": "cache.dev.trex.network/cr.l5d.io/linkerd/proxy",
          "ImageTag": "stable-2.14.10",
          "Statuses": {
            "0": {
              "ContainerID": "01562d68846aff25e8e960139c0eb29c1c33c8c31ef9e94077e4556af0d4b268"
            }
          }
        }
      }
    },
    "DeletedAt": "0001-01-01T00:00:00Z"
  }
}
jseiser added the bug label on May 10, 2024
jseiser commented May 10, 2024

Note that in the above logs, all of those attributes belong to the Grafana Agent pod, even though the trace really came from a pod called console in the qa1-dev namespace.

Since every trace carries the Grafana Agent attributes, searching for a specific environment's traces is nearly impossible.

If I remove the following block, I get no attributes at all:

      pod_association {
        source {
          from = "connection"
        }
      }

This feels similar to open-telemetry/opentelemetry-collector-contrib#29630, but there doesn't seem to be a solution there.
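For reference, the processor also supports associating telemetry by a resource attribute instead of the connection address. A minimal, untested sketch, assuming the sending pods attach a k8s.pod.ip resource attribute (the extract and output blocks would stay as in the configuration above):

    otelcol.processor.k8sattributes "default" {
      pod_association {
        // Prefer an explicit pod IP supplied by the sending SDK.
        source {
          from = "resource_attribute"
          name = "k8s.pod.ip"
        }
      }
      pod_association {
        // Fall back to the peer address of the OTLP connection.
        source {
          from = "connection"
        }
      }
      // extract { ... } and output { ... } as above
    }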

jseiser commented Jun 6, 2024

Wanted to note: if we configure all of the OTLP environment variables on the pod itself, things start to work, but it removes any linking between the pods and Linkerd.

Have yet to find a way to run Grafana Agent in EKS that can actually consolidate your traces.
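For context, the environment-variable approach above presumably amounts to something like the following downward-API fragment on the instrumented workloads (a sketch with a hypothetical POD_IP name; the exact variables used were not shown here):

    # Hypothetical pod spec fragment: expose the pod IP to the app's
    # OpenTelemetry SDK so k8sattributes can match on k8s.pod.ip.
    env:
      - name: POD_IP
        valueFrom:
          fieldRef:
            fieldPath: status.podIP
      - name: OTEL_RESOURCE_ATTRIBUTES
        value: k8s.pod.ip=$(POD_IP)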

github-actions bot commented

This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it.
If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue.
The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity.
Thank you for your contributions!

github-actions bot added the needs-attention label on Jul 19, 2024