NH-57036: Fix metrics relation to Nodes on Fargate #381
pstranak-sw commented on Sep 20, 2023 (edited)
- Don't use 'kubernetes_io_hostname' on Fargate Nodes
- Use 'service.instance.id' from Resource attributes instead of DataPoint attributes
- Remove nonexistent metrics from mocked data
- set(attributes["k8s.node.name"], attributes["node"]) where IsMatch(metric.name, "^(container_.*)|(kube_node_.*)|(kube_pod_info)|(kube_pod_container_resource_requests)|(kube_pod_container_resource_limits)|(kube_pod_init_container_resource_requests)|(kube_pod_init_container_resource_limits)$") == true and attributes["k8s.node.name"] == nil
# "kubernetes_io_hostname", unlike "service.instance.id", provides a nice Node name in environments like local Docker, but for Fargate, its value is different from the other attributes
- set(attributes["k8s.node.name"], attributes["kubernetes_io_hostname"]) where IsMatch(metric.name, "^(container_.*)|(kube_node_.*)|(kube_pod_info)|(kube_pod_container_resource_requests)|(kube_pod_container_resource_limits)|(kube_pod_init_container_resource_requests)|(kube_pod_init_container_resource_limits)$") == true and attributes["eks_amazonaws_com_compute_type"] != "fargate" and attributes["k8s.node.name"] == nil
# use "service.instance.id" for Node name when the above attributes are not available
Hmm, it's weird that aws-otel-collector takes the Node name directly from kubernetes_io_hostname on Fargate.
After discussing it with Peter, I understand why this is needed.
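For readers following the thread, the fallback order in the statements above can be sketched in Python. This is only an illustration under stated assumptions: `resolve_node_name` is a hypothetical helper that is not part of the PR, the metric-name filter (`IsMatch`) is left out, and the last step assumes `service.instance.id` is read from the Resource attributes, as the PR description says (the statement itself is not shown in the diff).

```python
from typing import Mapping, Optional


def resolve_node_name(
    datapoint_attrs: Mapping[str, str],
    resource_attrs: Mapping[str, str],
) -> Optional[str]:
    """Hypothetical helper mirroring the order of the statements in the diff."""
    # An already-set k8s.node.name is never overwritten.
    name = datapoint_attrs.get("k8s.node.name")

    # 1. Prefer the "node" attribute.
    if name is None:
        name = datapoint_attrs.get("node")

    # 2. Use "kubernetes_io_hostname", but not on Fargate, where its value
    #    differs from the other attributes.
    if name is None and datapoint_attrs.get("eks_amazonaws_com_compute_type") != "fargate":
        name = datapoint_attrs.get("kubernetes_io_hostname")

    # 3. Fall back to "service.instance.id" (assumed to come from Resource attributes).
    if name is None:
        name = resource_attrs.get("service.instance.id")

    return name
```

With this sketch, a Fargate data point that carries only `kubernetes_io_hostname` ends up with the `service.instance.id` value, while the same data point from a non-Fargate environment keeps the hostname.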
bc1ba77 to 3ffaffc
* Don't use 'kubernetes_io_hostname' on Fargate Nodes
* Use 'service.instance.id' from Resource attributes instead of DataPoint attributes
* Remove nonexistent metrics from mocked data
3ffaffc to 42278c2
@@ -199,11 +199,6 @@ kube_pod_container_resource_limits{container="test-container",endpoint="http",in
kube_pod_container_resource_limits{container="test-container",endpoint="http",instance="test-node",job="test-job",namespace="test-namespace",node="test-node",pod="test-pod",resource="memory",service="test-service",uid="bafeef2c-1292-4a5e-a92c-d709480b04b6",unit="byte",prometheus="prometheus-system/kube-prometheus-kube-prome-prometheus",prometheus_replica="prometheus-kube-prometheus-kube-prome-prometheus-0"} 3.221225472e+09 1675856675021
kube_pod_container_resource_limits{container="test-container",endpoint="tcp-model",instance="test-node",job="test-job",namespace="test-namespace",node="test-node",pod="test-pod",resource="memory",service="test-service",uid="bafeef2c-1292-4a5e-a92c-d709480b04b6",unit="byte",prometheus="prometheus-system/kube-prometheus-kube-prome-prometheus",prometheus_replica="prometheus-kube-prometheus-kube-prome-prometheus-0"} 3.221225472e+09 1675856675021
kube_pod_container_resource_limits{container="test-container",endpoint="tcp-model",instance="test-node",job="test-job",namespace="test-namespace",node="test-node",pod="test-pod",resource="cpu",service="test-service",uid="bafeef2c-1292-4a5e-a92c-d709480b04b6",unit="core",prometheus="prometheus-system/kube-prometheus-kube-prome-prometheus",prometheus_replica="prometheus-kube-prometheus-kube-prome-prometheus-0"} 0.1 1675856675021
# TYPE kube_pod_container_resource_limits_cpu_cores untyped
These metrics existed only in kube-state-metrics v2.0.0-alpha and were removed in v2.0.0-alpha.2. I'm not even sure how they appeared in the mocked data.
…s are more stable
@@ -198,17 +211,20 @@ def merge_resources(existing_resource, new_resource):
existing_scopes.append(new_scope)


def custom_json_merge(result, new_json):
    new_resources = {resource_sorting_key(resource): resource for resource in new_json["resourceMetrics"]}
@pstranak-sw This line (and a few similar ones above) was the main source of instability. It built a map, so when the same JSON contained multiple resources with the same key, they were not merged; the later one completely overrode the earlier one. Instead of a map, I now build an array of tuples, so nothing gets overridden.
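A minimal sketch of the difference described in this comment. The real `resource_sorting_key` is not shown in the diff, so the version below is a hypothetical stand-in that keys on a made-up `name` field, and the sample payload is invented for illustration only.

```python
def resource_sorting_key(resource):
    # Hypothetical stand-in: the real key function is not visible in the diff.
    return resource["resource"]["name"]


# Invented payload with two resources that share the same key.
new_json = {
    "resourceMetrics": [
        {"resource": {"name": "node-a"}, "scopeMetrics": ["cpu"]},
        {"resource": {"name": "node-a"}, "scopeMetrics": ["memory"]},
    ]
}

# Old approach: a dict comprehension keeps only the last entry per key,
# so the "cpu" resource is silently dropped.
as_map = {resource_sorting_key(r): r for r in new_json["resourceMetrics"]}

# New approach: a list of (key, resource) tuples keeps every entry,
# so resources with equal keys can still be merged later.
as_tuples = [(resource_sorting_key(r), r) for r in new_json["resourceMetrics"]]

print(len(as_map), len(as_tuples))
```

Because a dict cannot hold duplicate keys, which resource survives in `as_map` depends purely on input order, which would explain output that varies from run to run.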
3f5ed9c to 03e66d5