Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NH-57036: Fix metrics relation to Nodes on Fargate #381

Merged
merged 8 commits into from
Oct 9, 2023

Conversation

pstranak-sw
Copy link
Contributor

@pstranak-sw pstranak-sw commented Sep 20, 2023

  • Don't use 'kubernetes_io_hostname' on Fargate Nodes
  • Use 'service.instance.id' from Resource attributes instead of DataPoint attributes
  • Remove nonexistent metrics from mocked data

@pstranak-sw pstranak-sw requested review from a team September 20, 2023 08:03
- set(attributes["k8s.node.name"], attributes["node"]) where IsMatch(metric.name, "^(container_.*)|(kube_node_.*)|(kube_pod_info)|(kube_pod_container_resource_requests)|(kube_pod_container_resource_limits)|(kube_pod_init_container_resource_requests)|(kube_pod_init_container_resource_limits)$") == true and attributes["k8s.node.name"] == nil
# "kubernetes_io_hostname", unlike "service.instance.id", provides a nice Node name in environments like local Docker, but for Fargate, its value is different from the other attributes
- set(attributes["k8s.node.name"], attributes["kubernetes_io_hostname"]) where IsMatch(metric.name, "^(container_.*)|(kube_node_.*)|(kube_pod_info)|(kube_pod_container_resource_requests)|(kube_pod_container_resource_limits)|(kube_pod_init_container_resource_requests)|(kube_pod_init_container_resource_limits)$") == true and attributes["eks_amazonaws_com_compute_type"] != "fargate" and attributes["k8s.node.name"] == nil
# use "service.instance.id" for Node name when the above attributes are not available
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm it's weirs aws-otel-collector is taking node name directly from kubernetes_io_hostname for fargate

https://github.com/aws-observability/aws-otel-collector/blob/main/deployment-template/eks/otel-fargate-container-insights.yaml#L103C24-L103C46

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion with Peter I do understand this needs.

etichy
etichy previously approved these changes Sep 21, 2023
- set(attributes["k8s.node.name"], attributes["node"]) where IsMatch(metric.name, "^(container_.*)|(kube_node_.*)|(kube_pod_info)|(kube_pod_container_resource_requests)|(kube_pod_container_resource_limits)|(kube_pod_init_container_resource_requests)|(kube_pod_init_container_resource_limits)$") == true and attributes["k8s.node.name"] == nil
# "kubernetes_io_hostname", unlike "service.instance.id", provides a nice Node name in environments like local Docker, but for Fargate, its value is different from the other attributes
- set(attributes["k8s.node.name"], attributes["kubernetes_io_hostname"]) where IsMatch(metric.name, "^(container_.*)|(kube_node_.*)|(kube_pod_info)|(kube_pod_container_resource_requests)|(kube_pod_container_resource_limits)|(kube_pod_init_container_resource_requests)|(kube_pod_init_container_resource_limits)$") == true and attributes["eks_amazonaws_com_compute_type"] != "fargate" and attributes["k8s.node.name"] == nil
# use "service.instance.id" for Node name when the above attributes are not available
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion with Peter I do understand this needs.

* Don't use 'kubernetes_io_hostname' on Fargate Nodes
* Use 'service.instance.id' from Resource attributes instead of DataPoint attributes
* Remove nonexistent metrics from mocked data
@pstranak-sw pstranak-sw force-pushed the feature/NH-57036-fargateNodeNames branch from 3ffaffc to 42278c2 Compare October 4, 2023 07:35
@@ -199,11 +199,6 @@ kube_pod_container_resource_limits{container="test-container",endpoint="http",in
kube_pod_container_resource_limits{container="test-container",endpoint="http",instance="test-node",job="test-job",namespace="test-namespace",node="test-node",pod="test-pod",resource="memory",service="test-service",uid="bafeef2c-1292-4a5e-a92c-d709480b04b6",unit="byte",prometheus="prometheus-system/kube-prometheus-kube-prome-prometheus",prometheus_replica="prometheus-kube-prometheus-kube-prome-prometheus-0"} 3.221225472e+09 1675856675021
kube_pod_container_resource_limits{container="test-container",endpoint="tcp-model",instance="test-node",job="test-job",namespace="test-namespace",node="test-node",pod="test-pod",resource="memory",service="test-service",uid="bafeef2c-1292-4a5e-a92c-d709480b04b6",unit="byte",prometheus="prometheus-system/kube-prometheus-kube-prome-prometheus",prometheus_replica="prometheus-kube-prometheus-kube-prome-prometheus-0"} 3.221225472e+09 1675856675021
kube_pod_container_resource_limits{container="test-container",endpoint="tcp-model",instance="test-node",job="test-job",namespace="test-namespace",node="test-node",pod="test-pod",resource="cpu",service="test-service",uid="bafeef2c-1292-4a5e-a92c-d709480b04b6",unit="core",prometheus="prometheus-system/kube-prometheus-kube-prome-prometheus",prometheus_replica="prometheus-kube-prometheus-kube-prome-prometheus-0"} 0.1 1675856675021
# TYPE kube_pod_container_resource_limits_cpu_cores untyped
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These metrics existed only in kube-state-metrics v2.0.0-alpha and were removed in v2.0.0-alpha.2. I'm not even sure how they appeared in the mocked data.

@@ -198,17 +211,20 @@ def merge_resources(existing_resource, new_resource):
existing_scopes.append(new_scope)

def custom_json_merge(result, new_json):
new_resources = {resource_sorting_key(resource): resource for resource in new_json["resourceMetrics"]}
Copy link
Contributor

@gantrior gantrior Oct 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pstranak-sw This line (and few similar above) was main source of instability. Basically what it did was that it create Map and in case there was multiple resources in the same json with the same key, it did not merge them, but override them completely. So instead of creating Map I now create array of Tuples so not overriding anything

@gantrior gantrior force-pushed the feature/NH-57036-fargateNodeNames branch from 3f5ed9c to 03e66d5 Compare October 6, 2023 15:51
@gantrior gantrior merged commit 3616f74 into master Oct 9, 2023
@gantrior gantrior deleted the feature/NH-57036-fargateNodeNames branch October 9, 2023 10:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants