chinaran changed the title from "OTel jvm system cpu utilization metrics values are higher than cpu values captured by k8s cadvisor" to "OTel jvm process cpu utilization metrics values are higher than cpu values captured by k8s cadvisor" on Nov 25, 2024
@laurit Thank you for your reply.
It does look like jvm.cpu.time is the more appropriate metric.
As for why the two are not comparable: is it because the percentage returned by getProcessCpuLoad is closer to an instantaneous reading, while container_cpu_usage_seconds_total/container_spec_cpu_quota is derived from cumulative statistics?
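To make the comparison concrete, here is a minimal sketch of how a utilization ratio is usually derived from a cumulative counter: take the increase over a window and divide by the CPU limit. The class and method names are hypothetical, and dividing the quota by container_spec_cpu_period to get the limit in cores is my assumption about how the cAdvisor values should be normalized, not something confirmed in this thread.

```java
public class CounterRateSketch {

    /**
     * Utilization over a window, derived from two samples of a cumulative
     * CPU-seconds counter (e.g. container_cpu_usage_seconds_total).
     *
     * quotaMicros / periodMicros is assumed to be the CPU limit in cores
     * (e.g. 200000 / 100000 = 2 cores).
     */
    public static double utilization(double usageSecondsStart,
                                     double usageSecondsEnd,
                                     double windowSeconds,
                                     double quotaMicros,
                                     double periodMicros) {
        if (windowSeconds <= 0 || quotaMicros <= 0 || periodMicros <= 0) {
            throw new IllegalArgumentException("window, quota and period must be positive");
        }
        // Average number of cores used during the window.
        double coresUsed = (usageSecondsEnd - usageSecondsStart) / windowSeconds;
        // CPU limit expressed in cores.
        double limitCores = quotaMicros / periodMicros;
        return coresUsed / limitCores;
    }

    public static void main(String[] args) {
        // Hypothetical samples: 30 CPU-seconds consumed in a 60 s window, 2-core limit.
        double u = utilization(100.0, 130.0, 60.0, 200000.0, 100000.0);
        System.out.printf("utilization over window: %.2f%n", u);
    }
}
```

A value computed this way is an average over the whole window, so short spikes that a point-in-time reading might catch get smoothed out.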
Describe the bug
OTel jvm process cpu utilization metrics values are higher than cpu values captured by k8s cadvisor
Steps to reproduce
The OTel Agent is injected into the Java service by opentelemetry-operator, and the service runs for a while.
Expected behavior
The CPU utilization values captured by the OTel Agent and by k8s cadvisor should be roughly the same.
Actual behavior
OTel jvm process cpu utilization metrics values are higher than cpu values captured by k8s cadvisor
Javaagent or library instrumentation version
1.33.5
Environment
JDK: Temurin-21.0.5+11
OS: CentOS Linux 7
start command: java -XX:+PrintFlagsFinal -XX:MaxRAMPercentage=75.0 -Djava.security.egd=file:/dev/./urandom -jar ./otel-demo-provider-0.0.1-SNAPSHOT.jar
exec to container: java -XshowSettings:system -version:
Additional context
I tried looking at the corresponding source code, but I am not entirely sure these are the right source locations.
process_runtime_jvm_cpu_utilization Definition: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/v1.33.5/instrumentation/runtime-telemetry/runtime-telemetry-java17/library/src/main/java/io/opentelemetry/instrumentation/runtimemetrics/java17/internal/cpu/OverallCpuLoadHandler.java#L23
It is implemented through the getProcessCpuLoad() function:
https://github.com/openjdk/jdk/blob/master/src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c#L327
container_cpu_usage_seconds_total Definition: https://github.com/kubernetes/kubernetes/blob/master/vendor/github.com/google/cadvisor/metrics/prometheus.go#L164
This is accomplished by reading cpuacct.usage under the container cgroup: https://github.com/kubernetes/kubernetes/blob/master/vendor/github.com/opencontainers/runc/libcontainer/cgroups/fs/cpuacct.go#L54
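For anyone who wants to compare the two sources from inside the container, here is a rough sketch that samples the JVM's own view (getProcessCpuLoad() and getProcessCpuTime() from com.sun.management.OperatingSystemMXBean) next to the cgroup counter. The class name is hypothetical, and the path /sys/fs/cgroup/cpuacct/cpuacct.usage is an assumption based on a typical cgroup v1 mount; it will differ on cgroup v2 or on custom mounts.

```java
import com.sun.management.OperatingSystemMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class JvmVsCgroupCpuSketch {

    // Assumed cgroup v1 location of the cumulative CPU usage counter (nanoseconds).
    static final Path CPUACCT_USAGE = Path.of("/sys/fs/cgroup/cpuacct/cpuacct.usage");

    /** Parses the content of cpuacct.usage: a single integer, in nanoseconds. */
    public static long parseCpuacctUsage(String content) {
        return Long.parseLong(content.trim());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        OperatingSystemMXBean os =
                ManagementFactory.getPlatformMXBean(OperatingSystemMXBean.class);
        boolean hasCgroupFile = Files.isReadable(CPUACCT_USAGE);

        // First sample of both cumulative counters.
        long jvmCpu0 = os.getProcessCpuTime(); // nanoseconds, -1 if unsupported
        long cgroup0 = hasCgroupFile ? parseCpuacctUsage(Files.readString(CPUACCT_USAGE)) : -1;

        Thread.sleep(1000);

        // Second sample, one second later.
        long jvmCpu1 = os.getProcessCpuTime();
        long cgroup1 = hasCgroupFile ? parseCpuacctUsage(Files.readString(CPUACCT_USAGE)) : -1;

        // Point-in-time style reading in [0.0, 1.0], negative if unavailable.
        System.out.println("getProcessCpuLoad(): " + os.getProcessCpuLoad());
        System.out.println("JVM process CPU ns in window: " + (jvmCpu1 - jvmCpu0));
        if (hasCgroupFile) {
            System.out.println("cgroup CPU ns in window: " + (cgroup1 - cgroup0));
        } else {
            System.out.println("cpuacct.usage not readable at " + CPUACCT_USAGE);
        }
    }
}
```

If the two nanosecond deltas are close while the load value still looks higher, that would suggest the difference comes from how the percentage is normalized rather than from the raw CPU time.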