OTel jvm process cpu utilization metrics values are higher than cpu values captured by k8s cadvisor #12793

Open
chinaran opened this issue Nov 25, 2024 · 2 comments
Labels
bug Something isn't working needs triage New issue that requires triage

chinaran commented Nov 25, 2024

Describe the bug

The OTel JVM process CPU utilization metric reports higher values than the CPU usage captured by the Kubernetes cAdvisor.

Steps to reproduce

The OTel Java agent is injected into the Java service by opentelemetry-operator, and the service runs for a while.

Expected behavior

The CPU utilization values captured by the OTel agent and by the Kubernetes cAdvisor are roughly the same.

Actual behavior

The OTel JVM process CPU utilization metric reports higher values than the CPU usage captured by the Kubernetes cAdvisor.

Javaagent or library instrumentation version

1.33.5

Environment

JDK: Temurin-21.0.5+11
OS: CentOS Linux 7


start command: java -XX:+PrintFlagsFinal -XX:MaxRAMPercentage=75.0 -Djava.security.egd=file:/dev/./urandom -jar ./otel-demo-provider-0.0.1-SNAPSHOT.jar

Output of java -XshowSettings:system -version, run inside the container:

Operating System Metrics:
    Provider: cgroupv1
    Effective CPU Count: 1
    CPU Period: 100000us
    CPU Quota: 50000us
    CPU Shares: 307us
    List of Processors, 16 total: 
    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
    List of Effective Processors, 16 total: 
    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
    List of Memory Nodes, 1 total: 
    0 
    List of Available Memory Nodes, 1 total: 
    0 
    Memory Limit: 1000.00M
    Memory Soft Limit: Unlimited
    Memory & Swap Limit: 1000.00M
    Maximum Processes Limit: Unlimited

openjdk version "21.0.5" 2024-10-15 LTS
OpenJDK Runtime Environment Temurin-21.0.5+11 (build 21.0.5+11-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.5+11 (build 21.0.5+11-LTS, mixed mode, sharing)
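
For reference, the quota and period in the output above imply a limit of half a core, while the reported effective CPU count is 1. A minimal sketch of that arithmetic, assuming the effective count is the quota-to-period ratio rounded up to a whole CPU (the class and method names are hypothetical, not JDK APIs):

```java
final class CpuQuotaMath {
    /** CPU limit in cores implied by a CFS quota and period, both in microseconds. */
    static double limitCores(long quotaMicros, long periodMicros) {
        return (double) quotaMicros / periodMicros;
    }

    /** Assumption: the effective CPU count is the limit rounded up to a whole CPU. */
    static int effectiveCpuCount(long quotaMicros, long periodMicros) {
        return (int) Math.ceil(limitCores(quotaMicros, periodMicros));
    }

    public static void main(String[] args) {
        // Values taken from the -XshowSettings:system output above.
        System.out.println(limitCores(50_000, 100_000));        // 0.5
        System.out.println(effectiveCpuCount(50_000, 100_000)); // 1
    }
}
```

Which of these two numbers a utilization value is divided by changes the result by a factor of two, so it is worth checking when comparing figures from different sources.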

Additional context

I looked at the corresponding source code, but I am not entirely sure these are the right locations.

process_runtime_jvm_cpu_utilization Definition: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/v1.33.5/instrumentation/runtime-telemetry/runtime-telemetry-java17/library/src/main/java/io/opentelemetry/instrumentation/runtimemetrics/java17/internal/cpu/OverallCpuLoadHandler.java#L23

It is implemented through the getProcessCpuLoad() function:
https://github.com/openjdk/jdk/blob/master/src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c#L327
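
A minimal sketch of reading the same value directly from a running JVM, which can help when checking what the agent would see. It assumes a HotSpot-based JDK where com.sun.management.OperatingSystemMXBean is available; the class name ProcessCpuLoadProbe is hypothetical:

```java
import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;

final class ProcessCpuLoadProbe {
    /**
     * Returns the JVM-reported "recent" CPU load of this process as a
     * fraction (documented range 0.0 to 1.0), or a negative value when
     * the load is not available.
     */
    static double processCpuLoad() {
        OperatingSystemMXBean os =
            ManagementFactory.getPlatformMXBean(OperatingSystemMXBean.class);
        return os.getProcessCpuLoad();
    }

    public static void main(String[] args) {
        // The first reading may be 0.0 or negative, because no earlier
        // sample exists yet to measure the load against.
        System.out.println("process CPU load: " + processCpuLoad());
    }
}
```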


container_cpu_usage_seconds_total Definition: https://github.com/kubernetes/kubernetes/blob/master/vendor/github.com/google/cadvisor/metrics/prometheus.go#L164

This is accomplished by reading cpuacct.usage under the container cgroup: https://github.com/kubernetes/kubernetes/blob/master/vendor/github.com/opencontainers/runc/libcontainer/cgroups/fs/cpuacct.go#L54
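
For comparison, cpuacct.usage holds a single cumulative counter of CPU time in nanoseconds. A small sketch of reading it and converting to seconds; the file path is an assumption (a common cgroup v1 mount point) and the class name is hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class CpuacctUsageReader {
    // Assumption: typical cgroup v1 location; the real path depends on the host and container.
    static final Path USAGE_FILE = Path.of("/sys/fs/cgroup/cpuacct/cpuacct.usage");

    /** Converts the file content (cumulative CPU time in nanoseconds) to seconds. */
    static double parseUsageSeconds(String content) {
        return Long.parseLong(content.trim()) / 1_000_000_000.0;
    }

    public static void main(String[] args) throws IOException {
        if (Files.isReadable(USAGE_FILE)) {
            System.out.println(parseUsageSeconds(Files.readString(USAGE_FILE)));
        } else {
            System.out.println("cpuacct.usage is not readable at " + USAGE_FILE);
        }
    }
}
```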

chinaran changed the title from "OTel jvm system cpu utilization metrics values are higher than cpu values captured by k8s cadvisor" to "OTel jvm process cpu utilization metrics values are higher than cpu values captured by k8s cadvisor" on Nov 25, 2024

laurit commented Nov 26, 2024

As you have probably noticed yourself, these metrics are not comparable. getProcessCpuLoad returns a percentage, while the Kubernetes metric returns CPU time in seconds. Try the https://opentelemetry.io/docs/specs/semconv/runtime/jvm-metrics/#metric-jvmcputime metric. You may need to update your agent, or enable the stable semconv support if you insist on using the 1.x version of the agent.
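
A sketch of the kind of value a CPU-time metric carries: cumulative CPU seconds for the process, which is the same unit as the cAdvisor counter. It assumes the metric is backed by a process CPU time counter such as getProcessCpuTime() on com.sun.management.OperatingSystemMXBean; that mapping is an assumption here, and the class name is hypothetical:

```java
import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;

final class ProcessCpuTimeProbe {
    /** Converts nanoseconds to seconds. */
    static double nanosToSeconds(long nanos) {
        return nanos / 1e9;
    }

    /** Cumulative CPU time of this JVM process in seconds, or NaN if unsupported. */
    static double processCpuSeconds() {
        OperatingSystemMXBean os =
            ManagementFactory.getPlatformMXBean(OperatingSystemMXBean.class);
        long nanos = os.getProcessCpuTime(); // -1 when the platform does not support it
        return nanos < 0 ? Double.NaN : nanosToSeconds(nanos);
    }

    public static void main(String[] args) {
        System.out.println("process CPU seconds so far: " + processCpuSeconds());
    }
}
```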

chinaran commented

@laurit Thank you for your reply.
It does look like jvm.cpu.time is the more appropriate metric.
Regarding non-comparability: does that mean the percentage returned by getProcessCpuLoad is a more real-time value, while container_cpu_usage_seconds_total/container_spec_cpu_quota is a cumulative statistic?
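
One way to frame the question: a cumulative CPU-seconds counter only becomes a utilization once it is differenced over a time window and divided by the CPU limit. A minimal sketch of that conversion, using made-up sample values rather than data from this issue (the class and method names are hypothetical):

```java
final class CounterToUtilization {
    /**
     * Utilization as a fraction of the CPU limit over one window.
     *
     * @param startSeconds  cumulative CPU seconds at the start of the window
     * @param endSeconds    cumulative CPU seconds at the end of the window
     * @param windowSeconds wall-clock length of the window in seconds
     * @param limitCores    CPU limit in cores (for example quota / period)
     */
    static double utilization(double startSeconds, double endSeconds,
                              double windowSeconds, double limitCores) {
        if (windowSeconds <= 0 || limitCores <= 0) {
            throw new IllegalArgumentException("window and limit must be positive");
        }
        double coresUsed = (endSeconds - startSeconds) / windowSeconds;
        return coresUsed / limitCores;
    }

    public static void main(String[] args) {
        // Hypothetical samples: 15 CPU seconds consumed in a 60 s window, 0.5 core limit.
        System.out.println(utilization(100.0, 115.0, 60.0, 0.5)); // 0.5
    }
}
```

The result is an average over the chosen window, so a longer window smooths out short spikes that a "recent load" reading might show.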
