[BUG][exporter] Process metrics still exist when the process is gone #106
Comments
Hi @caotangdaiduong, do you set up a
Currently I'm using cron to restart the service every minute. This may sound crazy, but with it the metrics are completely accurate.
@caotangdaiduong I can see the metrics are updating on my side. I'm running watch --differences 'curl -s http://127.0.0.1:8000/metrics'
The metrics for GPU processes are actively updated on my side. I can confirm that when a GPU process is gone, its gauge keys still exist. Do you mean you want these keys removed once the corresponding processes are gone?
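To make the question above concrete, here is a minimal, stdlib-only sketch of the behavior being discussed: a labeled gauge keeps one series per process, and a series stays in the output until it is explicitly removed. `LabeledGauge`, `refresh`, and the metric name `process_gpu_memory_bytes` are hypothetical names for illustration only, not the exporter's actual code. If the exporter is built on `prometheus_client`, the corresponding call would presumably be `Gauge.remove(*labelvalues)`.

```python
class LabeledGauge:
    """Toy stand-in for a labeled Prometheus gauge: one value per pid label."""

    def __init__(self, name):
        self.name = name
        self._series = {}  # maps a pid label (str) to the last value set

    def set(self, pid, value):
        self._series[str(pid)] = float(value)

    def remove(self, pid):
        # Drop the whole series so it no longer appears in the output.
        self._series.pop(str(pid), None)

    def pids(self):
        return set(self._series)

    def render(self):
        # Emit lines in the Prometheus text style: name{pid="..."} value
        return "\n".join(
            f'{self.name}{{pid="{pid}"}} {value}'
            for pid, value in sorted(self._series.items())
        )


def refresh(gauge, live):
    """Update series for live processes and prune series for gone ones."""
    for pid, used in live.items():
        gauge.set(pid, used)
    # Without this pruning step, series for exited processes linger forever.
    for pid in gauge.pids() - {str(p) for p in live}:
        gauge.remove(pid)


gauge = LabeledGauge("process_gpu_memory_bytes")  # hypothetical metric name
refresh(gauge, {1234: 1024, 5678: 2048})
refresh(gauge, {1234: 4096})  # process 5678 has exited
output = gauge.render()
print(output)
```

Without the pruning loop in `refresh`, the series for pid 5678 would keep its last value in every scrape, which matches the symptom reported in this issue.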
@caotangdaiduong I can confirm this and have opened PR #107 to resolve it. You can try it via: python3 -m pip install "git+https://github.com/XuehaiPan/nvitop.git@exporter-remove-gone-process#egg=nvitop-exporter&subdirectory=nvitop-exporter"
Hi @XuehaiPan, thanks for your efforts. I tested it and it works as expected.
Required prerequisites
What version of nvitop are you using?
1.3.1
Operating system and version
Ubuntu 20.04.4 LTS
NVIDIA driver version
510.47.03
NVIDIA-SMI
Python environment
3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] linux
nvidia-ml-py==12.535.133
nvitop==1.3.1
nvitop-exporter==1.3.1
Problem description
nvitop-exporter appears to cache metric values.
Metric values for a process are retained and not refreshed after the process is gone.
Steps to Reproduce
The Python snippets (if any):
Command lines:
Traceback
No response
Logs
No response
Expected behavior
No response
Additional context
No response