[BUG][exporter] Process metrics still exist when the process is gone #106

Closed
3 tasks done
caotangdaiduong opened this issue Nov 22, 2023 · 5 comments · Fixed by #107
Labels: bug (Something isn't working), exporter (Something related to the metrics exporter)

Comments

caotangdaiduong commented Nov 22, 2023

Required prerequisites

  • I have read the documentation https://nvitop.readthedocs.io.
  • I have searched the Issue Tracker and confirmed that this hasn't already been reported. (Comment there if it has.)
  • I have tried the latest version of nvitop in a new isolated virtual environment.

What version of nvitop are you using?

1.3.1

Operating system and version

Ubuntu 20.04.4 LTS

NVIDIA driver version

510.47.03

NVIDIA-SMI

Wed Nov 22 16:23:39 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.02    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+

Python environment

3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] linux
nvidia-ml-py==12.535.133
nvitop==1.3.1
nvitop-exporter==1.3.1

Problem description

nvitop-exporter appears to cache metric values.

Metric values are retained and not refreshed

Steps to Reproduce

The Python snippets (if any):

Command lines:

Traceback

No response

Logs

No response

Expected behavior

No response

Additional context

No response

caotangdaiduong added the "bug" label (Something isn't working) on Nov 22, 2023
caotangdaiduong changed the title from "[BUG]" to "[BUG] Metric values are retained and not refreshed" on Nov 22, 2023
caotangdaiduong changed the title from "[BUG] Metric values are retained and not refreshed" to "[BUG] Metric values are not refreshed" on Nov 22, 2023
XuehaiPan added the "exporter" label (Something related to the metrics exporter) on Nov 22, 2023
XuehaiPan (Owner) commented

> Metric values are retained and not refreshed

Hi @caotangdaiduong, do you set up a prometheus service to retrieve the latest metrics automatically?

caotangdaiduong (Author) commented

Currently I'm using cron to restart the service every minute. This may sound crazy, but with the restarts the metrics are completely accurate.

XuehaiPan (Owner) commented

> I know nvitop's default interval is 1s, but I have tried the interval option with different values such as 15s and 30s, and the result is still the same.

@caotangdaiduong I can see the metrics are updating on my side. I'm running watch --differences:

watch --differences 'curl -s http://127.0.0.1:8000/metrics'
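
The same check can be scripted when `watch` is not convenient. The following is a minimal sketch, not part of nvitop or nvitop-exporter: `parse_samples` and `diff_scrapes` are hypothetical helper names, and the metric name in the usage example is made up. It compares two text scrapes of the `/metrics` endpoint and reports which series were added, removed, changed, or left unchanged.

```python
def parse_samples(text):
    """Map each series ('name{labels}') in a Prometheus text scrape to its value string."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Split after the closing brace so spaces inside label values are safe.
            head, _, rest = line.rpartition("}")
            series = head + "}"
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            samples[series] = fields[0]  # ignore an optional trailing timestamp
    return samples


def diff_scrapes(before, after):
    """Classify series by comparing an earlier scrape with a later one."""
    old, new = parse_samples(before), parse_samples(after)
    common = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in common if old[k] != new[k]),
        "unchanged": sorted(k for k in common if old[k] == new[k]),
    }
```

For example, with a hypothetical metric `demo_gpu_mem`, a PID that shows up as "unchanged" across many scrapes after its process has exited is a series that is being retained rather than refreshed:

```python
before = 'demo_gpu_mem{pid="111"} 100\n'
after = 'demo_gpu_mem{pid="111"} 100\ndemo_gpu_mem{pid="222"} 50\n'
print(diff_scrapes(before, after))
```

The two inputs can be the response bodies of two requests to the `/metrics` URL shown above, taken a few seconds apart.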

> This is similar to Pushgateway: it only updates the value of a key that already exists, and a new key gets a new value. I think my case is similar, with many different keys: every time the PID or index changes, a new key is created, and the old PID and index are still there.

The metrics for GPU processes are actively updated on my side.

I can confirm if the GPU process is gone, the gauge keys still exist. Do you mean you want to remove these keys if the corresponding processes are gone?
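
To illustrate the behaviour being discussed, here is a minimal standard-library sketch of a labelled gauge whose keys are pruned once the corresponding process disappears. It is an assumption-laden model, not the exporter's actual code: `LabelledGauge` and `update_process_metrics` are hypothetical names, and the snapshot is a plain dict of PID to value.

```python
class LabelledGauge:
    """Tiny stand-in for a labelled gauge: one value per label tuple."""

    def __init__(self):
        self._values = {}

    def set(self, labels, value):
        self._values[labels] = value

    def remove(self, labels):
        # Dropping the key is what makes the series vanish from later scrapes.
        self._values.pop(labels, None)

    def keys(self):
        return set(self._values)


def update_process_metrics(gauge, snapshot):
    """Write the current snapshot ({pid: value}) and prune keys of gone processes."""
    alive = set()
    for pid, value in snapshot.items():
        labels = (pid,)
        gauge.set(labels, value)
        alive.add(labels)
    # Without this loop, keys for exited PIDs stay forever with their last value.
    for stale in gauge.keys() - alive:
        gauge.remove(stale)


gauge = LabelledGauge()
update_process_metrics(gauge, {111: 100})  # process 111 is running
update_process_metrics(gauge, {222: 50})   # 111 has exited, 222 has started
print(gauge.keys())
```

In the `prometheus_client` library, labelled metrics offer a `remove(*labelvalues)` method that serves the same purpose for a single label combination. Whether the eventual fix uses it is not stated in this thread.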

XuehaiPan changed the title from "[BUG] Metric values are not refreshed" to "[BUG] Process metrics still exist when the process is gone" on Nov 22, 2023
XuehaiPan changed the title from "[BUG] Process metrics still exist when the process is gone" to "[BUG][exporter] Process metrics still exist when the process is gone" on Nov 22, 2023
XuehaiPan (Owner) commented

>   • You will see that both the old and new PIDs exist when calling curl to the exporter

@caotangdaiduong I can confirm this and have opened PR #107 to resolve it. You can try it via:

python3 -m pip install "git+https://github.com/XuehaiPan/nvitop.git@exporter-remove-gone-process#egg=nvitop-exporter&subdirectory=nvitop-exporter"

caotangdaiduong (Author) commented

Hi @XuehaiPan

Thanks for your efforts. I tested it and it works as expected.
