gpu monitor on dcgm doc #2292
base: main
Signed-off-by: [email protected] <[email protected]>
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2292   +/-   ##
==========================================
+ Coverage   66.03%   66.05%   +0.01%
==========================================
  Files         457      457
  Lines       53751    53751
==========================================
+ Hits        35495    35504       +9
+ Misses      15705    15696       -9
  Partials     2551     2551
@@ -51,7 +51,7 @@ Currently, plugins from resmanager in Koordlet are mixed together, they should b
## Proposal
### Design

- ![image](../../images/qos-manager.svg)
+ _![image](../../images/qos-manager.svg)_
Why this change?
### Goals
- The GPU NRI hook writes the list of Pods and the GPUs allocated to each of them to a mounted file maintained on the GPU NRI side.
- Change how the DCGM component obtains the Pod-to-GPU mapping: read it from the mounted file above instead of from kubelet.sock.
We shouldn't modify the DCGM code.
It should be possible to implement this through configuration instead; we are analyzing it.
Ⅰ. Describe what this PR does
To support GPU monitoring with DCGM, adapt DCGM and the GPU hook NRI plugin so that they work together.
Ⅱ. Does this pull request fix one issue?
fixes #2171
Ⅲ. Describe how to verify it
Ⅳ. Special notes for reviews
V. Checklist
make test