gpu monitor on dcgm doc #2292
Open
ferris-cx wants to merge 2 commits into koordinator-sh:main from ferris-cx:monitor-dcgm
172 changes: 172 additions & 0 deletions
docs/proposals/koordlet/20241210-GPU monitor on DCGM.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,172 @@ | ||
# GPU Monitor on DCGM | ||
|
||
## Summary | ||
|
||
To support Koordinator's GPU monitoring capabilities, Koordinator needs to integrate smoothly with DCGM, the GPU monitoring component. This proposal describes a joint adaptation of Koordinator and DCGM for collecting GPU metrics on a single node. | ||
|
||
|
||
## Motivation | ||
|
||
DCGM is a GPU monitoring component. It reads kubelet.sock on the node to obtain the Pods and their allocated GPUs, and then associates GPU metrics with those Pods. In the Koordinator framework, GPU allocation is centralized, so the kubelet on the node holds no GPU allocation information, and an adaptation is required. | ||
|
||
### Goals | ||
- The GPU NRI hook writes the Pod list and the corresponding GPU list to a mounted file that is maintained on the GPU NRI hook side. | ||
- Adjust how the DCGM component obtains Pod and GPU information: switch from reading kubelet.sock to reading the mounted file above. | ||
Review comment: We shouldn't modify the DCGM code.
Reply: It should be possible to implement this through configuration; this is being analyzed.
|
||
|
||
## Proposal | ||
|
||
### User Stories | ||
|
||
#### Story 1 | ||
|
||
As a user, I want to observe the performance metrics of GPU at runtime. | ||
|
||
#### Story 2 | ||
|
||
As a cluster administrator, I want to monitor the GPU resource usage of the cluster. | ||
|
||
### Design | ||
|
||
#### GPU NRI Hook | ||
|
||
Each time it processes a Pod's GPU request, the GPU NRI hook updates, in gpu.go, a list of all Pods on the node together with the GPUs allocated to each of them. | ||
|
||
The path of the gpu.go code file is: | ||
|
||
``` | ||
pkg/koordlet/runtimehooks/hooks/gpu | ||
- gpu.go | ||
``` | ||
|
||
Define a function in the gpu.go code file: maintainPodList(). | ||
|
||
The Pod list maintenance logic is implemented in maintainPodList(). The function InjectContainerGPUEnv() in gpu.go calls it each time the GPU environment variables are set, so that the Pod list always reflects the latest state. The function definition and the call site in gpu.go are as follows: | ||
|
||
```go | ||
func maintainPodList(request ContainerRequest) error { | ||
	... | ||
|
||
} | ||
|
||
func (p *gpuPlugin) InjectContainerGPUEnv(proto protocol.HooksProtocol) error { | ||
	... | ||
|
||
	// Propagate any error from updating the Pod list to the caller. | ||
	return maintainPodList(request) | ||
} | ||
``` | ||
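
The bookkeeping that maintainPodList() performs could be sketched roughly as follows. This is an illustration under assumptions, not the koordlet implementation: the `podRegistry` type and its `upsert`, `remove`, and `snapshot` methods are hypothetical names, and `PodInfo` mirrors the structure described in this proposal.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// PodInfo mirrors the structure proposed in this document.
type PodInfo struct {
	Name      string   `json:"name"`
	Namespace string   `json:"namespace"`
	Container string   `json:"container"`
	GPUList   []string `json:"gpu_list"`
}

// podRegistry is a hypothetical in-memory store, keyed by
// namespace/name/container, that hook calls can update concurrently.
type podRegistry struct {
	mu   sync.Mutex
	pods map[string]PodInfo
}

func newPodRegistry() *podRegistry {
	return &podRegistry{pods: make(map[string]PodInfo)}
}

func registryKey(namespace, name, container string) string {
	return namespace + "/" + name + "/" + container
}

// upsert records or replaces the GPU assignment of one container.
func (r *podRegistry) upsert(p PodInfo) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.pods[registryKey(p.Namespace, p.Name, p.Container)] = p
}

// remove drops the entry of a container that no longer exists.
func (r *podRegistry) remove(namespace, name, container string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.pods, registryKey(namespace, name, container))
}

// snapshot returns the current entries sorted by key, so that the
// serialized output is stable between writes.
func (r *podRegistry) snapshot() []PodInfo {
	r.mu.Lock()
	defer r.mu.Unlock()
	keys := make([]string, 0, len(r.pods))
	for k := range r.pods {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]PodInfo, 0, len(keys))
	for _, k := range keys {
		out = append(out, r.pods[k])
	}
	return out
}

func main() {
	r := newPodRegistry()
	r.upsert(PodInfo{Name: "train-0", Namespace: "default", Container: "main", GPUList: []string{"GPU-aaa"}})
	// A second request for the same container replaces the earlier entry.
	r.upsert(PodInfo{Name: "train-0", Namespace: "default", Container: "main", GPUList: []string{"GPU-bbb"}})
	fmt.Println(r.snapshot())
}
```

Keying by namespace, Pod name, and container name is one possible choice; the proposal does not state how entries for deleted Pods are cleaned up, so `remove` is included only to show where that step would fit.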
|
||
To stay compatible with DCGM's Pod-to-GPU mapping data structures, the NRI hook defines a structure named PodInfo that holds the Pod's name, namespace, container name, and list of assigned GPUs. | ||
|
||
The structure PodInfo is defined as follows: | ||
|
||
```go | ||
type PodInfo struct { | ||
	Name      string   `json:"name"` | ||
	Namespace string   `json:"namespace"` | ||
	Container string   `json:"container"` | ||
	GPUList   []string `json:"gpu_list"` | ||
} | ||
``` | ||
|
||
koordlet then mounts a file from the host, for example /var/lib/kubelet/pod-resources/podlist.json, into which the Pod and GPU information is written. The mount is configured in koordlet.yaml: | ||
|
||
```yaml | ||
volumeMounts: | ||
  - mountPath: /podlist.json | ||
    name: pod-info | ||
...... | ||
volumes: | ||
  - hostPath: | ||
      path: /var/lib/kubelet/pod-resources/podlist.json | ||
      type: "" | ||
    name: pod-info | ||
``` | ||
|
||
The GPU NRI hook is then responsible for serializing the maintained PodInfo list to a JSON string and writing it to the file podlist.json. | ||
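
A minimal sketch of that serialization step is shown below. The helper names `marshalPodList` and `writePodListFile` are hypothetical, and the file mode is an assumption; only the PodInfo fields and the JSON keys come from this proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// PodInfo mirrors the structure proposed in this document.
type PodInfo struct {
	Name      string   `json:"name"`
	Namespace string   `json:"namespace"`
	Container string   `json:"container"`
	GPUList   []string `json:"gpu_list"`
}

// marshalPodList encodes the list as a JSON array. A nil list is encoded
// as "[]" rather than "null", so readers always see an array.
func marshalPodList(pods []PodInfo) ([]byte, error) {
	if pods == nil {
		pods = []PodInfo{}
	}
	return json.Marshal(pods)
}

// writePodListFile rewrites the file in place. os.WriteFile truncates the
// file before writing, so a concurrent reader may briefly see partial
// content; this sketch does not address that.
func writePodListFile(path string, pods []PodInfo) error {
	data, err := marshalPodList(pods)
	if err != nil {
		return fmt.Errorf("marshal pod list: %w", err)
	}
	return os.WriteFile(path, data, 0o644)
}

func main() {
	pods := []PodInfo{
		{Name: "train-0", Namespace: "default", Container: "main", GPUList: []string{"GPU-aaa"}},
	}
	data, err := marshalPodList(pods)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(string(data))
}
```

Because the proposal bind-mounts a single file rather than a directory, the sketch writes in place instead of replacing the file by rename; whether readers need protection against partially written content is left open here.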
|
||
<img src="/docs/images/dcgm-gpuhook-architecture.png" style="zoom:25%;" /> | ||
|
||
#### DCGM | ||
|
||
In the upstream DCGM project, the logic that extracts the list of Pods and the list of GPUs is shown in the following code: | ||
|
||
```go | ||
// dcgm/pkg/dcgmexporter/kubernetes.go | ||
|
||
func (p *PodMapper) Process(metrics MetricsByCounter, sysInfo SystemInfo) error { | ||
	socketPath := p.Config.PodResourcesKubeletSocket | ||
	_, err := os.Stat(socketPath) | ||
	if os.IsNotExist(err) { | ||
		logrus.Info("No Kubelet socket, ignoring") | ||
		return nil | ||
	} | ||
|
||
	c, cleanup, err := connectToServer(socketPath) | ||
	if err != nil { | ||
		return err | ||
	} | ||
	defer cleanup() | ||
|
||
	pods, err := p.listPods(c) | ||
	if err != nil { | ||
		return err | ||
	} | ||
|
||
	deviceToPod := p.toDeviceToPod(pods, sysInfo) | ||
|
||
	...... | ||
|
||
	return nil | ||
} | ||
``` | ||
|
||
In the code above, connectToServer(socketPath) connects to PodResourcesKubeletSocket, the Unix domain socket that the kubelet exposes on each node of the Kubernetes cluster. Through this socket, a client can query the Pods scheduled to the node and their resource information. In other words, DCGM uses gRPC to query Pod resource information from the kubelet. | ||
|
||
Because Koordinator allocates GPU devices centrally rather than on the node through the kubelet, the kubelet does not hold Pod and GPU allocation information. The DCGM logic for querying Pods and GPU devices therefore needs to change from querying the kubelet to reading the file podlist.json. Since DCGM is deployed as a DaemonSet, the DCGM Pod on each node must mount the host path /var/lib/kubelet/pod-resources/podlist.json, so that the file can be read directly from inside the DCGM container. The architecture diagram of DCGM is as follows: | ||
|
||
<img src="/docs/images/dcgm-gpuhook-readwrite.png" style="zoom:25%;" /> | ||
|
||
Define a function that reads the JSON file and constructs the Pod-to-GPU mapping: | ||
|
||
```go | ||
func (p *PodMapper) listPods(filePath string) ([]PodInfo, error) { ... } | ||
``` | ||
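
How such a file-based listPods might look is sketched below as free functions, so that the example does not depend on DCGM's PodMapper type. The names `parsePodList` and `listPodsFromFile` are hypothetical, and treating a missing file as "no GPU Pods" is an assumption modeled on the existing "No Kubelet socket, ignoring" behavior.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// PodInfo mirrors the structure proposed in this document.
type PodInfo struct {
	Name      string   `json:"name"`
	Namespace string   `json:"namespace"`
	Container string   `json:"container"`
	GPUList   []string `json:"gpu_list"`
}

// parsePodList decodes the JSON array written by the GPU NRI hook.
func parsePodList(data []byte) ([]PodInfo, error) {
	var pods []PodInfo
	if err := json.Unmarshal(data, &pods); err != nil {
		return nil, fmt.Errorf("decode pod list: %w", err)
	}
	return pods, nil
}

// listPodsFromFile reads and decodes the file at filePath. A missing file
// is treated as an empty list (an assumption of this sketch), not an error.
func listPodsFromFile(filePath string) ([]PodInfo, error) {
	data, err := os.ReadFile(filePath)
	if errors.Is(err, os.ErrNotExist) {
		return nil, nil
	}
	if err != nil {
		return nil, fmt.Errorf("read %s: %w", filePath, err)
	}
	return parsePodList(data)
}

func main() {
	sample := []byte(`[{"name":"train-0","namespace":"default","container":"main","gpu_list":["GPU-aaa","GPU-bbb"]}]`)
	pods, err := parsePodList(sample)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range pods {
		fmt.Printf("%s/%s %s -> %v\n", p.Namespace, p.Name, p.Container, p.GPUList)
	}
}
```

Splitting decoding from file access keeps the parsing logic testable without touching the filesystem.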
|
||
Define another function that takes the PodInfo slice returned by listPods, iterates over it, and converts it to a map[string]PodInfo: | ||
|
||
```go | ||
func (p *PodMapper) toDeviceToPod(pods []PodInfo, sysInfo SystemInfo) map[string]PodInfo { ... } | ||
``` | ||
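
The conversion could be sketched as below. This is a simplified illustration: the free function `deviceToPod` is a hypothetical stand-in that omits the sysInfo parameter, and it assumes that the entries in gpu_list use the same device identifiers that DCGM uses when it looks up a device in the map.

```go
package main

import "fmt"

// PodInfo mirrors the structure proposed in this document.
type PodInfo struct {
	Name      string   `json:"name"`
	Namespace string   `json:"namespace"`
	Container string   `json:"container"`
	GPUList   []string `json:"gpu_list"`
}

// deviceToPod indexes the Pod list by GPU identifier. If the same GPU
// appears under several entries, the last one wins; how shared GPUs
// should be reported is left open in this sketch.
func deviceToPod(pods []PodInfo) map[string]PodInfo {
	m := make(map[string]PodInfo)
	for _, p := range pods {
		for _, gpu := range p.GPUList {
			m[gpu] = p
		}
	}
	return m
}

func main() {
	m := deviceToPod([]PodInfo{
		{Name: "train-0", Namespace: "default", Container: "main", GPUList: []string{"GPU-aaa", "GPU-bbb"}},
		{Name: "infer-0", Namespace: "default", Container: "main", GPUList: []string{"GPU-ccc"}},
	})
	if p, ok := m["GPU-bbb"]; ok {
		fmt.Printf("GPU-bbb is assigned to %s/%s\n", p.Namespace, p.Name)
	}
}
```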
|
||
Finally, DCGM's Process function is changed to: | ||
|
||
```go | ||
// dcgm/pkg/dcgmexporter/kubernetes.go | ||
|
||
func (p *PodMapper) Process(metrics MetricsByCounter, sysInfo SystemInfo) error { | ||
	socketPath := p.Config.PodResourcesKubeletSocket | ||
	_, err := os.Stat(socketPath) | ||
	if os.IsNotExist(err) { | ||
		logrus.Info("No Kubelet socket, ignoring") | ||
		return nil | ||
	} | ||
|
||
	c, cleanup, err := connectToServer(socketPath) | ||
	if err != nil { | ||
		return err | ||
	} | ||
	defer cleanup() | ||
|
||
	// Read the Pod list from the mounted file instead of querying the kubelet. | ||
	pods, err := p.listPods("/podlist.json") | ||
	if err != nil { | ||
		return err | ||
	} | ||
|
||
	deviceToPod := p.toDeviceToPod(pods, sysInfo) | ||
|
||
	...... | ||
|
||
	return nil | ||
} | ||
``` | ||
|
Review comment: What is the reason for this change?