Adjust resource requests for dcgm exporter #788

Merged: 6 commits, Nov 25, 2024
2 changes: 1 addition & 1 deletion charts/tfy-gpu-operator/Chart.yaml
@@ -1,6 +1,6 @@
apiVersion: v2
name: tfy-gpu-operator
-version: 0.1.21
+version: 0.1.22
description: "Truefoundry GPU Operator"
maintainers:
- name: truefoundry
18 changes: 9 additions & 9 deletions charts/tfy-gpu-operator/README.md
@@ -51,9 +51,9 @@ Tfy-gpu-operator is a Helm chart that facilitates the deployment and management
| `aws-eks-gpu-operator.dcgmExporter.version` | Image tag version for DCGM Exporter. Find all tags at https://catalog.ngc.nvidia.com/orgs/nvidia/teams/k8s/containers/dcgm-exporter/tags | `3.3.8-3.6.0-ubuntu22.04` |
| `aws-eks-gpu-operator.dcgmExporter.serviceMonitor.enabled` | Enable or disable ServiceMonitor for DCGM Exporter. | `false` |
| `aws-eks-gpu-operator.dcgmExporter.resources.requests.cpu` | CPU request for the DCGM Exporter. | `10m` |
-| `aws-eks-gpu-operator.dcgmExporter.resources.requests.memory` | Memory request for the DCGM Exporter. | `300Mi` |
-| `aws-eks-gpu-operator.dcgmExporter.resources.limits.cpu` | CPU limit for the DCGM Exporter. | `50m` |
-| `aws-eks-gpu-operator.dcgmExporter.resources.limits.memory` | Memory limit for the DCGM Exporter. | `400Mi` |
+| `aws-eks-gpu-operator.dcgmExporter.resources.requests.memory` | Memory request for the DCGM Exporter. | `100Mi` |
+| `aws-eks-gpu-operator.dcgmExporter.resources.limits.cpu` | CPU limit for the DCGM Exporter. | `100m` |
+| `aws-eks-gpu-operator.dcgmExporter.resources.limits.memory` | Memory limit for the DCGM Exporter. | `1000Mi` |
| `aws-eks-gpu-operator.dcgmExporter.args` | Arguments for the DCGM Exporter. | `["-c","5000"]` |

### gcp-gke-standard-driver Configuration for the GKE Standard Nvidia Drivers. This section will only be used when clusterType.gcpGkeStandard is set to true.
@@ -130,15 +130,15 @@ Tfy-gpu-operator is a Helm chart that facilitates the deployment and management
| `azure-aks-gpu-operator.dcgm.version` | Image tag for DCGM container. Find all image tags at https://catalog.ngc.nvidia.com/orgs/nvidia/teams/cloud-native/containers/dcgm/tags | `3.3.8-1-ubuntu22.04` |
| `azure-aks-gpu-operator.dcgm.resources.requests.cpu` | CPU request for standalone DCGM container | `10m` |
| `azure-aks-gpu-operator.dcgm.resources.requests.memory` | Memory request for standalone DCGM container | `100Mi` |
-| `azure-aks-gpu-operator.dcgm.resources.limits.cpu` | CPU limit for standalone DCGM container | `50m` |
-| `azure-aks-gpu-operator.dcgm.resources.limits.memory` | Memory limit for standalone DCGM container | `400Mi` |
+| `azure-aks-gpu-operator.dcgm.resources.limits.cpu` | CPU limit for standalone DCGM container | `100m` |
+| `azure-aks-gpu-operator.dcgm.resources.limits.memory` | Memory limit for standalone DCGM container | `1000Mi` |
| `azure-aks-gpu-operator.dcgmExporter.enabled` | Enabled/Disable DCGM Exporter. | `true` |
| `azure-aks-gpu-operator.dcgmExporter.version` | Image tag version for DCGM Exporter. Find all tags at https://catalog.ngc.nvidia.com/orgs/nvidia/teams/k8s/containers/dcgm-exporter/tags | `3.3.8-3.6.0-ubuntu22.04` |
| `azure-aks-gpu-operator.dcgmExporter.serviceMonitor.enabled` | Enable or disable ServiceMonitor for DCGM Exporter. | `false` |
| `azure-aks-gpu-operator.dcgmExporter.resources.requests.cpu` | CPU request for the DCGM Exporter. | `10m` |
| `azure-aks-gpu-operator.dcgmExporter.resources.requests.memory` | Memory request for the DCGM Exporter. | `100Mi` |
-| `azure-aks-gpu-operator.dcgmExporter.resources.limits.cpu` | CPU limit for the DCGM Exporter. | `50m` |
-| `azure-aks-gpu-operator.dcgmExporter.resources.limits.memory` | Memory limit for the DCGM Exporter. | `400Mi` |
+| `azure-aks-gpu-operator.dcgmExporter.resources.limits.cpu` | CPU limit for the DCGM Exporter. | `100m` |
+| `azure-aks-gpu-operator.dcgmExporter.resources.limits.memory` | Memory limit for the DCGM Exporter. | `1000Mi` |
| `azure-aks-gpu-operator.dcgmExporter.args` | Arguments for the DCGM Exporter. | `["-c","5000"]` |

### civo-talos-gpu-operator Configuration for the Civo Talos GPU Operator. This section will only be used when clusterType.civoTalos is set to true.
@@ -178,8 +178,8 @@ Tfy-gpu-operator is a Helm chart that facilitates the deployment and management
| `civo-talos-gpu-operator.dcgmExporter.serviceMonitor.enabled` | Enable or disable ServiceMonitor for DCGM Exporter. | `false` |
| `civo-talos-gpu-operator.dcgmExporter.resources.requests.cpu` | CPU request for the DCGM Exporter. | `10m` |
| `civo-talos-gpu-operator.dcgmExporter.resources.requests.memory` | Memory request for the DCGM Exporter. | `100Mi` |
-| `civo-talos-gpu-operator.dcgmExporter.resources.limits.cpu` | CPU limit for the DCGM Exporter. | `50m` |
-| `civo-talos-gpu-operator.dcgmExporter.resources.limits.memory` | Memory limit for the DCGM Exporter. | `400Mi` |
+| `civo-talos-gpu-operator.dcgmExporter.resources.limits.cpu` | CPU limit for the DCGM Exporter. | `100m` |
+| `civo-talos-gpu-operator.dcgmExporter.resources.limits.memory` | Memory limit for the DCGM Exporter. | `1000Mi` |
| `civo-talos-gpu-operator.dcgmExporter.args` | Arguments for the DCGM Exporter. | `["-c","5000"]` |

### generic-gpu-operator Configuration for the GPU Operator. This section will only be used when clusterType.generic is set to true.
18 changes: 9 additions & 9 deletions charts/tfy-gpu-operator/values.yaml
@@ -315,10 +315,10 @@ aws-eks-gpu-operator:
     resources:
       requests:
         cpu: 10m
-        memory: 300Mi
+        memory: 100Mi
       limits:
-        cpu: 50m
-        memory: 400Mi
+        cpu: 100m
+        memory: 1000Mi
     ## @param aws-eks-gpu-operator.dcgmExporter.args Arguments for the DCGM Exporter.
     args: ["-c", "5000"]

@@ -731,8 +731,8 @@ azure-aks-gpu-operator:
         cpu: 10m
         memory: 100Mi
       limits:
-        cpu: 50m
-        memory: 400Mi
+        cpu: 100m
+        memory: 1000Mi

   ## DCGM Exporter configuration.
   dcgmExporter:
@@ -762,8 +762,8 @@ azure-aks-gpu-operator:
         cpu: 10m
         memory: 100Mi
       limits:
-        cpu: 50m
-        memory: 400Mi
+        cpu: 100m
+        memory: 1000Mi
     ## @param azure-aks-gpu-operator.dcgmExporter.args Arguments for the DCGM Exporter.
     args: ["-c", "5000"]

@@ -973,8 +973,8 @@ civo-talos-gpu-operator:
         cpu: 10m
         memory: 100Mi
       limits:
-        cpu: 50m
-        memory: 400Mi
+        cpu: 100m
+        memory: 1000Mi
     ## @param civo-talos-gpu-operator.dcgmExporter.args Arguments for the DCGM Exporter.
     args: ["-c", "5000"]
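
Reviewer note: clusters that need different sizing than these new defaults can presumably override them per cluster type instead of patching the chart. A minimal sketch of such an override file follows. It assumes the key paths listed in the README table in this diff and 2-space nesting; the file name `gpu-values-override.yaml` and the numbers shown are illustrative, not recommendations from this PR.

```yaml
# gpu-values-override.yaml (hypothetical file name)
# Overrides only the DCGM Exporter resources for the AWS EKS cluster type.
# The values below are illustrative; size them from observed usage.
aws-eks-gpu-operator:
  dcgmExporter:
    resources:
      requests:
        cpu: 10m
        memory: 100Mi
      limits:
        cpu: 200m       # illustrative: above the chart default of 100m
        memory: 1500Mi  # illustrative: above the chart default of 1000Mi
```

Such a file would typically be passed with `-f gpu-values-override.yaml` on `helm upgrade --install`; the same structure should apply under the `azure-aks-gpu-operator` and `civo-talos-gpu-operator` keys.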
