Adjust resource requests for dcgm exporter (#788)
* Adjust resource requests for dcgm exporter

* Update README.md with readme-generator-for-helm

Signed-off-by: chiragjn <[email protected]>

* Keep requests low for dcgm exporter

* Update README.md with readme-generator-for-helm

Signed-off-by: chiragjn <[email protected]>

---------

Signed-off-by: chiragjn <[email protected]>
Co-authored-by: chiragjn <[email protected]>
chiragjn and chiragjn authored Nov 25, 2024
1 parent 154205b commit 8b9571a
Showing 3 changed files with 19 additions and 19 deletions.
2 changes: 1 addition & 1 deletion charts/tfy-gpu-operator/Chart.yaml
@@ -1,6 +1,6 @@
apiVersion: v2
name: tfy-gpu-operator
-version: 0.1.21
+version: 0.1.22
description: "Truefoundry GPU Operator"
maintainers:
- name: truefoundry
18 changes: 9 additions & 9 deletions charts/tfy-gpu-operator/README.md
@@ -51,9 +51,9 @@ Tfy-gpu-operator is a Helm chart that facilitates the deployment and management
| `aws-eks-gpu-operator.dcgmExporter.version` | Image tag version for DCGM Exporter. Find all tags at https://catalog.ngc.nvidia.com/orgs/nvidia/teams/k8s/containers/dcgm-exporter/tags | `3.3.8-3.6.0-ubuntu22.04` |
| `aws-eks-gpu-operator.dcgmExporter.serviceMonitor.enabled` | Enable or disable ServiceMonitor for DCGM Exporter. | `false` |
| `aws-eks-gpu-operator.dcgmExporter.resources.requests.cpu` | CPU request for the DCGM Exporter. | `10m` |
-| `aws-eks-gpu-operator.dcgmExporter.resources.requests.memory` | Memory request for the DCGM Exporter. | `300Mi` |
-| `aws-eks-gpu-operator.dcgmExporter.resources.limits.cpu` | CPU limit for the DCGM Exporter. | `50m` |
-| `aws-eks-gpu-operator.dcgmExporter.resources.limits.memory` | Memory limit for the DCGM Exporter. | `400Mi` |
+| `aws-eks-gpu-operator.dcgmExporter.resources.requests.memory` | Memory request for the DCGM Exporter. | `100Mi` |
+| `aws-eks-gpu-operator.dcgmExporter.resources.limits.cpu` | CPU limit for the DCGM Exporter. | `100m` |
+| `aws-eks-gpu-operator.dcgmExporter.resources.limits.memory` | Memory limit for the DCGM Exporter. | `1000Mi` |
| `aws-eks-gpu-operator.dcgmExporter.args` | Arguments for the DCGM Exporter. | `["-c","5000"]` |

### gcp-gke-standard-driver Configuration for the GKE Standard Nvidia Drivers. This section will only be used when clusterType.gcpGkeStandard is set to true.
@@ -130,15 +130,15 @@ Tfy-gpu-operator is a Helm chart that facilitates the deployment and management
| `azure-aks-gpu-operator.dcgm.version` | Image tag for DCGM container. Find all image tags at https://catalog.ngc.nvidia.com/orgs/nvidia/teams/cloud-native/containers/dcgm/tags | `3.3.8-1-ubuntu22.04` |
| `azure-aks-gpu-operator.dcgm.resources.requests.cpu` | CPU request for standalone DCGM container | `10m` |
| `azure-aks-gpu-operator.dcgm.resources.requests.memory` | Memory request for standalone DCGM container | `100Mi` |
-| `azure-aks-gpu-operator.dcgm.resources.limits.cpu` | CPU limit for standalone DCGM container | `50m` |
-| `azure-aks-gpu-operator.dcgm.resources.limits.memory` | Memory limit for standalone DCGM container | `400Mi` |
+| `azure-aks-gpu-operator.dcgm.resources.limits.cpu` | CPU limit for standalone DCGM container | `100m` |
+| `azure-aks-gpu-operator.dcgm.resources.limits.memory` | Memory limit for standalone DCGM container | `1000Mi` |
| `azure-aks-gpu-operator.dcgmExporter.enabled` | Enabled/Disable DCGM Exporter. | `true` |
| `azure-aks-gpu-operator.dcgmExporter.version` | Image tag version for DCGM Exporter. Find all tags at https://catalog.ngc.nvidia.com/orgs/nvidia/teams/k8s/containers/dcgm-exporter/tags | `3.3.8-3.6.0-ubuntu22.04` |
| `azure-aks-gpu-operator.dcgmExporter.serviceMonitor.enabled` | Enable or disable ServiceMonitor for DCGM Exporter. | `false` |
| `azure-aks-gpu-operator.dcgmExporter.resources.requests.cpu` | CPU request for the DCGM Exporter. | `10m` |
| `azure-aks-gpu-operator.dcgmExporter.resources.requests.memory` | Memory request for the DCGM Exporter. | `100Mi` |
-| `azure-aks-gpu-operator.dcgmExporter.resources.limits.cpu` | CPU limit for the DCGM Exporter. | `50m` |
-| `azure-aks-gpu-operator.dcgmExporter.resources.limits.memory` | Memory limit for the DCGM Exporter. | `400Mi` |
+| `azure-aks-gpu-operator.dcgmExporter.resources.limits.cpu` | CPU limit for the DCGM Exporter. | `100m` |
+| `azure-aks-gpu-operator.dcgmExporter.resources.limits.memory` | Memory limit for the DCGM Exporter. | `1000Mi` |
| `azure-aks-gpu-operator.dcgmExporter.args` | Arguments for the DCGM Exporter. | `["-c","5000"]` |

### civo-talos-gpu-operator Configuration for the Civo Talos GPU Operator. This section will only be used when clusterType.civoTalos is set to true.
@@ -178,8 +178,8 @@ Tfy-gpu-operator is a Helm chart that facilitates the deployment and management
| `civo-talos-gpu-operator.dcgmExporter.serviceMonitor.enabled` | Enable or disable ServiceMonitor for DCGM Exporter. | `false` |
| `civo-talos-gpu-operator.dcgmExporter.resources.requests.cpu` | CPU request for the DCGM Exporter. | `10m` |
| `civo-talos-gpu-operator.dcgmExporter.resources.requests.memory` | Memory request for the DCGM Exporter. | `100Mi` |
-| `civo-talos-gpu-operator.dcgmExporter.resources.limits.cpu` | CPU limit for the DCGM Exporter. | `50m` |
-| `civo-talos-gpu-operator.dcgmExporter.resources.limits.memory` | Memory limit for the DCGM Exporter. | `400Mi` |
+| `civo-talos-gpu-operator.dcgmExporter.resources.limits.cpu` | CPU limit for the DCGM Exporter. | `100m` |
+| `civo-talos-gpu-operator.dcgmExporter.resources.limits.memory` | Memory limit for the DCGM Exporter. | `1000Mi` |
| `civo-talos-gpu-operator.dcgmExporter.args` | Arguments for the DCGM Exporter. | `["-c","5000"]` |

### generic-gpu-operator Configuration for the GPU Operator. This section will only be used when clusterType.generic is set to true.
18 changes: 9 additions & 9 deletions charts/tfy-gpu-operator/values.yaml
@@ -315,10 +315,10 @@ aws-eks-gpu-operator:
     resources:
       requests:
         cpu: 10m
-        memory: 300Mi
+        memory: 100Mi
       limits:
-        cpu: 50m
-        memory: 400Mi
+        cpu: 100m
+        memory: 1000Mi
     ## @param aws-eks-gpu-operator.dcgmExporter.args Arguments for the DCGM Exporter.
     args: ["-c", "5000"]

@@ -731,8 +731,8 @@ azure-aks-gpu-operator:
         cpu: 10m
         memory: 100Mi
       limits:
-        cpu: 50m
-        memory: 400Mi
+        cpu: 100m
+        memory: 1000Mi

   ## DCGM Exporter configuration.
   dcgmExporter:
@@ -762,8 +762,8 @@ azure-aks-gpu-operator:
         cpu: 10m
         memory: 100Mi
       limits:
-        cpu: 50m
-        memory: 400Mi
+        cpu: 100m
+        memory: 1000Mi
     ## @param azure-aks-gpu-operator.dcgmExporter.args Arguments for the DCGM Exporter.
     args: ["-c", "5000"]

@@ -973,8 +973,8 @@ civo-talos-gpu-operator:
         cpu: 10m
         memory: 100Mi
       limits:
-        cpu: 50m
-        memory: 400Mi
+        cpu: 100m
+        memory: 1000Mi
     ## @param civo-talos-gpu-operator.dcgmExporter.args Arguments for the DCGM Exporter.
     args: ["-c", "5000"]
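
The defaults changed in this commit can still be overridden per cluster through the same keys. Below is a minimal sketch of a values override, assuming an EKS cluster; the file name `dcgm-overrides.yaml` is hypothetical and the numbers are illustrative only, not recommendations from this commit.

```yaml
# dcgm-overrides.yaml (hypothetical file name)
# Overrides the DCGM Exporter resources for the AWS EKS section of the chart.
# Key paths mirror the @param entries in values.yaml; the values are illustrative.
aws-eks-gpu-operator:
  dcgmExporter:
    resources:
      requests:
        cpu: 10m        # same as the chart default
        memory: 100Mi   # same as the new chart default
      limits:
        cpu: 200m       # assumed higher ceiling for a busy node
        memory: 1500Mi  # assumed higher ceiling for a busy node
```

A file like this would typically be passed to Helm with the `-f` flag when installing or upgrading the chart.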
