
The resource requests and limits are not being applied to the pod as expected. #1145

Open
IndhumithaR opened this issue Nov 28, 2024 · 0 comments

IndhumithaR commented Nov 28, 2024

GPU operator version: v24.6.1
Driver version: 535.154.05
Device plugin version: v0.16.2-ubi8

Kubernetes distribution: EKS
Kubernetes version: v1.27.0

Hi,

We attempted to install the NVIDIA driver directly in our node's base image instead of letting the GPU operator manage it. After doing so, the GPU resource requests and limits set on our pods are no longer enforced: every container in the pod can access all of the GPUs on the node.
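Note that scheduling itself still works: the pod below requests nvidia.com/gpu and does get scheduled and started, so the device plugin is registering the resource with the kubelet. The problem appears to be per-container isolation, not scheduling. As a quick sanity check (<node-name> is a placeholder for the g5.48xlarge node running the pod), the node should report nvidia.com/gpu under Capacity and Allocatable:

# Show the GPU resource the node advertises to the scheduler
kubectl describe node <node-name> | grep "nvidia.com/gpu"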

Sample pod spec

apiVersion: v1
kind: Pod
metadata:
  name: nvidia-smi-pod-3
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node.kubernetes.io/instance-type
            operator: In
            values:
            - g5.48xlarge
  containers:
  - name: nvidia-smi-container
    image: nvidia/cuda:12.6.2-cudnn-devel-ubuntu20.04
    command: ["sleep", "infinity"]
    resources:
      limits:
        nvidia.com/gpu: 5
      requests:
        nvidia.com/gpu: 5
    securityContext:
      capabilities:
        add:
        - SYS_NICE
      privileged: true
  tolerations:
  - key: "nvidia.com/gpu"
    value: "true"
    effect: "NoSchedule"

Here I am setting both the request and the limit to 5 GPUs. But when I exec into the container and check, I can see all 8 GPUs.
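For reference, this is roughly how we checked, using the pod name from the spec above (the second command assumes the device plugin's default envvar device-list strategy, under which the allocated GPUs are passed to the runtime via NVIDIA_VISIBLE_DEVICES):

# List the GPUs visible inside the container; all 8 show up instead of 5
kubectl exec nvidia-smi-pod-3 -- nvidia-smi -L

# Show which GPUs the device plugin actually allocated to this container
kubectl exec nvidia-smi-pod-3 -- printenv NVIDIA_VISIBLE_DEVICES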

[Screenshot: nvidia-smi output inside the container, showing all 8 GPUs]

However, when we ran the same pod in a different environment where the same driver version was installed by the GPU operator (instead of being baked into the base image), it worked as expected and the container only saw the GPUs it was allocated.

[Screenshot: nvidia-smi output in the GPU-operator-managed environment, showing the expected GPU allocation]
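When comparing the two environments, the only difference on the Kubernetes side is how the NVIDIA components were deployed. A rough way to compare what is actually running on each cluster (the gpu-operator namespace is the operator's common default; names and namespaces may differ per install):

# Working cluster: components managed by the GPU operator
kubectl get pods -n gpu-operator

# Broken cluster: list whatever NVIDIA components are deployed
kubectl get pods -A | grep -i nvidia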

What could be the problem? Is there a way to fix it?
