GPU operator version: v24.6.1
driver.version: 535.154.05
Device plugin version: v0.16.2-ubi8
Kubernetes distribution: EKS
Kubernetes version: v1.27.0
Hi,
We attempted to install the NVIDIA driver directly in our nodes' base image instead of using the GPU Operator. However, after doing so, the GPU resource requests and limits set on the pods are no longer enforced, and all containers in those pods can access all of the GPUs.
Sample pod spec
Here I am setting both the request and the limit to 5.
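The original spec did not survive extraction; below is a minimal sketch of what such a spec would look like, assuming a placeholder pod name, container name, and CUDA base image. Only the nvidia.com/gpu request/limit of 5 comes from the description above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-limit-test            # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04   # placeholder image
      command: ["sleep", "infinity"]
      resources:
        requests:
          nvidia.com/gpu: 5       # request 5 of the 8 GPUs on the node
        limits:
          nvidia.com/gpu: 5       # extended resources require limit == request
```

Note that nvidia.com/gpu is a Kubernetes extended resource, so the request and limit must be equal.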
But when I exec into the container and check, I can see all 8 GPUs.
However, when we ran the same pod in a different environment where the same driver version was installed via the GPU Operator (rather than directly in the base image), it worked as expected.
What could be the problem? Is there a way to fix it?