From 9e8e131e2901d042b35485e36e9cdb4647e64cdd Mon Sep 17 00:00:00 2001
From: Fabien Dupont
Date: Wed, 14 Aug 2024 06:41:40 -0400
Subject: [PATCH] Modify condition on NVIDIA FabricManager service

The `/dev/nvidia-nvswitchctl` device is created by the NVIDIA Fabric
Manager service itself, so its existence cannot be a start condition for
the `nvidia-fabricmanager` service.

Looking at the NVIDIA driver startup script for Kubernetes, the actual
check is that `/proc/driver/nvidia-nvswitch/devices` exists and is not
empty [1].

This change modifies the condition to
`ConditionDirectoryNotEmpty=/proc/driver/nvidia-nvswitch/devices`, which
verifies that the path exists and is a non-empty directory.

[1] https://gitlab.com/nvidia/container-images/driver/-/blob/main/rhel9/nvidia-driver?ref_type=heads#L262-269

Signed-off-by: Fabien Dupont
---
 training/nvidia-bootc/Containerfile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/training/nvidia-bootc/Containerfile b/training/nvidia-bootc/Containerfile
index dc4b282d..ba4fc0e2 100644
--- a/training/nvidia-bootc/Containerfile
+++ b/training/nvidia-bootc/Containerfile
@@ -146,7 +146,7 @@ RUN mv /etc/selinux /etc/selinux.tmp \
     && mv /etc/selinux.tmp /etc/selinux \
     && ln -s /usr/lib/systemd/system/nvidia-toolkit-firstboot.service /usr/lib/systemd/system/basic.target.wants/nvidia-toolkit-firstboot.service \
     && echo "blacklist nouveau" > /etc/modprobe.d/blacklist_nouveau.conf \
-    && sed -i '/\[Unit\]/a ConditionPathExists = /dev/nvidia-nvswitchctl' /usr/lib/systemd/system/nvidia-fabricmanager.service \
+    && sed -i '/\[Unit\]/a ConditionDirectoryNotEmpty=/proc/driver/nvidia-nvswitch/devices' /usr/lib/systemd/system/nvidia-fabricmanager.service \
     && ln -s /usr/lib/systemd/system/nvidia-fabricmanager.service /etc/systemd/system/multi-user.target.wants/nvidia-fabricmanager.service \
     && ln -s /usr/lib/systemd/system/nvidia-persistenced.service /etc/systemd/system/multi-user.target.wants/nvidia-persistenced.service
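
Note for reviewers: the sketch below approximates, in plain shell, what
`ConditionDirectoryNotEmpty=` checks before systemd starts the unit. It
is only an illustration and not part of the patch. The helper name
`dir_not_empty` is hypothetical; the path is the one from the commit
message above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): succeeds only if "$1" is a
# directory that contains at least one entry, roughly mirroring what
# systemd's ConditionDirectoryNotEmpty= evaluates.
dir_not_empty() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Path taken from the commit message.
NVSWITCH_DEVICES=/proc/driver/nvidia-nvswitch/devices

if dir_not_empty "$NVSWITCH_DEVICES"; then
    echo "NVSwitch devices present: nvidia-fabricmanager would start"
else
    echo "no NVSwitch devices: nvidia-fabricmanager would be skipped"
fi
```

On a booted image, the outcome of the real condition can be inspected
with `systemctl show -p ConditionResult nvidia-fabricmanager.service`.
When a condition is not met, systemd skips the unit rather than marking
it as failed.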