Modify condition on NVIDIA FabricManager service
The `/dev/nvidia-nvswitchctl` device is created by the NVIDIA Fabric Manager
service itself, so its existence cannot be used as a start condition for the
`nvidia-fabricmanager` service.

The NVIDIA driver startup script for Kubernetes instead checks that
`/proc/driver/nvidia-nvswitch/devices` exists and is not empty [1].

This change modifies the condition to
`ConditionDirectoryNotEmpty=/proc/driver/nvidia-nvswitch/devices`, which
verifies that a certain path exists and is a non-empty directory.
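
The semantics of that condition can be approximated in plain shell. The following is only a minimal sketch: `dir_not_empty` is a hypothetical helper introduced for illustration, and systemd performs its own check internally rather than running anything like this.

```shell
#!/bin/sh
# Hypothetical helper approximating ConditionDirectoryNotEmpty=:
# succeeds only if the path is a directory with at least one entry.
dir_not_empty() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

if dir_not_empty /proc/driver/nvidia-nvswitch/devices; then
    echo "NVSwitch devices present: Fabric Manager would start"
else
    echo "no NVSwitch devices: Fabric Manager would be skipped"
fi
```

On a host without NVSwitch hardware the directory is missing or empty, so the condition fails and systemd skips the unit instead of marking it as failed.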

[1] https://gitlab.com/nvidia/container-images/driver/-/blob/main/rhel9/nvidia-driver?ref_type=heads#L262-269

Signed-off-by: Fabien Dupont <[email protected]>
fabiendupont committed Aug 14, 2024
1 parent 0ba38e0 commit 9e8e131
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion training/nvidia-bootc/Containerfile
@@ -146,7 +146,7 @@ RUN mv /etc/selinux /etc/selinux.tmp \
&& mv /etc/selinux.tmp /etc/selinux \
&& ln -s /usr/lib/systemd/system/nvidia-toolkit-firstboot.service /usr/lib/systemd/system/basic.target.wants/nvidia-toolkit-firstboot.service \
&& echo "blacklist nouveau" > /etc/modprobe.d/blacklist_nouveau.conf \
- && sed -i '/\[Unit\]/a ConditionPathExists = /dev/nvidia-nvswitchctl' /usr/lib/systemd/system/nvidia-fabricmanager.service \
+ && sed -i '/\[Unit\]/a ConditionDirectoryNotEmpty=/proc/driver/nvidia-nvswitch/devices' /usr/lib/systemd/system/nvidia-fabricmanager.service \
&& ln -s /usr/lib/systemd/system/nvidia-fabricmanager.service /etc/systemd/system/multi-user.target.wants/nvidia-fabricmanager.service \
&& ln -s /usr/lib/systemd/system/nvidia-persistenced.service /etc/systemd/system/multi-user.target.wants/nvidia-persistenced.service
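
The effect of the `sed` edit can be tried on a throwaway file. This is a sketch under stated assumptions: the unit contents below are made up for illustration and are not the real `nvidia-fabricmanager.service`, and the one-line `a text` form of the append command assumes GNU sed.

```shell
#!/bin/sh
# Build a made-up unit file in a temporary location (hypothetical contents).
unit=$(mktemp)
cat > "$unit" <<'EOF'
[Unit]
Description=Example unit (hypothetical)

[Service]
ExecStart=/bin/true
EOF

# GNU sed: append the condition on a new line right after the [Unit] header.
sed -i '/\[Unit\]/a ConditionDirectoryNotEmpty=/proc/driver/nvidia-nvswitch/devices' "$unit"

# Show the result; the condition now sits directly below [Unit].
cat "$unit"
```

On a booted image, `systemctl cat nvidia-fabricmanager.service` should show the same directive in the installed unit.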
