Use NVIDIA GPU Operator for GPU nodes #81

Open
davidspek opened this issue Apr 29, 2021 · 0 comments

NVIDIA provides a GPU Operator that deploys the DaemonSet, installs the driver, sets up the container runtime, and exposes a metrics endpoint for Prometheus to scrape. It would be nice if this were deployed on GPU nodes automatically so people can use them without further configuration.
Docs for installing the operator: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/getting-started.html#install-kubernetes

The operator requires that the NVIDIA driver is not installed on the host system and that the nouveau driver is disabled. On Ubuntu-based systems you can disable nouveau at boot with the following kernel parameters, which avoids the need to reboot nodes:

nomodeset modprobe.blacklist=nouveau nouveau.blacklist=1
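
To verify that a node actually booted with these parameters before deploying the operator, something like the following could be run on the node. This is only a rough sketch: the function names `missing_kernel_params` and `nouveau_loaded` are made up for illustration, the check is a plain exact-token match (so a comma-separated value such as `modprobe.blacklist=nouveau,other` would be reported as missing), and it assumes a Linux host where `/proc/cmdline` and `/proc/modules` are readable.

```python
from __future__ import annotations

from pathlib import Path

# Parameters quoted in this issue. Whether all three are needed on a
# given distribution is an assumption and is not verified here.
REQUIRED_PARAMS = ("nomodeset", "modprobe.blacklist=nouveau", "nouveau.blacklist=1")


def missing_kernel_params(cmdline: str, required=REQUIRED_PARAMS) -> list[str]:
    """Return the required parameters absent from a kernel command line.

    Uses exact whitespace-separated token matching only.
    """
    present = set(cmdline.split())
    return [param for param in required if param not in present]


def nouveau_loaded(modules_text: str) -> bool:
    """Return True if a module named 'nouveau' is listed in /proc/modules-style text.

    The module name is the first whitespace-separated field of each line.
    """
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "nouveau":
            return True
    return False


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    proc_modules = Path("/proc/modules")
    # Only attempt the check where the proc files exist (i.e. on Linux).
    if proc_cmdline.exists() and proc_modules.exists():
        print("missing parameters:", missing_kernel_params(proc_cmdline.read_text()))
        print("nouveau loaded:", nouveau_loaded(proc_modules.read_text()))
```

An empty list of missing parameters together with `nouveau loaded: False` would suggest the node meets the nouveau part of the requirements. It says nothing about whether an NVIDIA driver is already installed on the host.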