Proposal: putting libcuda.so in our Gentoo Prefix installation #79
wato-github-automation bot pushed a commit to WATonomous/watcloud-website that referenced this issue on Mar 17, 2024. The commit message follows:
We can use it to run software like CUDA without having to install the software. Main doc: https://docs.alliancecan.ca/wiki/Accessing_CVMFS

### Notes on the Compute Canada CUDA library requirements

`/usr/lib64/nvidia/libcuda.so` and `/usr/lib64/nvidia/libcuda.so.1` must exist. Otherwise we get the error `CUDA driver version is insufficient for CUDA runtime version`.

- https://docs.alliancecan.ca/wiki/Accessing_CVMFS#CUDA_location
- https://github.com/ComputeCanada/software-stack-config/blob/a5557c946ca25e2ca41b74716557eb9f5ab5e9c1/lmod/SitePackage.lua#L203-L219

Related issues: ComputeCanada/software-stack#58, ComputeCanada/software-stack#79

This works on Ubuntu 22.04 with CUDA 12.2 (driver version `535.161.07`):

```bash
mkdir /usr/lib64/nvidia
cd /usr/lib64/nvidia
ln -s ../../lib/x86_64-linux-gnu/libcuda.so .
ln -s ../../lib/x86_64-linux-gnu/libcuda.so.1 .
```

We can test this by compiling and running the `vectorAdd` program in cuda-samples: https://github.com/NVIDIA/cuda-samples/tree/3559ca4d088e12db33d6918621cab5c998ccecf1/Samples/0_Introduction/vectorAdd

Here's a diff to print out the driver and runtime versions:

```
diff --git a/Samples/0_Introduction/vectorAdd/vectorAdd.cu b/Samples/0_Introduction/vectorAdd/vectorAdd.cu
index 284b0f0e..3b22df2b 100644
--- a/Samples/0_Introduction/vectorAdd/vectorAdd.cu
+++ b/Samples/0_Introduction/vectorAdd/vectorAdd.cu
@@ -64,6 +64,30 @@ int main(void) {
   // Print the vector length to be used, and compute its size
   int numElements = 50000;
   size_t size = numElements * sizeof(float);
+
+  int driverVersion = 0, runtimeVersion = 0;
+
+
+  cudaError_t error;
+
+  // Get CUDA Driver Version
+  error = cudaDriverGetVersion(&driverVersion);
+  printf("cudaDriverGetVersion() - error: %d\n", error);
+  if (error != cudaSuccess) {
+    printf("cudaDriverGetVersion error: %d\n", error);
+  } else {
+    printf("CUDA Driver Version: %d.%d\n", driverVersion / 1000, (driverVersion % 100) / 10);
+  }
+
+  // Get CUDA Runtime Version
+  error = cudaRuntimeGetVersion(&runtimeVersion);
+  printf("cudaRuntimeGetVersion() - error: %d\n", error);
+  if (error != cudaSuccess) {
+    printf("cudaRuntimeGetVersion error: %d\n", error);
+  } else {
+    printf("CUDA Runtime Version: %d.%d\n", runtimeVersion / 1000, (runtimeVersion % 100) / 10);
+  }
+
   printf("[Vector addition of %d elements]\n", numElements);
 
   // Allocate the host input vector A
```

When the `/usr/lib64/nvidia/libcuda.so{,.1}` files don't exist, we get:

```
cudaDriverGetVersion() - error: 0
CUDA Driver Version: 0.0
cudaRuntimeGetVersion() - error: 0
CUDA Runtime Version: 12.2
[Vector addition of 50000 elements]
Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!
```

When everything works properly, we get:

```
cudaDriverGetVersion() - error: 0
CUDA Driver Version: 12.2
cudaRuntimeGetVersion() - error: 0
CUDA Runtime Version: 12.2
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
```

This also works on older driver versions, because CUDA is forward compatible as long as the major version is the same.
```
cudaDriverGetVersion() - error: 0
CUDA Driver Version: 12.0
cudaRuntimeGetVersion() - error: 0
CUDA Runtime Version: 12.2
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
```

Binaries compiled this way require `/cvmfs` and `/usr/lib64/nvidia` to work:

```
docker run --rm -it --gpus all \
  -v /cvmfs:/cvmfs:ro \
  -v /usr/lib64/nvidia:/usr/lib64/nvidia:ro \
  -v /home/ben/Projects/cuda-samples:/workspace \
  nvidia/cuda:12.0.0-runtime-ubuntu22.04 \
  /workspace/Samples/0_Introduction/vectorAdd/vectorAdd
```

Actually, those are the only paths (other than a matching base OS image) required for it to work:

```
docker run --rm -it --gpus all \
  -v /cvmfs:/cvmfs:ro \
  -v /usr/lib64/nvidia:/usr/lib64/nvidia:ro \
  -v /home/ben/Projects/cuda-samples:/workspace \
  ubuntu \
  /workspace/Samples/0_Introduction/vectorAdd/vectorAdd
```

Note that `/usr/lib64/nvidia/libcuda.so{,.1}` is a runtime dependency, not a build-time dependency.
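As a follow-up, here is a minimal sketch (not part of the commit above) of how the symlink setup could be made idempotent for node provisioning. The driver library directory `/usr/lib/x86_64-linux-gnu` is the Ubuntu 22.04 location used above and is an assumption; it will differ on other distributions.

```bash
#!/usr/bin/env bash
# Sketch: idempotently create the /usr/lib64/nvidia/libcuda.so{,.1} symlinks
# that the Compute Canada CVMFS stack expects.
# DRIVER_LIB_DIR assumes Ubuntu 22.04; adjust for other distributions.
set -euo pipefail

DRIVER_LIB_DIR=/usr/lib/x86_64-linux-gnu
TARGET_DIR=/usr/lib64/nvidia

mkdir -p "$TARGET_DIR"
for lib in libcuda.so libcuda.so.1; do
  if [ -e "$DRIVER_LIB_DIR/$lib" ]; then
    # -f/-n: replace any existing symlink so re-runs are safe
    ln -sfn "$DRIVER_LIB_DIR/$lib" "$TARGET_DIR/$lib"
  else
    echo "warning: $DRIVER_LIB_DIR/$lib not found; is the NVIDIA driver installed?" >&2
  fi
done
```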
Some important info about the CUDA software stack, and how it could change (subject to testing):
As most of you know, we have 3 layers:

1. the kernel driver (`/lib/modules/$(uname -r)/extra/nvidia.ko.xz` and related)
2. the user-space driver library (`/usr/lib64/nvidia/libcuda.so`)
3. the CUDA toolkit (`module load cuda`)

Up to now we assumed that 1 & 2 are tightly coupled. But an NVIDIA employee in the EasyBuild Slack clarified that they are not: `libcuda.so.1` is forward compatible, and the newest libcuda (465.x) is compatible with kernel drivers going all the way back to 418.40.04+.

Note that there are in fact four maintained driver families: the long-term support ones (R418, EOL Mar 2022; R450, EOL Jul 2023) and the short-term ones (R460, EOL Jan 2022; and R465). Béluga and Graham are running an R460 version; Cedar is at R455, which is no longer supported.
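A quick way to see that the layers can sit at different versions on a given node is to query each one separately. This is a sketch, assuming the standard `/proc/driver/nvidia` interface, the `/usr/lib64/nvidia` path from the list above, and an Lmod-based stack; exact output formats vary between driver releases.

```bash
# Layer 1: kernel driver version, as reported by the loaded module
cat /proc/driver/nvidia/version

# Layer 2: user-space driver library version, read off the real file name
# behind the libcuda.so.1 symlink (e.g. libcuda.so.465.19.01)
readlink -f /usr/lib64/nvidia/libcuda.so.1

# Layer 3: CUDA toolkit versions offered by the software stack (Lmod)
module avail cuda
```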
So this means that we could put the newest libcuda in CVMFS and the sysadmins would only need to worry about the kernel modules. This will of course need to be tested (which we can do via `LD_LIBRARY_PATH` and/or the cvmfs-dev repo). Once libcuda is in place, all CUDA toolkit modules, including 11.3, can be used on all clusters irrespective of the kernel driver (as long as it is >= R418.40.04), and the present Lmod check could become obsolete.
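For the testing step, a minimal sketch of the `LD_LIBRARY_PATH` approach could look like the following. The CVMFS directory used here is a placeholder, since where a candidate libcuda would actually live in the repo is exactly what is being proposed; the `cuda/11.3` module name and the `vectorAdd` test program are carried over from the notes above.

```bash
# Placeholder location for a libcuda shipped via CVMFS -- the real path is
# not decided yet; this only illustrates the LD_LIBRARY_PATH test.
CANDIDATE_LIBCUDA_DIR=/cvmfs/soft.computecanada.ca/path/to/nvidia-compat

module load cuda/11.3

# Run a small CUDA program with the candidate libcuda ahead of the system
# one; a correct driver version in the output means it was picked up.
LD_LIBRARY_PATH="$CANDIDATE_LIBCUDA_DIR:${LD_LIBRARY_PATH:-}" ./vectorAdd
```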
As for the kernel modules, clusters could consider staying with an R450 version, since with libcuda in CVMFS it no longer needs to be upgraded to R460 to stay compatible with newer CUDA toolkit versions.
See this: https://docs.nvidia.com/datacenter/tesla/drivers/#lifecycle and this: https://docs.nvidia.com/deploy/cuda-compatibility/index.html#cuda-compatibility-platform