Hi @jlowe, just to double-confirm: even after we build the JNI on CUDA 12.2, since this core dump feature is only supported by drivers that are CUDA 12.1+, your code will automatically detect an older driver version such as 12.0.x and disable the feature, so there is no failure on older drivers with CUDA 12.0.x. Am I right?
Yes. We will need a test pipeline against a CUDA 12.0 driver to help verify there are no regressions there.
Thanks for the clarification!
Can we also add some flags for CI to mark the cases that should be verified against older drivers? This would save a lot of time and resources, since only the tests with those specific labels would need to run instead of the whole suite.
Drivers that are CUDA 12.1+ compatible provide the ability to programmatically control GPU core dumps, see https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__COREDUMP.html#group__CUDA__COREDUMP. This can remove the limitations encountered with #9238 where we cannot always programmatically control the environment variables of an executor process.
We should add native bindings to use these APIs and the ability to safely detect when they are available.