Commit
Handle minimum GPU architecture supported [databricks] (#10540)
Fixes #10430. This PR ensures that Spark RAPIDS jobs are executed on supported GPU architectures without relying on manual configuration.

### Changes:
1. Processes `gpu_architectures` property from the `*version-info.properties` file generated by the native builds.
2. Verifies if the user is running the job on an architecture supported by the cuDF and JNI libraries and throws an exception if the architecture is unsupported.

### Testing
Tested on a Dataproc VM running on Nvidia P4 (GPU Architecture 6.1)
```
24/03/06 17:44:58 WARN RapidsPluginUtils: spark.rapids.sql.explain is set to `NOT_ON_GPU`. Set it to 'NONE' to suppress the diagnostics logging about the query placement on the GPU.
24/03/06 17:45:10 ERROR RapidsExecutorPlugin: Exception in the executor plugin, shutting down!
java.lang.RuntimeException: Device architecture 61 is unsupported. Minimum supported architecture: 75.
	at com.nvidia.spark.rapids.RapidsPluginUtils$.checkGpuArchitectureInternal(Plugin.scala:366)
	at com.nvidia.spark.rapids.RapidsPluginUtils$.checkGpuArchitecture(Plugin.scala:375)
	at com.nvidia.spark.rapids.RapidsExecutorPlugin.init(Plugin.scala:461)
```

### Related PR
* NVIDIA/spark-rapids-jni#1840

* Add conf for minimum supported CUDA and error handling

Signed-off-by: Partho Sarthi <[email protected]>

* Revert "Add conf for minimum supported CUDA and error handling"

This reverts commit 7b8eaea.

* Verify the GPU architecture is supported by the plugin libraries

Signed-off-by: Partho Sarthi <[email protected]>

* Use semi-colon as delimiter and use intersection of supported gpu architectures

Signed-off-by: Partho Sarthi <[email protected]>

* Allow for compatibility with major architectures

Signed-off-by: Partho Sarthi <[email protected]>

* Check for version as integers

Signed-off-by: Partho Sarthi <[email protected]>

* Modify compatibility check for same major version and same or higher minor version

Signed-off-by: Partho Sarthi <[email protected]>

* Add a config to skip verification and refactor checking

Signed-off-by: Partho Sarthi <[email protected]>

* Update RapidsConf.scala

Co-authored-by: Jason Lowe <[email protected]>

* Update verification logic

Signed-off-by: Partho Sarthi <[email protected]>

* Update warning message

Signed-off-by: Partho Sarthi <[email protected]>

* Add unit tests and update warning message.

Signed-off-by: Partho Sarthi <[email protected]>

* Update exception class

Signed-off-by: Partho Sarthi <[email protected]>

* Address review comments

Signed-off-by: Partho Sarthi <[email protected]>

---------

Signed-off-by: Partho Sarthi <[email protected]>
Co-authored-by: Jason Lowe <[email protected]>
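
### Illustrative sketch
The check described under Changes can be sketched roughly as follows. This is a minimal illustration, not the code in `Plugin.scala`: the object `GpuArchCheckSketch` and the helpers `parseArchs` and `check` are hypothetical names, and it assumes that each `gpu_architectures` value is a semicolon-delimited list of plain integers such as `75;80;86`. It takes the intersection of the architectures reported by the JNI and cuDF builds and rejects a device whose architecture is below the smallest common entry. The major-version compatibility handling and the skip-verification config mentioned in the commit list are not modeled here.

```scala
object GpuArchCheckSketch {
  // Parse a semicolon-delimited property value (assumed format, e.g. "75;80;86")
  // into a set of integer architectures. Blank entries are ignored.
  def parseArchs(raw: String): Set[Int] =
    raw.split(";").map(_.trim).filter(_.nonEmpty).map(_.toInt).toSet

  // Returns Left(errorMessage) if the device is unsupported, Right(()) otherwise.
  // Hypothetical helper: the plugin itself throws an exception instead.
  def check(
      deviceArch: Int,
      jniArchs: Set[Int],
      cudfArchs: Set[Int]): Either[String, Unit] = {
    // Only architectures supported by both native libraries count.
    val supported = jniArchs.intersect(cudfArchs)
    if (supported.isEmpty) {
      Left("No GPU architecture is supported by both the JNI and cuDF libraries.")
    } else {
      val minSupported = supported.min
      if (deviceArch < minSupported) {
        Left(s"Device architecture $deviceArch is unsupported. " +
          s"Minimum supported architecture: $minSupported.")
      } else {
        Right(())
      }
    }
  }
}
```

With these assumptions, `GpuArchCheckSketch.check(61, parseArchs("75;80;86"), parseArchs("70;75;80;86"))` yields a `Left` carrying the same message as the error shown in the Testing log, while a device architecture of 80 passes.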