Auto Reorder #6

Open
wants to merge 3,335 commits into base: main
Changes from 1 commit
Commits (3,335)
38cc864
[XLA:Python] Factors the ":logging" library out from ":xla_extension".
wrengr Mar 28, 2024
c2fd51c
[XLA:Python] Add python function to convert `xla::LiteralProto` into …
wrengr Mar 28, 2024
4f8384e
[XLA:Python] Adding `xla::PrimitiveType` <-> `numpy.dtype` conversion…
wrengr Mar 28, 2024
bcdb690
Integrate StableHLO at openxla/stablehlo@271e8634
ghpvnist Mar 28, 2024
e396c88
Delete populateRankSpecialization*Patterns functions
Mar 28, 2024
078a31b
Correctly handle output streaming case where the MoveToHost annotatio…
tensorflower-gardener Mar 29, 2024
d835533
Set release_base for all release platforms
jakeharmon8 Mar 29, 2024
de04cff
[XLA] Respect min_rank for reduce scatter version of MatchReduceScatter.
Mar 29, 2024
e806de5
Integrate LLVM at llvm/llvm-project@aa2c14de1adc
gribozavr Mar 29, 2024
3a12f75
Automated Code Change
tensorflower-gardener Mar 29, 2024
e686499
[xla:gpu][NFC] Use absl::Span more consistenly
tyb0807 Mar 29, 2024
35a5635
[xla][gpu] Extracting triton codegen requirements for hlo instructions
tensorflower-gardener Mar 29, 2024
593d762
[xla:gpu] Create fake buffer allocations for embedded thunk
tyb0807 Mar 29, 2024
c3d52e9
[xla:gpu][NFC] Add AddressComputationThunk test with GEMM operands sh…
tyb0807 Mar 29, 2024
b1c051c
[xla:gpu][NFC] Use meaningful constexpr
tyb0807 Mar 29, 2024
24c0b39
Deduplicate inferred mesh shapes when try_multiple_mesh_shapes=true.
tensorflower-gardener Mar 29, 2024
947a640
Delete the redundant compilation_cache_test
tensorflower-gardener Mar 29, 2024
c4b031b
Restore GOOGLE_CUDA guard in scoped_annotation.h
tensorflower-gardener Mar 29, 2024
0b476d5
Enhanced zeta readability based on the article
Mar 29, 2024
b4a4647
Rollback https://github.com/openxla/xla/commit/0ab2be0b5a575da3206d2c…
reedwm Mar 29, 2024
8bea463
Modify the matrix class to keep track of both memory and communicatio…
tensorflower-gardener Mar 29, 2024
33a8900
[xla:gpu][NFC] Simplify `collect_slice_info`
tyb0807 Mar 29, 2024
24fc9c0
Change include order in `ml_dtypes.cc` to prevent errors.
ddunl Mar 29, 2024
d55afd1
Integrate LLVM at llvm/llvm-project@80aa52d8c5a8
gribozavr Mar 29, 2024
83f3fd6
PR #7849: [XLA:CPU] Add support for cross-process collectives using mpi.
inailuig Mar 29, 2024
7b079ca
Use `bytes` proto field type for string values of `xla::PjRtValueType…
hyeontaek Mar 29, 2024
db8a37d
Move `tsl/python` to `xla/tsl/python`
ddunl Mar 29, 2024
9b34ebe
Reverts c36f237324de53045aa27d87021d2f9db13cf11a
Mar 29, 2024
087bceb
[xla:gpu] Add a version of `HloPredicateIsOp` for `HloInstructionAdap…
tyb0807 Mar 29, 2024
3c8260f
[PJRT C API] Add a PJRT extension to register custom partitioner.
Mar 29, 2024
287cf2e
In general, avoid the suffix on StatusOr.
ghpvnist Mar 29, 2024
3db823f
[xla:gpu] AddressComputationFusionRewriter should run before other fu…
tyb0807 Mar 30, 2024
2c62120
[xla:gpu] Unify GEMM emission for (Dynamic)AddressComputationFusion e…
tyb0807 Mar 30, 2024
e7788f3
[xla:gpu][NFC] Remove unused constexprs
tyb0807 Mar 30, 2024
35c9bb1
[xla:gpu] Unify static and dynamic slice cases for AddressComputation…
tyb0807 Mar 30, 2024
3ff51cd
[xla:gpu][NFC] Make lambdas static functions for better reusability
tyb0807 Mar 30, 2024
f155104
[xla:gpu][NFC] Explicitly rewrite AddressComputationFusion in custom …
tyb0807 Mar 30, 2024
5745c4f
[xla:gpu][NFC] Use the same helpers to get slices for GEMM and generi…
tyb0807 Mar 31, 2024
9572376
Enable SparseCore threads in TpuLayoutAssignment
tensorflower-gardener Mar 31, 2024
5d0e144
[xla:gpu] Generic custom call emission for DynamicAddressComputationF…
tyb0807 Mar 31, 2024
adde9f0
[xla:gpu] DUS support for generic custom call emission in DynamicAddr…
tyb0807 Mar 31, 2024
91281d5
[PJRT:CPU] Fix thread-pool stack sizes to 2MiB.
hawkinsp Apr 1, 2024
b3f212a
[PJRT:CPU] Replace references to pjrt/tfrt_cpu_pjrt_client with pjrt/…
hawkinsp Apr 1, 2024
68b27ae
[xla:gpu] No need for dynamic/static mode in AddressComputationFusion…
ezhulenev Apr 1, 2024
50de394
[xla:gpu][NFC] No need for custom HloModuleConfigs in address_computa…
ezhulenev Apr 1, 2024
f7c1695
Automated Code Change
tensorflower-gardener Apr 1, 2024
06d0760
Add support for parameter streaming with while loop by:
tensorflower-gardener Apr 1, 2024
1d21af3
Hoist async copies when start_after is -1
tensorflower-gardener Apr 1, 2024
7476c6d
Integrate LLVM at llvm/llvm-project@0f6ed4c394fd
alinas Apr 1, 2024
1fa455d
Integrate LLVM at llvm/llvm-project@c09b6fac12b0
slackito Apr 1, 2024
decdd8c
Remove unnecessary Copybara transforms
ddunl Apr 1, 2024
a1240f3
Speed up DetermineHloInstructionIsReplicated.
tensorflower-gardener Apr 2, 2024
99e7acf
Temporary fix for TensorFlow VS2019 breakage.
wecing Apr 2, 2024
3c85e49
Support RngBitGenerator HloInstruction with single output in the HLO …
ZixuanJiang Apr 2, 2024
7ff97ee
Automated Code Change
tensorflower-gardener Apr 2, 2024
ab0ce49
Automated Code Change
tensorflower-gardener Apr 2, 2024
6f894bf
Automated Code Change
tensorflower-gardener Apr 2, 2024
ca2e872
Automated Code Change
tensorflower-gardener Apr 2, 2024
b381cca
Automated Code Change
tensorflower-gardener Apr 2, 2024
3fdfffb
Switch `xla_cpu` dialect to MLIR properties.
chsigg Apr 2, 2024
c602266
Internal cleanup of BUILD/.bzl files
tensorflower-gardener Apr 2, 2024
13bde94
Internal cleanup of BUILD/.bzl files
tensorflower-gardener Apr 2, 2024
f899622
PR #10965: [GPU] Make xla_gpu_enable_nccl_per_stream_comms false by d…
trevor-m Apr 2, 2024
3c9405f
Automated Code Change
tensorflower-gardener Apr 2, 2024
7b0124b
PR #11094: [ROCm] enable rocm for se_gpu_pjrt_compiler_aot_test
i-chaochen Apr 2, 2024
92874b9
Add missing parameter to xla_computation_to_mlir_module interface
Adam-Banas Apr 2, 2024
baf3ad4
Re-enable HoistLayoutConversion pattern and mixed-precision MMA for A…
Moerafaat Apr 2, 2024
a134d67
Add machine attributes to topology
tensorflower-gardener Apr 2, 2024
f3f0122
Eliminate unused Compiler::AssignBuffers method and overloads.
klucke Apr 2, 2024
c28739c
Add absl::SourceLocations to xla::ResourceExhaustedError.
klucke Apr 2, 2024
f4274db
Reverts baf3ad40513f12841cdef913bfb4f4db35469caa
tensorflower-gardener Apr 2, 2024
ee3c488
Migrate the SerDes logic for basic IFRT API object types from IFRT Pr…
hyeontaek Apr 2, 2024
7395a37
[xla:ffi] Compute correct byte size of a DeviceMemoryBase for FFI buf…
ezhulenev Apr 2, 2024
5eeadb3
[XLA] Ignore channel id for all-reduce in spmd programs
vsytch Apr 2, 2024
68a57ec
Pass the correct operands for branch computations when invoking Einsu…
tensorflower-gardener Apr 2, 2024
0742379
1. More comprehensive after-all handling. Specifically, the pass now …
tensorflower-gardener Apr 2, 2024
f9ae540
Roll back LiteralBase::Hash change due to performance regression.
tensorflower-gardener Apr 2, 2024
7df6101
This CL introduces 'PluginProgram' in IFRT and exposes this in python…
tensorflower-gardener Apr 2, 2024
08b6f26
PR #11141: Add warmup iterations introduced in #9757
olupton Apr 2, 2024
ce93261
PR #11122: [ROCm] Fix RCCL hang on rocm5.7
draganmladjenovic Apr 2, 2024
0eddb55
Ensure that the module's buffer donor config and the input output ali…
tensorflower-gardener Apr 2, 2024
094ad43
[HloValueSemanticsAnalysis] Deduplicate some functions.
jinliangwei Apr 2, 2024
d3e9dde
[xla:gpu] NFC: Remove AddressComputationFusion emitter
ezhulenev Apr 2, 2024
914025c
[XLA:Runtime] Moved the nccl_api target to runtime folder.
tensorflower-gardener Apr 2, 2024
492c2d7
[xla:gpu][NFC] Rename DynamicAddressComputationFusion back to Address…
ezhulenev Apr 2, 2024
b62b77c
Add absl::SourceLocation to xla::Cancelled, xla::NotFound, xla::Unava…
klucke Apr 2, 2024
a2b4e43
Reverts 9b34ebe93fb8aa435e20f91c4af7de17bbc806cc
Apr 2, 2024
04233b9
A couple minor bug fixes:
tensorflower-gardener Apr 3, 2024
207d793
xla_compile_lib: Also support modules in binary proto format.
pizzud Apr 3, 2024
3f67da4
[JAX] Update JAX CI dockerfiles to use NumPy 2.0.0rc1, SciPy 1.13.0rc…
hawkinsp Apr 3, 2024
4e8e23f
Make RemoveCollective virtual and return the new custom call instruction
tensorflower-gardener Apr 3, 2024
f8e10a9
Remove unused includes and dependencies
Adam-Banas Apr 3, 2024
7c09621
[XLA:GPU] Forbid fusing concatenations in dots when the non-contracti…
bchetioui Apr 3, 2024
aee555d
Fixed internal issue.
Apr 3, 2024
4c8a74b
PR #11172: GpuTimer: use delay kernel to improve accuracy
olupton Apr 3, 2024
4de516e
Automated Code Change
tensorflower-gardener Apr 3, 2024
2ff8727
[XLA:GPU] Extract SymbolicTiledHloInstruction into a separate file.
olegshyshkov Apr 3, 2024
828b65d
Use absl::Status errors rather than the tsl equivalents.
klucke Apr 3, 2024
601eaf6
Add source locations to FAILED_PRECONDITION errors.
klucke Apr 3, 2024
c2b4f49
[ifrt_proxy] Added a separate thread pool for host callbacks
superbobry Apr 3, 2024
956f0ed
Only use `config-cuda-only` tag under `if_google` wrapper
ddunl Apr 3, 2024
5fe2779
[XLA:LatencyHidingScheduler] Schedule while and its tuple back to back.
tensorflower-gardener Apr 3, 2024
7d9d486
Add attribute interface for IFRT IR sharding.
ICGog Apr 3, 2024
940093e
Add SourceLocation information to xla::Internal errors.
klucke Apr 3, 2024
eb453e4
[XLA:Python] Making `absl::StatusOr`-casting explicit.
wrengr Apr 3, 2024
0a6049d
[JAX] Rebuild CUDA 12.1 image with newer ml_dtypes and numpy.
hawkinsp Apr 3, 2024
4386e92
[XLA] Make shape util fuzzer happy
vsytch Apr 3, 2024
34509f1
Add IFRT pass that verifies if all !ifrt.arrays have sharding specified.
ICGog Apr 3, 2024
aecdde0
Remove the constraint that tokens cannot be passed as entry parameter…
yueshengys Apr 3, 2024
7b4b275
[PJRT C API] Plumb plugin attributes from plugin to JAX python.
Apr 3, 2024
baf7544
Integrate LLVM at llvm/llvm-project@9df19ce40281
tensorflower-gardener Apr 3, 2024
55cdde9
Minimize number of Copybara transforms that operate on `tensorflow/th…
ddunl Apr 4, 2024
ff468cb
Implement convolution via indexing maps.
sergeykozub Apr 4, 2024
31da0a2
Internal CI configuration to run tests on H100
akuegel Apr 4, 2024
fbbb6c8
Integrate Triton up to [e902d3b6](https://github.com/openai/triton/co…
tensorflower-gardener Apr 4, 2024
27c0d0f
Automated Code Change
tensorflower-gardener Apr 4, 2024
bc2e787
PR #10763: [XLA:GPU] Fix cuDNN FMHA fwd scale not passed into cuDNN
Cjkkkk Apr 4, 2024
b72a639
Automated Code Change
tensorflower-gardener Apr 4, 2024
4821859
PR #11139: [GPU] Enable cuDNN integer math mode only with v9.1+.
sergachev Apr 4, 2024
8cee2ac
[xla:ffi] Unit tests for CPU type-safe custom call API
Adam-Banas Apr 4, 2024
3325ca7
Internal cleanup of BUILD/.bzl files
tensorflower-gardener Apr 4, 2024
fef33a9
Integrate LLVM at llvm/llvm-project@c511c90680ee
tensorflower-gardener Apr 4, 2024
b879cfa
Reverts 99e7acf3535eed1982be5cec9ec0537b1c2bb84f
akuegel Apr 4, 2024
22257bf
Bump the operands+outputs threshold to allow larger fusions.
thomasjoerg Apr 4, 2024
1345784
Better disable mechanism for tensor cores for 8-bit-or-less dot with …
Moerafaat Apr 4, 2024
89fd078
Disable a test that is failing on H100
akuegel Apr 4, 2024
8ea3a17
Reverts f4274db0dda92116ce96d84f8b42fee4e9377307
Moerafaat Apr 4, 2024
3be7909
Fix msan error introduced in https://github.com/openxla/xla/commit/7b…
Apr 4, 2024
570a4d8
[pjrt] NFC: Rename HostBufferSemantics::kZeroCopy to kImmutableZeroCopy
ezhulenev Apr 4, 2024
19422af
[XLA:GPU] Add TiledHloInstruction.
olegshyshkov Apr 4, 2024
b1de4e9
[xla] Add a method to express that we want to schedule a node as earl…
bixia1 Apr 4, 2024
2421769
Added a virtual function (`CanPropagateShardingToOperands`) to `Custo…
tensorflower-gardener Apr 4, 2024
2a29934
PR #10503: Fix log1p inaccuracies on complex inputs with large absolu…
pearu Apr 4, 2024
c2cc020
Add SourceLocation information to xla::Unimplemented errors.
klucke Apr 4, 2024
3d6326c
presubmits: Add a presubmit for CHECK and related macros.
pizzud Apr 4, 2024
a0cf7ec
[XLA:Runtime] Moved the nccl_clique target to runtime folder.
sgerrard Apr 4, 2024
774befd
[xla][gpu] Change the point-to-point pipeliner to produce an intermed…
bixia1 Apr 4, 2024
136750b
Fix for a bug where the while loop fusible sinking crashes when all o…
tensorflower-gardener Apr 4, 2024
ee8f8b8
Import nanobind caster for std::string to avoid casting error.
pschuh Apr 4, 2024
df104c0
[xla:ffi] Add support for annotating FFI results with type tags to di…
ezhulenev Apr 4, 2024
4d135db
[xla:ffi] Add auto-binding for FFI results
ezhulenev Apr 4, 2024
0960128
Crash on HLOs with nested tuples in conditionals.
tensorflower-gardener Apr 4, 2024
b0c6c26
Integrate StableHLO at openxla/stablehlo@1bdf7c26
abhigunj Apr 4, 2024
c3dec1d
[xla:gpu] Pass custom-call results as xla:ffi results to handlers
ezhulenev Apr 5, 2024
be5c637
Add GetDefaultLayout to PjRtTopologyDescription. This is needed to su…
pschuh Apr 5, 2024
fd923c9
Integrate LLVM at llvm/llvm-project@e0e615efac52
tensorflower-gardener Apr 5, 2024
75d64d4
PR #11164: [XLA:GPU] bump up minimum PTX ISA to be 8.1 for CUDA >= 12.1
Cjkkkk Apr 5, 2024
b2886a5
Add support for sparse dot (wgmma.sp) to NVGPU triton dialect.
sergeykozub Apr 5, 2024
4a5ccff
Introduce a new dot operation with a sparse operand (2:4) and its low…
sergeykozub Apr 5, 2024
39ba25c
Reverts 8ea3a17f41213ebff0a1e2b11fab31ef3cd96d92
Moerafaat Apr 5, 2024
e0c0495
Update triton compiler passes to support sparse dot operation.
sergeykozub Apr 5, 2024
8826494
[XLA:GPU] Move code to compute block id to tile offset indexing map t…
olegshyshkov Apr 5, 2024
aa08925
PR #10649: [ROCm] Triton in XLA for ROCm - ir_emitter_triton related …
zoranjovanovic-ns Apr 5, 2024
6d79348
Adds utilities for extracting chosen node & edge strategies.
tensorflower-gardener Apr 5, 2024
03ace28
Remove unused proto imports
hyeontaek Apr 5, 2024
3f57bde
Rolling back for now.
majnemer Apr 5, 2024
5925407
Fix Copybara reversibility issues
ddunl Apr 5, 2024
91c9c6a
Lazily instantiates memory constraints.
tensorflower-gardener Apr 5, 2024
58650f0
Automated Code Change
klucke Apr 5, 2024
2449ba5
Fix test to load autotuning results from cache instead of actually co…
Moerafaat Apr 5, 2024
c2354f9
Add a fallback when GetDefaultLayout is unimplemented for that backend.
Apr 5, 2024
cfdc914
Revert TrivialDce pass
Apr 5, 2024
8b2a01a
Make common host memory spaces sharable across backends.
Apr 5, 2024
3c320e0
Use absl::Status rather than tsl::Status
klucke Apr 5, 2024
7edf554
Defines a solver 'output' class, to replace the earlier (unnamed) tuple.
tensorflower-gardener Apr 5, 2024
2ab1bae
Pass MLIR bytecode across XLA Extension boundary for JAX when convert…
GleasonK Apr 5, 2024
d8fe022
Integrate LLVM at llvm/llvm-project@8487e05967aa
tensorflower-gardener Apr 5, 2024
5769a5c
Add GetDefaultLayoutForDevice to IFRT.
Apr 5, 2024
7e0eb94
Add SourceLocation information to xla::InvalidArgument.
klucke Apr 5, 2024
de4c964
[IFRT] Cache the hash of `DeviceList`
hyeontaek Apr 5, 2024
af88e08
Allow both nb::tuple and nb::list for fastpath_data.
pschuh Apr 5, 2024
78a4bcb
Add a method to get default layout in PyClient.
Apr 5, 2024
8e3cf89
Add unbounded dynamism test for NotOp.
ghpvnist Apr 6, 2024
2fd63e3
Add unbounded dynamism test for MinOp.
ghpvnist Apr 6, 2024
139d4fc
Add unbounded dynamism test for RemOp.
ghpvnist Apr 6, 2024
c1a90a3
Add unbounded dynamism test for ReducePrecisionOp.
ghpvnist Apr 6, 2024
28a8a4b
Add unbounded dynamism test for MapOp.
ghpvnist Apr 6, 2024
e1f9272
Add unbounded dynamism test for XorOp.
ghpvnist Apr 6, 2024
38b39b3
Add unbounded dynamism test for ComplexOp.
ghpvnist Apr 6, 2024
91394b4
[IFRT] Add fast pointer equality test for `DeviceList` internal state
hyeontaek Apr 6, 2024
dda9726
IfrtServingExecutable support host callback execution
deqiangc Apr 6, 2024
17a4028
Add unbounded dynamism test for ShiftLeftOp.
ghpvnist Apr 6, 2024
04e2731
Automated Code Change
tensorflower-gardener Apr 6, 2024
ed9b5bb
Add unbounded dynamism test for ShiftRightArithmeticOp.
ghpvnist Apr 6, 2024
2df33c5
Support shape transpose in `hlo_sharding_util::ReshapeSharding`.
ZixuanJiang Apr 6, 2024
a5788c4
Ensure that the module we consume has no unused computations. This ca…
tensorflower-gardener Apr 6, 2024
42fa291
[NFC] Switch `mhlo` dialect to MLIR properties.
chsigg Apr 6, 2024
ea157cf
Add support for recv and recv-done HLO ops in auto-sharding
tensorflower-gardener Apr 6, 2024
119ed17
Automated Code Change
tensorflower-gardener Apr 6, 2024
26635be
Use StreamExecutor to create stream rather than manually constructing.
klucke Apr 6, 2024
f76be00
Remove deprecated code from JAX lowering and compilation
yashk2810 Apr 7, 2024
dbbf0dd
In the Auto Sharding solver output, populates the times where peak me…
tensorflower-gardener Apr 7, 2024
59d9644
Ignores the previous run's peak times if we're in deterministic mode.
tensorflower-gardener Apr 7, 2024
b8e6958
Unpack per-channel hybrid quantized MHLO ops to float ops
doyeonkim0 Apr 8, 2024
b5bcce7
[xla] hlo_computation: drop instruction_indices_
cota Apr 8, 2024
6a19221
[tsl:concurrency] Add LLVM-style type casting to AsyncValuePtr<T>
ezhulenev Apr 8, 2024
3f881e9
[tsl:concurrency] Add LLVM-style type casting to AsyncValueRef<T>
ezhulenev Apr 8, 2024
990d54a
Add unbounded dynamism test for ShiftRightLogicalOp.
ghpvnist Apr 8, 2024
ed6603c
Add constraint on communicate op order; try to resolve hang
zjjott Apr 8, 2024
2b4a3d8
Add unbounded dynamism test for SelectAndScatterOp.
ghpvnist Apr 8, 2024
ae6218c
Reverts 2df33c54de828f0144c8a43333daa31e5bd8b265
laurentes Apr 8, 2024
d8fe29f
Removes useless friend declaration.
Apr 8, 2024
370a681
Fix a bug in algebraic simplifier that incorrectly rewrites a broadca…
tensorflower-gardener Apr 8, 2024
32a4c7b
fix sf
zjjott Apr 8, 2024
b8c6a2a
Add unbounded dynamism test for OptimizationBarrierOp.
ghpvnist Apr 8, 2024
55bb55a
Add unbounded dynamism test for WhileOp.
ghpvnist Apr 8, 2024
ba124f3
Add unbounded dynamism test for TupleOp.
ghpvnist Apr 8, 2024
208496a
Fix clang-14 build issue (reference to local binding declared in encl…
sergeykozub Apr 8, 2024
41fbb25
[XLA:GPU][Coalescing] Add coalescing for ops with runtime vars.
pifon2a Apr 8, 2024
298ad29
[XLA:GPU][IndexAnalysis] Check if the scatter is canonicalized when c…
pifon2a Apr 8, 2024
8e24878
Integrate LLVM at llvm/llvm-project@8ee6ab7f69ca
krasimirgg Apr 8, 2024
64df3fc
[PJRT C API] Add a build rule for building the PJRT CPU plugin.
Apr 8, 2024
4c4fbf5
Prepare move of CUDA compilation functionality
beckerhe Apr 8, 2024
b09587c
Add unbounded dynamism test for GetTupleElementOp.
ghpvnist Apr 8, 2024
a8d2772
[tsl:concurrency] Add helper functioms to block/await on AsyncValueRe…
ezhulenev Apr 8, 2024
ecf5f62
Fix HLO cost analysis for simple nested fusions.
tensorflower-gardener Apr 8, 2024
2303754
[tsl:concurrency] NFC: Use port::Aligned(Malloc|Free) instead of cust…
ezhulenev Apr 8, 2024
1ca099e
[tsl:concurrency] NFC: Add tests for various types of AndThen callbac…
ezhulenev Apr 8, 2024
335d390
[tsl:concurrency] Specify casting rules for ErrorAsyncValue
ezhulenev Apr 8, 2024
0bde421
[tsl:concurrency] NFC: Fix warnings in tsl/concurrency folder
ezhulenev Apr 8, 2024
6caa54b
Add metadata matcher for unit testing.
tensorflower-gardener Apr 8, 2024
f97959d
Add unbounded dynamism test for CholeskyOp.
ghpvnist Apr 8, 2024
b3a7256
Customize Reserved GPU HBM Size by flag in Pathways.
tensorflower-gardener Apr 8, 2024
1448c02
HloTopKInstruction doesn't inherit from HloDimensionsInstruction,
tensorflower-gardener Apr 8, 2024
8f1fced
Add unbounded dynamism tests for RngNormalOp and RngUniformOp.
ghpvnist Apr 8, 2024
283059e
Remove Stream::SetPriority functions from Stream.
klucke Apr 8, 2024
e16c318
Add unbounded dynamism test for RngBitGeneratorOp.
ghpvnist Apr 8, 2024
94fdc81
Move StreamImplementation creation into Stream::Initialize, and elimi…
klucke Apr 8, 2024
1ff4f0d
[tsl:concurrency] Specify Isa/DynCast/Cast semantics for indirect asy…
ezhulenev Apr 8, 2024
0a534f1
Enforce that CLs satisfy openxla/xla's buildifier checks
ddunl Apr 8, 2024
9949693
Reverts 0a534f166b4edf6a41578263e7784ef892ba41b3
ddunl Apr 9, 2024
f253a74
Add unbounded dynamism test for TriangularSolveOp.
ghpvnist Apr 9, 2024
b53d9c8
Add unbounded dynamism test for ReverseOp.
ghpvnist Apr 9, 2024
3e89666
Add unbounded dynamism test for SortOp.
ghpvnist Apr 9, 2024
79eccb4
Add unbounded dynamism test for DynamicSliceOp.
ghpvnist Apr 9, 2024
1acf05e
Automated Code Change
tensorflower-gardener Apr 9, 2024
556df5a
Try to fix communicate order issue; works in some scenarios
zjjott Apr 11, 2024
ecfe74a
AutoReorder: add hint, so that solve will be faster
zjjott Apr 11, 2024
0fe4d0e
Add constraint to limit fusing two communicate ops; add all-gather cost esti…
zjjott Apr 16, 2024
7866099
add reduce-scatter cost estimate.
zjjott Apr 18, 2024
947a478
Add CUDA error debug info; add all2all test
zjjott Apr 22, 2024
0c4f31d
Merge branch 'github_1acf05e' into feature/auto_reorder
zjjott Apr 22, 2024
db4a5d5
After merge: some fixes; communication op cost is incorrect [WIP]
zjjott Apr 25, 2024
c3b71f7
Fix incorrect comm op cost
zjjott Apr 25, 2024
78996c6
fix allgather cost
zjjott Apr 26, 2024
b43d42d
Add allgather/reducescatter scale ratio
zjjott May 7, 2024
5628d13
Support flash-attention custom call (#8)
ApsarasX May 8, 2024
a4fc087
fix log
zjjott May 9, 2024
0356daa
Migrate xplane conversion; PGLE uses the analytical estimator as fallback
zjjott May 15, 2024
8ada939
Support export to MPS and JSON; [WIP] convert xplane to offline sqlite
zjjott May 21, 2024
PR openxla#11164: [XLA:GPU] bump up minimum PTX ISA to be 8.1 for CUDA >= 12.1

Imported from GitHub PR openxla#11164

NV internal workloads are currently hitting this error: `error   : Feature 'Kernel parameter size larger than 4352 bytes' requires PTX ISA .version 8.1 or later`. Bumping to PTX ISA 8.1 resolves the issue, but PTX 8.1 requires at least CUDA 12.1. As a workaround, this change adds a check that selects PTX ISA 8.1 only when CUDA 12.1 or newer is available, and otherwise keeps the existing PTX 7.4 setting.
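For reference, a minimal sketch of that gate (the helper name `ChoosePtxFeatureString` is hypothetical; the actual change is made inline in `NVPTXGetTargetMachine`, as shown in the diff at the bottom of this page):

```c++
// Sketch only: mirrors the version gate described above, not the real patch.
// cuda.h defines CUDA_VERSION as major * 1000 + minor * 10, so CUDA 12.1
// corresponds to 12010.
#if GOOGLE_CUDA
#include "cuda.h"
#endif

// Hypothetical helper: picks the NVPTX feature string for the target machine.
// PTX ISA 8.1 is only available with CUDA 12.1+, so older toolkits keep the
// previous default of PTX ISA 7.4.
inline const char* ChoosePtxFeatureString() {
#if defined(GOOGLE_CUDA) && CUDA_VERSION >= 12010
  return "+ptx81";
#else
  return "+ptx74";
#endif
}
```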
Copybara import of the project:

--
4a15c97 by cjkkkk <ske@nvidia.com>:

bump up minimum PTX ISA to be 8.1 for CUDA >= 12.1

--
b04703e by cjkkkk <ske@nvidia.com>:

include cuda.h

Merging this change closes openxla#11164

COPYBARA_INTEGRATE_REVIEW=openxla#11164 from Cjkkkk:bump_to_ptx81_cuda12.1 b04703e
PiperOrigin-RevId: 622120255
Cjkkkk authored and copybara-github committed Apr 5, 2024
commit 75d64d4456ee2dd2b9423d5c2c516be9df6dec40
9 changes: 9 additions & 0 deletions xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc
@@ -76,6 +76,10 @@ limitations under the License.
 #include "rocm/rocm_config.h"
 #endif
 
+#if GOOGLE_CUDA
+#include "third_party/gpus/cuda/include/cuda.h"
+#endif
+
 namespace xla {
 namespace gpu {
 namespace {
@@ -294,6 +298,11 @@ std::unique_ptr<llvm::TargetMachine> NVPTXGetTargetMachine(
     const DebugOptions& debug_options) {
   // Figure out the exact name of the processor as known to the NVPTX backend
   // from the gpu_architecture flag.
+#if defined(GOOGLE_CUDA) && CUDA_VERSION >= 12010
+  // use ptx81 for CUDA >= 12.1
+  return GetTargetMachine(target_triple, GetSmName(compute_capability),
+                          debug_options, /*feature_str=*/"+ptx81");
+#endif
   return GetTargetMachine(target_triple, GetSmName(compute_capability),
                           debug_options, /*feature_str=*/"+ptx74");
 }