Skip to content

Commit

Permalink
[flang][cuda] Fix GPULaunchKernelConversion to generate correct kerne…
Browse files Browse the repository at this point in the history
…l launch parameters (llvm#119431)

For the call to _FortranACUFLaunchKernel, we store the pointer to a
member of a temporary structure in a parameter array. However, when we
obtain an element pointer from the parameter array, its address is
calculated based on the type of the structure. This PR properly treats
the parameter array as an array of pointers.

Example:

```mlir
%30 = llvm.load %29 : !llvm.ptr -> i32
%31 = llvm.mlir.constant(1 : i32) : i32
%32 = llvm.alloca %31 x !llvm.struct<(i64, i64, i32, ptr)> : (i32) -> !llvm.ptr
%33 = llvm.mlir.constant(4 : i32) : i32
%34 = llvm.alloca %33 x !llvm.ptr : (i32) -> !llvm.ptr
%35 = llvm.mlir.constant(0 : i32) : i32
%36 = llvm.getelementptr %32[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>
llvm.store %8, %36 : i64, !llvm.ptr
%37 = llvm.getelementptr %34[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>
llvm.store %36, %37 : !llvm.ptr, !llvm.ptr
...
llvm.call @_FortranACUFLaunchKernel(%47, %8, %8, %8, %2, %8, %8, %7, %34, %48) : (!llvm.ptr, i64, i64, i64, i64, i64, i64, i32, !llvm.ptr, !llvm.ptr) -> () 
```
In this example, `%37 = llvm.getelementptr %34[%35] : (!llvm.ptr, i32)
-> !llvm.ptr, !llvm.struct<(i64, i64, i32, ptr)>` will be `%37 =
llvm.getelementptr %34[%35] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.ptr`.
  • Loading branch information
khaki3 authored Dec 10, 2024
1 parent ffb19f4 commit e9866d5
Show file tree
Hide file tree
Showing 2 changed files with 10 additions and 3 deletions.
2 changes: 1 addition & 1 deletion flang/lib/Optimizer/Transforms/CUFGPUToLLVMConversion.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ static mlir::Value createKernelArgArray(mlir::Location loc,
loc, ptrTy, structTy, argStruct, mlir::ArrayRef<mlir::Value>({indice}));
rewriter.create<LLVM::StoreOp>(loc, arg, structMember);
mlir::Value arrayMember = rewriter.create<LLVM::GEPOp>(
loc, ptrTy, structTy, argArray, mlir::ArrayRef<mlir::Value>({indice}));
loc, ptrTy, ptrTy, argArray, mlir::ArrayRef<mlir::Value>({indice}));
rewriter.create<LLVM::StoreOp>(loc, structMember, arrayMember);
}
return argArray;
Expand Down
11 changes: 9 additions & 2 deletions flang/test/Fir/CUDA/cuda-gpu-launch-func.mlir
Original file line number Diff line number Diff line change
Expand Up @@ -99,9 +99,16 @@ module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<i1, dense<8> : ve
}

// CHECK-LABEL: _QMmod1Phost_sub

// CHECK: %[[STRUCT:.*]] = llvm.alloca %{{.*}} x !llvm.struct<(ptr)> : (i32) -> !llvm.ptr
// CHECK: %[[PARAMS:.*]] = llvm.alloca %{{.*}} x !llvm.ptr : (i32) -> !llvm.ptr
// CHECK: %[[ZERO:.*]] = llvm.mlir.constant(0 : i32) : i32
// CHECK: %[[STRUCT_PTR:.*]] = llvm.getelementptr %[[STRUCT]][%[[ZERO]]] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.struct<(ptr)>
// CHECK: llvm.store %{{.*}}, %[[STRUCT_PTR]] : !llvm.ptr, !llvm.ptr
// CHECK: %[[PARAM_PTR:.*]] = llvm.getelementptr %[[PARAMS]][%[[ZERO]]] : (!llvm.ptr, i32) -> !llvm.ptr, !llvm.ptr
// CHECK: llvm.store %[[STRUCT_PTR]], %[[PARAM_PTR]] : !llvm.ptr, !llvm.ptr
// CHECK: %[[KERNEL_PTR:.*]] = llvm.mlir.addressof @_QMmod1Psub1 : !llvm.ptr
// CHECK: llvm.call @_FortranACUFLaunchKernel(%[[KERNEL_PTR]], {{.*}})
// CHECK: %[[NULL:.*]] = llvm.mlir.zero : !llvm.ptr
// CHECK: llvm.call @_FortranACUFLaunchKernel(%[[KERNEL_PTR]], {{.*}}, %[[PARAMS]], %[[NULL]])

// -----

Expand Down

0 comments on commit e9866d5

Please sign in to comment.