[Issue]: flang-new: runtime and math functions don't link for OpenMP target regions #201

Open
VeeEM opened this issue Nov 8, 2024 · 4 comments

VeeEM commented Nov 8, 2024

Problem Description

I get many linker errors for OpenMP target regions when offloading to GPU. Symbols from libFortranRuntime show as undefined and so do some math intrinsics like cosh.

There are some other math intrinsics that do link successfully, like tanh.

@sfantao

Operating System

SUSE Linux Enterprise Server 15 SP5 (Cray OS on LUMI)

CPU

AMD EPYC 7742 64-Core

GPU

AMD Instinct MI250X

ROCm Version

ROCm 6.2.2

ROCm Component

flang

Steps to Reproduce

$ flang-new --version
AMD AFAR drop #4.0 9/28/24 flang-new version 20.0.0git (ssh://gerritgit/lightning/ec/llvm-project amd-feature/atd-fortran/2024.09.28 24385 1ad3ac337fa4b1a5a7621a4c5480028b54fffada)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /pfs/lustrep3/scratch/project_462000394/amd-sw/rocm-afar/5891/lib/llvm/bin
Build config: +assertions
$ cat link.F90 
program link
implicit none
real :: r
real, dimension(5) :: xs

!$omp target map(xs, r)
xs = 2
xs = modulo(xs, 3)
r = cosh(r)
r = tanh(r)
!$omp end target

end program
$ flang-new -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa --offload-arch=gfx90a -fdefault-real-8 link.F90
ld.lld: error: undefined symbol: _FortranAAssign
>>> referenced by /tmp/a.out.amdgcn.gfx90a-6a405a.img.lto.o:(__omp_offloading_54bbb604_5101b8ee__QQmain_l6)
>>> referenced by /tmp/a.out.amdgcn.gfx90a-6a405a.img.lto.o:(__omp_offloading_54bbb604_5101b8ee__QQmain_l6)

ld.lld: error: undefined symbol: _FortranAModuloReal8
>>> referenced by /tmp/a.out.amdgcn.gfx90a-6a405a.img.lto.o:(__omp_offloading_54bbb604_5101b8ee__QQmain_l6)
>>> referenced by /tmp/a.out.amdgcn.gfx90a-6a405a.img.lto.o:(__omp_offloading_54bbb604_5101b8ee__QQmain_l6)

ld.lld: error: undefined symbol: cosh
>>> referenced by /tmp/a.out.amdgcn.gfx90a-6a405a.img.lto.o:(__omp_offloading_54bbb604_5101b8ee__QQmain_l6)
>>> referenced by /tmp/a.out.amdgcn.gfx90a-6a405a.img.lto.o:(__omp_offloading_54bbb604_5101b8ee__QQmain_l6)
clang: error: ld.lld command failed with exit code 1 (use -v to see invocation)
/pfs/lustrep3/scratch/project_462000394/amd-sw/rocm-afar/5891/lib/llvm/bin/clang-linker-wrapper: error: 'clang' failed
flang-new: error: linker command failed with exit code 1 (use -v to see invocation)

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

bcornille commented Nov 25, 2024

The user guide documents that adding -lFortranRuntimeHostDevice to the link line resolves these link issues. As noted there, this device version of the Fortran runtime gives low performance, but it does allow user programs to link and run. We very much appreciate reports of which runtime functionality user codes need, so that those runtime calls can be circumvented. For example, one would expect cosh to be lowered directly, without a call to the Fortran runtime.
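
For a report like that, the distinct unresolved symbols can be pulled out of the linker output. The following is only a minimal shell sketch: the function name `undefined_symbols` is hypothetical, and it assumes ld.lld prints errors in the `ld.lld: error: undefined symbol: NAME` form shown in the original post.

```shell
# Hypothetical helper (not part of flang-new or ROCm): read linker output
# on stdin and print each undefined symbol once, sorted bytewise so the
# result does not depend on the locale.
undefined_symbols() {
  sed -n 's/^ld\.lld: error: undefined symbol: //p' | LC_ALL=C sort -u
}

# Example use with the compile line from the original post
# (linker errors go to stderr, hence the 2>&1):
# flang-new -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa --offload-arch=gfx90a \
#   -fdefault-real-8 link.F90 2>&1 | undefined_symbols
```

Applied to the errors in the original post, this should reduce the output to the three names _FortranAAssign, _FortranAModuloReal8 and cosh.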

@bcornille

Math functions may alternatively require -lm when linking.

VeeEM commented Nov 26, 2024

Thanks! With the drop 4.2 compiler I am able to link the runtime with -lFortranRuntimeHostDevice. I'm curious, why is performance poor with the device runtime? Is it just overhead from calling library functions or something else entirely? The program I'm working on uses assign, dot_product, mod, modulo and sum in some target regions.

I still have the same problem with the math functions: adding -lm to the compiler invocation does not help with linking cosh. tanh works fine, just as before.

$ flang-new -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa --offload-arch=gfx90a -fdefault-real-8 -lFortranRuntimeHostDevice -lm link.F90
ld.lld: error: undefined symbol: cosh
>>> referenced by a.out.amdgcn.gfx90a.img.lto.o:(__omp_offloading_54bbb604_4d007b9a__QQmain_l6)
>>> referenced by a.out.amdgcn.gfx90a.img.lto.o:(__omp_offloading_54bbb604_4d007b9a__QQmain_l6)
clang: error: ld.lld command failed with exit code 1 (use -v to see invocation)

If I compile with --save-temps and look into link-openmp-amdgcn-amd-amdhsa-gfx90a-llvmir.mlir, I see that the symbols for cosh and tanh look quite different from each other. The symbol for cosh is plain cosh, but the symbol for tanh is __ocml_tanh_f64.
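
The same check can be repeated for other intrinsics with a small script. This is a rough sketch rather than an official tool: the function name `classify_math_symbols` is hypothetical, and it assumes that a lowered intrinsic appears with an `__ocml_<name>_` prefix (as with __ocml_tanh_f64 above) while an unlowered one appears under its plain name.

```shell
# Hypothetical helper: for each intrinsic name given after the file name,
# report whether the saved device MLIR refers to an __ocml_ variant, to the
# plain name, or to neither.
classify_math_symbols() {
  file="$1"
  shift
  for fn in "$@"; do
    if grep -q "__ocml_${fn}_" "$file"; then
      echo "$fn: lowered to __ocml_ symbol"
    elif grep -qw "$fn" "$file"; then
      # -w matches the name as a whole word, so __ocml_cosh_f64 would not
      # count as a plain reference to cosh.
      echo "$fn: plain symbol"
    else
      echo "$fn: not referenced"
    fi
  done
}

# Example use after compiling with --save-temps:
# classify_math_symbols link-openmp-amdgcn-amd-amdhsa-gfx90a-llvmir.mlir cosh tanh
```

Given what I see in that file, I would expect cosh to be reported as a plain symbol and tanh as lowered.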

@bcornille

I've opened an internal ticket regarding cosh, so we will investigate. Drop 4.3 is available at https://repo.radeon.com/rocm/misc/flang/ (the user guide still needs to be updated). It may improve some of the assignment performance issues. The runtime is not really a device-optimized library: it is mostly the existing runtime, a highly templated C++ library, compiled for the device.
