lld hangs targeting amdgpu and so does opt using amdgpu-attributor pass #58639
@llvm/issue-subscribers-backend-amdgpu
Are you sure llvm-reduce didn't break it? This is all dead code; it's just deleted:
It's still broken whether or not the code is functional; the deadness is a red herring. If there is a real use, I still observe the hang / stack overflow. I cut down the test slightly, but I can't reproduce this with tip of tree. We probably need to bisect this to see whether it was deliberately fixed.
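llvm-reduce preserves only the property its interestingness test checks, so a reduced file made entirely of dead code is a legitimate result as long as the hang still reproduces. Below is a minimal Python sketch of such a test. It assumes `opt` is on `PATH`, that the hang reproduces with the `opt --amdgpu-attributor` command from this issue, and that a 60-second timeout (an arbitrary choice) is long enough to call a run hung. llvm-reduce passes the candidate file as the last argument and treats exit status 0 as "interesting".

```python
#!/usr/bin/env python3
# Hypothetical llvm-reduce interestingness test: exit 0 only if opt hangs.
import subprocess
import sys

OPT = "opt"            # assumption: the opt under test is on PATH
TIMEOUT_SECONDS = 60   # assumption: longer than any healthy run takes


def hangs(cmd, timeout):
    """Return True if `cmd` is still running after `timeout` seconds."""
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here.
        return True
    return False


def main(argv):
    # llvm-reduce appends the candidate .ll file as the last argument.
    candidate = argv[-1]
    cmd = [OPT, "--amdgpu-attributor", candidate, "-o", "/dev/null"]
    # A run that finishes (successfully or not) is "uninteresting",
    # so the reduction cannot drift toward some unrelated crash.
    return 0 if hangs(cmd, TIMEOUT_SECONDS) else 1


# Only act when a file to test is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

After `chmod +x interesting.py`, it could be run as `llvm-reduce --test=./interesting.py input.ll`; the script name and `input.ll` are placeholders.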
I bisected with @arsenm and we found that @jdoerfert fixed the hang in bf789b1. Below I describe how I created this reproducer and bisected.

My LLVM installation
I am using ROCm to target MI250X on Linux. My code is written in C++ and uses HIP. I can build with ROCm 5.2.3, but when I try 5.3.0 my lld process hangs.

Making a reproducer
I attached gdb to the lld process. I cut off the stack trace at 1 million frames. Many were

Minimizing the reproducer
My reproducer was almost 400,000 lines long (
and ran
After about 1 hour, llvm-reduce terminated. It output a reduced.ll that was only 51 lines, an 8000x improvement! I ran

Running git bisect
Thus far I had only run tools in my ROCm install's
I created a build space (because each step of the bisection needs to build and run opt).
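Because this bisection looked for the commit that fixed the hang rather than the one that introduced it, one option is git's custom bisect terms (`--term-old`/`--term-new`), so a fixed commit never has to be labeled "bad". This is not necessarily what was run here. It is a hedged Python sketch of a `git bisect run` step script, and it assumes a hypothetical Ninja build directory named `build` with an `opt` target, the opt command from this issue, and an arbitrary 60-second hang threshold. Commits that fail to build are skipped with exit code 125.

```python
#!/usr/bin/env python3
"""Hypothetical `git bisect run` step for finding the commit that FIXED a hang.

Assumed setup (commit names are placeholders):
    git bisect start --term-old=broken --term-new=fixed
    git bisect broken <commit-where-opt-hangs>
    git bisect fixed <commit-where-opt-works>
    git bisect run ./bisect_step.py /abs/path/to/reduced.ll
"""
import subprocess
import sys

BUILD_DIR = "build"     # assumption: configured Ninja build of LLVM (prefer an absolute path)
TIMEOUT_SECONDS = 60    # assumption: hang threshold

# git bisect run: 0 -> old term, 125 -> skip, other 1..127 -> new term.
EXIT_BROKEN = 0         # old term: the hang still reproduces
EXIT_FIXED = 1          # new term: opt finished in time
EXIT_SKIP = 125         # commit cannot be judged (e.g. build failed)


def classify(build_ok, hung):
    """Map one step's outcome to a `git bisect run` exit code."""
    if not build_ok:
        return EXIT_SKIP
    return EXIT_BROKEN if hung else EXIT_FIXED


def hangs(cmd, timeout):
    """Return True if `cmd` is still running after `timeout` seconds."""
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        return True
    return False


def main(argv):
    reproducer = argv[1]
    try:
        build = subprocess.run(["ninja", "-C", BUILD_DIR, "opt"])
        if build.returncode != 0:
            return classify(False, False)
        cmd = [f"{BUILD_DIR}/bin/opt", "--amdgpu-attributor",
               reproducer, "-o", "/dev/null"]
        return classify(True, hangs(cmd, TIMEOUT_SECONDS))
    except OSError:
        # Missing ninja or opt binary: this commit cannot be judged.
        return EXIT_SKIP


# Only act when given a reproducer path, as in the docstring's usage line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Treating any completed run as "fixed" is a simplification: a commit where opt crashes quickly instead of hanging would also count as fixed, so a stricter version might also inspect the return code.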
I started the bisection with
where
and
I ran the job on a system with two 18-core Broadwell sockets and 128 GB of DRAM per node. The job finished after about 2 hours. I ran
Thus, bf789b1 fixed the hang.

Action items
@arsenm will ensure that bf789b1 makes it into the next ROCm release. (I tried to cherry-pick bf789b1 onto the rocm-5.3.x branch of RadeonOpenCompute's llvm-project fork, but I got a conflict.)
Pushed test case in bcedeef.
The following hangs for me:
opt --amdgpu-attributor hang.ll -o foo.bc
where hang.ll is