Build fails with ROCm on Gentoo Linux #81
Comments
There are other issues about compiling ROCm which you can investigate. Unfortunately, those issues are really coming from Bazel, so there may not be much we can do from this project.
@josevalim For now I don't have any ideas, but I can work on my setup if you have some. I saw that not many people use
Yeah, it's really Bazel and XLA. ROCm is definitely not as prioritized and widely used, so there seem to be more issues with getting the build environment right. I would try building the binary within Docker, see #63 (comment).
@jonatanklosko may I ask how you were able to build the binary in Docker? I am trying to reproduce it on a Linux machine using the provided Dockerfile and I get a ton of errors. I am able to solve some, but I reach a point where it seems I need to start modifying code in the libraries, not only in the environment.
@jalberto interesting, the build itself doesn't require an actual GPU, so the Docker build should be reproducible. What kind of errors are you getting?
@jonatanklosko I tried in a clean env, with a new clone of the repo. I also removed the 1st error, which is easy to solve:
After that fix, we are on the correct path: After a while:
Then I changed
@jalberto ah yeah, the first error is because I removed the file and forgot to update, I've just fixed it on main. The build error is very confusing, I was suspecting the base image may have changed, but it hasn't. I can't think of anything else that could've changed since I built using that image :<
@jalberto I've just run
@jonatanklosko that could be, but I am not mounting any device, so the container has no access to I will continue trying around, maybe it is my system, but the main reason to use containers to build is to isolate from the host, so it is very odd
@Eiji7 you can try the new release and use ROCm 6.0, see #82 (comment).
@jonatanklosko Oh, that's definitely interesting, however I would need to wait for
Yeah, it looks like latest XLA requires 6.0+, so I think this ship has sailed on this side. I don't think there's anything else we can do for 5.7, so I'm going to close this in favour of #82. Feel free to drop more comments if anything changes!
For what it's worth, IREE might be able to provide a way out. We're focusing on Metal support, but we just might get ROCm "for free"
Hi, I have Gentoo Linux with the latest updates. I was fighting with ROCm support and ended up with this package set:
with the following USE flags for gcc:
and these environment variables:
Regardless of what I should and can install, there are lots of weird problems:
- TF_ROCM_AMDGPU_TARGETS is set in code without a way to change it, and is set to "gfx900,gfx906,gfx908,gfx90a,gfx1030". Not only does this build support for many GPUs, which is rarely important, but I also need to edit the xla source code to support new cards (mine uses gfx1100).
- rocm_configure.bzl only in theory supports a ROCM_PATH which is not /opt/rocm or /opt/rocm-version. In practice it forces some paths to be within hip and roctracer sub-directories, which is not the case when ROCm packages are installed in /usr, for example /usr/lib64/libamdhip64.so. The file tries a few path variants, which is nice as long as it does not assume a sub-directory. I would not be surprised if every library had such a sub-directory, but this is about only 2 of 12 libs.
- xla does not specify a dependencies list. Reading all of those error messages and not ending up with a working setup is truly exhausting 😮‍💨
- The only known successful builds use old gcc versions, which is a serious problem on prod machines.
Meanwhile the emerge command returns:
Of course nobody expects support for the 14.0.1_pre* releases of GCC, but requiring a compiler 5 major versions back, excluding even the latest updates of the 9.x branch, is a critical issue for prod machines.
Anyway, I have tried GCC versions 8.5 and 13.2.1 with clang versions 16 and 17, but none of them compiled successfully.
Firstly, the logs before fixing rocm_configure.bzl:
After the mentioned fix:
Somehow it does not detect gcc properly. Surprisingly, by default its specific location is not in the PATH variable:
The final result is:
However, the header files already exist within the gcc installation: /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13. What surprised me is the large number of Loading: lines without any other information. In the last build attempt the number of such lines decreased to just 2. Maybe there are still 2 things that I have not found or installed?
So far I have been unmasking unsupported packages, compiling a few configurations of gcc and clang, and even editing source files. I'm a bit tired today and it would be a big relief if somebody could help me with this environment setup. Have I missed something? Are new AMD GPUs even supported? Or maybe there are other problems in the source files? Should I maybe try some unreleased branches?
Here is some information about my setup:
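To illustrate the first problem in the list above (the hard-coded TF_ROCM_AMDGPU_TARGETS): a minimal sketch, in Python rather than the Starlark used by rocm_configure.bzl, of the kind of override that would avoid editing the source. The function name `amdgpu_targets` and the idea of reading the value from the environment are assumptions made for illustration; this is not how xla currently behaves.

```python
import os

# Default list quoted in this issue; used only when no override is given.
DEFAULT_AMDGPU_TARGETS = "gfx900,gfx906,gfx908,gfx90a,gfx1030"


def amdgpu_targets(env=None):
    """Return the GPU targets to build for (hypothetical helper).

    Reads TF_ROCM_AMDGPU_TARGETS from the given mapping (os.environ by
    default) and falls back to the hard-coded default list.
    """
    env = os.environ if env is None else env
    raw = env.get("TF_ROCM_AMDGPU_TARGETS", DEFAULT_AMDGPU_TARGETS)
    # Split on commas and drop empty entries such as a trailing comma.
    return [target.strip() for target in raw.split(",") if target.strip()]
```

With something along these lines, a single-card build could be requested by exporting TF_ROCM_AMDGPU_TARGETS=gfx1100 before the build, instead of compiling for every target in the default list.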
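For the second problem in the list above (library lookup that assumes hip and roctracer sub-directories), the following Python sketch shows one way a lookup could try the flat layout first and the nested layout second. The helper name `find_rocm_library` and the candidate directory list are hypothetical; they are not taken from rocm_configure.bzl.

```python
from pathlib import Path

# Hypothetical search order: flat layouts (as with packages installed
# under /usr) first, then the nested hip/ layout.
CANDIDATE_DIRS = ("lib", "lib64", "hip/lib", "hip/lib64")


def find_rocm_library(rocm_path, lib_name, candidate_dirs=CANDIDATE_DIRS):
    """Return the first existing path to lib_name under rocm_path, or None."""
    root = Path(rocm_path)
    for sub in candidate_dirs:
        candidate = root / sub / lib_name
        if candidate.is_file():
            return candidate
    return None
```

Called as `find_rocm_library("/usr", "libamdhip64.so")`, this would check /usr/lib and /usr/lib64 before looking in any hip sub-directory, which matches the /usr/lib64/libamdhip64.so layout described above.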