xgboost: 1.7.4 -> 1.7.5 #224150

Merged: 1 commit into NixOS:master on Apr 21, 2023

Contributor

@nviets nviets commented Apr 1, 2023

Description of changes

WIP. Updating xgboost to the latest version, including a bump to cudaPackages_11_8. The C++ library builds with and without CUDA support, but the R library fails with the error below:

➜ nix-build -E "with (import $NIXPKGS{}); let xgb = xgboost.override{rLibrary = true; cudaSupport = true; cudaPackages = cudaPackages_11_8; doCheck = false;}; in rWrapper.override{ packages = [ xgb ]; }"
...
[ 98%] Built target objxgboost
[ 99%] Building CXX object CMakeFiles/runxgboost.dir/src/cli_main.cc.o
[ 99%] Linking CXX executable /build/source/xgboost
/nix/store/8qm6sjqa09a03glzmafprpp69k74l4lm-binutils-2.40/bin/ld: /nix/store/vl893j5kphwcnqyf3qrxcmmjc8zrfa5q-icu4c-72.1/lib/libicuuc.so.72: undefined reference to `std::condition_variable::wait(std::unique_lock<std::mutex>&)@GLIBCXX_3.4.30'
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/runxgboost.dir/build.make:325: /build/source/xgboost] Error 1
make[1]: *** [CMakeFiles/Makefile2:224: CMakeFiles/runxgboost.dir/all] Error 2
make: *** [Makefile:156: all] Error 2

@trivialfis - any suggestions on what I can try next?

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin


@ofborg ofborg bot requested a review from abbradar April 1, 2023 01:02
@ofborg ofborg bot added 11.by: package-maintainer This PR was created by the maintainer of the package it changes 10.rebuild-darwin: 1-10 10.rebuild-linux: 1-10 labels Apr 1, 2023
@bcdarwin
Member

bcdarwin commented Apr 1, 2023

The R CUDA issue might be related to #220341

@SomeoneSerge
Contributor

SomeoneSerge commented Apr 1, 2023

RE: possible relation to the libstdc++ mismatch issue

So far we've only observed runtime issues (a process loads gcc11's libstdc++, then dlopens a library that looks for symbols from gcc12's libstdc++). Your error message mentions ld, so this is a new variant, but let's see further.

Footnotes

  1. This script for gisting failed logs may come in handy: https://gist.github.com/ConnorBaker/b32a7f69d318e3f338b6b4fedeef37ef, e.g.

    (nixpkgs-review pr 224150 --extra-nixpkgs-config '{ cudaCapabilities = [ "8.6" ]; }' --post-result --eval local --no-shell |& tee XXXX.log) && ./nixpkgs_review_failure_helper.sh XXXX.log
    

    or

    (nixpkgs-review pr 224150 --extra-nixpkgs-config '{ cudaSupport = true; cudaCapabilities = [ "8.6" ]; }' --post-result --eval local --no-shell |& tee XXXX.log) && ./nixpkgs_review_failure_helper.sh XXXX.log
    

@nviets
Contributor Author

nviets commented Apr 5, 2023

Result of nixpkgs-review pr 224150 --extra-nixpkgs-config '{ cudaSupport = true; cudaCapabilities = [ "8.6" ]; }' run on x86_64-linux 1

5 packages failed to build:
  • python310Packages.xgboost
  • python310Packages.xgboost.dist
  • python311Packages.xgboost
  • python311Packages.xgboost.dist
  • xgboost (xgboostWithCuda)

@nviets
Contributor Author

nviets commented Apr 5, 2023

@SomeoneSerge - thanks for the suggestions. Not sure if I used nixpkgs-review correctly there. I tried updating from the master branch and using #223664, but now I'm having trouble building cudaPackages on my machine. The nixpkgs-review didn't seem to publish the error, but I'll try to share manually soon. xgboost 1.7.5 now requires 11.8, which I'm setting in this PR. I'll keep an eye on CUDA and give this another go soon.

@SomeoneSerge
Contributor

Re: cuda 11.8

CC #222778

@trivialfis

Hi, the CUDA requirement is updated to 11.8 only for compatibility with xgboost's binary wheel pipeline. If it's causing too much trouble, a lower version (like 11.5) can be used instead. We haven't backported any CUDA feature-related changes to the 1.7.5 release.

@nviets
Contributor Author

nviets commented Apr 13, 2023

Thanks @trivialfis - I'll try rolling back the CUDA version as you suggested. I was waiting to see if #220341 resolved the trouble.

Do you see anything wrong with the cmake flags I have set? I originally referenced your work in tests/ci_build/build_r_pkg_with_cuda.sh. Is that script building inside the CentOS 7 container with R 3.3.0?

The only combination that's failing in this PR is rLibrary + cudaSupport.
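
For anyone reproducing this, the failing combination can also be kept in a small expression file rather than an inline `-E` string. This is only a sketch: the file name `r-xgboost-cuda.nix` is made up, `<nixpkgs>` is assumed to resolve to a checkout of this PR, and `allowUnfree` is assumed to be needed for the CUDA packages. The override arguments are the ones from the command in the description.

```nix
# r-xgboost-cuda.nix (hypothetical file name)
# Build with: nix-build r-xgboost-cuda.nix
let
  # Assumption: <nixpkgs> points at a checkout that contains this PR.
  pkgs = import <nixpkgs> { config.allowUnfree = true; };

  # The one combination that fails here: R library + CUDA.
  xgb = pkgs.xgboost.override {
    rLibrary = true;
    cudaSupport = true;
    cudaPackages = pkgs.cudaPackages_11_8;
    doCheck = false; # same setting as in the original command
  };
in
# Wrap R so that the xgboost R package is available to it.
pkgs.rWrapper.override { packages = [ xgb ]; }
```

Swapping `cudaPackages_11_8` for another CUDA package set is then a one-line change when comparing versions.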

@nviets
Contributor Author

nviets commented Apr 14, 2023

I'm getting the same error with cudaPackages_11_7 now, so it looks like something is going wrong with an input or some sort of library linkage.

@Ericson2314, @artturi - xgboost with R/CUDA support started to fail a few weeks ago on the error below. I'm not familiar with icu4c and wondered if you could advise on what I might try to fix the build.

➜ nix-build -E "with (import $NIXPKGS{}); let xgb = xgboost.override{rLibrary = true; cudaSupport = true; cudaPackages = cudaPackages_11_7; doCheck = false;}; in rWrapper.override{ packages = [ xgb ]; }"
[1/1/4 built, 0.0 MiB DL] building r-xgboost-1.7.5 (buildPhase): [ 98%] Building CUDA object src/CMakeFiles/objxgboost.dir/tree/updater_gpu_hist.cu.o
error: builder for '/nix/store/a5nsvrhrgyc31j47iqwsykz6417zjyyb-r-xgboost-1.7.5.drv' failed with exit code 2;
       last 10 log lines:
       > [ 97%] Building CUDA object src/CMakeFiles/objxgboost.dir/tree/gpu_hist/row_partitioner.cu.o
       > [ 98%] Building CUDA object src/CMakeFiles/objxgboost.dir/tree/updater_gpu_hist.cu.o
       > [ 98%] Built target objxgboost
       > [ 99%] Building CXX object CMakeFiles/runxgboost.dir/src/cli_main.cc.o
       > [ 99%] Linking CXX executable /build/source/xgboost
       > /nix/store/8qm6sjqa09a03glzmafprpp69k74l4lm-binutils-2.40/bin/ld: /nix/store/vl893j5kphwcnqyf3qrxcmmjc8zrfa5q-icu4c-72.1/lib/libicuuc.so.72: undefined reference to `std::condition_variable::wait(std::unique_lock<std::mutex>&)@GLIBCXX_3.4.30'
       > collect2: error: ld returned 1 exit status
       > make[2]: *** [CMakeFiles/runxgboost.dir/build.make:325: /build/source/xgboost] Error 1
       > make[1]: *** [CMakeFiles/Makefile2:224: CMakeFiles/runxgboost.dir/all] Error 2
       > make: *** [Makefile:156: all] Error 2
       For full logs, run 'nix log /nix/store/a5nsvrhrgyc31j47iqwsykz6417zjyyb-r-xgboost-1.7.5.drv'.
error: 1 dependencies of derivation '/nix/store/1d1f73xci70d9zj2snnbmfcc8pfz700f-R-4.2.3-wrapper.drv' failed to build
error: 1 dependencies of derivation '/nix/store/jwj4n0w8a6qjnbxl8nvq6lj4pna0s2wm-run-r.drv' failed to build

Sorry I can't share a nixpkgs-review result; I didn't know how to override the default arguments to build the rLibrary configuration.

@SomeoneSerge
Contributor

/nix/store/8qm6sjqa09a03glzmafprpp69k74l4lm-binutils-2.40/bin/ld: /nix/store/vl893j5kphwcnqyf3qrxcmmjc8zrfa5q-icu4c-72.1/lib/libicuuc.so.72: undefined reference to `std::condition_variable::wait(std::unique_lock<std::mutex>&)@GLIBCXX_3.4.30'

It's a known error. We partially addressed this with #223664, but since we didn't properly extend wrapCCWith to support our use-case, there still are exceptions: #225661

Maybe you could try https://github.com/NixOS/nixpkgs/blob/93b1f3fb0cf6303faf15908e5057373aa118ff47/pkgs/development/libraries/science/math/faiss/default.nix#L39 meanwhile
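
A minimal sketch of what switching to the CUDA backend stdenv could look like in a package expression. This is an assumption about how the suggestion might be applied, not the contents of the linked faiss line or of this PR; the name `effectiveStdenv`, the file name `sketch.nix`, and the placeholder derivation are made up for illustration.

```nix
# sketch.nix (hypothetical), intended for:
#   pkgs.callPackage ./sketch.nix { cudaSupport = true; }
{ stdenv
, cudaPackages
, cudaSupport ? false
}:

let
  # Assumption: cudaPackages.backendStdenv provides the host compiler (and
  # therefore the libstdc++) that nvcc is paired with. Without CUDA, keep
  # the default stdenv.
  effectiveStdenv = if cudaSupport then cudaPackages.backendStdenv else stdenv;
in
effectiveStdenv.mkDerivation {
  pname = "backend-stdenv-sketch"; # placeholder package, not xgboost
  version = "0";
  dontUnpack = true;
  installPhase = ''
    mkdir -p $out
    # Record which compiler the chosen stdenv provides.
    $CC --version > $out/cc-version
  '';
}
```

Building it once with `cudaSupport = true` and once without, then comparing the two `cc-version` files, should show whether the two stdenvs actually use different compilers.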

@nviets
Contributor Author

nviets commented Apr 14, 2023

Thanks for the ideas, @SomeoneSerge. I didn't have any luck with backendStdenv, so I'll keep an eye on #226165. Hopefully, it will do the trick here.

@SomeoneSerge
Contributor

SomeoneSerge commented Apr 15, 2023

Odd, cause I think it helped me at least on master: SomeoneSerge@bf67283

❯ nix-build -E "with (import ./. {config.allowUnfree = true;}); let xgb = xgboost.override{rLibrary = true; cudaSupport = true; cudaPackages = cudaPackages_11_7; doCheck = false;}; in rWrapper.override{ packages = [ xgb ]; }"
/nix/store/iz07pw24ndhlyvsg48laq0yp6w5g1dag-R-4.2.3-wrapper

@nviets nviets marked this pull request as ready for review April 15, 2023 15:09
@nviets
Contributor Author

nviets commented Apr 15, 2023

Wow, thanks for all your tips @SomeoneSerge. The R library is finally building, and I tested against several versions of CUDA. The libstdc++ interactions with CUDA are something I hadn't seen before, and I really appreciate what you're doing with CUDA. I'll have another look at the build once #226165 is ready for a test.

This PR is ready for merge.

@SomeoneSerge
Contributor

Result of nixpkgs-review pr 224150 --extra-nixpkgs-config '{ cudaCapabilities = [ "8.6" ]; }' run on x86_64-linux 1

6 packages built:
  • python310Packages.xgboost
  • python310Packages.xgboost.dist
  • python311Packages.xgboost
  • python311Packages.xgboost.dist
  • xgboost
  • xgboostWithCuda

@SomeoneSerge SomeoneSerge added the 12.approvals: 1 This PR was reviewed and approved by one reputable person label Apr 16, 2023
@nviets
Contributor Author

nviets commented Apr 21, 2023

@Mindavi - could you help with a merge? Thank you!

@Mindavi Mindavi merged commit c8981de into NixOS:master Apr 21, 2023
Labels
  • 6.topic: cuda (Parallel computing platform and API)
  • 10.rebuild-darwin: 1-10
  • 10.rebuild-linux: 1-10
  • 11.by: package-maintainer (This PR was created by the maintainer of the package it changes)
  • 12.approvals: 1 (This PR was reviewed and approved by one reputable person)
Projects
Status: Done

5 participants