Fix UCX support for cuML/RAFT in wheels packages #57
I'll take a look at the initial steps here, since the UCX wheel build will probably be different enough from the other RAPIDS packages that the guidance I've given Michael so far may not apply, and since I have a pretty good idea of what it'll take to get it off the ground so that this particular wheel doesn't slip given the relative urgency for Peter's work. I'll try to get something minimal working ASAP so that we have confidence that we'll be able to move forward (otherwise we'll probably want to find another way to unblock Peter's work for 24.06, but I'd really like to avoid that since it's guaranteeing duplicate work). I'll coordinate a handoff with Michael soon too, though; he's getting far enough along with the corresponding libcudf work that a handoff shouldn't be hard.
I have UCX wheels building in rapidsai/ucx-wheels#1. They needed a couple of tricky tweaks, but they nominally seem to work now. I'm testing them out with ucxx in rapidsai/ucxx#226.
UCX wheels are now available, and the ucxx PR has been updated to use those wheels directly from the nightly index. CI is passing on the ucxx PR. We should be able to get that in soon. I've requested that someone from the build-infra team take a look at porting similar changes to ucx-py. If that is also urgent for 24.06, then it would probably be good to collaborate with Peter on that work to get it done as fast as possible.
@vyasr and I talked about this and I'm gonna do this one.
Contributes to rapidsai/build-planning#57.

`libucx.load_library()` defined here tries to pre-load `libcuda.so` and `libnvidia-ml.so`, to raise an informative error (instead of a cryptic one from a linker) if someone attempts to use the libraries from this wheel on a system without a GPU. Some of the projects using these wheels, like `ucxx` and `ucx-py`, are expected to be usable on systems without a GPU. See rapidsai/ucx-py#1041 (comment).

To avoid those libraries needing to try-catch these errors, this proposes the following:

* removing those checks and deferring to downstream libraries to handle the non-GPU case
* modifying the build logic so we can publish patched versions of these wheels, like `v1.15.0.post1`

### Notes for Reviewers

Proposing starting with `1.15.0.post1` right away, since that's the version that `ucx-py` will use. I'm proposing the following sequence of PRs here (assuming downstream testing goes well):

1. this one
2. another changing the version to `1.14.0.post1`
3. another changing the version to `1.16.0.post1`
Contributes to rapidsai/build-planning#57.

Similar to rapidsai/ucxx#226, proposes using the new UCX wheels from https://github.com/rapidsai/ucx-wheels, instead of vendoring system versions of `libuc{m,p,s,t}.so`.

## Benefits of these changes

Allows users of `ucx-py` to avoid needing system installations of the UCX libraries.

Shrinks the `ucx-py` wheels by 6.7MB compressed (77%) and 19.1MB uncompressed (73%).

<details><summary>how I calculated that (click me)</summary>

Mounting in a directory with a wheel built from this branch...

```shell
docker run \
  --rm \
  -v $(pwd)/final_dist:/opt/work \
  -it python:3.10 \
  bash

pip install pydistcheck
pydistcheck --inspect /opt/work/*.whl
```

```text
----- package inspection summary -----
file size
  * compressed size: 2.0M
  * uncompressed size: 7.0M
  * compression space saving: 71.3%
contents
  * directories: 10
  * files: 38 (2 compiled)
size by extension
  * .so - 6.9M (97.7%)
  * .py - 0.1M (2.0%)
  * .pyx - 9.3K (0.1%)
  * no-extension - 7.1K (0.1%)
  * .pyi - 3.9K (0.1%)
  * .c - 1.7K (0.0%)
  * .txt - 39.0B (0.0%)
largest files
  * (5.3M) ucp/_libs/ucx_api.cpython-310-x86_64-linux-gnu.so
  * (1.6M) ucp/_libs/arr.cpython-310-x86_64-linux-gnu.so
  * (36.3K) ucp/core.py
  * (20.3K) ucp/benchmarks/cudf_merge.py
  * (12.1K) ucp/benchmarks/send_recv.py
```

Compared to a recent nightly release.

```shell
pip download \
  -d /tmp/delete-me \
  --prefer-binary \
  --extra-index-url=https://pypi.anaconda.org/rapidsai-wheels-nightly/simple \
  'ucx-py-cu12>=0.38.0a'

pydistcheck --inspect /tmp/delete-me/*.whl
```

```text
----- package inspection summary -----
file size
  * compressed size: 8.7M
  * uncompressed size: 26.1M
  * compression space saving: 66.8%
contents
  * directories: 11
  * files: 65 (21 compiled)
size by extension
  * .0 - 14.4M (55.4%)
  * .so - 8.4M (32.2%)
  * .a - 1.8M (6.7%)
  * .140 - 0.7M (2.5%)
  * .12 - 0.7M (2.5%)
  * .py - 0.1M (0.5%)
  * .pyx - 9.3K (0.0%)
  * no-extension - 7.3K (0.0%)
  * .la - 4.2K (0.0%)
  * .pyi - 3.9K (0.0%)
  * .c - 1.7K (0.0%)
  * .txt - 39.0B (0.0%)
largest files
  * (8.7M) ucx_py_cu12.libs/libucp-5720f0c9.so.0.0.0
  * (5.3M) ucp/_libs/ucx_api.cpython-310-x86_64-linux-gnu.so
  * (2.0M) ucx_py_cu12.libs/libucs-3c3009f0.so.0.0.0
  * (1.6M) ucp/_libs/arr.cpython-310-x86_64-linux-gnu.so
  * (1.5M) ucx_py_cu12.libs/libuct-2a15b69b.so.0.0.0
```

</details>

## Notes for Reviewers

Left some comments on the diff describing specific design choices.

### The libraries from the `libucx` wheel are only used if a system installation isn't available

Built a wheel in a container using the same image used here in CI.

```shell
docker run \
  --rm \
  --gpus 1 \
  --env-file "${HOME}/.aws/creds.env" \
  --env CI=true \
  -v $(pwd):/opt/work \
  -w /opt/work \
  -it rapidsai/ci-wheel:cuda12.2.2-rockylinux8-py3.10 \
  bash ci/build_wheel.sh
```

Found that the libraries from the `libucx` wheel are correctly found at build time, and are later found at import time.

<details><summary>using 'rapidsai/citestwheel' image and LD_DEBUG (click me)</summary>

```shell
# run a RAPIDS wheel-testing container, mount in the directory with the built wheel
docker run \
  --rm \
  --gpus 1 \
  -v $(pwd)/final_dist:/opt/work \
  -w /opt/work \
  -it rapidsai/citestwheel:cuda12.2.2-ubuntu22.04-py3.10 \
  bash
```

`rapidsai/citestwheel` does NOT have the UCX libraries installed at `/usr/lib*`.
```shell
find /usr -name 'libucm.so*'
# (empty)
```

Installed the `ucx-py` wheel.

```shell
# install the wheel
pip install ./*.whl

# now libuc{m,p,s,t} are found in site-packages
find /usr -name 'libucm.so*'
# (empty)
find /pyenv -name 'libucm.so*'
# /pyenv/versions/3.10.14/lib/python3.10/site-packages/libucx/lib/libucm.so.0.0.0
# /pyenv/versions/3.10.14/lib/python3.10/site-packages/libucx/lib/libucm.so.0
# /pyenv/versions/3.10.14/lib/python3.10/site-packages/libucx/lib/libucm.so

# try importing ucx-py and track where 'ld' finds the ucx libraries
LD_DEBUG="files,libs" LD_DEBUG_OUTPUT=out.txt \
  python -c "from ucp._libs import arr"

# 'ld' creates multiple files... combine them to 1 for easier searching
cat out.txt.* > out-full.txt
```

In that output, saw that `ld` was finding `libucs.so` first. It searched all the system paths before finally finding it in the `libucx` wheel.

```text
1037: file=libucs.so [0]; dynamically loaded by /pyenv/versions/3.10.14/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so [0]
1037: find library=libucs.so [0]; searching
1037:  search path=  (LD_LIBRARY_PATH)
1037:  search path=/pyenv/versions/3.10.14/lib  (RUNPATH from file /pyenv/versions/3.10.14/bin/python)
1037:   trying file=/pyenv/versions/3.10.14/lib/libucs.so
1037:  search cache=/etc/ld.so.cache
1037:  search path=/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v3:/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2:/lib/x86_64-linux-gnu/tls/haswell/x86_64:/lib/x86_64-linux-gnu/tls/haswell:/lib/x86_64-linux-gnu/tls/x86_64:/lib/x86_64-linux-gnu/tls:/lib/x86_64-linux-gnu/haswell/x86_64:/lib/x86_64-linux-gnu/haswell:/lib/x86_64-linux-gnu/x86_64:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v3:/usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2:/usr/lib/x86_64-linux-gnu/tls/haswell/x86_64:/usr/lib/x86_64-linux-gnu/tls/haswell:/usr/lib/x86_64-linux-gnu/tls/x86_64:/usr/lib/x86_64-linux-gnu/tls:/usr/lib/x86_64-linux-gnu/haswell/x86_64:/usr/lib/x86_64-linux-gnu/haswell:/usr/lib/x86_64-linux-gnu/x86_64:/usr/lib/x86_64-linux-gnu:/lib/glibc-hwcaps/x86-64-v3:/lib/glibc-hwcaps/x86-64-v2:/lib/tls/haswell/x86_64:/lib/tls/haswell:/lib/tls/x86_64:/lib/tls:/lib/haswell/x86_64:/lib/haswell:/lib/x86_64:/lib:/usr/lib/glibc-hwcaps/x86-64-v3:/usr/lib/glibc-hwcaps/x86-64-v2:/usr/lib/tls/haswell/x86_64:/usr/lib/tls/haswell:/usr/lib/tls/x86_64:/usr/lib/tls:/usr/lib/haswell/x86_64:/usr/lib/haswell:/usr/lib/x86_64:/usr/lib  (system search path)
1037:   trying file=/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v3/libucs.so
1037:   trying file=/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2/libucs.so
1037:   trying file=/lib/x86_64-linux-gnu/tls/haswell/x86_64/libucs.so
1037:   trying file=/lib/x86_64-linux-gnu/tls/haswell/libucs.so
1037:   trying file=/lib/x86_64-linux-gnu/tls/x86_64/libucs.so
1037:   trying file=/lib/x86_64-linux-gnu/tls/libucs.so
1037:   trying file=/lib/x86_64-linux-gnu/haswell/x86_64/libucs.so
1037:   trying file=/lib/x86_64-linux-gnu/haswell/libucs.so
1037:   trying file=/lib/x86_64-linux-gnu/x86_64/libucs.so
1037:   trying file=/lib/x86_64-linux-gnu/libucs.so
1037:   trying file=/usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v3/libucs.so
1037:   trying file=/usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2/libucs.so
1037:   trying file=/usr/lib/x86_64-linux-gnu/tls/haswell/x86_64/libucs.so
1037:   trying file=/usr/lib/x86_64-linux-gnu/tls/haswell/libucs.so
1037:   trying file=/usr/lib/x86_64-linux-gnu/tls/x86_64/libucs.so
1037:   trying file=/usr/lib/x86_64-linux-gnu/tls/libucs.so
1037:   trying file=/usr/lib/x86_64-linux-gnu/haswell/x86_64/libucs.so
1037:   trying file=/usr/lib/x86_64-linux-gnu/haswell/libucs.so
1037:   trying file=/usr/lib/x86_64-linux-gnu/x86_64/libucs.so
1037:   trying file=/usr/lib/x86_64-linux-gnu/libucs.so
1037:   trying file=/lib/glibc-hwcaps/x86-64-v3/libucs.so
1037:   trying file=/lib/glibc-hwcaps/x86-64-v2/libucs.so
1037:   trying file=/lib/tls/haswell/x86_64/libucs.so
1037:   trying file=/lib/tls/haswell/libucs.so
1037:   trying file=/lib/tls/x86_64/libucs.so
1037:   trying file=/lib/tls/libucs.so
1037:   trying file=/lib/haswell/x86_64/libucs.so
1037:   trying file=/lib/haswell/libucs.so
1037:   trying file=/lib/x86_64/libucs.so
1037:   trying file=/lib/libucs.so
1037:   trying file=/usr/lib/glibc-hwcaps/x86-64-v3/libucs.so
1037:   trying file=/usr/lib/glibc-hwcaps/x86-64-v2/libucs.so
1037:   trying file=/usr/lib/tls/haswell/x86_64/libucs.so
1037:   trying file=/usr/lib/tls/haswell/libucs.so
1037:   trying file=/usr/lib/tls/x86_64/libucs.so
1037:   trying file=/usr/lib/tls/libucs.so
1037:   trying file=/usr/lib/haswell/x86_64/libucs.so
1037:   trying file=/usr/lib/haswell/libucs.so
1037:   trying file=/usr/lib/x86_64/libucs.so
1037:   trying file=/usr/lib/libucs.so
1037:
1037: file=/pyenv/versions/3.10.14/lib/python3.10/site-packages/libucx/lib/libucs.so [0]; dynamically loaded by /pyenv/versions/3.10.14/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so [0]
1037: file=/pyenv/versions/3.10.14/lib/python3.10/site-packages/libucx/lib/libucs.so [0]; generating link map
1037:   dynamic: 0x00007f4ce42d7c80  base: 0x00007f4ce427e000   size: 0x000000000006fda0
1037:     entry: 0x00007f4ce4290ce0  phdr: 0x00007f4ce427e040  phnum: 1
```

Then the others were found via the RPATH entries on `libucs.so`.
`libucm.so.0`:

```text
196: file=libucm.so.0 [0]; needed by /pyenv/versions/3.10.14/lib/python3.10/site-packages/libucx/lib/libucs.so [0]
196: find library=libucm.so.0 [0]; searching
196:  search path=...redacted...:/pyenv/versions/3.10.14/lib/python3.10/site-packages/libucx/lib  (RPATH from file /pyenv/versions/3.10.14/lib/python3.10/site-packages/libucx/lib/libucs.so)
...
```

</details>

However, the libraries from the `libucx` wheel appear to be **the last place `ld` searches**. That means that if you use these wheels on a system with a system installation of `libuc{m,p,s,t}`, that system installation's libraries will be loaded instead.

<details><summary>using 'rapidsai/ci-wheel' image and LD_DEBUG (click me)</summary>

```shell
docker run \
  --rm \
  --gpus 1 \
  -v $(pwd)/final_dist:/opt/work \
  -w /opt/work \
  -it rapidsai/ci-wheel:cuda12.2.2-rockylinux8-py3.10 \
  bash
```

`rapidsai/ci-wheel` has the UCX libraries installed at `/usr/lib64`.

```shell
find /usr/ -name 'libucm.so*'
# /usr/lib64/libucm.so.0.0.0
# /usr/lib64/libucm.so.0
# /usr/lib64/libucm.so
```

Installed a wheel and tried to import from it.

```shell
pip install ./*.whl

LD_DEBUG="files,libs" LD_DEBUG_OUTPUT=out.txt \
  python -c "from ucp._libs import arr"

cat out.txt.* > out-full.txt
```

In that situation, I saw the system libraries found before the one from the wheel.

```text
226: file=libucs.so [0]; dynamically loaded by /pyenv/versions/3.10.14/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so [0]
226: find library=libucs.so [0]; searching
226:  search path=/pyenv/versions/3.10.14/lib  (RPATH from file /pyenv/versions/3.10.14/bin/python)
226:   trying file=/pyenv/versions/3.10.14/lib/libucs.so
226:  search path=/pyenv/versions/3.10.14/lib  (RPATH from file /pyenv/versions/3.10.14/bin/python)
226:   trying file=/pyenv/versions/3.10.14/lib/libucs.so
226:  search path=/opt/rh/gcc-toolset-11/root/usr/lib64/tls:/opt/rh/gcc-toolset-11/root/usr/lib64:/opt/rh/gcc-toolset-11/root/usr/lib  (LD_LIBRARY_PATH)
226:   trying file=/opt/rh/gcc-toolset-11/root/usr/lib64/tls/libucs.so
226:   trying file=/opt/rh/gcc-toolset-11/root/usr/lib64/libucs.so
226:   trying file=/opt/rh/gcc-toolset-11/root/usr/lib/libucs.so
226:  search cache=/etc/ld.so.cache
226:   trying file=/usr/lib64/libucs.so
```

In this case, when the system libraries are available, `site-packages/libucx/lib` isn't even searched.

</details>

To avoid any RAPIDS-specific stuff tricking me, I tried in a generic `python:3.10` image. Found that the library could be loaded and all the `libuc{m,p,s,t}` libraries from the `libucx` wheel are found 🎉.
<details><summary>using 'python:3.10' image (click me)</summary>

```shell
docker run \
  --rm \
  --gpus 1 \
  -v $(pwd)/final_dist:/opt/work \
  -w /opt/work \
  -it python:3.10 \
  bash

pip install \
  --extra-index-url=https://pypi.anaconda.org/rapidsai-wheels-nightly/simple \
  ./*.whl

LD_DEBUG="files,libs" LD_DEBUG_OUTPUT=out.txt \
  python -c "from ucp._libs import arr"
```

💥

```text
16: opening file=/usr/local/lib/python3.10/site-packages/libucx/lib/libucm.so.0 [0]; direct_opencount=1
16:
16: opening file=/usr/local/lib/python3.10/site-packages/libucx/lib/libucs.so [0]; direct_opencount=1
```

</details>

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Peter Andreas Entschev (https://github.com/pentschev)
- Vyas Ramasubramani (https://github.com/vyasr)
- Ray Douglass (https://github.com/raydouglass)

URL: #1041
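Because the wheel's `libucx/lib` directory loses the bare-name search whenever a system copy exists, a downstream package can sidestep the search order entirely by `dlopen()`-ing the wheel's libraries by absolute path before anything asks for them by name. A minimal sketch (the helper name `load_wheel_ucx` and the directory argument are hypothetical, not the actual `libucx.load_library()` code):

```python
import ctypes
import os


def load_wheel_ucx(libdir):
    # Load the wheel-provided UCX libraries by absolute path, in
    # dependency order, so the dynamic loader never falls back to a
    # bare-name search that a system /usr/lib64 copy could win.
    handles = []
    for name in ("libucm.so", "libucs.so", "libuct.so", "libucp.so"):
        path = os.path.join(libdir, name)
        if os.path.exists(path):
            # RTLD_GLOBAL keeps the symbols visible to libraries
            # loaded later (e.g. ucx-py's extension modules)
            handles.append(ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL))
    return handles
```

Once a library is loaded this way, later `dlopen("libucs.so")` calls resolve to the already-loaded copy, which matches the `direct_opencount=1` lines seen in the `python:3.10` test above.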
Contributes to rapidsai/build-planning#57. Follow-up to #226.

Proposes the following changes for wheel builds:

* removing system-installed UCX *headers*
* making the code to remove system-installed UCX libraries a bit more specific - *(to minimize the risk of accidentally deleting some non-UCX thing whose name matches the pattern `libuc*`)*

## Notes for Reviewers

Before applying similar changes to `ucx-py`, I noticed it being compiled with the system-installed headers but then linking against the libraries provided by the `libucx` wheels: rapidsai/ucx-py#1041 (comment). This change should reduce the risk of that happening.

### How I tested this

Poked around the filesystem that `build_wheel.sh` runs in by pulling one of our standard wheel-building container images used in CI.

```shell
docker run \
  --rm \
  -v $(pwd):/opt/work \
  -w /opt/work \
  -it rapidsai/ci-wheel:cuda12.2.2-rockylinux8-py3.10 \
  bash

find /usr -type f -name 'libucm*'
# /usr/lib64/libucm.la
# /usr/lib64/libucm.a
# /usr/lib64/libucm.so.0.0.0
# /usr/lib64/ucx/libucm_cuda.a
# /usr/lib64/ucx/libucm_cuda.la
# /usr/lib64/ucx/libucm_cuda.so.0.0.0

find /usr -type d -name 'uct'
# /usr/include/uct
```

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Ray Douglass (https://github.com/raydouglass)

URL: #230
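The risk of an overly broad deletion pattern can be shown with a small shell sketch. The file names are made up; `libuchardet` stands in for an unrelated library that happens to share the `libuc` prefix, and `libuc[mpst]*` is a POSIX-glob equivalent of the more specific `libuc{m,p,s,t}*` pattern:

```shell
# scratch directory with fake UCX libraries plus one unrelated
# library (libuchardet) that shares the 'libuc' prefix
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch libucm.so.0 libucp.so.0 libucs.so.0 libuct.so.0 libuchardet.so.0

# too broad: also matches the non-UCX library
ls libuc*

# more specific: matches only the real UCX libraries
ls libuc[mpst]*

cd / && rm -rf "$demo_dir"
```

A build script deleting with the broad pattern would remove `libuchardet.so.0` too, which is exactly the accident the more specific pattern avoids.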
PR re-enabling the raft-dask wheel tests: rapidsai/raft#2307
Correct me if I'm wrong @pentschev, but I think all that remains for this issue is rapidsai/cuml#5697, right?
Presumably we want something similar for cugraph, but otherwise yes I think that's right. |
@pentschev, since the testing portion of this issue remains and I believe you are working on it, I am adding you as a co-assignee alongside James.
@pentschev do you plan to put together a cugraph PR for testing ucx-py/ucxx? Once that's done I believe this issue can be closed.
I think that would be good, but I definitely won't have the bandwidth for the next month or so.
That's fine. I'm going to close this issue because the underlying ask (fixing UCX support) has been addressed. We now have working wheels for both ucxx and ucx-py, and the cuml testing provides us a good indication of that. The cugraph testing is a nice-to-have, but at this point I'd say it tells us more about cugraph's usage of UCX than about the correctness of our ucx* wheels. Feel free to add the cugraph testing whenever you have time.
Thanks @vyasr and @jameslamb for all the work to make UCX wheels work (again?)! 😄
Thanks @pentschev for your patience and for all the help!
For rapidsai/build-planning#57, #1041 switched `ucx-py` over to `libucx` wheels. To test that that was working, it added some code to the build scripts to remove system installations of UCX libraries. That should no longer be necessary as of rapidsai/ci-imgs#154. This proposes removing that code for managing system installations of UCX libraries, to simplify those build scripts a bit.

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Kyle Edwards (https://github.com/KyleFromNVIDIA)
- Peter Andreas Entschev (https://github.com/pentschev)

URL: #1053
For rapidsai/build-planning#57, #226 switched `ucxx` over to `libucx` wheels. To test that that was working, it added some code to the build scripts to remove system installations of UCX libraries. That should no longer be necessary as of rapidsai/ci-imgs#154. This proposes removing that code for managing system installations of UCX libraries, to simplify those build scripts a bit.

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Peter Andreas Entschev (https://github.com/pentschev)
- Mike Sarahan (https://github.com/msarahan)

URL: #241
In the context of adding support for UCXX in RAFT and cuML, we've found that UCX wheel support in those libraries is broken. So far there are no UCX tests running in CI, and probably few people running wheels also run UCX, so this problem has gone unnoticed, possibly for months or even since wheel packages have existed.
I've attempted to enable UCX-Py tests in rapidsai/cuml#5843, but they segfault. That PR is a simplification of rapidsai/cuml#5697, where both UCXX and UCX-Py are being added; both segfault similarly. It also depends on rapidsai/raft#1983, but before merging the RAFT PR we want to make sure that cuML runs appropriately with UCXX, which requires UCX support to be fixed first.
Earlier this week I discussed this issue with @vyasr, and ideally we would have it fixed by 24.06, since it blocks UCXX from being tested across RAPIDS and thus delays its schedule once more. One way to resolve this would be making UCX a proper wheel package that would then be dynamically linked to. A repo for that has been created at https://github.com/rapidsai/ucx-wheels, and the latest news I had was that @msarahan would be working on that sometime soon.
@mmccarty as per your request. 🙂