
add build wheel script and accompanying version info #167

Merged
merged 86 commits into rapidsai:branch-0.37 on Jan 31, 2024

Conversation

@msarahan (Contributor) commented Jan 17, 2024

This PR adds wheel builds for ucxx. A follow-up PR will add wheel builds for distributed-ucxx.

Closes #145

@msarahan requested review from a team as code owners, January 17, 2024 19:18
@msarahan requested a review from a team as a code owner, January 17, 2024 19:57
python/CMakeLists.txt (resolved)
cpp/CMakeLists.txt (resolved)
ci/build_wheel.sh (resolved, outdated)
ci/build_wheel.sh (resolved, outdated)
@pentschev (Member) commented:
@vyasr do we have a guideline on what we normally test/should test for wheel builds? Are other projects running the full test set, or only a subset of everything that runs for conda? For now I've only added a subset of the more common uses as a reference.

Comment on lines +22 to +26
# TODO: We need distributed installed in developer mode to provide test utils,
# we still need to match to the `rapids-dask-dependency` version.
rapids-logger "Install Distributed in developer mode"
git clone https://github.com/dask/distributed /tmp/distributed
python -m pip install -e /tmp/distributed
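
For context, one way the TODO above about matching the `rapids-dask-dependency` version could be handled is sketched below. This is not what the PR does; it assumes Distributed's release tags match its reported __version__, and that a Distributed wheel is already installed so the version can be queried.

# Sketch only: pin the source checkout to the Distributed version already
# installed (e.g. pulled in via rapids-dask-dependency). Assumes release tags
# match distributed.__version__, which may not hold exactly.
DISTRIBUTED_VERSION=$(python -c "import distributed; print(distributed.__version__)")
git clone --branch "${DISTRIBUTED_VERSION}" --depth 1 https://github.com/dask/distributed /tmp/distributed
python -m pip install -e /tmp/distributed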
Member:
@vyasr we don't need to address this immediately, but do you think it would work if we had optional dependencies for tests in https://github.com/rapidsai/rapids-dask-dependency/blob/ac821e6a3e396340f65fe79dc834f5b711d3b0cb/pip/rapids-dask-dependency/pyproject.toml#L14-L17, where we essentially pip install -e the packages? I don't know if that's possible -- and I think it isn't -- but I would like to know if you have any ideas on how this could be done. We would also need something similar for conda packages; I currently pip install -e in conda CI as well, which isn't a good long-term solution.
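
For illustration, if such a test extra existed (it is hypothetical; rapids-dask-dependency does not ship one), CI would consume it roughly as below. An extra can only list ordinary requirements, so it could not by itself request an editable install of Distributed:

# Hypothetical: install a "test" extra of the metapackage (no such extra exists today).
python -m pip install "rapids-dask-dependency[test]"
# Extras cannot express an editable (-e) install, so Distributed's unshipped
# test modules would still require a source checkout as in the snippet above.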

@bdice (Contributor) commented Jan 29, 2024:
I had a question about this as well. Let's discuss in this thread.

@pentschev Can you explain a bit more about this requirement, and how you would resolve the TODO here? It looks like this is needed to run the distributed-ucxx tests but isn't available in the distributed package? Having a dependency on installing distributed from source (as opposed to something available in the wheel) might make it hard to reproduce CI failures in a local development environment.

Member:
Also adding here what I wrote in the other thread:

Yes, we need some testing dependencies from distributed, specifically

from distributed.comm.tests.test_comms import check_deserialize

I don't know of a good solution for this problem either, though, and I very much don't want to copy-paste the code in here because that will eventually cause us to get out-of-sync with Distributed code and essentially mean we're not testing it properly anymore.
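
As a quick diagnostic (not part of the PR), one can check whether that helper is importable in a given environment:

# Succeeds only when Distributed's test modules are present, e.g. after the
# editable install from a source checkout shown in the snippet under review.
python -c "from distributed.comm.tests.test_comms import check_deserialize" \
    && echo "Distributed test utils importable" \
    || echo "Distributed test utils not importable"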

Contributor:

Do you think dask devs would be open to moving some of the functionality of distributed.comm.tests.test_comms into a utility module that is actually shipped with the package? It sounds like everything that we're doing here is to work around the fact that we're trying to use functionality that they explicitly and intentionally do not ship. Before we look into strange workarounds like this one, is there an alternative that involves them actually shipping the necessary APIs?

Member:

You're right that they intentionally don't ship those utilities because they are only used in that particular test file. I very much wanted to avoid having to modify things there: the file is reasonably large, many functions would need to move, we would have to bear the burden of properly documenting/annotating functions whose internals I'm not even familiar with, and eventually we would also bear some of the maintenance burden, which we currently don't have the bandwidth for. If we can't find a reasonably simple solution to install Distributed in dev mode, I think I'd prefer to limit or skip that test for now.

Contributor:

I'd rather limit or skip. This logic:

I very much don't want to copy-paste the code in here because that will eventually cause us to get out-of-sync with Distributed code and essentially mean we're not testing it properly anymore.

feels directly contradictory to

we would have to bear the burden of properly documenting/annotating functions whose internals I'm not even familiar with, and eventually we would also bear some of the maintenance burden, which we currently don't have the bandwidth for

If we don't want to bear the maintenance burden, then I'd much rather not take on a partial maintenance burden via a hacky installation of distributed here that forces us to stay partially in sync.

Member:

I don't see the contradiction, as the statements are talking about two different aspects:

  1. Copy-pasting those functions into distributed-ucxx means they will eventually become out-of-sync with the actual implementation in Distributed, which is the source of truth;
  2. The maintenance burden I'm referring to is the maintenance burden of those functions in Distributed: we would then have "some authorship" because we're requesting they live as part of the distributed package, and we would have to document and properly type-annotate them, etc.

In other words, we still want to use them here for testing purposes only, but I'd very much like to avoid touching Distributed's code if we don't absolutely need to.

Contributor:

Right, but right now we implicitly incur part of the maintenance burden of 2 because if dask decides to change the behavior of that function in some way we'll see it in our test. We're not maintaining the function, but we are effectively signing up to keep track of an implementation detail of dask's tests. Conversely, if it was public there would be a higher maintenance burden (because we'd have to maintain docs/annotations/etc as you say) but we would also be less likely to be broken without warning (of course, given dask's low stability it could still happen, but it reduces the odds). The current approach also makes it harder for devs to understand why the tests fail if they don't know that they need to download the source of distributed to access this functionality.

In any case, I don't think there is a good solution for installing the editable version from source. Every solution I can think of is equally hacky, so for now I think we can leave this as is.

Member:

if dask decides to change the behavior of that function in some way we'll see it in our test.

Yes, we definitely want to see any of that. If the behavior changes, there's a high chance we'll need to make adjustments on our end, which we must catch ASAP.

We're not maintaining the function, but we are effectively signing up to keep track of an implementation detail of dask's tests. [...] given dask's low stability it could still happen, but it reduces the odds [...]

I agree with you here, but for now this is a price I'm willing to pay for an extra 1% of confidence that we're less likely to be broken with little or no warning by something we're skipping tests for.

The current approach also makes it harder for devs to understand why the tests fail if they don't know that they need to download the source of distributed to access this functionality.

I absolutely agree with you, and it's annoying. What if I split that one test (and perhaps others in the future) into a tests_internals directory, ran everything else with "vanilla" Distributed, and had essentially the following:

pytest python/distributed-ucxx/distributed_ucxx/tests/
pip install -e ...
pytest python/distributed-ucxx/distributed_ucxx/tests_internals/

Does that sound like a reasonable tradeoff to you? I just want to test as much as possible and sometimes that means we need to test bits that are not part of Distributed's public API, like this.

In any case, I don't think there is a good solution for installing the editable version from source. Every solution I can think of is equally hacky, so for now I think we can leave this as is.

Agreed, hopefully what I'm proposing above is a reasonable tradeoff for now.

@vyasr (Contributor) commented Jan 31, 2024:

I think the split you proposed for getting extra coverage in CI makes sense. Let's make that change in a follow-up PR where you apply it to both conda and pip testing.
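
For reference, a sketch of what that follow-up split could look like in the wheel test script, reusing the clone/install commands from the snippet under review; the tests_internals directory is only the proposal above, not something that exists yet.

# 1) Run the public-API tests against the released Distributed wheel.
python -m pytest python/distributed-ucxx/distributed_ucxx/tests/

# 2) Only then install Distributed in developer mode to expose its test utilities.
git clone https://github.com/dask/distributed /tmp/distributed
python -m pip install -e /tmp/distributed

# 3) Run the tests that rely on Distributed internals.
python -m pytest python/distributed-ucxx/distributed_ucxx/tests_internals/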

@bdice (Contributor) left a comment:

Just a few comments to address, which need some input from others.

ci/test_wheel_distributed_ucxx.sh (resolved)
ci/test_wheel_distributed_ucxx.sh (resolved, outdated)
ci/test_wheel_ucxx.sh (resolved)
ci/test_wheel_distributed_ucxx.sh (resolved, outdated)
ci/test_wheel_ucxx.sh (resolved, outdated)
@msarahan (Contributor, Author) left a comment:

I'm amazed and horrified by the library manipulation that has to be done, but needs must. I'm glad that the three of you are so knowledgeable. I can't approve this, but I don't have anything left to add that would improve this.

@vyasr (Contributor) commented Jan 30, 2024:

I'm amazed and horrified by the library manipulation that has to be done, but needs must. I'm glad that the three of you are so knowledgeable. I can't approve this, but I don't have anything left to add that would improve this.

Yeah, it's pretty horrific. At least this time around I already knew what was needed; the first time was an adventure figuring out why the cuml dask tests were segfaulting 😅 Especially since we don't have MG CI for wheel tests, it was a bit of a last-minute discovery for wheels.

The full history is in this ucx-py PR. IIRC it took a couple of long live-debugging sessions with me and Ben Z to get this working.

@vyasr (Contributor) left a comment:

This is very close now; we should be able to merge soon.

.github/workflows/build.yaml (resolved, outdated)
.github/workflows/pr.yaml (resolved, outdated)
ci/build_wheel.sh (resolved, outdated)
ci/build_wheel_distributed_ucxx.sh (resolved, outdated)
ci/build_wheel_ucxx.sh (resolved, outdated)
ci/wheel_smoke_test_distributed_ucxx.py (resolved, outdated)
ci/wheel_smoke_test_distributed_ucxx.py (resolved, outdated)
ci/wheel_smoke_test_distributed_ucxx.py (resolved)
ci/wheel_smoke_test_ucxx.py (resolved, outdated)
cpp/CMakeLists.txt (resolved, outdated)
@pentschev (Member) left a comment:

I think I've addressed all your comments, @vyasr; please have another look.

ci/build_wheel.sh (resolved, outdated)

ci/wheel_smoke_test_distributed_ucxx.py (resolved, outdated)
ci/test_common.sh (resolved, outdated)
ci/wheel_smoke_test_distributed_ucxx.py (resolved)
ci/wheel_smoke_test_distributed_ucxx.py (resolved, outdated)
ci/wheel_smoke_test_ucxx.py (resolved, outdated)
cpp/CMakeLists.txt (resolved, outdated)
@vyasr (Contributor) left a comment:

A couple of minor leftover items, then we should be good to go.

ci/build_wheel_distributed_ucxx.sh (resolved, outdated)
ci/build_wheel_ucxx.sh (resolved, outdated)

@bdice (Contributor) left a comment:

LGTM! Thanks for all the teamwork on this!

@bdice removed the DO NOT MERGE label ("Hold off on merging; see PR for details") on Jan 31, 2024
@bdice (Contributor) commented Jan 31, 2024:

/merge

@rapids-bot (bot) merged commit 5d50ef9 into rapidsai:branch-0.37 on Jan 31, 2024
47 checks passed
@msarahan deleted the wheel-build-ci branch on May 29, 2024 16:40
Development

Successfully merging this pull request may close these issues:

Build and publish wheels for UCXX
6 participants