-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add build wheel script and accompanying version info #167
Conversation
@vyasr do we have a guideline on what we normally test/should test for wheels builds? Are other projects running the full test set or only a subset of everything that runs for conda? For now I've only added subset of the more common uses as reference. |
# TODO: We need distributed installed in developer mode to provide test utils, | ||
# we still need to match to the `rapids-dask-dependency` version. | ||
rapids-logger "Install Distributed in developer mode" | ||
git clone https://github.com/dask/distributed /tmp/distributed | ||
python -m pip install -e /tmp/distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vyasr we don't need to address this immediately, but do you think it would work if we had optional dependencies for tests in https://github.com/rapidsai/rapids-dask-dependency/blob/ac821e6a3e396340f65fe79dc834f5b711d3b0cb/pip/rapids-dask-dependency/pyproject.toml#L14-L17 , where we essentially pip install -e
the packages? I don't know if that's possible -- and I think it isn't -- but would like to know if you have any ideas how this could be done. We would also need something similar for conda packages, I currently pip install -e
in conda CI as well which isn't a good long-term solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a question about this as well. Let's discuss in this thread.
@pentschev Can you explain a bit more about this requirement, and how you would resolve the TODO here? It looks like this is a requirement to run distributed-ucxx tests, but it isn't available in the
distributed
package? Having a dependency on installingdistributed
from source (as opposed to something available in the wheel) might make it hard to reproduce CI failures in a local development environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also adding here what I wrote in the other thread:
Yes, we need some testing dependencies from distributed, specifically
from distributed.comm.tests.test_comms import check_deserialize
I don't know of a good solution for this problem either though and I very much don't want to copy-paste the code in here because that will eventually cause us to get out-of-sync with Distributed code and essentially mean we're not testing it properly anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think dask devs would be open to moving some of the functionality of distributed.comm.tests.test_comms
into a utility module that is actually shipped with the package? It sounds like everything that we're doing here is to work around the fact that we're trying to use functionality that they explicitly and intentionally do not ship. Before we look into strange workarounds like this one, is there an alternative that involves them actually shipping the necessary APIs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right that they intentionally do not ship because they are only used in that particular test file. I very much wanted to avoid having to modify things there, the file is reasonably large with many functions that would need to move and we would have to bear the burden of properly documenting/annotating those functions that I'm not even familiar with internals, and eventually we would also need to bear the burden of some maintenance, which we currently don't have the bandwidth to. If we can't find a reasonably simple solution to install Distributed in dev mode, I think I'd prefer to limit or skip that test for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather limit or skip. This logic:
I very much don't want to copy-paste the code in here because that will eventually cause us to get out-of-sync with Distributed code and essentially mean we're not testing it properly anymore.
feels directly contradictory to
we would have to bear the burden of properly documenting/annotating those functions that I'm not even familiar with internals, and eventually we would also need to bear the burden of some maintenance, which we currently don't have the bandwidth to
If we don't want to bear the maintenance burden then I'd much rather not try and take on some partial maintenance burden via a hacky installation of distributed here that forces us to stay partially in sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see where the contradiction as the statements are talking about two different aspects:
- Copy-pasting those function in distributed-ucxx means they will eventually become out-of-sync with the actual implementation from Distributed, which is the source-of-truth;
- The maintenance burden I'm referring to is the maintenance burden of those functions in Distributed as we will now have "some authorship" because we're requesting they live as part of distributed packages and will have to document and properly type-annotate them, etc.
IOW, we still want to use them here for testing purposes only but I'd very much avoid touching much of Distributed code if we don't absolutely need to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but right now we implicitly incur part of the maintenance burden of 2 because if dask decides to change the behavior of that function in some way we'll see it in our test. We're not maintaining the function, but we are effectively signing up to keep track of an implementation detail of dask's tests. Conversely, if it was public there would be a higher maintenance burden (because we'd have to maintain docs/annotations/etc as you say) but we would also be less likely to be broken without warning (of course, given dask's low stability it could still happen, but it reduces the odds). The current approach also makes it harder for devs to understand why the tests fail if they don't know that they need to download the source of distributed to access this functionality.
In any case, I don't think there is a good solution for installing the editable version from source. Every solution I can think of is equally hacky, so for now I think we can leave this as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if dask decides to change the behavior of that function in some way we'll see it in our test.
Yes, we definitely want to see any of that. If the behavior changed there's a high chance we need to do adjustments on our end, which we must catch ASAP.
We're not maintaining the function, but we are effectively signing up to keep track of an implementation detail of dask's tests. [...] given dask's low stability it could still happen, but it reduces the odds [...]
I agree with you here, but for now this is a price I'm willing to pay to have an extra 1% confidence that we're less likely to be broken with little/no warning due to something we're skipping tests for.
The current approach also makes it harder for devs to understand why the tests fail if they don't know that they need to download the source of distributed to access this functionality.
I absolutely agree with you and it's annoying. What if I would then split that one test (and perhaps others in the future) into a tests_internals
directory and run everything else with "vanilla" Distributed, and have essentially the following:
pytest python/distributed-ucxx/distributed_ucxx/tests/
pip install -e ...
pytest python/distributed-ucxx/distributed_ucxx/tests_internals/
Does that sound like a reasonable tradeoff to you? I just want to test as much as possible and sometimes that means we need to test bits that are not part of Distributed's public API, like this.
In any case, I don't think there is a good solution for installing the editable version from source. Every solution I can think of is equally hacky, so for now I think we can leave this as is.
Agreed, hopefully what I'm proposing above is a reasonable tradeoff for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the split you proposed for getting extra coverage in CI makes sense. Let's make that change in a follow-up PR where you apply it to both conda and pip testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments to address, which need some input from others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm amazed and horrified by the library manipulation that has to be done, but needs must. I'm glad that the three of you are so knowledgeable. I can't approve this, but I don't have anything left to add that would improve this.
Yeah it's pretty horrific. At least this time around I already knew what was needed, the first time was an adventure to figure out why cuml dask tests were seg faulting 😅 Especially since we don't have MG CI for wheel tests so that was a bit of a last-minute discovery for wheels. The full history's in this ucx-py PR. A couple of long live debugging sessions with me and Ben Z to get this working IIRC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very close now, we should be able to merge soon.
Co-authored-by: Vyas Ramasubramani <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've addressed all your comments @vyasr , please have another look.
# TODO: We need distributed installed in developer mode to provide test utils, | ||
# we still need to match to the `rapids-dask-dependency` version. | ||
rapids-logger "Install Distributed in developer mode" | ||
git clone https://github.com/dask/distributed /tmp/distributed | ||
python -m pip install -e /tmp/distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see where the contradiction as the statements are talking about two different aspects:
- Copy-pasting those function in distributed-ucxx means they will eventually become out-of-sync with the actual implementation from Distributed, which is the source-of-truth;
- The maintenance burden I'm referring to is the maintenance burden of those functions in Distributed as we will now have "some authorship" because we're requesting they live as part of distributed packages and will have to document and properly type-annotate them, etc.
IOW, we still want to use them here for testing purposes only but I'd very much avoid touching much of Distributed code if we don't absolutely need to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of minor leftover items then we should be good to go.
# TODO: We need distributed installed in developer mode to provide test utils, | ||
# we still need to match to the `rapids-dask-dependency` version. | ||
rapids-logger "Install Distributed in developer mode" | ||
git clone https://github.com/dask/distributed /tmp/distributed | ||
python -m pip install -e /tmp/distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but right now we implicitly incur part of the maintenance burden of 2 because if dask decides to change the behavior of that function in some way we'll see it in our test. We're not maintaining the function, but we are effectively signing up to keep track of an implementation detail of dask's tests. Conversely, if it was public there would be a higher maintenance burden (because we'd have to maintain docs/annotations/etc as you say) but we would also be less likely to be broken without warning (of course, given dask's low stability it could still happen, but it reduces the odds). The current approach also makes it harder for devs to understand why the tests fail if they don't know that they need to download the source of distributed to access this functionality.
In any case, I don't think there is a good solution for installing the editable version from source. Every solution I can think of is equally hacky, so for now I think we can leave this as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for all the teamwork on this!
/merge |
This PR adds wheel builds for
ucxx
. A follow-up PR will add wheel builds fordistributed-ucxx
.Closes #145