-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable sorting on column with nulls using query-planning #15639
Enable sorting on column with nulls using query-planning #15639
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor request for a little more doc on the differences.
|
||
|
||
def _cudf_layer(self): | ||
if hasattr(self._meta, "to_pandas"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this canary for whether or not we have a cudf-backed object is also true for pyarrow objects I think. I know this is a hypothetical right now, but just noting it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a short comment about what exactly this code does differently from the upstream version such that it can handle nulls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this canary for whether or not we have a cudf-backed object is also true for pyarrow objects I think. I know this is a hypothetical right now, but just noting it.
Yeah, sorry. This was used as a simple toggle when I was experimenting in the dask-expr code itself. If the code lives in dask-cudf, we can do an instance check.
Could you add a short comment about what exactly this code does differently from the upstream version such that it can handle nulls?
Yes. I agree that a more-thorough comment is definitely needed here.
RepartitionQuantiles
in dask-expr@@ -72,7 +72,7 @@ def test_sort_repartition(): | |||
dd.assert_eq(len(new_ddf), len(ddf)) | |||
|
|||
|
|||
@xfail_dask_expr("dask-expr code path fails with nulls") | |||
@xfail_dask_expr("missing null support", lt_version="2024.5.1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like this test to run for anything newer than the "2024.5.0"
release (including main
). Not sure of the "correct" way to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is correct. pytest.mark.xfail(some_condition, reason="some_reason")
which is what this expands to is the canonical way.
/merge |
Description
Related to #15027
Checklist