-
Notifications
You must be signed in to change notification settings - Fork 912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Polars: DataFrame Serialization #17062
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small notes
873d813
to
205cc36
Compare
205cc36
to
1ea98ab
Compare
Co-authored-by: Lawrence Mitchell <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these changes make sense to me. I left some tiny suggestions.
Co-authored-by: Matthew Murray <[email protected]>
/merge |
dependencies.yaml
Outdated
- output_types: [conda, requirements, pyproject] | ||
packages: | ||
- *numpy | ||
- rapids-dask-dependency==24.12.*,>=0.0.0a0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not just a test dependency. It’s experimental but seems required for import (not just in tests)? Should it be an “extra”?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would make sense, @wence- WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Let's make it an experimental
extra.
Note that it's not required for import, because we deliberately do not import the experimental submodule in cudf-polars proper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how to specify an optional extra
for the py_run_cudf_polars
target. If that is indeed the one we need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also just look at this in a later PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think I did it. @bdice, I believe that it is not possible to specify optional dependencies for conda packages. What is the recommended best practice for rapids in this scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes we include all feature-related extras in conda dependencies, as if they were hard requirements. We never do this for test dependencies though. It just depends on what is desired: getting an import error at runtime with a nice message and installing a package to fix it, or having the package ahead of time and paying for the extra disk space and environment solve complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for my recommendation — until this feature is more usable / less experimental, perhaps leave it off of the conda dependencies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave it off for now.
/merge |
Use pylibcudf’s pack and unpack to implement Dask compatible serialization.