Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polars: DataFrame Serialization #17062

Merged
merged 42 commits into from
Nov 14, 2024
Merged

Conversation

madsbk
Copy link
Member

@madsbk madsbk commented Oct 11, 2024

Use pylibcudf’s pack and unpack to implement Dask compatible serialization.

@madsbk madsbk added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change cudf.polars Issues specific to cudf.polars labels Oct 11, 2024
@github-actions github-actions bot added Python Affects Python cuDF API. CMake CMake build issue pylibcudf Issues specific to the pylibcudf package labels Oct 11, 2024
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small notes

python/cudf_polars/cudf_polars/containers/dataframe.py Outdated Show resolved Hide resolved
python/cudf_polars/cudf_polars/containers/dataframe.py Outdated Show resolved Hide resolved
python/cudf_polars/cudf_polars/containers/dataframe.py Outdated Show resolved Hide resolved
@github-actions github-actions bot removed CMake CMake build issue pylibcudf Issues specific to the pylibcudf package labels Oct 12, 2024
@madsbk madsbk marked this pull request as ready for review October 22, 2024 14:34
@madsbk madsbk requested a review from a team as a code owner October 22, 2024 14:34
@madsbk madsbk requested review from mroeschke and Matt711 October 22, 2024 14:34
@madsbk madsbk requested a review from a team as a code owner October 23, 2024 08:22
@madsbk madsbk requested a review from KyleFromNVIDIA October 23, 2024 08:22
python/cudf_polars/cudf_polars/containers/dataframe.py Outdated Show resolved Hide resolved
python/cudf_polars/cudf_polars/containers/dataframe.py Outdated Show resolved Hide resolved
python/cudf_polars/cudf_polars/containers/dataframe.py Outdated Show resolved Hide resolved
python/cudf_polars/cudf_polars/containers/dataframe.py Outdated Show resolved Hide resolved
python/cudf_polars/cudf_polars/containers/dataframe.py Outdated Show resolved Hide resolved
python/cudf_polars/tests/containers/test_dataframe.py Outdated Show resolved Hide resolved
Copy link
Contributor

@Matt711 Matt711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these changes make sense to me. I left some tiny suggestions.

@wence-
Copy link
Contributor

wence- commented Nov 11, 2024

/merge

python/cudf_polars/cudf_polars/experimental/__init__.py Outdated Show resolved Hide resolved
- output_types: [conda, requirements, pyproject]
packages:
- *numpy
- rapids-dask-dependency==24.12.*,>=0.0.0a0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not just a test dependency. It’s experimental but seems required for import (not just in tests)? Should it be an “extra”?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would make sense, @wence- WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Let's make it an experimental extra.

Note that it's not required for import, because we deliberately do not import the experimental submodule in cudf-polars proper.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how to specify an optional extra for the py_run_cudf_polars target. If that is indeed the one we need.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also just look at this in a later PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I did it. @bdice, I believe that it is not possible to specify optional dependencies for conda packages. What is the recommended best practice for rapids in this scenario?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes we include all feature-related extras in conda dependencies, as if they were hard requirements. We never do this for test dependencies though. It just depends on what is desired: getting an import error at runtime with a nice message and installing a package to fix it, or having the package ahead of time and paying for the extra disk space and environment solve complexity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for my recommendation — until this feature is more usable / less experimental, perhaps leave it off of the conda dependencies?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave it off for now.

@wence- wence- requested a review from a team as a code owner November 13, 2024 11:55
@wence-
Copy link
Contributor

wence- commented Nov 13, 2024

/merge

@rapids-bot rapids-bot bot merged commit 5d5b35d into rapidsai:branch-24.12 Nov 14, 2024
105 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cudf.polars Issues specific to cudf.polars improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

8 participants