[Python]: Support PyCapsule Interface Objects as input in more places #43410
Comments
Specifically for […] But in general, certainly +1 on more widely supporting the interface. Some other possible areas: […]
Started with exploring […]
That sounds awesome. For reference, in my own experiments in https://github.com/kylebarron/arro3, I created an ArrayReader class, essentially just a […]
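For readers unfamiliar with the protocol, here is a minimal sketch of what such a stream-exporting object can look like. The class name and the delegation to a pyarrow reader are illustrative only (not arro3's actual implementation), and it assumes pyarrow >= 14, where the PyCapsule interface landed:

```python
import pyarrow as pa

class BatchStream:
    # Hypothetical wrapper: any object exposing __arrow_c_stream__
    # can be consumed by PyCapsule-aware APIs, pyarrow or not.
    def __init__(self, schema, batches):
        # Back the stream with a pyarrow reader for simplicity.
        self._reader = pa.RecordBatchReader.from_batches(schema, batches)

    def __arrow_c_stream__(self, requested_schema=None):
        # Export the underlying stream as an Arrow C stream PyCapsule.
        return self._reader.__arrow_c_stream__(requested_schema)
```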
…taset (#43771)

### Rationale for this change

Expanding the support internally in pyarrow where we accept objects implementing the Arrow PyCapsule interface. This PR adds support in `ds.write_dataset()`, since we already accept a RecordBatchReader there as well.

### What changes are included in this PR?

`ds.write_dataset()` and `ds.Scanner.from_batches()` now accept any object implementing the Arrow PyCapsule interface for streams.

### Are these changes tested?

Yes

### Are there any user-facing changes?

No

* GitHub Issue: #43410

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
Issue resolved by pull request #43771.
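With #43771 merged, exercising the new behavior might look like the following sketch. The `StreamOnly` class is a hypothetical stand-in for any third-party exporter (polars, duckdb, arro3, ...), and the snippet assumes a pyarrow release that contains #43771:

```python
import pyarrow as pa
import pyarrow.dataset as ds

class StreamOnly:
    # Stand-in for a third-party object that exposes only the
    # Arrow PyCapsule stream protocol, nothing pyarrow-specific.
    def __init__(self, reader):
        self._reader = reader

    def __arrow_c_stream__(self, requested_schema=None):
        return self._reader.__arrow_c_stream__(requested_schema)

schema = pa.schema([("x", pa.int64())])
reader = pa.RecordBatchReader.from_batches(
    schema, [pa.record_batch([pa.array([1, 2, 3])], schema=schema)]
)

# Accepted even though StreamOnly is not a pyarrow type.
ds.write_dataset(StreamOnly(reader), "example_dataset", format="parquet")
```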
Describe the enhancement requested
Now that the PyCapsule Interface is starting to gain more traction (#39195), I think it would be great if some of pyarrow's functional APIs accepted any PyCapsule Interface object, and not just pyarrow objects.
Do people have opinions on which functions should or should not check for these objects? I'd argue that file format writers should check for them, because it's only a couple of lines of code, and the input stream will be fully iterated over regardless. E.g. looking at the Parquet writer: the high-level API doesn't currently accept a `RecordBatchReader` either, so support for both can come at the same time. I'd argue that the writer should be generalized to accept any object with an `__arrow_c_stream__` dunder, and to ensure the stream is not materialized as a table.
Component(s)

Python