feat(python): add filters argument to DeltaTable.to_pandas() for filter pushdown #1316
Comments
Have you tried |
@wjones127 - Here's the signature for that function I don't see any |
I see. We can extend that method to pass down filters in the dataset. |
Great! Can we expose the filters to use this syntax |
We may even want to combine the |
Yes, I think we'll deprecate |
Hi, guys. I apologize for interrupting your argument. Although it may take me some time to fully grasp the structure of delta-rs, I would like to work on this issue. Would it be possible to assign me to this issue? |
We would gladly accept a PR for this. 😄 Note there is a function called |
Thank you @wjones127 @MrPowers! Once there's a consensus on the implementation, please feel free to assign me to this issue, and I will follow the discussion here as well. Also, I will check the source code that you linked. |
@ognis1205 - I think there is a consensus on the next steps and you can get started. See here for more information on the DNF filter format that Will is referring to. Here's the existing to_pandas method signature: You can submit a PR that will update the method signature to this: Let me know if you'd like any additional clarification! |
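For readers unfamiliar with the DNF (disjunctive normal form) filter format mentioned above, here is a minimal pure-Python sketch of its semantics. It is not the delta-rs or PyArrow implementation: the helper names `normalize_dnf` and `row_matches` and the sample rows are hypothetical, chosen only for illustration. A flat list of `(column, op, value)` tuples is read as an AND of predicates, and a list of such lists is read as an OR of AND groups.

```python
import operator
from typing import Any, Dict, List, Tuple, Union

Predicate = Tuple[str, str, Any]            # (column, op, value)
Conjunction = List[Predicate]               # predicates joined by AND
DNF = Union[Conjunction, List[Conjunction]]  # AND groups joined by OR

# Comparison operators used by this sketch (hypothetical helper table).
_OPS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda value, choices: value in choices,
    "not in": lambda value, choices: value not in choices,
}


def normalize_dnf(filters: DNF) -> List[Conjunction]:
    """Return filters as a list of AND groups (list of lists of tuples)."""
    if not filters:
        raise ValueError("filters must be non-empty")
    if isinstance(filters[0], tuple):
        # A flat list of tuples is a single AND group.
        return [list(filters)]
    return [list(group) for group in filters]


def row_matches(row: Dict[str, Any], filters: DNF) -> bool:
    """True if the row satisfies at least one AND group in full."""
    return any(
        all(_OPS[op](row[col], val) for col, op, val in group)
        for group in normalize_dnf(filters)
    )


# Hypothetical sample data standing in for rows of a table.
rows = [
    {"year": 2022, "month": 12, "value": 1.0},
    {"year": 2023, "month": 1, "value": 2.0},
    {"year": 2023, "month": 7, "value": 3.0},
]

# year == 2023 AND month <= 6
and_filters = [("year", "=", 2023), ("month", "<=", 6)]
# (year == 2022) OR (month in [7, 8])
or_filters = [[("year", "=", 2022)], [("month", "in", [7, 8])]]

and_result = [r["value"] for r in rows if row_matches(r, and_filters)]
or_result = [r["value"] for r in rows if row_matches(r, or_filters)]
```

A real implementation would translate these tuples into an expression that the reader can push down into the scan instead of evaluating rows one by one; the sketch only shows how the nesting maps to AND and OR.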
Thanks again @MrPowers @wjones127 !! |
Hi, everyone! I have a question. https://github.com/delta-io/delta-rs/actions/runs/4926020116/jobs/8800992032?pr=1349#step:8:18 So, I am considering implementing a fallback function for Should I still implement the fallback function if the Python EDIT: |
Yes, we will drop support for Python 3.7, but we are keeping support (for now) for PyArrow 7. |
Thanks for answering! |
@MrPowers @wjones127 |
…er pushdown (#1349)

# Description
This pull request adds support for filtering to the `to_pandas` method of `DeltaTable`. The new filters argument allows users to specify filtering criteria in a format that is compatible with `pyarrow.compute.Expression`.
- Adding the `filters` argument to `DeltaTable.to_pandas`.
- Adding the `_filters_to_expression` function to `table` module, which is based on [this implementation](https://github.com/apache/arrow/blob/b9cc5df8a4f7c7fe09f40ba92a74981dee5e536a/python/pyarrow/parquet/core.py#LL155C5-L155C26), but with improved type consistency.
- ~~Based on [the existing conventional unit tests](https://github.com/delta-io/delta-rs/blob/b5230835bc1d1b01d59da3649033f1180232ddb7/python/tests/test_table_read.py#L392), I did not add any additional test cases for this feature. Instead, I tested the feature on my local development environment.~~
- Adding a unit test for the feature.

# Related Issue(s)
- closes #1316

# Documentation
[The DNF filter format](https://github.com/apache/arrow/blob/b9cc5df8a4f7c7fe09f40ba92a74981dee5e536a/python/pyarrow/parquet/core.py#LL155C5-L155C26).

---------

Signed-off-by: Shingo OKAWA <[email protected]>
Sorry for bumping an ancient ticket - maybe the docs online are outdated, but my understanding is this new (old) route supports Is my understanding of the current limitations correct? I'm interested in helping contribute either code or documentation updates. |
Reading Delta tables into pandas DataFrames could be more intuitive.
Here's the current syntax:
Could we provide an interface that's more similar to `pandas.read_parquet`? Something like this:

The list of tuples is how predicates are passed in `pandas.read_parquet`.

I'm open to other interfaces too. I think something that hides `pyarrow.dataset`, is less code, and is familiar from existing pandas syntax would be good.