Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use custom resolver for query and eval with nested frames. #150

Merged
merged 11 commits into from
Oct 15, 2024

Conversation

gitosaurus
Copy link
Contributor

Verify preflighting of nested expressions using AST visitation.

Remove logic for splitting queries by string. Now the evaluation is handled by a nested column resolver, and the mixed-mode expressions are preflighted by examining the parsed abstract syntax tree for the query expression.

Resolves #59.

Change Description

  • My PR includes a link to the issue that I am addressing

Solution Description

Code Quality

  • I have read the Contribution Guide
  • My code follows the code style of this project
  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation

Project-Specific Pull Request Checklists

Bug Fix Checklist

  • My fix includes a new test that breaks as a result of the bug (if possible)
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@gitosaurus gitosaurus requested review from dougbrn and hombit October 9, 2024 16:01
Copy link

codecov bot commented Oct 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.49%. Comparing base (12d1293) to head (437cca7).
Report is 59 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #150      +/-   ##
==========================================
+ Coverage   99.47%   99.49%   +0.01%     
==========================================
  Files          14       14              
  Lines         948      981      +33     
==========================================
+ Hits          943      976      +33     
  Misses          5        5              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Verify preflighting of nested expressions using AST visitation.

Remove logic for splitting queries by string.  Now the evaluation is
handled by a nested column resolver, and the mixed-mode expressions
are preflighted by examining the parsed abstract syntax tree for
the query expression.
Also remove `_ensure_spacing`, no longer used.
Copy link

github-actions bot commented Oct 9, 2024

Before [9e76172] After [17feb3c] Ratio Benchmark (Parameter)
7.28±0.2ms 8.25±0.2ms 1.13 benchmarks.NestedFrameQuery.time_run
9.48±0.1ms 9.63±0.2ms 1.02 benchmarks.NestedFrameAddNested.time_run
90.9M 92.9M 1.02 benchmarks.NestedFrameQuery.peakmem_run
1.59±0.02ms 1.62±0.05ms 1.02 benchmarks.NestedFrameReduce.time_run
256M 258M 1.01 benchmarks.AssignSingleDfToNestedSeries.peakmem_run
279M 281M 1.01 benchmarks.ReassignHalfOfNestedSeries.peakmem_run
83.8M 83.8M 1 benchmarks.NestedFrameAddNested.peakmem_run
87.5M 87.4M 1 benchmarks.NestedFrameReduce.peakmem_run
58.4±2ms 55.7±2ms 0.95 benchmarks.ReassignHalfOfNestedSeries.time_run
31.2±2ms 28.8±2ms 0.92 benchmarks.AssignSingleDfToNestedSeries.time_run

Click here to view all benchmarks.

This logic was copied from pd.DataFrame.query, and the accompanying
comment said that it was to handle an occasional case where
`self.loc[b]` would raise an error on a multi-dimensional `b`, but
`self[b]` would succeed.  I can't cause this error with `.loc` anymore,
so the code coverage complains about the unexercised exception
clause.  Removing it since the limitation that inspired it seems
to be gone now.
Copy link
Collaborator

@hombit hombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

I'd ask for more testing and not-implemented error-handling. For example, I'm not sure that assignment with eval is supported: .eval("nested.c = nested.a + nested.b") or .eval("nested.b = x + nested.a").

It looks like it may also close #144 and #146? If yes, could you please add tests for covering those issues?

.gitignore Outdated Show resolved Hide resolved
src/nested_pandas/nestedframe/utils.py Outdated Show resolved Hide resolved
src/nested_pandas/nestedframe/core.py Outdated Show resolved Hide resolved
src/nested_pandas/nestedframe/core.py Outdated Show resolved Hide resolved
@gitosaurus gitosaurus linked an issue Oct 9, 2024 that may be closed by this pull request
3 tasks
@gitosaurus
Copy link
Contributor Author

Thank you!

I'd ask for more testing and not-implemented error-handling. For example, I'm not sure that assignment with eval is supported: .eval("nested.c = nested.a + nested.b") or .eval("nested.b = x + nested.a").

It looks like it may also close #144 and #146? If yes, could you please add tests for covering those issues?

Yes, this PR does resolve #146, thanks for the tip! #144 is not resolved by this PR, but perhaps this same approach could be extended to cover it in a separate PR.

Good catch on the assignment problem. The existing ability for eval() to insert base columns still works, but you're right, it can't insert into the nest yet. I'll add those tests and see if adding nested columns can be done within the scope of this PR. If not, I'll at least make a clearer error.

Prevent users from depending on the public interface, and also
from assuming that a `NestedSeries` has a nest within it, since
that is the existing meaning of that prefix.
Copy link
Collaborator

@dougbrn dougbrn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @gitosaurus , looks really good! I added a few comments, in addition to the feedback Kostya has provided.

src/nested_pandas/nestedframe/core.py Show resolved Hide resolved
src/nested_pandas/nestedframe/core.py Outdated Show resolved Hide resolved
src/nested_pandas/nestedframe/core.py Outdated Show resolved Hide resolved
src/nested_pandas/nestedframe/core.py Show resolved Hide resolved
Permits the creation of new nests and new nested columns within
the `.eval()` method.
Include more unit tests of `NestedFrame.eval`.
@gitosaurus
Copy link
Contributor Author

Thank you!
I'd ask for more testing and not-implemented error-handling. For example, I'm not sure that assignment with eval is supported: .eval("nested.c = nested.a + nested.b") or .eval("nested.b = x + nested.a").
It looks like it may also close #144 and #146? If yes, could you please add tests for covering those issues?

Yes, this PR does resolve #146, thanks for the tip! #144 is not resolved by this PR, but perhaps this same approach could be extended to cover it in a separate PR.

Good catch on the assignment problem. The existing ability for eval() to insert base columns still works, but you're right, it can't insert into the nest yet. I'll add those tests and see if adding nested columns can be done within the scope of this PR. If not, I'll at least make a clearer error.

The changes I pushed yesterday do permit assignment of nested columns, and even the creation of new nests, from within eval()! Added tests of this as well.

Some of the dict-like overloads were an overachievement.
@gitosaurus gitosaurus requested review from hombit and dougbrn October 15, 2024 17:25
Copy link
Collaborator

@hombit hombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thank you so much!

Copy link
Collaborator

@dougbrn dougbrn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I think the eval interface here is a huge addition to nested-pandas, to the level that I would love to see it in our documentation. I wouldn't ask it for this PR, but perhaps in a future PR we could add a section to the data manipulation tutorial showcasing some of the uses of eval?

@gitosaurus
Copy link
Contributor Author

This is great! I think the eval interface here is a huge addition to nested-pandas, to the level that I would love to see it in our documentation. I wouldn't ask it for this PR, but perhaps in a future PR we could add a section to the data manipulation tutorial showcasing some of the uses of eval?

Great idea! I'll add a task for that.

@gitosaurus gitosaurus merged commit a96a508 into main Oct 15, 2024
10 checks passed
@gitosaurus gitosaurus deleted the query-extension branch October 15, 2024 21:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

query("nested.b.isna()") fails Nested-pandas query does not handle scientific notation
3 participants