feat: Series.index_of() #19894

itamarst · 2024-11-20T18:14:30Z

Categoricals don't work yet; see #20171 and #20318.

codecov · 2024-11-20T18:49:19Z

Codecov Report

Attention: Patch coverage is 98.68421% with 2 lines in your changes missing coverage. Please review.

Project coverage is 79.03%. Comparing base (11dd4b3) to head (3cb65df).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/polars-ops/src/series/ops/index_of.rs | 98.76% | 1 Missing ⚠️ |
| .../polars-python/src/lazyframe/visitor/expr_nodes.rs | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@             Coverage Diff             @@
##             main   #19894       +/-   ##
===========================================
+ Coverage   58.69%   79.03%   +20.33%     
===========================================
  Files        1564     1566        +2     
  Lines      220765   220918      +153     
  Branches     2504     2504               
===========================================
+ Hits       129584   174598    +45014     
+ Misses      90607    45747    -44860     
+ Partials      574      573        -1

itamarst · 2024-12-02T18:31:29Z

I think I've figured out how to use row encoding, so now I just need to write lots and lots of tests and make sure it actually works beyond the trivial case I've already tested.

itamarst · 2024-12-02T23:04:58Z

Unfortunately categorical and enum don't work (they also don't work for search_sorted(), which would be nice to fix); they ought to work, since e.g. pl.Series(["A", "B"], dtype=pl.Categorical) == "B" works, but I'm not sure how that is different than what I'm doing, so would appreciate any hints.

E.g. for Categorical:

>>> import polars as pl
>>> pl.Series(["a", "b", "a"], dtype=pl.Categorical).index_of("a")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/itamarst/devel/polars/py-polars/polars/series/series.py", line 4771, in index_of
    return F.select(F.lit(self).index_of(element)).item()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/itamarst/devel/polars/py-polars/polars/functions/lazy.py", line 1913, in select
    return pl.DataFrame().select(*exprs, **named_exprs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/itamarst/devel/polars/py-polars/polars/dataframe/frame.py", line 9113, in select
    return self.lazy().select(*exprs, **named_exprs).collect(_eager=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/itamarst/devel/polars/py-polars/polars/lazyframe/frame.py", line 2029, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.InvalidOperationError: got invalid or ambiguous dtypes: '[cat, str]' in expression 'index_of'

Consider explicitly casting your input types to resolve potential ambiguity.

Resolved plan until failure:

        ---> FAILED HERE RESOLVING 'select' <---
 SELECT [Series.index_of([String(a)])] FROM
  DF []; PROJECT */0 COLUMNS; SELECTION: None

coastalwhite · 2024-12-04T07:51:10Z

My guess is that you are treating a categorical as a string when it goes into the row encoding. If you want to compare the row encoding of a series with the row encoding of another series they need to have been encoded with the exact same dtype (i.e. so the same RevMap as well) otherwise the output is undefined. If search_sorted doesn't do that either, that is a bug and I can look into it.
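The point about matching dtypes can be illustrated with a toy model in plain Python. This is only a sketch: `build_rev_map` and `encode` are hypothetical helpers, not Polars APIs, and real row encoding and RevMap handling are more involved. It assumes a categorical is stored as integer codes plus a string-to-code mapping, and shows that a needle encoded under a different mapping finds the wrong position.

```python
# Toy model only: hypothetical helpers, not how Polars implements row encoding.

def build_rev_map(values):
    """Assign each distinct string a code in order of first appearance."""
    rev_map = {}
    for value in values:
        rev_map.setdefault(value, len(rev_map))
    return rev_map


def encode(values, rev_map):
    """Encode strings as integer codes under the given mapping."""
    return [rev_map[value] for value in values]


haystack = ["a", "b", "a"]
haystack_map = build_rev_map(haystack)  # "a" -> 0, "b" -> 1
needle_map = build_rev_map(["b"])       # "b" -> 0, built independently

encoded_haystack = encode(haystack, haystack_map)

# Needle encoded with its own mapping: its code collides with the code for "a".
wrong_needle = encode(["b"], needle_map)[0]
wrong_index = encoded_haystack.index(wrong_needle)

# Needle encoded with the haystack's mapping: the codes are comparable.
right_needle = encode(["b"], haystack_map)[0]
right_index = encoded_haystack.index(right_needle)

print(haystack[wrong_index], haystack[right_index])
```

In this model the mismatched encoding silently points at "a" rather than failing, which is the kind of undefined result the comment warns about.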

itamarst · 2024-12-04T13:14:59Z

@coastalwhite search_sorted() does get it wrong, yes. And separately, if memory serves, if you pass in a non-matching pl.lit("a", dtype=pl.Categorical) it doesn't error out with mismatching categoricals; it gives the wrong result.

itamarst · 2024-12-04T13:16:03Z

@coastalwhite and the question is how/where I should convert to an enum/categorical; my attempts have failed so far.

ritchie46 · 2024-12-16T16:28:42Z

That was mostly before the rebase. If I remember correctly I talked to ritchie and we both agreed that pl.Series([1, 2]).index_of(2.1) should not return 1. I am not sure if it still does that.

It should raise. We should not allow for lossy comparisons. That's why we need a new supertype flag.

itamarst · 2024-12-16T17:00:03Z

To clarify, pl.Series([1, 2]).index_of(2.1) == None, and pl.Series([1, 2]).index_of(2.0) == 1. So neither result is incorrect, at least. Would you like it to error out in both cases?

itamarst · 2024-12-16T17:01:26Z

Or to put it another way, it is correctly searching float needles in int haystacks, but I can certainly see that you'd want to disallow that.
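The two behaviours under discussion can be sketched in plain Python. This is an illustration only: `index_of_lenient`, `index_of_strict`, and `cast_needle_strict` are hypothetical names, not Polars functions, and the sketch assumes "strict" means the needle must convert to the haystack's integer type without losing information.

```python
from typing import Optional, Sequence, Union

Number = Union[int, float]


def cast_needle_strict(needle: Number) -> int:
    """Cast the needle to int, raising if the cast would lose information."""
    if isinstance(needle, float):
        if not needle.is_integer():
            raise ValueError(f"cannot losslessly cast {needle!r} to an integer")
        return int(needle)
    return needle


def index_of_lenient(haystack: Sequence[int], needle: Number) -> Optional[int]:
    """Compare by value; return the first matching position or None."""
    for position, value in enumerate(haystack):
        if value == needle:
            return position
    return None


def index_of_strict(haystack: Sequence[int], needle: Number) -> Optional[int]:
    """Reject needles that cannot be represented in the haystack's type."""
    return index_of_lenient(haystack, cast_needle_strict(needle))


print(index_of_lenient([1, 2], 2.0))  # value comparison finds position 1
print(index_of_lenient([1, 2], 2.1))  # no match, so None
print(index_of_strict([1, 2], 2.0))   # 2.0 casts losslessly to 2
```

Under the strict variant, `index_of_strict([1, 2], 2.1)` raises instead of returning None. Whether a lossless float needle such as 2.0 should also be rejected is the open question in this thread; the sketch accepts it.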

itamarst · 2024-12-16T20:25:15Z

After #20323 is merged I can change this PR to start testing Enum as non-xfail tests.

itamarst · 2025-01-02T17:28:31Z

Thank you for the new casting logic! I've updated to use it, and addressed the other two comments.

ritchie46 · 2025-01-05T12:04:58Z

Alright, looks great @itamarst. Thanks. I believe we only need docs entries on the Python side (so that they end up in the ref guide), then it is good to go.

rodrigogiraoserrao

Do we really need the tiny user-guide page? It's pretty much the same as the docstrings, so I feel like it's enough to have the docstrings.

rodrigogiraoserrao · 2025-01-06T15:43:26Z

docs/source/user-guide/expressions/searching.md

Do we need this page? The examples shown here are already available in the docstrings, so I think we can delete this.
The docstrings on the Python side are enough to make this appear in the API reference.

itamarst · 2025-01-06

I thought this was what Rich was asking for; maybe I misunderstood.

ritchie46 · 2025-01-06

I meant adding the entries in the reference guide. We don't need this in the user guide indeed.

rodrigogiraoserrao · 2025-01-06T15:50:11Z

docs/source/user-guide/expressions/searching.md

+API, which is similar to Python lists' `index()` method.
+Given a dataframe:
+
+{{code_block('user-guide/expressions/casting', 'dfnum', [])}}


If the page is not removed, as I'd like it to be, then the [] here needs to reference index_of, which will also require adding Python and Rust entries to docs/source/_build/API_REFERENCE_LINKS.yml.

itamarst · 2025-01-06

The first one is just setting up the DataFrame, so it doesn't need it, but I'll add it to the other two.

github-actions bot added the enhancement, python, and rust labels Nov 20, 2024

itamarst marked this pull request as ready for review November 21, 2024 14:13

itamarst requested review from ritchie46, c-peters, alexander-beedie, MarcoGorelli, reswqa, wence- and orlp as code owners November 21, 2024 14:13

nameexhaustion reviewed Nov 21, 2024

ritchie46 reviewed Nov 22, 2024

itamarst marked this pull request as draft November 27, 2024 15:00

itamarst closed this Dec 2, 2024

itamarst reopened this Dec 2, 2024

itamarst changed the title from "feat: Start of Series.index_of(), for primitive numeric types" to "feat: Series.index_of()" Dec 2, 2024

coastalwhite reviewed Dec 4, 2024

coastalwhite reviewed Dec 4, 2024

itamarst mentioned this pull request Dec 5, 2024

search_sorted on Categorial and Enum Series fails to work if given a string #20171

itamarst marked this pull request as ready for review December 5, 2024 16:16

Null can be cast to anything.

b7d689b

pythonspeed added 3 commits December 16, 2024 11:46

Merge remote-tracking branch 'origin/main' into 5503-series-index_of

ccecc38

Update to latest API.

8cfe43d

Give good error messages, which can be removed when corresponding bugs are fixed.

21eedc9

pythonspeed added 2 commits December 16, 2024 12:06

Error out instead of giving the wrong result

aad9544

Format

a52a4e6

pythonspeed added 4 commits December 17, 2024 08:26

Merge remote-tracking branch 'origin/main' into 5503-series-index_of

0a02b48

Enum literals work now.

dbebe7c

Add missing cfg

138bf73

Remove redundant type annotations

dbc0cbd

ritchie46 reviewed Dec 21, 2024

pythonspeed added 5 commits January 2, 2025 11:37

Merge remote-tracking branch 'origin/main' into 5503-series-index_of

35250de

Switch to strict casting.

731fd6a

Remove duplicate logic.

7ee4ede

Don't panic.

3179992

Improve testing slightly, and pacify mypy.

a9a06af

itamarst requested a review from ritchie46 January 2, 2025 17:28

pythonspeed added 4 commits January 6, 2025 09:18

Merge remote-tracking branch 'origin/main' into 5503-series-index_of

b0196ae

Minimal guide level documentation for index_of().

0fe814e

Pacify linter

b358e22

Reformat so dprint is happy.

541049a

rodrigogiraoserrao reviewed Jan 6, 2025

pythonspeed added 2 commits January 6, 2025 10:56

fix reference

68ebd34

Add index references

3cb65df
