fix(snowflake): make semantics of array filtering match everything else #10469

Merged
merged 2 commits into from
Nov 11, 2024

@cpcloud cpcloud commented Nov 11, 2024

  • style(trino): dedent
  • fix(snowflake): make semantics of array filtering match everything else

Fixes Snowflake failures on main, caused by slightly different NULL handling in array filtering.

This PR ensures that behavior is the same for all backends that support array
filtering with an index.

Snowflake is a bit more complex because its higher order functions only accept
a single argument, so before we compile we run a rewrite rule (the use of
a rewrite is not new in this PR) to extract the field being referenced in
the body of the function.
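One way to picture such a rewrite is the toy sketch below. It is not the ibis rewrite rule: the node classes (`Var`, `Field`, `Lit`, `Gt`), the function `rewrite_to_single_param`, and the packed parameter name `s` are all hypothetical, invented here only to show how references to several lambda parameters can be turned into field accesses on a single parameter.

```python
from dataclasses import dataclass


# Hypothetical toy expression nodes; these are NOT ibis or sqlglot classes.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Field:
    base: object
    name: str


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Gt:
    left: object
    right: object


def rewrite_to_single_param(body, params, packed="s"):
    """Replace references to any name in `params` with a field access
    on one packed parameter, so the lambda needs only a single argument."""
    if isinstance(body, Var) and body.name in params:
        return Field(Var(packed), body.name)
    if isinstance(body, Gt):
        return Gt(
            rewrite_to_single_param(body.left, params, packed),
            rewrite_to_single_param(body.right, params, packed),
        )
    # Literals and unrelated variables are left untouched.
    return body


# Body of a two-parameter lambda (x, i) -> i > 0 ...
two_param_body = Gt(Var("i"), Lit(0))
# ... rewritten as the body of a one-parameter lambda s -> s.i > 0.
one_param_body = rewrite_to_single_param(two_param_body, {"x", "i"})
```

In this sketch the array elements would have to be packed into structs with fields `x` and `i` before the single-argument function is applied; how ibis does that for Snowflake is not shown in this conversation.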

The changes here are only relevant for the case of array filtering with an
index. If an index isn't used, then the behavior is the same as before.

The changes here effectively make the Snowflake backend implement array
filtering with an index the same way Trino and PySpark do: we construct a struct
containing a flag for whether to keep a value, along with the value itself.

This preserves NULL values when the index is used for filtering, for example.
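As a rough illustration of why the struct matters, here is a minimal pure-Python sketch (not the SQL that ibis generates). The function names and the `keep`/`value` field names are hypothetical, the index is assumed to be zero-based, and the second function is an assumed lossy alternative shown only for contrast; it is not a description of the previous Snowflake implementation.

```python
def filter_with_index(values, predicate):
    """Filter `values` with a predicate over (value, index),
    carrying an explicit keep flag next to each value."""
    # Step 1: build one struct per element: the keep flag plus the value.
    structs = [
        {"keep": bool(predicate(v, i)), "value": v}
        for i, v in enumerate(values)
    ]
    # Step 2: filter on the flag only, never on the value itself.
    kept = [s for s in structs if s["keep"]]
    # Step 3: project the original value back out, NULLs (None) included.
    return [s["value"] for s in kept]


def filter_with_null_sentinel(values, predicate):
    """Assumed lossy alternative: mark dropped elements as None, then
    remove every None. Genuine NULL elements are removed as well."""
    marked = [v if predicate(v, i) else None for i, v in enumerate(values)]
    return [v for v in marked if v is not None]


data = [1, None, 3, None, 5]


def drop_first(value, index):
    # Keep everything except the first element; the value is not inspected.
    return index >= 1


with_struct = filter_with_index(data, drop_first)
with_sentinel = filter_with_null_sentinel(data, drop_first)
```

Here `with_struct` is `[None, 3, None, 5]`, while `with_sentinel` is `[3, 5]`: the explicit flag lets the filter keep NULL elements that the predicate selected.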

@cpcloud cpcloud added this to the 10.0 milestone Nov 11, 2024
@cpcloud cpcloud added bug Incorrect behavior inside of ibis snowflake The Snowflake backend labels Nov 11, 2024
@github-actions github-actions bot added the sql Backends that generate SQL label Nov 11, 2024
cpcloud commented Nov 11, 2024

Snowflake is passing:

…/ibis on  null-handling-array-filter-clouds is 📦 v9.5.0 via 🐍 v3.12.7 via ❄️  impure (ibis-3.12-env)
❯ pytest -m snowflake -n 8 --dist loadgroup --snapshot-update -q
bringing up nodes...
.x..........................xx..................................s.......................................................... [  6%]
....................................................x.......................x...........................x...x........x..... [ 12%]
.....x......................x..............................................x...x....x.x.x.......x..x...x....x..xx..x....x.. [ 18%]
...............x...................x.............x.......x.......x.................x....x.................................. [ 24%]
..xx...x...............x........x......x...................x...x.........................x................................. [ 30%]
....................................................x.............x.......................x................................ [ 36%]
....................................x.x..xx..x..xxxx...xx.xx....xx.xx....x....xxx....x..x.x.xx...x.x.x..x.xx.....x......x.x [ 42%]
..xxx..xxx..xxx..xxxx.xxx.x...xxx..x...xxx.x......x.......xx.xx.........x....x....x..x...............x.......x...........x. [ 48%]
.............x.....x................x..............x...x............x............xx.......x...............x................ [ 54%]
......................................x...xx....x.....s.......................x..x.xxxxx.xxxx.xx.....xx....x..x.xx.......x. [ 60%]
.x..x....xx....xx..x....x...x...x........xx...x......xx..............x.x.x...x..xx....x........x...xx.x............xxx..... [ 66%]
.......xx..x.....x........x.........xxxxxxxxxxx..s........x.......................................s..x..............s...... [ 72%]
........................................x.......x..x....x....x....x..x......................................x.............. [ 78%]
x..x.........x.....x..x..x...............xx...x...x.....x...x....x.xx..............x...................x............x..x.x. [ 84%]
.........s........................ss....x.s.............x.................................................................. [ 90%]
..........x..............................................s................................................................. [ 97%]
............................................................                                                                                                                                  [100%]
1792 passed, 10 skipped, 226 xfailed in 251.20s (0:04:11)

if index is None:
    return self.f.filter(arg, sge.Lambda(this=body, expressions=[param]))
else:

There are no changes here except dedenting.

@cpcloud cpcloud force-pushed the null-handling-array-filter-clouds branch from f5e3a8a to 6d90f18 Compare November 11, 2024 13:07
cpcloud commented Nov 11, 2024

Snowflake passing after most recent force push:

…/ibis on  null-handling-array-filter-clouds is 📦 v9.5.0 via 🐍 v3.12.7 via ❄️  impure (ibis-3.12-env)
❯ pytest -m snowflake -n 8 --dist loadgroup --snapshot-update -q
bringing up nodes...
.....................................................x......x....................x....xx......x..................x...........x........x....x....x....................x........x.............. [  9%]
..x.......x.......x...........x.x....x..x....x......xx..x...x.........x.x.xx.xxx..x.....x..xx......xx.x.x.xxx...xxx.x.x.xxx...xxx...x.xx....x.x.x..x..x...xxx.x.x.xx.x...xx.xx..xxx.xx..xxxx. [ 18%]
....xxx....x.xx......xxx.....x....x.x...xx.....x............xx..........................................................................x.......x...................x..............x......... [ 27%]
.....x..............................s...........................s.............................................x.....x...s.........x..........x....x.x.....x........x.....x...x..x.......xx... [ 37%]
x.x.......x.....................xxx........x.....x.xx.........x..x.x...xx.x.............x.......x....xx.x..........x.........................xxxx...........x..x................x...x.x...... [ 46%]
.......x............x.........x........x..s......x.................x...x......xxxxxxx..xx.....xx..xx.......x...x...........x.........x............x....x........s.......................x.... [ 55%]
..........x.........................x...x...................x....x...............x....x..............x.....x......x...........x.......x........................x..................x.......xxx [ 65%]
.....................................................................x....s..x............................................................................................................... [ 74%]
...........x.x.........x........................................................................x...xx..............x..................x..x..........................x......x................ [ 83%]
.........x.....x..................x....xxxx....xx..xx..x.xxx.x.x.......x........x..............................................s..s..............................................s........... [ 93%]
...............................................................s....x.....................................................................                                                    [100%]
1792 passed, 10 skipped, 226 xfailed in 261.65s (0:04:21)

@cpcloud cpcloud added the trino The Trino backend label Nov 11, 2024
cpcloud commented Nov 11, 2024

Merging to get CI green again.

@cpcloud cpcloud merged commit bad487b into ibis-project:main Nov 11, 2024
76 checks passed
@cpcloud cpcloud deleted the null-handling-array-filter-clouds branch November 11, 2024 13:08