Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix test_window_aggs_for_batched_finite_row_windows_partitioned fail #10143

Conversation

mythrocks
Copy link
Collaborator

Fixes #10134.

This commit fixes test failures in test_window_aggs_for_batched_finite_row_windows_partitioned, resulting from ambiguous ordering in the window function input.

The failing tests partition by a, and order by b,c. When the values of b,c have repeated values, the results from the window function execution is indeterminate.

This commit changes the definition of the aggregation column c (that's also included in the order-by clause), to use unique long values. This guarantees deterministic output.

…lure.

Fixes NVIDIA#10134.

This commit fixes test failures in `test_window_aggs_for_batched_finite_row_windows_partitioned`, resulting from ambiguous
ordering in the window function input.
The failing tests partition by `a`, and order by `b,c`.  When the values of `b,c` have repeated values, the results from
the window function execution is indeterminate.
This commit changes the definition of the aggregation column `c` (that's also included in the order-by clause), to use unique
long values.  This guarantees deterministic output.

Signed-off-by: MithunR <[email protected]>
@mythrocks mythrocks self-assigned this Jan 2, 2024
@mythrocks mythrocks added the test Only impacts tests label Jan 2, 2024
@mythrocks
Copy link
Collaborator Author

Build

@jlowe
Copy link
Member

jlowe commented Jan 3, 2024

@mythrocks are there major performance issues with these tests? CI timed out, and I see a huge time gap while running what looks like related tests. From the premerge log:

[2024-01-03T01:50:55.077Z] [gw1] [ 99%] PASSED ../../../../integration_tests/src/main/python/window_function_test.py::test_window_aggs_for_batched_finite_row_windows_fallback[[('a', RepeatSeq(Integer)), ('b', Integer), ('c', UniqueLong(not_null))]][DATAGEN_SEED=1704242734, IGNORE_ORDER({'local': True})] 
[2024-01-03T03:17:45.878Z] [gw2] [ 99%] PASSED ../../../../integration_tests/src/main/python/window_function_test.py::test_window_aggs_for_batched_finite_row_windows_unpartitioned[[('a', RepeatSeq(Integer)), ('b', Long), ('c', UniqueLong(not_null))]-1000][DATAGEN_SEED=1704242734, IGNORE_ORDER({'local': True})] Sending interrupt signal to process

@mythrocks
Copy link
Collaborator Author

mythrocks commented Jan 3, 2024

No, no performance issues of which I'm aware. I'll double-check locally.

Edit: These both run end-to-end in about 20 seconds locally.

@mythrocks mythrocks closed this Jan 3, 2024
@mythrocks mythrocks reopened this Jan 3, 2024
@mythrocks
Copy link
Collaborator Author

Build

@mythrocks mythrocks merged commit 48b7988 into NVIDIA:branch-24.02 Jan 4, 2024
77 of 78 checks passed
mythrocks added a commit to mythrocks/spark-rapids that referenced this pull request Jan 16, 2024
Fixes NVIDIA#10195.

This is similar to the fix in NVIDIA#10143.  This commit changes the test datagens used in
the window function tests such that the order-by columns produce deterministic ordering.

When the ordering is ambiguous, it can produce unexpected results from window functions,
if the `order-by` spec includes the ambiguous columns.

Signed-off-by: MithunR <[email protected]>
mythrocks added a commit that referenced this pull request Jan 18, 2024
Fixes #10195.

This is similar to the fix in #10143.  This commit changes the test datagens used in
the window function tests such that the order-by columns produce deterministic ordering.

When the ordering is ambiguous, it can produce unexpected results from window functions,
if the `order-by` spec includes the ambiguous columns.

Signed-off-by: MithunR <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test Only impacts tests
Projects
None yet
2 participants