Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid testing inset with NaN for Spark before 3.2.0 #9911

Closed
wants to merge 70 commits into from

Conversation

ttnghia
Copy link
Collaborator

@ttnghia ttnghia commented Nov 30, 2023

Before Apache Spark 3.2.0, 3.1.3, 3.0.4, the inset operator may treat NaN as different values (https://issues.apache.org/jira/browse/SPARK-36792) while our plugin and Spark from 3.2.0 compares NaN as equal values. This eliminates NaN from the input test for Spark before version 3.2 and updates documentation about such inconsistent outcomes of NaN comparison.

Closes #9687.

nvauto and others added 30 commits November 14, 2023 14:14
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
nvauto and others added 8 commits November 30, 2023 08:04
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
fix NVIDIA#9493
fix NVIDIA#9844

The python runner uses two separate threads to write and read data with Python processes, 
however on DB13.3, it becomes single-threaded, which means reading and writing run on the same thread.
Now the first reading is always ahead of the first writing. But the original BatchQueue will wait
on the first reading until the first writing is done. Then it will wait forever.

Change made:

- Update the BatchQueue to support asking for a batch instead of waiting unitl one is inserted into the queue. 
   This can eliminate the order requirement of reading and writing.
- Introduce a new class named BatchProducer to work with the new BatchQueue to support rows number
   peek on demand for the reading.
- Apply this new BatchQueue to relevant plans.
- Update the Python runners to support writing one batch one time for the singled-threaded model.
- Found an issue about PythonUDAF and RunningWindoFunctionExec, it may be a bug specific to DB 13.3,
   and add a test (test_window_aggregate_udf_on_cpu) for it.
- Other small refactors
---------

Signed-off-by: Firestarman <[email protected]>
…VIDIA#9888)

* Refactor deploy to support build and deploy arm64 artifacts

Signed-off-by: Peixin Li <[email protected]>

* test only

* reset test code and update

* address comment

---------

Signed-off-by: Peixin Li <[email protected]>
…A#9890)" [databricks] (NVIDIA#9900)

* Revert "Remove Databricks 13.3 from release 23.12 [databricks] (NVIDIA#9890)"

This reverts commit c59b0a2.

* Signing off

Signed-off-by: Raza Jafri <[email protected]>

---------

Signed-off-by: Raza Jafri <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
@ttnghia ttnghia added documentation Improvements or additions to documentation test Only impacts tests labels Nov 30, 2023
@ttnghia ttnghia requested a review from jlowe November 30, 2023 22:46
@ttnghia ttnghia self-assigned this Nov 30, 2023
docs/compatibility.md Outdated Show resolved Hide resolved
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
@ttnghia
Copy link
Collaborator Author

ttnghia commented Nov 30, 2023

build

docs/compatibility.md Outdated Show resolved Hide resolved
docs/compatibility.md Outdated Show resolved Hide resolved
Comment on lines 91 to 92
our plugin cannot guarantee to always match its output with Apache Spark if there are `NaN` values
in the input.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very scary statement as written since it's so vague. Ideally we need to be more specific to what operators are affectged or otherwise users may think they can't trust any query that has floating point comparisons in it since there might be NaN values and the output will be very wrong.

We just went through an exercise to remove "avoid NaNs" floating point generators, so I wouldn't expect our NaN behavior to not match Spark for a lot of things. There should be relatively few things to enumerate here, or am I missing something? Do we know of places where NaNs don't match after Spark 3.2?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we need to be very specific and Ideally point to specific issues that we have to fix the problems, even if they are low priority because we don't think NaN is common in these specific cases.

docs/compatibility.md Outdated Show resolved Hide resolved
docs/compatibility.md Outdated Show resolved Hide resolved
Signed-off-by: Nghia Truong <[email protected]>
@ttnghia ttnghia changed the base branch from branch-24.02 to branch-23.12 December 1, 2023 22:14
@ttnghia ttnghia marked this pull request as draft December 1, 2023 22:18
@ttnghia
Copy link
Collaborator Author

ttnghia commented Dec 1, 2023

I messed up with the branch code so closing this and will open a new PR.

@ttnghia ttnghia closed this Dec 1, 2023
@ttnghia ttnghia deleted the fix_test_in_set branch December 1, 2023 22:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation test Only impacts tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] test_in_set fails when DATAGEN_SEED=1698940723
8 participants