Avoid testing `inset` with `NaN` for Spark before 3.2.0 #9911

ttnghia · 2023-11-30T22:46:54Z

Before Apache Spark 3.2.0, 3.1.3, and 3.0.4, the `inset` operator may treat NaN values as distinct (https://issues.apache.org/jira/browse/SPARK-36792), whereas our plugin and Spark 3.2.0 and later compare NaN values as equal. This PR removes NaN from the test input for Spark versions before 3.2.0 and documents this inconsistency in NaN comparison.
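For illustration only, here is a minimal sketch of the kind of version gate described above. The helper names `version_tuple` and `inset_test_values` are hypothetical and are not the functions used in the plugin's integration tests; the sketch only assumes that the Spark version is available as a string such as `3.1.2`.

```python
import math
import re


def version_tuple(version):
    """Parse the leading 'major.minor.patch' of a version string.

    Hypothetical helper for this sketch; suffixes such as '-SNAPSHOT' are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(group) for group in match.groups())


def inset_test_values(spark_version):
    """Return float test inputs, adding NaN only for Spark 3.2.0 and later.

    Older versions may treat NaN values as distinct (SPARK-36792), so a
    comparison against the plugin's output would not be meaningful there.
    """
    values = [0.0, 1.5, -2.25, float("inf"), float("-inf")]
    if version_tuple(spark_version) >= (3, 2, 0):
        values.append(float("nan"))
    return values


# Under IEEE 754 rules NaN is not equal to itself, which is why set
# membership needs special handling to treat NaN values as equal.
nan = float("nan")
assert nan != nan
assert not any(math.isnan(v) for v in inset_test_values("3.1.2"))
assert any(math.isnan(v) for v in inset_test_values("3.2.0"))
```

The gate is on 3.2.0 because that is the cutoff this PR uses for the test input, even though the upstream fix also landed in 3.1.3 and 3.0.4.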

Closes #9687.

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Signed-off-by: Peixin Li <[email protected]>

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

fix NVIDIA#9493
fix NVIDIA#9844

The Python runner uses two separate threads to write and read data with Python processes. However, on DB 13.3 it becomes single-threaded, which means reading and writing run on the same thread. Now the first read always happens before the first write, but the original BatchQueue blocks the first read until the first write is done, so it waits forever.

Changes made:
- Update the BatchQueue to support asking for a batch instead of waiting until one is inserted into the queue. This eliminates the ordering requirement between reading and writing.
- Introduce a new class named BatchProducer that works with the new BatchQueue to support peeking at the row count on demand for reading.
- Apply this new BatchQueue to the relevant plans.
- Update the Python runners to support writing one batch at a time for the single-threaded model.
- Found an issue with PythonUDAF and RunningWindowFunctionExec, which may be a bug specific to DB 13.3, and added a test (test_window_aggregate_udf_on_cpu) for it.
- Other small refactors.

---------

Signed-off-by: Firestarman <[email protected]>
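To illustrate the idea in the commit message above, here is a rough Python sketch of a queue that pulls a batch on demand instead of blocking until a writer inserts one. It is not the plugin's BatchQueue or BatchProducer (those are not shown on this page); the class name `PullBatchQueue` and its methods are assumptions made for this example, and batches are modeled as plain lists of rows.

```python
from collections import deque


class PullBatchQueue:
    """Hypothetical on-demand batch queue for a single-threaded reader/writer.

    Instead of waiting for a writer to insert a batch, the queue fetches the
    next batch from the input iterator whenever it is asked and has none cached.
    """

    def __init__(self, source):
        self._source = iter(source)   # upstream input batches
        self._pending = deque()       # batches fetched but not yet written

    def _pull_one(self):
        """Fetch one batch from the source; return False when it is exhausted."""
        batch = next(self._source, None)
        if batch is None:
            return False
        self._pending.append(batch)
        return True

    def peek_num_rows(self):
        """Reader side: row count of the next batch, pulling it if needed."""
        if not self._pending and not self._pull_one():
            return None
        return len(self._pending[0])

    def remove(self):
        """Writer side: take the next batch, pulling it if none is cached."""
        if not self._pending and not self._pull_one():
            return None
        return self._pending.popleft()


# Single-threaded order: the read-side peek comes first and does not block.
queue = PullBatchQueue([[1, 2, 3], [4]])
assert queue.peek_num_rows() == 3    # reader asks before anything is written
assert queue.remove() == [1, 2, 3]   # writer then takes the same cached batch
```

Because either side can trigger the fetch, the result no longer depends on whether the read or the write runs first, which is the ordering requirement the commit message says it removes.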

…VIDIA#9888) * Refactor deploy to support build and deploy arm64 artifacts Signed-off-by: Peixin Li <[email protected]> * test only * reset test code and update * address comment --------- Signed-off-by: Peixin Li <[email protected]>

Signed-off-by: Haoyang Li <[email protected]>

…A#9890)" [databricks] (NVIDIA#9900) * Revert "Remove Databricks 13.3 from release 23.12 [databricks] (NVIDIA#9890)" This reverts commit c59b0a2. * Signing off Signed-off-by: Raza Jafri <[email protected]> --------- Signed-off-by: Raza Jafri <[email protected]>

Signed-off-by: Nghia Truong <[email protected]>

docs/compatibility.md

Signed-off-by: Nghia Truong <[email protected]>

ttnghia · 2023-11-30T22:50:04Z

build

docs/compatibility.md

jlowe · 2023-12-01T14:52:21Z

docs/compatibility.md

+our plugin cannot guarantee to always match its output with Apache Spark if there are `NaN` values
+in the input.


This is a very scary statement as written since it's so vague. Ideally we need to be more specific about which operators are affected; otherwise users may think they can't trust any query that has floating point comparisons in it, since there might be NaN values and the output will be very wrong.

We just went through an exercise to remove "avoid NaNs" floating point generators, so I wouldn't expect our NaN behavior to not match Spark for a lot of things. There should be relatively few things to enumerate here, or am I missing something? Do we know of places where NaNs don't match after Spark 3.2?

I agree we need to be very specific and ideally point to the specific issues that we have open to fix the problems, even if they are low priority because we don't think NaN is common in these specific cases.

docs/compatibility.md

Signed-off-by: Nghia Truong <[email protected]>

ttnghia · 2023-12-01T22:31:05Z

I messed up with the branch code so closing this and will open a new PR.

nvauto and others added 30 commits November 14, 2023 14:14

Merge pull request NVIDIA#9694 from NVIDIA/branch-23.12

7118506

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9699 from NVIDIA/branch-23.12

1a548eb

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9700 from NVIDIA/branch-23.12

2f088e3

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9704 from NVIDIA/branch-23.12

eacf2c8

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9706 from NVIDIA/branch-23.12

3b0d65d

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9709 from NVIDIA/branch-23.12

03496bf

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9710 from NVIDIA/branch-23.12

b21e21f

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9713 from NVIDIA/branch-23.12

a3eee5c

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9725 from NVIDIA/branch-23.12

f092553

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9727 from NVIDIA/branch-23.12

aeb70db

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9730 from NVIDIA/branch-23.12

a46849d

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9734 from NVIDIA/branch-23.12

a3d1e46

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9735 from NVIDIA/branch-23.12

ef427f4

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Initiate project version 24.02.0-SNAPSHOT (NVIDIA#9716)

342b67b

Signed-off-by: Peixin Li <[email protected]>

Merge pull request NVIDIA#9740 from NVIDIA/branch-23.12

3f1cddc

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9742 from NVIDIA/branch-23.12

42f38a2

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9749 from NVIDIA/branch-23.12

9ff3b7c

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9752 from NVIDIA/branch-23.12

c18d6ef

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9754 from NVIDIA/branch-23.12

198cfbd

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9756 from NVIDIA/branch-23.12

ecea3f4

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9757 from NVIDIA/branch-23.12

9a85791

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9762 from NVIDIA/branch-23.12

88d88ac

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9764 from NVIDIA/branch-23.12

ef79e11

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9772 from NVIDIA/branch-23.12

12fa043

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9791 from NVIDIA/branch-23.12

6e3881b

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9793 from NVIDIA/branch-23.12

5e53ed5

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9796 from NVIDIA/branch-23.12

2b116a2

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9797 from NVIDIA/branch-23.12

e61ce58

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9799 from NVIDIA/branch-23.12

fb7b8fc

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Merge pull request NVIDIA#9803 from NVIDIA/branch-23.12

2660b0c

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

nvauto and others added 8 commits November 30, 2023 08:04

Merge pull request NVIDIA#9899 from NVIDIA/branch-23.12

7ae2635

[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]

Fix test_cast_string_ts_valid_format test (NVIDIA#9889)

7c653bf

Signed-off-by: Haoyang Li <[email protected]>

Change test

55200f0

Test with NaN from Spark 3.2

eb68a99

Signed-off-by: Nghia Truong <[email protected]>

Add docs

703c6e8

Signed-off-by: Nghia Truong <[email protected]>

ttnghia added documentation Improvements or additions to documentation test Only impacts tests labels Nov 30, 2023

ttnghia requested a review from jlowe November 30, 2023 22:46

ttnghia self-assigned this Nov 30, 2023

ttnghia commented Nov 30, 2023


ttnghia added 2 commits November 30, 2023 14:48

Fix docs

b5cbe55

Signed-off-by: Nghia Truong <[email protected]>

Fix typo

cbd6112

Signed-off-by: Nghia Truong <[email protected]>

jlowe reviewed Dec 1, 2023


Change docs

a628087

Signed-off-by: Nghia Truong <[email protected]>

ttnghia requested review from tgravescs, GaryShen2008, NvTimLiu and pxLi as code owners December 1, 2023 22:14

ttnghia changed the base branch from branch-24.02 to branch-23.12 December 1, 2023 22:14

ttnghia marked this pull request as draft December 1, 2023 22:18

ttnghia force-pushed the fix_test_in_set branch from fd4148c to a628087 Compare December 1, 2023 22:21

Merge branch 'branch-23.12' into fix_test_in_set

b4e2400

ttnghia force-pushed the fix_test_in_set branch from cee8e17 to b4e2400 Compare December 1, 2023 22:26

ttnghia closed this Dec 1, 2023

ttnghia deleted the fix_test_in_set branch December 1, 2023 22:35
