-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid testing inset
with NaN
for Spark before 3.2.0
#9911
Conversation
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
Signed-off-by: Peixin Li <[email protected]>
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
[auto-merge] branch-23.12 to branch-24.02 [skip ci] [bot]
fix NVIDIA#9493 fix NVIDIA#9844 The python runner uses two separate threads to write and read data with Python processes, however on DB13.3, it becomes single-threaded, which means reading and writing run on the same thread. Now the first reading is always ahead of the first writing. But the original BatchQueue will wait on the first reading until the first writing is done. Then it will wait forever. Change made: - Update the BatchQueue to support asking for a batch instead of waiting unitl one is inserted into the queue. This can eliminate the order requirement of reading and writing. - Introduce a new class named BatchProducer to work with the new BatchQueue to support rows number peek on demand for the reading. - Apply this new BatchQueue to relevant plans. - Update the Python runners to support writing one batch one time for the singled-threaded model. - Found an issue about PythonUDAF and RunningWindoFunctionExec, it may be a bug specific to DB 13.3, and add a test (test_window_aggregate_udf_on_cpu) for it. - Other small refactors --------- Signed-off-by: Firestarman <[email protected]>
…VIDIA#9888) * Refactor deploy to support build and deploy arm64 artifacts Signed-off-by: Peixin Li <[email protected]> * test only * reset test code and update * address comment --------- Signed-off-by: Peixin Li <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
…A#9890)" [databricks] (NVIDIA#9900) * Revert "Remove Databricks 13.3 from release 23.12 [databricks] (NVIDIA#9890)" This reverts commit c59b0a2. * Signing off Signed-off-by: Raza Jafri <[email protected]> --------- Signed-off-by: Raza Jafri <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
build |
docs/compatibility.md
Outdated
our plugin cannot guarantee to always match its output with Apache Spark if there are `NaN` values | ||
in the input. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very scary statement as written since it's so vague. Ideally we need to be more specific to what operators are affectged or otherwise users may think they can't trust any query that has floating point comparisons in it since there might be NaN values and the output will be very wrong.
We just went through an exercise to remove "avoid NaNs" floating point generators, so I wouldn't expect our NaN behavior to not match Spark for a lot of things. There should be relatively few things to enumerate here, or am I missing something? Do we know of places where NaNs don't match after Spark 3.2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we need to be very specific and Ideally point to specific issues that we have to fix the problems, even if they are low priority because we don't think NaN is common in these specific cases.
Signed-off-by: Nghia Truong <[email protected]>
fd4148c
to
a628087
Compare
cee8e17
to
b4e2400
Compare
I messed up with the branch code so closing this and will open a new PR. |
Before Apache Spark 3.2.0, 3.1.3, 3.0.4, the
inset
operator may treatNaN
as different values (https://issues.apache.org/jira/browse/SPARK-36792) while our plugin and Spark from 3.2.0 comparesNaN
as equal values. This eliminatesNaN
from the input test for Spark before version 3.2 and updates documentation about such inconsistent outcomes ofNaN
comparison.Closes #9687.