[WIP] Move timezone check to each operator [databricks] #9482

Closed
Changes from 1 commit
42 commits
d8e77b2
Add test cases for timezone-aware operators
Oct 19, 2023
3f781a4
Move timezone check to each operator
Oct 19, 2023
d5a6d7a
Merge branch 23.12
Oct 27, 2023
b3fa3ee
Update
Oct 27, 2023
c31b2e3
debug
Oct 27, 2023
a7c8996
debug
Oct 27, 2023
2878c5c
Add timezone test mark
Oct 27, 2023
705f8b5
Minor update
Nov 1, 2023
882b751
Fix failed cmp case on Spark311; Restore a python import; minor changes
Nov 1, 2023
aec893c
Fix failure on Databricks
Nov 2, 2023
7f81644
Update test cases for Databricks
Nov 2, 2023
bcc1f5b
Update test cases for Databricks
Nov 2, 2023
505b72e
Fix delta lake test cases.
Nov 3, 2023
07942ea
Fix delta lake test cases.
Nov 3, 2023
3033bc3
Remove the skip logic when time zone is not UTC
Nov 7, 2023
a852455
Add time zone config to set non-UTC
Nov 7, 2023
0358cd4
Add fallback case for cast_test.py
Nov 7, 2023
f6ccadd
Add fallback case for cast_test.py
Nov 7, 2023
21d5a69
Add fallback case for cast_test.py
Nov 8, 2023
e2aa9da
Add fallback case for cast_test.py
Nov 8, 2023
9eab476
Update split_list
Nov 8, 2023
e231a80
Add fallback case for cast_test.py
Nov 8, 2023
71928a0
Add fallback case for cast_test.py
Nov 8, 2023
ca23932
Add fallback cases for cmp_test.py
Nov 9, 2023
ee60bea
Add fallback tests for json_test.py
firestarman Nov 9, 2023
d403c59
add non_utc fallback for parquet_write qa_select and window_function …
thirtiseven Nov 9, 2023
dd5ad0b
Add fallback tests for conditionals_test.py
winningsix Nov 9, 2023
058e13e
Add fallback cases for collection_ops_test.py
Nov 9, 2023
fc3a678
add fallback tests for date_time_test
thirtiseven Nov 9, 2023
938c649
clean up spark_session.py
thirtiseven Nov 9, 2023
befa39d
Add fallback tests for explain_test and csv_test
winningsix Nov 9, 2023
cf2c621
Update test case
Nov 9, 2023
c298d5f
update test case
Nov 9, 2023
09e772c
Add default value
Nov 10, 2023
f43a8f9
Remove useless is_tz_utc
Nov 10, 2023
5882cc3
Fix fallback cases
Nov 10, 2023
7a53dc2
Add bottom check for time zone; Fix ORC check
Nov 13, 2023
7bd9ef8
By default, ExecCheck does not check UTC time zone
Nov 13, 2023
9817c4e
For common expr like AttributeReference, just skip the UTC checking
Nov 13, 2023
f8505b7
For common expr like AttributeReference, just skip the UTC checking
Nov 13, 2023
fa1c84d
For common expr like AttributeReference, just skip the UTC checking
Nov 13, 2023
fbbbd5b
Update test cases
Nov 14, 2023
Add fallback cases for cmp_test.py
Chong Gao committed Nov 9, 2023
commit ca23932bb0402b95e53ec1146f81e8fd93c041e7
integration_tests/src/main/python/cmp_test.py: 27 changes (24 additions, 3 deletions)
@@ -14,9 +14,10 @@
 
 import pytest
 
-from asserts import assert_gpu_and_cpu_are_equal_collect
+from asserts import assert_gpu_and_cpu_are_equal_collect, assert_gpu_fallback_collect
+from conftest import is_utc, is_not_utc
 from data_gen import *
-from marks import disable_timezone_test
+from marks import allow_non_gpu
 from spark_session import with_cpu_session, is_before_spark_330
 from pyspark.sql.types import *
 import pyspark.sql.functions as f
@@ -291,8 +292,8 @@ def test_filter_with_project(data_gen):
 # no columns to actually filter. We are making it happen here with a sub-query
 # and some constants that then make it so all we need is the number of rows
 # of input.
+@pytest.mark.xfail(is_not_utc(), reason="TODO sub-issue in https://github.com/NVIDIA/spark-rapids/issues/9653 to support non-UTC tz for DateAddInterval")
 @pytest.mark.parametrize('op', ['>', '<'])
-@disable_timezone_test
 def test_empty_filter(op, spark_tmp_path):
 
     def do_it(spark):
@@ -309,6 +310,26 @@ def do_it(spark):
         return spark.sql(f"select * from empty_filter_test2 where test {op} current_date")
     assert_gpu_and_cpu_are_equal_collect(do_it)
 
+
+@allow_non_gpu('ProjectExec', 'FilterExec')
+@pytest.mark.skipif(is_utc(), reason="TODO sub-issue in https://github.com/NVIDIA/spark-rapids/issues/9653 to support non-UTC tz for DateAddInterval")
+@pytest.mark.parametrize('op', ['>', '<'])
+def test_empty_filter_for_non_utc(op, spark_tmp_path):
+
+    def do_it(spark):
+        df = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
+        # we repartition the data to 1 because for some reason Spark can write 4 files for 3 rows.
+        # In this case that causes a race condition with the last aggregation which can result
+        # in a null being returned. For some reason this happens a lot on the GPU in local mode
+        # and not on the CPU in local mode.
+        df.repartition(1).write.mode("overwrite").parquet(spark_tmp_path)
+        df = spark.read.parquet(spark_tmp_path)
+        curDate = df.withColumn("current_date", f.current_date())
+        curDate.createOrReplaceTempView("empty_filter_test_curDate")
+        spark.sql("select current_date, ((select last(current_date) from empty_filter_test_curDate) + interval 1 day) as test from empty_filter_test_curDate").createOrReplaceTempView("empty_filter_test2")
+        return spark.sql(f"select * from empty_filter_test2 where test {op} current_date")
+    assert_gpu_fallback_collect(do_it, 'DateAddInterval')
+
 def test_nondeterministic_filter():
     assert_gpu_and_cpu_are_equal_collect(
             lambda spark : unary_op_df(spark, LongGen(), 1).filter(f.rand(0) > 0.5))
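This commit pairs two tests around the same query: `test_empty_filter` (now marked `xfail` off UTC) checks GPU/CPU equality, while `test_empty_filter_for_non_utc` (skipped under UTC) checks that `DateAddInterval` falls back to the CPU. Because the two conditions are complementary, each run exercises exactly one expectation. A minimal sketch of that gating pattern follows, assuming pytest is installed. The helpers `get_test_tz`, `is_utc` and `is_not_utc` below are hypothetical stand-ins that read the `TZ` environment variable; the real `conftest` implementations are not part of this diff and may differ.

```python
import os

import pytest


def get_test_tz():
    # Hypothetical stand-in: treat the TZ environment variable as the test
    # time zone, defaulting to UTC when it is unset.
    return os.environ.get("TZ", "UTC")


def is_utc():
    return get_test_tz() == "UTC"


def is_not_utc():
    return not is_utc()


# Runs in every time zone; off UTC a failure is expected (xfail) until the
# GPU supports non-UTC DateAddInterval.
@pytest.mark.xfail(is_not_utc(), reason="non-UTC DateAddInterval not yet on GPU")
def test_gpu_matches_cpu():
    pass  # placeholder for assert_gpu_and_cpu_are_equal_collect(...)


# Skipped under UTC; off UTC it verifies that the expression falls back to the CPU.
@pytest.mark.skipif(is_utc(), reason="fallback only occurs in non-UTC time zones")
def test_falls_back_off_utc():
    pass  # placeholder for assert_gpu_fallback_collect(..., 'DateAddInterval')
```

The mark conditions are evaluated when the decorators run, at import time, so the time zone has to be set before pytest collects the module; changing it mid-session does not re-evaluate the marks.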