[WIP] Move timezone check to each operator [databricks] #9482
```diff
@@ -1173,3 +1173,14 @@ def get_25_partitions_df(spark):
                          StructField("c3", IntegerType())])
     data = [[i, j, k] for i in range(0, 5) for j in range(0, 5) for k in range(0, 100)]
     return spark.createDataFrame(data, schema)
+
+# If the timezone is non-UTC and the rebase mode is LEGACY, writing to Parquet will fail
+# because the GPU currently does not support it. On Databricks the default datetime rebase
+# mode is LEGACY, unlike regular Spark, so some cases will fail when the timezone is
+# non-UTC on DB. The following configs are for DB and ensure the rebase mode is not LEGACY.
+writer_confs_for_DB = {
+    'spark.sql.parquet.datetimeRebaseModeInWrite': 'CORRECTED',
+    'spark.sql.parquet.datetimeRebaseModeInRead': 'CORRECTED',
+    'spark.sql.parquet.int96RebaseModeInWrite': 'CORRECTED',
+    'spark.sql.parquet.int96RebaseModeInRead': 'CORRECTED'
+}
```

Review comment on `writer_confs_for_DB`:

This is a huge change for our testing. I think what we really want for now is for all of the file read/write operators to fall back to the CPU if there is a timestamp involved at all. It does not matter whether we have LEGACY or not. Until the code has been checked and, if needed, updated to work for other time zones, we need to fall back to the CPU and skip the tests like we are doing in other places. I am not saying that the changes you have made to the plugin are wrong; I am just saying that I don't think this is the right way to fix the tests, and I don't have time right now to fully review that all of the tests you updated didn't lose some kind of coverage that we need/want.

Follow-up review comment:

I still don't want this in here. I am fine if we xfail the DB write tests and point to why when the timezone is not UTC. But I don't want to change what we are testing unless we go through each test and verify that we are not losing coverage. This PR is already big enough; if you want to make this change, it should be split off into another PR.
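For context, the `CORRECTED` rebase modes tell Spark to read and write timestamps directly in the Proleptic Gregorian calendar, so no LEGACY rebase path is exercised. Below is a minimal, self-contained PySpark sketch of applying the same confs around a Parquet write; the app name and output path are illustrative, not from this PR:

```python
# Illustrative only: applying the CORRECTED rebase-mode confs from the diff above
# to a SparkSession before writing timestamp data to Parquet.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("rebase-mode-sketch")  # hypothetical app name
         .config("spark.sql.parquet.datetimeRebaseModeInWrite", "CORRECTED")
         .config("spark.sql.parquet.datetimeRebaseModeInRead", "CORRECTED")
         .config("spark.sql.parquet.int96RebaseModeInWrite", "CORRECTED")
         .config("spark.sql.parquet.int96RebaseModeInRead", "CORRECTED")
         .getOrCreate())

# With CORRECTED set, timestamps are written as-is in the Proleptic Gregorian
# calendar, so no legacy-calendar rebasing happens on read or write.
df = spark.sql("SELECT timestamp '2023-01-01 00:00:00' AS ts")
df.write.mode("overwrite").parquet("/tmp/rebase_sketch")  # hypothetical path
```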
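As a sketch of the xfail approach the reviewer suggests above: mark the DB write tests as expected failures when the timezone is not UTC, pointing at the reason. The `is_utc()` helper here is hypothetical and keys off the `TZ` environment variable; the real test harness may detect the session timezone differently:

```python
# Hypothetical sketch: xfail DB write tests when the timezone is not UTC,
# with a reason, instead of changing the confs under test.
import os
import time
import pytest

def is_utc():
    # Assumption: the test timezone comes from the TZ environment variable,
    # falling back to the system zone name.
    tz = os.environ.get("TZ", time.tzname[0])
    return tz in ("UTC", "GMT")

@pytest.mark.xfail(not is_utc(),
                   reason="Databricks defaults to LEGACY datetime rebase mode, "
                          "which the GPU does not support for non-UTC timezones")
def test_parquet_write_timestamps_db():
    ...  # the actual write test body would go here
```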
Review comment: Can we have a follow-on issue to enable and test `sequence` for all time zones? We don't support timestamps for `sequence` currently, and there are a lot of tests that are failing or skipped for no good reason.