Add date and timestamp support to to_json [databricks] #9600
Conversation
Signed-off-by: Andy Grove <[email protected]>
aa0e1e3 to e7a1c13
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCast.scala (outdated, resolved)
build
Build failed with:
@@ -628,7 +628,8 @@ def test_read_case_col_name(spark_tmp_path, v1_enabled_list, col_name):
     pytest.param(True, marks=pytest.mark.xfail(reason='https://github.com/NVIDIA/spark-rapids/issues/9517')),
     False
 ])
-def test_structs_to_json(spark_tmp_path, data_gen, ignore_null_fields, pretty):
+@pytest.mark.parametrize('timezone', ['UTC', 'Etc/UTC'])
+def test_structs_to_json(spark_tmp_path, data_gen, ignore_null_fields, pretty, timezone):
Can we add a fallback test for non-UTC?
-    pytest.param(date_gen, marks=pytest.mark.xfail(reason='https://github.com/NVIDIA/spark-rapids/issues/9515')),
-    pytest.param(timestamp_gen, marks=pytest.mark.xfail(reason='https://github.com/NVIDIA/spark-rapids/issues/9515')),
+    date_gen,
+    timestamp_gen,
Can we add some fallback tests for the options timestampFormat, dateFormat, and for when "spark.sql.legacy.timeParserPolicy" is set to LEGACY? At least from the code, all of these impact the format that is used to write the JSON output.
if (hasTimestamps) {
  // check if the default format is being used
  val defaultFormat = "yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]"
  val timestampFormat = a.options.getOrElse("timestampFormat", defaultFormat)
I don't think this does 100% of what we want: it does not include a check for the legacyTimeParserPolicy. Either we need to use JSONOptions directly to get the format that we should use, or we need to replicate the behavior exactly. I would prefer JSONOptions if possible, even through a trampoline, because it lets us know exactly what it is, even if it changes from one version of Spark to another.
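As a hedged sketch of the trampoline idea (the object name, method name, and package placement below are illustrative, and the field exposing the resolved write format differs across Spark versions), the point is to let Spark's own JSONOptions resolve the format, including the legacy timeParserPolicy handling, instead of replicating that logic in the plugin:

```scala
// Hypothetical trampoline: JSONOptions is private[sql], so a helper placed in an
// org.apache.spark.sql.* package can construct it and hand the resolved format
// back to the plugin.
package org.apache.spark.sql.rapids.shims

import org.apache.spark.sql.catalyst.json.JSONOptions

object JsonWriteOptionsTrampoline {
  def resolvedTimestampFormat(
      options: Map[String, String],
      sessionZoneId: String): String = {
    val jsonOptions = new JSONOptions(options, sessionZoneId, "")
    // timestampFormatInWrite is the Spark 3.3+ name and already accounts for
    // spark.sql.legacy.timeParserPolicy; older versions expose the resolved
    // format under a different name, which a per-version shim would cover.
    jsonOptions.timestampFormatInWrite
  }
}
```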
@revans2 I have addressed the feedback now.
    rule: DataFromReplacementRule
) extends UnaryExprMeta[StructsToJson](expr, conf, parent, rule) {

  lazy val supportedTimezones = Seq("UTC", "Etc/UTC").map(ZoneId.of)
nit: We can move this into spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala, next to line 440 in 2667941:
val UTC_TIMEZONE_ID = ZoneId.of("UTC").normalized()
  lazy val supportedTimezones = Seq("UTC", "Etc/UTC").map(ZoneId.of)

  override def tagExprForGpu(): Unit = {
    if (expr.options.get("pretty").exists(_.equalsIgnoreCase("true"))) {
Just FYI: #9719 touches the timezone check a little bit, given StructsToJson is also a TimeZoneAwareExpression.
There is a failure after upmerging. I am looking into it.
@revans2 Could you take another look when you get a chance?
We need to be careful to coordinate this going in with the other timezone work being done, but I don't see anything here as a blocker.
@@ -438,6 +438,7 @@ object GpuOverrides extends Logging {
     "the Unicode version used by cuDF and the JVM may differ, resulting in some " +
     "corner-case characters not changing case correctly."
   val UTC_TIMEZONE_ID = ZoneId.of("UTC").normalized()
+  val SUPPORTED_TIMEZONE_IDS = Seq("UTC", "Etc/UTC").map(ZoneId.of)
Is there a reason we need to have an explicit allow list of specific ZoneIds instead of normalizing them and comparing them to UTC_TIMEZONE_ID?
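For illustration, a minimal sketch of that suggestion (the helper object and method names are made up; only the java.time calls are real): normalize the incoming zone and compare it against the already-defined UTC_TIMEZONE_ID, since both "UTC" and "Etc/UTC" normalize to the same fixed offset:

```scala
import java.time.ZoneId

object TimezoneSupport {
  // Same definition as the existing constant in GpuOverrides
  private val UTC_TIMEZONE_ID = ZoneId.of("UTC").normalized()

  // ZoneId.of("UTC") and ZoneId.of("Etc/UTC") both normalize to ZoneOffset.UTC,
  // so a normalized comparison replaces the explicit allow list.
  def isUtcTimezone(timezoneId: String): Boolean =
    ZoneId.of(timezoneId).normalized() == UTC_TIMEZONE_ID
}
```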
Thanks. I wasn't familiar with ZoneId, but that makes sense. I have pushed that fix.
Closes #9515
Part of #9518
This PR adds support for date and timestamp in to_json.
Changes: