Support `spark.sql.parquet.int96RebaseModeInWrite=LEGACY` [databricks] #9658

ttnghia · 2023-11-07T23:40:16Z

This adds support for LEGACY mode in `spark.sql.parquet.int96RebaseModeInWrite`, which allows writing files that contain timestamps before 1582-10-15 by rebasing them from the Proleptic Gregorian calendar to the Julian calendar.
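As a rough illustration of what "rebasing" means here, the following is a minimal sketch in plain Python, not the plugin's GPU implementation. It works on whole days only and ignores time zones and the non-existent dates 1582-10-05 through 1582-10-14, which Spark treats specially. The function names `julian_calendar_jdn`, `rebase_gregorian_to_julian_days`, and `days_since_epoch` are hypothetical and chosen for this example.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)
GREGORIAN_START = date(1582, 10, 15)  # first day of the Gregorian calendar
EPOCH_JDN = 2440588                   # Julian Day Number of 1970-01-01


def julian_calendar_jdn(year, month, day):
    """Julian Day Number of a date label read in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def days_since_epoch(d):
    """Days since 1970-01-01 in the Proleptic Gregorian calendar."""
    return (d - EPOCH).days


def rebase_gregorian_to_julian_days(days):
    """Keep the year-month-day label, but count days in the hybrid calendar.

    Hypothetical helper: dates on or after 1582-10-15 are unchanged;
    earlier dates are reinterpreted as Julian calendar dates.
    """
    d = EPOCH + timedelta(days=days)
    if d >= GREGORIAN_START:
        return days
    return julian_calendar_jdn(d.year, d.month, d.day) - EPOCH_JDN
```

Under these assumptions, the day count for the label 1582-10-04 shifts by 10 days and the one for 1000-01-01 shifts by 5 days, while anything from 1582-10-15 onward is left untouched.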

Closes:

[FEA] Support spark.sql.parquet.int96RebaseModeInWrite= LEGACY #9037

Signed-off-by: Nghia Truong <[email protected]>

ttnghia · 2023-11-07T23:53:55Z

build

ttnghia · 2023-11-07T23:54:50Z

build

ttnghia · 2023-11-08T05:22:00Z

build

revans2

I see the tests are updated and the code says that we now support rebase for int96 writes, but I don't see anywhere that the code was updated for it. I am assuming that the existing code already covered it and we are now enabling it after testing.

integration_tests/src/main/python/parquet_write_test.py

ttnghia · 2023-11-08T17:56:10Z

build

ttnghia · 2023-11-08T18:54:52Z

> I see the tests are updated and the code says that we now support rebase for int96 writes, but I don't see anywhere that the code was updated for it. I am assuming that the existing code already covered it and we are now enabling it after testing.

Right, the existing code already handles the rebase computation. Now we just enable the corresponding code path and update tests.

ttnghia · 2023-11-08T23:05:33Z

build

revans2 · 2023-11-09T20:28:50Z

integration_tests/src/main/python/hive_write_test.py

@@ -85,8 +85,6 @@ def do_write(spark, table_name):
 @pytest.mark.skipif(not is_hive_available(), reason="Hive is missing")
 @pytest.mark.parametrize("gens", [_basic_gens], ids=idfn)
 @pytest.mark.parametrize("storage_with_confs", [
-    ("PARQUET", {"spark.sql.legacy.parquet.datetimeRebaseModeInWrite": "LEGACY",
-                 "spark.sql.legacy.parquet.int96RebaseModeInWrite": "LEGACY"}),


Why drop these?

This is a fallback test. We now fully support LEGACY mode in writes, so we no longer fall back.

This deleted test is similar: https://github.com/NVIDIA/spark-rapids/pull/9658/files#diff-33b5f03f5e0d5aa2b19d7666f2db31cb30b3c8d9b67271017ce960cc6ff40c78L789

ttnghia and others added 30 commits November 2, 2023 10:52

Add check for nested types

e368aa6

Recursively check for rebasing

7da416b

Extract common code

df8f861

Allow nested type in rebase check

95d19ee

Enable nested timestamp in roundtrip test

b426610

Fix another test

7343b17

Signed-off-by: Nghia Truong <[email protected]>

Merge branch 'check_rebase_nested' into rebase_datatime

0d48f57

Enable LEGACY rebase in read

024e6c9

Remove comment

9a39628

Change function/class signatures

e686bb0

Merge branch 'branch-23.12' into rebase_datatime

b49963e

# Conflicts: # sql-plugin/src/main/scala/com/nvidia/spark/RebaseHelper.scala # sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetScan.scala

Complete modification

2c232f8

Misc

ac0f3e4

Signed-off-by: Nghia Truong <[email protected]>

Add explicit type

c773794

Signed-off-by: Nghia Truong <[email protected]>

Rename file and add some stuff in DateTimeRebaseHelpers.scala

29df7cd

Move file and rename class

1b5112d

Adopt new enum type

63342a9

Signed-off-by: Nghia Truong <[email protected]>

Add name for the enum classes

6b2d795

Change exception messages

37aa40b

Merge branch 'branch-23.12' into refactor_parquet_scan

d4cdc1b

Does not yet support legacy rebase in read

03f681e

Signed-off-by: Nghia Truong <[email protected]>

Change legacy to corrected mode

14f230f

Signed-off-by: Nghia Truong <[email protected]>

Extract common code

1b464ec

Signed-off-by: Nghia Truong <[email protected]>

Rename functions

0d26d97

Signed-off-by: Nghia Truong <[email protected]>

Reformat

c2504fd

Signed-off-by: Nghia Truong <[email protected]>

Make classes serializable

edb6c81

Signed-off-by: Nghia Truong <[email protected]>

Revert "Support rebase checking for nested dates and timestamps (NVID…

ea86e8f

…IA#9617)" This reverts commit 401d0d8. Signed-off-by: Nghia Truong <[email protected]> # Conflicts: # sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetScan.scala

Merge branch 'refactor_parquet_scan' into rebase_datatime

b14463f

# Conflicts: # sql-plugin/src/main/scala/com/nvidia/spark/RebaseHelper.scala # sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetScan.scala

Implement date time rebase

adc8ae2

Optimize rebase op

791573c

ttnghia added 7 commits November 7, 2023 11:21

Still cast timestamp to the old type after rebasing

273b2c4

Signed-off-by: Nghia Truong <[email protected]>

Rename test

996d9d4

Signed-off-by: Nghia Truong <[email protected]>

Should not transform non-datetime types

5fd6ef5

Signed-off-by: Nghia Truong <[email protected]>

Fix test

4144655

Update tests

5a8b44c

Signed-off-by: Nghia Truong <[email protected]>

Enable int96 rebase in write

e366e5a

Complete tests

8eba053

Signed-off-by: Nghia Truong <[email protected]>

ttnghia added the `task` label (Work required that improves the product but is not user facing) Nov 7, 2023

ttnghia self-assigned this Nov 7, 2023

Revert unrelated changes

bda59ef

Signed-off-by: Nghia Truong <[email protected]>

Merge branch 'branch-23.12' into int96_rebase_write

bbcd9d9

ttnghia mentioned this pull request Nov 8, 2023

Fully support date/time legacy rebase for nested input [databricks] #9660

Merged

revans2 previously approved these changes Nov 8, 2023

integration_tests/src/main/python/parquet_write_test.py

ttnghia added 2 commits November 8, 2023 09:42

Fix tests

62a1686

Signed-off-by: Nghia Truong <[email protected]>

Merge branch 'branch-23.12' into int96_rebase_write

6bda224

ttnghia dismissed revans2’s stale review via 6bda224 November 8, 2023 17:45

Change date/time generators

27967e4

Signed-off-by: Nghia Truong <[email protected]>

Fix test

c67a53a

Signed-off-by: Nghia Truong <[email protected]>

ttnghia requested a review from revans2 November 9, 2023 14:33

revans2 reviewed Nov 9, 2023

ttnghia linked an issue Nov 14, 2023 that may be closed by this pull request

[FEA] Support spark.sql.parquet.int96RebaseModeInWrite= LEGACY #9037

Closed

revans2 approved these changes Nov 14, 2023

revans2 merged commit 4fdd7bd into NVIDIA:branch-23.12 Nov 14, 2023
37 checks passed

ttnghia deleted the int96_rebase_write branch November 14, 2023 17:46
