Fully support date/time legacy rebase for nested input [databricks] #9660

Merged
merged 78 commits into from
Nov 16, 2023

Conversation

ttnghia
Collaborator

@ttnghia ttnghia commented Nov 8, 2023

This adds full support for date/time legacy rebase when the input contains dates/timestamps nested inside other column types, such as structs and arrays.

Most related tests are updated to reflect the changes.

Depends on:

Closes #1126.


Warning: This PR currently contains code from the dependency PRs above; that code will no longer show up in this diff once they are merged.
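
To illustrate what rebase checking on nested input involves, here is a minimal Python sketch. It is not the plugin's implementation (which operates on GPU columns in Scala); it only shows the idea of recursing through struct-like and array-like values to find any date or timestamp that predates a calendar cutover. The function name `needs_rebase` and both threshold constants are assumptions made for this example, not values taken from this PR.

```python
from datetime import date, datetime

# Illustrative thresholds only (assumptions, not taken from this PR).
# 1582-10-15 is the first day of the Gregorian calendar.
GREGORIAN_CUTOVER = date(1582, 10, 15)
TIMESTAMP_CUTOVER = datetime(1900, 1, 1)


def needs_rebase(value):
    """Return True if any date/timestamp nested in `value` predates its cutover."""
    if value is None:
        return False
    # datetime is a subclass of date, so it must be tested first.
    if isinstance(value, datetime):
        return value < TIMESTAMP_CUTOVER
    if isinstance(value, date):
        return value < GREGORIAN_CUTOVER
    if isinstance(value, dict):
        # Struct-like value: check every field.
        return any(needs_rebase(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        # Array-like value: check every element.
        return any(needs_rebase(v) for v in value)
    # Any other type carries no date/time information.
    return False


row = {"id": 1, "events": [{"child0": date(1500, 1, 1)}, {"child0": date(2020, 1, 1)}]}
print(needs_rebase(row))
```

A check that only inspects top-level columns would miss the date inside `events` here, which is the gap the nested support is meant to close.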

ttnghia and others added 30 commits August 28, 2023 16:15
Signed-off-by: Nghia Truong <[email protected]>
# Conflicts:
#	sql-plugin/src/main/scala/com/nvidia/spark/RebaseHelper.scala
#	sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetScan.scala
…IA#9617)"

This reverts commit 401d0d8.

Signed-off-by: Nghia Truong <[email protected]>

# Conflicts:
#	sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetScan.scala
# Conflicts:
#	sql-plugin/src/main/scala/com/nvidia/spark/RebaseHelper.scala
#	sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetScan.scala
# Conflicts:
#	integration_tests/src/main/python/parquet_test.py
# Conflicts:
#	integration_tests/src/main/python/parquet_test.py
#	sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetScan.scala
@ttnghia ttnghia added the "task" label (Work required that improves the product but is not user facing) Nov 8, 2023
@ttnghia ttnghia self-assigned this Nov 8, 2023
@ttnghia ttnghia changed the title from "Fully support date/time legacy rebase for nested input" to "Fully support date/time legacy rebase for nested input [databricks]" Nov 8, 2023
# Conflicts:
#	integration_tests/src/main/python/parquet_test.py
#	integration_tests/src/main/python/parquet_write_test.py
#	sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetScan.scala
#	sql-plugin/src/main/scala/com/nvidia/spark/rapids/datetimeRebaseUtils.scala
Comment on lines -316 to -317
# Once https://github.com/NVIDIA/spark-rapids/issues/1126 is fixed delete this test and merge it
# into test_parquet_read_roundtrip_datetime

This deleted test has been merged into test_parquet_read_roundtrip_datetime.

Comment on lines +78 to +83
parquet_datetime_in_struct_gen = [
    StructGen([['child' + str(ind), sub_gen] for ind, sub_gen in enumerate(parquet_datetime_gen_simple)])]
parquet_datetime_in_array_gen = [ArrayGen(sub_gen, max_length=10) for sub_gen in
                                 parquet_datetime_gen_simple + parquet_datetime_in_struct_gen]
parquet_nested_datetime_gen = parquet_datetime_gen_simple + parquet_datetime_in_struct_gen + \
    parquet_datetime_in_array_gen

Simplified the data generators a bit: they were too heavy, and the tests using them (especially the Parquet read tests) had become very slow.
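
To show how the three lists in the snippet above compose, here is a small self-contained sketch that uses plain tuples as stand-ins for the real `StructGen` and `ArrayGen` classes. The contents of `simple` are placeholders; the actual members of `parquet_datetime_gen_simple` are not shown in this thread.

```python
# Placeholder entries standing in for the real simple date/time generators.
simple = ["date_gen", "timestamp_gen"]

# One struct generator whose children are all of the simple generators.
in_struct = [("struct", [("child" + str(i), g) for i, g in enumerate(simple)])]

# One array generator per simple generator, plus one array of the struct.
in_array = [("array", g) for g in simple + in_struct]

# Final list: simple + struct + arrays.
nested = simple + in_struct + in_array

for g in nested:
    print(g)
```

With n simple generators, this composition yields 2n + 2 generators in total (n simple, 1 struct, n + 1 arrays), so the size of the test matrix grows linearly with the simple list.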

# Conflicts:
#	integration_tests/src/main/python/parquet_write_test.py
@ttnghia
ttnghia commented Nov 15, 2023

build

revans2 previously approved these changes Nov 15, 2023
@ttnghia ttnghia marked this pull request as ready for review November 15, 2023 18:26
@ttnghia
ttnghia commented Nov 16, 2023

build

@ttnghia ttnghia requested a review from revans2 November 16, 2023 00:15
@ttnghia
ttnghia commented Nov 16, 2023

build

@revans2 revans2 merged commit a7fa0df into NVIDIA:branch-23.12 Nov 16, 2023
37 checks passed
@ttnghia ttnghia deleted the rebase_nested_timestamp branch November 16, 2023 14:39
Labels
task Work required that improves the product but is not user facing
Development

Successfully merging this pull request may close these issues.

[FEA] support rebase checking for nested dates and timestamps
2 participants