Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support from_utc_timestamp on the GPU for non-UTC timezones (non-DST) #9810

Merged
merged 12 commits into from
Nov 28, 2023

Conversation

NVnavkumar
Copy link
Collaborator

@NVnavkumar NVnavkumar commented Nov 21, 2023

Fixes #9802 and closes #6831

This uses the GpuTimeZoneDB from spark-rapids-jni to enable from_utc_timestamp on the GPU for non-UTC timezones (currently without DST).

@NVnavkumar NVnavkumar self-assigned this Nov 21, 2023
integration_tests/src/main/python/date_time_test.py Outdated Show resolved Hide resolved
inputCv,
ZoneId.of(zoneStr))
withResource(createColumnVector(epochSeconds)) { inputCv =>
val actualRet = if (useGPU) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to compare CPU TimezoneDB result against Spark.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this includes part of another PR #9739 which should be merged first after NVIDIA/spark-rapids-jni#1553 is merged

@NVnavkumar NVnavkumar marked this pull request as ready for review November 22, 2023 18:21
@NVnavkumar NVnavkumar changed the title [WIP] Support from_utc_timestamp on the GPU for non-UTC timezones (non-DST) Support from_utc_timestamp on the GPU for non-UTC timezones (non-DST) Nov 22, 2023
@NVnavkumar NVnavkumar requested a review from revans2 November 22, 2023 18:51
@NVnavkumar
Copy link
Collaborator Author

build

1 similar comment
@NVnavkumar
Copy link
Collaborator Author

build

Copy link
Collaborator

@winningsix winningsix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General looks good to me. A few minor comments left.

val epochDays = getEpochDays(startYear, endYear)
testFromDateToTimestamp(epochDays, zoneStr)
}
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: revert this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What did you indicate to revert?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra line here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per style, we generally have a blank line at the end of every file btw.

class TimeZoneSuite extends SparkQueryCompareTestSuite with BeforeAndAfterAll {
private val useGPU = true
private val testAllTimezones = false
private val testAllYears = false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: parameterized this in test cases below?

@NVnavkumar
Copy link
Collaborator Author

build

1 similar comment
@NVnavkumar
Copy link
Collaborator Author

build

@sameerz sameerz added the feature request New feature or request label Nov 28, 2023
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this looks good, but I really want to see us starting to run automated tests in other time zones. Have you tested the date_time_test.py changes with TZ set to a non-UTC time zone?


@pytest.mark.parametrize('time_zone', ["UTC", "Asia/Shanghai", "EST", "MST", "VST"], ids=idfn)
@pytest.mark.parametrize('data_gen', [timestamp_gen], ids=idfn)
@pytest.mark.xfail(condition = is_not_utc(), reason = 'xfail non-UTC time zone tests because of https://github.com/NVIDIA/spark-rapids/issues/9653')
def test_from_utc_timestamp_supported_timezones(data_gen, time_zone):
# Remove spark.rapids.test.CPU.timezone configuration when GPU kernel is ready to really test on GPU
# TODO: Remove spark.rapids.sql.nonUTC.enabled configuration
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an issue for this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #9881 and added to timezone epic

@NVnavkumar
Copy link
Collaborator Author

In general this looks good, but I really want to see us starting to run automated tests in other time zones. Have you tested the date_time_test.py changes with TZ set to a non-UTC time zone?

I have tested it with other timezones. from_utc_timestamp explicitly doesn't actually care about the session timezone so it behaves pretty much the same way as before. But again with Xfail, will need to figure out if it does fail in the same way.

@NVnavkumar
Copy link
Collaborator Author

NVnavkumar commented Nov 28, 2023

In general this looks good, but I really want to see us starting to run automated tests in other time zones. Have you tested the date_time_test.py changes with TZ set to a non-UTC time zone?

I have tested it with other timezones. from_utc_timestamp explicitly doesn't actually care about the session timezone so it behaves pretty much the same way as before. But again with Xfail, will need to figure out if it does fail in the same way.

Automated testing of non-UTC timezones should be captured by #9855 and #9633

@NVnavkumar NVnavkumar merged commit 9af612f into NVIDIA:branch-23.12 Nov 28, 2023
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
4 participants