Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support date_format via Gpu for non-UTC time zone [databricks] #9721

Merged
merged 11 commits into from
Dec 14, 2023

Conversation

res-life
Copy link
Collaborator

@res-life res-life commented Nov 15, 2023

PR includes:

  • Support date_format via Gpu for non-UTC time zone
  • Split test_date_format to test date and time separately. date_format(date) will introduce cast(date, timestamp) which is not supported in non-UTC now. Only test date_format(timestamp)
  • Add fall back case for non-supported time zone.

Signed-off-by: Chong Gao [email protected]

Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core changes to dateTimeExpressions.scala looks good, but I don't feel good checking in code that changes how we tag TimeZoneAware/TimestampTypes until we have tests in place that verify that we are still falling back in all the right places.

@res-life
Copy link
Collaborator Author

build

@NVnavkumar NVnavkumar linked an issue Nov 21, 2023 that may be closed by this pull request
@res-life
Copy link
Collaborator Author

build

@sameerz sameerz added the task Work required that improves the product but is not user facing label Nov 21, 2023
@res-life res-life changed the title Support CPU path for date_format function with timezone Support GPU path for date_format function with timezone Nov 27, 2023
@res-life
Copy link
Collaborator Author

build

@res-life res-life changed the base branch from branch-23.12 to branch-24.02 November 28, 2023 01:57
@res-life res-life changed the title Support GPU path for date_format function with timezone Support date_format via Gpu for non-UTC time zone Dec 5, 2023
@res-life res-life force-pushed the non-utc-date_format branch from 4fbfbb5 to e3941c2 Compare December 5, 2023 05:29
@res-life
Copy link
Collaborator Author

res-life commented Dec 5, 2023

Please review the last commit.

@res-life
Copy link
Collaborator Author

res-life commented Dec 5, 2023

build

integration_tests/src/main/python/time_zone_utils.py Outdated Show resolved Hide resolved

# get from Java:
# ZoneId.getAvailableZoneIds
# id.getRules.isFixedOffset || id.getRules.getTransitionRules.isEmpty
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, can we use Python Java gateway (e.g., Py4j) to obtain the list instead of hardcoding it here? Besides the maintenance, just want to ensure we're consistent with Java on this behavior id.getRules.isFixedOffset || id.getRules.getTransitionRules.isEmpty.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's invoked in the pytest marks, Python Java gateway is not available when Spark session is not started.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Now is using the following:

@pytest.mark.skipif(is_supported_time_zone() ......)

The is_supported_time_zone works in the pytest marks.

integration_tests/src/main/python/date_time_test.py Outdated Show resolved Hide resolved
assert_gpu_and_cpu_are_equal_collect(
lambda spark : unary_op_df(spark, data_gen).selectExpr("date_format(a, '{}')".format(date_format)))

@pytest.mark.parametrize('date_format', supported_date_formats, ids=idfn)
@pytest.mark.parametrize('data_gen', [timestamp_gen], ids=idfn)
@pytest.mark.xfail(is_dst_time_zone(), reason="only support non-DST time zone, refer to https://github.com/NVIDIA/spark-rapids/issues/6839")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid xfail here?

If is_dst_time_zone = false, test result equals without fallback.
If is_dst_time_zone = true, test result equals with CPU operator fallback. This one can be combined with test_date_format_for_date.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using xfail is to record the xfail/xpass cases.
I'm OK to use skipif, but we may forget it in the future.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use fallback check instead? I am nervous on xfail which hides errors.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will do it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@res-life
Copy link
Collaborator Author

res-life commented Dec 5, 2023

build

@res-life
Copy link
Collaborator Author

res-life commented Dec 6, 2023

This one is depending on #9814
When #9814 is merged, I'll then update this one.

@res-life
Copy link
Collaborator Author

res-life commented Dec 8, 2023

Test passed locally:

  • export TZ=Iran; pass
  • export TZ=America/Los_Angeles; fallback works

@res-life res-life changed the title Support date_format via Gpu for non-UTC time zone Support date_format via Gpu for non-UTC time zone [databricks] Dec 8, 2023
@res-life
Copy link
Collaborator Author

res-life commented Dec 8, 2023

build

@res-life res-life requested a review from NVnavkumar December 8, 2023 10:48
@allow_non_gpu(*non_utc_allow)
def test_date_format(data_gen, date_format):
@pytest.mark.parametrize('data_gen', [date_gen], ids=idfn)
@allow_non_gpu('ProjectExec')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to check whether it's supported timezone?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

date_format(date) will introduce cast(date as timestamp) which is not supported in non-UTC now.
After we support cast(date as timestamp), then we will update this case.

assert_gpu_fallback_collect(
lambda spark : unary_op_df(spark, data_gen).selectExpr("date_format(a, '{}')".format(date_format)),
'ProjectExec',
conf)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1681,8 +1681,9 @@ object GpuOverrides extends Logging {
.withPsNote(TypeEnum.STRING, "A limited number of formats are supported"),
TypeSig.STRING)),
(a, conf, p, r) => new UnixTimeExprMeta[DateFormatClass](a, conf, p, r) {
override def isTimeZoneSupported = true
override def convertToGpu(lhs: Expression, rhs: Expression): GpuExpression =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: code style. not sure whether a line break need here.

winningsix
winningsix previously approved these changes Dec 12, 2023
@@ -1681,8 +1681,9 @@ object GpuOverrides extends Logging {
.withPsNote(TypeEnum.STRING, "A limited number of formats are supported"),
TypeSig.STRING)),
(a, conf, p, r) => new UnixTimeExprMeta[DateFormatClass](a, conf, p, r) {
override def isTimeZoneSupported = true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is due to the fact that DateFormatClass is not considered a TimeZoneAwareExpression, but requires support for non-UTC timezones. It's the last check for timezone requirement in Expressions. By default, this flag is false.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ttnghia FYI. Some original notes around this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems not a good practice to set some value as false by default then override true like this. Can we compute that value based on the input in the object constructor?

@res-life
Copy link
Collaborator Author

build

@NVnavkumar
Copy link
Collaborator

NVnavkumar commented Dec 14, 2023

pre-merge failed due to #10050

@NVnavkumar
Copy link
Collaborator

build

@res-life res-life merged commit 782de5c into NVIDIA:branch-24.02 Dec 14, 2023
37 of 38 checks passed
@res-life res-life deleted the non-utc-date_format branch December 14, 2023 07:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
task Work required that improves the product but is not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Support DateFormat on GPU with a non-UTC timezone
7 participants