Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DateutilTimePeriodAdjuster #1233

Merged
merged 4 commits into from
May 31, 2024
Merged

Add DateutilTimePeriodAdjuster #1233

merged 4 commits into from
May 31, 2024

Conversation

plypaul
Copy link
Contributor

@plypaul plypaul commented May 30, 2024

Description

This adds an implementation for adjusting time periods using dateutil instead
of pandas.

@cla-bot cla-bot bot added the cla:yes label May 30, 2024
@plypaul plypaul marked this pull request as ready for review May 30, 2024 01:51
Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks! nitty nitty inline but the only really important thing is to sync up with @courtneyholcomb about the sub-daily granularity transition and whether exceptions are preferable to silent date assumptions.

Comment on lines +56 to +57
if time_granularity is TimeGranularity.DAY:
return date_to_adjust
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes all input has a minimum granularity of daily and silently returns whatever is passed in. The Pandas implementation raises an exception in this case, which might be preferable for testing scenarios.

I'm fine with either approach since we're going to update this within the next couple of weeks, but @courtneyholcomb is going to be doing the work on opening up granularity so maybe check in with her about what she prefers before merging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinged her, but preferring to fix later if there are issues to unblock merging of the stack.

elif time_granularity is TimeGranularity.WEEK:
return date_to_adjust + relativedelta(weekday=dateutil.relativedelta.MO(-1))
elif time_granularity is TimeGranularity.MONTH:
return date_to_adjust + relativedelta(day=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API.... TIL day=1 and days=1 are vastly different things.

Oh well, it works, just read very carefully is all.....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved documentation and more examples of that API would have helped - had to do play around with the methods in the REPL.

elif time_granularity is TimeGranularity.WEEK:
return date_to_adjust + relativedelta(weekday=dateutil.relativedelta.SU(1))
elif time_granularity is TimeGranularity.MONTH:
return date_to_adjust + relativedelta(day=31)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting that docstring note on the class, I would've wondered about this.

def date_times_to_check() -> Sequence[datetime.datetime]: # noqa: D103
date_times = []
# Cover regular and leap years.
start_date_time = datetime.datetime(year=2020, month=1, day=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check the year 2000, which looks like a leap year and is? And also 1900, which looks like a leap year, but isn't?

I hate date/time operations.....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both 1900 and 2000 work, but 1900 exceeds the limit of supported times, so that was left as a comment.

), f"Expansion mismatch: {pandas_adjuster_result=} {dateutil_adjuster_result=} {time_granularity=}"
finished_count += 1
if finished_count % 100000 == 0 or finished_count == test_case_count:
logger.info(f"Progress {finished_count / test_case_count * 100:.0f}%")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many INFO rows does this print? If it's a lot we might want to put it in debug just so it's not filling up CI logs, but I assume it' snot that many since it's one every 100k.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<10 lines

for time_granularity in TimeGranularity
for end_time in (
start_time + datetime.timedelta(days=day_offset)
for day_offset in range(grain_to_count_in_year[time_granularity] + 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+2? Why plus two? To guarantee a spillover across the year boundary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, added a comment about it.

@@ -0,0 +1,3652 @@
Date Period Grain Period Start Period End
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a leftover from an earlier version of the test. Remove?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@plypaul plypaul force-pushed the p--py312--02 branch 2 times, most recently from fa6d0a0 to 2f50de9 Compare May 31, 2024 01:09
Base automatically changed from p--py312--01 to main May 31, 2024 01:12
This adds an implementation for adjusting time periods using `dateutil` instead
of `pandas`. Later, `DateutilTimePeriodAdjuster` will replace the `pandas`
implementation.
@plypaul plypaul merged commit 4bbccce into main May 31, 2024
15 checks passed
@plypaul plypaul deleted the p--py312--02 branch May 31, 2024 01:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants