Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add additional documentation for UTC representation of timestamps #6994

Merged
merged 3 commits into from
Jan 21, 2025

Conversation

Abdullahsab3
Copy link
Contributor

Which issue does this PR close?

closes #6993

Rationale for this change

Additional documentation of what representation should be used for UTC

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 18, 2025
@Abdullahsab3 Abdullahsab3 changed the title Main Add additional documentation for UTC representation of timestamps Jan 18, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution @Abdullahsab3 . This change Makes sense to me.

I spetnt some time messing searching around in this repo and it does appear to use "+00:00" for UTC

https://github.com/search?q=repo%3Aapache%2Farrow-rs%20%22UTC%22&type=code

It also seems to be what is used by DataFusion mostly as well: https://github.com/search?q=repo%3Aapache%2Fdatafusion+%22UTC%22&type=code

💯

@alamb
Copy link
Contributor

alamb commented Jan 18, 2025

Seems as if there is one small CI failure

@alamb
Copy link
Contributor

alamb commented Jan 19, 2025

I plan to merge this in the next few days -- I will leave it open a bit longer to give anyone else a chance to comment who might be interested.

Thanks again @Abdullahsab3

Comment on lines +199 to +204
/// For UTC time
/// ----------------------------
/// For UTC time, it is possible to use either the timezone representation, such as "UTC", or the absolute time zone offset "+00:00".
/// However, it is better to use the offset representation, as it is more explicit and less ambiguous.
/// This also ensures that other arrow-rs functionalities can interpret the UTC timestamps correctly
/// For example, the `with_timezone_utc` method that is applied on timestamp arrays to add the UTC timezone.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding the original issue correctly, this isn't limited to UTC and +00:00, right? If we substitute JST and +09:00 we'd run into the same issue; in this case, it might be worth generalizing this documentation such that it isn't specific to UTC but more around offsets vs named timezones.

Copy link
Contributor Author

@Abdullahsab3 Abdullahsab3 Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is specific to UTC, since arrow-rs functionalities regarding UTC time (such as with_timezone_utc) uses +00:00 to indicate UTC. So if you create a timestamp array with the UTC timezone, and call with_timezone_utc, the call would fail.

I don't believe it's possible to substitute named timezones with fixed offsets. There is daylight savings that needs to be taken into account. for example Europe/Brussels is either +01:00 or +02:00 depending on DST

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but some timezones have fixed offsets where DST doesn't factor in (hence why I gave example of JST). Just thought it might be beneficial to make it clear that even if the named timezone has only 1 fixed offset ever, they still aren't equivalent in the eyes of arrow-rs (and can still highlight UTC as the main example for this).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we can try and make the statement a bit more general. However, as written I think this is an improvement over main so I will merge it and make a follow on PR with proposed improvement

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb alamb merged commit 595a835 into apache:main Jan 21, 2025
26 checks passed
@alamb
Copy link
Contributor

alamb commented Jan 21, 2025

Thank you @Abdullahsab3 and @Jefffrey 🙏

@alamb alamb added the documentation Improvements or additions to documentation label Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UTC timestamps with "+00:00" are different than with UTC
3 participants