Releases: chronotope/chrono
v0.2.8
v0.2.7
v0.2.6
v0.2.5
v0.2.4
v0.2.3
v0.2.2
v0.2.1
v0.2.0
Original announcements:
- https://lifthrasiir.github.io/rustlog/worklog-2015-01-13.html
- https://lifthrasiir.github.io/rustlog/worklog-2015-02-19.html
I've finally got the initial design of Chrono 0.2 working. It took so much time, partly because I'm working on dozens of other projects including Rust libraries. It solves one of the major annoyances with Chrono, so I'm glad that this new design is promising.
Time zones in Chrono 0.1
As always, I start by pointing to Erik Naggum's excellent essay about date and time. (Yes, I'm terrible at storytelling and I won't say a lot about that.) Most importantly, the core aspect of a time zone is that it is ultimately a political creation rather than a natural necessity. This complicates lots of things, though Arthur David Olson and others have done tremendously important work in this regard.
There are several problems with time zones:
1. There is no reliable way to handle the local date in the future.
2. There can exist a local date which occurred at two or more instants.
3. There can exist a local date which has never occurred.
4. The conversion process itself is seriously annoying and easy to get wrong.
(If you are interested in timekeeping, you'll realize that this list applies equally to leap seconds. I had to deal with them in Chrono as well, and chose to make them invisible in normal usage.)
In fact, Chrono's original `Offset` design does explicitly account for these problems. In Chrono, the local date is a concept only meaningful to accessors and formatting routines, eliminating the very source of problem 1. The possibility of problems 2 and 3 is handled via the `LocalResult` enum, while problem 4 is handled via... delegating everything to the `Offset`. This decision is partly because we would only have a handful of `Offset` implementations, so we have to implement them anyway. In the end, we had something like this:
pub trait Offset: Clone + Show {
    fn local_minus_utc(&self) -> Duration;

    fn from_local_date(&self, local: &NaiveDate) -> LocalResult<Date<Self>>;
    fn from_local_time(&self, local: &NaiveTime) -> LocalResult<Time<Self>>;
    fn from_local_datetime(&self, local: &NaiveDateTime) -> LocalResult<DateTime<Self>>;

    fn to_local_date(&self, utc: &NaiveDate) -> NaiveDate;
    fn to_local_time(&self, utc: &NaiveTime) -> NaiveTime;
    fn to_local_datetime(&self, utc: &NaiveDateTime) -> NaiveDateTime;

    fn ymd(&self, year: i32, month: u32, day: u32) -> Date<Self> { ... }
    // other constructors follow
}

pub struct DateTime<Off> {
    datetime: NaiveDateTime,
    offset: Off
}
// Date and Time follow
This sounds good. You can put the offset data into the timezone-aware `DateTime`, and use it to convert to the local date (`to_local_date`). Conversely, a `DateTime` has to be created from an `Offset` so that it converts to the internal representation in UTC (`from_local_date`).
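As a rough illustration of why the `from_local_*` methods return a `LocalResult` rather than a plain value, here is a minimal sketch. It is not Chrono's code: timestamps are plain `i64` seconds instead of Chrono's date and time types, and `utc_from_local` is a hypothetical helper for a toy zone with a single transition. Only the three-variant shape of the enum mirrors Chrono.

```rust
/// Simplified stand-in for Chrono's `LocalResult`.
#[derive(Debug, PartialEq, Clone, Copy)]
enum LocalResult<T> {
    /// The local time never occurred (skipped by a forward transition).
    None,
    /// The local time maps to exactly one instant.
    Single(T),
    /// The local time occurred twice (repeated by a backward transition).
    Ambiguous(T, T),
}

/// Hypothetical helper: converts a local timestamp to UTC for a toy zone
/// whose offset is `before` seconds until the UTC instant `transition_utc`
/// and `after` seconds from then on. All values are in seconds.
fn utc_from_local(local: i64, transition_utc: i64, before: i64, after: i64) -> LocalResult<i64> {
    // Candidate UTC instants under each of the two offsets.
    let via_before = local - before;
    let via_after = local - after;
    // A candidate is valid only if it falls on the side of the transition
    // where its offset was actually in effect.
    let before_ok = via_before < transition_utc;
    let after_ok = via_after >= transition_utc;
    match (before_ok, after_ok) {
        (true, true) => LocalResult::Ambiguous(via_before, via_after),
        (true, false) => LocalResult::Single(via_before),
        (false, true) => LocalResult::Single(via_after),
        (false, false) => LocalResult::None,
    }
}
```

With a forward transition (offset +1h to +2h at instant 0), a local time inside the skipped hour gives `LocalResult::None`; with a backward transition (+2h to +1h), the same local time gives `LocalResult::Ambiguous`.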
But you might wonder: why is `local_minus_utc` separate from `to_local_datetime`? The latter can be implemented via the former, right? Yes! In the current implementation the latter is redundant. And this redundancy suggests a bigger problem.
Originally, `to_local_datetime` was to be used in the absence of the exact offset to UTC. This alone is enough for converting UTC to the local time, and it is still used in the `Offset` conversion where the original value is converted to UTC then to the target time zone. But this is inefficient, especially if we have a large table of zone transitions. Therefore we have `local_minus_utc` for caching the calculated current offset. The caller was expected to call `local_minus_utc` for converting the current value to UTC and `to_local_datetime` etc. for other cases. For fixed-offset time zones like `UTC` these methods will be largely simple arithmetic, so we only pay for what we use.
In reality this scheme didn't work well. `Local` was an example of the glaring problem; since it has to cache the value, it should have a field, but that meant that we need to create a `Local` instance every time we convert to the local date! This defies a simple interface like `dt.with_offset(UTC)`, and since such a `Local` instance doesn't know the exact offset, we have a separate flag indicating whether the offset is correct or not. (In Rust, this would translate to `Option<FixedOffset>`.) This even breaks the original premise of "only paying for what we actually use".
After I realized this problem, I tried several solutions and failed spectacularly. It was clear that we really need two kinds of types, but I was not sure how to do that.
Time zones in Chrono 0.2
Associated types, originally proposed in RFC #195, were a game changer for this problem. They allow two types to be smoothly connected at compile time. The resulting design is as follows:
pub trait Offset: Sized + Clone + fmt::Show {
    fn local_minus_utc(&self) -> Duration;
}

pub trait TimeZone: Sized {
    type Offset: Offset;

    fn from_offset(offset: &Self::Offset) -> Self;

    fn offset_from_local_date(&self, local: &NaiveDate) -> LocalResult<Self::Offset>;
    fn offset_from_local_time(&self, local: &NaiveTime) -> LocalResult<Self::Offset>;
    fn offset_from_local_datetime(&self, local: &NaiveDateTime) -> LocalResult<Self::Offset>;

    fn offset_from_utc_date(&self, utc: &NaiveDate) -> Self::Offset;
    fn offset_from_utc_time(&self, utc: &NaiveTime) -> Self::Offset;
    fn offset_from_utc_datetime(&self, utc: &NaiveDateTime) -> Self::Offset;

    // helpers and constructors follow
}

pub struct DateTime<Tz: TimeZone> {
    datetime: NaiveDateTime,
    offset: Tz::Offset,
}
// Date and Time follow
This new design directly shows that we are dealing with two different types! `TimeZone` creates an `Offset`, a storage-oriented type which can be converted back to the `TimeZone` via `from_offset`. `TimeZone` is used for creating date and time values, while `Offset` is used for converting to the local time. I originally tried to avoid a separate trait for `local_minus_utc`, but ultimately abandoned that plan to make `dt.offset().local_minus_utc()` possible.
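To make the split concrete, here is a self-contained sketch of the same shape using only the standard library. It is a simplification under stated assumptions, not Chrono's implementation: instants are `i64` seconds instead of `NaiveDateTime` and `Duration`, only the UTC-to-offset direction is modelled, `fmt::Debug` stands in for the old `fmt::Show`, and `ToyLocal`, `from_utc`, `local` and `with_timezone` are hypothetical names chosen for illustration.

```rust
use std::fmt::Debug;

/// Storage-oriented half: only knows how far local time is from UTC.
trait Offset: Sized + Clone + Debug {
    fn local_minus_utc(&self) -> i64; // seconds; Chrono returns a Duration
}

/// Factory half: computes the right `Offset` for a given UTC instant.
trait TimeZone: Sized {
    type Offset: Offset;
    fn from_offset(offset: &Self::Offset) -> Self;
    fn offset_from_utc_datetime(&self, utc: i64) -> Self::Offset;
}

/// A fixed offset carries no extra data, so it is its own offset type.
#[derive(Clone, Debug, PartialEq)]
struct FixedOffset(i64);

impl Offset for FixedOffset {
    fn local_minus_utc(&self) -> i64 { self.0 }
}

impl TimeZone for FixedOffset {
    type Offset = FixedOffset;
    fn from_offset(offset: &FixedOffset) -> Self { offset.clone() }
    fn offset_from_utc_datetime(&self, _utc: i64) -> FixedOffset { self.clone() }
}

/// Hypothetical stateless zone: +1h before UTC instant 0, +2h afterwards.
/// Like `Local` in the text, it reuses `FixedOffset` as its offset type.
#[derive(Clone, Debug, PartialEq)]
struct ToyLocal;

impl TimeZone for ToyLocal {
    type Offset = FixedOffset;
    fn from_offset(_offset: &FixedOffset) -> Self { ToyLocal }
    fn offset_from_utc_datetime(&self, utc: i64) -> FixedOffset {
        if utc < 0 { FixedOffset(3600) } else { FixedOffset(7200) }
    }
}

/// The value stores UTC plus the resolved offset, never the zone itself.
struct DateTime<Tz: TimeZone> {
    datetime: i64, // UTC seconds
    offset: Tz::Offset,
}

impl<Tz: TimeZone> DateTime<Tz> {
    fn from_utc(tz: &Tz, utc: i64) -> Self {
        DateTime { datetime: utc, offset: tz.offset_from_utc_datetime(utc) }
    }
    /// Converting to local time needs only the stored offset.
    fn local(&self) -> i64 { self.datetime + self.offset.local_minus_utc() }
    fn offset(&self) -> &Tz::Offset { &self.offset }
    /// Re-resolves the offset for another zone; the UTC instant is unchanged.
    fn with_timezone<Tz2: TimeZone>(&self, tz: &Tz2) -> DateTime<Tz2> {
        DateTime::<Tz2>::from_utc(tz, self.datetime)
    }
}
```

In this sketch the zone is consulted once, when the value is created, and every later accessor works from the cached offset, so `dt.offset().local_minus_utc()` is a plain field read.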
There is a set of `TimeZone`s and `Offset`s available:

TimeZone instance | Offset instance |
---|---|
UTC | UTC |
FixedOffset | FixedOffset |
Local | FixedOffset |
TzFile (*) | TzFileOffset (*) |
TzRule (*) | TzRuleOffset (*) |

(* Some instances are under development.)
One can note that some time zones are their own offsets, as they do not have any additional data (i.e. cache) for storage, and `Local` reuses `FixedOffset` as `Local` itself has no state. Other time zones have to cache their offset and the transition data, hence separate offset types.
`TzFile` and `TzRule` are a part of tzdata support; they will typically be used in a reference-counted form, such as `Rc<TzFile>` or `Arc<TzRule>`, to avoid the deep copying (which is common in Chrono). `TzRule` implements the rule string in the POSIX `TZ` environment variable, which is not that useful on its own (since we can use `Local` anyway), but it's `TzFile`'s solution to the aforementioned problem 1: the future zone transition is encoded in the form of a `TzRule`, and we need to implement it.
In the end this new design is quite promising, but even after `#![feature(associated_types)]` is gone, associated types are still a cutting-edge feature. I had to disable debuginfo due to Rust issue #21010, for example. Hopefully though, this will not affect the validity of this design.
New formatting and parsing API
Chrono 0.2 has three newly redesigned pieces of API for formatting and parsing:
- Formatting syntax representation ("items") and parsing
- Formatting with items
- Parsing with items
Altogether they form an advanced formatting facility in Chrono 0.2.
I'll try to briefly discuss their designs and justifications.
Formatting Items
In Chrono, a formatting item is a unit of formatting or parsing. For example, a `strftime`-like format string `%Y-%m-%d` has five different formatting items: `%Y`, `-`, `%m`, `-` and `%d`. Chrono decouples the formatting syntax from the actual meaning of formatting items, so they have the following (somewhat verbose) internal representations:
[Item::Numeric(Numeric::Year, Pad::Zero),
 Item::Literal("-"),
 Item::Numeric(Numeric::Month, Pad::Zero),
 Item::Literal("-"),
 Item::Numeric(Numeric::Day, Pad::Zero)]
This decoupling allows Chrono to support multiple formatting syntaxes, such as `YYYY-MM-DD` or Go-like `2006-01-02`, instead. Also, Chrono can have "hidden" formatting items that can be used for internal purposes. RFC 2822 and 3339 support is implemented in this way.
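The decoupling can be pictured with a small standalone sketch. Everything here is hypothetical and much reduced compared to Chrono: `Item` has no padding argument, only year, month and day are supported, and `StrftimeLike`, `LetterPattern` and `format_date` are illustrative names, not Chrono APIs. Two different syntaxes are parsed lazily into the same item stream, and one formatter consumes either of them without collecting the items first.

```rust
use std::fmt::Write;

#[derive(Debug, PartialEq, Clone, Copy)]
enum Numeric { Year, Month, Day }

/// Syntax-independent formatting item (no `Pad` here, unlike Chrono).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Item<'a> {
    Literal(&'a str),
    Numeric(Numeric),
}

/// Lazily parses a strftime-like string; only %Y, %m and %d are known.
struct StrftimeLike<'a> { rest: &'a str }

impl<'a> Iterator for StrftimeLike<'a> {
    type Item = Item<'a>;
    fn next(&mut self) -> Option<Item<'a>> {
        if self.rest.is_empty() { return None; }
        if let Some(tail) = self.rest.strip_prefix('%') {
            let mut chars = tail.chars();
            // A dangling '%' or an unknown specifier simply ends iteration.
            let num = match chars.next()? {
                'Y' => Numeric::Year,
                'm' => Numeric::Month,
                'd' => Numeric::Day,
                _ => return None,
            };
            self.rest = chars.as_str();
            Some(Item::Numeric(num))
        } else {
            // Everything up to the next '%' is one literal item.
            let end = self.rest.find('%').unwrap_or(self.rest.len());
            let (lit, tail) = self.rest.split_at(end);
            self.rest = tail;
            Some(Item::Literal(lit))
        }
    }
}

/// Lazily parses a YYYY/MM/DD-style pattern into the same items.
struct LetterPattern<'a> { rest: &'a str }

const FIELDS: [(&str, Numeric); 3] =
    [("YYYY", Numeric::Year), ("MM", Numeric::Month), ("DD", Numeric::Day)];

impl<'a> Iterator for LetterPattern<'a> {
    type Item = Item<'a>;
    fn next(&mut self) -> Option<Item<'a>> {
        if self.rest.is_empty() { return None; }
        for &(pat, num) in FIELDS.iter() {
            if let Some(tail) = self.rest.strip_prefix(pat) {
                self.rest = tail;
                return Some(Item::Numeric(num));
            }
        }
        // No field starts here, so the literal runs up to the nearest field.
        let end = FIELDS.iter()
            .filter_map(|(pat, _)| self.rest.find(*pat))
            .min()
            .unwrap_or(self.rest.len());
        let (lit, tail) = self.rest.split_at(end);
        self.rest = tail;
        Some(Item::Literal(lit))
    }
}

/// Formats a date by consuming any item iterator, whatever its syntax.
fn format_date<'a, I>(year: i32, month: u32, day: u32, items: I) -> String
where
    I: Iterator<Item = Item<'a>>,
{
    let mut out = String::new();
    for item in items {
        match item {
            Item::Literal(s) => out.push_str(s),
            Item::Numeric(Numeric::Year) => write!(out, "{:04}", year).unwrap(),
            Item::Numeric(Numeric::Month) => write!(out, "{:02}", month).unwrap(),
            Item::Numeric(Numeric::Day) => write!(out, "{:02}", day).unwrap(),
        }
    }
    out
}
```

For instance, `format_date(2015, 2, 19, StrftimeLike { rest: "%Y-%m-%d" })` and `format_date(2015, 2, 19, LetterPattern { rest: "YYYY-MM-DD" })` both go through the same five items.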
The formatting item is a good abstraction, but every abstraction comes with complexity. In the case of Chrono the complexity arises from the desire to avoid allocation. The number of formatting items is proportional to the length of the format string in the worst case, so we cannot blindly collect items into a collection. Instead, Chrono returns an `Iterator` of formatting items and directly consumes that iterator for printing the date and time. Therefore the following identity holds:
assert_eq!(StrftimeItems::new("%Y-%m-%d").collect::<Vec<_>>(),
[Item::Numeric(Numeric::Year, Pad::Zero),
Item::Literal("-"),
Item::Numeri...