-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement correct time zone type fallback #6021
base: main
Are you sure you want to change the base?
Conversation
5531232
to
c689e85
Compare
2efea0f
to
d5ea86a
Compare
@@ -74,7 +74,7 @@ | |||
"zzzz": "日本標準時", | |||
|
|||
"v": "JST", | |||
"vvvv": "日本時間", | |||
"vvvv": "日本標準時", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised this is the only test diff, this PR changes the generic non-location names of probably every non-DST timezone.
|GMT-07:00 |Mexican Pacific Standard Time |Mexican Pacific Standard Time |Mazatlan Time |Mazatlan |mxmzt | | ||
|GMT-07:00/GMT-06:00|Mountain Time |Mountain Standard Time |Boise Time |Boise |usboi | | ||
|GMT-07:00/GMT-06:00|Mountain Time |Mountain Standard Time |Denver Time |Denver |usden | | ||
|GMT-07:00 |Mountain Standard Time |Mountain Standard Time |Phoenix Time |Phoenix |usphx | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the key change. UTS-35 is not very clear here, saying that the standard type should be used for non-DST zones only if the generic type is unavailable, but then gives this as an example. ICU4C behaves like this as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deduplicating generic non-location and specific non-location seems like a very positive change. But, I do not follow why you need the ALL CAPS metazone IDs.
It also seems wasteful to add the standard specific patterns that are not reachable from generic non-location format to the generic non-location data payload. I do want to keep these decoupled if possible.
What doesn't work if you only deduplicate the exact matches (standard specific with generic)?
"amevn": { | ||
"0": null, | ||
"11426160": "arme" | ||
"11426160": "arme", | ||
"14201280": "ARME", | ||
"21162180": "arme", | ||
"21564000": "ARME", | ||
"21686280": "arme" | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Observation: why the changes and growth in this file?
Ok I think I see, you are making a distinction between And I guess somewhere in here you are applying the UTS 35 "184 day" heuristic to determine whether a time zone is locked into standard time? |
Yes. I'm interpreting the 184 day thing as "this zone uses DST". UTS-35 should probably say that, and the 184 day thing is a crude way to determine that. |
@@ -165,6 +165,12 @@ pub struct ExemplarCitiesV1<'data> { | |||
#[yoke(prove_covariance_manually)] | |||
pub struct MetazoneGenericNamesV1<'data> { | |||
/// The default mapping between metazone id and localized metazone name. | |||
/// | |||
/// This mapping may contain a MetazoneId twice: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reproduce the entire section in question:
So, the spec tells us to use the Type Fallback when one of the three names is missing (generic, standard, daylight). Let's break this down:
Spot-checking, I see many examples of metazones that have all 3 names and ones that have only Standard. I do not see any other configurations, at least not in English. So if there are any instances of Also, there isn't ever a case where a particular locale could resolve to a different result:
So it doesn't seem right to me that |
Another concern I have is, if something like the Sunshine Protection Act gets passed (either in the US or anywhere else), then we start getting time zones that pass the 184-day rule but are actually Daylight Time. So I feel like the all-caps metazone is maybe too coarse. I do really like the principle of precomputing it, though. |
What's the math here? DST is already less than 184 days, at what point are you saying that rule won't detect it anymore? Our way of looking at actual TZDB rules will always detect DST anyway, that 184 thing in the spec really ought to be changed to "figure out if the time zone uses DST". |
There are proposals being floated which would make certain places have year-round Daylight Time. Like, Florida would be on year-round Eastern Daylight Time. It could be called Atlantic Standard Time, but Floridians would prefer to call it Eastern Daylight Time. This case is not modeled correctly in your model. |
It's unclear how such a time zone would be modeled in TZDB and CLDR. If it's modeled as a permanent DST rule, the UTS-35 algorithm that checks +-184 days would also fail on it, so I don't think it's fair to raise this as a blocker. |
This case could be easily fixed in my model by using e.g. titlecase for daylight only metazones. |
Well, it's a regression because saying "Eastern Time" is more correct than saying "Eastern Standard Time" for such cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what your expectation is of me regarding an approval of this?
We've established that it is not conformant with UTS-35, nor is it necessarily conformant with user expectations. You claim that it is conformant with ICU4J behavior, which, if it doesn't follow UTS 35, is not spec'd out anywhere.
The data size for generic-only and specific-only formatting increases, though the sum of the two parts decreases in size.
The use of uppercase to distinguish different versions of the same metazone is intriguing and should be discussed further.
Can we agree on these algorithms in consultation with Peter, Mark, and the rest of the CLDR Design WG?
I guess, my main concern here is that we have not established what our "goal" is with time zone name formatting conformance. This PR makes a move toward using ICU4J as the standard. Doing this would be a shift in how we work as ICU4X-TC, which is that we aim for UTS-35 conformance without directly trying to match ICU4C/ICU4J, and we submit improvements to UTS-35 where it is deficient. This seems like a directional decision that we should make in consultation with the TC. It is also a bit close to ICU4X 2.0 to make major changes that we haven't previously agreed on, though I am very much sympathetic to wanting to improve time zone name formatting, and changing the data payload representation will be harder to do after 2.0. |
We've made a lot of updates to UTS-35 to align it with the realities of existing implementations, i.e. ICU4C/J. I think this change makes sense, as it improves clarity (producing different names for America/Phoenix and America/Denver) and reduces data size, so we should update UTS-35 if needed (UTS-35 might have tried to prescribe this behaviour already, but not very accurately). |
This is in Chrome: > Intl.DateTimeFormat('en', {timeZone: 'America/Denver', timeZoneName: 'longGeneric'}).format(new Date())
'1/30/2025, Mountain Time'
> Intl.DateTimeFormat('en', {timeZone: 'America/Phoenix', timeZoneName: 'longGeneric'}).format(new Date())
'1/30/2025, Mountain Standard Time' |
This reduces data size by 206'736 bytes, mainly by not including daylight and generic display names for metazones that do not use DST.
#983