Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement correct time zone type fallback #6021

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

robertbastian
Copy link
Member

@robertbastian robertbastian commented Jan 20, 2025

This reduces data size by 206'736 bytes, mainly by not including daylight and generic display names for metazones that do not use DST.

#983

@robertbastian robertbastian changed the title Typefb Implement correct time zone type fallback Jan 20, 2025
@robertbastian robertbastian force-pushed the typefb branch 8 times, most recently from 5531232 to c689e85 Compare January 21, 2025 19:14
@robertbastian robertbastian force-pushed the typefb branch 2 times, most recently from 2efea0f to d5ea86a Compare January 21, 2025 19:28
@robertbastian robertbastian marked this pull request as ready for review January 21, 2025 19:28
@@ -74,7 +74,7 @@
"zzzz": "日本標準時",

"v": "JST",
"vvvv": "日本時間",
"vvvv": "日本標準時",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this is the only test diff, this PR changes the generic non-location names of probably every non-DST timezone.

|GMT-07:00 |Mexican Pacific Standard Time |Mexican Pacific Standard Time |Mazatlan Time |Mazatlan |mxmzt |
|GMT-07:00/GMT-06:00|Mountain Time |Mountain Standard Time |Boise Time |Boise |usboi |
|GMT-07:00/GMT-06:00|Mountain Time |Mountain Standard Time |Denver Time |Denver |usden |
|GMT-07:00 |Mountain Standard Time |Mountain Standard Time |Phoenix Time |Phoenix |usphx |
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the key change. UTS-35 is not very clear here, saying that the standard type should be used for non-DST zones only if the generic type is unavailable, but then gives this as an example. ICU4C behaves like this as well.

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deduplicating generic non-location and specific non-location seems like a very positive change. But, I do not follow why you need the ALL CAPS metazone IDs.

It also seems wasteful to add the standard specific patterns that are not reachable from generic non-location format to the generic non-location data payload. I do want to keep these decoupled if possible.

What doesn't work if you only deduplicate the exact matches (standard specific with generic)?

Comment on lines 23 to 30
"amevn": {
"0": null,
"11426160": "arme"
"11426160": "arme",
"14201280": "ARME",
"21162180": "arme",
"21564000": "ARME",
"21686280": "arme"
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observation: why the changes and growth in this file?

@sffc
Copy link
Member

sffc commented Jan 21, 2025

Ok I think I see, you are making a distinction between AMMO for Mountain Time with a time zone transition and ammo for year-round Mountain Standard Time, for example?

And I guess somewhere in here you are applying the UTS 35 "184 day" heuristic to determine whether a time zone is locked into standard time?

@robertbastian
Copy link
Member Author

Yes.

I'm interpreting the 184 day thing as "this zone uses DST". UTS-35 should probably say that, and the 184 day thing is a crude way to determine that.

@@ -165,6 +165,12 @@ pub struct ExemplarCitiesV1<'data> {
#[yoke(prove_covariance_manually)]
pub struct MetazoneGenericNamesV1<'data> {
/// The default mapping between metazone id and localized metazone name.
///
/// This mapping may contain a MetazoneId twice:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sffc
Copy link
Member

sffc commented Jan 21, 2025

To reproduce the entire section in question:

When a specified type (generic, standard, daylight) does not exist:

  1. If the daylight type does not exist, then the metazone doesn’t require daylight support. For all three types:
    a. If the generic type exists, use it.
    b. Otherwise if the standard type exists, use it.
  2. Otherwise if the generic type is needed, but not available, and the offset and daylight offset do not change within 184 day +/- interval around the exact formatted time, use the standard type.
    a. Example: "Mountain Standard Time" for Phoenix
    b. Note: 184 is the smallest number that is at least 6 months AND the smallest number that is more than 1/2 year (Gregorian).

So, the spec tells us to use the Type Fallback when one of the three names is missing (generic, standard, daylight).

Let's break this down:

Fields in struct Requested field Rule applied Resolved field
GS D 1a G
GD S not specified (fall back to offset?) not specified (fall back to offset?)
SD G 2a S if the 184-day rule applies, otherwise not specified (fall back to location?)
G S 1a G
G D 1a G
S G 1b S
S D 1b S
D G 2a S if the 184-day rule applies, otherwise not specified (fall back to location?)
D S not specified (fall back to offset?) not specified (fall back to offset?)

Spot-checking, I see many examples of metazones that have all 3 names and ones that have only Standard. I do not see any other configurations, at least not in English.

So if there are any instances of SD in CLDR, then we need a way to know whether or not we fall back from G to S or location at runtime. But otherwise we don't need to know.

Also, there isn't ever a case where a particular locale could resolve to a different result:

  1. If the locale+metazone has a generic name, use the generic name.
  2. If the loclae+metazone has a standard name but not a generic name, use the standard name but somehow flag that it should only be used if the 184-day rule applies (such as an all-caps metazone ID).

So it doesn't seem right to me that en.json contains both "Mountain Time" and "Mountain Standard Time" in the same file.

@robertbastian
Copy link
Member Author

This is the time zone picker in Google Calendar. It uses ICU4J with long generic + exemplar city. I've matched that instead of trying to figure out what UTS-35 is trying to tell me, which really isn't clear.

image

@robertbastian robertbastian requested a review from sffc January 21, 2025 22:50
@sffc
Copy link
Member

sffc commented Jan 21, 2025

Another concern I have is, if something like the Sunshine Protection Act gets passed (either in the US or anywhere else), then we start getting time zones that pass the 184-day rule but are actually Daylight Time.

So I feel like the all-caps metazone is maybe too coarse.

I do really like the principle of precomputing it, though.

@robertbastian
Copy link
Member Author

Another concern I have is, if something like the Sunshine Protection Act gets passed (either in the US or anywhere else), then we start getting time zones that pass the 184-day rule but are actually Daylight Time.

What's the math here? DST is already less than 184 days, at what point are you saying that rule won't detect it anymore?

Our way of looking at actual TZDB rules will always detect DST anyway, that 184 thing in the spec really ought to be changed to "figure out if the time zone uses DST".

@sffc
Copy link
Member

sffc commented Jan 22, 2025

There are proposals being floated which would make certain places have year-round Daylight Time. Like, Florida would be on year-round Eastern Daylight Time. It could be called Atlantic Standard Time, but Floridians would prefer to call it Eastern Daylight Time. This case is not modeled correctly in your model.

@robertbastian
Copy link
Member Author

It's unclear how such a time zone would be modeled in TZDB and CLDR. If it's modeled as a permanent DST rule, the UTS-35 algorithm that checks +-184 days would also fail on it, so I don't think it's fair to raise this as a blocker.

@robertbastian
Copy link
Member Author

This case could be easily fixed in my model by using e.g. titlecase for daylight only metazones.

@sffc
Copy link
Member

sffc commented Jan 22, 2025

Well, it's a regression because saying "Eastern Time" is more correct than saying "Eastern Standard Time" for such cases.

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what your expectation is of me regarding an approval of this?

We've established that it is not conformant with UTS-35, nor is it necessarily conformant with user expectations. You claim that it is conformant with ICU4J behavior, which, if it doesn't follow UTS 35, is not spec'd out anywhere.

The data size for generic-only and specific-only formatting increases, though the sum of the two parts decreases in size.

The use of uppercase to distinguish different versions of the same metazone is intriguing and should be discussed further.

Can we agree on these algorithms in consultation with Peter, Mark, and the rest of the CLDR Design WG?

@sffc
Copy link
Member

sffc commented Jan 22, 2025

I guess, my main concern here is that we have not established what our "goal" is with time zone name formatting conformance. This PR makes a move toward using ICU4J as the standard. Doing this would be a shift in how we work as ICU4X-TC, which is that we aim for UTS-35 conformance without directly trying to match ICU4C/ICU4J, and we submit improvements to UTS-35 where it is deficient. This seems like a directional decision that we should make in consultation with the TC.

It is also a bit close to ICU4X 2.0 to make major changes that we haven't previously agreed on, though I am very much sympathetic to wanting to improve time zone name formatting, and changing the data payload representation will be harder to do after 2.0.

@robertbastian
Copy link
Member Author

We've made a lot of updates to UTS-35 to align it with the realities of existing implementations, i.e. ICU4C/J. I think this change makes sense, as it improves clarity (producing different names for America/Phoenix and America/Denver) and reduces data size, so we should update UTS-35 if needed (UTS-35 might have tried to prescribe this behaviour already, but not very accurately).

@Manishearth Manishearth removed their request for review January 22, 2025 18:50
@sffc sffc added 2.0-breaking Changes that are breaking API changes discuss Discuss at a future ICU4X-SC meeting labels Jan 22, 2025
@robertbastian
Copy link
Member Author

This is in Chrome:

> Intl.DateTimeFormat('en', {timeZone: 'America/Denver', timeZoneName: 'longGeneric'}).format(new Date())
'1/30/2025, Mountain Time'
> Intl.DateTimeFormat('en', {timeZone: 'America/Phoenix', timeZoneName: 'longGeneric'}).format(new Date())
'1/30/2025, Mountain Standard Time'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.0-breaking Changes that are breaking API changes discuss Discuss at a future ICU4X-SC meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants