Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define locale options for :datetime, :date & :time #911

Closed
wants to merge 6 commits into from

Conversation

eemeli
Copy link
Collaborator

@eemeli eemeli commented Oct 19, 2024

Adds a new "basket" of date/time locale options that override defaults set in the current locale or an implementation-defined date/time operand value. These are optional, and can be made available on each of :datetime, :date & :time.

The hourCycle option is dropped from the :datetime field options (where it never belonged), and its functionality is replaced by the hour12=true|falsedate/time locale option.

This PR explicitly does not address the concerns raised in #866; that is a separate discussion.

@eemeli eemeli added the registry Issue pertains to the function registry label Oct 19, 2024
Copy link
Member

@aphillips aphillips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like some of the thought behind this, but want to think about it more.

- `true`
- `false`
- `timeZone`
- valid identifier per [BCP175](https://www.rfc-editor.org/rfc/rfc6557)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't link the RFC. Link the BCP (e.g. https://www.rfc-editor.org/bcp/bcp175)... but... time zone identifiers have a ton of quirks in them. Also, we almost certainly want to allow offset time zones (e.g. GMT-01:23) and we may want to allow special sauce like metazones. CLDR has a bunch of stuff about this, but I'm too busy this morning to look up the precise reference. It's somewhere near here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to go with whatever; this reference was not changed from the earlier one we already included.

Copy link
Member

@macchiati macchiati Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the record, a narrow-ish EBNF for well-formed current and past timezone IDs would be:

tzId   := tzPath | tzEtc | tzOld
tzPath := tzPart ("/" tzPart)+;
tzPart := tzWord ("_" tzWord)*;
tzWord := [A-Z][a-z]*;
tzEtc  := "Etc/" ("UTC" | "GMT" ([+\-] \d{1,2})?)
tzOld  := HST" | "PST8PDT" | "MST" | "MST7MDT" | "CST6CDT" | "EST" | "EST5EDT" | "WET" | "CET" | "MET" | "EET" | "Factory"

A very loose EBNF for well-formed would be

tzId  := [a-zA-Z0-9+\-/_]+

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My tendency is to want a version of the narrow-ish EBNF (in ABNF, since that's the dialect our WG uses). It wouldn't want to support tzOld and would want to add support for "UTC" as an explicit zone name. See suggestion thread below.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn't want to support tzOld

But those exist currently in CLDR.
I am not sure why they are called "old", but "Factory" was only added in CLDR 46, a couple of months ago.

Copy link
Member

@aphillips aphillips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a refactor to "optional" is better than having them separate.

spec/registry.md Outdated Show resolved Hide resolved
spec/registry.md Show resolved Hide resolved
@eemeli
Copy link
Collaborator Author

eemeli commented Nov 14, 2024

Updated as per discussion during this week's meeting.

@eemeli eemeli requested a review from aphillips November 14, 2024 01:22
- `numberingSystem` \[OPTIONAL\]
- valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier)
- `timeZone` \[OPTIONAL\]
- valid identifier per [BCP175](https://www.rfc-editor.org/rfc/rfc6557)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- valid identifier per [BCP175](https://www.rfc-editor.org/rfc/rfc6557)
- valid identifier per either [BCP175](https://www.rfc-editor.org/rfc/rfc6557) or [Unicode Timezone Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeTimezoneIdentifier)

ICU4X and some other implementations use the short timezone IDs, so we should allow them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we shouldn't. This would effectively require every implementation to support them. I would rather permit them as implementation-defined.

Copy link
Member

@aphillips aphillips Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a stab at a full fix of timeZone. Note that I replaced none with local for consistency with HTML and Java Temporal. I would use float but want to reduce cognitive burden.

Suggested change
- valid identifier per [BCP175](https://www.rfc-editor.org/rfc/rfc6557)
- well-formed identifier per [BCP175](https://www.rfc-editor.org/bcp/bcp175)
- an implementation-defined value or identifier
(such as a [Unicode Timezone Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeTimezoneIdentifier))
- `local`
> [!NOTE]
> The value `local` permits a _message_ to convert a date/time value
> into a [floating](https://www.w3.org/TR/timezone/#floating) time value
> (sometimes called a _plain_ or _local_ time value) by removing
> the association with a specific time zone.
> [!NOTE]
> Implementations SHOULD check if identifiers for each of these _options_ are valid;
> they SHOULD ignore _options_ that contain invalid or unknown values;
> and they MAY emit a _Bad Option_ error for invalid or unknown values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we shouldn't. This would effectively require every implementation to support them. I would rather permit them as implementation-defined.

In general, I really don't think we want to require every implementation to support all valid identifiers — whereby 'support' means that they are required to do something reasonable.

For example, we shouldn't require that u:locale=def "make a difference", rather than being ignored if that language is not supported by the implementation.

With that in mind, adding (such as a Unicode Timezone Identifier) doesn't have a cost, because people don't have to support all the values that we imply.

BTW, BCP175 doesn't define either well-formedness or validity. For well-formedness, the best would be https://github.com/eggert/tz/blob/main/theory.html, but that is not very rigorous. For validity, that would be defined by scanning certain files, and looking for lines starting with Zone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree about not requiring support for all valid identifiers and agree that unknown values should be ignored (but not necessarily dropped, in case they are consumed downstream).

What I'm trying to guard against is requiring (or implying that it is required) implementers to build support for parsing e.g. short timezone IDs. I think we want to required that they accept Olson ids (for the definition of "accept" I explained elsewhere). They are not required to do anything with any value (although users will be unhappy if real time zone IDs don't work). The formulation you're suggesting would strongly suggest that the short IDs "have" to be supported.

BTW, BCP175 doesn't define either well-formedness or validity.

Yeah, I noticed that yesterday. I hadn't been into the tzinfo docs in a while.

Copy link
Member

@aphillips aphillips Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorporating the time zone discussion above

Suggested change
- valid identifier per [BCP175](https://www.rfc-editor.org/rfc/rfc6557)
- well-formed time zone identifier
(see [BCP175](https://www.rfc-editor.org/bpc/bpc175))
- `local`
- `UTC`
A time zone identifier is well-formed if it matches `tzId` in the following ABNF:
>```abnf
> tzId = tzPath / tzEtc
> tzPath = tzPart 1*("/" tzPart)
> tzPart = tzWord *("_" tzWord)
> tzWord = (%x41-5A) *(%x61-7A) ; Uppercase ASCII letter followed by lowercase letters
> tzEtc = ("Etc/" ("UTC" / "GMT" (("+" / "-") 1*2DIGIT))
>```
> [!NOTE]
> The value `local` permits a _message_ to convert a date/time value
> into a [floating](https://www.w3.org/TR/timezone/#floating) time value
> (sometimes called a _plain_ or _local_ time value) by removing
> the association with a specific time zone.
> [!NOTE]
> Implementations SHOULD check if identifiers for each of these _options_ are valid;
> they SHOULD ignore _options_ that contain invalid or unknown values;
> and they MAY emit a _Bad Option_ error for invalid or unknown values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having some trouble understanding exactly why it's worthwhile to establish a canonical definition of a well-formed timezone identifier within the MF2 spec for an optional timeZone option.

This does not seem like something that's fully baked yet.

On a deeper level, I'm no longer sure that it's a good idea to forbid a datetime formatter from emitting a Bad Option error and using fallback representation when it's told to override a timezone with a value that it doesn't support, in particular as the output might not include a timezone identifier. That seems highly likely to produce misleading results.

Also, the latest suggestion above includes normative language in a non-normative note.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not excited about defining the ABNF here in the least. I want to point to CLDR, but that would bring in all of the short identifier crud.

On a deeper level, I'm no longer sure that it's a good idea to forbid a datetime formatter from emitting a Bad Option error and using fallback representation when it's told to override a timezone with a value that it doesn't support, in particular as the output might not include a timezone identifier. That seems highly likely to produce misleading results.

I agree that we don't just want to drop the zone on the floor. You might never notice that the option is not working (or only sometimes is not working). This is particularly true for :date formatting (where the time zone does matter, but the zone ID is almost never displayed in the results).

Also, the latest suggestion above includes normative language in a non-normative note.

We should just get rid of that note. The problem of well-formed-but-invalid values seems like a problem for the function handler implementation. We permit errors to come out in formatting.md. Any reason to prescribe what each function does with each option?

Suggested change
- valid identifier per [BCP175](https://www.rfc-editor.org/rfc/rfc6557)
- well-formed time zone identifier
(see [BCP175](https://www.rfc-editor.org/bpc/bpc175))
- `local`
- `UTC`
A time zone identifier is well-formed if it matches `tzId` in the following ABNF:
>```abnf
> tzId = tzPath / tzEtc
> tzPath = tzPart 1*("/" tzPart)
> tzPart = tzWord *("_" tzWord)
> tzWord = (%x41-5A) *(%x61-7A) ; Uppercase ASCII letter followed by lowercase letters
> tzEtc = ("Etc/" ("UTC" / "GMT" (("+" / "-") 1*2DIGIT))
>```
> [!NOTE]
> The value `local` permits a _message_ to convert a date/time value
> into a [floating](https://www.w3.org/TR/timezone/#floating) time value
> (sometimes called a _plain_ or _local_ time value)
> by removing the association with a specific time zone.

spec/registry.md Show resolved Hide resolved
spec/registry.md Show resolved Hide resolved
spec/registry.md Outdated Show resolved Hide resolved
spec/registry.md Outdated Show resolved Hide resolved
spec/registry.md Outdated
Comment on lines 834 to 836
- valid [Unicode Calendar Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeCalendarIdentifier)
- `numberingSystem` \[OPTIONAL\]
- valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier)
Copy link
Member

@aphillips aphillips Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing valid vs. well-formed

Suggested change
- valid [Unicode Calendar Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeCalendarIdentifier)
- `numberingSystem` \[OPTIONAL\]
- valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier)
- well-formed [Unicode Calendar Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeCalendarIdentifier)
- `numberingSystem` \[OPTIONAL\]
- well-formed [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think we need to make it clear somewhere (either with a general statement or multiple specific ones) that if the option for (say) numberingSystem is a valid Unicode Number System Identifier (eg Latn), it can be ignored, but can't be misinterpreted as something else (eg Hebrew numbering system); moreover, that those take precedence over an implementation-defined value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe somewhere have a statement about

Implementation-defined values

Implementation-defined option values are permitted for many options, but if there is a valid standard option value then it takes precedence. Thus it is wisest for implementation-defined option values to use values that are not well-formed standard option values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, here is the example I was thinking of, where "or an implementation-defined value" applied to a option value, and is wrong, for the reasons I state in #938 (but had misapplied that reasoning to operand, not option values).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the two statements "or an implementation-defined value" need to be removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"... or an implementation-defined value" is meant to cover values such as com.ibm.icu.text.NumberingSystem or java.time.chrono.Chronology, in which the identifier has been converted to a local representation. This is the same argument as allowing the currency option in :currency to accept a java.util.Currency object in a Java implementation rather than requiring that the ISO4217 code be unboxed.

Implementation defined values of this sort can only be passed into options as variables, since there is no way to construct them using MF2's syntax.

The other sort of implementation defined value, of course, would be locally defined literals. Elsewhere in the PR you and I are discussing the use of UTI short timezone IDs, which I would classify as a sort of "locally defined literal" (vs. the Olson IDs). Implementers are responsible for not hanging themselves or their users by making locally-defined and standard values distinct.

However, allowing such values permits implementers to adopt whatever local idiom exists (on their platform, programming languages, etc.) for locales, time zones, calendars, and so forth, with the caution that messages that use these IDs will not be portable and that tools/translators won't possibly recognize the values.

It's kind of surprising, but we don't say what almost any of the options do except by implication. A responsible implementer will probably do all the right things. But nothing forbids a lazy implementer from causing all invocations of :date to return the string "wildebeest".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"... or an implementation-defined value" is meant to cover values such as com.ibm.icu.text.NumberingSystem or java.time.chrono.Chronology, in which the identifier has been converted to a local representation.

If that's what it means, then that's what it has to say. Moreover, it should be in a note, not part of the definition.

So the following would be fine (bold for the addition):

Bear with me for a moment.

We have in registry.md:

Implementations MAY accept additional option values for options defined here. However, such values might become defined with a different meaning in the future, including with a different, incompatible name or using an incompatible value space. Supporting implementation-specific option values for standard or optional functions is NOT RECOMMENDED.

We also have

option = identifier o "=" o (literal / variable)

The implications are that conformant implementation can interpret any of:

{$x :currency compactDisplay=short} 
{$x :currency compactDisplay=medium} 
{$x :currency compactDisplay=μικρός}
{$x :currency compactDisplay=|🐭|}
{$x :currency compactDisplay=$myDisplay}

It can also interpret:

{$x :currency currency=CAD} 
{$x :currency currency=MyCurrency} 
{$x :currency currency=δολάριοΚαναδά}
{$x :currency currency=|¥|}
{$x :currency currency=|🐭|}
{$x :currency currency=$myCurrency}

So we could have for every single option have that extra clause, as currently stated PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's what it means, then that's what it has to say. Moreover, it should be in a note, not part of the definition.

I can make changes to that effect. Note that notes are non-normative, so it probably can't go in a note.

So the following would be fine (bold for the addition):

Note: The option value could be an implementation-defined value that denotes the same calendar as a well-formed Unicode Calendar Identifier, but with a different format.

No, I think this is too precise. It supposes that all sets of calendar models exactly mirror (or are a subset of) CLDRs, when the reality is likely to be just a bit more chaotic.

Bear with me for a moment.

I follow your argument. The examples don't move me: they are what the quoted stuff starting "Implementations MAY..." mean. But it does convince me that we should remove implementation-defined value to a note here and elsewhere. It also convinces me that the "Implementations MAY..." text needs to spell out that it is referring to literals, with implementation-defined types handled separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the MAY in #944 and I edited my proposed text above so that it can be used in place of "valid"

spec/registry.md Outdated Show resolved Hide resolved
spec/registry.md Outdated Show resolved Hide resolved
spec/registry.md Outdated Show resolved Hide resolved
spec/registry.md Outdated Show resolved Hide resolved
Co-authored-by: Addison Phillips <[email protected]>
aphillips added a commit that referenced this pull request Nov 17, 2024
In [this thread](#911 (comment)) on #911, @macchiati and I discussed the handling of implementation-defined literal values and implementation-defined types.

This change splits the "MAY _accept_" for these two cases, permitting both and saving us having to say "... or an implementation-defined value..." in lots of places.
@eemeli
Copy link
Collaborator Author

eemeli commented Nov 18, 2024

I updated the structure a bit to get rid of the weird phrasing, and to not include hour12 for :date.

I'm honestly not sure what the current request is regarding valid vs. well-formed, and which externals specs to reference, so leaving those unchanged for now.

@aphillips
Copy link
Member

@eemeli I think the suggestion I made a few minutes ago (with the tzId ABNF in it) would make the options "well-formed" instead of "valid" and address my concerns for time zone. Note that my concerns might not be shared by others (such as yourself)

aphillips added a commit that referenced this pull request Nov 18, 2024
* Address "implementation-defined" literal and type values

In [this thread](#911 (comment)) on #911, @macchiati and I discussed the handling of implementation-defined literal values and implementation-defined types.

This change splits the "MAY _accept_" for these two cases, permitting both and saving us having to say "... or an implementation-defined value..." in lots of places.

* Address comments, remove "literal"

* and => or for clarity

* Update spec/registry.md

Co-authored-by: Eemeli Aro <[email protected]>

* Address comment via a rewrite of the example

---------

Co-authored-by: Eemeli Aro <[email protected]>
@eemeli
Copy link
Collaborator Author

eemeli commented Nov 19, 2024

This has been split into:

@eemeli eemeli closed this Nov 19, 2024
@eemeli eemeli deleted the datetime-locale-options branch November 19, 2024 10:07
eemeli added a commit to messageformat/messageformat that referenced this pull request Nov 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
registry Issue pertains to the function registry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants