Define locale options for :datetime, :date & :time #911

Closed
wants to merge 6 commits into from
55 changes: 34 additions & 21 deletions spec/registry.md
@@ -842,8 +842,10 @@ All other _operand_ values produce a _Bad Operand_ error.
 The `:datetime` function can use either the appropriate _style options_
 or can use a collection of _field options_ (but not both) to control the formatted
 output.
+_Date/time override options_ can be combined with either _style options_ or _field options_.

-If both are specified, a _Bad Option_ error MUST be emitted
+If both _style options_ and _field options_ are specified,
+a _Bad Option_ error is emitted
 and a _fallback value_ used as the _resolved value_ of the _expression_.

 If the _operand_ of the _expression_ is an implementation-defined date/time type,
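
As a rough illustration of the rule in this hunk, the check could be sketched in Python as follows. The option names in the two sets are assumptions for illustration (the registry defines the real lists), and `check_datetime_options` is a hypothetical helper, not something the spec names.

```python
# Assumed option names; the registry text is the authority on the real sets.
STYLE_OPTIONS = {"dateStyle", "timeStyle"}
FIELD_OPTIONS = {"weekday", "era", "year", "month", "day", "hour",
                 "minute", "second", "fractionalSecondDigits", "timeZoneName"}


def check_datetime_options(options):
    """Return None if the combination is allowed, else a Bad Option message.

    When a message is returned, a caller would emit the error and use a
    fallback value as the resolved value of the expression.
    """
    names = set(options)
    if names & STYLE_OPTIONS and names & FIELD_OPTIONS:
        return "Bad Option: style options and field options cannot be combined"
    # Override options (hour12, calendar, numberingSystem, timeZone) may be
    # combined with either group, so they need no check here.
    return None
```

With this sketch, `{"dateStyle": "long", "timeZone": "UTC"}` is accepted, while `{"dateStyle": "long", "year": "numeric"}` yields the error message.
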
@@ -882,7 +884,7 @@ and what format to use for that field.
 The _field options_ are defined as follows:

 > [!IMPORTANT]
-> The value `2-digit` for some _field options_ **must** be quoted
+> The value `2-digit` for some _field options_ MUST be quoted
 > in the MessageFormat syntax because it starts with a digit
 > but does not match the `number-literal` production in the ABNF.
 > ```
@@ -924,11 +926,6 @@ The function `:datetime` has the following options:
   - `1`
   - `2`
   - `3`
-- `hourCycle` (default is locale-specific)
-  - `h11`
-  - `h12`
-  - `h23`
-  - `h24`
 - `timeZoneName`
   - `long`
   - `short`
@@ -937,20 +934,6 @@ The function `:datetime` has the following options:
   - `shortGeneric`
   - `longGeneric`

-> [!NOTE]
-> The following options do not have default values because they are only to be used
-> as overrides for locale-and-value dependent implementation-defined defaults.
-
-The following date/time options are **not** part of the default registry.
-Implementations SHOULD avoid creating options that conflict with these, but
-are encouraged to track development of these options during Tech Preview:
-- `calendar` (default is locale-specific)
-  - valid [Unicode Calendar Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeCalendarIdentifier)
-- `numberingSystem` (default is locale-specific)
-  - valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier)
-- `timeZone` (default is system default time zone or UTC)
-  - valid identifier per [BCP175](https://www.rfc-editor.org/rfc/rfc6557)
-
 #### Resolved Value

 The _resolved value_ of an _expression_ with a `:datetime` _function_
@@ -980,6 +963,7 @@ The function `:date` has these _options_:
   - `long`
   - `medium` (default)
   - `short`
+- _Date/time override options_

 If the _operand_ of the _expression_ is an implementation-defined date/time type,
 it can include other option values.
@@ -1017,6 +1001,7 @@ The function `:time` has these _options_:
   - `long`
   - `medium`
   - `short` (default)
+- _Date/time override options_

 If the _operand_ of the _expression_ is an implementation-defined date/time type,
 it can include other option values.
@@ -1080,3 +1065,31 @@ For more information, see [Working with Timezones](https://w3c.github.io/timezon
 > The form of these serializations is known and is a de facto standard.
 > Support for these extensions is expected to be required in the post-tech preview.
 > See: https://datatracker.ietf.org/doc/draft-ietf-sedate-datetime-extended/
+
+### Date and Time Override Options
+
+**_<dfn>Date/time override options</dfn>_** are _options_ that allow an _expression_ to
+override values set by the current locale,
+or provided by the _formatting context_ (such as the default time zone),
+or embedded in an implementation-defined date/time _operand_ value.
+
+The following **standard** option and its values MUST be available on
+the functions `:datetime` and `:time`:
+
+- `hour12`
+  - `true`
+  - `false`
aphillips marked this conversation as resolved.
+
+The following **optional** options and their values SHOULD be available on
+the functions `:datetime`, `:date`, and `:time`:
+
+- `calendar`
+  - valid [Unicode Calendar Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeCalendarIdentifier)
+- `numberingSystem`
+  - valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier)
+- `timeZone`
+  - valid identifier per [BCP175](https://www.rfc-editor.org/rfc/rfc6557)
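
To make the MUST/SHOULD split in the added section easier to see, here is a small Python sketch that encodes it as a lookup table. The table and the helper `requirement_level` are hypothetical names used only for illustration; the spec text above is what defines the requirements.

```python
DATE_TIME_FUNCTIONS = {":datetime", ":date", ":time"}

# option name -> (requirement level, functions the option applies to),
# transcribed from the proposed "Date and Time Override Options" section.
OVERRIDE_OPTIONS = {
    "hour12": ("MUST", {":datetime", ":time"}),
    "calendar": ("SHOULD", DATE_TIME_FUNCTIONS),
    "numberingSystem": ("SHOULD", DATE_TIME_FUNCTIONS),
    "timeZone": ("SHOULD", DATE_TIME_FUNCTIONS),
}


def requirement_level(function_name, option_name):
    """Return "MUST", "SHOULD", or None if the option is not listed for the function."""
    level, functions = OVERRIDE_OPTIONS.get(option_name, (None, set()))
    return level if function_name in functions else None
```

For example, `requirement_level(":time", "hour12")` gives `"MUST"`, while `requirement_level(":date", "hour12")` gives `None` because `hour12` is only listed for `:datetime` and `:time`.
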
Member

Don't link the RFC. Link the BCP (e.g. https://www.rfc-editor.org/bcp/bcp175)... but... time zone identifiers have a ton of quirks in them. Also, we almost certainly want to allow offset time zones (e.g. GMT-01:23) and we may want to allow special sauce like metazones. CLDR has a bunch of stuff about this, but I'm too busy this morning to look up the precise reference. It's somewhere near here

Collaborator Author

Happy to go with whatever; this reference was not changed from the earlier one we already included.

Member

@macchiati macchiati Nov 18, 2024

Just for the record, a narrow-ish EBNF for well-formed current and past timezone IDs would be:

tzId   := tzPath | tzEtc | tzOld
tzPath := tzPart ("/" tzPart)+;
tzPart := tzWord ("_" tzWord)*;
tzWord := [A-Z][a-z]*;
tzEtc  := "Etc/" ("UTC" | "GMT" ([+\-] \d{1,2})?)
tzOld  := "HST" | "PST8PDT" | "MST" | "MST7MDT" | "CST6CDT" | "EST" | "EST5EDT" | "WET" | "CET" | "MET" | "EET" | "Factory"

A very loose EBNF for well-formed would be

tzId  := [a-zA-Z0-9+\-/_]+
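
For anyone who wants to try identifiers against the two grammars above, this is a minimal Python translation into regular expressions. It is a sketch under assumptions: `[0-9]` stands in for `\d` so that only ASCII digits match, and the function names are hypothetical.

```python
import re

# Narrow-ish grammar, one pattern per EBNF rule.
TZ_WORD = r"[A-Z][a-z]*"
TZ_PART = rf"{TZ_WORD}(?:_{TZ_WORD})*"
TZ_PATH = rf"{TZ_PART}(?:/{TZ_PART})+"
TZ_ETC = r"Etc/(?:UTC|GMT(?:[+-][0-9]{1,2})?)"
TZ_OLD = "|".join([
    "HST", "PST8PDT", "MST", "MST7MDT", "CST6CDT", "EST", "EST5EDT",
    "WET", "CET", "MET", "EET", "Factory",
])

NARROW = re.compile(rf"(?:{TZ_PATH}|{TZ_ETC}|{TZ_OLD})")
# Very loose grammar: any non-empty run of the permitted characters.
LOOSE = re.compile(r"[a-zA-Z0-9+\-/_]+")


def is_narrow_tz_id(value):
    """True if the whole string matches the narrow-ish tzId rule."""
    return NARROW.fullmatch(value) is not None


def is_loose_tz_id(value):
    """True if the whole string matches the very loose tzId rule."""
    return LOOSE.fullmatch(value) is not None
```

Under this translation, `America/Los_Angeles`, `Etc/GMT+5`, and `MST7MDT` match the narrow pattern, while bare `UTC` and `America/Port-au-Prince` only match the loose one.
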

Member

My tendency is to want a version of the narrow-ish EBNF (in ABNF, since that's the dialect our WG uses). It wouldn't want to support tzOld and would want to add support for "UTC" as an explicit zone name. See suggestion thread below.

Collaborator

> It wouldn't want to support tzOld

But those exist currently in CLDR.
I am not sure why they are called "old", but "Factory" was only added in CLDR 46, a couple of months ago.

aphillips marked this conversation as resolved.
Member

Suggested change
- valid identifier per [BCP175](https://www.rfc-editor.org/rfc/rfc6557)
- valid identifier per either [BCP175](https://www.rfc-editor.org/rfc/rfc6557) or [Unicode Timezone Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeTimezoneIdentifier)

ICU4X and some other implementations use the short timezone IDs, so we should allow them.

Member

No, we shouldn't. This would effectively require every implementation to support them. I would rather permit them as implementation-defined.

Member

@aphillips aphillips Nov 14, 2024

Here's a stab at a full fix of timeZone. Note that I replaced none with local for consistency with HTML and Java Temporal. I would use float but want to reduce cognitive burden.

Suggested change
- valid identifier per [BCP175](https://www.rfc-editor.org/rfc/rfc6557)
- well-formed identifier per [BCP175](https://www.rfc-editor.org/bcp/bcp175)
- an implementation-defined value or identifier
(such as a [Unicode Timezone Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeTimezoneIdentifier))
- `local`
> [!NOTE]
> The value `local` permits a _message_ to convert a date/time value
> into a [floating](https://www.w3.org/TR/timezone/#floating) time value
> (sometimes called a _plain_ or _local_ time value) by removing
> the association with a specific time zone.
> [!NOTE]
> Implementations SHOULD check if identifiers for each of these _options_ are valid;
> they SHOULD ignore _options_ that contain invalid or unknown values;
> and they MAY emit a _Bad Option_ error for invalid or unknown values.

Member

> No, we shouldn't. This would effectively require every implementation to support them. I would rather permit them as implementation-defined.

In general, I really don't think we want to require every implementation to support all valid identifiers — whereby 'support' means that they are required to do something reasonable.

For example, we shouldn't require that u:locale=def "make a difference", rather than being ignored if that language is not supported by the implementation.

With that in mind, adding (such as a Unicode Timezone Identifier) doesn't have a cost, because people don't have to support all the values that we imply.

BTW, BCP175 doesn't define either well-formedness or validity. For well-formedness, the best would be https://github.com/eggert/tz/blob/main/theory.html, but that is not very rigorous. For validity, that would be defined by scanning certain files, and looking for lines starting with Zone.
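
The "scan for lines starting with Zone" approach to validity could look roughly like the Python below. This is only a sketch: `zone_names` is a hypothetical helper, and the sample text is made up in the general shape of a tz source file rather than copied from one.

```python
def zone_names(tz_source_text):
    """Collect the zone names from lines that start with "Zone"."""
    names = set()
    for line in tz_source_text.splitlines():
        if line.startswith("Zone"):
            fields = line.split()
            # The second whitespace-separated field is taken as the zone name.
            if len(fields) >= 2 and fields[0] == "Zone":
                names.add(fields[1])
    return names


# Made-up sample data; real tz source files contain many more lines.
SAMPLE = """\
# comment line
Rule Example 2000 only - Mar 1 2:00 1:00 D
Zone Example/City_One -5:00 Example E%sT
Zone Etc/UTC 0 - UTC
Link Example/City_One Example/Alias
"""
```

Here `zone_names(SAMPLE)` collects `Example/City_One` and `Etc/UTC` and skips the comment, `Rule`, and `Link` lines.
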

Member

I agree about not requiring support for all valid identifiers and agree that unknown values should be ignored (but not necessarily dropped, in case they are consumed downstream).

What I'm trying to guard against is requiring (or implying that it is required) implementers to build support for parsing e.g. short timezone IDs. I think we want to require that they accept Olson ids (for the definition of "accept" I explained elsewhere). They are not required to do anything with any value (although users will be unhappy if real time zone IDs don't work). The formulation you're suggesting would strongly suggest that the short IDs "have" to be supported.

> BTW, BCP175 doesn't define either well-formedness or validity.

Yeah, I noticed that yesterday. I hadn't been into the tzinfo docs in a while.

Member

@aphillips aphillips Nov 18, 2024

Incorporating the time zone discussion above

Suggested change
- valid identifier per [BCP175](https://www.rfc-editor.org/rfc/rfc6557)
- well-formed time zone identifier
(see [BCP175](https://www.rfc-editor.org/bcp/bcp175))
- `local`
- `UTC`
A time zone identifier is well-formed if it matches `tzId` in the following ABNF:
>```abnf
> tzId = tzPath / tzEtc
> tzPath = tzPart 1*("/" tzPart)
> tzPart = tzWord *("_" tzWord)
> tzWord = (%x41-5A) *(%x61-7A) ; Uppercase ASCII letter followed by lowercase letters
> tzEtc = "Etc/" ("UTC" / "GMT" [("+" / "-") 1*2DIGIT])
>```
> [!NOTE]
> The value `local` permits a _message_ to convert a date/time value
> into a [floating](https://www.w3.org/TR/timezone/#floating) time value
> (sometimes called a _plain_ or _local_ time value) by removing
> the association with a specific time zone.
> [!NOTE]
> Implementations SHOULD check if identifiers for each of these _options_ are valid;
> they SHOULD ignore _options_ that contain invalid or unknown values;
> and they MAY emit a _Bad Option_ error for invalid or unknown values.

Collaborator Author

I'm having some trouble understanding exactly why it's worthwhile to establish a canonical definition of a well-formed timezone identifier within the MF2 spec for an optional timeZone option.

This does not seem like something that's fully baked yet.

On a deeper level, I'm no longer sure that it's a good idea to forbid a datetime formatter from emitting a Bad Option error and using fallback representation when it's told to override a timezone with a value that it doesn't support, in particular as the output might not include a timezone identifier. That seems highly likely to produce misleading results.

Also, the latest suggestion above includes normative language in a non-normative note.

Member

Yeah, I'm not excited about defining the ABNF here in the least. I want to point to CLDR, but that would bring in all of the short identifier crud.

> On a deeper level, I'm no longer sure that it's a good idea to forbid a datetime formatter from emitting a Bad Option error and using fallback representation when it's told to override a timezone with a value that it doesn't support, in particular as the output might not include a timezone identifier. That seems highly likely to produce misleading results.

I agree that we don't just want to drop the zone on the floor. You might never notice that the option is not working (or only sometimes is not working). This is particularly true for :date formatting (where the time zone does matter, but the zone ID is almost never displayed in the results).

> Also, the latest suggestion above includes normative language in a non-normative note.

We should just get rid of that note. The problem of well-formed-but-invalid values seems like a problem for the function handler implementation. We permit errors to come out in formatting.md. Any reason to prescribe what each function does with each option?

Suggested change
- valid identifier per [BCP175](https://www.rfc-editor.org/rfc/rfc6557)
- well-formed time zone identifier
(see [BCP175](https://www.rfc-editor.org/bcp/bcp175))
- `local`
- `UTC`
A time zone identifier is well-formed if it matches `tzId` in the following ABNF:
>```abnf
> tzId = tzPath / tzEtc
> tzPath = tzPart 1*("/" tzPart)
> tzPart = tzWord *("_" tzWord)
> tzWord = (%x41-5A) *(%x61-7A) ; Uppercase ASCII letter followed by lowercase letters
> tzEtc = "Etc/" ("UTC" / "GMT" [("+" / "-") 1*2DIGIT])
>```
> [!NOTE]
> The value `local` permits a _message_ to convert a date/time value
> into a [floating](https://www.w3.org/TR/timezone/#floating) time value
> (sometimes called a _plain_ or _local_ time value)
> by removing the association with a specific time zone.
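
One way a function handler might sort a `timeZone` value into the cases this suggestion lists is sketched below in Python. Several things here are assumptions: the GMT offset is treated as optional (as in the EBNF earlier in the thread), matching is case-sensitive even though ABNF quoted strings are case-insensitive by default, and `classify_time_zone` with its return labels is a hypothetical helper.

```python
import re

_WORD = r"[A-Z][a-z]*"
_PART = rf"{_WORD}(?:_{_WORD})*"
_PATH = rf"{_PART}(?:/{_PART})+"
# Assumption: the offset after "GMT" is optional.
_ETC = r"Etc/(?:UTC|GMT(?:[+-][0-9]{1,2})?)"
TZ_ID = re.compile(rf"(?:{_PATH}|{_ETC})")


def classify_time_zone(value):
    """Sort a timeZone option value into one of four illustrative labels."""
    if value == "local":
        return "floating"          # drop the association with a time zone
    if value == "UTC":
        return "utc"
    if TZ_ID.fullmatch(value):
        return "well-formed"       # well-formed, but not necessarily valid
    return "not well-formed"
```

A value such as `Europe/Berlin` or `Etc/GMT-1` is classified as well-formed, while `PST8PDT` and `GMT-01:23` are not; what the handler then does with each label is left open, as in the discussion above.
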

aphillips marked this conversation as resolved.

+> [!NOTE]
+> These options do not have default values because they are only to be used
+> as overrides for locale-and-value dependent implementation-defined defaults.