Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TimeZoneFallbackOverride date field length; use NeoTimeZoneSkeleton #5024

Merged
merged 10 commits into from
Jun 25, 2024

Conversation

sffc
Copy link
Member

@sffc sffc commented Jun 8, 2024

@sffc
Copy link
Member Author

sffc commented Jun 10, 2024

I'm not completely sure which direction to take on this. The key issue here is that expressing configurations for time zone formatting is substantially thicker than for other fields. For example, if you need to format a month, you might first try to get the long symbol, then if that fails, short, and then finally numeric. However, for time zones, you might first try to format short generic based on BCP-47, then short generic based on Metazone, then long generic from BCP-47, then long generic from Metazone, then partial location format based on BCP-47, and then finally one of the handful of GMT offset fallback formats. Importantly, this set of fallback pathways can be configured by the user.

There are two general ways to approach this:

  1. Rely on PatternItem field symbols and field lengths. UTS 35 defines 23 stock formats, which cover some but not all use cases: even UTS 35 itself defines time zone formats that aren't reachable from the fields, such as Generic Partial Location Format ("Pacific Time (Los Angeles)") which is likely to be a popular option for time zone selectors. However, now that CLDR defines private use field lengths (CLDR-17217 Reserve datetime fields for private use cldr#3603), we could leverage this to create fields that cover all formats and use cases, potentially upstreaming any missing pieces to CLDR.
  2. Don't use fields for time zones at all and instead store the fully expressive TimeZoneFormatConfig along with the pattern. Use a placeholder field in the pattern string which, when encountered, gets replaced by the formatted time zone. Two sub bullets for this approach:
    1. Store the config in the pattern metadata, which is currently a 1-byte field stored in every Pattern. Patterns would not be allowed to have time zone fields except for the single time zone placeholder. We would likely need to increase this metadata field to 2 or 3 bytes. Pro: All serialized patterns are capable of carrying time zone information. Con: Only one time zone format possible per pattern. Increases pattern size for everyone, which can be substantial.
    2. Store the config as a structured object. Stock UTS 35 time zone formats in patterns would continue to be supported. Pro: Easier to implement. Con: Patterns cannot store custom time zone configs. Two ways to do the same thing (config or pattern item).

This PR currently implements option 2ii. Prior to this PR, something closer to option 1 was implemented.

I'm currently leaning toward sticking with option 1, in which case I'll update this PR to save some refactorings and revert others.

Thoughts? Any other options I'm missing? @Manishearth @robertbastian @zbraniecki

Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what preference I have here. The private use fields are a bit messy but not terrible. This also seems fine?

Interested in what Zibi thinks


/// "GMT-8"
#[derive(Debug)]
pub enum GmtShort {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: not sure if empty enums are what we need for these markers but that's a separate discussion to be had later

@sffc
Copy link
Member Author

sffc commented Jun 19, 2024

I'm leaning toward private use field lengths (option 1) unless I hear otherwise. All the code is just so intertwined with PatternItems (for good reason), and there's not a good reason to break from that.

@sffc sffc force-pushed the zone-experimentation branch from 674a1ea to 5266c6c Compare June 19, 2024 21:48
@sffc sffc force-pushed the zone-experimentation branch from 5266c6c to bb378f1 Compare June 19, 2024 23:12
@sffc
Copy link
Member Author

sffc commented Jun 19, 2024

Okay I've tried implementing option 1. Some more thoughts:

  • Option 1 requires three ways to represent mostly the same thing: a ZoneComponents enum of all supported configurations (primary public interface); a Field used to represent that enum in a pattern string (public but low level); and the ZoneConfig as the more useful machine representation (mostly internal but it needs to be public because it is in a trait). In comparison, Option 2 can get away with only ZoneConfig and not the others.
  • It is nice that Option 1 works seamlessly with existing pattern strings. Option 2 requires some pre-processing at some point to handle pattern strings that might contain CLDR time zone items.

@zbraniecki
Copy link
Member

I don't have a strong opinion, I'm sorry. Can we do the Option 2 pre-processing at data build time?

@sffc sffc changed the title Create a more expressive time zone config struct Add TimeZoneFallbackOverride date field length; use NeoTimeZoneSkeleton Jun 25, 2024
@sffc
Copy link
Member Author

sffc commented Jun 25, 2024

Ok this has been rewritten multiple times but I simplified the change to only use the private use field length and it's now ready for review.

@sffc sffc marked this pull request as ready for review June 25, 2024 20:34
@sffc
Copy link
Member Author

sffc commented Jun 25, 2024

Backup of neo_zone.rs:

// This file is part of ICU4X. For terms of use, please see the file
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

//! Types for neo time zone formatting.

use crate::fields::{self, Field, FieldSymbol, FieldLength};
#[cfg(feature = "experimental")]
use crate::neo_marker::private;
use crate::time_zone::Iso8601Format;

/// Time zone offset formatting style.
#[derive(Debug, Copy, Clone)]
pub enum NeoOffsetStyle {
    /// "GMT-8"
    /// `O`
    GmtShort,
    /// "GMT-08:00"
    /// `OOOO`, `ZZZZ`
    GmtLong,
    /// All `x` and `X` formats
    Iso8601(Iso8601Format),
}

/// Time zone formatting style, not including offsets.
#[derive(Debug, Copy, Clone)]
pub enum NeoZoneStyle {
    /// "PDT"
    /// `z..zzz`
    SpecificShort,
    /// "Pacific Daylight Time"
    /// `zzzz`
    SpecificLong,
    /// "PT"
    /// `v`
    GenericShort,
    /// "Pacific Time"
    /// `vvvv`
    GenericLong,
    /// "uslax"
    /// `V`
    IdentifierShort,
    /// "America/Los_Angeles"
    /// `VV`
    IdentifierLong,
    /// "Los Angeles"
    /// `VVV`
    City,
    /// "Los Angeles Time"
    /// `VVVV`
    GenericLocation,
    /// "Pacific Time (Los Angeles)"
    /// not available via field?
    GenericPartialLocation,
}

/// A specification for the desired display of a time zone.
///
/// Unlike date and time, we support only a single time zone component, but the
/// specific component can change.
///
/// Since the length of a time zone can vary independent of the date and time,
/// the time zone lengths are directly encoded into this enum.
#[derive(Debug, Copy, Clone)]
#[non_exhaustive]
pub struct NeoZoneConfig {
    /// The primary time zone style.
    pub first: Option<NeoZoneStyle>,
    /// The fallback time zone style if the primary style is unavailable.
    pub second: Option<NeoZoneStyle>,
    /// The offset style if the time zone cannot be formatted.
    pub offset: NeoOffsetStyle,
}

impl NeoZoneConfig {
    pub(crate) fn from_field(
        field: fields::TimeZone,
        length: fields::FieldLength,
    ) -> Option<Self> {
        match (field, length) {
            // TODO: This should actually be GenericShortOrLocation according to spec
            (fields::TimeZone::LowerV, fields::FieldLength::One) => Some(GenericShortOrGmt::ZONE_CONFIG),
            _ => todo!()
        }
    }

    pub(crate) fn default_for_fallback() -> Self {
        Self {
            first: None,
            second: None,
            offset: NeoOffsetStyle::Iso8601(Iso8601Format::default_for_fallback()),
        }
    }
}

/// A type that has a time zone configuration associated with it.
// TODO: Make this require sealed
pub trait HasZoneComponents {
    /// The configuration for formatting this time zone.
    const ZONE_CONFIG: NeoZoneConfig;
    /// The corresponding field.
    const ZONE_FIELD: Field;
}

/// "GMT-8"
#[derive(Debug)]
pub enum GmtShort {}

#[cfg(feature = "experimental")]
impl private::Sealed for GmtShort {}

impl HasZoneComponents for GmtShort {
    const ZONE_CONFIG: NeoZoneConfig = NeoZoneConfig {
        first: None,
        second: None,
        offset: NeoOffsetStyle::GmtShort,
    };
    const ZONE_FIELD: Field = Field {
        symbol: FieldSymbol::TimeZone(fields::TimeZone::UpperO),
        length: FieldLength::One,
    };
}

/// "PT" or "GMT-8"
#[derive(Debug)]
pub enum GenericShortOrGmt {}

#[cfg(feature = "experimental")]
impl private::Sealed for GenericShortOrGmt {}

impl HasZoneComponents for GenericShortOrGmt {
    const ZONE_CONFIG: NeoZoneConfig = NeoZoneConfig {
        first: Some(NeoZoneStyle::GenericShort),
        second: None,
        offset: NeoOffsetStyle::GmtShort,
    };
    const ZONE_FIELD: Field = Field {
        symbol: FieldSymbol::TimeZone(fields::TimeZone::LowerV),
        length: FieldLength::TimeZoneFallbackOverride(fields::TimeZoneFallbackOverride::ShortOrGmt),
    };
}

/// "Los Angeles Time" or "GMT-8"
#[derive(Debug)]
pub enum GenericLocationOrGmt {}

#[cfg(feature = "experimental")]
impl private::Sealed for GenericLocationOrGmt {}

impl HasZoneComponents for GenericLocationOrGmt {
    const ZONE_CONFIG: NeoZoneConfig = NeoZoneConfig {
        first: Some(NeoZoneStyle::GenericLocation),
        second: None,
        offset: NeoOffsetStyle::GmtShort,
    };
    const ZONE_FIELD: Field = Field {
        symbol: FieldSymbol::TimeZone(fields::TimeZone::UpperV),
        length: FieldLength::Wide,
    };
}

/// "PT" or "Los Angeles Time" or "GMT-8"
#[derive(Debug)]
pub enum GenericShortOrLocation {}

#[cfg(feature = "experimental")]
impl private::Sealed for GenericShortOrLocation {}

impl HasZoneComponents for GenericShortOrLocation {
    const ZONE_CONFIG: NeoZoneConfig = NeoZoneConfig {
        first: Some(NeoZoneStyle::GenericShort),
        second: Some(NeoZoneStyle::GenericLocation),
        offset: NeoOffsetStyle::GmtShort,
    };
    const ZONE_FIELD: Field = Field {
        symbol: FieldSymbol::TimeZone(fields::TimeZone::LowerV),
        length: FieldLength::One,
    };
}

@sffc sffc removed request for zbraniecki and nordzilla June 25, 2024 20:36
@sffc sffc merged commit 8e7e8e7 into unicode-org:main Jun 25, 2024
28 checks passed
@sffc sffc deleted the zone-experimentation branch June 25, 2024 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants