CLDR-17235 Organize conformance references (#4070)
macchiati authored Sep 25, 2024
1 parent 54ee48b commit 50f2e98
Showing 1 changed file with 159 additions and 7 deletions.
166 changes: 159 additions & 7 deletions docs/ldml/tr35.md
@@ -229,25 +229,177 @@ As LDML is an interchange format, it was designed for ease of maintenance and si

### <a name="Conformance" href="#Conformance">Conformance</a>

There are many ways to use the Unicode LDML specification and the CLDR data.
The Unicode Consortium does not restrict the ways in which the format or data are used.
However, an implementation may also claim conformance to the LDML specification and/or to CLDR data, as follows:

<a name="UAX35-C1" href="#UAX35-C1"></a>
_**UAX35-C1.**_ An implementation that claims conformance to this specification shall:

1. Identify the sections of the specification that it conforms to.
* For example, an implementation might claim conformance to all LDML features except for _transforms_ and _segments_.
* The names of sections may change for clarity, so the associated links should be included in any reference — links into LDML will remain stable.
2. Interpret the relevant elements and attributes of LDML data in accordance with the descriptions in those sections.
* For example, an implementation that claims conformance to the date format patterns must interpret the characters in such patterns according to [Date Field Symbol Table](tr35-dates.md#Date_Field_Symbol_Table).
3. Declare which types of CLDR data it uses.
* For example, an implementation might declare that it only uses language names, and those with a _draft_ status of _contributed_ or _approved_.
4. Declare when it overrides CLDR data, or uses `alt` data.
* For example, for `//ldml/numbers/symbols/group` an implementation could use `alt="official"` data.

An implementation may also make a _general claim_ of conformance to the LDML specification and/or CLDR data.
Such a claim is understood to claim conformance to all portions of this specification that are relevant to the operations performed by the implementation,
except for those specifically declared as exceptions.
For example, if an implementation making a _general claim_ of conformance performs date formatting, and does not declare date formatting as an exception,
it is understood to be claiming conformance to date formatting as described in the section listed below.

~~_**UAX35-C2.**_ An implementation that claims conformance to Unicode locale or language identifiers shall:~~

~~1. Specify whether Unicode locale extensions are allowed~~
~~2. Specify the canonical form used for identifiers in terms of casing and field separator characters.~~

~~External specifications may also reference particular components of Unicode locale or language identifiers, such as:~~

~~> _Field X can contain any Unicode region subtag values as given in Unicode Technical Standard #35: Unicode Locale Data Markup Language (LDML), excluding grouping codes._~~

<a name="UAX35-C2" href="#UAX35-C2"></a>
NOTE: _**UAX35-C2.**_ is replaced by the following generalization.

The following lists the high-level sections with structures and/or processing algorithms.
Conformance to a particular section may reference and require conformance to another section.

#### Unicode Locale Identifiers
| Sections | Topics |
| --- | --- |
| [Unicode Locale Identifier](#Unicode_locale_identifier)| identifier syntax, interpretation, and validity |
| [Annex C. LocaleId Canonicalization](#LocaleId_Canonicalization) | canonicalize |
| [CLDR to BCP 47](#Unicode_Locale_Identifier_CLDR_to_BCP_47), [BCP 47 to CLDR](#Unicode_Locale_Identifier_BCP_47_to_CLDR) | convert |
| [Language Identifier Field Definitions](#Field_Definitions) | interpretation and validity of -u key-value pairs |
| [Locale Display Name Algorithm](tr35-general.html#locale_display_name_algorithm) | locale display names |

#### Unicode Locale Inheritance and Matching
| Sections | Topics |
| --- | --- |
| [Locale Inheritance and Matching](#Locale_Inheritance) | locale inheritance |
| [Likely Subtags](#Likely_Subtags) | likely subtags |
| [Language Matching](#LanguageMatching) | locale matching |

#### Units of Measurement
| Sections | Topics |
| --- | --- |
| [Unit Identifiers](tr35-general.html#unit-identifiers) | unit identifier syntax, interpretation, and validity |
| [Unit Identifier Normalization](tr35-info.html#Unit_Identifier_Normalization) | identifier normalization |
| [Unit Conversion](tr35-info.html#Unit_Conversion) | unit conversion |
| [Unit Preferences](tr35-info.html#Unit_Preferences) | evaluation of user preferences |
| [Unit Identifier Uniqueness](tr35-general.html#unit-identifier-uniqueness) | converting units into BCP47 format |
| [Compound Units](tr35-general.html#compound-units) | unit display names |

#### Number Formatting
| Sections | Topics |
| --- | --- |
| [Number Format Patterns](tr35-numbers.html#number-format-patterns) | number format patterns, syntax and interpretation |
| [Compact Number Formats](tr35-numbers.html#compact-number-formats) | compact number formats |
| [Rule-Based Number Formatting](tr35-numbers.html#Rule-Based_Number_Formatting) | spell-out number formatting |

#### Date Formatting
| Sections | Topics |
| --- | --- |
| [Elements availableFormats, appendItems](tr35-dates.html#availableFormats_appendItems) | date formatting, patterns |
| [Date Format Patterns](tr35-dates.html#Date_Format_Patterns) | date format patterns and symbols|
| [Using Time Zone Names](tr35-dates.html#Using_Time_Zone_Names) | timezone forms, fallback and parsing |

#### Collation
| Sections | Topics |
| --- | --- |
| [Root Collation](tr35-collation.html#root-collation) | Root collation syntax and structure |
| [Collation Tailorings](tr35-collation.html#Collation_Tailorings) | Rule syntax and interpretation for language-specific ordering |

#### Grammar
| Sections | Topics |
| --- | --- |
| [Grammatical Features](tr35-general.html#grammatical-features) | noun classes (except for plurals) |
| [Language Plural Rules](tr35-numbers.html#Language_Plural_Rules) | plural and ordinal category rules, ranges |

#### Miscellaneous
| Sections | Topics |
| --- | --- |
| [Unicode Sets](#Unicode_Sets) | Unicode set syntax and interpretation |
| [String Range](#string-range) | string-range syntax and interpretation |
| [Transforms](tr35-general.html#Transforms)| transform identifier and rule syntax and interpretation |
| [Segmentations](tr35-general.html#segmentations) | segmentation customizations |
| [Synthesizing Sequence Names](tr35-general.html#synthesizing-sequence-names) | constructing derived emoji names |
| [Formatting Process](tr35-personNames.html#formatting-process) | person name formatting |
| [Part 7: Keyboards](tr35-keyboards.html) | keyboard structure and interpretation |
| [Conformance](tr35-messageFormat.html#conformance) (Message Format) | message formatting |

### Customization

Conformant implementations cannot modify CLDR structures, such as the syntax or interpretation of locale identifiers.
There are usually mechanisms for implementations to customize these to a certain extent, using what are known as private-use codes.
For example, an implementation could use the private-use language code `qfz` to mean a language that was not covered by BCP 47,
or use a [private use extension](#pu_extensions) in a Unicode locale identifier, or use a private-use unit such as `xxx-smoot-per-second`.
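
As a rough illustration of the private-use codes mentioned above, the following Python sketch tests whether a code falls in a private-use range. The language check relies on the BCP 47 reservation of `qaa` through `qtz`. The unit check is an assumption drawn only from the `xxx-smoot-per-second` example, and both function names are hypothetical.

```python
import re

# BCP 47 reserves the language subtags qaa..qtz for private use.
_PRIVATE_USE_LANGUAGE = re.compile(r"q[a-t][a-z]")


def is_private_use_language(subtag: str) -> bool:
    """Return True if `subtag` lies in the private-use range qaa..qtz."""
    return _PRIVATE_USE_LANGUAGE.fullmatch(subtag.lower()) is not None


def is_private_use_unit(unit_id: str) -> bool:
    """Assumption: private-use units are marked by an `xxx-` prefix,
    as in the `xxx-smoot-per-second` example."""
    return unit_id.startswith("xxx-")
```

A caller might use such checks to route private-use codes to its own data rather than to CLDR data.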

An implementation may also use a deprecated code instead of the corresponding preferred code.
The most frequent case is an implementation whose earlier versions predated BCP 47 and used `iw` for Hebrew,
rather than the BCP 47 (and CLDR) code `he`.
When this is done, the CLDR data needs to be modified in appropriate places, not just in some file names.
For example, the languageAlias data requires modification, from:
```
<languageAlias type="iw" replacement="he" reason="deprecated"/> <!-- Hebrew -->
```
to
```
<languageAlias type="he" replacement="iw" reason="deprecated"/> <!-- Hebrew -->
```
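
The reversal above could be automated at build time. The following Python sketch uses only the standard library. It assumes the alias records have already been loaded as an XML tree, and the function name and the inline sample (including the second `in`/`id` record) are hypothetical. It handles only the `languageAlias` record, whereas a real customization would also need to adjust the other places where the code appears.

```python
import xml.etree.ElementTree as ET


def prefer_deprecated_code(root, deprecated, preferred):
    """Swap type/replacement on matching languageAlias elements so that
    `deprecated` becomes the code the implementation treats as canonical.
    Returns the number of elements changed."""
    changed = 0
    for alias in root.iter("languageAlias"):
        if alias.get("type") == deprecated and alias.get("replacement") == preferred:
            alias.set("type", preferred)
            alias.set("replacement", deprecated)
            changed += 1
    return changed


# Illustrative input only; not a copy of the real CLDR file.
SAMPLE = """<alias>
  <languageAlias type="iw" replacement="he" reason="deprecated"/>
  <languageAlias type="in" replacement="id" reason="deprecated"/>
</alias>"""

root = ET.fromstring(SAMPLE)
count = prefer_deprecated_code(root, deprecated="iw", preferred="he")
hebrew = root.find("languageAlias[@replacement='iw']")
```

Matching on both attributes keeps unrelated alias records, such as the second one in the sample, untouched.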

Minimized locale identifiers are also not required. For example, an implementation could consistently expand locale identifiers to include regions, such as `en` → `en_DE` or `de` → `de-AT`.

Implementations may customize CLDR data, as long as they declare that they are doing so. This may include:

#### Omitting data

An implementation may dispense with locale data for locales that it does not support. For locales that it does support,
it may dispense with data that is at CoverageLevel=Comprehensive, or with particular sorts of data, such as annotations for emoji.

#### Adding data

An implementation could add data for a locale that CLDR does not yet support, or add higher-coverage data for a locale than what CLDR has.

#### Overriding data

CLDR provides the `alt` attribute as a mechanism for overriding data.
At build time, an implementation could override the default value by using an alt value.
For example, take the following data:
```
<territory type="HK">Sonderverwaltungsregion Hongkong</territory>
<territory type="HK" alt="short">Hongkong</territory>
```
An implementation could, at build time, substitute the short value for the regular value, getting "Hongkong".
It could instead support both values at runtime, using display option settings to pick between the regular value and the short value.
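
Both strategies can be sketched with one small Python function. This is only an illustration: the function name, the `prefer_alt` parameter, and the `<territories>` wrapper with its extra `DE` entry are assumptions added to make the example self-contained.

```python
import xml.etree.ElementTree as ET

# Illustrative data; the HK entries mirror the example above.
SAMPLE = """<territories>
  <territory type="HK">Sonderverwaltungsregion Hongkong</territory>
  <territory type="HK" alt="short">Hongkong</territory>
  <territory type="DE">Deutschland</territory>
</territories>"""


def territory_names(root, prefer_alt=None):
    """Map territory code to display name.

    With prefer_alt=None only the regular values are used. Otherwise a value
    whose `alt` attribute equals prefer_alt replaces the regular value."""
    names = {}
    for element in root.iter("territory"):
        code, alt = element.get("type"), element.get("alt")
        if alt is None:
            # Regular value: keep it unless a preferred alt value was already stored.
            names.setdefault(code, element.text)
        elif alt == prefer_alt:
            names[code] = element.text
    return names


root = ET.fromstring(SAMPLE)
regular = territory_names(root)
short = territory_names(root, prefer_alt="short")
```

Calling the function once during the build corresponds to build-time substitution. Keeping both `regular` and `short` available and choosing between them from a display option corresponds to the runtime approach.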

Implementations can override the data in other ways as well, such as changing the spelling of a particular value.

#### Testing

The files in [testData](https://github.com/unicode-org/cldr/tree/main/common/testData) can be used to test conformance.
Brief instructions for use are supplied in `_readme.txt` files in the different directories and/or in the headers of the files in question.
For example, the following is from a sample header:
```
# Format:
# <source locale identifier> ; <expected canonicalized locale identifier>
#
# The data lines are divided into 4 sets:
# explicit: a short list of explicit test cases.
# fromAliases: test cases generated from the alias data.
# decanonicalized: test cases generated by reversing the normalization process.
# withIrrelevants: test cases generated from the others by adding irrelevant fields where possible,
# to ensure that the canonicalization implementation is not sensitive to irrelevant fields. These include:
# Language: aaa
# Script: Adlm
# Region: AC
# Variant: fonipa
```

If an implementation overrides CLDR data, then various lines in the relevant test files may need to be modified correspondingly, or skipped.
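
A minimal Python sketch of a test runner for files in the `source ; expected` format described in the sample header follows. It assumes that `#` starts a comment. The function names, the `skip` parameter, the sample lines, and the toy canonicalizer are all hypothetical and do not come from the CLDR test files.

```python
def parse_test_lines(lines):
    """Yield (source, expected) pairs, ignoring comments and blank lines."""
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments (assumed to start with '#')
        if not line or ";" not in line:
            continue
        source, expected = (field.strip() for field in line.split(";", 1))
        yield source, expected


def run_tests(lines, canonicalize, skip=frozenset()):
    """Return (source, expected, actual) for every failing line.

    `skip` lists source identifiers whose expectations no longer apply
    because the implementation overrides the corresponding CLDR data."""
    failures = []
    for source, expected in parse_test_lines(lines):
        if source in skip:
            continue
        actual = canonicalize(source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures


# Illustrative lines only; not copied from the CLDR test data.
SAMPLE_LINES = [
    "# <source locale identifier> ; <expected canonicalized locale identifier>",
    "",
    "iw ; he",
    "en_US ; en_US  # unchanged",
]


def toy_canonicalize(identifier):
    """Stand-in for a real canonicalizer: maps iw to he, leaves the rest alone."""
    return {"iw": "he"}.get(identifier, identifier)


def keeps_iw(identifier):
    """Stand-in for an implementation that retains the deprecated code."""
    return identifier
```

An implementation that retains `iw` would fail the `iw ; he` line, so it could pass `skip={"iw"}` or edit that line's expected value instead.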

### EBNF
The BNF syntax used in LDML is a variant of the Extended Backus-Naur Form (EBNF) notation used in [W3C XML Notation](https://www.w3.org/TR/REC-xml/#sec-notation). The main differences are: