diff --git a/docs/ldml/tr35.md b/docs/ldml/tr35.md
index 0c44f9779ca..439137f1b0b 100644
--- a/docs/ldml/tr35.md
+++ b/docs/ldml/tr35.md
@@ -698,6 +698,37 @@ Private use codes fall into three groups.
 See also _[Unknown or Invalid Identifiers](#Unknown_or_Invalid_Identifiers)_.
+### Special Script Codes
+Certain valid script codes require special handling.
+These are the codes in [Script Codes](https://www.unicode.org/iso15924/iso15924-codes.html) with the words "variant" or "alias" within parentheses,
+excluding Zsye.
+The Compound codes include characters in multiple scripts;
+the Visual variants are distinct in appearance, but otherwise encompass a single script;
+and the Subsets exclude certain characters from a script.
+The Equivalents for Subsets are not as well defined, so the "Equivalents" are marked as approximate.
+
+| Group | Script | Equivalent |
+| --- | --- | --- |
+| Compounds | Jpan | ≡ Hani ∪ Hira ∪ Kana |
+| | Hrkt | ≡ Hira ∪ Kana |
+| | Kore | ≡ Hani ∪ Hang |
+| | Hanb | ≡ Hani ∪ Bopo |
+| Visual variants | Aran | ≡ Arab (Nastaliq variant) |
+| | Cyrs | ≡ Cyrl (Old Church Slavonic variant) |
+| | Latf | ≡ Latn (Fraktur variant) |
+| | Latg | ≡ Latn (Gaelic variant) |
+| | Syrn | ≡ Syrc (Eastern variant) |
+| | Syre | ≡ Syrc (Estrangelo variant) |
+| | Syrj | ≡ Syrc (Western variant) |
+| Subsets (approximate) | Jamo | ≃ Hang - LVT - LV |
+| | Hans | ≃ Hani - Traditional-only |
+| | Hant | ≃ Hani - Simplified-only |
+
+The special codes most frequently used are in the locale identifiers zh-Hans, zh-Hant, ja-Jpan, and ko-Kore.
+These are used, for example, in [Likely Subtags](#Likely_Subtags) in LDML.
+Some of the special codes are used in other specifications,
+such as in [Mixed_Script_Detection](https://unicode.org/reports/tr39/#Mixed_Script_Detection).
+
 ### Unicode BCP 47 U Extension
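
Review note: the table added in this hunk is effectively a lookup from a special script code to its base script or scripts. A minimal Python sketch of that lookup follows, assuming a consumer wants to expand a special code before comparing scripts. The dictionary names and the `expand_script` helper are hypothetical and are not part of LDML or CLDR. The approximate subsets are mapped only to their containing script, with a flag recording that the match is not exact.

```python
# Hypothetical tables transcribed from the "Special Script Codes" table.
# Compounds: the code covers the union of several scripts.
COMPOUNDS = {
    "Jpan": frozenset({"Hani", "Hira", "Kana"}),
    "Hrkt": frozenset({"Hira", "Kana"}),
    "Kore": frozenset({"Hani", "Hang"}),
    "Hanb": frozenset({"Hani", "Bopo"}),
}

# Visual variants: the code covers a single script, styled differently.
VISUAL_VARIANTS = {
    "Aran": "Arab",
    "Cyrs": "Cyrl",
    "Latf": "Latn",
    "Latg": "Latn",
    "Syrn": "Syrc",
    "Syre": "Syrc",
    "Syrj": "Syrc",
}

# Subsets: only the containing script is recorded here, because the
# excluded characters are not precisely defined (marked "approximate").
APPROXIMATE_SUBSETS = {
    "Jamo": "Hang",
    "Hans": "Hani",
    "Hant": "Hani",
}


def expand_script(code):
    """Return (base_scripts, exact) for a four-letter script code.

    exact is False only for the approximate subset codes. Codes that
    are not in any table are returned unchanged as a one-element set.
    """
    if code in COMPOUNDS:
        return COMPOUNDS[code], True
    if code in VISUAL_VARIANTS:
        return frozenset({VISUAL_VARIANTS[code]}), True
    if code in APPROXIMATE_SUBSETS:
        return frozenset({APPROXIMATE_SUBSETS[code]}), False
    return frozenset({code}), True
```

For example, `expand_script("Kore")` yields the set of Hani and Hang with the exact flag set, while `expand_script("Hant")` yields Hani with the flag cleared, mirroring the ≡ and ≃ markers in the table.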