From 53acbb1887612a202abb9ea827e0d074c3caed2c Mon Sep 17 00:00:00 2001 From: Mark Davis Date: Wed, 4 Sep 2024 11:37:08 -0700 Subject: [PATCH] CLDR-17830 Updates from TC meeting --- docs/site/downloads/cldr-46.md | 99 ++++++++++++++++++---------------- 1 file changed, 53 insertions(+), 46 deletions(-) diff --git a/docs/site/downloads/cldr-46.md b/docs/site/downloads/cldr-46.md index 39aa8b620a1..44c467cbf15 100644 --- a/docs/site/downloads/cldr-46.md +++ b/docs/site/downloads/cldr-46.md @@ -6,7 +6,7 @@ title: CLDR 46 Release Note | No. | Date | Rel. Note | Data | Charts | Spec | Delta | GitHub Tag | Delta DTD | CLDR JSON | |:---:|:----------:|:---------:|:------:|:--------:|:------------:|:---:|:----------:|:---------:|:---------:| -| 46 | 2024-010-~~XX~~ | ~~[v46]()~~ | ~~[CLDR46](http://unicode.org/Public/cldr/46/)~~ | [Charts46](http://unicode.org/cldr/charts/dev) | [LDML46](http://www.unicode.org/reports/tr35/proposed.html) | [Δ46](https://unicode-org.atlassian.net/issues/?jql=project+%3D+CLDR+AND+status+%3D+Done+AND+resolution+%3D+Fixed+AND+fixVersion+%3D+%2246%22+ORDER+BY+priority+DESC) | ~~[release-46]()~~ | [ΔDtd46](https://www.unicode.org/cldr/charts/dev/supplemental/dtd_deltas.html) | ~~[46.0.0](https://github.com/unicode-org/cldr-json/releases/tag/46.0.0)~~ | +| 46 | 2024-10-~~XX~~ | ~~[v46]()~~ | ~~[CLDR46](http://unicode.org/Public/cldr/46/)~~ | [Charts46](http://unicode.org/cldr/charts/dev) | [LDML46](http://www.unicode.org/reports/tr35/proposed.html) | [Δ46](https://unicode-org.atlassian.net/issues/?jql=project+%3D+CLDR+AND+status+%3D+Done+AND+resolution+%3D+Fixed+AND+fixVersion+%3D+%2246%22+ORDER+BY+priority+DESC) | ~~[release-46]()~~ | [ΔDtd46](https://www.unicode.org/cldr/charts/dev/supplemental/dtd_deltas.html) | ~~[46.0.0](https://github.com/unicode-org/cldr-json/releases/tag/46.0.0)~~ | ## Overview @@ -49,7 +49,7 @@ For a full listing, see [Coverage Levels](https://unicode.org/cldr/charts/46/sup ### DTD Changes -1. 
Added alt='official' to represent cases where an official value differs from the customary value. +1. Added `alt='official'` to represent cases where an official value differs from the customary value. Currently added for a small number of language names, decimal separators, and grouping separators. 2. Added new numbering systems from Unicode 16.0. @@ -58,40 +58,56 @@ For a full listing, see [Delta DTDs](https://unicode.org/cldr/charts/46/suppleme ### Supplemental Data Changes 1. Currency - 2. New currency code ZWG added — because it was late in the cycle, many locales will just support the code. -3. Timezones + 1. New currency code `ZWG` added. Because it was added late in the cycle, many locales support only the code (no symbol or name). +2. Timezones and Metazones 1. Changed the metazone for Kazakhstan to reflect removal of Asia/Almaty, thus dropping the distinction among different regions in Kazakhstan. - 2. Deprecated timezone ids. Altered the handling of: CST6CDT, EST, EST5EDT, MST7MDT, PST8PDT -4. Units - 1. Added units: portion-per-1e9 (aka per-billion), night (for hotel stays), light (as a prefix for light-second, light-minute, etc.) - 2. Changed preferred wind speed preference for some locales to meter-per-second -5. Locale metadata - 1. Minimization for likelySubtags removes many additional redundant mappings. - 1. For example, the mapping acy_Grek → acy_Grek_CY is unnecessary, because the mapping acy → acy_Latn_CY is sufficient. + 2. Added support for deprecated codes by remapping: `CST6CDT → America/Chicago`, `EST → America/Panama`, `EST5EDT → America/New_York`, `MST7MDT → America/Denver`, `PST8PDT → America/Los_Angeles`. +3. Units + 1. Added units: `portion-per-1e9` (aka per-billion), `night` (for hotel stays), `light-speed` (as an internal prefix for **light-second**, **light-minute**, etc.) + 2. Changed the preferred wind speed unit for some locales to `meter-per-second`. +More preference changes are planned for the next release. +4. 
Minimization for likelySubtags removes many additional redundant mappings. + - For example, the mapping `acy_Grek → acy_Grek_CY` is unnecessary, because the mapping `acy → acy_Latn_CY` is sufficient. For the reason why, see the algorithm in [Likely Subtags](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#likely-subtags). - 4. The ordering in the file is more consistent; first the main mappings, then the mapping from region and/or script to likly language, then the data contributed by SIL. - 5. The territories have been cleaned up: there are no ZZ entries, and 001 is limited to artifical languages such as Interlingua. - 6. Language matching dropped Russian (ru) as a fallback language for Ukrainian. - 1. A fallback language is used when the user's primary language is unavailable, + - The ordering in the file is more consistent; first the main mappings, then the mapping from region and/or script to likely language, then the data contributed by SIL. + - The regions have been cleaned up: there are no entries with `ZZ`, and `001` is limited to artificial languages such as Interlingua. The only other macroregion code is in `und_419 → es_Latn_419` (Spanish‧Latin‧Latin America). +5. Language matching + - Dropped the fallback mapping `desired="uk" → supported="ru"` (so that Ukrainian (`uk`) doesn't fall back to Russian (`ru`)). + - Note: A fallback language is used when the user's primary language is unavailable, and either the user doesn't have any secondaries language in their settings (as on Android or iOS) or those secondary languages are also not available. As a result of this change, when the primary and secondary languages are not available, the fallback language would be the system default instead of Russian. -6. Transforms - 1. Major update to Han → Latn, reflecting new data in Unicode 16.0 - 2. Fixes for Arabic numbers, a Farsi vowel + - Added the mapping `desired="scn" → supported="it"`. + - Changed the deprecated code `knn` to `gom`. +7. Transforms + 1. 
Major update to `Han → Latn`, reflecting new data in Unicode 16.0 + 2. Fixes for Arabic numbers, and a Farsi vowel +8. Other Unicode 16.0 changes + 1. Additional numbering systems + 2. Additional scripts and script identifiers + 3. ScriptMeta has been expanded for Unicode 16.0 +9. Locale identifiers + 1. The subdivision identifiers have been updated to the latest available from ISO + - The removed identifiers have been deprecated + - Missing names have been added (from Wikidata) + 2. The language subtags, script subtags, and variant subtags have been updated to the latest from IANA + - Some codes have been deprecated + 3. Parent and DefaultContent mappings have been added for `kaa` and `kok`; DefaultContent mappings added for `kk`, `lld`, `ltg`, `mhn`, and `zh_Latn_CN` +10. Territory Info has been updated from World Bank and other sources: GDP, population, languages. +11. LanguageGroup info has been updated from Wikidata +12. Plural rules have been added for some new locales +13. Week data + - The first day of the week has been changed for `AE` + - Hour preferences (12 vs. 24) have been added for `en_HK`, `en_MY`, `en_IL` For a full listing, see [¤¤BCP47 Delta](https://unicode.org/cldr/charts/46/delta/bcp47.html) and [¤¤Supplemental Delta](https://unicode.org/cldr/charts/46/delta/supplemental-data.html) ### [Locale Changes](https://unicode.org/cldr/charts/46/delta/index.html) -1. Major changes to emoji search keywords and short names - 1. Data imported from WhatsApp - 2. Increased the maximum number of search keywords - 3. Revision of many search keywords to break up phrases +1. Major changes to emoji search keywords and short names (see below) 2. Major changes to Chinese collation, reflecting new data in Unicode 16.0 3. Other changes 1. Various locales also had smaller improvements agreed to by translators. - -**TBD** + 2. Additional test files have been added. 
For a full listing, see [Delta Data](https://unicode.org/cldr/charts/46/delta/index.html) @@ -102,14 +118,16 @@ The usage model for emoji search keywords is that - heart → 🥰 😘 😻 💌 💘 💝 💖 💗 💓 💞 💕 💟 ❣️ 💔 ❤️‍🔥 ❤️‍🩹 ❤️ 🩷 🧡 💛 💚 💙 🩵 💜 🤎 🖤 🩶 🤍 💋 🫰 🫶 🫀 💏 💑 🏠 🏡 ♥️ 🩺 - blue → 🥶 😰 💙 🩵 🫐 👕 👖 📘 🧿 🔵 🟦 🔷 🔹 🏳️‍⚧️ - therefore, [heart blue] → 💙 🩵 -- A word that has no hits matches all the words that begin with it; if there are no such words hits, it is ignored. - - [heart | blue | confabulation] is equivalent to [heart | blue] -- Whenever the list is short enough to scan, the user will mouse-click on the right emoji - so it doesn't have to be narrowed too far. +- A word matches all the words that begin with it; if there are no such matches, it is ignored. + - [heart blue confabulation] is equivalent to [heart blue] +- Whenever the list is short enough to scan, the user will mouse-click on the right emoji — so it doesn't have to be narrowed too far. Thus in the following, the user would just click on 🎉 if that works for them. - celebrate → 🥳 🥂 🎈 🎉 🎊 🪅 -In this release WhatsApp data has been incorporated, and the keywords have been simplified in most locales by breaking up multi-word keywords. -An example would be white flag (🏳️) formerly having 3 keyword phrases of [white waving flag | white flag | waving flag], +In this release WhatsApp emoji search keyword data has been incorporated. +In the process of doing that, the maximum number of search keywords per emoji has been increased, +and the keywords have been simplified in most locales by breaking up multi-word keywords. +An example would be white flag (🏳️), formerly having 3 keyword phrases of [white waving flag | white flag | waving flag], now being replaced by the simpler 3 single keywords [white | waving | flag]. The simpler version typically works as well or better in practice. @@ -139,36 +157,25 @@ where only the traditional forms of radicals are now available as index characte ### JSON Data Changes -**TBD** +1. 
Separate modern packages were dropped [CLDR-16465] +2. Adding transliteration rules [CLDR-16720] (In progress) ### Markdown ### The CLDR site is in the process of being moved to markdown source (GFM), which will regularize the formatting and make it easier to maintain and extend than with Google Sites. The URLs will remain the same. +This process should be completed before release. ### File Changes -All files added in this release were for new locales. +Most files added in this release were for new locales. +There were the following new test files: **TBD** ### Tooling Changes **TBD** -## Growth -The following chart shows the growth of CLDR locale-specific data over time. -It is restricted to data items in /main and /annotations directories, so it does not include the non-locale-specific data; -nor does it include corrections (which typically outnumber new items). -The % values are percent of the current measure of Modern coverage. -That level is increases each release, so previous releases had many locales that were at Modern coverage as assessed at the time of their release. -There is just one line per year, even though there were multiple releases in most years. - -The additional locales at given levels are listed at the top of this document. - -![Screenshot 2024-09-03 at 17 38 52](https://github.com/user-attachments/assets/7cd0a35f-b46f-4891-8f1a-5d0fd7ebfcc3) - -**TBD: Question: is it useful to have this section anymore?** - ## Migration **TBD** @@ -183,7 +190,7 @@ The additional locales at given levels are listed at the top of this document. ## Acknowledgments -Many people have made significant contributions to CLDR and LDML; see the [Acknowledgments](https://cldr.unicode.org/index/acknowledgments) page for a full listing. +Many people have made significant contributions to CLDR and LDML; see the [Acknowledgments](https://cldr.unicode.org/index/acknowledgments) page for a full listing. 
We'd also like to acknowledge the work done by interns this release: **TBD** The Unicode [Terms of Use](https://unicode.org/copyright.html) apply to CLDR data; in particular, see [Exhibit 1](https://unicode.org/copyright.html#Exhibit1).