diff --git a/docs/site/images/index/APIIntegration.png b/docs/site/images/index/APIIntegration.png new file mode 100644 index 00000000000..8fc9fcf85e5 Binary files /dev/null and b/docs/site/images/index/APIIntegration.png differ diff --git a/docs/site/images/index/cldrGrowthChart.png b/docs/site/images/index/cldrGrowthChart.png new file mode 100644 index 00000000000..ecb3045bf2a Binary files /dev/null and b/docs/site/images/index/cldrGrowthChart.png differ diff --git a/docs/site/images/index/growth44.png b/docs/site/images/index/growth44.png new file mode 100644 index 00000000000..a9710a939fb Binary files /dev/null and b/docs/site/images/index/growth44.png differ diff --git a/docs/site/index/downloads/cldr-43.md b/docs/site/index/downloads/cldr-43.md new file mode 100644 index 00000000000..e2cd2a50524 --- /dev/null +++ b/docs/site/index/downloads/cldr-43.md @@ -0,0 +1,265 @@ +--- +title: CLDR 43 Release Note +--- + +# CLDR 43 Release Note + +| No. | Date | Rel. Note | Data | Charts | Spec | Delta Tickets | GitHub Tag | Delta DTD | CLDR JSON | +|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| +| 43 | 2023-04-12 | [v43](http://cldr.unicode.org/index/downloads/cldr-43) | [CLDR43](http://unicode.org/Public/cldr/43/) | [Charts43](https://unicode.org/cldr/charts/43#h.bzf6i36qsctj) | [LDML43](https://www.unicode.org/reports/tr35/tr35-68/tr35.html) | [ΔV43](https://unicode-org.atlassian.net/issues/?jql=project%20%3D%20CLDR%20AND%20status%20%3D%20Done%20AND%20resolution%20in%20(Fixed%2C%20%22Fix%20in%20Survey%20Tool%20(CLDR)%22%2C%20%22Fixed%20non-repo%22)%20AND%20fixVersion%20%3D%20%2243%22%20ORDER%20BY%20resolution%20ASC%2C%20component%20ASC%2C%20priority%20DESC%2C%20created%20ASC) | [release-43](https://github.com/unicode-org/cldr/tree/release-43) | [ΔDtd43](https://cldr-smoke.unicode.org/staging-dev/charts/43/supplemental/dtd_deltas.html) | [43.0.0](https://github.com/unicode-org/cldr-json/releases/tag/43.0.0) | +| 43.1 | 2023-06-15 | 
[v43.1](https://cldr.unicode.org/index/downloads/cldr-43#h.qobmda543waj) | n/a | n/a | [LDML43.1](https://www.unicode.org/reports/tr35/tr35-69/tr35.html) | [ΔV43.1](https://unicode-org.atlassian.net/issues/?jql=project%20%3D%20CLDR%20AND%20status%20%3D%20Done%20AND%20resolution%20in%20(Fixed%2C%20%22Fix%20in%20Survey%20Tool%20(CLDR)%22%2C%20%22Fixed%20non-repo%22)%20AND%20fixVersion%20%3D%2043.1%20ORDER%20BY%20resolution%20ASC%2C%20component%20ASC%2C%20priority%20DESC%2C%20created%20ASC) | [release-43-1](https://github.com/unicode-org/cldr/tree/release-43-1) | See note in [43.1 Changes](https://cldr.unicode.org/index/downloads/cldr-43#h.qobmda543waj) | [43.1.0](https://github.com/unicode-org/cldr-json/releases/tag/43.1.0) | + +See [Key to Header Links](https://cldr.unicode.org/index/downloads#h.xq13gabuoy9w) + +## Overview + +Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all [major software systems](https://cldr.unicode.org/index#TOC-Who-uses-CLDR-) (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages. *It is important to review the [Migration](https://cldr.unicode.org/index/downloads/cldr-43#h.7s25aqdv767e) section for changes that might require action by implementations using CLDR directly or indirectly (eg, via ICU).* + +CLDR 43\.1 is a **dot release** focused on fixing specific issues. For more details, see [Version 43\.1 Changes](https://cldr.unicode.org/index/downloads/cldr-43#h.qobmda543waj). + +CLDR 43 is a **limited\-submission release**, focusing on just a few areas: + +1. **Formatting Person Names** + - Completing the data for formatting people’s names, bringing it out of “tech preview”. For more information on the benefits of this feature, see [Background](https://sites.google.com/unicode.org/cldr/index/downloads/cldr-42#h.xtb1v8tpviuc). +2. 
**Locales** + - Adding substantially to the LikelySubtags data + 1. This is used to find the likely writing system and country for a given language, used in normalizing locale identifiers and inheritance. + 2. The data has been contributed by [SIL](https://www.sil.org/). + - Inheritance + 1. Adding components to parentLocales. + 2. Documenting the different inheritance for rgScope data, which inherits primarily by region. +3. **Other data updates** + - In English, Türkiye is now the primary country name for the country code TR, and Turkey is available as an alternate. Other locales have been reviewed to see whether similar changes would be appropriate. + - Name for the new timezone *Ciudad Juárez* +4. **Structure** + - Adding some structure and data needed for ICU4X and JavaScript, for calendar eras and parentLocales. + - All files have been moved from 'seed' to 'common'. +5. **Collation \& Searching** + - Treating various quote marks as equivalent at a Primary strength, also including Geresh and Gershayim. + +For details, see below. + +### Locale Status + +The bar for each coverage level [increases each release](https://en.wikipedia.org/wiki/Red_Queen%27s_race#:~:text=said%20the%20Queen.%20%22Now%2C%20here%2C%20you%20see%2C%20it%20takes%20all%20the%20running%20you%20can%20do%2C%20to%20keep%20in%20the%20same%20place.). Faroese (fo) increased from Basic to Moderate, while Cherokee (chr), Lower Sorbian (dsb), and Upper Sorbian (hsb) dropped from Modern to Moderate. + +[CLDR v43 Coverage](https://drive.google.com/open?id=1wIYTQX4vE0LE_iBRcFuVl3tTv3KY4hL_-P5Jx7hvILI) + +## Version 43\.1 Changes + +**Version 43\.1 currently in Beta**. It is planned to be a dot release that addresses the following issues. The main changes are for compatibility (including parser compatibility and GB 18030\-2022 Level 2 support). To access the release data, use the release tag or the json link. 
The following tickets are included: + +### GB18030\-2022 Compliance + +- [CLDR\-16571 Characters needed for GB18030 implementation level 2 should be in "short" versions of Chinese collations](https://unicode-org.atlassian.net/browse/CLDR-16571) + +### Compatibility + +The following changes are included to allow for better compatibility with certain parsers. + +- [CLDR\-16606 Support ASCII space in English time formats between time and AM/PM, using alt\="ascii"](https://unicode-org.atlassian.net/browse/CLDR-16606) +- [CLDR\-16634 Revert treatment of '@' as ALetter for word break](https://unicode-org.atlassian.net/browse/CLDR-16634) + +### Other + +- [CLDR\-16247 Decimal and grouping separator for en\_ZA does not align with in\-country usage](https://unicode-org.atlassian.net/browse/CLDR-16247) +- [CLDR\-16623 Fix the LDML specification for the locale to use for name formatting](https://unicode-org.atlassian.net/browse/CLDR-16623) +- [CLDR\-16643 Cyrl ↔︎ Latn should use Kh instead of Ḫ](https://unicode-org.atlassian.net/browse/CLDR-16643) + +The only **DTD change** is the addition of alt\="ascii" for time formats: + +\ +    \ +\ +    \ + +## Data Changes + +### [DTD Changes](https://www.unicode.org/cldr/charts/43/supplemental/dtd_deltas.html) + +- **[Person Names](https://www.unicode.org/reports/tr35/tr35-personNames.html#Contents) (formerly in tech preview in CLDR 42\)** + - Changed the **order, length, usage, formality** attribute values to be single elements, not sets. + - Expanded the sample names, and changed two field values (**prefix, suffix**) to be more descriptive (**title, generation, credentials**), splitting the suffix because the placement may vary. +- **Date Eras** + - Eras were previously accessed only by number. There are now alphanumeric identifiers, added with new attributes: an identifying **code** plus **aliases**. + - Calendars may inherit eras with the **inheritEras** element. 
For example, the Japanese calendar inherits from Gregorian previous to a certain point in history. +- **Locales** + - The **parentLocale** elements now have an optional **component** attribute, with a value of **segmentations** or **collations**. These should be used for inheritance for those respective elements. For example, zh\_Hant does not normally inherit from zh (since people would get a ransom\-note effect with mixed scripts). However, collations can be designed to handle sets of characters for multiple writing systems. + - Likely Subtags now have an attribute to indicate the **origin**, currently: **sil1, wikidata, special**. +- **Cleanup** + - The @MATCH values were not being tested for some entries, so the valid entries were extended for the elements: **cr, rbnfrule**. + +### [BCP47 Data Changes](https://www.unicode.org/cldr/charts/43/delta/bcp47.html) + +- A new timezone short id was added (tz\-mxcjs, for Ciudad Juárez), and the description for Istanbul updated the country spelling to Türkiye. + +### [Supplemental Data Changes](https://www.unicode.org/cldr/charts/43/delta/supplemental-data.html) + +- **Units** + - A new unit was added for the Beaufort scale. Translations are only provided for a few locales that are known to use it. + - Unit preferences were added for floor area, rainfall speed, and snowfall speed. See [Units](https://cldr-smoke.unicode.org/staging-dev/charts/43/delta/supplemental-data.html#Units) for differences. +- **Locales** + - Special parentLocales are added for collations and segmentations.  See [Locale \> Parent…](https://cldr-smoke.unicode.org/staging-dev/charts/43/delta/supplemental-data.html#Locale) for the differences. + - Many new likely subtag mappings were added, thanks to contributions from SIL. See [Likely \> Subtag](https://cldr-smoke.unicode.org/staging-dev/charts/43/delta/supplemental-data.html#Likely) for differences. +- **Transforms** + - Aliases for certain Ethiopic transliterators were added. 
+ - New **test** transliterators for Jpan, Khmr, Laoo, and Sinh scripts were added. These are intended for testing, not for production (especially for Jpan, which requires NLP for acceptable results). + - See [Transforms](https://www.unicode.org/cldr/charts/43/delta/supplemental-data.html#Transform) for the differences. +- **Language Info** + - Preferred hours were changed for CW (Curaçao). +- **Metazones** + - Data was changed for 3 zones, and a new metazone was added for Ciudad Juárez. See [Metazone](https://cldr-smoke.unicode.org/staging-dev/charts/43/delta/supplemental-data.html#Metazone). + +### Locale Changes + +- **Person Name Data** + - Expanded data was collected for sample names. These are not meant for use in production, but rather to give translators a feeling for how these names would appear with the different name formatting patterns. + - Data was also collected for more locales, and additional warning messages were added to alert translators about possible problems. +- **Inheritance Changes**: Data was added due to inheritance changes in order to maintain correctness of the data. Clients shouldn't need to take any action, but may notice a larger size. However, clients that use mechanisms such as string pools may see no growth at all. + - CLDR data uses two kinds of inheritance: + - **vertical**: items inherited from parent languages (eg, fr\_CA inherits from fr) + - **horizontal**: items inherited within the same language (narrow Month translations inherit from short ones when the same value is expected for both) + - These can affect two kinds of data: + - **missing values**: where the locale has no data (eg, no narrow Month translations) + - **marked values**: where CLDR has a special internal marker, which doesn't appear in the production data for a release. These specially marked values have always been removed from production data. + - There are a few cases where these modes of inheritance can conflict. 
To prevent that from happening (both in processing CLDR files and in clients), the internal data has been “***hardened***” — marked values have been replaced by explicit data values. This makes it more likely that clients that don't handle horizontal inheritance correctly will end up with the right answer. +- **Updates:** + - The term *Türkiye* is now used for the country instead of Turkey for English (the alternate spelling is also available). Where appropriate, a corresponding term is used in other languages. + - Name for the new timezone *Ciudad Juárez* +- **Locales** —The following locales were added, but only have Core level for this release. + - North Levantine Arabic (apc), Choctaw (cho), Lombard (lmo), Papiamento (pap), Riffian (rif) +- **Collation \& Searching** + - The default collation and searching now treats various quote marks as equivalent at a Primary strength, also including Geresh and Gershayim. In searching they are treated as identical when ignoring case and accents; in collation they are ignored unless there are no primary differences (such as a vs b) and no preceding secondary differences (like a vs â). +- **Exemplars** + - The exemplar characters for Chinese (zh) now include all TGH 2013 Level 1 characters +- **Rule\-Based Number Format** + - There were various fixes to some locales: see the tickets for more information. 
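
The vertical inheritance described under **Inheritance Changes** can be illustrated with a minimal sketch. It is not CLDR's or ICU's implementation: `DATA`, `PARENT_OVERRIDES`, `parent_of`, and `lookup` are hypothetical names, and the table contents are invented for illustration. It assumes a parent is found either from an explicit parent table (standing in for parentLocales) or by truncating the last subtag.

```python
# Hypothetical explicit parents, standing in for CLDR's parentLocales data.
PARENT_OVERRIDES = {"zh_Hant": "root"}

# Hypothetical locale data; the keys and values are illustrative only.
DATA = {
    "root": {"greeting": "hello", "farewell": "bye"},
    "fr": {"greeting": "bonjour"},
    "fr_CA": {},
}


def parent_of(locale):
    """Return the parent locale ID, or None when the locale is root."""
    if locale == "root":
        return None
    if locale in PARENT_OVERRIDES:
        return PARENT_OVERRIDES[locale]
    # Default rule: drop the last subtag; a bare language falls back to root.
    head, sep, _ = locale.rpartition("_")
    return head if sep else "root"


def lookup(locale, key):
    """Walk up the parent chain until some locale supplies a value."""
    current = locale
    while current is not None:
        value = DATA.get(current, {}).get(key)
        if value is not None:
            return value
        current = parent_of(current)
    return None
```

Here `lookup("fr_CA", "greeting")` finds the value in `fr`, while `lookup("fr_CA", "farewell")` falls through to `root`. Horizontal inheritance and the component-specific parents for collations and segmentations are deliberately left out of this sketch.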
+ +### File Changes + +**New files:** + +- /common/annotationsDerived/ + - bgn.xml, lij.xml, nso.xml, quc.xml, tn.xml +- /common/main/ + - apc.xml, apc\_SY.xml, cho.xml, cho\_US.xml, lmo.xml, lmo\_IT.xml, pap.xml, pap\_AW.xml, pap\_CW.xml, rif.xml, rif\_MA.xml +- /common/testData/personNameTest/ + - 122 files +- /common/testData/transforms/ + - am\-Ethi\-t\-am\-ethi\-m0\-geminate.txt, und\-Latn\-t\-und\-ethi\-m0\-aethiopi\-geminate.txt, und\-Latn\-t\-und\-ethi\-m0\-alaloc\-geminate.txt, und\-Latn\-t\-und\-ethi\-m0\-beta\-metsehaf\-geminate.txt, und\-Latn\-t\-und\-ethi\-m0\-ies\-jes\-1964\-geminate.txt +- /common/transforms/ + - Japn\-Latn.xml, Khmer\-Latin.xml, Lao\-Latin.xml, Sinhala\-Latin.xml, am\-Ethi\-t\-am\-ethi\-m0\-geminate.xml, und\-Ethi\-t\-und\-latn\-m0\-aethiopi\-geminate.xml, und\-Ethi\-t\-und\-latn\-m0\-alaloc\-geminate.xml, und\-Ethi\-t\-und\-latn\-m0\-beta\_metsehaf\-geminate.xml, und\-Ethi\-t\-und\-latn\-m0\-ies\-jes\-1964\-geminate.xml + +***Note: All files were moved from seed to common (see the Migration section)*** + +### JSON Data Changes + +- **JSON packaging changes due to the seed/main merge ([CLDR\-16425](https://unicode-org.atlassian.net/browse/CLDR-16425))** + - The **\-modern** tier now reflects locales which are actually at modern, not those locales which are targeted to modern. (See [CLDR\-16465](https://unicode-org.atlassian.net/browse/CLDR-16465) for a proposal to consider dropping the **\-modern** tier.) + - The **\-full** tier now includes all locales, including those formerly in seed. Use the coverageLevels.json file in the cldr\-core package to filter out locales. (See the **Migration** section, below.) + - There is an "effectiveCoverageLevels" key in coverageLevels.json which contains coverage levels for sublocales. 
+- parentLocales.json now has new keys for collations and segmentations parent information ([CLDR\-16425](https://unicode-org.atlassian.net/browse/CLDR-16425)) +- coverageLevels.json has a new key, effectiveCoverageLevels, with calculated coverage levels for sublocales ([CLDR\-16425](https://unicode-org.atlassian.net/browse/CLDR-16425)) +- In unitIdComponents.json, the \_values keys are now arrays instead of space\-separated strings ([CLDR\-16373](https://unicode-org.atlassian.net/browse/CLDR-16373)) +- languages.json and other files no longer include some code\-fallback data, such as "apc": "apc" where the translation is the same as the code. ([CLDR\-16468](https://unicode-org.atlassian.net/browse/CLDR-16468)) + - For time zone names, clients will need to construct the fallback exemplar city [per spec](https://www.unicode.org/reports/tr35/tr35-dates.html#Time_Zone_Names). For example, America/Los\_Angeles → "Los Angeles" (take the last field of the TZID, and turn \_ into space). + - For language names, see the [locale display name algorithm](https://www.unicode.org/reports/tr35/tr35-general.html#locale_display_name_algorithm). The "composed" forms are no longer automatically included in the data. For example, purely composed forms such as "en\_GB": "en (GB)" or "en\_GB": "English (United Kingdom)" are no longer present in the JSON data, unless there is an explicit translation such as "en\_GB":"British English". + - Implementations should be aware that some und.json files may now be completely missing due to this change. + +See the Migration section for general data changes. + +## Specification Changes + +Please see the [Modifications](https://www.unicode.org/reports/tr35/tr35-68/tr35.html#Modifications) section in the LDML for the full list of items: + +- Removed numbering from sections, to allow for more flexible reorganization of the specification in the future. 
+- [Person Names](https://www.unicode.org/reports/tr35/tr35-68/tr35-personNames.html#Contents) + - Brought Person Name Formatting out of tech preview. + - Described the changes from the fields prefix and suffix to the fields title, generation, and credentials. The problem was that ‘prefix’ and ‘suffix’ are positional terms, whereas the contents may need to change position based on the locale. + - Provided much more detailed algorithms for the whole [Formatting Process](https://www.unicode.org/reports/tr35/tr35-68/tr35-personNames.html#formatting-process), including additional processing steps such as [Handle missing surname](https://www.unicode.org/reports/tr35/tr35-68/tr35-personNames.html#handle-missing-surname). + - Documented changes in the [Sample Name](https://www.unicode.org/reports/tr35/tr35-68/tr35-personNames.html#sample-name) structure (whose primary use is internal to CLDR data collection). + - For more background, the [Person Names Guide](https://docs.google.com/document/d/1mjxIHsb97Og8ub6BKWxOihcHz7zjU4GdFkIxWHGAtes/edit#heading=h.4u6bqbd313a5) may be helpful, although it is primarily targeted at CLDR data submitters. +- Locales + - Fixed formatting errors in [Likely Subtags](https://www.unicode.org/reports/tr35/tr35-68/tr35.html#likely-subtags). 
+ - Improved the specification information about the effect of locale keywords: + - "fw" keyword for first day of the week in [Week Data](https://www.unicode.org/reports/tr35/tr35-68/tr35-dates.html#Week_Data) + - "hc" keyword for hour cycle in [Time Data](https://www.unicode.org/reports/tr35/tr35-68/tr35-dates.html#Time_Data) + - "dx", "lb", "lw", "ss" keywords related to line wrapping in [Segmentations](https://www.unicode.org/reports/tr35/tr35-68/tr35-general.html#segmentations) + - "cf" keyword in [Currency Formats](https://www.unicode.org/reports/tr35/tr35-68/tr35-numbers.html#Currency_Formats) + - "ca", "cf", "dx", "fw", "hc", "lb", "lw", "ms", "mu", "rg" keywords updates in [Key And Type Definitions](https://www.unicode.org/reports/tr35/tr35-68/tr35.html#Key_And_Type_Definitions_) + - [Parent Locales](https://www.unicode.org/reports/tr35/tr35-68/tr35.html#Parent_Locales) + - Documented the new component attribute, which provides for different inheritance behavior for different components (such as segmentation or collation). + - [Region\-Priority Inheritance](https://www.unicode.org/reports/tr35/tr35-68/tr35.html#Region_Priority_Inheritance) + - Documented the differences in inheritance for rgScope data, which inherits primarily by region rather than primarily by language. + - Includes small changes in [\<rgScope\>: Scope of the “rg” Locale Key](https://www.unicode.org/reports/tr35/tr35-68/tr35-info.html#rgScope), in [Lookup](https://www.unicode.org/reports/tr35/tr35-68/tr35.html#lookup), and in [Bundle vs Item Lookup](https://www.unicode.org/reports/tr35/tr35-68/tr35.html#Bundle_vs_Item_Lookup). +- [Calendar Data](https://www.unicode.org/reports/tr35/tr35-68/tr35-dates.html#calendar-data) + - Documented the new optional code and aliases attributes for eras, which allow string IDs for eras instead of just numbers. 
+- [Data Size Reduction](https://www.unicode.org/reports/tr35/tr35-68/tr35.html#Data_Size) + - Added a new section with guidance on how to reduce CLDR data size where necessary. +- [Telephone Code Data](https://www.unicode.org/reports/tr35/tr35-68/tr35-info.html#Telephone_Code_Data) + - Added a pointer to the recommended open\-source library [libphonenumber](https://github.com/google/libphonenumber#what-is-it). + +## Growth + +The following chart shows the growth of CLDR locale\-specific data over time. It is restricted to data items in **/main** and **/annotations** directories, so it does not include the non\-locale\-specific data. The % values are percent of the *current* measure of Modern coverage. That level increases each release, so previous releases had many locales that were at Modern coverage as assessed at the time of their release. There is one line per year, even though there were multiple releases in most years. + +Detailed information on changes between the v42 and v43 releases is at [v43 delta\_summary.tsv](https://www.unicode.org/cldr/charts/43/tsv/delta_summary.tsv): look at the TOTAL line for the overall counts of Added/Deleted/Changed. + +Because this was a limited\-submission release, only a small number of changes are visible. + +![image](../../images/index/cldrGrowthChart.png) + +## Language Matching + +CLDR has data for *language matching*, as in [this chart](https://unicode.org/cldr/charts/43/supplemental/language_matching.html). The purpose and usage are sometimes misunderstood. + +So how is this used? Consider a user whose first language is Breton. If they open an application that only has localizations for English, German, and French, then Breton will not be available. In that case, the data in CLDR can be used to select French as a fallback localization, *in the absence of other information*. + +That last clause is important. 
The CLDR data is based on the *likelihood* that a person using language X understands text written in language Y, but large portions of the population for X might prefer other languages.  + +The CLDR language matching data can *and should* be overridden whenever there is more information available from a user that allows an implementation to do a better job. It is ***strongly recommended*** that systems allow users to not only specify their preferred language, but also any secondary languages in order of priority. Thus a person speaking Kazakh who also knows French could specify French as a secondary language, and get a French localization for an app instead of the CLDR match. This has been done on both Android and iOS, for example. + +**Important**:  language matching is different from the CLDR *inheritance mechanism*: they serve different purposes, and are not aligned. The CLDR inheritance mechanism is how CLDR organizes localized data, and should not be used for language matching. Applications do not need to follow the CLDR inheritance chain. + +**References**: [LDML Language Matching](https://www.unicode.org/reports/tr35/#LanguageMatching), [LDML Inheritance vs Related Information](https://www.unicode.org/reports/tr35/#Inheritance_vs_Related), [ICU4J Locale Matcher](https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/index.html?com/ibm/icu/util/LocaleMatcher.html), [ICU4C Locale Matcher](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/localematcher_8h.html#details)  + +## Migration + +- **Seed has been merged into Common ([CLDR\-6396](https://unicode-org.atlassian.net/browse/CLDR-6396))** + - All files have been moved from the **seed/** to the **common/** subdirectory. + - Implementations should make use of the **common/properties/coverageLevels.txt** file (added in CLDR v41\) to filter locale files appropriately, in place of depending on incomplete files being in seed. 
This file and its usage are documented at [Coverage Levels](https://cldr.unicode.org/index/cldr-spec/coverage-levels) ([CLDR\-16420](https://unicode-org.atlassian.net/browse/CLDR-16420)). + - Background: Older versions of CLDR separated some locale files into a 'seed' directory, which some implementations used for filtering, but the criteria for moving from seed to common were not rigorous. To maintain compatibility with its set of locales used from previous versions, an implementation may use the **coverageLevels.txt** file filtering for Basic and above, but then also add locales that were previously included. +- **Interval Formats** + - A small number of interval formats (like “Dec 2 – 3”) have their spacing changed for consistency. This is unlikely to cause problems, as they resemble a large number of changes of the same kind in v42\. +- **Person Name Formatting** + - Person Name Formatting was in Tech Preview, to allow for feedback. It has now advanced out of Tech Preview and can be used in production. We will continue to enhance the data in subsequent releases, but will now maintain compatibility. + - The field structure for the person name patterns was changed while in Tech Preview. This changed two field values (**prefix, suffix**) to be more semantically based (**title, generation, credentials**) instead of positionally based, splitting the suffix because the placement may vary. + - The handling of literals between placeholders in patterns has also changed. For example, when the pattern “{given}•{given2}•{surname}” is used to format a name record \[given\=Albert, surname\=Einstein], the missing field is collapsed and the adjacent literals coalesced, giving the equivalent of the pattern “{given}•{surname}”, and thus yielding “Albert•Einstein” rather than “Albert••Einstein”. Beforehand, translators would have to supply an extra pattern to avoid the •• result. + - The handling of spaces in the final formatted string has also changed. 
+ - The specification has been substantially revised to more clearly provide the exact steps to take in formatting a name, so any code formatting person names using the tech preview from v42 must be carefully reviewed and adjusted as necessary. +- **Collation** + - As usual when there are collation changes, databases may need to re\-index sorted fields. +- **Locale Inheritance** + - The parentLocales now have an optional component attribute. This attribute **MUST** not simply be ignored; otherwise data from different components could override the main parentLocale data. The attribute specifies inheritance adjustments that should be used for ***segmentations*** or ***collations***. For example, zh\_Hant does not normally inherit from zh (since people would get mixed scripts). However, collations can be designed to handle sets of characters for multiple writing systems. + - Items marked as rgScope should have different inheritance lookup, which is recommended for the best results. However, implementations that use the general inheritance lookup will see no changes. +- **Other New Attributes** + - Calendar metadata now has new era attributes (**code** \& **aliases**), and element **inheritEras**, all of which may be ignored if not supported. ([CLDR\-16469](https://unicode-org.atlassian.net/browse/CLDR-16469)) + - Likely Subtags now have an attribute to indicate the **origin** of the data. This is informational, and can typically be ignored by implementations. +- **Turkey / Türkiye** + - In v42, the customary English name for the country code TR was "Turkey", and an alternate name was "Türkiye". In v43, the customary English name was changed to "Türkiye", and the alternate name was set to "Turkey". Translators were advised of the change, and reviewed the names in their locales to see if any needed adjustment. Implementations that wish to retain the English name "Turkey" may choose to use the alternate form. + +## Known Issues + +None currently. 
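
Returning to the **Person Name Formatting** item in the Migration section: the collapsing of a missing field and the coalescing of its adjacent literals can be sketched as below. This is a simplified illustration only, not the full algorithm from the LDML specification; the function name `format_name` and the exact coalescing rule used here (a literal is skipped when the pending literal already ends with it) are assumptions made for the example.

```python
import re


def format_name(pattern, name):
    """Format `name` (a dict of field -> value) with a simplified pattern."""
    # Split into alternating literal text and {field} placeholders.
    tokens = re.split(r"(\{[^{}]+\})", pattern)
    out = []      # emitted literals and populated field values
    pending = ""  # literal text waiting for the next populated field
    for tok in tokens:
        if tok.startswith("{") and tok.endswith("}"):
            value = name.get(tok[1:-1])
            if value:
                if out:  # literals before the first populated field are dropped
                    out.append(pending)
                out.append(value)
                pending = ""
            # A missing field emits nothing; its neighbouring literals
            # accumulate in `pending` and are coalesced below.
        elif tok:
            # Simplified coalescing: skip a literal that repeats the end
            # of the literal already pending.
            if not pending.endswith(tok):
                pending += tok
    # Literals after the last populated field are never emitted.
    return "".join(out)


record = {"given": "Albert", "surname": "Einstein"}
print(format_name("{given}•{given2}•{surname}", record))  # Albert•Einstein
```

With all three fields present, the same pattern keeps both separators, so only the gap left by a missing field is closed up.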
+ +## Acknowledgments + +Many people have made significant contributions to CLDR and LDML; see the [Acknowledgments](https://cldr.unicode.org/index/acknowledgments) page for a full listing. + + +The Unicode [Terms of Use](https://unicode.org/copyright.html) apply to CLDR data; in particular, see [Exhibit 1](https://unicode.org/copyright.html#Exhibit1). + +For web pages with different views of CLDR data, see . + + +![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) \ No newline at end of file diff --git a/docs/site/index/downloads/cldr-44.md b/docs/site/index/downloads/cldr-44.md new file mode 100644 index 00000000000..83d5563fb59 --- /dev/null +++ b/docs/site/index/downloads/cldr-44.md @@ -0,0 +1,231 @@ +--- +title: CLDR 44 Release Note +--- + +# CLDR 44 Release Note + +| No. | Date | Rel. Note | Data | Charts | Spec | Delta Tickets | GitHub Tag | JSON Tag | Delta DTD | +|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| +| 44 | 2023-10-31 | [v44](http://cldr.unicode.org/index/downloads/cldr-44) | [CLDR44](http://unicode.org/Public/cldr/44/) | [Charts44](https://unicode.org/cldr/charts/44/) | [LDML44](https://www.unicode.org/reports/tr35/tr35-70/tr35.html) | [Δ44](https://unicode-org.atlassian.net/issues/?jql=project%20%3D%20CLDR%20AND%20status%20%3D%20Done%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%20%2244%22%20ORDER%20BY%20component%20ASC%2C%20priority%20DESC%2C%20created%20ASC) | [release-44](https://github.com/unicode-org/cldr/tree/release-44) | [44.0.0](https://github.com/unicode-org/cldr-json/releases/tag/44.0.0)* | [ΔDtd44](https://unicode.org/cldr/charts/44/supplemental/dtd_deltas.html) | +| 44.1 | 2023-12-13 | [v44.1](http://cldr.unicode.org/index/downloads/cldr-44#h.nvqx283jwsx) | n/a | n/a | [LDML44.1](https://www.unicode.org/reports/tr35/tr35-71/tr35.html) | 
[Δ44.1](https://unicode-org.atlassian.net/issues/?jql=project%20%3D%20CLDR%20AND%20status%20%3D%20Done%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%20%2244%2E1%22%20ORDER%20BY%20component%20ASC%2C%20priority%20DESC%2C%20created%20ASC) | [release-44-1](https://github.com/unicode-org/cldr/tree/release-44-1) | [44.1.0](https://github.com/unicode-org/cldr-json/releases/tag/44.1.0) | See [44.1 Changes](http://cldr.unicode.org/index/downloads/cldr-44#h.nvqx283jwsx) | + +See [Key To Header Links](https://cldr.unicode.org/index/downloads#h.xq13gabuoy9w) +*Note: For NPM, the JSON data uses version 44.0.1 + +## Overview + +Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all [major software systems](https://cldr.unicode.org/index#h.ezpykkomyltl) (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages. + +In CLDR 44, the focus is on: + +1. **Formatting Person Names**. Added further enhancements (data and structure) for formatting people's names. For more information on why this feature is being added and what it does, see [Background](https://sites.google.com/unicode.org/cldr/index/downloads/cldr-42#h.xtb1v8tpviuc). +2. **Emoji 15\.1 Support**. Added short names, keywords, and sort\-order for the new Unicode 15\.1 emoji. +3. **Unicode 15\.1 additions**. Made the regular additions and changes for a new release of Unicode, including names for new scripts, collation data for Han characters, etc. +4. **Digitally disadvantaged language coverage**. Work began to improve DDL coverage, with the following DDL locales now having higher coverage levels: + 1. **Modern**: Cherokee, Lower Sorbian, Upper Sorbian + 2. **Moderate**: Anii, Interlingua, Kurdish, Māori, Venetian + 3. 
**Basic**: Esperanto, Interlingue, Kangri, Kuvi, Kuvi (Devanagari), Kuvi (Odia), Kuvi (Telugu), Ligurian, Lombard, Low German, Luxembourgish, Makhuwa, Maltese, N’Ko, Occitan, Prussian, Silesian, Swampy Cree, Syriac, Toki Pona, Uyghur, Western Frisian, Yakut, Zhuang + +### Locale Coverage Status + +The coverage status determines how well languages are supported on laptops, phones, and other computing devices. In particular, qualifying at a Basic level is typically a requirement just for being selectable on phones as a language. Note that for each language there are typically multiple locales, so 90 languages at Modern coverage corresponds to more than 350 locales at that coverage. + +Below is the coverage in this release: + +[CLDR v44 Coverage](https://drive.google.com/open?id=1oCc8e78wGoLv3XSUqzVk85n7AU25yi_sUsyv6iWKj6A) + + +## Version 44\.1 Changes + +### DTD Changes + +- In ldmlSupplemental.dtd, unicodeVersion was corrected to be 15\.1\.0 ([CLDR\-17225](https://unicode-org.atlassian.net/browse/CLDR-17225)). +- In ldmlKeyboard3\.dtd, the locale element id attribute was incorrectly flagged as @VALUE, fixed ([CLDR\-17204](https://unicode-org.atlassian.net/browse/CLDR-17204)). + +### Specification Changes + +- The description of the syntax for the \-u\-dx locale ID key was improved to resolve some ambiguities ([CLDR\-17194](https://unicode-org.atlassian.net/browse/CLDR-17194)). +- The section on synthesizing emoji sequence names was updated to  cover emoji names and keywords for emoji facing\-right sequences ([CLDR\-17230](https://unicode-org.atlassian.net/browse/CLDR-17230)). + +### Data Changes + +- Annotations for emoji facing\-right sequences were added ([CLDR\-17230](https://unicode-org.atlassian.net/browse/CLDR-17230)). 
+- CLDR tooling was improved to better fix cases when multiple spaces of different types were used instead of a single space, and was then used to find and fix cases where a normal space was used in combination with NARROW NO\-BREAK SPACE or THIN SPACE ([CLDR\-17233](https://unicode-org.atlassian.net/browse/CLDR-17233)). This affected 6 locales: fr, hi\_Latn, ku, pap, syr, vi. + +## Data Changes + +### [DTD Changes](https://www.unicode.org/cldr/charts/44/supplemental/dtd_deltas.html) + +The following is a summary of the DTD changes, which reflect changes in the structure. The relevant ones are described more fully in the data changes. + +**LDML** + +- **characterLabels** \- characterLabelPattern addition of 'facing\-left' and 'facing\-right' to support Unicode 15\.1 emoji that can face different directions. +- **contextTransformUsage** \- many more values allowed for the type attribute (previously it only supported a subset of the documented values) +- **dateFormatItem** and **intervalFormatItem** \- many more skeletons allowed for the id attribute, for example EEEEd, GyMEEEEd, GyMMMEEEEd, GyMMMMEd … +- **territory** \- added two alternative names for the territory "IO": "British Indian Ocean Territory" and "Chagos Archipelago" +- **personNames** + - Added two new parameter defaults for **length** and **formality**. These allow users to set the most customary values used in their language for common usage. + - Added a new field **nativeSpaceReplacement**. This can be used in languages that don't normally use spaces between words. + +**Supplemental Data** + +- convertUnit/systems \- additional unit systems have been added, for finer\-grained distinctions. +- unitQuantity/descriptions \- descriptions can be added for unit quantities (such as length, area, etc.) + +**BCP47** + +- key/types \- allow for an IANA parameter for timezones, so that the current 'canonical' timezone can be identified and used. + +**Keyboards** + +- See “Keyboard Changes”, below. 
+ +### [BCP47 Changes](https://www.unicode.org/cldr/charts/44/supplemental/index.html) + +- The Islamic calendar is now described as the Hijri calendar in English, and may have also changed in other locales. +- The new **iana** attribute provides the time zone ID used in the zone.tab file of the IANA time zone database when that ID differs from the CLDR long canonical ID. For example, the **iana** attribute value is "Asia/Kolkata" for the CLDR long canonical ID "Asia/Calcutta". + +### [Supplemental Data Changes](https://www.unicode.org/cldr/charts/44/supplemental/index.html) + +- New locales were added, including en\_ID and es\_JP, plus many locales at a Basic level. +- Fixes + - There was a fix made for the Zanb script, which was mistakenly categorized as **special** instead of **regular**. + - There was a fix made to the BCP47 Latin↔︎ASCII transliterator ID. +- Units + - The gasoline\-energy\-density unit (used in miles per gallon of gasoline equivalent (MPGe) for electric vehicles) and the pint\-imperial unit (used in the UK) were added. + - The unit of wind speed, Beaufort, was added for translation in locales where it is used. + - Remaining SI units were added. Because these are primarily of use in scientific fields, they are not translated. + - A few traditional English units were added, such as chain and fortnight. These were not translated. + - Many traditional Japanese units were added. These were not translated, aside from Japanese and English. + - Many units have more refined (and sometimes corrected) unit systems. + - The new SI prefixes for powers of 10 have generally been added: 30, 27, \-27, \-30\. In some non\-Latin\-script languages there are not yet standard names for these, and in those the prefixes are left with Latin characters. +- Likely Subtags — general cleanup + - Addition of data donated by SIL for determining the most likely script and region for languages. + - Addition of more und\_ mappings. 
These provide for getting a default language if only the script, region, or both are known. These are, however, of limited usage, so implementations may want to filter them out. + - Removal of macroregion codes, such as und\_002\. These were of very limited utility, and have been removed. +- Language Containment Groups + - Additional mappings have been added. +- Plural rules — have been added for blo. +- Preferred hour formats — have changed substantially for many Latin American locales. + +### Locale Changes + +- There were general changes to fix the lenient parsing set for \$. (The previous format for entering Unicode characters led to not escaping \$; the new format is more forgiving.) +- Many locales changed the name for the code IO, "British Indian Ocean Territory", to names similar to "Chagos Archipelago". Now there are two alternate names, so implementations can use the name that works best for them. +- The name of the Islamic calendar has been changed in English (and many other locales) to use the more descriptive name "Hijri calendar". +- Some flexible date formats may use different spacing. +- Sierra Leone changed their currency — the new names are available, and the old names have an appended date range. +- The Kyrgyzstan narrow currency symbol "⃀" is now used. (Note: CLDR typically holds off on using new Unicode characters for currencies for a few cycles, to allow system fonts to catch up.) +- There was a concerted effort to fix the Person Name Formatting data for a number of locales. +- There was a concerted effort to fix the names of certain units of measurement for many locales. +- The names and search keywords of new emoji in Unicode 15\.1 have been added. +- Many languages added search keywords for symbols like ◉, ⋂, ⊆ +- Languages made improvements to other items as needed per language. 
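
To illustrate how a client might choose between the alternate names for IO, here is a minimal Python sketch. It is not part of CLDR or ICU: the `territory_name` helper, the dictionary layout keyed by `(code, alt)`, and the `chagos` alt key are assumptions made for this example. It applies the rule described in the Migration section: when a requested alt variant is absent from production data (as the redundant "biot" variant is in en), fall back to the non\-alt form.

```python
# Hypothetical in-memory view of en territory names after redundant
# "sideways" values have been removed from production data.
# Keys are (territory code, alt variant); None means the non-alt form.
EN_TERRITORIES = {
    ("IO", None): "British Indian Ocean Territory",
    ("IO", "chagos"): "Chagos Archipelago",  # alt key name is an assumption
}


def territory_name(names, code, alt=None):
    """Return the name for `code`, preferring the requested alt variant.

    If the alt variant is missing (for example because it was redundant
    and removed), fall back to the non-alt form.
    """
    if alt is not None:
        value = names.get((code, alt))
        if value is not None:
            return value
    return names.get((code, None))


print(territory_name(EN_TERRITORIES, "IO", "biot"))    # British Indian Ocean Territory
print(territory_name(EN_TERRITORIES, "IO", "chagos"))  # Chagos Archipelago
```

A lookup keyed on the variant keeps the client code identical across locales: a locale that carries a distinct "biot" value returns it, and one that does not falls through to the default name.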
+ +### File Changes + +(Aside from locale files) + +**Additions:** + +**New XSD files in /common/dtd/.** + +*These correspond to the DTDs, but do not carry the extra validity annotations.* + +- ldml.xsd, ldmlBCP47\.xsd, ldmlSupplemental.xsd, xml.xsd + +**New Test Data files in /common/testData/** + +- localeIdentifiers/likelySubtags.txt +- personNameTest/\_header.txt, \_readme.txt, chr.txt, sw\_KE.txt, tg.txt, ti.txt, wo.txt +- transforms/und\-t\-und\-latn\-d0\-ascii.txt (*changed name*) + +**Removals:** + +*Files with insufficient data:* + +- /common/testData/personNameTest/br.txt, brx.txt, gaa.txt, ks\_Deva.txt, lij.txt, pcm.txt, sat.txt, syr.txt, to.txt, tt.txt, xh.txt + +*Old format keyboards were removed (see Migration):* + +- /keyboards/ + +### JSON Data Changes + +- **Available at: https://github.com/unicode-org/cldr-json/releases/tag/44.0.0** +- **Note** that the version number in npm is "44\.0\.1" instead of "44\.0\.0", because 44\.0\.0 was tagged incorrectly (see *Known Issues*). + +### Keyboard Changes + +**Keyboard** has a new DTD (keyboard3\.dtd and the \<keyboard3\> element). This is a complete rewrite of the specification by the Keyboard Subcommittee, and is available as a technical preview in CLDR version 44\. See [TR35 Part 7: Keyboards](https://www.unicode.org/reports/tr35/tr35-keyboards.html). The prior DTDs are included in CLDR but are not used by CLDR data or tooling. **Note**: prior keyboard data files are not compatible, were not maintained, and have been removed. + +Note that there are additional sample keyboard data files in progress which were not complete for v44, but may be consulted as samples: + +- Bengali, Assamese Phonetic Keyboard (PR \#[3368](https://github.com/unicode-org/cldr/pull/3368)) +- French AZERTY optimisé (PR \#[3220](https://github.com/unicode-org/cldr/pull/3220)) + +See the *Known Issues* section for additional known issues. 
+ +## Specification Changes + +Please see the [Modifications section](https://www.unicode.org/reports/tr35/tr35-70/tr35.html#Modifications) in the draft spec for the list of current changes. + +A diff of the changes since CLDR 43 can be viewed [here in GitHub](https://github.com/unicode-org/cldr/pull/3317/files), which was last updated on 6 October 2023\. Clicking on the rich\-diff icon for a page ( 📄 ) will often show the differences with a rich diff, such as the following: + +![image](../../images/index/APIIntegration.png) + +## Growth + +The following chart shows the growth of CLDR locale\-specific data over time. It is restricted to data items in /main and /annotations directories, so it does not include the non\-locale\-specific data; nor does it include corrections (which typically outnumber new items). The % values are percent of the current measure of Modern coverage. That level increases each release, so previous releases had many locales that were at Modern coverage as assessed at the time of their release. There is one line per year, even though there were multiple releases in most years. + +Relatively few items were added this cycle; the focus was on improvements in quality, and those changes do not show up in the chart below. + +![image](../../images/index/growth44.png) + +## Migration + +- **Unit systems** provide information about general usage of units of measure. For example, "knot" is in the customary US and UK systems, but is also acceptable for use with SI. + - Implementations using the unit systems will find that some units have changed systems (either to be finer\-grained, or to incorporate corrections). +- **LikelySubtags** are used to find the most likely missing subtags in a locale identifier, and also the minimal form. Thus "de" (German) expands to "de\-Latn\-DE" (German written in Latin script as used in Germany), and all of ("de\-Latn\-DE", "de\-Latn", "de\-DE") minimize to "de". 
+ - The algorithm for lookup has changed slightly (favoring script over region), and there have been data changes: most macroregion mappings are gone (such as the mapping from und\-003\), as are some other und mappings. There remain some xx\-YYY\-001 results for artificial languages. +- **Preferred hour formats** indicate the preferred form for a locale: 11 PM vs 23:00 vs 11 in the evening. + - These have changed substantially for many Latin American countries. +- **Keyboard** has a new DTD (keyboard3\.dtd and the \<keyboard3\> element). See the “Keyboard Changes” section. +- **PersonNames**: In the process of moving out of Tech Preview, there were structure additions but also changes: + - The nameField type prefix was replaced with title, and the nameField type suffix was replaced with two new types, generation and credentials. + - The sampleName types givenOnly, givenSurnameOnly, given12Surname, full were replaced with new types separating samples for names in the locale from samples for foreign names: nativeG, nativeGS, nativeGGS, nativeFull, foreignG, foreignGS, foreignGGS, foreignFull +- **Redundant values that inherit “sideways” may be removed in production data**: Some data values inherit “sideways” from another element with the same parent, in the same locale. For example, consider the following items in the en locale, some added in CLDR 44 to provide clients a way to explicitly select a particular variant across locales (instead of using the default):
+\<territory type="IO"\>British Indian Ocean Territory\</territory\> \
+\<territory type="IO" alt="biot"\>British Indian Ocean Territory\</territory\> \
+\<territory type="IO" alt="chagos"\>Chagos Archipelago\</territory\> \ +Both alt forms inherit sideways from the non\-alt form. Thus in this case, the "biot" variant is redundant and will be removed in production data. Clients that are trying to select the "biot" variant but find it missing should fall back to the non\-alt form. +Similar behavior occurs with plural forms for units, where some plural forms may match and thus fall back to the "other" form. +- *Since the last release, Unicode updated its outbound license from the "[Unicode, Inc. License \- Data Files and Software](https://opensource.org/license/unicode-inc-license-agreement-data-files-and-software)" to the "[Unicode License v3](https://opensource.org/license/unicode-license-v3)". All of the substantive terms of the license remain the same. The only changes made were non\-substantive technical edits. The new license is OSI\-approved and has been assigned the SPDX Identifier Unicode\-3\.0\.* + +## Known Issues + +- The region\-based firstDay value (see [weekData](https://www.unicode.org/reports/tr35/tr35-70/tr35-dates.html#Week_Data)) is currently used for several different purposes: + - The day that should be shown as the first day of the week in a calendar view. + - The first day of the week (day 1\) for weekday numbering. + - The first day of the week for week\-of\-year calendar calculations. + +These are not always the same. In the future, some of these functions will be separated out; see [CLDR\-17095](https://unicode-org.atlassian.net/browse/CLDR-17095). +- The test data file likelySubtags.txt has an error for input "qaa\-Cyrl\-CH"; the result should not be the empty string as shown, but either FAIL or the input string (pending spec clarification). See [CLDR\-17150](https://unicode-org.atlassian.net/browse/CLDR-17150). +- The spec for \-u\-dx bcp47 subtag syntax requires further clarification. See [CLDR\-17194](https://unicode-org.atlassian.net/browse/CLDR-17194). 
This is fixed in [version 44\.1](https://cldr.unicode.org/index/downloads/cldr-44#h.nvqx283jwsx). +- Subdivision translations were only updated on a limited basis. +- Use 44\.0\.1 for CLDR 44 JSON NPM since 44\.0\.0 was tagged incorrectly. +- unicodeVersion in ldmlSupplemental.dtd was not updated to 15\.1; see [CLDR\-17225](https://unicode-org.atlassian.net/browse/CLDR-17225). This is fixed in [version 44\.1](https://cldr.unicode.org/index/downloads/cldr-44#h.nvqx283jwsx). +- Derived emoji annotations were missing; see [CLDR\-17230](https://unicode-org.atlassian.net/browse/CLDR-17230). This is fixed in [version 44\.1](https://cldr.unicode.org/index/downloads/cldr-44#h.nvqx283jwsx). +- There was an error in the Keyboard3 DTD in the \<locale\> element. It is corrected in [version 44\.1](https://cldr.unicode.org/index/downloads/cldr-44#h.nvqx283jwsx); see [CLDR\-17204](https://unicode-org.atlassian.net/browse/CLDR-17204). +- The keyboard charts could not be generated properly due to DTD changes. This is corrected in [version 44\.1](https://cldr.unicode.org/index/downloads/cldr-44#h.nvqx283jwsx). (The fixed code was used to generate the charts for version 44\.) See [CLDR\-17205](https://unicode-org.atlassian.net/browse/CLDR-17205). + +## Acknowledgments + +Many people have made significant contributions to CLDR and LDML; see the [Acknowledgments](https://cldr.unicode.org/index/acknowledgments) page for a full listing. + +The Unicode [Terms of Use](https://unicode.org/copyright.html) apply to CLDR data; in particular, see [Exhibit 1](https://unicode.org/copyright.html#Exhibit1). + +For web pages with different views of CLDR data, see . 
+ +![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) \ No newline at end of file diff --git a/docs/site/translation/translation-guide-general/capitalization.md b/docs/site/translation/translation-guide-general/capitalization.md new file mode 100644 index 00000000000..965777ded99 --- /dev/null +++ b/docs/site/translation/translation-guide-general/capitalization.md @@ -0,0 +1,32 @@ +--- +title: Capitalization +--- + +# Capitalization + +Beginning with CLDR 22, the guidance is that names of items such as languages, regions, calendar and collation types, as well as names of months and weekdays in calendar data and the names of calendar fields, should be capitalized as appropriate for the middle of body text (except possibly for narrow forms, see note below). + +Regarding the capitalization of months and weekdays, please apply middle\-of\-sentence capitalization rules even on stand\-alone items. + +**In your language, if month and day names are generally lower case in the middle of the sentence, then please apply this same rule (lower case) to both formatting and standalone values.** + +In your language, if month and day names are generally upper case in the middle of the sentence, then please apply the same rule (upper case) to the standalone values. + +The primary reason for having both format and stand\-alone forms is to handle any necessary grammatical distinctions (rather than capitalization distinctions). + +- Stand\-alone month names are intended to be used without a day\-of\-month number +- Format month names are intended to be used with a day\-of\-month number. + +In many languages, that means that the stand\-alone month names should be in nominative form, while the format month names should be in genitive or a related form. + +In this case, date formats will also reflect that, using the format form MMMM in a format such as “d MMMM y”, and the stand\-alone form LLLL in a format such as “LLLL y”. 
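
To make the format versus stand\-alone distinction concrete, here is a minimal Python sketch. It is not CLDR or ICU tooling: the `format_date` helper and the tiny `MONTHS` table are hypothetical, it handles only the four pattern tokens shown, and only January in Russian is filled in (genitive "января" for the format form, nominative "январь" for the stand\-alone form).

```python
import re

# Illustrative wide month names for one locale (Russian, January only).
MONTHS = {
    "format": {1: "января"},       # genitive, used next to a day number
    "stand-alone": {1: "январь"},  # nominative, used without a day number
}


def format_date(pattern, day, month, year):
    """Expand a tiny subset of date pattern letters: d, MMMM, LLLL, y."""

    def expand(match):
        token = match.group(0)
        if token == "MMMM":
            return MONTHS["format"][month]
        if token == "LLLL":
            return MONTHS["stand-alone"][month]
        if token == "d":
            return str(day)
        return str(year)  # the only remaining token is "y"

    return re.sub(r"MMMM|LLLL|d|y", expand, pattern)


print(format_date("d MMMM y", 5, 1, 2024))  # 5 января 2024
print(format_date("LLLL y", 5, 1, 2024))    # январь 2024
```

Both forms are lower case here, matching middle\-of\-sentence usage in Russian; the two differ only in grammatical case, which is the distinction the format and stand\-alone forms exist to carry.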
+ +**Note:** Narrow forms for items such as month and day names are typically too short to reflect differences between grammatical forms. For capitalization purposes, format narrow names should be capitalized according to the normal conventions for their use in running text, and stand\-alone narrow names should be capitalized according to conventions for stand\-alone use. + +The new \ element now indicates how to change the capitalization for use in a menu, or for stand\-alone use such as in the title of a calendar page (the \ data cannot currently be edited in the Survey Tool; please file a bug for any necessary changes). + +However, it is also important to ensure that there is consistent casing for all of the items in a section, so before making any changes, be sure to get agreement among all the translators for your language—otherwise the capitalization of items in a section may appear random. + +To provide warnings when the capitalization of an item differs from what is intended for items in a given category, the Survey Tool now checks capitalization of items against the \ within the \ element; data for this comes from xml files in the CLDR common/casing/ directory. This data cannot be changed using the Survey Tool; if it is incorrect, please file a bug (initial data was created based on the predominant capitalization of items in each category within a locale, and may be wrong). + +![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) \ No newline at end of file diff --git a/docs/site/translation/translation-guide-general/default-content.md b/docs/site/translation/translation-guide-general/default-content.md new file mode 100644 index 00000000000..07ac69320c0 --- /dev/null +++ b/docs/site/translation/translation-guide-general/default-content.md @@ -0,0 +1,38 @@ +--- +title: Default Content +--- + +# Default Content + +Locales are primarily identified by their ***base*** language. For example, English \[en], Arabic \[ar] or German \[de]. 
+ +We also label scripts explicitly, where a language is typically written in multiple scripts, such as Cyrillic or Latin. For example, Serbian (Cyrillic) \[sr\_Cyrl] and Serbian (Latin) \[sr\_Latn]. + +Each language \+ script combination is treated as a unit. (That is, people do not mix different scripts in the same data set.) + +If a language is ***not*** typically written in multiple scripts, then the script sub\-tag is omitted. For example, en\_US or ko\_KR. + +Locales may also have regional variants. For example, English (US) \[en\_US] vs English (UK) \[en\_GB], or Serbian (Cyrillic, Montenegro) \[sr\_Cyrl\_ME] vs Serbian (Cyrillic, Serbia) \[sr\_Cyrl\_RS]. Regions may be countries such as China \[CN], parts of countries such as Hong Kong \[HK] or multi\-country regions such as Latin America \[419]. Also see [Regional Variants](http://cldr.unicode.org/translation/getting-started/guide#TOC-Regional-Variants-also-known-as-Sub-locales-). + +The contents for the base language should be as widely usable (neutral) as possible, but **must be** usable without modification for its *default content locale;* this is the locale for the language’s *default region,* which is typically the region with the most speakers of the language. A default content locale has no data other than identity information; it inherits all data from its parent. + +For example: + +- American English \[en\_US] is the default content locale for English \[en] +- German (Germany) \[de\_DE] is the default content locale for German \[de]. +- Portuguese (Brazil) \[pt\_BR] is the default content locale for Portuguese \[pt] +- Serbian (Cyrillic) \[sr\_Cyrl] is the default content locale for Serbian \[sr], and Serbian (Cyrillic, Serbia) \[sr\_Cyrl\_RS] is in turn the default content locale for Serbian (Cyrillic) \[sr\_Cyrl]. +- Arabic (World) \[ar\_001] is the default content locale for Arabic \[ar], which is for Modern Standard Arabic. + +**Tips for linguists:** + +1. 
Make sure the base language content is correct: it should be as widely usable (neutral) as possible, but must be usable **without** modification in the default content locale. +2. For example: + - English \[en] locale content must be usable for English (US) + - Arabic \[ar] content must be usable for Arabic (world/neutral). +3. Make sure that where there is a difference in a sub\-region, the differences are represented in the regional\-variant locale. +4. For example: + - Spanish (Mexico) \[es\_MX] content that differs from Spanish (Latin America) \[es\_419] + - Arabic (Egypt) \[ar\_EG] content that differs from Arabic (World) \[ar\_001] + +![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) \ No newline at end of file