diff --git a/docs/site/downloads/cldr-31.md b/docs/site/downloads/cldr-31.md new file mode 100644 index 00000000000..16b817cf820 --- /dev/null +++ b/docs/site/downloads/cldr-31.md @@ -0,0 +1,272 @@ +--- +title: 'CLDR 31 Download' +--- + + +Unicode CLDR - CLDR 31 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 31 Release Note

Overview

Unicode CLDR 31 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

Some of the improvements in the release are:

  • Canonical codes (See Migration)

    • The subdivision codes have all been changed to the bcp47 format.

    • The locales in the language-territory population data are in canonical format.

    • The timezone ID for GMT has been split from UTC.

    • There is a mechanism for identifying hybrid locales, such as Hinglish.

  • Emoji 5.0

    • Short names and keywords have been updated for English. (Data for other languages will be gathered in the next cycle.)

    • Collation (sorting) adds the new 5.0 Emoji characters and sequences, and some fixes for Emoji 4.0 characters and sequences.

    • For Emoji usage, subdivision names for Scotland, Wales, and England have been added for 65 languages.

      • [31.0.1] Added full list of derived names #10126, and fixed some collisions in derived names #10127.

For changes that may affect migration to this version, see Migration.

Other structural additions and changes

  • Codes now use canonical form, as described above.

  • New structure for lenient parsing

  • New structure for minimal pairs (for plurals)

  • New language-matching structure for matching groups of countries

  • The literacyPercent for a region is broken out from writingPercent

  • For DTD changes, see DTD Deltas

For more information, see Spec Modifications.

Other data additions and changes

  • New timezone IDs (long form and bcp47 form).

  • New currency code BYR.

  • Minimal pairs for plural rules.

  • New data for lenient parsing

  • Enhanced Language Matching data (new elements and attributes)

  • Updated Windows keyboards

  • <fields> data fleshed out for era, weekday, dayperiod, and zone, and new <fields> data added for weekOfMonth, dayOfYear, weekdayOfMonth.

  • A pseudo-locale generation tool.

  • A number of additions to exemplar characters, such as for Arabic and Farsi

    • Some improvements to the Zawgyi-to-Unicode transform, and other transforms.

  • Collation data updated for Unihan 9.0 and for Emoji 5.0

  • New unit type "length-point"

    • [31.0.1] Fixed inconsistent names in Czechia #10122, and some negative current subpatterns for compact decimal formatting #10131

    • [31.0.1] Fixed collation charts #10139

For more information, see detailed delta charts.

Growth

The following gives the total overview of the change in data items in CLDR. This release did not have a data-submission cycle, so the changes reflect cleanup and bug fixes.

* The measurement of the number of items reflects the different ways that the information is represented. A single data field (element or attribute value) may result in multiple data items. For example, plural rules may be shared by multiple languages, and a single data field contains all the languages to which those rules apply. Sometimes a changed item appears as a deletion+addition, and sequences of items (such as sort order) are not counted as different even if the order changes.

For more details, see the Delta Data charts.

JSON data

  • No structural changes for this release, just updated to match XML data.

Survey Tool

  • No changes were made to the Survey Tool in this release.

Specification changes

For details, see Spec Modifications.

Migration

  • Code changes

    • The subdivision codes have all been changed to the bcp47 format, e.g. "usca" instead of "US-CA". This affects supplemental containment and subdivisions, and translations in subdivisions/en.xml, etc. See Part 6, Sec 2.2 [#9942]

    • The locales in the language-territory population tables have been changed to the canonical format, dropping the script where it is the default. So "ku_Latn" changes to "ku".

    • The exemplar/ locale data file names have also been changed to be the canonical format, dropping the script where it is the default.

  • Plural rules

    • The Portuguese plural rules have changed so that all (and only) integers and decimal fractions < 2 are singular.

  • Timezones

    • The GMT timezone has been split from the UTC timezone.

    • New timezone bcp47 codes have been added.

  • Language/Region data

    • The new literacyPercent attribute for supplemental <languagePopulation> has been broken out from writingPercent, the latter now only being used to reflect primarily-spoken languages. [#9421]

    • A new format for language matching is provided. To allow time for implementations to change over, the old data is retained, and the new data is marked as "written-new".

    • Languages "hr" and "sr" are no longer a short distance apart, for political reasons.

  • Other

    • The primary names for CZ changed from "Czech Republic" to "Czechia", with the longer name now the alternate.
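Code that stored subdivision identifiers in the older ISO 3166-2 style may need a conversion step during migration. The following is a minimal Python sketch. It assumes the new identifier is just the region code and the subdivision suffix joined without the hyphen and lowercased, as in the "US-CA" → "usca" example above. The helper name is hypothetical. The subdivision and validity data shipped with the release remain the authoritative list of valid codes.

```python
def to_bcp47_subdivision(iso_code: str) -> str:
    """Convert an ISO 3166-2 style code like "US-CA" to the bcp47 form "usca".

    Hypothetical helper: assumes the new form is the region code plus the
    subdivision suffix, with the hyphen dropped and everything lowercased.
    """
    region, sep, suffix = iso_code.partition("-")
    if not sep or not region or not suffix:
        raise ValueError(f"expected REGION-SUBDIVISION, got {iso_code!r}")
    return (region + suffix).lower()


if __name__ == "__main__":
    for old in ("US-CA", "GB-SCT"):
        # prints "US-CA -> usca", then "GB-SCT -> gbsct"
        print(old, "->", to_bcp47_subdivision(old))
```

Validating the result against the release's subdivision list is still worthwhile, since this sketch only reshapes the string.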

Known Issues

“Week of” structure

The structure and intended usage for the “week x of y” patterns is still being refined and may change. This applies especially to dateFormatItems such as the following:

<dateFormatItem id="MMMMW" count=...>'week' W 'of' MMM</dateFormatItem>

<dateFormatItem id="yw" count=...>'week' w 'of' y</dateFormatItem>

Areas of discussion include the use of the count attribute and the use of ordinal vs. cardinal numbers. For more information see [#9801].

Non-unique emoji short names (fixed in 31.0.1)

Some of the emoji names are not unique. Fixes are being gathered, but are not in time for the release. See [#10116], [#10127]

Chinese stroke collation

Since CLDR 30, Chinese stroke collation has been missing entries for several basic characters. CLDR 32 reverts the stroke collation data to the CLDR 29 version; a complete fix for the underlying problem is targeted for CLDR 33. See #10497, #10642.

Others

See tickets for v31.0.1.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

Key

    • The Release Note contains a general description of the contents of the release, and any relevant notes about the release.

    • The Data link points to a set of zip files containing the contents of the release (the files are complete in themselves, and do not require files from earlier releases -- for the structure of the zip file, see Repository Organization).

    • The Spec is the version of UTS #35: LDML that corresponds to the release.

    • The Delta document points to a list of all the bug fixes and features in the release, which can be used to get the precise corresponding file changes using BugDiffs.

    • The SVN Tag can be used to get the files via Repository Access.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file diff --git a/docs/site/downloads/cldr-32.md b/docs/site/downloads/cldr-32.md new file mode 100644 index 00000000000..931942c2a60 --- /dev/null +++ b/docs/site/downloads/cldr-32.md @@ -0,0 +1,198 @@ +--- +title: 'CLDR 32 Download' +--- + + +Unicode CLDR - CLDR 32 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 32 Release Note

Overview

Unicode CLDR 32 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

Improvements in this release include:

  • Major contributions of main locale data for Chakma (ccp), Sindhi (sd), Odia (or), Kabyle (kab), Pashto (ps), Turkmen (tk), Norwegian Nynorsk (nn), Assamese (as), and others. See Growth.

      • Inclusion of four locales in common (from seed): Wolof, Tatar, Tajik, Chakma

  • Major additions for Emoji

      • Emoji names and keywords updates for Unicode 10.0 (Emoji 5)

      • Emoji keywords now in UCA order for consistency.

      • English name and keyword updates, as per the Emoji Subcommittee

      • Emoji collation update: emoji are now sorted between regular symbols and currency symbols. (Previously in v31, emoji were after all other characters.)

  • Import of draft subdivision names and language groups from wikidata. (See the Known Issues section below.)

    • Rule-based number formats for Indian English, Akan, Hindi (oblique), Cherokee; revisions to some others.

    • New numeric exemplars. For example, in zh: [\- , . % ‰ + 0 1 2 3 4 5 6 7 8 9 〇 一 七 三 九 二 五 八 六 四]

    • New “disjunctive” list style (eg “a, b, or c”)

  • New availableFormats items for day periods (skeleton “Bhm” → pattern “h:mm B” → “1:30 in the afternoon”)

    • Many fixes and small additions to certain preexisting data: day periods, date/time formats, Chinese collation / transliteration, transforms

    • Chinese stroke collation was reverted to the data from CLDR 29. See Migration.
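The new “disjunctive” list style is expressed with the usual list pattern parts: a "2" pattern for two items, and "start", "middle", and "end" patterns for longer lists. The minimal Python sketch below shows one common way to apply such patterns. The English pattern strings are assumptions chosen to reproduce “a, b, or c”, so check the listPatterns data in the release for the real values. The function name is hypothetical.

```python
# Assumed English "or" list patterns; the release data is authoritative.
OR_PATTERNS = {
    "2": "{0} or {1}",
    "start": "{0}, {1}",
    "middle": "{0}, {1}",
    "end": "{0}, or {1}",
}


def format_list(items: list[str], patterns: dict[str, str]) -> str:
    """Join items using CLDR-style "2"/start/middle/end list patterns."""
    n = len(items)
    if n == 0:
        return ""
    if n == 1:
        return items[0]
    if n == 2:
        return patterns["2"].format(items[0], items[1])
    # The last two items are joined with the "end" pattern...
    result = patterns["end"].format(items[-2], items[-1])
    # ...each inner item wraps the accumulated tail with "middle"...
    for item in reversed(items[1:-2]):
        result = patterns["middle"].format(item, result)
    # ...and the first item is attached with "start".
    return patterns["start"].format(items[0], result)


print(format_list(["a", "b", "c"], OR_PATTERNS))  # a, b, or c
```

Building the string from the end means each pattern only ever sees two arguments. This matches the two-placeholder form of the list pattern parts.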

For information on structural changes, see Spec Modifications.

Charts

The charts have been updated for the v32 data, and there are two new charts:

Survey Tool

    • The Moderate level has been changed to align with content language requirements.

    • A new Survey Tool Ref site is available for use as a v32 release data reference: http://cldr-ref.unicode.org/cldr-apps/

For changes that may affect migration to this version, see Migration.

Other data additions and changes

The following summarizes some of the other changes in non-locale data.

  • charts/32/delta/bcp47.html

    • Added CNH currency, Masaram Gondi numbering system (gonm).

  • charts/32/delta/supplemental-data.html

    • Added currency CNH

    • Added currency changes from STD to STN, and for PHP, based on an ISO 4217 amendment.

    • Addition of some language codes, 202 macroregion, scripts, variants

    • Changes to WZoneMapping mapping

    • Some additional transforms.

    • For language distance/matching, en-GB is now the best choice from the GB cluster. For example, en-SA is now closer to en-GB than to enON.

    • Various updates / additions of language/territory data, GDP data

    • Language Groups added

    • Addition of plural or ordinal rules for io, sd, or, ps, tk. pt-PT now behaves differently.

    • Added plural ranges for ak, as, io, or, ps, sd, tk.

    • Added containment for 202

    • Added explicit currency info for CNH, DKK, NOK, SEK

    • Changed week data (min days, first day, preferred hours) for RU, NZ, GL

    • Added day periods for ccp, cy

  • charts/32/delta/transforms.html

    • Transform additions / fixes for blt→blt_FONIPA, cy→cy_FONIPA, de→ASCII, Hani→Latn, ...

    • [32.0.1] Moved several BGN transforms from status “provisional” to status “contributed”. [#10728]

For more information, see detailed delta charts.

Growth

The following gives the total overview of the change in data items in CLDR. Most of the increase in data was from the addition of new locales, more emoji names and keywords across many locales, and the import of draft wikidata subdivision names. The following table shows the increase in total CLDR data items (including locale-based and non-locale-based) compared to the last release.

* The measurement of the number of items reflects the different ways that the information is represented. A single data field (element or attribute value) may result in multiple data items. For example, plural rules may be shared by multiple languages, and a single data field contains all the languages to which those rules apply. Sometimes a changed item appears as a deletion+addition, and sequences of items (such as sort order) are not counted as different even if the order changes.

The following chart shows the increase in locale-based data over time.

For more details, see the Delta Data charts.

There is a new chart that shows the current coverage levels for CLDR locales. The locales that are not as complete are marked 'seed', and available in a separate CLDR source directory.

Migration

  • Plural rules

      • The plural rules for pt_PT changed to be different from pt (=pt_BR). The "one" case is now only the integer 1.

  • Timezones

    • Persian (fa) localized GMT hour pattern contains bidi control character LRM before signs.

  • Currencies

    • The new code for STN (SAO TOME AND PRINCIPE) has been released, and will be valid as of 2018-09-01. It is included in the release with that effective date. However, it was too late to provide names for the locales.

  • Language/Region data

    • The UN code 202 (Sub-Saharan Africa) was added late in the process, and doesn't have names (except in English).

  • Other

    • Chakma is the first CLDR locale that uses completely supplemental (non-BMP) characters, which may expose some bugs in implementations.

    • Chinese stroke collation was reverted to the data from CLDR 29 as a short-term fix for problems introduced in CLDR 30 that resulted in missing entries for several basic characters. A complete fix for the underlying problem is targeted for CLDR 33. See #10497, #10642.
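Evaluating the "one" category side by side shows how the two Portuguese rules differ. This minimal Python sketch assumes the rules are `i = 0..1` for pt and `i = 1 and v = 0` for pt_PT. Here i is the integer part and v is the number of visible fraction digits, following the LDML plural operands. Other operands and categories are ignored, and the function names are hypothetical.

```python
def plural_operands(num: str) -> tuple[int, int]:
    """Return (i, v): integer part and number of visible fraction digits.

    Works on the string form so that "1" and "1.0" differ (v = 0 vs v = 1).
    """
    digits = num.lstrip("-")
    int_part, _, frac_part = digits.partition(".")
    return int(int_part or "0"), len(frac_part)


def plural_pt(num: str) -> str:
    # Assumed pt rule: one -> i = 0..1 (0, 1, 0.5 and 1.5 are all "one").
    i, _v = plural_operands(num)
    return "one" if i in (0, 1) else "other"


def plural_pt_pt(num: str) -> str:
    # Assumed pt_PT rule: one -> i = 1 and v = 0 (only the integer 1).
    i, v = plural_operands(num)
    return "one" if i == 1 and v == 0 else "other"


for n in ("0", "1", "1.0", "1.5", "2"):
    print(n, plural_pt(n), plural_pt_pt(n))
```

Under these assumed rules, 0, 1.0 and 1.5 are "one" in pt but "other" in pt_PT. Only the integer 1 is "one" in both.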

Known Issues

  1. New macroregions

    1. The UN code 202 (Sub-Saharan Africa) was added late in the process, and doesn't have names (except in English).

    2. The UN is now including Sark (680) which didn't get into the release.

  2. “Week of” structure

    1. The structure and intended usage for the “week x of y” patterns is still being refined and may change. This applies especially to dateFormatItems such as the following:

      <dateFormatItem id="MMMMW" count=...>'week' W 'of' MMM</dateFormatItem>

      <dateFormatItem id="yw" count=...>'week' w 'of' y</dateFormatItem>

    2. Areas of discussion include the use of the count attribute and the use of ordinal vs. cardinal numbers. For more information see [#9801].

  3. Subdivision Names

    1. The draft subdivision names were imported from wikidata. Names that had characters outside of the language's exemplars were excluded for now. Names that would cause collisions were allowed, but marked with superscripted numbers. The goal is to clean up these names over time.

  4. German AM/PM [reverted in CLDR 32.0.1]

    1. In CLDR 32, the German AM/PM symbols were changed from “vorm.”/“nachm.” to “AM”/“PM”. This was reverted in CLDR 32.0.1 [#10735] but will be reconsidered in a future version of CLDR [#10789].

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

Key to Header Links

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file diff --git a/docs/site/downloads/cldr-33-1.md b/docs/site/downloads/cldr-33-1.md new file mode 100644 index 00000000000..aac1316d0a8 --- /dev/null +++ b/docs/site/downloads/cldr-33-1.md @@ -0,0 +1,151 @@ +--- +title: 'CLDR 33-1 Download' +--- + + +Unicode CLDR - CLDR 33.1

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 33.1

Overview

Unicode CLDR 33.1 is an update to CLDR 33 that focuses on Unicode 11.0 support. Improvements in this release include:

  • Data

      • Updates to Unicode 11.0

      • Adds annotations (names and keywords) for Unicode 11.0 emoji, and makes improvements to previously-existing annotations.

      • Updates Chinese collation stroke order from Unicode 7.0 to Unicode 11.0, after tooling bug fixes

  • Structure*

      • No changes. The DTD Δs and DTD Diffs links above point to v33.

  • Specification*

      • There is no LDML 33.1 document. Instead, only amendments to v33 are provided, as described below in Specification Amendments.

  • Charts*

For more details, see the list of bug fixes.

Specification Amendments

There is not a new version of the LDML spec. Instead, the following are amendments to LDML33. The changed text is indicated below by green highlighting.

14.1 Synthesizing Sequence Names

  1. If sequence is an emoji flag sequence, look up the territory name in CLDR for the corresponding ASCII characters. Set suffixName to that, and prefixName to the characterLabel for "flag", and go to step 10.

    • For example, "🇵🇫" has the regional indicator symbols PF and would map to “Flagge: Französisch-Polynesien” in German.

  2. If sequence is an emoji tag sequence, look up the subdivision name in CLDR for the corresponding ASCII characters and compose as for emoji flag sequence.

    • For example, "🏴󠁧󠁢󠁳󠁣󠁴󠁿" has TAG characters gbsct and would map to “Flagge: Schottland” in German.

  3. If sequence is a keycap sequence or 🔟, use the characterLabel for "keycap" as the prefixName and set the suffix to be the ASCII characters in the sequence (or "10" in the case of 🔟), then go to step 8.

    • For example, "#⃣" would map to "Taste: #" in German.

  4. If sequence contains any emoji modifiers or hair components, move them (in order) into suffix, removing them from sequence.

    • For example, "👨🏿‍🦰" would map to "Mann: dunkle Hautfarbe, rotes Haar".

  5. Transform sequence and append to prefixName, by successively getting names for the longest subsequences, skipping any singleton ZWJ characters. If there is more than one name, use the listPatterns for "unit-short" to link them. This uses the patterns for "2", "start", "middle", and "end".

The /annotationsDerived/ folder has the available composed names, pre-built.
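Steps 1 and 2 depend on recovering ASCII codes from the emoji sequence. The minimal Python sketch below shows that decoding. Regional indicator symbols U+1F1E6..U+1F1FF correspond to A..Z. An emoji tag sequence consists of U+1F3F4 WAVING BLACK FLAG, then TAG characters that mirror ASCII at an offset of U+E0000, then a final CANCEL TAG (U+E007F). The name table and the "Flagge" label are hypothetical stand-ins for CLDR's German data. Joining prefixName and suffixName with ": " is a simplification inferred from the examples above.

```python
from typing import Optional

RI_A = 0x1F1E6         # REGIONAL INDICATOR SYMBOL LETTER A
BLACK_FLAG = 0x1F3F4   # WAVING BLACK FLAG, base of emoji tag sequences
TAG_OFFSET = 0xE0000   # TAG characters U+E0020..U+E007E mirror ASCII
CANCEL_TAG = 0xE007F


def flag_region(seq: str) -> Optional[str]:
    """Return the region code for a pair of regional indicators, e.g. "PF"."""
    cps = [ord(c) for c in seq]
    if len(cps) != 2 or not all(RI_A <= cp <= RI_A + 25 for cp in cps):
        return None
    return "".join(chr(ord("A") + cp - RI_A) for cp in cps)


def tag_subdivision(seq: str) -> Optional[str]:
    """Return the ASCII payload of an emoji tag sequence, e.g. "gbsct"."""
    cps = [ord(c) for c in seq]
    if len(cps) < 3 or cps[0] != BLACK_FLAG or cps[-1] != CANCEL_TAG:
        return None
    tags = cps[1:-1]
    if not all(0xE0020 <= cp <= 0xE007E for cp in tags):
        return None
    return "".join(chr(cp - TAG_OFFSET) for cp in tags)


# Hypothetical stand-ins for CLDR German territory/subdivision names and
# the characterLabel for "flag".
NAMES_DE = {"PF": "Französisch-Polynesien", "gbsct": "Schottland"}
FLAG_LABEL_DE = "Flagge"


def flag_name(seq: str) -> Optional[str]:
    """Compose "prefixName: suffixName" for flag and tag sequences."""
    code = flag_region(seq) or tag_subdivision(seq)
    if code is None or code not in NAMES_DE:
        return None
    return f"{FLAG_LABEL_DE}: {NAMES_DE[code]}"


french_polynesia = "\U0001F1F5\U0001F1EB"
scotland = "\U0001F3F4\U000E0067\U000E0062\U000E0073\U000E0063\U000E0074\U000E007F"
print(flag_name(french_polynesia))  # Flagge: Französisch-Polynesien
print(flag_name(scotland))          # Flagge: Schottland
```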

Migration

  • Updates German AM/PM strings to follow the English, to meet most common expectations of users of 12hr formats.

Known Issues

  1. Some of the main CLDR locales are missing a few Unicode 11.0 annotations (should be fixed in v34): #11193

  2. The segmentation rules have not been updated for changes in Unicode 11.0. This does not affect ICU, since the rules there were changed manually. Implementers may wish to patch their v33.1 versions with the data in #11203 if they use the segmentation rules independent of ICU. The changes include simplifying the break rules for Emoji and not breaking within strings of white space.

  3. The ICU4J libraries included in v33.1 were not updated to ICU 62.0. There are no known problems with using ICU 61.0, but implementers may want to update their copies to ICU 62.0.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing. The contributors to v33.1 will appear on the page later, when v34 is released.

Key to Header Links

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file diff --git a/docs/site/downloads/cldr-33.md b/docs/site/downloads/cldr-33.md new file mode 100644 index 00000000000..df443a4bd67 --- /dev/null +++ b/docs/site/downloads/cldr-33.md @@ -0,0 +1,183 @@ +--- +title: 'CLDR 33 Download' +--- + + +Unicode CLDR - CLDR 33 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 33 Release Note

Overview

Unicode CLDR 33 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

This release had a limited submission phase. The focus was on improvements to emoji keywords and to the Odia and Assamese locales, addition of typographic names data, and improvements to the structure for specifying keyboard layouts.

Improvements in this release include:

  • Structure

      • New structure for typographicNames translations (such as terms for Bold, Italic, ...), with data for 33 locales.

      • The structure for specifying keyboard layouts was significantly enhanced, with many new elements and attributes, and expanded syntax for some preëxisting attribute values. See spec for details: Keyboards.

  • Additional Translations/Data

      • Annotations (emoji keywords) for a limited set of locales had a full review (ar, en_GB, de, es, ja, ru).

      • Two additional locales (Odia, Assamese) were brought up to Modern coverage level; some missing items were added in other locales.

    • New typographicNames data added, with translations in 33 locales.

      • Added 4 new transforms: fa-fa_FONIPA, ha-ha_NE, nv-nv_FONIPA, vec-vec_FONIPA.

      • Added number spellout (RBNF) rules for sw (Swahili), ff (Fulfulde/Fula), qu (Quechua), lb (Luxembourgish), ccp (Chakma), su (Sundanese).

  • Property files

      • The emoji property data file ExtendedPictographic.txt has been removed from CLDR data, since the contents are now part of the UTS #51 “Unicode Emoji” data file: emoji-data.txt.

      • labels.txt was added for emoji categories and subcategories.

  • Code Updates

      • Addition of new currency code MRU for Mauritania; replaces MRO.

      • Updating of currency display names and narrow symbol for São Tomé & Príncipe Dobra (use standard names for STN, names showing older year range for STD).

    • Subdivisions (including all new codes for China).

    • Update timezone mappings for tzdata 2018c.

  • Bug fixes

For information on structural changes, see Spec Modifications.

For changes that may affect migration to this version, see Migration.

Charts

The charts have been updated for the v33 data. The Delta Data will show a number of changes in annotations that are due to the elimination of redundant keywords: see Growth.

New tab-separated-value (TSV) files will also be added to CLDR 33, so that the information can be loaded into spreadsheets rather than scraped from the charts. Currently this is only for a subset of the charts.

    1. by_type.tsv

    2. delta.tsv — locales w/ inheritance

    3. delta_supp.tsv — supplemental data (eg non locale)

    4. delta_summary.tsv — stats on #2 & #3

Survey Tool

  • When collecting data for emoji names and annotations, the Survey Tool now has the capability to display its own images for emoji that may not yet be displayable on the user’s system.

Other data additions and changes

Some of the fixes and additions include:

  • Locale data:

      • Added English name for sr_ME, “Montenegrin”.

      • The cardinal (plural) rules for Macedonian (mk) have been changed so that one➞other for {11}.

      • New seed locale for scn (Sicilian), with plural rules.

      • Added exemplar characters for ha_NE (distinct from ha), nv (Navajo), cho (Choctaw).

  • Supplemental data:

      • Adjusted the territory containment data for some regions near the South Pole, following changes in UN M49, so several of these now have new containing regions.

      • Updated the <territoryInfo> GDP data for various regions.

For more information on these and other bug fixes, see detailed delta charts and the list of bug fixes.

Growth

Because v33 was not a data submission release, the chart for growth differs little from that of the CLDR 32 Release Note. Here are the overall statistics:

The following files showed the largest number of raw changes:

  • annotations/as.xml, main/as.xml, annotations/ru.xml, main/br.xml, annotations/or.xml, annotations/br.xml, annotations/ga.xml

Two changes affected the statistics:

  • The keywords (in annotations) are being treated as sets for counting purposes.

    • So old:{a | b | c} → new:{a | c | d | e} counts as one deletion and two additions.

  • The keywords have also had some redundancies removed: if a keyword consisted entirely of other keywords, it was removed.

    • So old:{a, a b, b} → new:{a, b}.
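The two counting changes can be sketched as follows. This is a minimal Python illustration, not CLDR's actual tooling. Keywords are compared as sets. A keyword is treated as redundant when it splits on whitespace into two or more words that are each keywords themselves. That rule is an assumption: it fits the examples above but would not suit languages written without spaces. The function names are hypothetical.

```python
def keyword_changes(old: set[str], new: set[str]) -> tuple[int, int]:
    """Count (deletions, additions) when keywords are treated as sets."""
    return len(old - new), len(new - old)


def drop_redundant(keywords: list[str]) -> list[str]:
    """Remove keywords made up entirely of other keywords, keeping order."""
    present = set(keywords)
    kept = []
    for kw in keywords:
        words = kw.split()
        if len(words) > 1 and all(w in present for w in words):
            continue  # e.g. "a b" adds nothing when "a" and "b" are present
        kept.append(kw)
    return kept


print(keyword_changes({"a", "b", "c"}, {"a", "c", "d", "e"}))  # (1, 2)
print(drop_redundant(["a", "a b", "b"]))                       # ['a', 'b']
```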

Migration

  • Plurals: ordinal and cardinal rules have been added for scn. The cardinal (plural) rules for Macedonian (mk) have been changed so that one➞other for {11}. These changes should not cause migration issues.

    • The emoji property data file ExtendedPictographic.txt has been removed from CLDR data, since the contents are now part of the UTS #51 “Unicode Emoji” data file: emoji-data.txt.

    • Adjusted the territory containment data for some regions near the South Pole, following changes in UN M49, so several of these now have new containing regions.

Known Issues

  1. New macroregions

    • UN M.49 now includes Sark (680) but ISO rejected the proposed ISO 3166-1 code, so it is not included.

  2. “Week of” structure

    • The structure and intended usage for the “week x of y” patterns is still being refined and may change. This applies especially to dateFormatItems such as the following:

      <dateFormatItem id="MMMMW" count=...>'week' W 'of' MMM</dateFormatItem>

      <dateFormatItem id="yw" count=...>'week' w 'of' y</dateFormatItem>

    • Areas of discussion include the use of the count attribute and the use of ordinal vs. cardinal numbers. For more information see [#9801].

  3. Subdivision Names

    • The draft subdivision names were imported from wikidata. Names that had characters outside of the language's exemplars were excluded for now. Names that would cause collisions were allowed, but marked with superscripted numbers. The goal is to clean up these names over time.

  4. Chinese stroke collation

    • In CLDR 30 and 31, Chinese stroke collation was missing entries for several basic characters. CLDR 32 reverted the stroke collation data to the CLDR 29 version; a complete fix for the underlying problem is targeted for CLDR 34. See #10497, #10642.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

Key to Header Links

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file diff --git a/docs/site/downloads/cldr-34.md b/docs/site/downloads/cldr-34.md new file mode 100644 index 00000000000..efa7e2db76c --- /dev/null +++ b/docs/site/downloads/cldr-34.md @@ -0,0 +1,310 @@ +--- +title: 'CLDR 34 Download' +--- + + +Unicode CLDR - CLDR 34 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 34 Release Note

Overview

Unicode CLDR 34 provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

CLDR 34 included a full Survey Tool data collection phase, adding approximately 6M of data overall, resulting in the following language support:

🆕 marks languages reaching the level in this release. Tongan (to), Konkani (kok), Dzongkha (dz), Tatar (tt) were already in ICU, while Sindhi (sd), Maori (mi), Turkmen (tk), Javanese (jv), Interlingua (ia), Kurdish (ku), Xhosa (xh) are being included for the first time in the upcoming ICU 63. The above counts are just for the languages (with multiple entries for multi-script languages such as Serbian or Chinese); there are many additional regional locales.

Other notable changes include:

For details, see Detailed Specification Changes, Detailed Structure Changes, Detailed Data Changes, Growth.

Detailed Specification Changes

For detailed specification changes, see LDML34 Modifications.

Detailed Structure Changes

Detailed Data Changes

In addition, the following changes were made. This is not complete: for a full list see the list of bug fixes.

Growth

The following summarizes the number of changes (additions + corrections) for languages in the release.

The following shows languages with a larger relative number of changes. For the first line, there are over 20% additions alone, not counting corrections.

TBD: add chart

Migration

Known Issues

(These may be addressed in a maintenance update.)

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

Key to Header Links

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file diff --git a/docs/site/downloads/cldr-35.md b/docs/site/downloads/cldr-35.md new file mode 100644 index 00000000000..bc2f0d06b59 --- /dev/null +++ b/docs/site/downloads/cldr-35.md @@ -0,0 +1,456 @@ +--- +title: 'CLDR 35 Download' +--- + + +Unicode CLDR - CLDR 35 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 35 Release Note

Overview

Unicode CLDR 35 provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

CLDR 35 included a limited Survey Tool data collection phase. The following summarizes the changes in the release.

A dot release, version 35.1, is expected in April, with further changes for the Japanese calendar.

For details, see Detailed Specification Changes, Detailed Structure Changes, Detailed Data Changes.

Detailed Specification Changes

Aside from documenting additional structure, there have been important modifications to the following areas of LDML:

Part 1: Core

Part 2: General

Part 4: Dates

For more detailed specification changes, see LDML35 Modifications.

Detailed Structure Changes

No DTD changes, except for the following:

XML metadata

DTDs now have enhanced syntax for valid attribute values

Detailed Data Changes

In addition, the following changes were made. This is not complete: for a full list see the list of bug fixes.

Growth

The following chart shows the growth of CLDR data over time. It counts the number of data items in /main and /annotations directories, keyed by locale.

The chart does not include data in the /annotationsDerived, /bcp47, /casing, /collation, /dtd, /keyboards, /properties, /rbnf, /segments, /subdivisions, /supplemental, /transforms, /uca, and /validity directories, which together amount to roughly twice as much data as appears in the above chart.

The chart includes the latest release for each year. The latest data for 2019 will only be available in October; v35.0 just had a limited Survey Tool data collection phase as described in the Overview.

Migration

  1. Plural changes (unlikely to cause migration problems).

    1. Marathi (mr) changed the category for 0 to other.

    2. Cornish (kw) added 3 categories and changed many assignments.

  2. Hindi (hi) changed from translated AM/PM strings to the English ones.

  3. The mapping for deprecated language code “mo” has changed from “ro_MD” to just “ro”.

V35.1

The v35.1 dot-release is focused on the new Japanese era. It includes the following tickets:

Known Issues

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

Key to Header Links

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file diff --git a/docs/site/downloads/cldr-36.md b/docs/site/downloads/cldr-36.md new file mode 100644 index 00000000000..ba4380c3792 --- /dev/null +++ b/docs/site/downloads/cldr-36.md @@ -0,0 +1,158 @@ +--- +title: 'CLDR 36 Download' +--- + + +Unicode CLDR - CLDR 36 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 36 Release Note

Overview

Unicode CLDR 36 provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for such common software tasks.

A main focus this release was on infrastructure. We moved to GitHub for source code control and Jira for bug tracking, and made significant improvements in performance of the online data-gathering Survey Tool. We are also now keeping more information in the master data, reflecting the votes in the Survey Tool for the “inherited” items (see Migration).

  • Approximately 32K items added

      • Significant increase (approximately 50% or more) in moderate and/or modern coverage for: ceb (Cebuano), ha (Hausa / Latin script), ig (Igbo), kok (Konkani), qu (Quechua), to (Tongan), yo (Yoruba). Additionally, the following locales had at least a 15% increase in basic coverage: az (Azerbaijani / Latin script), so (Somali / Latin script).

      • Seed data for new locales, including three native languages of North America: cic (Chickasaw), mus (Muscogee), osa (Osage, Osage script); plus an (Aragonese), su (Sundanese, Latin script), szl (Silesian).

      • Additional data for new items listed below.

  • Emoji

      • Added names and keywords for Emoji 13.0 draft candidates; these are to be fleshed out further in v36.1.

      • Refined names and keywords for Emoji 12.0, including for English.

  • Measurement units

      • Additional compoundUnitPattern ({0}⋅{1} in root) for expressing units like newton-meter (N⋅m)

      • Additional units: dot-per-centimeter, dot-per-inch, em, megapixel, pixel, pixel-per-centimeter, pixel-per-inch; decade; therm-us; bar, pascal

  • Locale identifiers and names

      • Extended Language Matching to have fallbacks for many encompassed languages. [CLDR-13244]

      • Added more languageAliases from the BCP47 language subtag registry, for deprecated languages.

      • New alt=“menu” names for certain languages, intended to provide better sorting in menus. [CLDR-11834]

      • Updated validity and collection information for geographic subregions; updated names especially for subregions of the UK and Sweden.

      • Names have been added for “pseudo-regions” XA (Pseudo-Accents) and XB (Pseudo-Bidi). These are intended only for testing; you may need special handling to remove them in production. [CLDR-13100]

  • Other

      • Additions for testing:

          • Added new test directory /common/testData/, with test data for:

              • localeIdentifiers

              • graphemeClusters (currently supported Indic languages)

              • transforms (transliterations)

          • For test purposes, added names for “pseudo-regions” XA and XB as noted above.

  • Infrastructure

      • Moved to GitHub for source code control and Jira for bug tracking (see CLDR Change Requests for new information). Queries using Trac no longer work.

      • Data in the cldr repository keeps a record of votes for inherited values (↑↑↑) [CLDR-11989]. A new tool, GenerateProductionData, is used to resolve the inheritance markers and provide appropriate minimization.

      • A new cldr-staging repository contains data that has been processed with GenerateProductionData.

      • Added new API and tooling to support conversion to other formats (ICU in particular)

      • Performance improvements in Survey Tool

For details, see Detailed Specification Changes, Detailed Structure Changes, and Detailed Data Changes.

Detailed Specification Changes

For this version, the primary change in the LDML specification was to document changes in the emoji derived name algorithm.

For more detailed specification changes, see LDML36 Modifications.

Detailed Structure Changes

This version had only one structural addition and one new “alt” value for language names:

    • A new <compoundUnit type="times"> provides the pattern for combining units in a multiplicative relationship, such as newton-meters.

    • For some language names there is a new alt="menu" form, which provides names more suitable for use in menus. For example:

      • <language type="yue" alt="menu">Chinese, Cantonese</language>
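The "times" compound pattern is a plain placeholder template. As a minimal sketch (the function name is hypothetical, and the root pattern "{0}⋅{1}" is hard-coded here instead of being looked up per locale and width), combining two unit symbols looks like this:

```python
# Root value of the "times" compound unit pattern mentioned above.
# Real code would read the locale- and width-specific pattern from CLDR data.
ROOT_TIMES_PATTERN = "{0}⋅{1}"

def combine_times(pattern: str, first: str, second: str) -> str:
    """Hypothetical helper: substitute two unit strings into a times pattern."""
    return pattern.replace("{0}", first).replace("{1}", second)

print(combine_times(ROOT_TIMES_PATTERN, "N", "m"))  # N⋅m
```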

Detailed Data Changes

Important data changes besides those listed in the Overview:

    • Transforms

      • Hiragana-Katakana no longer modifies the spacing dakuten U+309B, U+309C. [CLDR-13127]

      • Latin-ASCII enhanced for Latin extended C and D and some symbols. [CLDR-11383]

    • Miscellaneous

      • zh: The currency symbol for CNY changed from fullwidth ￥ (U+FFE5) to halfwidth ¥ (U+00A5).

      • fr_CA: Switched to full year (not 2-digit year) in short date formats. [CLDR-11666]

      • bg: Removed “ч.” from time formats. [CLDR-11545]

      • The translations for the new name 'North Macedonia' have been refined for many languages by contributors, and languages with no contributors have been reverted to the code 'MK'. All alt values have also been removed. [CLDR-13099]

For more details see the list of bug fixes: Δ36.

Migration

  1. CLDR has moved to GitHub for source code control and Jira for bug tracking. Queries using Trac no longer work.

  2. The data in the main cldr git repository includes element values with “↑↑↑”.

    1. Such values indicate that translators explicitly determined that the parent value is always valid.

    2. These values (and the paths they belong to) are removed from the release data, but tools that access the repository data directly need to remove them themselves.

    3. A new tool in CLDR, GenerateProductionData.java, is used to strip the ↑↑↑ and minimize the data. (Implementations that don't use that tool can remove lines that contain ↑↑↑; they will always be leaf nodes in XML.)

    4. A new repository, cldr-staging, contains data that has already been processed with GenerateProductionData.java (this is the data that is used for the release).

  3. Some empty files are included in /collation/, where the root data is valid for them, while some empty files are removed from /annotationsDerived/ and /subdivisions/.

  4. Names have been added for “pseudo-regions” XA (Pseudo-Accents) and XB (Pseudo-Bidi). These are intended only for testing; you may need special handling to remove them in production. [CLDR-13100]

  5. North Macedonia: The translations for the new name 'North Macedonia' have been refined for many languages by contributors, and languages with no contributors have been reverted to the code 'MK'. All alt values have also been removed. [CLDR-13099]

  6. Other specific data changes to be aware of:

    1. zh: The currency symbol for CNY changed from fullwidth ￥ (U+FFE5) to halfwidth ¥ (U+00A5).

    2. fr_CA: Switched to full year (not 2-digit year) in short date formats. [CLDR-11666]

    3. bg: Removed “ч.” from time formats. [CLDR-11545]
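For implementations that read the repository directly instead of running GenerateProductionData.java, the line-removal shortcut described in the Migration list can be sketched in Python. This assumes, as in CLDR's XML files, that each leaf element sits on its own line; it does not reproduce the further minimization that GenerateProductionData performs, and the helper names are hypothetical.

```python
from pathlib import Path

INHERITANCE_MARKER = "↑↑↑"

def strip_inheritance_markers(xml_text: str) -> str:
    """Drop every line containing the ↑↑↑ marker.

    Relies on ↑↑↑ values always being leaf elements written on a single
    line, so deleting the line deletes the whole element.
    """
    kept = [line for line in xml_text.splitlines(keepends=True)
            if INHERITANCE_MARKER not in line]
    return "".join(kept)

def strip_file(src: Path, dst: Path) -> None:
    """Hypothetical helper: write a stripped copy of one XML file."""
    text = src.read_text(encoding="utf-8")
    dst.write_text(strip_inheritance_markers(text), encoding="utf-8")
```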

Known Issues

    1. The Transform charts are temporarily disabled. [CLDR-13308]

CLDR 36.1

This dot release makes incremental additions to version 36 for support of Unicode 13 and ICU 66. (These new, extra Q1 releases are for integration by vendors who could not otherwise release their products with the newest version of Unicode.)

  1. New script codes: Chrs, Diak, Kits, Yezi

  2. New numbering systems: diak, segment

  3. Unicode 13.0 updates to root collation, with updated emoji collation (from CLDR 37 alpha data)

  4. Updates to Han-Latin transform and to Chinese pinyin and stroke collations from Unihan 13.0 data

  5. Updates to emoji annotations (from CLDR 37 alpha data)

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

Key to Header Links

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file diff --git a/docs/site/downloads/cldr-37.md b/docs/site/downloads/cldr-37.md new file mode 100644 index 00000000000..81dfa4185bf --- /dev/null +++ b/docs/site/downloads/cldr-37.md @@ -0,0 +1,73 @@ +--- +title: 'CLDR 37 Download' +--- + + +Unicode CLDR - CLDR 37 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 37 Release Note

See Key to Header Links

Overview

Unicode CLDR provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

CLDR v37 focuses on adding new locales, enhancing support for units of measurement, adding annotations (names and search keywords) for symbols, and adding annotations for Emoji v13.

Data Changes

  • Units

    • Expanded locale preferences for units of measurement. The new unit preference and conversion data allows formatting functions to pick the right measurement units for the locale and usage, and convert input measurement into those units. See additional details in Specification Changes.

    • SI Prefixes. SI prefix patterns for "kilo{0}", "mega{0}", etc. have been added, as well as the prefix terms for square and cubic. These are fallbacks for when no combined form is available, so that the name for more unusual units like megagram or square megameter can be formed in different languages.

    • Other additions. Translations have been added for a few more unit identifiers, such as duration-century, area-square-kilometer, and area-square-meter.

    • See also Migration.

  • Annotations

    • Emoji 13.0. The emoji annotations (names and search keywords) for the new Unicode 13.0 emoji are added.

    • Annotations (names and keywords) expanded to cover more than emoji. This release includes a small set of Unicode symbols (arrow, math, punctuation, currency, alphanumerics, and geometric) with more to be added in future releases. For example, see v37/annotations/romance.html.

  • Sorting

    • Emoji 13.0. The collation sequences are updated for new Unicode 13.0 and for Emoji 13.0.

  • Locales

    • New languages at Basic coverage: Fulah (Adlam), Maithili, Manipuri, Santali, Sindhi (Devanagari), Sundanese

    • New languages at Modern coverage: Nigerian Pidgin

    • See Locale Coverage Data for the coverage per locale, for both new and old locales.

  • Grammatical data

    • Grammatical features added. Grammatical features are added for many languages, a first step toward allowing programmers to format units according to grammatical context (e.g., the dative version of "3 kilometers").

  • Misc

    • Updates to code sets. In particular, the EU is updated (removing GB).

    • Alternate versions. Some languages now have alternate forms:

      • Some additional language names have a "menu" form for alphabetizing, such as "Kurdish, Central" instead of "Central Kurdish".

      • There are variant names such as "Cape Verde" as an alternative to "Cabo Verde".

    • Myanmar-Latin transliteration added
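The SI prefix and power patterns are fallbacks that compose a name when no combined form exists. A minimal sketch, assuming English-style patterns "mega{0}" and "square {0}" and a hypothetical helper that applies the prefix first and the power second; real formatting must also handle plural forms, grammatical case, and locale-specific rules described in the specification:

```python
from typing import Optional

def apply_pattern(pattern: str, value: str) -> str:
    """Substitute a value into a one-placeholder pattern such as "mega{0}"."""
    return pattern.replace("{0}", value)

def compose_unit_name(base: str,
                      prefix_pattern: Optional[str] = None,
                      power_pattern: Optional[str] = None) -> str:
    """Hypothetical helper: apply an SI prefix, then a power, to a unit name."""
    name = base
    if prefix_pattern is not None:
        name = apply_pattern(prefix_pattern, name)  # meter -> megameter
    if power_pattern is not None:
        name = apply_pattern(power_pattern, name)   # megameter -> square megameter
    return name

print(compose_unit_name("gram", prefix_pattern="mega{0}"))
print(compose_unit_name("meter", prefix_pattern="mega{0}", power_pattern="square {0}"))
```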

For access to the data, see the GitHub tag above. For more details see the Delta Tickets above.

Specification Changes

The largest changes were the following:

  • Expanded locale preferences for units of measurement. The new unit preference and conversion data allows formatting functions to pick the right measurement units for the locale and usage, and convert input measurement into those units.

    • For example, a program (or database) could use 1.88 meters internally, but then for person-height have that measurement converted to 6 feet 2 inches for en_US and to 188 centimeters for de_CH.

    • Using the unit display names and list formats, those results can then be displayed according to the desired width (e.g., 2″ vs. 2 in vs. 2 inches), using the locale's display names and number formats.

    • The size of the measurement can also be taken into account, so that an infant's height can be given as 18 inches and an adult's as 6 feet 2 inches.

  • Grammatical features added. Grammatical features are added for many languages.

  • List Patterns. Clarified that more sophisticated processing can be used, and added examples of customized processing for specific languages.
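The person-height example above reduces to unit arithmetic once the target units are chosen. As a sketch: the conversion factors below are the standard definitions rather than values read from CLDR's units data, rounding to whole inches or centimeters is an assumption, and the choice of feet-and-inches for en_US versus centimeters for de_CH would come from the unit preference data, which this code does not model.

```python
METERS_PER_INCH = 0.0254  # exact, by definition of the international inch
INCHES_PER_FOOT = 12

def to_feet_and_inches(meters: float) -> tuple[int, int]:
    """Convert meters to (feet, inches), rounding to the nearest whole inch."""
    total_inches = round(meters / METERS_PER_INCH)
    feet, inches = divmod(total_inches, INCHES_PER_FOOT)
    return feet, inches

def to_centimeters(meters: float) -> int:
    """Convert meters to whole centimeters."""
    return round(meters * 100)

print(to_feet_and_inches(1.88))  # (6, 2)
print(to_centimeters(1.88))      # 188
```

Choosing "18 inches" rather than "1 foot 6 inches" for an infant would require the size thresholds from the preference data on top of this conversion.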

For more detailed specification changes, see the Spec above, and look at the Modifications section.

Structure Changes

  • New elements are added for enhanced unit preferences, such as the units to use for person-height in different countries. This is an initial phase; additional preferences will be added in the future.

  • Additionally, elements and data are added for unit conversions, so that programmers can supply amounts in one unit and get the right amounts to display for different locales.

  • Grammatical features are added for various languages, as a prelude to allowing programmers to format units according to grammatical context (e.g., the dative version of "3 kilometers").

  • The augmented constraints have been updated, so that the tests can apply those constraints to all of the CLDR data.

  • Annotations now include non-emoji. Note: emoji are distinguished from other symbols using Unicode properties.

For more information, see the Delta DTDs above.

Chart Changes

Growth

The following chart shows the growth of CLDR locale-specific data over time. It does not include the non-locale specific data, nor locale-specific data that is not collected via the Survey Tool. It is thus restricted to data items in /main and /annotations directories. The % values are percent of the current measure of Modern coverage. (That level is notched up each release.)

See also the Locale Coverage Data.

Migration

  • Seven unit identifiers with irregular components have been deprecated, and are given alias values to the regular forms. For example, square always comes before the unit, and is square, not squared. The validity data has also been updated to mark the older forms as deprecated.

      • inch-hg ⟹ inch-ofhg

      • liter-per-100kilometers ⟹ liter-per-100-kilometer

      • meter-per-second-squared ⟹ meter-per-square-second

      • millimeter-of-mercury ⟹ millimeter-ofhg

      • part-per-million ⟹ permillion

      • pound-foot ⟹ pound-force-foot

      • pound-per-square-inch ⟹ pound-force-per-square-inch

  • Some of the unit usage parameters were also deprecated, since they didn't differ in practice. (The spec has been updated to have fallback, so if these need to be distinct in the future, they would be of the form media-music or media-music-track.)

      • music-track ⟹ media

      • tv-program ⟹ media

  • The subdivision codes gbeng, gbsct, and gbwls (used for flag emoji) are now deprecated (ISO removed them from its latest data). This can affect implementations testing for validity if they don't also check for 'deprecated' in common/validity/subdivision.xml. Compare the Territory Subdivisions charts for v37 and v36.
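Code that stores unit identifiers or usages can map the deprecated forms listed above to their replacements before lookup. A minimal sketch: the mappings are copied from the Migration list, the function names are hypothetical, and a real implementation would read the alias data shipped with CLDR rather than hard-code it. Only whole identifiers are matched; deprecated components inside larger compound identifiers are not handled.

```python
# Deprecated unit identifiers and their replacements (CLDR v37 Migration list).
DEPRECATED_UNIT_IDS = {
    "inch-hg": "inch-ofhg",
    "liter-per-100kilometers": "liter-per-100-kilometer",
    "meter-per-second-squared": "meter-per-square-second",
    "millimeter-of-mercury": "millimeter-ofhg",
    "part-per-million": "permillion",
    "pound-foot": "pound-force-foot",
    "pound-per-square-inch": "pound-force-per-square-inch",
}

# Deprecated unit usage parameters.
DEPRECATED_USAGES = {
    "music-track": "media",
    "tv-program": "media",
}

def canonical_unit_id(unit_id: str) -> str:
    """Return the replacement for a deprecated unit ID, else the ID unchanged."""
    return DEPRECATED_UNIT_IDS.get(unit_id, unit_id)

def canonical_usage(usage: str) -> str:
    """Return the replacement for a deprecated usage, else the usage unchanged."""
    return DEPRECATED_USAGES.get(usage, usage)
```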

Known Issues

  1. The expanded unit preferences are under development. The data is based on what was in CLDR v36, plus some other sources, but will be expanded in the future both to get better thresholds, and cover more cases where locales differ. See the ticket Improve unit structure and data [CLDR-13654]

  2. The Transform charts have been disabled. [CLDR-13308]

  3. The charts show spurious changes for gbeng, etc. That's because the file locations changed across releases.

  4. The JSON-format data for CLDR 37 currently omits the data from the CLDR common/supplemental files grammaticalFeatures.xml and units.xml. These are all new items in CLDR 37 except for the <unitPreferenceData>, which was formerly in supplementalData.xml. This will be addressed as soon as possible. [CLDR-13730]

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

Special thanks to the contributors to Nigerian Pidgin; one of the very few locales to go from zero to Modern coverage in one submission cycle!

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file diff --git a/docs/site/downloads/cldr-38.md b/docs/site/downloads/cldr-38.md new file mode 100644 index 00000000000..fb9da3dfe84 --- /dev/null +++ b/docs/site/downloads/cldr-38.md @@ -0,0 +1,98 @@ +--- +title: 'CLDR 38 Download' +--- + + +Unicode CLDR - CLDR 38 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 38 Release Note

See Key to Header Links

Overview

Unicode CLDR provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

CLDR v38 focused on enhancing the support for existing locales: support for units of measurement in inflected languages (phase 1), and annotations (names and search keywords) for many more non-emoji symbols (~400) plus Emoji v13.1. In this version, there is also substantially higher coverage for (in order of completeness): Norwegian Nynorsk, Hausa, Igbo, Breton, Quechua, Yoruba, Fulah (Adlam script), Chakma, Asturian, Sanskrit, and Dogri.

The units of measurement additions allow APIs to support unit IDs ranging from simple ones such as meter to compound ones such as cubic-meter-per-square-second or acre-foot-per-day. For example:

getUnitPattern(unitId, locale, width, pluralCategory, caseVariant): gets the localized, inflected pattern for a simple or compound unit of measurement, with the plural category and grammatical case (nominative, accusative, genitive, etc.) appropriate for its position in a sentence or phrase.

getUnitGender(unitId, locale) — to get the gender for a unit of measurement, so that other parts of a sentence or phrase can be modified to agree with that gender.
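getUnitPattern is described above as an example of an API the data supports, not as a published function. A hypothetical Python sketch of the lookup: the in-memory table holds only two English patterns for illustration, and the fallback order (requested plural and case, then the default case, then 'other') is an assumption based on the rule that a missing plural category falls back to 'other'.

```python
from typing import Optional

# Hypothetical table keyed by (unit_id, locale, width, plural_category, case).
# A case of None stands for the default (nominative) form with no case attribute.
UNIT_PATTERNS = {
    ("length-meter", "en", "long", "one", None): "{0} meter",
    ("length-meter", "en", "long", "other", None): "{0} meters",
}

def get_unit_pattern(unit_id: str, locale: str, width: str,
                     plural_category: str, case: Optional[str] = None) -> Optional[str]:
    """Look up a unit pattern, falling back to the default case, then to 'other'."""
    for plural in dict.fromkeys((plural_category, "other")):  # ordered, de-duplicated
        for case_variant in dict.fromkeys((case, None)):
            pattern = UNIT_PATTERNS.get((unit_id, locale, width, plural, case_variant))
            if pattern is not None:
                return pattern
    return None  # no data for this unit, locale, and width

print(get_unit_pattern("length-meter", "en", "long", "one").format(1))  # 1 meter
```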

The Survey Tool has improvements in performance, and introduced structured forum requests to improve coordination among translators. We would like to thank the 393 language experts who contributed to this release.

There are some changes that affect existing specifications and data: for example, the plural rules for French changed to add a new category; the specification for using aliases is more rigorous, and some alias data has changed — along with the specification for handling locale identifier canonicalization. For more information, see Migration.

The overall changes to the data items were:

  • Added: 155,131

  • Deleted: 33,805

  • Changed: 45,895

Data Changes

The following summarizes the changes to the data for this version of CLDR.

  • 13.1 Emoji and Unicode Symbols

      • Added names & search keywords for Emoji 13.1 and enhancements to existing emoji annotation data.

      • Added approximately 400 non-emoji Unicode symbols such as punctuation and currency symbols.

      • Added 2 character labels: superscript {0} and subscript {0}.

      • Aside from the CLDR target locales, emoji annotations and keywords expanded in Hausa (ha), Igbo (ig), Kalaallisut (kl), Luxembourgish (lb), Maori (mi), Manipuri (mni), Maltese (mt), Punjabi [Arabic] (pa_Arab), Kinyarwanda (rw), Tajik (tg), Tigrinya (ti), Uyghur (ug), Wolof (wo), Xhosa (xh), Yoruba (yo), with minor expansions in a few other languages.

  • Compact decimals and Units

      • Added 14 new units.

      • Added new binary prefixes.

      • Added new operand 'c' (with a synonym 'e') for languages like French (CLDR-12010)

  • Higher Coverage Levels

      • Modern: Norwegian Nynorsk

      • Moderate++: Hausa, Igbo, Breton, Quechua, Yoruba (made significant improvements, but didn't quite reach Modern)

      • Moderate: Fulah (Adlam), Chakma, Asturian

      • Basic+: Wolof, Tajik, Maori, Luxembourgish, Uyghur, Tigrinya (made significant improvements, but didn't come close to Moderate)

      • Basic: Sanskrit, Dogri

  • Unit Inflections

      • Completed phase 1. The full goal is to add full case and gender support for formatted units. During phase 1, a limited number of locales (see below) and units of measurement are being handled, so that we can work kinks out of the process before expanding to all units for all locales (where we can get the grammatical structure).

      • Case & Gender: Polish (pl), Russian (ru), German (de), Hindi (hi) (in rough order of complexity)

      • Gender Only: Dutch (nl), Norwegian Bokmål (nb), Danish (da), Swedish (sv), French (fr), Italian (it), Portuguese (pt), Spanish (es)

  • Performance & Quality

      • Made substantial improvements in Survey Tool performance, lowering cost for translation.

      • Made substantial improvement in quality, using structured Forum topics to allow translators to collaborate more effectively.

      • Improved detection of translator errors.

  • ICU support

      • Improvements to CLDR API, providing a limited, stable API for extracting CLDR data.

      • Added approximatelySign for number formatting.

  • Unicode locale identifiers and BCP 47

      • Added a new -u- extension key, dx, used to specify scripts to exclude from dictionary-based breaking (for word and line breaks).

      • Added a new short timezone identifier: tz-glgoh

      • Revamped the language, script, region, and variant alias data to improve replacement of deprecated codes.
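A locale identifier carrying the new dx key lists the excluded scripts as subtags after the key, for example en-u-dx-thai-hani. A simplified, hypothetical parser sketch: it collects key/value subtags from the -u- extension, ignores -u- attributes, treats any later singleton as the end of the extension, and performs no validation.

```python
def unicode_extension_keywords(tag: str) -> dict[str, list[str]]:
    """Return the -u- extension keys of a locale tag, each with its value subtags."""
    subtags = tag.lower().replace("_", "-").split("-")
    result: dict[str, list[str]] = {}
    if "u" not in subtags:
        return result
    key = None
    for sub in subtags[subtags.index("u") + 1:]:
        if len(sub) == 1:      # another singleton ends the -u- extension
            break
        if len(sub) == 2:      # -u- keys are exactly two characters
            key = sub
            result[key] = []
        elif key is not None:  # value subtags are 3-8 characters
            result[key].append(sub)
    return result

print(unicode_extension_keywords("en-u-dx-thai-hani"))  # {'dx': ['thai', 'hani']}
```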

For access to the draft data, see the git tag above. For more details see the Delta tickets above.

JSON Data Changes

JSON data now includes data for plural ranges, grammatical inflections, typographical labels, and annotations. If you are making use of JSON data, please join the [cldr-users] mailing list where we would like to hear your feedback.

CLDR JSON data for v38 is available at https://github.com/unicode-org/cldr-json

Specification Changes

The largest changes were the following:

  • To make the canonicalization of locale identifiers clear and unambiguous, the specification for canonicalization was substantially restructured. (This was done in concert with fixes to the alias data so that it works better with the specification.) See Migration and Annex C. LocaleId Canonicalization for more details.

  • To allow for overriding dictionary-based segmentation breaks, added the Unicode Dictionary Break Exclusion Identifier, with the new key “dx”.

  • For picking the correct units of measurement for locales, defined the userPreferences skeleton more precisely.

  • For accurate plural categories in compact numbers, added the 'c' operand to plural rules to provide formatting for languages such as French. (CLDR-12010)

  • To support inflected units of measurement (phase 1), added specifications for the new elements listed under Structure Changes and an algorithm for constructing grammatical unit names (simple or compound).
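The new compact-exponent operand ('c', with synonym 'e') lets the French rule distinguish "1 million" style values from ordinary numbers. A sketch of the French categories as we understand the CLDR 38 plurals data (one: i = 0,1; many: e = 0 and i != 0 and i % 1000000 = 0 and v = 0 or e != 0..5; other: everything else). Check the rules against plurals.xml before relying on them; the function name and argument convention are assumptions.

```python
def french_plural_category(i: int, v: int, e: int) -> str:
    """Plural category for French, as assumed from the CLDR 38 rules.

    i: integer digits of the full (expanded) value, e.g. 1100000 for 1.1c6
    v: number of visible fraction digits
    e: compact-decimal exponent (0 when compact notation is not used)
    In CLDR rule syntax, 'and' binds more tightly than 'or'.
    """
    if i in (0, 1):
        return "one"
    if (e == 0 and i != 0 and i % 1_000_000 == 0 and v == 0) or not 0 <= e <= 5:
        return "many"
    return "other"

print(french_plural_category(1_100_000, 0, 6))  # many  (1.1c6, "1,1 million")
print(french_plural_category(1_000, 0, 3))      # other (1c3, "1 k")
```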

For more detailed specification changes, see the Spec above, and look at the Modifications section.

Structure Changes

  • Added additional structure for unit inflections

    • New elements:

      • minimalPairs adds new elements caseMinimalPairs and genderMinimalPairs

      • unit adds a new element gender

      • grammaticalData adds new elements grammaticalDerivations, deriveCompound, and deriveComponent

    • New attributes for existing elements:

      • unitPattern adds a new attribute case

      • grammaticalCase, grammaticalGender, grammaticalDefiniteness add a new attribute scope

      • compoundUnitPattern1 adds new attributes case and gender

      • compoundUnitPattern adds a new attribute case

  • Number symbols adds approximatelySign element

  • Some additional attribute value constraints are added

    • for example, characterLabelPattern@type now allows for superscript and subscript values, indicated by the notation ⟪… strokes⟫➠⟪… strokes, subscript, superscript⟫ in Delta DTDs

    • some of these constraints are expanded due to new structure, while others are

For more details, see the Delta DTDs above.

Chart Changes

  • All charts are updated for the new data; for example, Romance Annotations shows the new non-emoji symbols and punctuation for Romance languages.

  • The DTD Deltas chart has a more compact representation for changes in attribute constraints, making the changes easier to see.

  • The new Grammatical Forms Charts show the new grammatical forms for units.

Growth

The following chart shows the growth of CLDR locale-specific data over time. It does not include the non-locale specific data, nor locale-specific data that is not collected via the Survey Tool. It is thus restricted to data items in /main and /annotations directories.

The % values are percent of the current measure of Modern coverage. That level is notched up each release, so previous releases had many locales that were at Modern coverage as assessed at the time of their release. There is one line per year, even though there were multiple releases in most years.

See also the Locale Coverage Data v38 and for details of the changes see delta_summary.tsv and locale-growth.tsv

Migration

  • The plural rules for French changed to add a new category, 'many', using the new operand 'c' (with a synonym 'e'). It should only have effect on compact number handling.

    • Important: according to the spec, when there is no message for a plural category, the message for 'other' should be returned. As long as implementations observe this policy, migration to this should work without problems.

  • <languageMatches type="written"> was deprecated some time ago, and now has been removed. Clients should use <languageMatches type="written_new"> (recognizing that there are some syntax changes). CLDR-13245

  • The following locales have been moved in the folder structures. CLDR-14080

    • Seed → Common: Sanskrit (sa)

    • Common → Seed: Church Slavic (cu), Volapük (vo), Prussian (prg)

  • The specification for using aliases is more rigorous, and some alias data has changed. Programs using this data may need modification:

    • The specification processes the rules in a certain order, so the file order needs to be maintained.

    • The specification now explicitly takes multiple passes (though that can be optimized by implementations)

    • Various variantAliases are replaced by languageAliases where they require more context to be handled properly (the former specification did not handle variant aliases correctly).

      • AALAND ⇒ AX is replaced by und_aaland ⇒ und_AX

      • arevmda ⇒ hyw is replaced by two rules: hy_arevmda ⇒ hyw & und_arevmda ⇒ und

    • Some spurious aliases have been removed, where they are not properly aliases but rather partial duplications of more complete information:

      • Those covered by the parent locale data and/or likely subtag data, such as az_AZ ⇒ az_Latn_AZ

      • Those covered by canonicalization of extlang subtags, such as zh_wuu ⇒ wuu

  • Changes to the download files:

      • cldr-tools-*.zip no longer contains a built cldr.jar; use the separate cldr-tools-*.jar instead.

          • As of v38.1, cldr-tools-*.zip is no longer included at all. You can download or check out the source tree directly from GitHub.

      • cldr-tools-*.jar is a standalone .jar file containing the CLDR tools and all needed dependencies.

      • There is a new "hashes/" subdirectory, which contains GPG signatures and SHA-512 sums.
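The SHA-512 sums in "hashes/" can be checked with Python's standard hashlib. A minimal sketch: the exact layout of the files in "hashes/" is not described here, so the code assumes you have already extracted the expected hex digest for the file you downloaded; the helper names are hypothetical. GPG signatures need a separate check with GnuPG.

```python
import hashlib
from pathlib import Path

def sha512_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA-512 against an expected hex string."""
    return sha512_of_file(path) == expected_hex.strip().lower()
```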

External Data Version

Known Issues

  1. The Transform charts have been disabled until the generating code could be fixed. [CLDR-11019]

  2. The JSON-format data for CLDR 38 currently omits the data from the CLDR common/supplemental files grammaticalFeatures.xml and units.xml. These are all new items in CLDR 37 except for the <unitPreferenceData>, which was formerly in supplementalData.xml. This will be addressed as soon as possible. [CLDR-13730]

  3. Hebrew compact number formatting scrambles text if embedded in RTL message [CLDR-14256]

  4. There are a number of fixes needed in the LDML specification:

    1. CLDR-14272 The documentation of @targets and @scope in grammaticalFeatures is missing; see the ticket for the missing text.

    2. CLDR-14312 replacement in subdivisionAlias in common/supplemental/supplementalMetadata.xml contains alpha{2}

    3. CLDR-14318 Should not remove "true" of tfield in UTS35 Appendix A

    4. CLDR-14319 Remove wrong/duplicated example below "Territory Exception" in UTS35 Appendix A

    5. CLDR-14320 "Put all <keywords, tfields> pairs into alphabetical order" is wrong in Appendix A of UTS35

    6. CLDR-13894 Need to use variantAlias replacement in BCP 47 Language Tag to Unicode BCP 47 Locale Identifier

    7. CLDR-14244 Document special 'alt' inheritance

CLDR 38.1

This dot release makes a very small number of incremental additions to version 38 to address the specific bugs listed in Δ38.1. The data changes are summarized in 38.1/delta/index.html. CLDR v38.1 is also included in ICU 68.2.

Migration note for CLDR 38.1:

    • As of v38.1 and later, cldr-tools-*.zip is no longer included in the download files. You can download or checkout the source tree directly from GitHub.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file diff --git a/docs/site/downloads/cldr-39.md b/docs/site/downloads/cldr-39.md new file mode 100644 index 00000000000..969ceebb6d9 --- /dev/null +++ b/docs/site/downloads/cldr-39.md @@ -0,0 +1,77 @@ +--- +title: 'CLDR 39 Download' +--- + + +Unicode CLDR - CLDR 39 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 39 Release Note

See Key to Header Links

Overview

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

NOTE: The source for the LDML specification has been converted to GitHub Flavored Markdown (GFM) instead of HTML. The formatting is now simpler, but some features, such as formatting for table captions, are not yet complete. Improvements to the formatting of the v39 specification are planned for after the release, but no substantive changes will be made to the content. The link above goes to the directory

CLDR v39 had no submission phase. Instead, the focus was on modernizing the Survey Tool software in preparation for data submission in the next release (v40). Data fixes in this release were confined to some global changes that are too difficult to make during a submission cycle, plus various other fixes. There was a major change in how Norwegian is handled, to align the way the locale identifiers no, nb, and nn are used. The CLDR GitHub repository is renaming its “master” branch to “main”. The unit support from the last release was integrated into ICU, and some fixes resulting from that process were made to the measurement unit data. Many fixes were also made to the specification, to clarify text or fix problems in keyboards, measurement units, locale identifiers, and a few other areas.

Data Changes

Locale Changes (Sample Link)

There were general changes across all locales:

In addition, a number of other corrections were made on a per-locale basis.

JSON Data Changes

JSON data is available at https://github.com/unicode-org/cldr-json/releases/tag/39.0.0 

It is also available in npm packages published under version "39.0.0".

Note the following change:

- The npm packages now have individual README and LICENSE files [CLDR-14451]

Please note the following upcoming changes, planned for cldr-json in CLDR v40:

Specification Changes

The source for the LDML specification has been converted from HTML to GitHub Flavored Markdown (GFM). The formatting is now simpler, but some features, such as formatting for table captions, may not be complete by the release date. Improvements to the formatting of the v39 specification are planned for after the release, but no substantive changes will be made to the content.

Chart Changes

Growth

The usual growth chart has been omitted, since this release had no data submission phase. For the previous version's chart, see Growth Chart (v38.x).

Migration

Known Issues

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing. Special thanks to Jan Kučera for his work on the migration to Markdown.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file diff --git a/docs/site/downloads/cldr-40.md b/docs/site/downloads/cldr-40.md new file mode 100644 index 00000000000..d9b10503342 --- /dev/null +++ b/docs/site/downloads/cldr-40.md @@ -0,0 +1,76 @@ +--- +title: 'CLDR 40 Download' +--- + + +Unicode CLDR - CLDR 40 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 40 Release Note

See Key to Header Links

Overview

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

In CLDR v40, the focus is on:

Grammatical features (gender and case)

In many languages, forming grammatical phrases requires dealing with grammatical gender and case. Without that, it can sound as bad as "on top of 3 hours" instead of "in 3 hours". The overall goal for CLDR is to supply building blocks so that implementations of advanced message formatting can handle gender and case. See also: Inflection Points.

Emoji v14 names and search keywords

CLDR supplies short names and search keywords for the new emoji, so that implementations can build on them to provide, for example, type-ahead in keyboards.

Modernized Survey Tool front end

The Survey Tool is used to gather all the data for locales. The outmoded JavaScript infrastructure was modernized to make it easier to add enhancements (such as the split-screen dashboard) and to fix bugs.

Specification Improvements

The LDML specification has some important fixes and clarifications for Locale Identifiers, Dates, and Units of Measurement.

Approximately 140,000 data items were added or changed.

Data Changes

Segmentation Changes

Locale Changes

File Changes

JSON Data Changes

Specification Changes

Locale Identifiers

Dates

Units of Measurement

Growth

The chart below shows the growth over time, with the additions from the latest release in the top blue section.

Migration

Known Issues

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file diff --git a/docs/site/downloads/cldr-41.md b/docs/site/downloads/cldr-41.md new file mode 100644 index 00000000000..0d7b3f01f47 --- /dev/null +++ b/docs/site/downloads/cldr-41.md @@ -0,0 +1,151 @@ +--- +title: 'CLDR 41 Download' +--- + + +Unicode CLDR - CLDR 41 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 41 Release Note

Overview

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.


CLDR v41 is a limited-submission release. Most work was on tooling, with only targeted updates to the data, namely Phase 3 of the grammatical units of measurement project. The grammar data required for the Modern coverage level increased, with 40 locales adding an average of 4% new data each. Ukrainian grew the most, by 15.6%.


The tooling changes are targeted at the v42 general-submission release. They include a number of features and improvements, such as progress meter widgets in the Survey Tool.


Finally, the Basic level has been modified to make it easier to onboard new languages, and easier for implementations to filter locale data based on coverage levels.

The following table shows the number of Languages/Locales in this version. (See the v41 Locale Coverage table for more information.)

Beyond the member organizations of the Unicode Consortium, many dedicated communities and individuals regularly contribute to updating their locales, including:

Data Changes

Because this is a limited-submission release, the data changes are limited. The data focus for this release was on Phase 3 of the project to provide grammatical information for units of measurement, with more locales reaching the Modern coverage level, plus Phase 1 of a project to revamp coverage levels.

Locale Changes

File Changes

JSON Data Changes

Specification Changes

The following are the main changes in the specification:

Tooling Changes

Survey Tool

Developer

Migration

Upcoming Changes

Growth

The following shows the growth of CLDR data per year, represented as an area chart. 

Known Issues

This section will contain issues that arise after the data, code, or spec has been frozen.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file diff --git a/docs/site/downloads/cldr-42.md b/docs/site/downloads/cldr-42.md new file mode 100644 index 00000000000..66fa3ead79f --- /dev/null +++ b/docs/site/downloads/cldr-42.md @@ -0,0 +1,83 @@ +--- +title: 'CLDR 42 Download' +--- + + +Unicode CLDR - CLDR 42 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 42 Release Note

Overview

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

In CLDR 42, the focus is on:

Locale Status

CLDR v42 Language Count

Data Changes

There were two areas of focus for this release: the formatting of Personal Names, and the upgrade of Modern to include many more languages.

Locale Changes

File Changes 

JSON Data Changes

Background

Formatting people’s names

Software needs to be able to format people's names, such as John Smith or 宮崎駿. The data is typically drawn from a database, where a name record will have fields for the parts of people’s names, such as a given field with a value of “Maria”, and a surname field value of “Schmidt”. 

There are many complications in dealing with the variety of different ways this needs to be done across languages:

CLDR has added structured patterns that enable implementations to format available name fields for a given language. The formatting for a name can vary according to the available name fields, the language of the name and of the viewer, and various input settings.
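To make the idea of patterns chosen by the available name fields more concrete, here is a minimal Python sketch. The pattern list, the field names `given`, `given2`, and `surname`, and the selection rule (use the first pattern whose fields are all filled in) are assumptions for illustration only; they are not the LDML person-name algorithm, which also takes into account the language of the name, the viewer, and the input settings mentioned above.

```python
import re

# Illustrative patterns, highest priority first. The field names and the
# "first pattern whose fields are all present" rule are simplifications,
# not the LDML person-name algorithm.
PATTERNS = [
    "{given} {given2} {surname}",
    "{given} {surname}",
    "{surname}",
    "{given}",
]

FIELD = re.compile(r"\{(\w+)\}")


def format_name(fields: dict[str, str], patterns: list[str] = PATTERNS) -> str:
    """Fill in the first pattern whose fields are all non-empty."""
    for pattern in patterns:
        needed = FIELD.findall(pattern)
        if all(fields.get(name) for name in needed):
            return FIELD.sub(lambda m: fields[m.group(1)], pattern)
    return ""  # no pattern can be satisfied by the available fields


print(format_name({"given": "Maria", "surname": "Schmidt"}))  # Maria Schmidt
print(format_name({"surname": "Schmidt"}))                     # Schmidt
```

Ordering the patterns from most to least specific lets a record with missing fields degrade gracefully instead of producing stray spaces.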

The new Person Name formatting data has tech preview status. The CLDR committee is requesting feedback on the data and structure so that they can be refined and enhanced in the next release. ICU will also offer a tech preview API in its next release. Other clients of CLDR are encouraged to try out the new data and structure, and to send feedback to the CLDR committee in the next few months.

Specification Changes

The following are the main changes in the specification:

Growth

The following chart shows the growth of CLDR locale-specific data over time. It is restricted to data items in the /main and /annotations directories, so it does not include the non-locale-specific data. The % values are percent of the current measure of Modern coverage. That level is raised with each release, so previous releases had many locales that were at Modern coverage as assessed at the time of their release. There is one line per year, even though most years had multiple releases.

Detailed information on the changes between the v41 and v42 releases is in v42 delta_summary.tsv: look at the TOTAL line for the overall counts of Added/Changed/Deleted. See v42 locale-growth.tsv for the detailed figures behind the chart.
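Reading the TOTAL line programmatically might look like the following Python sketch. It assumes the file is tab-separated with a header row; the column names Added, Changed, and Deleted come from the text above, but the layout and the sample rows are invented placeholders, so check the actual file before relying on this.

```python
import csv
import io

# Invented sample with placeholder counts; the real delta_summary.tsv may use
# different columns, extra comment lines, or put TOTAL in another column.
SAMPLE_TSV = (
    "Locale\tAdded\tChanged\tDeleted\n"
    "xx\t1\t2\t3\n"
    "TOTAL\t4\t5\t6\n"
)


def total_row(tsv_text: str) -> dict[str, str]:
    """Return the first row containing a TOTAL cell, keyed by the header row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    for row in reader:
        if "TOTAL" in row:
            return dict(zip(header, row))
    raise ValueError("no TOTAL row found")


totals = total_row(SAMPLE_TSV)
print(totals["Added"], totals["Changed"], totals["Deleted"])
```

Keying the row by the header, rather than by column position, keeps the sketch working if the columns are reordered.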

CLDR v42 Growth

Migration

Known Issues

Upcoming changes

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file diff --git a/docs/site/downloads/cldr-43.md b/docs/site/downloads/cldr-43.md new file mode 100644 index 00000000000..0f250af598e --- /dev/null +++ b/docs/site/downloads/cldr-43.md @@ -0,0 +1,113 @@ +--- +title: 'CLDR 43 Download' +--- + + +Unicode CLDR - CLDR 43 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 43 Release Note

Overview

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages. It is important to review the Migration section for changes that might require action by implementations using CLDR directly or indirectly (e.g., via ICU).

CLDR 43.1 is a dot release focused on fixing specific issues. For more details, see Version 43.1 Changes.

CLDR 43 is a limited-submission release, focusing on just a few areas:

For details, see below.

Locale Status

The bar for each coverage level increases each release. Faroese (fo) increased from Basic to Moderate, while Cherokee (chr), Lower Sorbian (dsb), and Upper Sorbian (hsb) dropped from Modern to Moderate.

CLDR v43 Coverage

Version 43.1 Changes

Version 43.1 is currently in beta. It is planned to be a dot release that addresses the following issues. The main changes are for compatibility (including parser compatibility and GB 18030-2022 Level 2 support). To access the release data, use the release tag or the JSON link. The following tickets are included:

GB18030-2022 Compliance

Compatibility

The following changes are included to allow for better compatibility with certain parsers.

Other


The only DTD change is the addition of alt="ascii" for time formats:

<!ATTLIST pattern alt NMTOKENS #IMPLIED >
    <!--@MATCH:literal/alphaNextToNumber, ascii, noCurrency, variant-->
<!ATTLIST dateFormatItem alt NMTOKENS #IMPLIED >
    <!--@MATCH:literal/ascii, variant-->
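As a rough illustration of how a consumer might use the new alt="ascii" variants, the following Python sketch selects between the default and ASCII forms of a `dateFormatItem`. The XML fragment and its pattern values are invented for the example (the default form uses U+202F NARROW NO-BREAK SPACE as an assumed non-ASCII character); real code should read the released LDML files.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Invented fragment: a default pattern using U+202F NARROW NO-BREAK SPACE
# and an alt="ascii" variant using a plain ASCII space.
SAMPLE = """<availableFormats>
  <dateFormatItem id="hm">h:mm\u202fa</dateFormatItem>
  <dateFormatItem id="hm" alt="ascii">h:mm a</dateFormatItem>
</availableFormats>"""


def pick_format(root: ET.Element, item_id: str, prefer_ascii: bool) -> Optional[str]:
    """Return the pattern for item_id, optionally preferring alt="ascii"."""
    default = ascii_variant = None
    for item in root.iter("dateFormatItem"):
        if item.get("id") != item_id:
            continue
        alt = item.get("alt")
        if alt is None:
            default = item.text
        elif "ascii" in alt.split():  # alt is declared NMTOKENS (space-separated)
            ascii_variant = item.text
    if prefer_ascii and ascii_variant is not None:
        return ascii_variant
    return default  # fall back to the default form


root = ET.fromstring(SAMPLE)
print(repr(pick_format(root, "hm", prefer_ascii=False)))  # 'h:mm\u202fa'
print(repr(pick_format(root, "hm", prefer_ascii=True)))   # 'h:mm a'
```

Falling back to the default form when no ASCII variant exists means a caller can always ask for ASCII without special-casing items that lack one.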

Data Changes

Locale Changes

File Changes

New files:

Note: All files were moved from seed to common (see the Migration section)

JSON Data Changes

See the Migration section for general data changes.

Specification Changes


Please see the Modifications section in the LDML specification for the full list of items:


Growth

The following chart shows the growth of CLDR locale-specific data over time. It is restricted to data items in the /main and /annotations directories, so it does not include the non-locale-specific data. The % values are percent of the current measure of Modern coverage. That level increases with each release, so previous releases had many locales that were at Modern coverage as assessed at the time of their release. There is one line per year, even though most years had multiple releases.
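The effect of measuring older releases against the current Modern bar can be seen with a little arithmetic. A minimal Python sketch, assuming the percentage is simply the number of a locale's items that fall within the current Modern requirement divided by the size of that requirement; the item counts are invented.

```python
def percent_of_modern(items_present: int, items_required: int) -> float:
    """Share (in %) of the currently required Modern-level items a locale has."""
    return 100.0 * items_present / items_required


# Invented numbers: a locale that fully met an older, smaller Modern bar...
print(percent_of_modern(9_000, 9_000))   # 100.0
# ...shows as less than 100% once measured against a larger current bar.
print(percent_of_modern(9_000, 12_000))  # 75.0
```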

Detailed information on the changes between the v42 and v43 releases is in v43 delta_summary.tsv: look at the TOTAL line for the overall counts of Added/Deleted/Changed.

Because this was a limited-submission release, only a small number of changes are visible.

Language Matching

CLDR has data for language matching, as in this chart. Its purpose and usage are sometimes misunderstood.

So how is this used? Consider a user whose first language is Breton. If they open an application that only has localizations for English, German, and French, then Breton will not be available. In that case, the data in CLDR can be used to select French as a fallback localization — in the absence of other information. 

That last clause is important. The CLDR data is based on the likelihood that a person using language X understands text written in language Y, but large portions of the population for X might prefer other languages. 

The CLDR language matching data can and should be overridden whenever there is more information available from a user that allows an implementation to do a better job. It is strongly recommended that systems allow users to not only specify their preferred language, but also any secondary languages in order of priority. Thus a person speaking Kazakh who also knows French could specify French as a secondary language, and get a French localization for an app instead of the CLDR match. This has been done on both Android and iOS, for example.
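The precedence described above (explicit user preferences first, matching data only as a fallback) can be sketched in Python. The `FALLBACKS` table is a hypothetical stand-in for the CLDR language-matching data and holds only the Breton-to-French case from the text; production code should use a locale matcher such as the ICU Locale Matcher, which works from the full CLDR data.

```python
# Hypothetical stand-in for CLDR language-matching data: it records only the
# Breton -> French fallback described in the text.
FALLBACKS: dict[str, list[str]] = {"br": ["fr"]}


def choose_localization(user_prefs, available, fallbacks=FALLBACKS, default="en"):
    """Pick an app localization, letting explicit user choices win."""
    # 1. The user's own ranked list (primary, then secondary languages) wins.
    for lang in user_prefs:
        if lang in available:
            return lang
    # 2. Only if none match, consult the fallback (matching) data.
    for lang in user_prefs:
        for candidate in fallbacks.get(lang, []):
            if candidate in available:
                return candidate
    # 3. Otherwise use the app's default localization.
    return default


app_languages = {"en", "de", "fr"}
print(choose_localization(["br"], app_languages))        # fr (via fallback data)
print(choose_localization(["kk", "fr"], app_languages))  # fr (user's secondary choice)
print(choose_localization(["kk"], app_languages))        # en (no other information)
```

Checking every explicit preference before consulting any fallback is what lets the Kazakh speaker who lists French get French rather than whatever the matching data would suggest.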

Important: language matching is different from the CLDR inheritance mechanism; they serve different purposes and are not aligned. The CLDR inheritance mechanism is how CLDR organizes localized data, and should not be used for language matching. Applications do not need to follow the CLDR inheritance chain.

References: LDML Language Matching, LDML Inheritance vs Related Information, ICU4J Locale Matcher, ICU4C Locale Matcher 

Migration

Known Issues

None currently.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.


The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see https://cldr.unicode.org/index/charts.

\ No newline at end of file diff --git a/docs/site/downloads/cldr-44.md b/docs/site/downloads/cldr-44.md new file mode 100644 index 00000000000..822504b2c27 --- /dev/null +++ b/docs/site/downloads/cldr-44.md @@ -0,0 +1,121 @@ +--- +title: 'CLDR 44 Download' +--- + + +Unicode CLDR - CLDR 44 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 44 Release Note

Overview

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

In CLDR 44, the focus is on:

Locale Coverage Status

The coverage status determines how well languages are supported on laptops, phones, and other computing devices. In particular, qualifying at a Basic level is typically a requirement just for being selectable on phones as a language. Note that for each language there are typically multiple locales, so 90 languages at Modern coverage corresponds to more than 350 locales at that coverage.

Below is the coverage in this release:

CLDR v44 Coverage

Version 44.1 Changes

DTD Changes 

Specification Changes 

Data Changes 

Data Changes

The following is a summary of the DTD changes, which reflect changes in the structure. The relevant ones are described more fully in the data changes.

LDML

Supplemental Data

BCP47 

Keyboards

Locale Changes

File Changes

(Aside from locale files)

Additions:

New XSD files in /common/dtd/. 

These correspond to the DTDs, but do not carry the extra validity annotations.

New Test Data files in /common/testData/

Removals:

Files with insufficient data

Old format keyboards were removed (see Migration):

JSON Data Changes

Keyboard Changes

Keyboards have a new DTD (keyboard3.dtd and the <keyboard3> element). This is a complete rewrite of the specification by the Keyboard Subcommittee, and is available as a technical preview in CLDR version 44. See TR35 Part 7: Keyboards. The prior DTDs are included in CLDR but are not used by CLDR data or tooling. Note: the prior keyboard data files are not compatible, were not maintained, and have been removed.

Note that there are additional sample keyboard data files in progress which were not complete for v44, but may be consulted as samples:

See the Known Issues section for additional known issues.

Specification Changes

Please see the Modifications section in the draft specification for the list of current changes.

A diff of the changes since CLDR 43 can be viewed on GitHub; it was last updated on 6 October 2023. Clicking the rich-diff icon for a page ( 📄 ) will often show the differences as a rich diff, such as the following:

Growth

The following chart shows the growth of CLDR locale-specific data over time. It is restricted to data items in the /main and /annotations directories, so it does not include the non-locale-specific data; nor does it include corrections (which typically outnumber new items). The % values are percent of the current measure of Modern coverage. That level increases with each release, so previous releases had many locales that were at Modern coverage as assessed at the time of their release. There is one line per year, even though most years had multiple releases.

There were relatively few additions this cycle; the focus was on improving quality, and such changes do not show up below.

Migration

Known Issues

These are not always the same. In the future, some of these functions will be separated out; see CLDR-17095.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file