From 7144e81ec45f20d027ce02455472c7b2bae3727d Mon Sep 17 00:00:00 2001 From: "Steven R. Loomis" Date: Thu, 5 Sep 2024 11:31:55 -0500 Subject: [PATCH] CLDR-17803 site: checkin raw old download pages - scraped, and then truncated the html, then used css to try to zap some of the leaking bugs --- docs/site/assets/css/page.css | 13 +++++++++ docs/site/downloads/cldr-31.md | 38 +++++--------------------- docs/site/downloads/cldr-32.md | 38 +++++--------------------- docs/site/downloads/cldr-33-1.md | 46 +++++--------------------------- docs/site/downloads/cldr-33.md | 38 +++++--------------------- docs/site/downloads/cldr-34.md | 36 +++++-------------------- docs/site/downloads/cldr-35.md | 38 +++++--------------------- docs/site/downloads/cldr-36.md | 38 +++++--------------------- docs/site/downloads/cldr-37.md | 40 +++++---------------------- docs/site/downloads/cldr-38.md | 40 +++++---------------------- docs/site/downloads/cldr-39.md | 28 ++----------------- docs/site/downloads/cldr-40.md | 30 +++------------------ docs/site/downloads/cldr-41.md | 28 +++---------------- docs/site/downloads/cldr-42.md | 32 +++------------------- docs/site/downloads/cldr-43.md | 30 +++------------------ docs/site/downloads/cldr-44.md | 34 ++++------------------- 16 files changed, 97 insertions(+), 450 deletions(-) diff --git a/docs/site/assets/css/page.css b/docs/site/assets/css/page.css index 4cc90e94ab5..5b1117833d5 100644 --- a/docs/site/assets/css/page.css +++ b/docs/site/assets/css/page.css @@ -105,3 +105,16 @@ footer { .markdown-alert-important .markdown-alert-title { color: blueviolet; } + +/* unusable header and stuff from old Sites. 
*/ +svg.K4B8Y, div.Xb9hP, div.hBW7Hb, div.WIdY2d, span.Lw7GHd, svg.K4B8Y, path.MrYMx, path.K4B8Y, span.Ce1Y1c, svg.hmuWb, svg.wFCWne { + display: none; +} + +header#atViewHeader { + display: none !important; +} + +div > header { + display: none !important; +} diff --git a/docs/site/downloads/cldr-31.md b/docs/site/downloads/cldr-31.md index 16b817cf820..8c5fd992bf4 100644 --- a/docs/site/downloads/cldr-31.md +++ b/docs/site/downloads/cldr-31.md @@ -1,33 +1,9 @@ ---- -title: 'CLDR 31 Download' ---- - - -Unicode CLDR - CLDR 31 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 31 Release Note


For details, see Spec Modifications.

Migration

  • Code changes

    • The subdivision codes have all been changed to the BCP 47 format, e.g. "usca" instead of "US-CA". This affects supplemental containment and subdivisions, translations in subdivisions/en.xml, etc. See Part 6, Sec 2.2 [#9942].

    • The locales in the language-territory population tables have been changed to the canonical format, dropping the script where it is the default. For example, "ku_Latn" changes to "ku".

    • The exemplar/ locale data file names have also been changed to be the canonical format, dropping the script where it is the default.

  • Plural rules

    • The Portuguese plural rules have changed so that all (and only) integers and decimal fractions < 2 are singular.

  • Timezones

    • The GMT timezone has been split from the UTC timezone.

    • New timezone bcp47 codes have been added.

  • Language/Region data

    • The new literacyPercent attribute for supplemental <languagePopulation> has been broken out from writingPercent, the latter now only being used to reflect primarily-spoken languages. [#9421]

    • A new format for language matching is provided. To allow time for implementations to change over, the old data is retained, and the new data is marked as "written-new".

    • Languages "hr" and "sr" are no longer a short distance apart, for political reasons.

  • Other

    • The primary names for CZ changed from "Czech Republic" to "Czechia", with the longer name now the alternate.

Known Issues

“Week of” structure

The structure and intended usage for the “week x of y” patterns is still being refined and may change. This applies especially to dateFormatItems such as the following:

<dateFormatItem id="MMMMW" count=...>'week' W 'of' MMM</dateFormatItem>

<dateFormatItem id="yw" count=...>'week' w 'of' y</dateFormatItem>

Areas of discussion include the use of the count attribute and the use of ordinal vs. cardinal numbers. For more information see [#9801].

Non-unique emoji short names (fixed in 31.0.1)

Some of the emoji names are not unique. Fixes are being gathered, but were not ready in time for the release. See [#10116], [#10127].

Chinese stroke collation

Since CLDR 30, Chinese stroke collation has been missing entries for several basic characters. CLDR 32 reverts the stroke collation data to the CLDR 29 version; a complete fix for the underlying problem is targeted for CLDR 33. See #10497, #10642.

Others

See tickets for v31.0.1.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

Key

    • The Release Note contains a general description of the contents of the release, and any relevant notes about the release.

    • The Data link points to a set of zip files containing the contents of the release (the files are complete in themselves, and do not require files from earlier releases -- for the structure of the zip file, see Repository Organization).

    • The Spec is the version of UTS #35: LDML that corresponds to the release.

    • The Delta document points to a list of all the bug fixes and features in the release, which can be used to get the precise corresponding file changes using BugDiffs.

    • The SVN Tag can be used to get the files via Repository Access.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file +</table>" jsaction="rcuQ6b:WYd;">

For details, see Spec Modifications.

Migration

  • Code changes

    • The subdivision codes have all been changed to the BCP 47 format, e.g. "usca" instead of "US-CA". This affects supplemental containment and subdivisions, translations in subdivisions/en.xml, etc. See Part 6, Sec 2.2 [#9942].

    • The locales in the language-territory population tables have been changed to the canonical format, dropping the script where it is the default. For example, "ku_Latn" changes to "ku".

    • The exemplar/ locale data file names have also been changed to be the canonical format, dropping the script where it is the default.

  • Plural rules

    • The Portuguese plural rules have changed so that all (and only) integers and decimal fractions < 2 are singular.

  • Timezones

    • The GMT timezone has been split from the UTC timezone.

    • New timezone bcp47 codes have been added.

  • Language/Region data

    • The new literacyPercent attribute for supplemental <languagePopulation> has been broken out from writingPercent, the latter now only being used to reflect primarily-spoken languages. [#9421]

    • A new format for language matching is provided. To allow time for implementations to change over, the old data is retained, and the new data is marked as "written-new".

    • Languages "hr" and "sr" are no longer a short distance apart, for political reasons.

  • Other

    • The primary names for CZ changed from "Czech Republic" to "Czechia", with the longer name now the alternate.
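As a rough illustration of two of the migration items above, the following sketch converts an ISO 3166-2 style code to the new subdivision format and applies the revised Portuguese singular test. The function names are hypothetical (they are not CLDR or ICU APIs), and the plural check assumes that "integers and decimal fractions < 2" means any number whose absolute value has an integer part of 0 or 1.

```python
def to_bcp47_subdivision(code: str) -> str:
    """Convert an ISO 3166-2 style code such as "US-CA" to the form "usca"."""
    # Drop the hyphen and lowercase the result.
    return code.replace("-", "").lower()


def is_portuguese_singular(number: float) -> bool:
    """Assumed reading of the rule: singular when the integer part is 0 or 1."""
    return abs(number) < 2


if __name__ == "__main__":
    print(to_bcp47_subdivision("US-CA"))
    print(is_portuguese_singular(1.5), is_portuguese_singular(2))
```

Implementations that store subdivision codes would apply a conversion like this once during migration rather than on every lookup.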

Known Issues

“Week of” structure

The structure and intended usage for the “week x of y” patterns is still being refined and may change. This applies especially to dateFormatItems such as the following:

<dateFormatItem id="MMMMW" count=...>'week' W 'of' MMM</dateFormatItem>

<dateFormatItem id="yw" count=...>'week' w 'of' y</dateFormatItem>

Areas of discussion include the use of the count attribute and the use of ordinal vs. cardinal numbers. For more information see [#9801].

Non-unique emoji short names (fixed in 31.0.1)

Some of the emoji names are not unique. Fixes are being gathered, but were not ready in time for the release. See [#10116], [#10127].

Chinese stroke collation

Since CLDR 30, Chinese stroke collation has been missing entries for several basic characters. CLDR 32 reverts the stroke collation data to the CLDR 29 version; a complete fix for the underlying problem is targeted for CLDR 33. See #10497, #10642.

Others

See tickets for v31.0.1.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

Key

    • The Release Note contains a general description of the contents of the release, and any relevant notes about the release.

    • The Data link points to a set of zip files containing the contents of the release (the files are complete in themselves, and do not require files from earlier releases -- for the structure of the zip file, see Repository Organization).

    • The Spec is the version of UTS #35: LDML that corresponds to the release.

    • The Delta document points to a list of all the bug fixes and features in the release, which can be used to get the precise corresponding file changes using BugDiffs.

    • The SVN Tag can be used to get the files via Repository Access.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

diff --git a/docs/site/downloads/cldr-32.md b/docs/site/downloads/cldr-32.md index 931942c2a60..e6217f25f67 100644 --- a/docs/site/downloads/cldr-32.md +++ b/docs/site/downloads/cldr-32.md @@ -1,33 +1,9 @@ ---- -title: 'CLDR 32 Download' ---- - - -Unicode CLDR - CLDR 32 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 32 Release Note


The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file +</table>" jsaction="rcuQ6b:WYd;">

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

diff --git a/docs/site/downloads/cldr-33-1.md b/docs/site/downloads/cldr-33-1.md index aac1316d0a8..426c494db4a 100644 --- a/docs/site/downloads/cldr-33-1.md +++ b/docs/site/downloads/cldr-33-1.md @@ -1,41 +1,9 @@ ---- -title: 'CLDR 33-1 Download' ---- - - -Unicode CLDR - CLDR 33.1

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 33.1


The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file +</table>" jsaction="rcuQ6b:WYd;">

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

diff --git a/docs/site/downloads/cldr-33.md b/docs/site/downloads/cldr-33.md index df443a4bd67..f311c19a973 100644 --- a/docs/site/downloads/cldr-33.md +++ b/docs/site/downloads/cldr-33.md @@ -1,33 +1,9 @@ ---- -title: 'CLDR 33 Download' ---- - - -Unicode CLDR - CLDR 33 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 33 Release Note


The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file +</table>" jsaction="rcuQ6b:WYd;">

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

diff --git a/docs/site/downloads/cldr-34.md b/docs/site/downloads/cldr-34.md index efa7e2db76c..6e922ec7ff8 100644 --- a/docs/site/downloads/cldr-34.md +++ b/docs/site/downloads/cldr-34.md @@ -1,31 +1,9 @@ ---- -title: 'CLDR 34 Download' ---- - - -Unicode CLDR - CLDR 34 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 34 Release Note


The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file +</table>" jsaction="rcuQ6b:WYd;">

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

diff --git a/docs/site/downloads/cldr-35.md b/docs/site/downloads/cldr-35.md index bc2f0d06b59..ebf8886d2bd 100644 --- a/docs/site/downloads/cldr-35.md +++ b/docs/site/downloads/cldr-35.md @@ -1,33 +1,9 @@ ---- -title: 'CLDR 35 Download' ---- - - -Unicode CLDR - CLDR 35 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 35 Release Note


The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file +</table>" jsaction="rcuQ6b:WYd;">

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

diff --git a/docs/site/downloads/cldr-36.md b/docs/site/downloads/cldr-36.md index ba4380c3792..679dc91b7ac 100644 --- a/docs/site/downloads/cldr-36.md +++ b/docs/site/downloads/cldr-36.md @@ -1,33 +1,9 @@ ---- -title: 'CLDR 36 Download' ---- - - -Unicode CLDR - CLDR 36 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 36 Release Note


The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file +</table>" jsaction="rcuQ6b:WYd;">

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

diff --git a/docs/site/downloads/cldr-37.md b/docs/site/downloads/cldr-37.md index 81dfa4185bf..90b8228118e 100644 --- a/docs/site/downloads/cldr-37.md +++ b/docs/site/downloads/cldr-37.md @@ -1,35 +1,9 @@ ---- -title: 'CLDR 37 Download' ---- - - -Unicode CLDR - CLDR 37 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 37 Release Note


See Key to Header Links

Overview

Unicode CLDR provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

CLDR v37 focuses on adding new locales, enhancing support for units of measurement, adding annotations (names and search keywords) for symbols, and adding annotations for Emoji v13.

Data Changes

  • Units

    • Expanded locale preferences for units of measurement. The new unit preference and conversion data allows formatting functions to pick the right measurement units for the locale and usage, and convert input measurement into those units. See additional details in Specification Changes.

    • SI Prefixes. SI prefix patterns for "kilo{0}", "mega{0}", etc. have been added, as well as the prefix terms for square and cubic. These are fallbacks for when no combined form is available, so that the name for more unusual units like megagram or square megameter can be formed in different languages.

    • Other additions. Translations for a few unit identifiers have been added, such as duration-century, area-square-kilometer, area-square-meter.

    • See also Migration.

  • Annotations

    • Emoji 13.0. The emoji annotations (names and search keywords) for the new Unicode 13.0 emoji are added.

    • Annotations (names and keywords) expanded to cover more than emoji. This release includes a small set of Unicode symbols (arrow, math, punctuation, currency, alphanumerics, and geometric) with more to be added in future releases. For example, see v37/annotations/romance.html.

  • Sorting

    • Emoji 13.0. The collation sequences are updated for new Unicode 13.0 and for Emoji 13.0.

  • Locales

    • New languages at Basic coverage: Fulah (Adlam), Maithili, Manipuri, Santali, Sindhi (Devanagari), Sundanese

    • New languages at Modern coverage: Nigerian Pidgin

    • See Locale Coverage Data for the coverage per locale, for both new and old locales.

  • Grammatical data

    • Grammatical features added. Grammatical features are added for many languages, a first step to allowing programmers to format units according to grammatical context (eg, the dative version of "3 kilometers").

  • Misc

    • Updates to code sets. In particular, the EU is updated (removing GB).

    • Alternate versions. In some languages

      • Some additional language names have "menu" style for alphabetizing, such as Kurdish, Central instead of Central Kurdish.

      • There are variants for Cape Verde as equivalent to Cabo Verde.

    • Myanmar-Latin transliteration added

For access to the data, see the GitHub tag above. For more details see the Delta Tickets above.

Specification Changes

The largest changes were the following:

  • Expanded locale preferences for units of measurement. The new unit preference and conversion data allows formatting functions to pick the right measurement units for the locale and usage, and convert input measurement into those units.

    • For example, a program (or database) could use 1.88 meters internally, but then for person-height have that measurement convert to 6 foot 2 inches for en_US and to 188 centimeters for de_CH.

    • Using the unit display names and list formats, those results can then be displayed according to the desired width (eg 2″ vs 2 in vs 2 inches) and using the locale display names and number formats.

    • The size of the measurement can also be taken into account, so that an infant's height can be shown as 18 inches and an adult's height as 6 foot 2 inches.

  • Grammatical features added. Grammatical features are added for many languages.

    • List Patterns. Clarified that more sophisticated processing can be used, and added examples of customized processing for specific languages.

For more detailed specification changes, see the Spec above, and look at the Modifications section.

Structure Changes

  • New elements are added for enhanced unit preferences, such as the units to use for person-height in different countries. This is an initial phase; additional preferences will be added in the future.

  • Additionally, elements and data are added for unit conversions, so that programmers can supply amounts in one unit and get the right amounts to display for different locales.

  • Grammatical features are added for various languages, as a prelude to allowing programmers to format units according to grammatical context (eg, dative version of 3 kilometers)

  • The augmented constraints have been updated, so that the tests can apply those constraints to all of the CLDR data.

  • Annotations now include non-emoji. Note: emoji are distinguished from other symbols using Unicode properties.

For more information, see the Delta DTDs above.

Chart Changes

Growth

The following chart shows the growth of CLDR locale-specific data over time. It does not include the non-locale specific data, nor locale-specific data that is not collected via the Survey Tool. It is thus restricted to data items in /main and /annotations directories. The % values are percent of the current measure of Modern coverage. (That level is notched up each release.)

See also the Locale Coverage Data.

Migration

  • Seven unit identifiers with irregular components have been deprecated, and are given alias values to the regular forms. For example, square always comes before the unit, and is square, not squared. The validity data has also been updated to mark the older forms as deprecated.

      • inch-hg ⟹ inch-ofhg

      • liter-per-100kilometers ⟹ liter-per-100-kilometer

      • meter-per-second-squared ⟹ meter-per-square-second

      • millimeter-of-mercury ⟹ millimeter-ofhg

      • part-per-million ⟹ permillion

      • pound-foot ⟹ pound-force-foot

      • pound-per-square-inch ⟹ pound-force-per-square-inch

    • Some of the unit usage parameters were also deprecated, since they didn't differ in practice. (The spec has been updated to have fallback, so if these need to be distinct in the future, they would be of the form media-music or media-music-track.)

      • music-track ⟹ media

      • tv-program ⟹ media

    • The subdivision codes gbeng, gbsct, and gbwls (used for flag emoji) are now deprecated (ISO removed them from its latest data). This can affect implementations testing for validity if they don't also check for 'deprecated' in common/validity/subdivision.xml. Compare the Territory Subdivisions charts for v37 and v36.

Known Issues

  1. The expanded unit preferences are under development. The data is based on what was in CLDR v36, plus some other sources, but will be expanded in the future both to get better thresholds, and cover more cases where locales differ. See the ticket Improve unit structure and data [CLDR-13654]

  2. The Transform charts have been disabled. [CLDR-13308]

  3. The charts show spurious changes for gbeng, etc. That's because the file locations changed across releases.

  4. The JSON-format data for CLDR 37 currently omits the data from the CLDR common/supplemental files grammaticalFeatures.xml and units.xml. These are all new items in CLDR 37 except for the <unitPreferenceData>, which was formerly in supplementalData.xml. This will be addressed as soon as possible. [CLDR-13730]

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

Special thanks to the contributors to Nigerian Pidgin, one of the very few locales to go from zero to Modern coverage in one submission cycle!

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file +</table>" jsaction="rcuQ6b:WYd;">

See Key to Header Links

Overview

Unicode CLDR provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

CLDR v37 focuses on adding new locales, enhancing support for units of measurement, adding annotations (names and search keywords) for symbols, and adding annotations for Emoji v13.

Data Changes

  • Units

    • Expanded locale preferences for units of measurement. The new unit preference and conversion data allows formatting functions to pick the right measurement units for the locale and usage, and convert input measurement into those units. See additional details in Specification Changes.

    • SI Prefixes. SI prefix patterns for "kilo{0}", "mega{0}", etc. have been added, as well as the prefix terms for square and cubic. These are fallbacks for when no combined form is available, so that the name for more unusual units like megagram or square megameter can be formed in different languages.

    • Other additions. Translations for a few unit identifiers have been added, such as duration-century, area-square-kilometer, area-square-meter.

    • See also Migration.

  • Annotations

    • Emoji 13.0. The emoji annotations (names and search keywords) for the new Unicode 13.0 emoji are added.

    • Annotations (names and keywords) expanded to cover more than emoji. This release includes a small set of Unicode symbols (arrow, math, punctuation, currency, alphanumerics, and geometric) with more to be added in future releases. For example, see v37/annotations/romance.html.

  • Sorting

    • Emoji 13.0. The collation sequences are updated for new Unicode 13.0 and for Emoji 13.0.

  • Locales

    • New languages at Basic coverage: Fulah (Adlam), Maithili, Manipuri, Santali, Sindhi (Devanagari), Sundanese

    • New languages at Modern coverage: Nigerian Pidgin

    • See Locale Coverage Data for the coverage per locale, for both new and old locales.

  • Grammatical data

    • Grammatical features added. Grammatical features are added for many languages, a first step to allowing programmers to format units according to grammatical context (eg, the dative version of "3 kilometers").

  • Misc

    • Updates to code sets. In particular, the EU is updated (removing GB).

    • Alternate versions. In some languages

      • Some additional language names have "menu" style for alphabetizing, such as Kurdish, Central instead of Central Kurdish.

      • There are variants for Cape Verde as equivalent to Cabo Verde.

    • Myanmar-Latin transliteration added
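The SI prefix fallback described under Units above can be sketched roughly as follows. The "kilo{0}" and "mega{0}" patterns are quoted from the note; the "square {0}" and "cubic {0}" patterns and the function name `compose_unit_name` are assumptions made for illustration, not actual CLDR data or API.

```python
# Prefix patterns as quoted in the release note.
PREFIX_PATTERNS = {"kilo": "kilo{0}", "mega": "mega{0}"}

# Assumed English power patterns, for illustration only.
POWER_PATTERNS = {"square": "square {0}", "cubic": "cubic {0}"}


def compose_unit_name(base, prefix=None, power=None):
    """Build a fallback display name when no combined form is available."""
    name = base
    if prefix is not None:
        # Apply the SI prefix first, e.g. "gram" -> "megagram".
        name = PREFIX_PATTERNS[prefix].replace("{0}", name)
    if power is not None:
        # Then wrap the result in the power pattern.
        name = POWER_PATTERNS[power].replace("{0}", name)
    return name


if __name__ == "__main__":
    print(compose_unit_name("gram", prefix="mega"))
    print(compose_unit_name("meter", prefix="mega", power="square"))
```

Applying the prefix before the power keeps "square" outermost, consistent with the regularized identifier order in which square comes before the unit.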

For access to the data, see the GitHub tag above. For more details see the Delta Tickets above.

Specification Changes

The largest changes were the following:

  • Expanded locale preferences for units of measurement. The new unit preference and conversion data allows formatting functions to pick the right measurement units for the locale and usage, and convert input measurement into those units.

    • For example, a program (or database) could use 1.88 meters internally, but then for person-height have that measurement convert to 6 foot 2 inches for en_US and to 188 centimeters for de_CH.

    • Using the unit display names and list formats, those results can then be displayed according to the desired width (eg 2″ vs 2 in vs 2 inches) and using the locale display names and number formats.

    • The size of the measurement can also be taken into account, so that an infant's height can be shown as 18 inches and an adult's height as 6 foot 2 inches.

  • Grammatical features added. Grammatical features are added for many languages.

    • List Patterns. Clarified that more sophisticated processing can be used, and added examples of customized processing for specific languages.
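The person-height example above can be sketched as follows. This is a minimal illustration, not the CLDR algorithm: the preference table, the 36-inch threshold below which only inches are shown, and the function name `format_person_height` are all assumptions, and the output strings are hard-coded in English rather than taken from locale data.

```python
METERS_PER_INCH = 0.0254
INCHES_PER_FOOT = 12

# Hypothetical preference table; real data lives in CLDR supplemental files.
PERSON_HEIGHT_UNITS = {"en_US": "foot-and-inch", "de_CH": "centimeter"}

# Assumed threshold: below this many inches, show inches only.
INCH_ONLY_BELOW = 36


def format_person_height(meters, locale):
    """Convert a height in meters to the units preferred for the locale."""
    if PERSON_HEIGHT_UNITS.get(locale) == "foot-and-inch":
        total_inches = round(meters / METERS_PER_INCH)
        if total_inches < INCH_ONLY_BELOW:
            return f"{total_inches} inches"
        feet, inches = divmod(total_inches, INCHES_PER_FOOT)
        return f"{feet} foot {inches} inches"
    # Default to centimeters for any other locale in this sketch.
    return f"{round(meters * 100)} centimeters"


if __name__ == "__main__":
    print(format_person_height(1.88, "en_US"))
    print(format_person_height(1.88, "de_CH"))
    print(format_person_height(0.4572, "en_US"))
```

A real implementation would format the resulting amounts with the locale's unit display names and number formats instead of fixed English strings.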

For more detailed specification changes, see the Spec above, and look at the Modifications section.

Structure Changes

  • New elements are added for enhanced unit preferences, such as the units to use for person-height in different countries. This is an initial phase; additional preferences will be added in the future.

  • Additionally, elements and data are added for unit conversions, so that programmers can supply amounts in one unit and get the right amounts to display for different locales.

  • Grammatical features are added for various languages, as a prelude to allowing programmers to format units according to grammatical context (eg, dative version of 3 kilometers)

  • The augmented constraints have been updated, so that the tests can apply those constraints to all of the CLDR data.

  • Annotations now include non-emoji. Note: emoji are distinguished from other symbols using Unicode properties.

For more information, see the Delta DTDs above.

Chart Changes

Growth

The following chart shows the growth of CLDR locale-specific data over time. It does not include the non-locale specific data, nor locale-specific data that is not collected via the Survey Tool. It is thus restricted to data items in /main and /annotations directories. The % values are percent of the current measure of Modern coverage. (That level is notched up each release.)

See also the Locale Coverage Data.

Migration

  • Seven unit identifiers with irregular components have been deprecated, and are given alias values to the regular forms. For example, square always comes before the unit, and is square, not squared. The validity data has also been updated to mark the older forms as deprecated.

      • inch-hg ⟹ inch-ofhg

      • liter-per-100kilometers ⟹ liter-per-100-kilometer

      • meter-per-second-squared ⟹ meter-per-square-second

      • millimeter-of-mercury ⟹ millimeter-ofhg

      • part-per-million ⟹ permillion

      • pound-foot ⟹ pound-force-foot

      • pound-per-square-inch ⟹ pound-force-per-square-inch

    • Some of the unit usage parameters were also deprecated, since they didn't differ in practice. (The spec has been updated to have fallback, so if these need to be distinct in the future, they would be of the form media-music or media-music-track.)

      • music-track ⟹ media

      • tv-program ⟹ media

    • The subdivision codes gbeng, gbsct, and gbwls (used for flag emoji) are now deprecated (ISO removed them from its latest data). This can affect implementations testing for validity if they don't also check for 'deprecated' in common/validity/subdivision.xml. Compare the Territory Subdivisions charts for v37 and v36.
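One way an implementation might handle the deprecated identifiers listed above is a simple alias lookup. The mappings are copied from the list; the dictionary and function names are hypothetical and this is only a sketch of the idea.

```python
# Deprecated unit identifiers and their regular forms, from the list above.
UNIT_ALIASES = {
    "inch-hg": "inch-ofhg",
    "liter-per-100kilometers": "liter-per-100-kilometer",
    "meter-per-second-squared": "meter-per-square-second",
    "millimeter-of-mercury": "millimeter-ofhg",
    "part-per-million": "permillion",
    "pound-foot": "pound-force-foot",
    "pound-per-square-inch": "pound-force-per-square-inch",
}

# Deprecated unit usage parameters.
USAGE_ALIASES = {"music-track": "media", "tv-program": "media"}


def canonical_unit_id(unit_id):
    """Return the regular form of a unit identifier, or the input unchanged."""
    return UNIT_ALIASES.get(unit_id, unit_id)


def canonical_usage(usage):
    """Return the replacement for a deprecated usage parameter."""
    return USAGE_ALIASES.get(usage, usage)


if __name__ == "__main__":
    print(canonical_unit_id("meter-per-second-squared"))
    print(canonical_usage("tv-program"))
```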

Known Issues

  1. The expanded unit preferences are under development. The data is based on what was in CLDR v36, plus some other sources, but will be expanded in the future both to get better thresholds, and cover more cases where locales differ. See the ticket Improve unit structure and data [CLDR-13654]

  2. The Transform charts have been disabled. [CLDR-13308]

  3. The charts show spurious changes for gbeng, etc. That's because the file locations changed across releases.

  4. The JSON-format data for CLDR 37 currently omits the data from the CLDR common/supplemental files grammaticalFeatures.xml and units.xml. These are all new items in CLDR 37 except for the <unitPreferenceData>, which was formerly in supplementalData.xml. This will be addressed as soon as possible. [CLDR-13730]

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

Special thanks to the contributors to Nigerian Pidgin, one of the very few locales to go from zero to Modern coverage in one submission cycle!

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

diff --git a/docs/site/downloads/cldr-38.md b/docs/site/downloads/cldr-38.md index fb9da3dfe84..d0554342824 100644 --- a/docs/site/downloads/cldr-38.md +++ b/docs/site/downloads/cldr-38.md @@ -1,35 +1,9 @@ ---- -title: 'CLDR 38 Download' ---- - - -Unicode CLDR - CLDR 38 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 38 Release Note

See Key to Header Links

Overview

Unicode CLDR provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

CLDR v38 focused on enhancing the support for existing locales: Support for units of measurement in inflected languages (phase 1), adding annotations (names and search keywords) for many more non-emoji symbols (~400), plus for Emoji v13.1. In this version, there is also substantially higher coverage for (in order of completeness): Norwegian Nynorsk, Hausa, Igbo, Breton, Quechua, Yoruba, Fulah (Adlam script), Chakma, Asturian, Sanskrit, and Dogri.

The units of measurement additions make it possible to support APIs that handle everything from simple unit IDs such as meter to compound unit IDs such as cubic-meter-per-square-second or acre-feet-per-day. Examples of such APIs:

getUnitPattern(unitId, locale, width, pluralCategory, caseVariant) — to get the localized, inflected pattern for a simple or compound unit of measurement, appropriate for a position in a sentence or phrase with the appropriate pluralCategory and grammatical case (nominative, accusative, genitive, etc).

getUnitGender(unitId, locale) — to get the gender for a unit of measurement, so that other parts of a sentence or phrase can be modified to agree with that gender.
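As a rough illustration of the two API shapes above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the in-memory tables, the German sample strings (not copied from CLDR data), and the simplified fallback order. It is not how any particular library implements these calls.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for CLDR unit data. The sample values are
# illustrative only and are not copied from CLDR.
# Key: (locale, unit_id, width, case, plural_category) -> pattern
_PATTERNS: Dict[Tuple[str, str, str, str, str], str] = {
    ("de", "meter", "long", "nominative", "one"): "{0} Meter",
    ("de", "meter", "long", "nominative", "other"): "{0} Meter",
    ("de", "meter", "long", "dative", "other"): "{0} Metern",
    ("de", "day", "long", "nominative", "one"): "{0} Tag",
    ("de", "day", "long", "nominative", "other"): "{0} Tage",
    ("de", "day", "long", "dative", "other"): "{0} Tagen",
}

# Key: (locale, unit_id) -> grammatical gender
_GENDERS: Dict[Tuple[str, str], str] = {
    ("de", "meter"): "masculine",
    ("de", "day"): "masculine",
}


def get_unit_pattern(unit_id: str, locale: str, width: str,
                     plural_category: str,
                     case_variant: str = "nominative") -> str:
    """Return the inflected pattern for a unit.

    Assumed (simplified) fallback order: the requested case and plural
    category, then nominative with the same plural category, then
    nominative with 'other'.
    """
    candidates = (
        (case_variant, plural_category),
        ("nominative", plural_category),
        ("nominative", "other"),
    )
    for case, plural in candidates:
        pattern = _PATTERNS.get((locale, unit_id, width, case, plural))
        if pattern is not None:
            return pattern
    raise KeyError(f"no pattern for {unit_id!r} in locale {locale!r}")


def get_unit_gender(unit_id: str, locale: str) -> Optional[str]:
    """Return the unit's grammatical gender, or None if it is unknown."""
    return _GENDERS.get((locale, unit_id))
```

A caller would fill the placeholder after the lookup, for example `get_unit_pattern("meter", "de", "long", "other", "dative").format(3)`, and could use `get_unit_gender` to pick an agreeing article or adjective elsewhere in the sentence.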

The Survey Tool has improvements in performance, and introduced structured forum requests to improve coordination among translators. We would like to thank the 393 language experts who contributed to this release.

There are some changes that affect existing specifications and data: for example, the plural rules for French changed to add a new category; the specification for using aliases is more rigorous, and some alias data has changed — along with the specification for handling locale identifier canonicalization. For more information, see Migration.

The overall changes to the data items were:

  • Added: 155,131

  • Deleted: 33,805

  • Changed: 45,895

Data Changes

The following summarizes the changes to the data for this version of CLDR.

  • 13.1 Emoji and Unicode Symbols

      • Added names & search keywords for Emoji 13.1 and enhancements to existing emoji annotation data.

      • Added approximately 400 non-emoji Unicode symbols such as punctuation and currency symbols.

      • Added 2 character labels: superscript {0} and subscript {0}.

      • Aside from the CLDR target locales, emoji annotations and keywords expanded in Hausa (ha), Igbo (ig), Kalaallisut (kl), Luxembourgish (lb), Maori (mi), Manipuri (mni), Maltese (mt), Punjabi [Arabic] (pa_Arab), Kinyarwanda (rw), Tajik (tg), Tigrinya (ti), Uyghur (ug), Wolof (wo), Xhosa (xh), Yoruba (yo), with minor expansions in a few other languages.

  • Compact decimals and Units

      • Added 14 new units.

      • Added new binary prefixes.

      • Added new operand 'c' (with a synonym 'e') for languages like French (CLDR-12010)

  • Higher Coverage Levels

      • Modern: Norwegian Nynorsk

      • Moderate++: Hausa, Igbo, Breton, Quechua, Yoruba — made significant improvements, but didn't make it quite to Modern

      • Moderate: Fulah (Adlam), Chakma, Asturian

      • Basic+: Wolof, Tajik, Maori, Luxembourgish, Uyghur, Tigrinya — made significant improvements, but didn't get near to Moderate

      • Basic: Sanskrit, Dogri

  • Unit Inflections

      • Completed phase 1. The full goal is to add full case and gender support for formatted units. During phase 1, a limited number of locales (see below) and units of measurement are being handled, so that we can work kinks out of the process before expanding to all units for all locales (where we can get the grammatical structure).

      • Case & Gender: Polish (pl), Russian (ru), German (de), Hindi (hi) (in rough order of complexity)

      • Gender Only: Dutch (nl), Norwegian Bokmål (nb), Danish (da), Swedish (sv), French (fr), Italian (it), Portuguese (pt), Spanish (es)

  • Performance & Quality

      • Made substantial improvements in Survey Tool performance, lowering cost for translation.

      • Made substantial improvement in quality, using structured Forum topics to allow translators to collaborate more effectively.

      • Improved detection of translator errors.

  • ICU support

      • Improvements to CLDR API, providing a limited, stable API for extracting CLDR data.

      • Added approximatelySign for number formatting.

  • Unicode locale identifiers and BCP 47

      • Added a new -u locale extension keyword -dx, used to specify scripts to exclude from dictionary break (for word and line break)

      • Added a new short timezone identifier: tz-glgoh

      • Revamped the language, script, region, and variant alias data to improve replacement of deprecated codes.

For access to the draft data, see the git tag above. For more details see the Delta tickets above.

JSON Data Changes

JSON data now includes data for plural ranges, grammatical inflections, typographical labels, and annotations. If you are making use of JSON data, please join the [cldr-users] mailing list where we would like to hear your feedback.

CLDR JSON data for v38 is available, please see https://github.com/unicode-org/cldr-json

Specification Changes

The largest changes were the following:

  • To make the canonicalization of locale identifiers clear and unambiguous, provided major restructuring of the specification for canonicalization. (This was done in concert with fixes to the alias data to work better with the specification.) See Migration and Annex C. LocaleId Canonicalization for more details.

  • To allow for overriding dictionary-based segmentation breaks, added the Unicode Dictionary Break Exclusion Identifier, with the new key “dx”.

  • For picking the correct units of measurement for locales, defined the userPreferences skeleton more precisely.

  • For accurate plural categories in compact numbers, added the 'c' operand to plural rules to provide formatting for languages such as French. (CLDR-12010)

  • To support inflected units of measurement (phase 1), added specifications for the new elements listed under Structure Changes and an algorithm for how to construct grammatical unit names (simple or compound).

For more detailed specification changes, see the Spec above, and look at the Modifications section.
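To make the new “dx” key concrete, the sketch below pulls the -u- extension keywords out of a locale identifier and returns the values listed under dx. It is a simplified parser written for illustration, not the full algorithm from the specification: it ignores private-use (-x-) sequences and -u- attributes, and the function names and example tags are assumptions.

```python
from typing import Dict, List


def unicode_extension_keywords(locale_id: str) -> Dict[str, List[str]]:
    """Map each -u- extension key to its list of value subtags.

    Simplifications: private-use sequences are not handled, and any
    attributes that precede the first key are skipped.
    """
    subtags = locale_id.replace("_", "-").lower().split("-")
    keywords: Dict[str, List[str]] = {}
    if "u" not in subtags:
        return keywords
    key = None
    for subtag in subtags[subtags.index("u") + 1:]:
        if len(subtag) == 1:
            break  # a new singleton ends the -u- extension
        if len(subtag) == 2:
            key = subtag  # keys are two characters long
            keywords[key] = []
        elif key is not None:
            keywords[key].append(subtag)  # values are 3 to 8 characters
    return keywords


def excluded_dictionary_break_scripts(locale_id: str) -> List[str]:
    """Return the script codes listed under the 'dx' key, if any."""
    return unicode_extension_keywords(locale_id).get("dx", [])
```

With this sketch, `excluded_dictionary_break_scripts("th-u-dx-thai")` yields `["thai"]`, which a segmenter could then use to skip dictionary-based breaking for that script.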

Structure Changes

  • Added additional structure for unit inflections

    • New elements:

      • minimalPairs adds new elements caseMinimalPairs and genderMinimalPairs

      • unit adds a new element gender

      • grammaticalData adds new elements grammaticalDerivations, deriveCompound, and deriveComponent

    • New attributes for existing elements:

      • unitPattern adds a new attribute case

      • grammaticalCase, grammaticalGender, grammaticalDefiniteness add a new attribute scope

      • compoundUnitPattern1 adds new attributes case and gender

      • compoundUnitPattern adds a new attribute case

  • Number symbols adds approximatelySign element

  • Some additional attribute value constraints are added

    • for example, characterLabelPattern@type now allows for superscript and subscript values, indicated by the notation ⟪… strokes⟫➠⟪… strokes, subscript, superscript⟫ in Delta DTDs

    • some of these constraints are expanded due to new structure, while others are

For more details, see the Delta DTDs above.

Chart Changes

  • All charts are updated for the new data; for example, Romance Annotations shows the new non-emoji symbols and punctuation for Romance languages.

  • The DTD Deltas chart has a more compact representation for changes in attribute constraints, making the changes easier to see.

  • The new Grammatical Forms Charts show the new grammatical forms for units.

Growth

The following chart shows the growth of CLDR locale-specific data over time. It does not include the non-locale specific data, nor locale-specific data that is not collected via the Survey Tool. It is thus restricted to data items in /main and /annotations directories.

The % values are percent of the current measure of Modern coverage. That level is notched up each release, so previous releases had many locales that were at Modern coverage as assessed at the time of their release. There is one line per year, even though there were multiple releases in most years.

See also the Locale Coverage Data v38 and for details of the changes see delta_summary.tsv and locale-growth.tsv

Migration

  • The plural rules for French changed to add a new category, 'many', using the new operand 'c' (with a synonym 'e'). It should only have effect on compact number handling.

    • Important: according to the spec, when there is no message for a plural category, the message for 'other' should be returned. As long as implementations observe this policy, migration to this should work without problems.

  • <languageMatches type="written"> was deprecated some time ago, and now has been removed. Clients should use <languageMatches type="written_new"> (recognizing that there are some syntax changes). CLDR-13245

  • The following locales have been moved in the folder structures. CLDR-14080

    • Seed → Common: Sanskrit (sa)

    • Common → Seed: Church Slavic (cu), Volapük (vo), Prussian (prg)

  • The specification for using aliases is more rigorous, and some alias data has changed. Programs using this data may need modification:

    • The specification processes the rules in a certain order, so the file order needs to be maintained.

    • The specification now explicitly takes multiple passes (though that can be optimized by implementations)

    • Various variantAliases are replaced by languageAliases where they require more context to be properly handled (the former specification did not handle variant aliases correctly).

      • AALAND ⇒ AX is replaced by und_aaland ⇒ und_AX

      • arevmda ⇒ hyw is replaced by two rules: hy_arevmda ⇒ hyw & und_arevmda ⇒ und

    • Some spurious aliases have been removed, where they are not properly aliases but rather partial duplications of more complete information:

      • Those covered by the parent locale data and/or likely subtag data, such as az_AZ ⇒ az_Latn_AZ

      • Those covered by canonicalization of extlang subtags, such as zh_wuu ⇒ wuu

    • Changes to the download files:

      • cldr-tools-*.zip no longer contains a built cldr.jar; use the separate cldr-tools-*.jar instead.

        • As of v38.1, cldr-tools-*.zip is no longer included at all. You can download or check out the source tree directly from GitHub.

      • cldr-tools-*.jar is a standalone .jar file containing the CLDR tools and all needed dependencies.

      • There is a new "hashes/" subdirectory which contains GPG signatures and SHA-512 sums.
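The fallback policy described for the new French 'many' category (use the 'other' message when a category has no message of its own) can be sketched as follows. The function name and the sample French messages are assumptions for illustration. The plural rules themselves are not implemented here; the category is taken as an input.

```python
from typing import Mapping


def select_message(messages: Mapping[str, str], category: str) -> str:
    """Pick the message for a plural category, falling back to 'other'.

    Message sets written before the 'many' category existed simply do
    not contain it, so a lookup for 'many' returns the 'other' message.
    """
    if category in messages:
        return messages[category]
    return messages["other"]  # 'other' is expected to always be present


# Hypothetical message set authored before 'many' was added for French.
legacy_french = {"one": "{0} jour", "other": "{0} jours"}
```

Here `select_message(legacy_french, "many")` returns the 'other' message, which is why existing message sets keep working after the rule change.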

External Data Version

Known Issues

  1. The Transform charts have been disabled until the generating code can be fixed. [CLDR-11019]

  2. The JSON-format data for CLDR 38 currently omits the data from the CLDR common/supplemental files grammaticalFeatures.xml and units.xml. These are all new items in CLDR 37 except for the <unitPreferenceData>, which was formerly in supplementalData.xml. This will be addressed as soon as possible. [CLDR-13730]

  3. Hebrew compact number formatting scrambles text if embedded in RTL message [CLDR-14256]

  4. There are a number of fixes needed in the LDML specification.

    1. CLDR-14272 The documentation of @targets and @scope in grammaticalFeatures is missing; see the ticket for the missing text.

    2. CLDR-14312 replacement in subdivisionAlias in common/supplemental/supplementalMetadata.xml contains alpha{2}

    3. CLDR-14318 Should not remove "true" of tfield in UTS35 Appendix A

    4. CLDR-14319 Remove wrong/duplicated example below "Territory Exception" in UTS35 Appendix A

    5. CLDR-14320 "Put all <keywords, tfields> pairs into alphabetical order" is wrong in Appendix A of UTS35

    6. CLDR-13894 Need to use variantAlias replacement in BCP 47 Language Tag to Unicode BCP 47 Locale Identifier

    7. CLDR-14244 Document special 'alt' inheritance

CLDR 38.1

This dot release makes a very small number of incremental additions to version 38 to address the specific bugs listed in Δ38.1. The data changes are summarized in 38.1/delta/index.html. CLDR v38.1 is also included in ICU 68.2.

Migration note for CLDR 38.1:

    • As of v38.1, cldr-tools-*.zip is no longer included in the download files. You can download or check out the source tree directly from GitHub.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file +</table>" jsaction="rcuQ6b:WYd;">

diff --git a/docs/site/downloads/cldr-39.md b/docs/site/downloads/cldr-39.md index 969ceebb6d9..5224a248b94 100644 --- a/docs/site/downloads/cldr-39.md +++ b/docs/site/downloads/cldr-39.md @@ -3,31 +3,7 @@ title: 'CLDR 39 Download' --- -Unicode CLDR - CLDR 39 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 39 Release Note

See Key to Header Links

Overview

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

NOTE: The source for the LDML specification has been converted from HTML to GitHub Flavored Markdown (GFM). The formatting is now simpler, but some features (such as formatting for table captions) are not yet complete. Improvements in the formatting for the v39 specification are planned for after the release, but no substantive changes would be made to the content. The link above goes to the directory

CLDR v39 had no submission phase. Instead, the focus was on modernizing the Survey Tool software in preparation for data submission in the next release (v40). The data fixes in this release were confined to some global changes that are too difficult to make during a submission cycle, plus various other fixes. There was a major change in how Norwegian is handled, to align the way the locale identifiers no, nb, and nn are used. The CLDR GitHub repo is renaming its “master” branch to “main”. The unit support from the last release was integrated into ICU, and some fixes resulting from that process were made to the measurement unit data. A number of fixes were also made to the specification, to clarify text or fix problems in keyboards, measurement units, locale identifiers, and a few other areas.

Data Changes

Locale Changes (Sample Link)

There were general changes across all locales:

In addition, a number of other corrections were made on a per-locale basis.

JSON Data Changes

JSON data is available at https://github.com/unicode-org/cldr-json/releases/tag/39.0.0 

It is also available in packages published under the npm version "39.0.0"

Note the following change:

- The npm packages now have individual README and LICENSE files [CLDR-14451]

Please note the following upcoming changes, planned for cldr-json in CLDR v40:

Specification Changes

The source for the LDML specification has been converted from HTML to GitHub Flavored Markdown (GFM). The formatting is now simpler, but some features (such as formatting for table captions) may not be complete by the release date. Improvements in the formatting for the v39 specification are planned for after the release, but no substantive changes would be made to the content.

Chart Changes

Growth

The usual growth chart has been omitted, since this release had no data submission phase. For the previous version's chart, see Growth Chart (v38.x)

Migration

Known Issues

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing. Special thanks to Jan Kučera for his work on the migration to Markdown.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file +</table>" jsaction="rcuQ6b:WYd;">

See Key to Header Links

Overview

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

NOTE: The source for the LDML specification has been converted to Github Markdown (GFM) instead of HTML. The formatting is now simpler, but some features — such as formatting for table captions — are not yet complete. Improvements in the formatting for the v39 specification are planned for after the release, but no substantive changes would be made to the content. The link above goes to the directory

CLDR v39 had no submission phase. Instead the focus was on modernizing the Survey Tool software, preparing for data submission in the next release (v40). The data fixes in the release were confined to some global changes that are too difficult to do during a submission cycle, and various other fixes. There was a major change in how Norwegian is handled, in order to align the way that the locale identifiers no, nb, and nn are used. The CLDR github repo is changing the name of “master” branch to “main” branch. The unit support from the last release was integrated into ICU, and some fixes resulting from that process were made to the measurement unit data. Quite a number of fixes are made to the specification, to clarify text or fix problems in keyboards, measurement units, locale identifiers, and a few other areas.

Data Changes

Locale Changes (Sample Link)

There were general changes across all locales:

In addition, a number of other corrections were made on a per-locale basis.

JSON Data Changes

JSON data is available at https://github.com/unicode-org/cldr-json/releases/tag/39.0.0 

It is also available in packages published under the npm version "39.0.0"

Note the following change:

- The npm packages now have individual README and LICENSE files [CLDR-14451]

Please note the following upcoming changes, planned for cldr-json in CLDR v40:

Specification Changes

The source for the LDML specification has been converted to GitHub Flavored Markdown (GFM) instead of HTML. The formatting is now simpler, but some features, such as formatting for table captions, may not be complete by the release date. Improvements in the formatting for the v39 specification are planned for after the release, but no substantive changes will be made to the content.

Chart Changes

Growth

The usual growth chart has been omitted, since this release had no data submission phase. For the previous version's chart, see Growth Chart (v38.x)

Migration

Known Issues

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing. Special thanks to Jan Kučera for his work on the migration to Markdown.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

diff --git a/docs/site/downloads/cldr-40.md b/docs/site/downloads/cldr-40.md index d9b10503342..20d3792d9ff 100644 --- a/docs/site/downloads/cldr-40.md +++ b/docs/site/downloads/cldr-40.md @@ -3,31 +3,7 @@ title: 'CLDR 40 Download' --- -Unicode CLDR - CLDR 40 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 40 Release Note

See Key to Header Links

Overview

Unicode CLDR  provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

In CLDR v40, the focus is on:

Grammatical features (gender and case)

In many languages, forming grammatical phrases requires dealing with grammatical gender and case. Without that, it can sound as bad as "on top of 3 hours" instead of "in 3 hours". The overall goal for CLDR is to supply building blocks so that implementations of advanced message formatting can handle gender and case. See also: Inflection Points.

Emoji v14 names and search keywords

CLDR supplies short names and search keywords for the new emoji, so that implementations can build on them to provide, for example, type-ahead in keyboards.

Modernized Survey Tool front end

The Survey Tool is used to gather all the data for locales. The outmoded JavaScript infrastructure was modernized to make it easier to add enhancements (such as the split-screen dashboard) and to fix bugs.

Specification Improvements

The LDML specification has some important fixes and clarifications for Locale Identifiers, Dates, and Units of Measurement.

Approximately 140,000 data items were added or changed.

Data Changes

Segmentation Changes

Locale Changes

File Changes

JSON Data Changes

Specification Changes

Locale Identifiers

Dates

Units of Measurement

Growth

The chart below shows the growth over time, with the additions from the latest release in the top blue section.

Migration

Known Issues

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file +</table>" jsaction="rcuQ6b:WYd;">

See Key to Header Links

Overview

Unicode CLDR  provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

In CLDR v40, the focus is on:

Grammatical features (gender and case)

In many languages, forming grammatical phrases requires dealing with grammatical gender and case. Without that, it can sound as bad as "on top of 3 hours" instead of "in 3 hours". The overall goal for CLDR is to supply building blocks so that implementations of advanced message formatting can handle gender and case. See also: Inflection Points.

Emoji v14 names and search keywords

CLDR supplies short names and search keywords for the new emoji, so that implementations can build on them to provide, for example, type-ahead in keyboards.

Modernized Survey Tool front end

The Survey Tool is used to gather all the data for locales. The outmoded JavaScript infrastructure was modernized to make it easier to add enhancements (such as the split-screen dashboard) and to fix bugs.

Specification Improvements

The LDML specification has some important fixes and clarifications for Locale Identifiers, Dates, and Units of Measurement.

Approximately 140,000 data items were added or changed.

Data Changes

Segmentation Changes

Locale Changes

File Changes

JSON Data Changes

Specification Changes

Locale Identifiers

Dates

Units of Measurement

Growth

The chart below shows the growth over time, with the additions from the latest release in the top blue section.

Migration

Known Issues

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

diff --git a/docs/site/downloads/cldr-41.md b/docs/site/downloads/cldr-41.md index 0d7b3f01f47..ee641668260 100644 --- a/docs/site/downloads/cldr-41.md +++ b/docs/site/downloads/cldr-41.md @@ -3,29 +3,7 @@ title: 'CLDR 41 Download' --- -Unicode CLDR - CLDR 41 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 41 Release Note

Beyond the member organizations of the Unicode Consortium, many dedicated communities and individuals regularly contribute to updating their locales, including:

Data Changes

Because this is a limited-submission release, the data changes are limited. The focus for data this release was on Phase 3 of the project for providing grammatical information for units of measurement, with more locales reaching a modern coverage level, plus Phase 1 of a project to revamp Coverage levels.

Locale Changes

File Changes

JSON Data Changes

Specification Changes

The following are the main changes in the specification:

Tooling Changes

Survey Tool

Developer

Migration

Upcoming Changes

Growth

The following shows the growth of CLDR data per year, represented as an area chart. 

Known Issues

This section will contain issues that arise after the data, code, or spec has been frozen.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file +" jsaction="rcuQ6b:WYd;">

Beyond the member organizations of the Unicode Consortium, many dedicated communities and individuals regularly contribute to updating their locales, including:

Data Changes

Because this is a limited-submission release, the data changes are limited. The focus for data this release was on Phase 3 of the project for providing grammatical information for units of measurement, with more locales reaching a modern coverage level, plus Phase 1 of a project to revamp Coverage levels.

Locale Changes

File Changes

JSON Data Changes

Specification Changes

The following are the main changes in the specification:

Tooling Changes

Survey Tool

Developer

Migration

Upcoming Changes

Growth

The following shows the growth of CLDR data per year, represented as an area chart. 

Known Issues

This section will contain issues that arise after the data, code, or spec has been frozen.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

diff --git a/docs/site/downloads/cldr-42.md b/docs/site/downloads/cldr-42.md index 66fa3ead79f..950c6a544ba 100644 --- a/docs/site/downloads/cldr-42.md +++ b/docs/site/downloads/cldr-42.md @@ -3,31 +3,7 @@ title: 'CLDR 42 Download' --- -Unicode CLDR - CLDR 42 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 42 Release Note

Overview

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

In CLDR 42, the focus is on:

Locale Status

CLDR v42 Language Count

Data Changes

There were two areas of focus for this release: the formatting of Personal Names, and the upgrade of Modern to include many more languages.

Locale Changes

File Changes 

JSON Data Changes

Background

Formatting people’s names

Software needs to be able to format people's names, such as John Smith or 宮崎駿. The data is typically drawn from a database, where a name record will have fields for the parts of people’s names, such as a given field with a value of “Maria”, and a surname field value of “Schmidt”. 

There are many complications in dealing with the variety of different ways this needs to be done across languages:

CLDR has added structured patterns that enable implementations to format available name fields for a given language. The formatting for a name can vary according to the available name fields, the language of the name and of the viewer, and various input settings.

The new Person Name formatting data has a tech preview status. The CLDR committee is requesting feedback on the data and structure so that it can be refined and enhanced in the next release. ICU will also be offering a tech preview API in its next release. Other clients of CLDR are encouraged to try out the new data and structure, and to supply feedback to the CLDR committee in the next few months.

Specification Changes

The following are the main changes in the specification:

Growth

The following chart shows the growth of CLDR locale-specific data over time. It is restricted to data items in /main and /annotations directories, so it does not include the non-locale-specific data. The % values are percent of the current measure of Modern coverage. That level is notched up each release, so previous releases had many locales that were at Modern coverage as assessed at the time of their release. There is one line per year, even though there were multiple releases in most years.

The detailed information on changes between the v42 and v41 releases is at v42 delta_summary.tsv: look at the TOTAL line for the overall counts of Added/Changed/Deleted. See v42 locale-growth.tsv for the detailed figures behind the chart.

CLDR v42 Growth

Migration

Known Issues

Upcoming changes

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file +<center>See <a href="https://cldr.unicode.org/index/downloads#h.xq13gabuoy9w" rel="nofollow" target="_blank">Key to Header Links</a>" jsaction="rcuQ6b:WYd;">

Overview

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

In CLDR 42, the focus is on:

Locale Status

CLDR v42 Language Count

Data Changes

There were two areas of focus for this release: the formatting of Personal Names, and the upgrade of Modern to include many more languages.

Locale Changes

File Changes 

JSON Data Changes

Background

Formatting people’s names

Software needs to be able to format people's names, such as John Smith or 宮崎駿. The data is typically drawn from a database, where a name record will have fields for the parts of people’s names, such as a given field with a value of “Maria”, and a surname field value of “Schmidt”. 

There are many complications in dealing with the variety of different ways this needs to be done across languages:

CLDR has added structured patterns that enable implementations to format available name fields for a given language. The formatting for a name can vary according to the available name fields, the language of the name and of the viewer, and various input settings.
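
The idea that the output depends on which name fields are available can be sketched as follows. This is only a minimal illustration: the `PATTERNS` list, its placeholder syntax, and the helper names are hypothetical and are not the CLDR person-name data or syntax, which also takes the language of the name and viewer and other input settings into account.

```python
import string

# Hypothetical patterns, ordered from most to least specific.
# Real CLDR person-name patterns are richer and use their own syntax.
PATTERNS = [
    "{title} {given} {surname}",
    "{given} {surname}",
    "{surname}",
    "{given}",
]


def fields_in(pattern):
    """Return the field names referenced by a pattern."""
    return [name for _, name, _, _ in string.Formatter().parse(pattern) if name]


def format_name(record):
    """Format a name record with the first pattern whose fields are all present."""
    available = {key: value for key, value in record.items() if value}
    for pattern in PATTERNS:
        if all(field in available for field in fields_in(pattern)):
            return pattern.format(**available)
    return ""


print(format_name({"given": "Maria", "surname": "Schmidt"}))
print(format_name({"given": "Maria"}))
```

With the record from the text (given “Maria”, surname “Schmidt”), the second pattern is the first one whose fields are all present; with only a given name, the last pattern is used.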

The new Person Name formatting data has a tech preview status. The CLDR committee is requesting feedback on the data and structure so that it can be refined and enhanced in the next release. ICU will also be offering a tech preview API in its next release. Other clients of CLDR are encouraged to try out the new data and structure, and to supply feedback to the CLDR committee in the next few months.

Specification Changes

The following are the main changes in the specification:

Growth

The following chart shows the growth of CLDR locale-specific data over time. It is restricted to data items in /main and /annotations directories, so it does not include the non-locale-specific data. The % values are percent of the current measure of Modern coverage. That level is notched up each release, so previous releases had many locales that were at Modern coverage as assessed at the time of their release. There is one line per year, even though there were multiple releases in most years.

The detailed information on changes between the v42 and v41 releases is at v42 delta_summary.tsv: look at the TOTAL line for the overall counts of Added/Changed/Deleted. See v42 locale-growth.tsv for the detailed figures behind the chart.

CLDR v42 Growth

Migration

Known Issues

Upcoming changes

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

diff --git a/docs/site/downloads/cldr-43.md b/docs/site/downloads/cldr-43.md index 0f250af598e..17f6609cec0 100644 --- a/docs/site/downloads/cldr-43.md +++ b/docs/site/downloads/cldr-43.md @@ -3,29 +3,7 @@ title: 'CLDR 43 Download' --- -Unicode CLDR - CLDR 43 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 43 Release Note

Overview

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages. It is important to review the Migration section for changes that might require action by implementations using CLDR directly or indirectly (e.g., via ICU).

CLDR 43.1 is a dot release focused on fixing specific issues. For more details, see Version 43.1 Changes.

CLDR 43 is a limited-submission release, focusing on just a few areas:

For details, see below.

Locale Status

The bar for each coverage level increases each release. Faroese (fo) increased from Basic to Moderate, while Cherokee (chr), Lower Sorbian (dsb), and Upper Sorbian (hsb) dropped from Modern to Moderate.

CLDR v43 Coverage

Version 43.1 Changes

Version 43.1 is currently in Beta. It is planned to be a dot release that addresses the following issues. The main changes are for compatibility (including parser compatibility and GB 18030-2022 Level 2 support). To access the release data, use the release tag or the json link. The following tickets are included:

GB18030-2022 Compliance

Compatibility

The following changes are included to allow for better compatibility with certain parsers.

Other


The only DTD change is the addition of alt="ascii" for time formats:

<!ATTLIST pattern alt NMTOKENS #IMPLIED >
    <!--@MATCH:literal/alphaNextToNumber, ascii, noCurrency, variant-->
<!ATTLIST dateFormatItem alt NMTOKENS #IMPLIED >
    <!--@MATCH:literal/ascii, variant-->

Data Changes

Locale Changes

File Changes

New files:

Note: All files were moved from seed to common (see the Migration section)

JSON Data Changes

See the Migration section for general data changes.

Specification Changes


Please see the Modifications section in the LDML specification for the full list of items:


Growth

The following chart shows the growth of CLDR locale-specific data over time. It is restricted to data items in /main and /annotations directories, so it does not include the non-locale-specific data. The % values are percent of the current measure of Modern coverage. That level increases each release, so previous releases had many locales that were at Modern coverage as assessed at the time of their release. There is one line per year, even though there were multiple releases in most years.

The detailed information on changes between the v43 and v42 releases is at v43 delta_summary.tsv: look at the TOTAL line for the overall counts of Added/Deleted/Changed.

Because this was a limited-submission release, only a small number of changes are visible.

Language Matching

CLDR has data for language matching, as in this chart. Its purpose and usage are sometimes misunderstood.

So how is this used? Consider a user whose first language is Breton. If they open an application that only has localizations for English, German, and French, then Breton will not be available. In that case, the data in CLDR can be used to select French as a fallback localization — in the absence of other information. 

That last clause is important. The CLDR data is based on the likelihood that a person using language X understands text written in language Y, but large portions of the population for X might prefer other languages. 

The CLDR language matching data can and should be overridden whenever there is more information available from a user that allows an implementation to do a better job. It is strongly recommended that systems allow users to not only specify their preferred language, but also any secondary languages in order of priority. Thus a person speaking Kazakh who also knows French could specify French as a secondary language, and get a French localization for an app instead of the CLDR match. This has been done on both Android and iOS, for example.

Important:  language matching is different from the CLDR inheritance mechanism: they serve different purposes, and are not aligned. The CLDR inheritance mechanism is how CLDR organizes localized data, and should not be used for language matching. Applications do not need to follow the CLDR inheritance chain.

References: LDML Language Matching, LDML Inheritance vs Related Information, ICU4J Locale Matcher, ICU4C Locale Matcher 

Migration

Known Issues

None currently.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.


The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see https://cldr.unicode.org/index/charts.

\ No newline at end of file +" jsaction="rcuQ6b:WYd;">

Overview

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages. It is important to review the Migration section for changes that might require action by implementations using CLDR directly or indirectly (e.g., via ICU).

CLDR 43.1 is a dot release focused on fixing specific issues. For more details, see Version 43.1 Changes.

CLDR 43 is a limited-submission release, focusing on just a few areas:

For details, see below.

Locale Status

The bar for each coverage level increases each release. Faroese (fo) increased from Basic to Moderate, while Cherokee (chr), Lower Sorbian (dsb), and Upper Sorbian (hsb) dropped from Modern to Moderate.

CLDR v43 Coverage

Version 43.1 Changes

Version 43.1 is currently in Beta. It is planned to be a dot release that addresses the following issues. The main changes are for compatibility (including parser compatibility and GB 18030-2022 Level 2 support). To access the release data, use the release tag or the json link. The following tickets are included:

GB18030-2022 Compliance

Compatibility

The following changes are included to allow for better compatibility with certain parsers.

Other


The only DTD change is the addition of alt="ascii" for time formats:

<!ATTLIST pattern alt NMTOKENS #IMPLIED >
    <!--@MATCH:literal/alphaNextToNumber, ascii, noCurrency, variant-->
<!ATTLIST dateFormatItem alt NMTOKENS #IMPLIED >
    <!--@MATCH:literal/ascii, variant-->
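
A consumer that reads the XML directly might choose between the default pattern and the alt="ascii" variant roughly as sketched below. The `SAMPLE` fragment and its pattern values are hypothetical stand-ins (the assumption here is that the default uses a non-ASCII space and the variant an ordinary one), and `time_pattern` is an illustrative helper, not part of any CLDR tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a default pattern plus an alt="ascii" variant.
SAMPLE = """
<timeFormat>
  <pattern>h:mm\u202fa</pattern>
  <pattern alt="ascii">h:mm a</pattern>
</timeFormat>
"""


def time_pattern(root, prefer_ascii=False):
    """Return the alt="ascii" pattern when requested and present, else the default."""
    default = None
    for pattern in root.findall("pattern"):
        alt = pattern.get("alt")
        if prefer_ascii and alt == "ascii":
            return pattern.text
        if alt is None and default is None:
            default = pattern.text
    return default


root = ET.fromstring(SAMPLE)
print(repr(time_pattern(root)))
print(repr(time_pattern(root, prefer_ascii=True)))
```

Falling back to the default pattern when no alt="ascii" variant exists keeps the helper usable for locales that do not carry the variant.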

Data Changes

Locale Changes

File Changes

New files:

Note: All files were moved from seed to common (see the Migration section)

JSON Data Changes

See the Migration section for general data changes.

Specification Changes


Please see the Modifications section in the LDML specification for the full list of items:


Growth

The following chart shows the growth of CLDR locale-specific data over time. It is restricted to data items in /main and /annotations directories, so it does not include the non-locale-specific data. The % values are percent of the current measure of Modern coverage. That level increases each release, so previous releases had many locales that were at Modern coverage as assessed at the time of their release. There is one line per year, even though there were multiple releases in most years.

The detailed information on changes between the v43 and v42 releases is at v43 delta_summary.tsv: look at the TOTAL line for the overall counts of Added/Deleted/Changed.

Because this was a limited-submission release, only a small number of changes are visible.

Language Matching

CLDR has data for language matching, as in this chart. Its purpose and usage are sometimes misunderstood.

So how is this used? Consider a user whose first language is Breton. If they open an application that only has localizations for English, German, and French, then Breton will not be available. In that case, the data in CLDR can be used to select French as a fallback localization — in the absence of other information. 

That last clause is important. The CLDR data is based on the likelihood that a person using language X understands text written in language Y, but large portions of the population for X might prefer other languages. 

The CLDR language matching data can and should be overridden whenever there is more information available from a user that allows an implementation to do a better job. It is strongly recommended that systems allow users to not only specify their preferred language, but also any secondary languages in order of priority. Thus a person speaking Kazakh who also knows French could specify French as a secondary language, and get a French localization for an app instead of the CLDR match. This has been done on both Android and iOS, for example.

Important:  language matching is different from the CLDR inheritance mechanism: they serve different purposes, and are not aligned. The CLDR inheritance mechanism is how CLDR organizes localized data, and should not be used for language matching. Applications do not need to follow the CLDR inheritance chain.
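
The recommended order of precedence (explicit user preferences first, matching data only as a fallback) can be sketched as follows. This is a minimal illustration under stated assumptions: the `FALLBACK` table is a hypothetical stand-in for the CLDR language matching data and contains only the Breton to French pair mentioned above, and `pick_localization` is not an ICU or CLDR API. Real implementations would use a locale matcher such as the ICU ones listed in the references.

```python
# Hypothetical stand-in for CLDR language matching data.
# Only the Breton -> French pair comes from the text above.
FALLBACK = {"br": "fr"}


def pick_localization(user_languages, available, default="en"):
    """Pick a localization for a prioritized list of user languages."""
    # 1. Honor the user's explicit preferences, in priority order.
    for lang in user_languages:
        if lang in available:
            return lang
    # 2. Only then consult the matching data.
    for lang in user_languages:
        fallback = FALLBACK.get(lang)
        if fallback in available:
            return fallback
    # 3. Last resort: the application's default localization.
    return default


available = {"en", "de", "fr"}
print(pick_localization(["br"], available))        # fr
print(pick_localization(["br", "de"], available))  # de
print(pick_localization(["kk", "fr"], available))  # fr
```

In the second call the user's stated secondary language (German) wins over the Breton fallback, which is the override behavior the text recommends.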

References: LDML Language Matching, LDML Inheritance vs Related Information, ICU4J Locale Matcher, ICU4C Locale Matcher 

Migration

Known Issues

None currently.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.


The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see https://cldr.unicode.org/index/charts.

diff --git a/docs/site/downloads/cldr-44.md b/docs/site/downloads/cldr-44.md index 822504b2c27..d81a5a3d763 100644 --- a/docs/site/downloads/cldr-44.md +++ b/docs/site/downloads/cldr-44.md @@ -3,31 +3,7 @@ title: 'CLDR 44 Download' --- -Unicode CLDR - CLDR 44 Release Note

🏗 The CLDR site has been migrated to a new platform. Formatting and links continue to be fixed.

CLDR 44 Release Note

Overview

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

In CLDR 44, the focus is on:

Locale Coverage Status

The coverage status determines how well languages are supported on laptops, phones, and other computing devices. In particular, qualifying at a Basic level is typically a requirement just for being selectable on phones as a language. Note that for each language there are typically multiple locales, so 90 languages at Modern coverage corresponds to more than 350 locales at that coverage.

Below is the coverage in this release:

CLDR v44 Coverage

Version 44.1 Changes

DTD Changes 

Specification Changes 

Data Changes 

Data Changes

The following is a summary of the DTD changes which reflect changes in the structure. The relevant ones are described more fully in the data changes.

LDML

Supplemental Data

BCP47 

Keyboards

 Locale Changes

File Changes

(Aside from locale files)

Additions:

New XSD files in /common/dtd/. 

These correspond to the DTDs, but do not carry the extra validity annotations.

New Test Data files in /common/testData/

Removals:

Files with insufficient data

Old format keyboards were removed (see Migration):

JSON Data Changes

Keyboard Changes

Keyboard has a new DTD (keyboard3.dtd and the <keyboard3> element). This is a complete rewrite of the specification by the Keyboard Subcommittee, and is available as a technical preview in CLDR version 44. See TR35 Part 7: Keyboards. The prior DTDs are included in CLDR but are not used by CLDR data or tooling. Note: prior keyboard data files are not compatible, were not maintained and have also been removed.

Note that there are additional sample keyboard data files in progress which were not complete for v44, but may be consulted as samples:

See the Known Issues section for additional known issues.

Specification Changes

Please see the Modifications section in the draft spec for the list of current changes.

A diff of the changes since CLDR 43 can be viewed here in GitHub, which was last updated on 6 October 2023. Clicking on the rich-diff icon for a page ( 📄 ) will often show the differences with a rich diff, such as the following:

Growth

The following chart shows the growth of CLDR locale-specific data over time. It is restricted to data items in /main and /annotations directories, so it does not include the non-locale-specific data; nor does it include corrections (which typically outnumber new items). The % values are percent of the current measure of Modern coverage. That level increases each release, so previous releases had many locales that were at Modern coverage as assessed at the time of their release. There is one line per year, even though there were multiple releases in most years.

There were generally a relatively small number of additions this cycle; the focus was improvements in quality, and changes will not show up below.

Migration

Known Issues

These are not always the same. In the future, some of these functions will be separated out; see CLDR-17095.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.

\ No newline at end of file +*Note: For NPM, the JSON data uses version 44.0.1" jsaction="rcuQ6b:WYd;">

Overview

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

In CLDR 44, the focus is on:

Locale Coverage Status

The coverage status determines how well languages are supported on laptops, phones, and other computing devices. In particular, qualifying at a Basic level is typically a requirement just for being selectable on phones as a language. Note that for each language there are typically multiple locales, so 90 languages at Modern coverage corresponds to more than 350 locales at that coverage.

Below is the coverage in this release:

CLDR v44 Coverage

Version 44.1 Changes

DTD Changes 

Specification Changes 

Data Changes 

Data Changes

The following is a summary of the DTD changes which reflect changes in the structure. The relevant ones are described more fully in the data changes.

LDML

Supplemental Data

BCP47 

Keyboards

 Locale Changes

File Changes

(Aside from locale files)

Additions:

New XSD files in /common/dtd/. 

These correspond to the DTDs, but do not carry the extra validity annotations.

New Test Data files in /common/testData/

Removals:

Files with insufficient data

Old format keyboards were removed (see Migration):

JSON Data Changes

Keyboard Changes

Keyboard has a new DTD (keyboard3.dtd and the <keyboard3> element). This is a complete rewrite of the specification by the Keyboard Subcommittee, and is available as a technical preview in CLDR version 44. See TR35 Part 7: Keyboards. The prior DTDs are included in CLDR but are not used by CLDR data or tooling. Note: prior keyboard data files are not compatible, were not maintained and have also been removed.

Note that there are additional sample keyboard data files in progress which were not complete for v44, but may be consulted as samples:

See the Known Issues section for additional known issues.

Specification Changes

Please see the Modifications section in the draft spec for the list of current changes.

A diff of the changes since CLDR 43 can be viewed here in GitHub, which was last updated on 6 October 2023. Clicking on the rich-diff icon for a page ( 📄 ) will often show the differences with a rich diff, such as the following:

Growth

The following chart shows the growth of CLDR locale-specific data over time. It is restricted to data items in /main and /annotations directories, so it does not include the non-locale-specific data; nor does it include corrections (which typically outnumber new items). The % values are percent of the current measure of Modern coverage. That level increases each release, so previous releases had many locales that were at Modern coverage as assessed at the time of their release. There is one line per year, even though there were multiple releases in most years.

There were generally a relatively small number of additions this cycle; the focus was improvements in quality, and changes will not show up below.

Migration

Known Issues

These are not always the same. In the future, some of these functions will be separated out; see CLDR-17095.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.
