CLDR-17566 Converting ddl, bcp47-extension, chars, keyboard-workgroup, process, and survey-tool
chpy04 committed Jun 7, 2024
1 parent 393350f commit 8549753
Showing 14 changed files with 499 additions and 33 deletions.
2 changes: 1 addition & 1 deletion docs/site/TEMP-TEXT-FILES/ddl.txt
@@ -1,5 +1,5 @@
CLDR DDL Subcommittee
-The Common Locale Data Repository (CLDR) is widely used, and the content has grown dramatically over the years with participation by organizations of all types and sizes, as well as many individual contributors.
+The Common Locale Data Repository (CLDR) is widely used, and the content has grown dramatically over the years with participation by organizations of all types and sizes, as well as many individual contributors.
Contributors for Digitally Disadvantaged Languages (DDL) face unique challenges. The CLDR-DDL subcommittee has been formed to evaluate mechanisms to make it easier for contributors for DDLs to:
become contributors to CLDR
improve the coverage for their language in CLDR
16 changes: 8 additions & 8 deletions docs/site/TEMP-TEXT-FILES/index-bcp47-extension.txt
@@ -4,17 +4,17 @@ The subtags available for use in the 'u' extension provide language tag extensio
The subtags available for use in the 't' extension provide language tag extensions that carry the additional information needed for identifying transformed content, or a request to transform content in a certain way. For example, the language tag "ja-Kana-t-it" can be used as a content tag indicating Japanese Katakana transformed from Italian. It can also be used as a request for a given transformation.
For more details on the valid subtags for these extensions, their syntax, and their meanings, see LDML Section 3.7 Unicode BCP 47 Extension Data.
Machine-Readable Files for Validity Testing
-Beginning with CLDR version 1.7.2, machine-readable files are available listing the valid attributes, keys, and types for each successive version of LDML. The most recently released version is always available at http://unicode.org/Public/cldr/latest/ in a file of the form cldr-common*.zip (in older versions the file was of the form cldr-core*.zip). Inside that file, the directory "common/bcp47/" contains the data files defining the valid attributes, keys, and types.
+Beginning with CLDR version 1.7.2, machine-readable files are available listing the valid attributes, keys, and types for each successive version of LDML. The most recently released version is always available at http://unicode.org/Public/cldr/latest/ in a file of the form cldr-common*.zip (in older versions the file was of the form cldr-core*.zip). Inside that file, the directory "common/bcp47/" contains the data files defining the valid attributes, keys, and types.
The BCP47 data is also currently maintained in a source code repository, with each release tagged, for viewing directly without unzipping. For example, see https://github.com/unicode-org/cldr/tree/release-38/common/bcp47. The current development snapshot is found at https://github.com/unicode-org/cldr/tree/master/common/bcp47.
All releases including the latest are listed on http://cldr.unicode.org/index/downloads, with a link to each respective data directory under the column heading Data, and direct access to the repository under the GitHub Tag.
-For example, the timezone.xml file looks like the following:
-<keyword>
-<key name="tz" alias="timezone">
-<type name="adalv" alias="Europe/Andorra"/>
-<type name="aedxb" alias="Asia/Dubai"/>
+For example, the timezone.xml file looks like the following:
+<keyword>
+<key name="tz" alias="timezone">
+<type name="adalv" alias="Europe/Andorra"/>
+<type name="aedxb" alias="Asia/Dubai"/>
Using this data, an implementation would determine that "fr-u-tz-adalv" and "fr-u-tz-aedxb" are both valid. Some data in the CLDR data files also requires reference to LDML for validation according to Appendix Q of LDML. For example, LDML defines the type 'codepoints' to define specific code point ranges in Unicode for specific purposes.
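The validity check described above can be sketched in Python. This is a minimal illustration, not part of the commit: the function names `load_valid_types` and `is_valid_u_extension` are hypothetical, the sample XML is only the fragment quoted in the text (with closing tags added so it parses), and the check assumes the -u- extension consists purely of key/type pairs, which ignores attributes, multi-subtag types, and special types such as 'codepoints'.

```python
import xml.etree.ElementTree as ET

# Fragment of timezone.xml as quoted in the text; closing tags added here
# so that the sample is well-formed.
SAMPLE = """<keyword>
  <key name="tz" alias="timezone">
    <type name="adalv" alias="Europe/Andorra"/>
    <type name="aedxb" alias="Asia/Dubai"/>
  </key>
</keyword>"""


def load_valid_types(xml_text):
    """Map each key name (e.g. 'tz') to the set of its valid type names."""
    root = ET.fromstring(xml_text)
    return {
        key.get("name"): {t.get("name") for t in key.iter("type")}
        for key in root.iter("key")
    }


def is_valid_u_extension(tag, valid):
    """Check the key/type pairs that follow the 'u' singleton.

    Simplifying assumption: the extension is a flat sequence of
    key-type pairs and nothing else.
    """
    subtags = tag.lower().split("-")
    if "u" not in subtags:
        return False
    rest = subtags[subtags.index("u") + 1:]
    if not rest or len(rest) % 2 != 0:
        return False
    pairs = zip(rest[::2], rest[1::2])
    return all(k in valid and t in valid[k] for k, t in pairs)


valid = load_valid_types(SAMPLE)
print(is_valid_u_extension("fr-u-tz-adalv", valid))
print(is_valid_u_extension("fr-u-tz-aedxb", valid))
```

A real implementation would load every file under common/bcp47/ and follow the full syntax in LDML rather than this pair-only shortcut.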
Version Information
-The following is not necessary for correct validation of the -u- extension, but may be useful for some readers.
+The following is not necessary for correct validation of the -u- extension, but may be useful for some readers.
Each release has an associated data directory of the form "http://unicode.org/Public/cldr/<version>", where "<version>" is replaced by the release number. The version number for any file is given by the directory where it was downloaded from. If that information is no longer available, the version can still be accessed by looking at the common/dtd/ldml.dtd file in the cldr-common*.zip file (for older versions, the core.zip file), at the element cldrVersion, such as the following. This information is also accessible with a validating XML parser.
<!ATTLIST version cldrVersion CDATA #FIXED "1.8" >
-For each release after CLDR 1.8, types introduced in that release are also marked in the data files by the XML attribute "since", such as in the following example: <type name="adp" since="1.9"/>
+For each release after CLDR 1.8, types introduced in that release are also marked in the data files by the XML attribute "since", such as in the following example: <type name="adp" since="1.9"/>
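As a rough illustration of the Version Information text in this file, the cldrVersion value can be pulled out of the DTD declaration with a regular expression. This is a sketch only: `cldr_version` is a hypothetical helper, and it assumes the declaration has the same shape as the single ATTLIST line quoted in the text.

```python
import re

# The declaration as quoted from common/dtd/ldml.dtd in the text.
DTD_LINE = '<!ATTLIST version cldrVersion CDATA #FIXED "1.8" >'

# Assumption: the attribute is declared as CDATA #FIXED with a quoted value.
_VERSION_RE = re.compile(
    r'<!ATTLIST\s+version\s+cldrVersion\s+CDATA\s+#FIXED\s+"([^"]+)"'
)


def cldr_version(dtd_text):
    """Return the fixed cldrVersion string, or None if it is not declared."""
    match = _VERSION_RE.search(dtd_text)
    return match.group(1) if match else None


print(cldr_version(DTD_LINE))
```

A validating XML parser, as the text notes, exposes the same value without any pattern matching; the regular expression is just the lightest-weight option when only the DTD file is at hand.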
6 changes: 3 additions & 3 deletions docs/site/TEMP-TEXT-FILES/index-charts.txt
@@ -11,13 +11,13 @@ Delta Data - Data that changed in the current release.
Delta DTDs - Differences between CLDR DTDs over time.
Locale-Based Data
Verification - Constructed data for verification: Dates, Timezones, Numbers
-Summary - Provides a summary view of the main locale data. Language locales (those with no territory or variant) are presented with fully resolved data; the inherited or aliased data can be hidden if desired. Other locales do not show inherited or aliased data, just the differences from the respective language locale. The English value is provided for comparison (shown as "=" if it is equal to the localized value, and n/a if not available. The Sublocales column shows variations across locales. Hovering over each Sublocale value shows a pop-up with the locales that have that value.
+Summary - Provides a summary view of the main locale data. Language locales (those with no territory or variant) are presented with fully resolved data; the inherited or aliased data can be hidden if desired. Other locales do not show inherited or aliased data, just the differences from the respective language locale. The English value is provided for comparison (shown as "=" if it is equal to the localized value, and n/a if not available). The Sublocales column shows variations across locales. Hovering over each Sublocale value shows a pop-up with the locales that have that value.
By-Type - Provides a side-by-side comparison of data from different locales for each field. For example, one can see all the locales that are left-to-right, or all the different translations of the Arabic script across languages. Data that is unconfirmed or provisional is marked by a red-italic locale ID, such as ·bn_BD·.
Character Annotations - The CLDR emoji character annotations.
Subdivision Names - The (draft) CLDR subdivision names (names for states, provinces, cantons, etc.).
Collation Tailorings - Collation charts (draft) for CLDR locales.
Other Data
Supplemental Data - General data that is not part of the locale hierarchy but is still part of CLDR. Includes: plural rules, day-period rules, language matching, language-script information, territories (countries), and their subdivisions, timezones, and so on.
-Transform - (Disabled temporarily) Some of the transforms in CLDR: the transliterations between different scripts. For more on transliterations, see Transliteration Guidelines.
-Keyboards - Provides a view of keyboard data: layouts for different locales, mappings from characters to keyboards, and from keyboards to characters.
+Transform - (Disabled temporarily) Some of the transforms in CLDR: the transliterations between different scripts. For more on transliterations, see Transliteration Guidelines.
+Keyboards - Provides a view of keyboard data: layouts for different locales, mappings from characters to keyboards, and from keyboards to characters.
For more details on the locale data collection process, please see the CLDR process. For filing or viewing bug reports, see CLDR Bug Reports.
14 changes: 7 additions & 7 deletions docs/site/TEMP-TEXT-FILES/index-keyboard-workgroup.txt
@@ -2,21 +2,21 @@ CLDR Keyboard Subcommittee
The CLDR Keyboard Subcommittee is developing a new cross-platform standard XML format for use by keyboard authors for inclusion in the CLDR source repository.
News
2023-Feb-29: The CLDR-TC has authorized the proposed specification to be released as stable (out of Technical Preview).
-2023-May-15: The CLDR-TC has authorized Public Review Issue #476 of the proposed specification, as a Technical Preview. The PRI closed on 2023-Jul-15.
+2023-May-15: The CLDR-TC has authorized Public Review Issue #476 of the proposed specification, as a "Technical Preview." The PRI closed on 2023-Jul-15.
Background
CLDR (Common Locale Data Repository)
-Computing devices have become increasingly personal and increasingly affordable to the point that they are now within reach of most people on the planet. The diverse linguistic requirements of the worlds 7+ billion people do not scale to traditional models of software development. In response to this, Unicode CLDR has emerged as a standards-based solution that empowers specialist and community input, as a means of balancing the needs of language communities with the technologies of major platform and service providers.
+Computing devices have become increasingly personal and increasingly affordable to the point that they are now within reach of most people on the planet. The diverse linguistic requirements of the world's 7+ billion people do not scale to traditional models of software development. In response to this, Unicode CLDR has emerged as a standards-based solution that empowers specialist and community input, as a means of balancing the needs of language communities with the technologies of major platform and service providers.
The challenge and promise of Keyboards
-Text input is a core component of most computing experiences and is most commonly achieved using a keyboard, whether hardware or virtual (on-screen or touch). However, keyboard support for most of the worlds languages is either completely missing or often does not adequately support the input needs of language communities. Improving text input support for minority languages is an essential part of the Unicode mission.
+Text input is a core component of most computing experiences and is most commonly achieved using a keyboard, whether hardware or virtual (on-screen or touch). However, keyboard support for most of the world's languages is either completely missing or often does not adequately support the input needs of language communities. Improving text input support for minority languages is an essential part of the Unicode mission.
Keyboard data is currently completely platform-specific. Consequently, language communities and other keyboard authors must see their designs developed independently for every platform/operating system, resulting in unnecessary duplication of technical and organizational effort.
There is no central repository or contact point for this data, meaning that such authors must separately and independently contact all platform/operating system developers.
LDML: The universal interchange format for keyboards
-The CLDR Keyboard Subcommittee is currently rewriting and redeveloping the existing LDML (XML) definition for keyboards (UTS#35 part 7) in order to define core keyboard-based text input requirements for the worlds languages. This format allows the physical and virtual (on-screen or touch) keyboard layouts for a language to be defined in a single file. Input Method Editors (IME) or other input methods are not currently in scope for this format.
-CLDR: A home for the worlds newest keyboards
+The CLDR Keyboard Subcommittee is currently rewriting and redeveloping the existing LDML (XML) definition for keyboards (UTS#35 part 7) in order to define core keyboard-based text input requirements for the world's languages. This format allows the physical and virtual (on-screen or touch) keyboard layouts for a language to be defined in a single file. Input Method Editors (IME) or other input methods are not currently in scope for this format.
+CLDR: A home for the world's newest keyboards
Today, there are many existing platform-specific implementations and keyboard definitions. This project does not intend to remove or replace existing well-established support.
The goal of this project is that, where otherwise unsupported languages are concerned, CLDR becomes the common source for keyboard data, for use by platform/operating system developers and vendors.
-As a result, CLDR will also become the point of contact for keyboard authors and language communities to submit new or updated keyboard layouts to serve those user communities. CLDR has already become the definitive and publicly available source for the world's locale data.
-Unicode: Enabling the worlds languages
+As a result, CLDR will also become the point of contact for keyboard authors and language communities to submit new or updated keyboard layouts to serve those user communities. CLDR has already become the definitive and publicly available source for the world's locale data.
+Unicode: Enabling the world's languages
Keyboard support is part of a multi-step, often multi-year process of enabling a new language or script.
Three critical parts of initial support for a language in content are:
Encoding, in the Unicode Standard