diff --git a/docs/site/TEMP-TEXT-FILES/ddl.txt b/docs/site/TEMP-TEXT-FILES/ddl.txt new file mode 100644 index 00000000000..ffc93725ea4 --- /dev/null +++ b/docs/site/TEMP-TEXT-FILES/ddl.txt @@ -0,0 +1,7 @@ +CLDR DDL Subcommittee +The Common Locale Data Repository (CLDR) is widely used, and the content has grown dramatically over the years with participation by organizations of all types and sizes, as well as many individual contributors. +Contributors for Digitally Disadvantaged Languages (DDL) face unique challenges. The CLDR-DDL subcommittee has been formed to evaluate mechanisms to make it easier for contributors for DDLs to: +become contributors to CLDR +improve the coverage for their language in CLDR +raise the status of their contributions, so that the CLDR data for their language is incorporated into more products. +The DDL Subcommittee has started to meet every other week as of June, 2023. \ No newline at end of file diff --git a/docs/site/TEMP-TEXT-FILES/index-bcp47-extension.txt b/docs/site/TEMP-TEXT-FILES/index-bcp47-extension.txt new file mode 100644 index 00000000000..af25e2b780a --- /dev/null +++ b/docs/site/TEMP-TEXT-FILES/index-bcp47-extension.txt @@ -0,0 +1,20 @@ +Unicode Extensions for BCP 47 +IETF BCP 47 Tags for Identifying Languages defines the language identifiers (tags) used on the Internet and in many standards. It has an extension mechanism that allows additional information to be included. The Unicode Consortium is the maintainer of the extension ‘u’ for Locale Extensions, as described in rfc6067, and the extension 't' for Transformed Content, as described in rfc6497. +The subtags available for use in the 'u' extension provide language tag extensions that provide for additional information needed for identifying locales. The 'u' subtags consist of a set of keys and associated values (types). 
For example, a locale identifier for British English with numeric collation has the following form: en-GB-u-kn-true +The subtags available for use in the 't' extension provide language tag extensions that provide for additional information needed for identifying transformed content, or a request to transform content in a certain way. For example, the language tag "ja-Kana-t-it" can be used as a content tag that indicates Japanese Katakana transformed from Italian. It can also be used as a request for a given transformation. +For more details on the valid subtags for these extensions, their syntax, and their meanings, see LDML Section 3.7 Unicode BCP 47 Extension Data. +Machine-Readable Files for Validity Testing +Beginning with CLDR version 1.7.2, machine-readable files are available listing the valid attributes, keys, and types for each successive version of LDML. The most recently released version is always available at http://unicode.org/Public/cldr/latest/ in a file of the form cldr-common*.zip (in older versions the file was of the form cldr-core*.zip). Inside that file, the directory "common/bcp47/" contains the data files defining the valid attributes, keys, and types. +The BCP47 data is also currently maintained in a source code repository, with each release tagged, for viewing directly without unzipping. For example, see https://github.com/unicode-org/cldr/tree/release-38/common/bcp47. The current development snapshot is found at https://github.com/unicode-org/cldr/tree/master/common/bcp47. +All releases including the latest are listed on http://cldr.unicode.org/index/downloads, with a link to each respective data directory under the column heading Data, and direct access to the repository under the GitHub Tag. +For example, the timezone.xml file looks like the following: + + + + +Using this data, an implementation would determine that "fr-u-tz-adalv" and "fr-u-tz-aedxb" are both valid. 
Some data in the CLDR data files also requires reference to LDML for validation according to Appendix Q of LDML. For example, LDML defines the type 'codepoints' to define specific code point ranges in Unicode for specific purposes. +Version Information +The following is not necessary for correct validation of the -u- extension, but may be useful for some readers. +Each release has an associated data directory of the form "http://unicode.org/Public/cldr/", where "" is replaced by the release number. The version number for any file is given by the directory where it was downloaded from. If that information is no longer available, the version can still be accessed by looking at the common/dtd/ldml.dtd file in the cldr-common*.zip file (for older versions, the core.zip file), at the element cldrVersion, such as the following. This information is also accessible with a validating XML parser. + +For each release after CLDR 1.8, types introduced in that release are also marked in the data files by the XML attribute "since", such as in the following example: \ No newline at end of file diff --git a/docs/site/TEMP-TEXT-FILES/index-charts.txt b/docs/site/TEMP-TEXT-FILES/index-charts.txt new file mode 100644 index 00000000000..a5210ca0799 --- /dev/null +++ b/docs/site/TEMP-TEXT-FILES/index-charts.txt @@ -0,0 +1,23 @@ +CLDR Charts +The Unicode CLDR Charts provide different ways to view the Common Locale Data Repository data. +Latest - The charts for the latest release version +Dev - A snapshot of data under development +Previous - Previous available charts are linked from the download page in the Charts column +The format of most of the fields in the charts will be clear from the Name and ID, such as the months of the year. The format for others, such as the date or time formats, is structured and requires more interpretation. For more information, see UTS #35: Locale Data Markup Language (LDML). +Most charts have "double links" somewhere in each row. 
These are links that put the address of that row into the address bar of the browser for copying. +Note that not all CLDR data is included in the charts. +Version Deltas +Delta Data - Data that changed in the current release. +Delta DTDs - Differences between CLDR DTDs over time. +Locale-Based Data +Verification - Constructed data for verification: Dates, Timezones, Numbers +Summary - Provides a summary view of the main locale data. Language locales (those with no territory or variant) are presented with fully resolved data; the inherited or aliased data can be hidden if desired. Other locales do not show inherited or aliased data, just the differences from the respective language locale. The English value is provided for comparison (shown as "=" if it is equal to the localized value, and n/a if not available). The Sublocales column shows variations across locales. Hovering over each Sublocale value shows a pop-up with the locales that have that value. +By-Type - Provides a side-by-side comparison of data from different locales for each field. For example, one can see all the locales that are left-to-right, or all the different translations of the Arabic script across languages. Data that is unconfirmed or provisional is marked by a red-italic locale ID, such as ·bn_BD·. +Character Annotations - The CLDR emoji character annotations. +Subdivision Names - The (draft) CLDR subdivision names (names for states, provinces, cantons, etc.). +Collation Tailorings - Collation charts (draft) for CLDR locales. +Other Data +Supplemental Data - General data that is not part of the locale hierarchy but is still part of CLDR. Includes: plural rules, day-period rules, language matching, language-script information, territories (countries), and their subdivisions, timezones, and so on. +Transform - (Disabled temporarily) Some of the transforms in CLDR: the transliterations between different scripts. For more on transliterations, see Transliteration Guidelines. 
+Keyboards - Provides a view of keyboard data: layouts for different locales, mappings from characters to keyboards, and from keyboards to characters. +For more details on the locale data collection process, please see the CLDR process. For filing or viewing bug reports, see CLDR Bug Reports. \ No newline at end of file diff --git a/docs/site/TEMP-TEXT-FILES/index-keyboard-workgroup.txt b/docs/site/TEMP-TEXT-FILES/index-keyboard-workgroup.txt new file mode 100644 index 00000000000..4cf1cca777c --- /dev/null +++ b/docs/site/TEMP-TEXT-FILES/index-keyboard-workgroup.txt @@ -0,0 +1,37 @@ +CLDR Keyboard Subcommittee +The CLDR Keyboard Subcommittee is developing a new cross-platform standard XML format for use by keyboard authors for inclusion in the CLDR source repository. +News +2024-Feb-29: The CLDR-TC has authorized the proposed specification to be released as stable (out of Technical Preview). +2023-May-15: The CLDR-TC has authorized Public Review Issue #476 of the proposed specification, as a "Technical Preview." The PRI closed on 2023-Jul-15. +Background +CLDR (Common Locale Data Repository) +Computing devices have become increasingly personal and increasingly affordable to the point that they are now within reach of most people on the planet. The diverse linguistic requirements of the world's 7+ billion people do not scale to traditional models of software development. In response to this, Unicode CLDR has emerged as a standards-based solution that empowers specialist and community input, as a means of balancing the needs of language communities with the technologies of major platform and service providers. +The challenge and promise of Keyboards +Text input is a core component of most computing experiences and is most commonly achieved using a keyboard, whether hardware or virtual (on-screen or touch). 
However, keyboard support for most of the world's languages is either completely missing or often does not adequately support the input needs of language communities. Improving text input support for minority languages is an essential part of the Unicode mission. +Keyboard data is currently completely platform-specific. Consequently, language communities and other keyboard authors must see their designs developed independently for every platform/operating system, resulting in unnecessary duplication of technical and organizational effort. +There is no central repository or contact point for this data, meaning that such authors must separately and independently contact all platform/operating system developers. +LDML: The universal interchange format for keyboards +The CLDR Keyboard Subcommittee is currently rewriting and redeveloping the existing LDML (XML) definition for keyboards (UTS#35 part 7) in order to define core keyboard-based text input requirements for the world's languages. This format allows the physical and virtual (on-screen or touch) keyboard layouts for a language to be defined in a single file. Input Method Editors (IME) or other input methods are not currently in scope for this format. +CLDR: A home for the world's newest keyboards +Today, there are many existing platform-specific implementations and keyboard definitions. This project does not intend to remove or replace existing well-established support. +The goal of this project is that, where otherwise unsupported languages are concerned, CLDR becomes the common source for keyboard data, for use by platform/operating system developers and vendors. +As a result, CLDR will also become the point of contact for keyboard authors and language communities to submit new or updated keyboard layouts to serve those user communities. CLDR has already become the definitive and publicly available source for the world's locale data. 
+Unicode: Enabling the world's languages +Keyboard support is part of a multi-step, often multi-year process of enabling a new language or script. +Three critical parts of initial support for a language in content are: +Encoding, in the Unicode Standard +Display, including fonts and text layout +Input +Today, the vast majority of the languages of the world are already in the Unicode encoding. The open-source Noto font provides a wide range of fonts to support display, and the Unicode character properties play a vital role in display. However, input support often lags many years behind when a script is added to Unicode. +The LDML keyboard format, and the CLDR repository, will make it much easier to deliver text input. +Common Questions +What is the history of this effort? +In 2012, the original LDML keyboard format was designed to describe keyboards for comparative purposes. In 2018, a PRI was created soliciting further feedback. +The CLDR Keyboard Subcommittee was formed and has been meeting since mid-2020. It quickly became apparent that the existing LDML format was insufficient for implementing new keyboard layouts. +What is the current status? +Release +Updates to LDML (UTS#35) Part 7: Keyboards are scheduled to be released as part of CLDR v45. +Implementations +The SIL Keyman project is actively working on an open-source implementation of the LDML format. +How can I get involved? +If you want to be engaged in this workgroup, please contact the CLDR Keyboard Subcommittee via the Unicode contact form. \ No newline at end of file diff --git a/docs/site/TEMP-TEXT-FILES/index-process.txt b/docs/site/TEMP-TEXT-FILES/index-process.txt new file mode 100644 index 00000000000..58a7f404862 --- /dev/null +++ b/docs/site/TEMP-TEXT-FILES/index-process.txt @@ -0,0 +1,144 @@ +CLDR Process +Introduction +This document describes the Unicode CLDR Technical Committee's process for data collection, resolution, public feedback and release. 
+The process is designed to be light-weight; in particular, the meetings are frequent, short, and informal. Most of the work is by email or phone, with a database recording requested changes (see change request). +When gathering data for a region and language, it is important to have multiple sources for that data to produce the most commonly used data. The initial versions of the data were based on best available sources, and updates with new data and improvements are released twice a year with work by contributors inside and outside of the Unicode Consortium. +It is important to note that CLDR is a Repository, not a Registration. That is, contributors should NOT expect that their suggestions will simply be adopted into the repository; instead, they will be vetted by other contributors. +The CLDR Survey Tool is the main channel for collecting data, and bug/feature requests are tracked in a database (CLDR Bug Reports). +The final approval of the release of any version of CLDR rests with the CLDR Technical Committee. +Formal Technical Committee Procedures +For more information on the formal procedures for the Unicode CLDR Technical Committee, see the Technical Committee Procedures for the Unicode Consortium. +Specification Changes +The UTS #35: Locale Data Markup Language (LDML) specification is kept up to date with each release, with changed or added structure for new data types or other features. +Requests for changes are entered in the bug/feature request database (CLDR Bug Reports). +Structural changes are always backwards-compatible. That is, previous files will continue to work. Deprecated elements remain, although their usage is strongly discouraged. +There is a standing policy for structural changes that require non-trivial code for proper implementation, such as time zone fallback or alias mechanisms. These require design discussions in the Technical Committee that demonstrate correct function according to the proposed specification. 
+Data- Submission and Vetting +The contributors of locale data are expected to be language speakers residing in the country/region. In particular, national standards organizations are encouraged to be involved in the data vetting process. +There are two types of data in the repository: +Core data (See Core data for new locales): The content is collected from language experts typically with a CLDR Technical Committee member involvement, and is reviewed by the committee. This is required for a new language to be added in CLDR. See also Exemplar Character Sources. +Common locale data: This is the bulk of the CLDR data and data collection occurs twice a year using the Survey tool. (See How to Contribute.) +The following 4 states are used to differentiate the data contribution levels. The initial data contributions are normally marked as draft; this may be changed once the data is vetted. +Level 1: unconfirmed +Level 2: provisional +Level 3: contributed (= minimally approved) +Level 4: approved (equivalent to an absent draft attribute) +Implementations may choose the level at which they wish to accept data. They may choose to accept even unconfirmed data if having some data is better than no data for their purpose. Approved data are vetted by language speakers; however, this does not mean that the data is guaranteed to be error-free -- this is simply the best judgment of the vetters and the committee according to the process. +Survey Tool User Levels +There are multiple levels of access and control: +Vetter Level Number of Votes Description +TC Member 50 / 6 or 4 - Manage users in their organization +- Can vet and submit data for all locales (However, their vetting work is only done to correct issues.) 
+- Can see the email addresses for all vetters in their organization +- Only uses a 50 vote for items agreed to by the CLDR Technical Committee +- TC members may have a 6 or 4 regular vote depending on how actively their organization participates in the TC +TC Organization Managers 6 - Manage users in their organization +- Can vet and submit data for all locales (However, their vetting work is only done to correct issues.) +- Can see the email addresses for all vetters in their organization +Organization Managers 4 - Manage users in their organization +- Can vet and submit data for all locales (However, their vetting work is only done to correct issues.) +- Can see the email addresses for all vetters in their organization +TC Organization Vetter 6 - Can vet and submit data for a particular set of locales. +- Can see the email addresses for submitted data in their locales. +- Cannot manage other users. +Organization Vetter 4 - Can vet and submit data for a particular set of locales +- Can see the email addresses for submitted data in their locales. +- Cannot manage other users. +Guest Vetter 1 - Can vet and submit data for a particular set of locales +- Cannot see email addresses. +- Cannot manage other users. +Locked Vetter 0 - If a user is locked or removed, then their vote has zero weight. +These levels are decided by the Technical Committee and the TC representative for the respective organizations. +Unicode TC members (full/institutional/supporting) can assign their users to Regular or Guest level, and with approval of the TC, users at the Expert level. +TC Organizations that are fully engaged in the CLDR Technical Committee are given a higher vote level of 6 votes to reflect their level of expertise and coordination in the working of CLDR and the survey tool as compared to the normal organization vote level of 4 votes. +Liaison or associate members can assign to Guest, or to other levels with approval of the TC. 
+The liaison/associate member gets TC status in order to manage users, but gets Guest status in terms of voting, unless the committee approves a higher level. +Users assigned to "unicode.org" are normally assigned as Guest, but the committee can assign a different level. +Voting Process +Each user gets a vote on each value, but the strength of the vote varies according to the user level (see table above). +For each value, each organization gets a vote based on the maximum (not cumulative) strength of the votes of its users who voted on that item. +For example, if an organization has 10 vetters for one locale and the highest-level user who voted has a vote strength of 4, then the vote count attributed to the organization as a whole is 4 for that item. +Optimal Field Value +For each release, there is one optimal field value determined by the following: +Add up the votes for each value from each organization. +Sort the possible alternative values for a given field +by the most votes (descending) +then by UCA order of the values (ascending) +The first value is the optimal value (O). +The second value (if any) is the next best value (N). +Draft Status of Optimal Field Value +Let O be the optimal value's vote, N be the vote of the next best value (or zero if there is none), and G be the number of organizations that voted for the optimal value. Let oldStatus be the draft status of the previously released value. +Assign the draft status according to the first of the conditions below that applies: +Resulting Draft Status Condition +approved - O > N and O ≥ 8, for established locales* +- O > N and O ≥ 4, for other locales +contributed - O > N and O ≥ 4 and oldStatus < contributed +- O > N and O ≥ 2 and G ≥ 2 +provisional O ≥ N and O ≥ 2 +unconfirmed otherwise +Established locales are currently found in coverageLevels.xml, with approvalRequirement[@votes="8"] +Some specific items have an even higher threshold. 
See approvalRequirement elements in coverageLevels.xml for details. +If the oldStatus is better than the new draft status, then no change is made. Otherwise, the optimal value and its draft status are made part of the new release. +For example, if the new optimal value does not have the status of approved, and the previous release had an approved value (one that does not have an error and is not a fallback), then that previously-released value stays approved and replaces the optimal value in the following steps. +It is difficult to develop a formulation that provides for stability, yet allows people to make needed changes. The CLDR committee welcomes suggestions for tuning this mechanism. Such suggestions can be made by filing a new ticket. +Data- Resolution +After data has been collected and vetted, it must be cleaned of errors for the release: +Collision errors are resolved by retaining one of the values and removing the other(s). +The resolution choice is based on the judgment of the committee, typically according to which field is most commonly used. +When an item is removed, an alternate may then become the new optimal value. +All values with errors are removed. +Non-optimal values are handled as follows: +Those with no votes are removed. +Those with votes are marked with alt=proposed and given the draft status: unconfirmed +If a locale does not have minimal data (at least at a provisional level), then it may be excluded from the release. Where this is done, it may be restored to the repository for the next submission cycle. +This process can be fine-tuned by the Technical Committee as needed, to resolve any problems that turn up. A committee decision can also override any of the above process for any specific values. +For more information see the key links in CLDR Survey Tool (especially the Vetting Phase). +Notes: +If data has a formal problem, it can be fixed directly (in CVS) without going through the above process. 
Examples include: +syntactic problems in pattern, extra trailing spaces, inconsistent decimals, mechanical sweeps to change attributes, translatable characters not quoted in patterns, changing ' (punctuation mark) to curly apostrophe or s-cedilla to s-comma-below, removing disallowed exemplar characters (non-letter, number, mark, uppercase when there is a lowercase). +These are changed in-place, without changing the draft status. +Linguistically-sensitive data should always go through the survey tool. Examples include: +names of months, territories, number formats, changing ASCII apostrophe to U+02BC modifier letter apostrophe or U+02BB modifier letter turned comma, or U+02BD modifier letter reversed comma, adding/removing normal exemplar characters. +The TC committee can authorize bulk submissions of new data directly (CVS), with all new data marked draft="unconfirmed" (or other status decided by the committee), but only where the data passes the CheckCLDR console tests. +The survey tool does not currently handle all CLDR data. For data it doesn't cover, the regular bug system is used to submit new data or ask for revisions of this data. In particular: +Collation, transforms, or text segmentation, which are more complex. +For collation data, see the comparison charts at http://www.unicode.org/cldr/comparison_charts.html or the XML data at http://unicode.org/cldr/data/common/collation/ +For transforms, see the XML data at http://unicode.org/cldr/data/common/transforms/ +Non-linguistic locale data: +XML data: http://unicode.org/cldr/data/common/supplemental/ +HTML view: http://www.unicode.org/cldr/data/diff/supplemental/supplemental.html +Prioritization +There may be conflicting common practices or standards for a given country and language. Thus LDML provides keyword variants to reflect the different practices (for example, for German it allows the distinction between PHONEBOOK and DICTIONARY collation.). 
+When there is an existing national standard for a country that is widely accepted in practice, the goal is to follow that standard as much as possible. Where the common practice in the country deviates from the national standard, or if there are multiple conflicting common practices, or options in conforming to the national standard, or conflicting national standards, multiple variants may be entered into the CLDR, distinguished by keyword variants or variant locale identifiers. +Where a data value is identified as following a particular national standard (or other reference), the goal is to keep that data aligned with that standard. There is, however, no guarantee that data will be tagged with any or all of the national standards that it follows. +Maintenance Releases +Maintenance releases, such as 26.1, are issued whenever the standard identifiers change (that is, BCP 47 identifiers, Time zone identifiers, or ISO 4217 Currency identifiers). Updates to identifiers will also mean updating the English names for those identifiers. +Corrigenda may also be included in maintenance releases. Maintenance releases may also be issued if there are substantive changes to supplemental data (non-language data such as script info or transforms) or other critical data changes that affect the CLDR data user community. +The structure and DTD may change, but except for additions or for small bug fixes, data will not be changed in a way that would affect the content of resolved data. +Data Retention Policy +Public Feedback Process +The public can supply formal feedback into CLDR via the Survey Tool or by filing a Bug Report or Feature Request. There is also a public forum for questions at CLDR Mailing List (details on archives are found there). +There is also a members-only CLDR mailing list for members of the CLDR Technical Committee. +Public Review Issues may be posted in cases where broader public feedback is desired on a particular issue. 
+Be aware that changes and updates to CLDR will only be made in response to information entered in the Survey Tool or by filing a Bug Report or Feature Request. Discussion on public mailing lists is not monitored; no actions will be taken in response to such discussion -- only in response to filed bugs. The process of checking and entering data takes time and effort; so even when bugs/feature requests are accepted, it may take some time before they are in a release of CLDR. +Data Release Process +Version Numbering +The locale data is frozen per version. Once a version is released, it is never modified. Any changes, however minor, will mean a newer version of the locale data being released. The version numbering scheme is "xy.z", where z is incremented for maintenance releases, and xy is incremented for regular semi-annual releases as defined by the regular semi-annual schedule. +Release Schedule +Early releases of a version of the common locale data will be issued as either alpha or beta releases, available for public feedback. The dates for the next scheduled release will be on CLDR Project. +The schedule milestones are listed below. +Milestone Jira Phase Description +Survey Tool Shakedown Selected survey tool users try out the survey tool and supply feedback. The contributed data will be considered as real data. +Data Submission dsub All registered survey tool users can add data and vet (vote for) data. +Data Vetting dvet The survey tool users' focus shifts to resolving data differences/disputes and errors. +Data Resolution The data contribution is closed for general contributors. The Technical Committee will close remaining errors and issues found during the release process. +Alpha and Beta releases rc The release candidates are available for testing. Only showstoppers will be triaged and fixed at this point. +Release final Release completed with referenceable release notes and links. +Labels in the Jira column correspond to the phase field in Jira. 
The Phase field in Jira is used to identify tickets that need to be completed before the start of each milestone (table above). +Meetings and Communication +The currently-scheduled meetings are listed on the Unicode Calendar. Meetings are held by phone, every week at 8:00 AM Pacific Time (-08:00 GMT in winter, -07:00 GMT in summer). An additional meeting is scheduled every other Monday, depending on need and people's availability. +There is an internal email list for the Unicode CLDR Technical Committee, open to Unicode members and invited experts. All national standards bodies who are interested in locale data are also invited to become involved by establishing a Liaison membership in the Unicode Consortium, to gain access to this list. +Officers +The current Technical Committee Officers are: +Chair: Mark Davis (Google) +Vice-Chair: Annemarie Apple (Google) \ No newline at end of file diff --git a/docs/site/TEMP-TEXT-FILES/index-survey-tool.txt b/docs/site/TEMP-TEXT-FILES/index-survey-tool.txt new file mode 100644 index 00000000000..5aeb81462c1 --- /dev/null +++ b/docs/site/TEMP-TEXT-FILES/index-survey-tool.txt @@ -0,0 +1,16 @@ +CLDR Survey Tool +Survey Tool | Accounts | Guide | FAQ and Known Bugs +Introduction +CLDR provides key building blocks for software to support the world's languages, with the largest and most extensive standard repository of locale data available. +Translations in the Unicode Common Locale Data Repository are gathered and processed via what is called the Survey Tool, an online tool that can be used to view data for different languages and propose additions or changes. This tool provides a way to propose new localized data, see what others have proposed, and communicate with them to resolve differences. 
During each submission period, contributors from Unicode Consortium members, other organizations and the public at large are invited to review the data for their languages and countries, and propose new translations of terms or modifications, including language translations entirely new to the repository. +Below are the main pages to look at. +Schedule +For the Milestone schedule, see the navigation bar on the left. +Accounts +You don't need an account to view data for a particular language. If you wish to propose changes or additions, you will need an account. For how to get one, see Survey Tool Accounts. If you would like to add data for a new locale, see Adding New Locales. +Guide +For an overview of how the Survey Tool works, see the Survey Tool Guide. +New Fields +To see a summary of the new fields that will be in the next version of CLDR, see http://cldr.unicode.org/index/downloads/dev. At the top of that page you can follow a link to the beta release page. +Development +For developers, see the development pages. \ No newline at end of file diff --git a/docs/site/ddl.md b/docs/site/ddl.md new file mode 100644 index 00000000000..3d94769ce84 --- /dev/null +++ b/docs/site/ddl.md @@ -0,0 +1,17 @@ +--- +title: CLDR DDL Subcommittee +--- + +# CLDR DDL Subcommittee + +The Common Locale Data Repository (CLDR) is [widely used](https://cldr.unicode.org/index), and the content has grown dramatically over the years with participation by organizations of all types and sizes, as well as many individual contributors. + +Contributors for Digitally Disadvantaged Languages (DDL) face unique challenges. The CLDR-DDL subcommittee has been formed to evaluate mechanisms to make it easier for contributors for DDLs to: + +1. become contributors to CLDR +2. improve the coverage for their language in CLDR +3. raise the status of their contributions, so that the CLDR data for their language is incorporated into more products. 
+ +The DDL Subcommittee has started to meet every other week as of June, 2023. + +![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) \ No newline at end of file diff --git a/docs/site/images/keyboard-workgroup-keyboards.jpeg b/docs/site/images/keyboard-workgroup-keyboards.jpeg new file mode 100644 index 00000000000..b7f08465b9a Binary files /dev/null and b/docs/site/images/keyboard-workgroup-keyboards.jpeg differ diff --git a/docs/site/images/keyboard-workgroup-rowkeys.png b/docs/site/images/keyboard-workgroup-rowkeys.png new file mode 100644 index 00000000000..bf08a10b194 Binary files /dev/null and b/docs/site/images/keyboard-workgroup-rowkeys.png differ diff --git a/docs/site/index/bcp47-extension.md b/docs/site/index/bcp47-extension.md new file mode 100644 index 00000000000..ce9b98107d2 --- /dev/null +++ b/docs/site/index/bcp47-extension.md @@ -0,0 +1,45 @@ +--- +title: Unicode Extensions for BCP 47 +--- + +# Unicode Extensions for BCP 47 + +[IETF BCP 47 *Tags for Identifying Languages*](https://www.rfc-editor.org/info/bcp47) defines the language identifiers (tags) used on the Internet and in many standards. It has an extension mechanism that allows additional information to be included. The Unicode Consortium is the maintainer of the extension ‘u’ for Locale Extensions, as described in [rfc6067](https://datatracker.ietf.org/doc/html/rfc6067), and the extension 't' for Transformed Content, as described in [rfc6497](https://datatracker.ietf.org/doc/html/rfc6497). + +- The subtags available for use in the 'u' extension provide language tag extensions that provide for additional information needed for identifying locales. The 'u' subtags consist of a set of keys and associated values (types). 
For example, a locale identifier for British English with numeric collation has the following form: en-GB-**u-kn-true** +- The subtags available for use in the 't' extension provide language tag extensions that provide for additional information needed for identifying transformed content, or a request to transform content in a certain way. For example, the language tag "ja-Kana-t-it" can be used as a content tag indicating Japanese Katakana transformed from Italian. It can also be used as a request for a given transformation. + + +For more details on the valid subtags for these extensions, their syntax, and their meanings, see LDML Section 3.7 [*Unicode BCP 47 Extension Data*](https://www.unicode.org/reports/tr35/#Locale_Extension_Key_and_Type_Data). + +## Machine-Readable Files for Validity Testing + +Beginning with CLDR version 1.7.2, machine-readable files are available listing the valid attributes, keys, and types for each successive version of [LDML](https://unicode.org/reports/tr35/). The most recently released version is always available at http://unicode.org/Public/cldr/latest/ in a file of the form cldr-common\*.zip (in older versions the file was of the form cldr-core\*.zip). Inside that file, the directory "common/bcp47/" contains the data files defining the valid attributes, keys, and types. + +The BCP 47 data is also currently maintained in a source code repository, with each release tagged, for viewing directly without unzipping. For example, see https://github.com/unicode-org/cldr/tree/release-38/common/bcp47. The current development snapshot is found at https://github.com/unicode-org/cldr/tree/master/common/bcp47. 
+ +All releases, including the latest, are listed on http://cldr.unicode.org/index/downloads, with a link to each respective data directory under the column heading **Data**, and direct access to the repository under the column heading **GitHub Tag**. + +For example, the timezone.xml file looks like the following: + +\ + +\ + +\ + +\ + +Using this data, an implementation would determine that "fr-u-tz-adalv" and "fr-u-tz-aedxb" are both valid. Some data in the CLDR data files must also be validated against [Appendix Q](https://unicode.org/reports/tr35/#Locale_Extension_Key_and_Type_Data) of [LDML](https://unicode.org/reports/tr35/). For example, LDML defines the type 'codepoints' to define specific code point ranges in Unicode for specific purposes. + +## Version Information + +The following is not necessary for correct validation of the -u- extension, but may be useful for some readers. + +Each release has an associated data directory of the form "http://unicode.org/Public/cldr/\", where "\" is replaced by the release number. The version number for any file is given by the directory from which it was downloaded. If that information is no longer available, the version can still be found by looking at the common/dtd/ldml.dtd file in the cldr-common\*.zip file (for older versions, the core.zip file), at the element cldrVersion, such as the following. This information is also accessible with a validating XML parser. 
+ +\ + +For each release after CLDR 1.8, types introduced in that release are also marked in the data files by the XML attribute "since", such as in the following example: \ + +![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) \ No newline at end of file diff --git a/docs/site/index/charts.md b/docs/site/index/charts.md new file mode 100644 index 00000000000..6a896612019 --- /dev/null +++ b/docs/site/index/charts.md @@ -0,0 +1,43 @@ +--- +title: CLDR Charts +--- + +# CLDR Charts + +The Unicode CLDR Charts provide different ways to view the Common Locale Data Repository data. + +- [Latest](https://www.unicode.org/cldr/charts/latest) - The charts for the latest release version +- [Dev](https://www.unicode.org/cldr/charts/dev) - A snapshot of data under development +- [Previous](https://cldr.unicode.org/index/downloads) - Previous available charts are linked from the download page in the Charts column + +The format of most of the fields in the charts will be clear from the Name and ID, such as the months of the year. The format for others, such as the date or time formats, is structured and requires more interpretation. For more information, see [UTS #35: Locale Data Markup Language (LDML)](http://www.unicode.org/reports/tr35/). + +Most charts have "double links" somewhere in each row. These are links that put the address of that row into the address bar of the browser for copying. + +*Note that not all CLDR data is included in the charts.* + +### Version Deltas + +- [**Delta Data**](https://www.unicode.org/cldr/charts/latest/delta/index.html) - Data that changed in the current release. +- [**Delta DTDs**](https://www.unicode.org/cldr/charts/latest/supplemental/dtd_deltas.html) - Differences between CLDR DTD's over time. 
+ + +### Locale-Based Data + +- [**Verification**](https://www.unicode.org/cldr/charts/latest/verify/index.html) - Constructed data for verification: Dates, Timezones, Numbers +- [**Summary**](https://www.unicode.org/cldr/charts/latest/summary/root.html) - Provides a summary view of the main locale data. Language locales (those with no territory or variant) are presented with fully resolved data; the inherited or aliased data can be hidden if desired. Other locales do not show inherited or aliased data, just the differences from the respective language locale. The English value is provided for comparison (shown as "=" if it is equal to the localized value, and n/a if not available). The Sublocales column shows variations across locales. Hovering over each Sublocale value shows a pop-up with the locales that have that value. +- [**By-Type**](https://www.unicode.org/cldr/charts/latest/by_type/index.html) - Provides a side-by-side comparison of data from different locales for each field. For example, one can see all the locales that are left-to-right, or all the different translations of the Arabic script across languages. Data that is unconfirmed or provisional is marked by a red-italic locale ID, such as *·bn\_BD·*. +- [**Character Annotations**](https://www.unicode.org/cldr/charts/latest/annotations/index.html) - The CLDR emoji character annotations. +- [**Subdivision Names**](https://www.unicode.org/cldr/charts/latest/subdivisionNames/index.html) - The (draft) CLDR subdivision names (names for states, provinces, cantons, etc.). +- [**Collation Tailorings**](https://www.unicode.org/cldr/charts/latest/collation/index.html) - Collation charts (draft) for CLDR locales. + + +### Other Data + +- [**Supplemental Data**](https://www.unicode.org/cldr/charts/latest/supplemental/index.html) - General data that is not part of the locale hierarchy but is still part of CLDR. 
Includes: *plural rules, day-period rules, language matching, language-script information, territories (countries),* and their *subdivisions, timezones,* and so on. +- **Transform** - (Disabled temporarily) Some of the transforms in CLDR: the transliterations between different scripts. For more on transliterations, see [Transliteration Guidelines](https://cldr.unicode.org/index/cldr-spec/transliteration-guidelines). +- [**Keyboards**](https://www.unicode.org/cldr/charts/latest/keyboards/index.html) - Provides a view of keyboard data: layouts for different locales, mappings from characters to keyboards, and from keyboards to characters. + +For more details on the locale data collection process, please see the [CLDR process](https://cldr.unicode.org/index/process). For filing or viewing bug reports, see [CLDR Bug Reports](https://github.com/unicode-org/cldr/blob/main/docs/requesting_changes.md). + +![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) \ No newline at end of file diff --git a/docs/site/index/keyboard-workgroup.md b/docs/site/index/keyboard-workgroup.md new file mode 100644 index 00000000000..5e1c774fe54 --- /dev/null +++ b/docs/site/index/keyboard-workgroup.md @@ -0,0 +1,82 @@ +--- +title: CLDR Keyboard Subcommittee +--- + +# CLDR Keyboard Subcommittee + +The CLDR Keyboard Subcommittee is developing a new cross-platform standard XML format for use by keyboard authors for inclusion in the CLDR source repository. + +## News + +2023-Feb-29: The CLDR-TC has authorized the proposed specification to be released as stable (out of Technical Preview). + +2023-May-15: The CLDR-TC has authorized [Public Review Issue #476](https://www.unicode.org/review/pri476/) of the proposed specification, as a "Technical Preview." The PRI closed on 2023-Jul-15. 
+ +## Background + +**CLDR (Common Locale Data Repository)** + +Computing devices have become increasingly personal and increasingly affordable to the point that they are now within reach of most people on the planet. The diverse linguistic requirements of the world's 7+ billion people do not scale to traditional models of software development. In response to this, Unicode [CLDR](https://cldr.unicode.org/) has emerged as a standards-based solution that empowers specialist and community input, as a means of balancing the needs of language communities with the technologies of major platform and service providers. + +![alt-text](../images/keyboard-workgroup-keyboards.jpeg) + +### The challenge and promise of Keyboards + +Text input is a core component of most computing experiences and is most commonly achieved using a keyboard, whether hardware or virtual (on-screen or touch). However, keyboard support for most of the world's languages is either completely missing or often does not adequately support the input needs of language communities. Improving text input support for minority languages is an essential part of the Unicode mission. + +Keyboard data is currently completely platform-specific. Consequently, language communities and other keyboard authors must see their designs developed independently for every platform/operating system, resulting in unnecessary duplication of technical and organizational effort. + +There is no central repository or contact point for this data, meaning that such authors must separately and independently contact all platform/operating system developers. + +## LDML: The universal interchange format for keyboards + +The CLDR Keyboard Subcommittee is currently rewriting and redeveloping the existing LDML (XML) definition for keyboards (UTS#35 part 7) in order to define core keyboard-based text input requirements for the world's languages. 
This format allows the physical and virtual (on-screen or touch) keyboard layouts for a language to be defined in a single file. Input Method Editors (IME) or other input methods are not currently in scope for this format. + +![alt-text](../images/keyboard-workgroup-rowkeys.png) + +## CLDR: A home for the world's newest keyboards + +Today, there are many existing platform-specific implementations and keyboard definitions. This project does not intend to remove or replace existing well-established support. + +The goal of this project is that, **where otherwise unsupported languages are concerned**, CLDR becomes the common source for keyboard data, for use by platform/operating system developers and vendors. + +As a result, CLDR will also become the point of contact for keyboard authors and language communities to submit new or updated keyboard layouts to serve those user communities. CLDR has already become the definitive and publicly available source for the world's locale data. + +## Unicode: Enabling the world's languages + +Keyboard support is part of a multi-step, often multi-year process of enabling a new language or script. + +Three critical parts of initial support for a language in content are: + +- Encoding, in [the Unicode Standard](https://www.unicode.org/standard/standard.html) +- Display, including fonts and text layout +- Input + + +Today, the vast majority of the languages of the world are already in the Unicode encoding. The open-source Noto font provides a wide range of fonts to support display, and the Unicode character properties play a vital role in display. However, input support often lags many years behind when a script is added to Unicode. + +The LDML keyboard format, and the CLDR repository, will make it much easier to deliver text input. + +## Common Questions + +### What is the history of this effort? + +In 2012, the original LDML keyboard format was designed to describe keyboards for comparative purposes. 
In 2018, a [PRI was created](http://blog.unicode.org/2018/01/unicode-ldml-keyboard-enhancements.html) soliciting further feedback. + +The CLDR Keyboard Subcommittee was formed and has been meeting since mid-2020. It quickly became apparent that the existing LDML format was insufficient for implementing new keyboard layouts. + +### What is the current status? + +Release + +Updates to LDML (UTS#35) Part 7: Keyboards are scheduled to be released as part of [CLDR v45](https://cldr.unicode.org/index/downloads/cldr-45). + +Implementations + +- The [SIL Keyman](https://keyman.com/ldml/) project is actively working on an open-source implementation of the LDML format. + +### How can I get involved? + +If you want to be engaged in this workgroup, please contact the CLDR Keyboard Subcommittee via the [Unicode contact form](https://corp.unicode.org/reporting/staff-contact.html). + +![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) \ No newline at end of file diff --git a/docs/site/index/process.md b/docs/site/index/process.md new file mode 100644 index 00000000000..eccd63cd543 --- /dev/null +++ b/docs/site/index/process.md @@ -0,0 +1,206 @@ +--- +title: CLDR Process +--- + +# CLDR Process + +## Introduction + +This document describes the Unicode CLDR Technical Committee's process for data collection, resolution, public feedback and release. + +- The process is designed to be light-weight; in particular, the meetings are frequent, short, and informal. Most of the work is by email or phone, with a database recording requested changes (See [change request](http://cldr.unicode.org/index/bug-reports)). +- When gathering data for a region and language, it is important to have multiple sources for that data to produce the most commonly used data. The initial versions of the data were based on best available sources, and updates with new and improvements are released twice a year with work by contributors inside and outside of the Unicode Consortium. 
+- It is important to note that CLDR is a Repository, not a Registration. That is, contributors should NOT expect that their suggestions will simply be adopted into the repository; instead, they will be vetted by other contributors. +- The [CLDR Survey Tool](http://www.unicode.org/cldr/survey_tool.html) is the main channel for collecting data, and bug/feature requests are tracked in a database ([CLDR Bug Reports](http://www.unicode.org/cldr/filing_bug_reports.html)). +- Final approval of the release of any version of CLDR rests with the CLDR Technical Committee. + +## Formal Technical Committee Procedures + +For more information on the formal procedures for the Unicode CLDR Technical Committee, see the [Technical Committee Procedures for the Unicode Consortium](http://www.unicode.org/consortium/tc-procedures.html). + +## Specification Changes + +The [UTS #35: Locale Data Markup Language (LDML)](http://www.unicode.org/reports/tr35/) specification is kept up to date with each release, with changed or added structure for new data types or other features. + +- Requests for changes are entered in the bug/feature request database ([CLDR Bug Reports](http://www.unicode.org/cldr/filing_bug_reports.html)). +- Structural changes are always backwards-compatible. That is, previous files will continue to work. Deprecated elements remain, although their usage is strongly discouraged. +- There is a standing policy for structural changes that require non-trivial code for proper implementation, such as time zone fallback or alias mechanisms. These require design discussions in the Technical Committee that demonstrate correct function according to the proposed specification. + +## Data Submission and Vetting + +The contributors of locale data are expected to be language speakers residing in the country/region. In particular, national standards organizations are encouraged to be involved in the data vetting process. 
+ +There are two types of data in the repository: + +- **Core data** (See [Core data for new locales](http://cldr.unicode.org/index/cldr-spec/minimaldata)): The content is collected from language experts typically with a CLDR Technical Committee member involvement, and is reviewed by the committee. This is required for a new language to be added in CLDR. See also [Exemplar Character Sources](http://www.unicode.org/cldr/filing_bug_reports.html#Exemplar_Characters). +- **Common locale data**: This is the bulk of the CLDR data and data collection occurs twice a year using the Survey tool. (See [How to Contribute](http://cldr.unicode.org/#TOC-How-to-Contribute-).) + + +The following 4 states are used to differentiate the data contribution levels. The initial data contributions are normally marked as draft; this may be changed once the data is vetted. + +- Level 1: **unconfirmed** +- Level 2: **provisional** +- Level 3: **contributed (= minimally approved)** +- Level 4: **approved** (equivalent to an absent draft attribute) + +Implementations may choose the level at which they wish to accept data. They may choose to accept even **unconfirmed** data if having some data is better than no data for their purpose. Approved data are vetted by language speakers; however, this does not mean that the data is guaranteed to be error-free -- this is simply the best judgment of the vetters and the committee according to the process. + +### Survey Tool User Levels + +There are multiple levels of access and control: + +| **Vetter Level** | **Number of Votes** | **Description** | | +|---|---|---|:---:| +| *TC Member* | 50 / 6 or 4 | - Manage users in their organization
- Can vet and submit data for all locales (However, their vetting work is only done to correct issues.)
- Can see the email addresses for all vetters in their organization
- Only uses a 50 vote for items agreed to by the CLDR Technical Committee
- TC members may have a 6 or 4 regular vote depending on how actively their organization participates in the TC | | +| *TC Organization Managers* | 6 | - Manage users in their organization
- Can vet and submit data for all locales (However, their vetting work is only done to correct issues.)
- Can see the email addresses for all vetters in their organization | | +| *Organization Managers* | 4 | - Manage users in their organization
- Can vet and submit data for all locales (However, their vetting work is only done to correct issues.)
- Can see the email addresses for all vetters in their organization | | +| *TC Organization Vetter* | 6 | - Can vet and submit data for a particular set of locales.
- Can see the email addresses for submitted data in their locales.
- Cannot manage other users. | | +| *Organization Vetter* | 4 | - Can vet and submit data for a particular set of locales
- Can see the email addresses for submitted data in their locales.
- Cannot manage other users. | | +| *Guest Vetter* | 1 | - Can vet and submit data for a particular set of locales
- Cannot see email addresses.
- Cannot manage other users. | | +| *Locked Vetter* | 0 | - If a user is locked or removed, then their vote is considered a zero weight. | | + +These levels are decided by the technical committee and the TC representative for the respective organizations. + +- Unicode TC members (full/institutional/supporting) can assign its users to Regular or Guest level, and with approval of the TC, users at the Expert level. +- TC Organizations that are fully engaged in the CLDR Technical Committee are given a higher vote level of 6 votes to reflect their level of expertise and coordination in the working of CLDR and the survey tool as compared to the normal organization vote level of 4 votes +- Liaison or associate members can assign to Guest, or to other levels with approval of the TC. + - The liaison/associate member him/herself gets TC status in order to manage users, but gets a Guest status in terms of voting, unless the committee approves a higher level. +- Users assigned to "[unicode.org](http://unicode.org/)" are normally assigned as Guest, but the committee can assign a different level. + +### Voting Process + +- Each user gets a vote on each value, but the strength of the vote varies according to the user level (see table above). +- For each value, each organization gets a vote based on the maximum (not cumulative) strength of the votes of its users who voted on that item. +- For example, if an organization has 10 Vetters for one locale, if the highest user level who voted has user level of 4 votes, then the vote count attributed to the organization as a whole is 4 for that item. + +### Optimal Field Value + +For each release, there is one optimal field value determined by the following: + +- Add up the votes for each value from each organization. +- Sort the possible alternative values for a given field + - by the most votes (descending) + - then by UCA order of the values (ascending) +- The first value is the optimal value (**O**). 
+- The second value (if any) is the next best value (**N**). + +### Draft Status of Optimal Field Value + +1. Let **O** be the optimal value's vote, **N** be the vote of the next best value (or zero if there is none), and G be the number of organizations that voted for the optimal value. Let **oldStatus** be the draft status of the previously released value. + +2. Assign the draft status according to the first of the conditions below that applies: + +| **Resulting Draft Status** | **Condition** | +|---|---| +| *approved* | - O > N and O ≥ 8, for *established* locales*
- O > N and O ≥ 4, for other locales | +| *contributed* | - O > N and O ≥ 4 and oldStatus < contributed
- O > N and O ≥ 2 and G ≥ 2 | +| *provisional* | O ≥ N and O ≥ 2 | +| *unconfirmed* | *otherwise* | + + +1. *Established* locales are currently found in [coverageLevels.xml](https://github.com/unicode-org/cldr/blob/master/common/supplemental/coverageLevels.xml), with approvalRequirement\[@votes="8"\] + - Some specific items have an even higher threshold. See approvalRequirement elements in [coverageLevels.xml](http://unicode.org/repos/cldr/trunk/common/supplemental/coverageLevels.xml) for details. +2. If the oldStatus is better than the new draft status, then no change is made. Otherwise, the optimal value and its draft status are made part of the new release. + - For example, if the new optimal value does not have the status of **approved**, and the previous release had an **approved** value (one that does not have an error and is not a fallback), then that previously-released value stays **approved** and replaces the optimal value in the following steps. + +It is difficult to develop a formulation that provides for stability, yet allows people to make needed changes. The CLDR committee welcomes suggestions for tuning this mechanism. Such suggestions can be made by filing a [new ticket](https://cldr.unicode.org/index/bug-reports#TOC-Filing-a-Ticket). + +## Data Resolution + +After data has been collected and vetted, it must be cleared of errors before the release: + +- Collision errors are resolved by retaining one of the values and removing the other(s). +- The resolution choice is based on the judgment of the committee, typically according to which field is most commonly used. + - When an item is removed, an alternate may then become the new optimal value. + - All values with errors are removed. +- Non-optimal values are handled as follows: + - Those with no votes are removed.
+ - Those with votes are marked with *alt=proposed* and given the draft status: **unconfirmed** + +If a locale does not have minimal data (at least at a provisional level), then it may be excluded from the release. Where this is done, it may be restored to the repository for the next submission cycle. + +This process can be fine-tuned by the Technical Committee as needed, to resolve any problems that turn up. A committee decision can also override any part of the above process for specific values. + +For more information see the key links in [CLDR Survey Tool](http://www.unicode.org/cldr/survey_tool.html) (especially the Vetting Phase). + +**Notes:** +- If data has a formal problem, it can be fixed directly (in CVS) without going through the above process. Examples include: + - syntactic problems in patterns, extra trailing spaces, inconsistent decimals, mechanical sweeps to change attributes, translatable characters not quoted in patterns, changing ' (punctuation mark) to curly apostrophe or s-cedilla to s-comma-below, removing disallowed exemplar characters (non-letter, number, mark, uppercase when there is a lowercase). + - These are changed in-place, without changing the draft status. +- Linguistically-sensitive data should always go through the survey tool. Examples include: + - names of months, territories, number formats, changing ASCII apostrophe to U+02BC modifier letter apostrophe or U+02BB modifier letter turned comma, or U+02BD modifier letter reversed comma, adding/removing normal exemplar characters. +- The Technical Committee can authorize bulk submissions of new data directly (CVS), with all new data marked draft="unconfirmed" (or other status decided by the committee), but only where the data passes the CheckCLDR console tests. +- The survey tool does not currently handle all CLDR data. For data it doesn't cover, the regular bug system is used to submit new data or ask for revisions of this data. 
In particular: + - Collation, transforms, or text segmentation, which are more complex. + - For collation data, see the comparison charts at [http://www.unicode.org/cldr/comparison\_charts.html](http://www.unicode.org/cldr/comparison_charts.html) or the XML data at [http://unicode.org/cldr/data/common/collation/](http://unicode.org/cldr/data/common/collation/) + - For transforms, see the XML data at [http://unicode.org/cldr/data/common/transforms/](http://unicode.org/cldr/data/common/transforms/) + - Non-linguistic locale data: + - XML data: [http://unicode.org/cldr/data/common/supplemental/](http://unicode.org/cldr/data/common/supplemental/) + - HTML view: [http://www.unicode.org/cldr/data/diff/supplemental/supplemental.html](http://www.unicode.org/cldr/data/diff/supplemental/supplemental.html) + + +### Prioritization + +There may be conflicting common practices or standards for a given country and language. Thus LDML provides keyword variants to reflect the different practices (for example, for German it allows the distinction between PHONEBOOK and DICTIONARY collation.). + +When there is an existing national standard for a country that is widely accepted in practice, the goal is to follow that standard as much as possible. Where the common practice in the country deviates from the national standard, or if there are multiple conflicting common practices, or options in conforming to the national standard, or conflicting national standards, multiple variants may be entered into the CLDR, distinguished by keyword variants or variant locale identifiers. + +Where a data value is identified as following a particular national standard (or other reference), the goal is to keep that data aligned with that standard. There is, however, no guarantee that data will be tagged with any or all of the national standards that it follows. 
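The vote resolution rules in the Voting Process, Optimal Field Value, and Draft Status sections above can be sketched in Python as follows. This is an illustration only, not the Survey Tool's implementation: the function name `resolve_value`, the `(organization, value, strength)` tuple layout, the skipping of zero-weight votes, and the use of plain string order in place of UCA order are all assumptions. The rule that keeps a better previously released status (note 2 under the status table) is left out.

```python
from collections import defaultdict

# Draft statuses from weakest to strongest, as listed in the document.
STATUS_ORDER = ["unconfirmed", "provisional", "contributed", "approved"]


def resolve_value(votes, established=False, old_status="unconfirmed"):
    """Pick the optimal value and its new draft status for one field.

    votes: iterable of (organization, value, strength) tuples (hypothetical layout).
    Returns (optimal_value, draft_status) before any comparison with the
    previously released value.
    """
    # For each value, an organization counts once, at the maximum (not the sum)
    # of the strengths of its users who voted for that value.
    per_value = defaultdict(dict)
    for org, value, strength in votes:
        if strength <= 0:
            continue  # assumption: zero-weight (locked) votes are ignored
        orgs = per_value[value]
        orgs[org] = max(strength, orgs.get(org, 0))

    if not per_value:
        return None, "unconfirmed"

    # Add up organization votes per value, then sort by votes (descending)
    # and by value (ascending; plain string order stands in for UCA order).
    totals = {value: sum(orgs.values()) for value, orgs in per_value.items()}
    ranked = sorted(totals, key=lambda value: (-totals[value], value))

    best = ranked[0]
    o = totals[best]                                # O: votes for the optimal value
    n = totals[ranked[1]] if len(ranked) > 1 else 0  # N: votes for the next best value
    g = len(per_value[best])                        # G: organizations behind the optimal value

    # The first matching condition in the status table wins.
    approval_threshold = 8 if established else 4
    below_contributed = STATUS_ORDER.index(old_status) < STATUS_ORDER.index("contributed")
    if o > n and o >= approval_threshold:
        status = "approved"
    elif o > n and ((o >= 4 and below_contributed) or (o >= 2 and g >= 2)):
        status = "contributed"
    elif o >= n and o >= 2:
        status = "provisional"
    else:
        status = "unconfirmed"
    return best, status
```

For instance, under these assumptions an established locale where two organizations each cast a 4-strength vote for the same value reaches O = 8 and is approved, while several 4-strength votes from a single organization still count as only 4.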
+ +### Maintenance Releases + +Maintenance releases, such as 26.1, are issued whenever the standard identifiers change (that is, BCP 47 identifiers, time zone identifiers, or ISO 4217 currency identifiers). Updates to identifiers will also mean updating the English names for those identifiers. + +Corrigenda may also be included in maintenance releases. Maintenance releases may also be issued if there are substantive changes to supplemental data (non-language data such as script info or transforms) or other critical data changes that affect the community of CLDR data users. + +The structure and DTD may change, but except for additions or for small bug fixes, data will not be changed in a way that would affect the content of resolved data. + +[**Data Retention Policy**](/index/process/cldr-data-retention-policy) + +## Public Feedback Process + +The public can supply formal feedback into CLDR via the [Survey Tool](http://unicode.org/cldr/apps/survey/) or by filing a [Bug Report or Feature Request](http://www.unicode.org/cldr/filing_bug_reports.html). There is also a public forum for questions at [CLDR Mailing List](https://www.unicode.org/consortium/distlist.html#cldr_list) (details on archives are found there). + +There is also a members-only [CLDR mailing list](https://www.unicode.org/members/index.html#cldr) for members of the CLDR Technical Committee. + +[Public Review Issues](http://www.unicode.org/review/) may be posted in cases where broader public feedback is desired on a particular issue. + +Be aware that changes and updates to CLDR will only be made in response to information entered in the [Survey Tool](http://unicode.org/cldr/apps/survey/) or filed as a [Bug Report or Feature Request](http://www.unicode.org/cldr/filing_bug_reports.html). Discussion on public mailing lists is not monitored; no actions will be taken in response to such discussion -- only in response to filed bugs. 
The process of checking and entering data takes time and effort; so even when bugs/feature requests are accepted, it may take some time before they are in a release of CLDR. + +## Data Release Process + +### Version Numbering + +The locale data is frozen per version. Once a version is released, it is never modified. Any changes, however minor, will mean a newer version of the locale data being released. The version numbering scheme is "xy.z", where z is incremented for maintenance releases, and xy is incremented for regular semi-annual releases as defined by the [regular semi-annual schedule](http://cldr.unicode.org/index#TOC-General-Schedule-). + +### Release Schedule + +Early releases of a version of the common locale data will be issued as either alpha or beta releases, available for public feedback. The dates for the next scheduled release will be on [CLDR Project](http://www.unicode.org/cldr/index.html). + +The schedule milestones are listed below. + +| **Milestone** | **JiraPhase** | **Description** | +|---|---|---| +| **Survey Tool Shakedown** | | Selected survey tool users try out the survey tool and supply feedback. The contributed data will be considered as real data. | +| **Data Submission** | dsub | All registered survey tool users can add data and vet (vote for) data. | +| **Data Vetting** | dvet | The focus of survey tool users shifts to resolving data differences/disputes and errors. | +| **Data Resolution** | | Data contribution is closed for general contributors. The Technical Committee will close remaining errors and issues found during the release process. | +| **Alpha and Beta releases** | rc | The release candidates are available for testing. Only showstoppers will be triaged and fixed at this point. | +| **Release** | final | Release completed with referenceable release notes and links. | + +Labels in the **JiraPhase** column correspond to the **phase** field in Jira. 
The Phase field in Jira is used to identify tickets that need to be completed ***before*** the start of each milestone (see the table above). + +## Meetings and Communication + +The currently-scheduled meetings are listed on the [Unicode Calendar](http://www.unicode.org/timesens/calendar.html). Meetings are held by phone, every week at 8:00 AM Pacific Time (-08:00 GMT in winter, -07:00 GMT in summer). An additional meeting is scheduled every other Monday, depending on need and people's availability. + +There is an internal email list for the Unicode CLDR Technical Committee, open to Unicode members and invited experts. All national standards bodies that are interested in locale data are also invited to become involved by establishing a [Liaison membership](http://www.unicode.org/consortium/join.html) in the Unicode Consortium, to gain access to this list. + +## Officers + +The current Technical Committee Officers are: + +- Chair: Mark Davis (Google) +- Vice-Chair: Annemarie Apple (Google) + +![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) \ No newline at end of file diff --git a/docs/site/index/survey-tool.md b/docs/site/index/survey-tool.md new file mode 100644 index 00000000000..782d2dbce13 --- /dev/null +++ b/docs/site/index/survey-tool.md @@ -0,0 +1,37 @@ +--- +title: CLDR Survey Tool +--- + +# CLDR Survey Tool + +[**Survey Tool**](https://st.unicode.org/cldr-apps/v#locales///) **|** [**Accounts**](https://cldr.unicode.org/index/survey-tool/survey-tool-accounts) **|** [**Guide**](https://cldr.unicode.org/translation/getting-started/guide) **|** [**FAQ and Known Bugs**](https://cldr.unicode.org/index/survey-tool/faq-and-known-bugs) + +### Introduction + +CLDR provides key building blocks for software to support the world's languages, with the largest and most extensive standard repository of locale data available. 
+ +Translations in the Unicode Common Locale Data Repository are gathered and processed via what is called the Survey Tool, an online tool that can be used to view data for different languages and propose additions or changes. This tool provides a way to propose new localized data, see what others have proposed, and communicate with them to resolve differences. During each submission period, contributors from Unicode Consortium members, other organizations and the public at large are invited to review the data for their languages and countries, and propose new translations of terms or modifications, including language translations entirely new to the repository. + +Below are the main pages to look at. + +### Schedule + +For the Milestone schedule, see the navigation bar on the left. + +### Accounts + +You don't need an account to view data for a particular language. If you wish to propose changes or additions, you will need an account. For how to get one, see [Survey Tool Accounts](https://cldr.unicode.org/index/survey-tool/survey-tool-accounts). If you would like to add data for a new locale, see [Adding New Locales](https://github.com/unicode-org/cldr/blob/main/docs/requesting_changes.md#adding-new-locales). + +### Guide + +For an overview of how the Survey Tool works, see the [Survey Tool Guide](https://cldr.unicode.org/translation/getting-started/guide). + +### New Fields + +To see a summary of the new fields that will be in the next version of CLDR, see http://cldr.unicode.org/index/downloads/dev. At the top of that page you can follow a link to the beta release page. + +### Development + +For developers, see the [development pages](https://cldr.unicode.org/development/cldr-development-site). + +![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) \ No newline at end of file