CLDR-17566 text diffs and minor changes

unicode-org · Jun 27, 2024 · 7d6c205
1 parent 02af3d3

Showing 6 changed files with 68 additions and 24 deletions.
diff --git a/docs/site/TEMP-TEXT-FILES/characters.txt b/docs/site/TEMP-TEXT-FILES/characters.txt
@@ -2,12 +2,18 @@ Alphabetic Information
 Ellipsis Patterns
 Ellipsis patterns are used in a display when the text is too long to be shown. It will be used in environments where there is very little space, so it should be just one character; where that really can't work, it should be as short as possible.
 There are three different possible patterns that need to be translated. Typically the same character is used in all three, but three choices are provided just in case different characters would be appropriate in different contexts, for some languages.
+English Pattern	English Example	Meaning
+{0}… or { FIRST_PART_OF_TEXT }…	The quick brown f…	The end of the string is being truncated.
+{0}…{1} or { FIRST_PART_OF_TEXT }…{ LAST_PART_OF_TEXT }	The quic…azy dog.	The middle of the string is being truncated.
+…{1} or …{ LAST_PART_OF_TEXT }	…ver the lazy dog.	The start of the string is being truncated.
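A minimal Python sketch of how the three patterns might be applied when shortening a string. The function `truncate`, its `max_len` budget, and the choice to split the kept text evenly for the middle pattern are hypothetical, made only for illustration; real UI code typically measures rendered width rather than counting code points.

```python
# Hypothetical helper: shorten text using CLDR-style ellipsis patterns.
# Length is counted in code points here; real UIs usually measure display width.

ENGLISH_PATTERNS = {
    "end": "{0}…",        # keep the start, truncate the end
    "middle": "{0}…{1}",  # keep both ends, truncate the middle
    "start": "…{1}",      # keep the end, truncate the start
}


def truncate(text, max_len, mode="end", patterns=ENGLISH_PATTERNS):
    """Return text shortened to at most max_len code points using patterns[mode]."""
    if len(text) <= max_len:
        return text
    pattern = patterns[mode]
    # Characters contributed by the pattern itself (ellipsis, spaces) use up the budget.
    budget = max(max_len - len(pattern.format("", "")), 0)
    first, last = "", ""
    if mode == "end":
        first = text[:budget]
    elif mode == "start":
        last = text[len(text) - budget:]
    else:
        head = (budget + 1) // 2  # give any odd character to the start
        tail = budget - head
        first, last = text[:head], text[len(text) - tail:]
    return pattern.format(first, last)


sentence = "The quick brown fox jumps over the lazy dog."
print(truncate(sentence, 18))            # The quick brown f…
print(truncate(sentence, 17, "middle"))  # The quic…azy dog.
print(truncate(sentence, 18, "start"))   # …ver the lazy dog.
```

Passing a dictionary of spaced patterns (such as "{0} …") works the same way, because the extra space is counted against the budget.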
 English uses the same basic text for all three cases, and just changes the placeholders. An example of where a language might use different characters is where a space should come between the placeholder and the ellipsis. In that case, the patterns would be as in the second column below.
+English Pattern	With Spaces
+{0}…	{0} …
+{0}…{1}	{0} … {1}
+…{1}	… {1}
 English uses the ellipsis character (Unicode U+2026), which is preferred over three periods in a row. The latter may have a different appearance, as in the following table.
-Ellipsis Character
-Three dots (periods/full-stops)
-…
-...
+Ellipsis Character	…
+Three dots (periods/full-stops)	...
 If your language also uses three dots to indicate that some text is being elided, then you should also use the ellipsis character unless three separate dots are strongly preferred.
 Parse (Parse Lenient)
 These are characters that should be treated as equivalent when a program (or system) reads them as input. An example would be when you type a date into a browser URL field.
@@ -21,14 +27,11 @@ The delimiters are the characters used for quoting text. For example, for Englis
 BIDI languages (Arabic, Hebrew,…):
 “Start” means the character that starts the quotation, and “end” the one that finishes it. With most languages, the start quotation will appear on the left, while with BIDI languages, it will appear on the right.
 Valid Delimiters
-Currently the CLDR survey tool checks input delimiters against a predefined set of possibilities.  The following delimiters are considered "valid" by the CLDR survey tool.
-‘  U+2018 LEFT SINGLE QUOTATION MARK  ’  U+2019 RIGHT SINGLE QUOTATION MARK  ‚  U+201A SINGLE LOW-9 QUOTATION MARK  “  U+201C LEFT DOUBLE QUOTATION MARK  ”  U+201D RIGHT DOUBLE QUOTATION MARK  „  U+201E DOUBLE LOW-9 QUOTATION MARK   「  U+300C LEFT CORNER BRACKET  」  U+300D RIGHT CORNER BRACKET  『  U+300E LEFT WHITE CORNER BRACKET  』  U+300F RIGHT WHITE CORNER BRACKET    ‹  U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK  ›  U+203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK «  U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK  »  U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
+Currently the CLDR survey tool checks input delimiters against a predefined set of possibilities. The following delimiters are considered "valid" by the CLDR survey tool.
+‘ U+2018 LEFT SINGLE QUOTATION MARK ’ U+2019 RIGHT SINGLE QUOTATION MARK ‚ U+201A SINGLE LOW-9 QUOTATION MARK “ U+201C LEFT DOUBLE QUOTATION MARK ” U+201D RIGHT DOUBLE QUOTATION MARK „ U+201E DOUBLE LOW-9 QUOTATION MARK 「 U+300C LEFT CORNER BRACKET 」 U+300D RIGHT CORNER BRACKET 『 U+300E LEFT WHITE CORNER BRACKET 』 U+300F RIGHT WHITE CORNER BRACKET ‹ U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK › U+203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK « U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK » U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
 If you need to enter a delimiter that is not one of the characters on this list, please file a new ticket by following these instructions.
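The check described above amounts to set membership. A minimal sketch, assuming a value is accepted only if it is exactly one character from the list; the names `VALID_DELIMITERS`, `is_valid_delimiter`, and `describe` are hypothetical, and the actual Survey Tool implementation may differ.

```python
import unicodedata

# The allowed delimiters listed above, written as escapes to avoid look-alike confusion.
VALID_DELIMITERS = frozenset(
    "\u2018\u2019\u201A\u201C\u201D\u201E"  # curly and low-9 quotation marks
    "\u300C\u300D\u300E\u300F"              # CJK corner brackets
    "\u2039\u203A\u00AB\u00BB"              # single and double angle quotation marks
)


def is_valid_delimiter(value):
    """True if value is a single character from the allowed set."""
    return len(value) == 1 and value in VALID_DELIMITERS


def describe(ch):
    """Format a character the way the list above does: code point, then name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(is_valid_delimiter("\u201E"))  # True
print(is_valid_delimiter('"'))       # False: the ASCII quotation mark is not in the list
print(describe("\u300C"))            # U+300C LEFT CORNER BRACKET
```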
 Yes/No
 There are special versions of "Yes" and "No" used in POSIX (Portable Operating System Interface) context or other similar applications. Please supply the full word in your language (in lowercase if applicable), followed by a colon, then a common abbreviation separated by colons.
-Name
-Yes
-No
-English Example
-yes:y
-no:n
+Name	English Example
+Yes	yes:y
+No	no:n
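To illustrate the colon-separated format, here is a minimal sketch of how a program might turn a value such as `yes:y` into the set of accepted responses. The helper names are hypothetical and case-insensitive matching is an assumption; POSIX locales define their own `yesexpr` and `noexpr` patterns in `LC_MESSAGES`, which this sketch does not try to reproduce.

```python
def accepted_responses(value):
    """Split a 'word:abbreviation' value into lowercase accepted answers."""
    return {part.strip().lower() for part in value.split(":") if part.strip()}


def is_answer(response, value):
    """True if the user's response matches the full word or an abbreviation."""
    return response.strip().lower() in accepted_responses(value)


YES, NO = "yes:y", "no:n"  # English values from the table above
print(sorted(accepted_responses(YES)))  # ['y', 'yes']
print(is_answer(" Y ", YES))            # True
print(is_answer("n", YES))              # False
```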
diff --git a/docs/site/TEMP-TEXT-FILES/exemplars.txt b/docs/site/TEMP-TEXT-FILES/exemplars.txt
@@ -14,23 +14,58 @@ See the table on the left; you can copy an escape from the left column to insert
 The ➖, ❰, and ❱ characters are chosen to be unusual, so that it is unlikely that they would normally be among the characters you would want to have in a set, such as the punctuation characters used in your language.
 You can add characters in any order: they'll be displayed in the default order for your locale. Exceptions are very large character sets like Korean Hangul, which use a code point order so that they can make use of the ➖ character.
 In CLDR 43 and previous versions, a different format was used, one that required special "escapes" for certain characters and for strings. This caused problems for many people and was replaced by the simpler format above.
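The ➖ range syntax can be illustrated with a short sketch. It assumes that items in a set are separated by spaces and that `X➖Y` means every code point from X through Y inclusive; escapes and multi-character strings inside ranges are not handled. The function name `expand_ranges` is hypothetical.

```python
RANGE_MARK = "\u2796"  # ➖, the range syntax mark


def expand_ranges(value):
    """Expand space-separated items, turning 'X➖Y' into every code point from X to Y."""
    items = []
    for token in value.split():
        if RANGE_MARK in token:
            start, end = token.split(RANGE_MARK)
            # Assumes both ends are single code points, with start <= end.
            items.extend(chr(cp) for cp in range(ord(start), ord(end) + 1))
        else:
            items.append(token)  # a single character or a multi-character string
    return items


print(expand_ranges("a\u2796e ch"))            # ['a', 'b', 'c', 'd', 'e', 'ch']
print(len(expand_ranges("\uAC00\u2796\uD7A3")))  # 11172 precomposed Hangul syllables
```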
+Key to Escapes
+Abbr.	Code Point	Name
+❰TAB❱	U+0009	tab
+❰LF❱	U+000A	line feed
+❰CR❱	U+000D	carriage return
+❰SP❱	U+0020	space
+❰NSP❱	U+2009	narrow/thin space
+❰NBSP❱	U+00A0	no-break space
+❰NNBSP❱	U+202F	narrow/thin no-break space
+❰WNJ❱	U+200B	allow line wrap after, aka ZWSP
+❰WJ❱	U+2060	prevent line wrap
+❰SHY❱	U+00AD	soft hyphen
+❰ZWNJ❱	U+200C	cursive non-joiner
+❰ZWJ❱	U+200D	cursive joiner
+❰ALM❱	U+061C	Arabic letter mark
+❰LRM❱	U+200E	left-right mark
+❰RLM❱	U+200F	right-left mark
+❰LRO❱	U+202D	left-right override
+❰RLO❱	U+202E	right-left override
+❰PDF❱	U+202C	end override
+❰BOM❱	U+FEFF	byte-order mark
+❰ANS❱	U+0600	Arabic number sign
+❰ASNS❱	U+0601	Arabic sanah sign
+❰AFM❱	U+0602	Arabic footnote marker
+❰ASFS❱	U+0603	Arabic safha sign
+❰SAM❱	U+070F	Syriac abbreviation mark
+❰KIAQ❱	U+17B4	Khmer inherent aq
+❰KIAA❱	U+17B5	Khmer inherent aa
+❰RANGE❱	U+2796	range syntax mark
+❰ESCS❱	U+2770	escape start
+❰ESCE❱	U+2771	escape end
+❰…❱	U+…	Other; … = hex notation
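The escape table maps directly onto a lookup. A minimal sketch of turning an escaped value back into the characters it stands for; it assumes the `❰…❱` form holds bare hex digits (an optional `U+` prefix is also accepted), and the function name `unescape` is hypothetical.

```python
import re

# Abbreviations from the "Key to Escapes" table, mapped to their code points.
ESCAPES = {
    "TAB": "\u0009", "LF": "\u000A", "CR": "\u000D", "SP": "\u0020",
    "NSP": "\u2009", "NBSP": "\u00A0", "NNBSP": "\u202F", "WNJ": "\u200B",
    "WJ": "\u2060", "SHY": "\u00AD", "ZWNJ": "\u200C", "ZWJ": "\u200D",
    "ALM": "\u061C", "LRM": "\u200E", "RLM": "\u200F", "LRO": "\u202D",
    "RLO": "\u202E", "PDF": "\u202C", "BOM": "\uFEFF", "ANS": "\u0600",
    "ASNS": "\u0601", "AFM": "\u0602", "ASFS": "\u0603", "SAM": "\u070F",
    "KIAQ": "\u17B4", "KIAA": "\u17B5", "RANGE": "\u2796",
    "ESCS": "\u2770", "ESCE": "\u2771",
}
ESCAPE_RE = re.compile("\u2770([^\u2771]+)\u2771")  # text between ❰ and ❱


def unescape(text):
    """Replace each ❰NAME❱ or ❰hex❱ escape with the character it represents."""
    def replace(match):
        name = match.group(1)
        if name in ESCAPES:
            return ESCAPES[name]
        if name.upper().startswith("U+"):
            name = name[2:]
        return chr(int(name, 16))  # raises ValueError if neither a known name nor hex
    return ESCAPE_RE.sub(replace, text)


print(unescape("a❰NBSP❱b") == "a\u00A0b")  # True
print(unescape("❰2019❱") == "\u2019")      # True
```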
 Examples
 In the info panel, a mouse hover over the non-winning values shows a comparison to the Winning value. The ➕ { } indicates that { and } are additions to the Winning value, and ➖ ‐ – … ' ‘ ’ " “ ” § @ * / & # † ′ ″ indicates that ‐, –, …, and so on are subtractions from the Winning value. That makes it much easier to see what the difference in the outcome would be.
 The very last line shows an internal UnicodeSet format. You can normally ignore this. However, if you want more details about the characters you can copy the [...] from that line in the Info Panel and paste that into the Input box on UnicodeSet (and hit Show Set) to see more information about the characters, such as [!(),-.\:;?\[\]\{\}‑].
-Table of Contents
-Format
-Examples
-Exemplar Characters
-Parse Characters
-Handling Warnings in Exemplar characters
-Key to Escapes
-Examplar Examples
 Exemplar Characters
 The exemplar character sets contain the commonly used letters for a given modern form of a language. These are used for testing and for determining the appropriate repertoire of letters for various tasks, like choosing charset converters that can handle a given language. The term “letter” is interpreted broadly, and includes characters used to form words, such as 是 or 가. It should not include presentation forms, like U+FE90 ( ‎ﺐ‎ ) ARABIC LETTER BEH FINAL FORM, or isolated Jamo characters (for Hangul).
 For charts of the standard (non-CJK) exemplar characters, see a chart of the standard exemplar characters.
 For more information, please see Section 5.6 Character Elements in UTS#35: Locale Data Markup Language (LDML).
 There are different categories:
-Examplar Examples
+Category	English Example	Meaning
+standard	a b c d e f g h i j k l m n o p q r s t u v w x y z	The minimal characters required for your language (other than punctuation).
+The test to see whether or not a letter belongs in the main set is based on whether it is acceptable in your language to always use spellings that avoid that character. For example, English characters do not contain the accented letters that are sometimes seen in words like résumé or naïve, because it is acceptable in common practice to spell those words without the accents.
+If your language has both upper and lowercase letters, only include the lowercase (and İ for Turkish and similar languages).
+punctuation	‐ – — , ; : ! ? . … ‘ ' ’ ′ ″ “ " ” ( ) [ ] / @ & # § † ‡ *	The punctuation characters customarily used with your language.
+For example, compared to the English list, Arabic might remove ; , ? /, and add ؟ \ ، ؛.
+Don't include purely math symbols such as +, =, ±, and so on.
+auxiliary	á à ă â å ä ã ā æ ç é è ĕ ê ë ē í ì ĭ î ï ī ñ ó ò ŏ ô ö ø ō œ ú ù ŭ û ü ū ÿ	Additional letters and punctuation (beyond the minimal set) used in foreign or technical words found in typical magazines, newspapers, &c.
+For example, you could see the name Schröder in English in a magazine, so ö is in the set. However, it is very uncommon to see ł, so that isn't in the auxiliary set for English. Publication style guides, such as The Economist Style Guide for English, are useful for this.
+If your language has both upper and lowercase letters, only include the lowercase (and İ for Turkish and similar languages).
+index	A B C D E F G H I J K L M N O P Q R S T U V W X Y Z	The “shortcut” letters for quickly jumping to sections of a sorted, indexed list (for an example, see mu.edu).
+The choice of letters should be appropriate for your language. Unlike the minimal or additional characters, it should have either uppercase or lowercase, depending on what is typical for your language (typically uppercase).
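As a rough illustration of how index letters are used, here is a sketch that groups a word list under the English index labels. It approximates locale-sensitive sorting by stripping accents and case-folding, which is an assumption; production code would use a collation library such as ICU. The names `index_label` and `build_index` are hypothetical.

```python
import unicodedata

INDEX_EN = "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z".split()


def strip_accents(text):
    """Remove combining marks after canonical decomposition (é becomes e)."""
    return "".join(c for c in unicodedata.normalize("NFD", text) if not unicodedata.combining(c))


def index_label(word, index_letters=INDEX_EN, other="#"):
    """Return the index letter a word is filed under, or `other` if none fits."""
    first = strip_accents(word[:1]).upper()
    return first if first in index_letters else other


def build_index(words, index_letters=INDEX_EN):
    """Group words under index labels, in a crude accent-insensitive order."""
    buckets = {}
    for word in sorted(words, key=lambda w: strip_accents(w).casefold()):
        buckets.setdefault(index_label(word, index_letters), []).append(word)
    return buckets


print(build_index(["zebra", "Émile", "apple", "Schröder", "42nd"]))
# {'#': ['42nd'], 'A': ['apple'], 'E': ['Émile'], 'S': ['Schröder'], 'Z': ['zebra']}
```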
 Parse Characters
 These are sets of characters that are treated as equivalent in parsing. In the Code column you'll see a description of the characters with a sample in parentheses. For example, the following indicates that in date/time parsing, when someone types any of the characters in the Winning column, they should be treated as equivalent to ":".
 Note that if your language doesn't use any of these characters in date and times, the value doesn't really matter, and you can simply vote for the default value. For example, if a time is represented by "3.20" instead of "3:20", then it doesn't matter which characters are equivalent to ":".
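A minimal sketch of what such an equivalence set means for a parser: every character in the set is mapped to the canonical separator before the time is parsed. The particular set of colon variants below is an assumption chosen for illustration, not the value CLDR ships for any locale.

```python
from datetime import datetime

# Illustrative equivalence set for ":" (colon, fullwidth colon, small colon, ratio).
COLON_EQUIVALENTS = frozenset(":\uFF1A\uFE55\u2236")


def normalize_separators(text, canonical=":", equivalents=COLON_EQUIVALENTS):
    """Replace every character that is equivalent to `canonical` with `canonical`."""
    return "".join(canonical if ch in equivalents else ch for ch in text)


typed = "3\uFF1A20"  # "3：20" typed with a fullwidth colon
print(datetime.strptime(normalize_separators(typed), "%H:%M").time())  # 03:20:00
```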

diff --git a/docs/site/TEMP-TEXT-FILES/numbering-systems.txt b/docs/site/TEMP-TEXT-FILES/numbering-systems.txt
@@ -6,6 +6,11 @@ The default numbering system for a locale is the numbering system that is normal
 The native numbering system for a locale is the numbering system used for native digits, and is normally in the script for the locale's language. Native numbering systems can only use numeric positional decimal digits, like for Latin numbers (0123456789). If the numbering system in your language uses an algorithm to spell out numbers in the language's script, label it as a traditional numbering system instead. The traditional numbering system does not need to be specified if it is the same as the native numbering system.
 The default, native and traditional numbering systems for a locale may be different. For example, in Tamil the default numbering system is latn, the native numbering system is tamldec and the traditional numbering system is taml.
 Codes are used to represent numbering systems in the Survey tool. Below are some examples of common codes:
+Code	Description	Digits
+arab	Arabic-Indic digits	٠١٢٣٤٥٦٧٨٩
+fullwide	Full width digits	０１２３４５６７８９
+hant	Traditional Chinese numerals — non-decimal	algorithmic
+latn	Latin digits	0123456789
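Because native numbering systems use positional decimal digits, converting between them is a one-to-one digit substitution. A minimal sketch, with each digit string built from the system's zero digit (`tamldec` is included because it is mentioned above); algorithmic systems such as `hant` cannot be handled this way and need their spell-out rules instead. The names `ZERO`, `DIGITS`, and `convert_digits` are hypothetical.

```python
# Code point of the digit zero for some decimal numbering systems.
ZERO = {"latn": 0x0030, "arab": 0x0660, "fullwide": 0xFF10, "tamldec": 0x0BE6}
DIGITS = {code: "".join(chr(zero + i) for i in range(10)) for code, zero in ZERO.items()}


def convert_digits(text, to_system, from_system="latn"):
    """Substitute each digit of from_system with the matching digit of to_system."""
    return text.translate(str.maketrans(DIGITS[from_system], DIGITS[to_system]))


print(convert_digits("2024", "arab"))          # ٢٠٢٤
print(convert_digits("2024", "fullwide"))      # ２０２４
print(int(convert_digits("2024", "tamldec")))  # 2024: int() accepts any Unicode decimal digits
```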
 For further reference, see the complete list of numbering system codes and their corresponding rules.
 Minimum digits for grouping
 In some languages, the grouping separator is suppressed in certain cases. For example, see china-auf-wachstumskurs.gif, where there is a grouping separator in "12 080" but not in "4720". The minimumGroupingDigits determines what the default for a locale is. In this case the value should be "2" to illustrate that the separator only appears once the number of thousands goes into the double-digits (i.e. 10 thousand or above) and not for single-digit thousands (i.e. anything below 10 thousand).
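The rule can be checked with a small worked example. This sketch assumes a fixed primary group size of 3 (it ignores secondary grouping such as that used in India) and a plain space as the separator, as in the newspaper example; the function name `group_digits` is hypothetical.

```python
def group_digits(n, min_grouping_digits=1, separator=" ", group_size=3):
    """Insert grouping separators only if the integer has at least
    group_size + min_grouping_digits digits."""
    digits = str(abs(n))
    sign = "-" if n < 0 else ""
    if len(digits) < group_size + min_grouping_digits:
        return sign + digits
    groups = []
    while len(digits) > group_size:
        groups.insert(0, digits[-group_size:])
        digits = digits[:-group_size]
    groups.insert(0, digits)
    return sign + separator.join(groups)


print(group_digits(4720, min_grouping_digits=2))   # 4720
print(group_digits(12080, min_grouping_digits=2))  # 12 080
print(group_digits(4720))                          # 4 720 (minimumGroupingDigits = 1)
```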

diff --git a/docs/site/translation/core-data/characters.md b/docs/site/translation/core-data/characters.md
@@ -51,7 +51,7 @@ The English value is “?”, but another character might be better for your lan
 
 ## Delimiters
 
-The delimiters are the characters used for quoting text. For example, for English they are the “curly” right and left forms as in **“this phrase.”** The alternate forms are for embedded quotations, such as **“**He yelled **‘Stop!’**, and turned around.”
+The delimiters are the characters used for quoting text. For example, for English they are the “curly” right and left forms as in **“this phrase.”** The alternate forms are for embedded quotations, such as “He yelled **‘Stop!’**, and turned around.”
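The primary and alternate delimiters work together when quotations nest. A minimal sketch using the English values; the dictionary keys mirror the LDML element names (`quotationStart`, `alternateQuotationStart`, and so on), while switching between the primary and alternate pairs at each nesting level is an assumption about typical usage.

```python
EN_DELIMITERS = {
    "quotationStart": "\u201C",           # “
    "quotationEnd": "\u201D",             # ”
    "alternateQuotationStart": "\u2018",  # ‘
    "alternateQuotationEnd": "\u2019",    # ’
}


def quote(text, depth=0, delimiters=EN_DELIMITERS):
    """Quote text; even nesting depths use the primary pair, odd depths the alternate."""
    if depth % 2 == 0:
        start, end = delimiters["quotationStart"], delimiters["quotationEnd"]
    else:
        start, end = delimiters["alternateQuotationStart"], delimiters["alternateQuotationEnd"]
    return start + text + end


inner = quote("Stop!", depth=1)
print(quote(f"He yelled {inner}, and turned around."))
# “He yelled ‘Stop!’, and turned around.”
```

For BIDI languages the code is the same: "start" is simply the first character in logical order, and the rendering places it on the right.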
 
 *BIDI languages (Arabic, Hebrew,…):*
 

diff --git a/docs/site/translation/core-data/exemplars.md b/docs/site/translation/core-data/exemplars.md
@@ -62,7 +62,7 @@ Certain fields have _**sets**_ of characters (and strings) as values, called **U
 
 In the info panel, a mouse hover over the non-winning values shows a comparison to the Winning value. The ➕ { } indicates that { and } are additions to the Winning value, and ➖ ‐ – … ' ‘ ’ " “ ” § @ \* / & # † ′ ″ indicates that ‐, –, …, and so on are subtractions from the Winning value. That makes it much easier to see what the difference in the outcome would be.
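The ➕/➖ comparison is a pair of set differences. A minimal sketch, assuming both values are space-separated items; the Survey Tool shows items in the locale's order, whereas this sketch simply sorts by code point, and the function name `compare_to_winning` is hypothetical.

```python
PLUS, MINUS = "\u2795", "\u2796"  # ➕ and ➖


def compare_to_winning(winning, candidate):
    """Return display lines listing items added to and removed from the winning value."""
    win, cand = set(winning.split()), set(candidate.split())
    lines = []
    if cand - win:
        lines.append(PLUS + " " + " ".join(sorted(cand - win)))
    if win - cand:
        lines.append(MINUS + " " + " ".join(sorted(win - cand)))
    return lines


for line in compare_to_winning("! ( ) , . : ; ? § @", "! ( ) , . : ; ? { }"):
    print(line)
# ➕ { }
# ➖ @ §
```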
 
-The very last line shows an internal UnicodeSet format. You can normally ignore this. However, if you want more details about the characters you can copy the [...] from that line in the Info Panel and paste that into the Input box on [UnicodeSet](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp) (and hit Show Set) to see more information about the characters, such as [[!(),-.\:;?\[\]\{\}‑]](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B!%28%29,-.%5C:;?%5C%5B%5C%5D%5C%7B%5C%7D%E2%80%91%5D).
+The very last line shows an internal UnicodeSet format. You can normally ignore this. However, if you want more details about the characters you can copy the [...] from that line in the Info Panel and paste that into the Input box on [UnicodeSet](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp) (and hit Show Set) to see more information about the characters, such as [[!(),-.\\:;?\\[\\]\\{\\}‑]](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B!%28%29,-.%5C:;?%5C%5B%5C%5D%5C%7B%5C%7D%E2%80%91%5D).
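The example link above is the set pattern percent-encoded into the `a` query parameter of the UnicodeSet page. A minimal sketch using Python's `urllib.parse.quote`; the `safe` characters are chosen only so that the output matches that link exactly, and the default encoding (which escapes more punctuation) would be expected to decode to the same pattern. The function name `unicodeset_link` is hypothetical.

```python
from urllib.parse import quote

UNICODESET_PAGE = "https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp"


def unicodeset_link(pattern):
    """Build a link that opens the UnicodeSet page with `pattern` in the input box."""
    return UNICODESET_PAGE + "?a=" + quote(pattern, safe="!,:;?")


# The pattern from the Info Panel example; U+2011 is the non-breaking hyphen at the end.
pattern = r"[!(),-.\:;?\[\]\{\}" + "\u2011]"
print(unicodeset_link(pattern))  # prints the same URL as the example link above
```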
 
 ![image](../../images/core-data/Screenshot-2024-06-27-at-3.59.26.png)
 
@@ -101,6 +101,7 @@ For example:
 
 - Suppose the currency code XAF is translated as "Φράγκο BEAC CFA" in Greek. That raises a warning because the "BEAC CFA" are not in the Greek exemplars.
 - Suppose that a currency symbol contains ৲ (BENGALI RUPEE MARK). That also raises a warning, even though it is a symbol and not a letter, because it has a script (Bengali).
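A rough sketch of the first kind of check: flag letters in a value that do not appear in the locale's exemplar sets. The exemplar string below is an abbreviated, illustrative Greek set rather than the actual CLDR data, and the sketch only looks at letters; catching a symbol like ৲ would also need the Unicode Script property, which Python's standard `unicodedata` module does not expose. The function name `letters_outside_exemplars` is hypothetical.

```python
import unicodedata

# Illustrative lowercase Greek letters (not the real CLDR exemplar data).
GREEK_EXEMPLARS = set(unicodedata.normalize("NFC", "αάβγδεέζηήθιίϊΐκλμνξοόπρσςτυύϋΰφχψωώ"))


def letters_outside_exemplars(value, exemplars):
    """List letters in value (in order, no repeats) whose lowercase form is not in exemplars."""
    missing = []
    for ch in unicodedata.normalize("NFC", value):
        if unicodedata.category(ch).startswith("L") and ch.lower() not in exemplars:
            if ch not in missing:
                missing.append(ch)
    return missing


print(letters_outside_exemplars("Φράγκο BEAC CFA", GREEK_EXEMPLARS))
# ['B', 'E', 'A', 'C', 'F']
```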
+
 Three possible solutions:
 
 1. If the character really is used in the language, add it to the appropriate exemplar set (**standard, auxiliary,…**).

diff --git a/docs/site/translation/core-data/numbering-systems.md b/docs/site/translation/core-data/numbering-systems.md
@@ -25,7 +25,7 @@ Codes are used to represent numbering systems in the Survey tool. Below are some
 | hant | Traditional Chinese numerals — non-decimal | algorithmic |
 | latn |  Latin digits |  0123456789 |
 
-For further reference, see the [complete list](http://www.unicode.org/repos/cldr/trunk/common/bcp47/number.xml) of numbering system codes and their corresponding[rules](http://www.unicode.org/repos/cldr/trunk/common/supplemental/numberingSystems.xml).
+For further reference, see the [complete list](http://www.unicode.org/repos/cldr/trunk/common/bcp47/number.xml) of numbering system codes and their corresponding [rules](http://www.unicode.org/repos/cldr/trunk/common/supplemental/numberingSystems.xml).
 
 ## Minimum digits for grouping