From 50d1c44cbb30ef0ed6dc32696b6e20a86ca58bef Mon Sep 17 00:00:00 2001 From: Chris Pyle Date: Thu, 4 Jul 2024 13:00:31 -0400 Subject: [PATCH] CLDR-17566 removing text diffs --- docs/site/TEMP-TEXT-FILES/change-to-sites.txt | 38 ---- ...support-intercalary-months-year-cycles.txt | 183 ------------------ .../TEMP-TEXT-FILES/consistent-casing.txt | 79 -------- .../TEMP-TEXT-FILES/coverage-revision.txt | 34 ---- .../currency-code-fallback.txt | 17 -- 5 files changed, 351 deletions(-) delete mode 100644 docs/site/TEMP-TEXT-FILES/change-to-sites.txt delete mode 100644 docs/site/TEMP-TEXT-FILES/chinese-and-other-calendar-support-intercalary-months-year-cycles.txt delete mode 100644 docs/site/TEMP-TEXT-FILES/consistent-casing.txt delete mode 100644 docs/site/TEMP-TEXT-FILES/coverage-revision.txt delete mode 100644 docs/site/TEMP-TEXT-FILES/currency-code-fallback.txt diff --git a/docs/site/TEMP-TEXT-FILES/change-to-sites.txt b/docs/site/TEMP-TEXT-FILES/change-to-sites.txt deleted file mode 100644 index a44dffd5759..00000000000 --- a/docs/site/TEMP-TEXT-FILES/change-to-sites.txt +++ /dev/null @@ -1,38 +0,0 @@ -Change to Sites? -We are using Sites (http://cldr.unicode.org/) to host all of the CLDR development web pages. I took a look at the cldr pages, and the following are not in Sites. The question is, should we move them (or some of them) to Sites? -Advantages -Removing a bottleneck. Even though we have them in CVS (so the project people can edit the repository copies of them), we have a bottleneck in that only a small number of people (Rick, Steven, and me) can actually post them up publicly. -Editing doesn't require an HTML editor. -Disadvantages -The look and feel is somewhat different than the regular Unicode site (although we should be able to make it closer over time). 
-Files -http://www.unicode.org/cldr/beta.html -http://www.unicode.org/cldr/comparison_charts.html -http://www.unicode.org/cldr/corrigenda.html -http://www.unicode.org/cldr/index.html -http://www.unicode.org/cldr/filing_bug_reports.html -http://www.unicode.org/cldr/locale_faq.html -http://www.unicode.org/cldr/process.html -http://www.unicode.org/cldr/transliteration_guidelines.html -http://www.unicode.org/cldr/terms.html -http://www.unicode.org/cldr/survey_tool.html -http://www.unicode.org/cldr/repository_access.html -http://www.unicode.org/cldr/version/ (version pages) -Here are the other files: -Special purpose, for redirecting. -http://www.unicode.org/cldr/header.html -Old or temporary page: -http://www.unicode.org/cldr/readme.txt -http://www.unicode.org/cldr/press.html -Already redirect, some to Docs pages -http://www.unicode.org/cldr/big_red_switch.html -http://www.unicode.org/cldr/charts.html -http://www.unicode.org/cldr/data_formats.html -http://www.unicode.org/cldr/errata.html -http://www.unicode.org/cldr/procedures.html -http://www.unicode.org/cldr/tr35.html -http://www.unicode.org/cldr/timezone_ids.html -http://www.unicode.org/cldr/survey_tool_known_bugs.html -http://www.unicode.org/cldr/vetting.html -http://www.unicode.org/cldr/xmlGuide.html -Possible Bug - needs investigation. 
\ No newline at end of file diff --git a/docs/site/TEMP-TEXT-FILES/chinese-and-other-calendar-support-intercalary-months-year-cycles.txt b/docs/site/TEMP-TEXT-FILES/chinese-and-other-calendar-support-intercalary-months-year-cycles.txt deleted file mode 100644 index 96ae9b55261..00000000000 --- a/docs/site/TEMP-TEXT-FILES/chinese-and-other-calendar-support-intercalary-months-year-cycles.txt +++ /dev/null @@ -1,183 +0,0 @@ -Chinese (and other) calendar support, intercalary months, year cycles -Author Peter Edberg, with info and ideas from many others -Date 2011-11-20 through 2011-11-30, more 2012-01-10 -Status Proposal -Feedback to pedberg (at) apple (dot) com -Bugs See list of tickets at the end of this document -Currently the ICU Calendar object has basic support for the Chinese calendar (can determine era, year number, month, etc.). However, real date formatting using this calendar is blocked until CLDR adds necessary support for formatting Chinese calendar dates. In doing this, we need to take into account other calendars that may have similar issues, which we should support in a unified way. The intent here is to provide the minimum change necessary to support the Chinese calendar (and other luni-solar calendars) at the same level as other calendars are currently supported; support for additional special calendar features requiring significant enhancements to the ICU Calendar object (see below) is for future enhancements. -A. Relevant calendar features -Salient features of the Chinese calendar, and related features of other calendars: -1. Chinese luni-solar calendar -Months begin at a new moon and are 29 or 30 days long. -A year consists of 12 or 13 months (determined by the number of new moons between winter solstices). Months are numbered 1-12. 
When an extra intercalary month is needed, it might be inserted after any of the standard months 2-11 (after 11 is unusual); it repeats the numbering of the preceding month, with an extra marker to indicate that it is a leap month (in Chinese this marker ‘闰’ precedes the month number). An astronomical rule determines whether and where it gets inserted in a given year. The winter solstice always occurs during month 11, so the new year (and month 1) usually begins on the second new moon after that (Unless month 11 has a leap month added). -Astronomical calculations are based on a meridian of 120° (near Beijing). -Years are named using a 60-year cycle. The year name is formed by combining a celestial stem from a 10-year cycle and an earthly branch from a 12-year cycle. The 12 earthly branches correspond to, but do not have the same names as, the 12 zodiac animals associated with them. For example: -Celestial stems: 甲 jiǎ, 乙 yǐ, 丙 bǐng, 丁 dīng, … -Earthly branches: 子 zǐ, 丑 chǒu, 寅 yín, 卯 mǎo, … -Zodiac animals: 鼠 Rat, 牛 Ox, 虎 Tiger, 兔 Rabbit, … -First years of 60-year cycle: 甲子 jiǎ-zǐ, 乙丑 yǐ-chǒu, 丙寅 bǐng-yín, 丁卯 dīng-mǎo, … -In principle each cycle can be treated as a separate era. However, such eras are not normally ever used in formatted dates, leading to potential ambiguity about which date is being represented. Traditionally this ambiguity could be resolved by also displaying a regnal period or regnal year along with the Chinese calendar date. In modern times this ambiguity is normally resolved by always displaying a Chinese calendar date in conjunction with a date (or at least a year) in at least one other calendar. In Taiwan this other calendar is typically the Minguo/ROC calendar; in Japan it is typically the Japanese calendar; in mainland China and elsewhere it is typically the Gregorian calendar (for a format like “y年U年MMMd日” where y is the Gregorian year and U is the stem-branch name). 
Note that the year transitions of the associated calendar do not occur at the same time as the year transitions of the Chinese calendar. -There are at least two standard conventions for the epoch of the Chinese calendar — i.e. when was year 1 of era 1. Both are associated with the legendary emperor Huangdi 黃帝, hence the "Huangdi era" 黃帝紀元. The most common convention is to use the beginning of Huangdi's reign, commonly specified as 2697 BCE; a somewhat less common convention (and the one used by ICU) is to use the year when he supposedly invented the Chinese calendar, 2637 BCE. Since the latter is 60 years later, the stem-branch names associated with years do not change, but the cycle number is different. For some usages among calendar specialists Chinese calendar years may be numbered continuously from the beginning of the epoch, in which case Gregorian 2012 Jan. 23 is the beginning of Chinese calendar year 4650 or 4710 depending on which convention is used. However this kind of year numbering is not widely known. -In Chinese the days of the month have special numbering. Days 1-10 use 初一, 初二, … 初十. For days 21-29 the number is formed using 廿 instead of 二十 to indicate 20. The first month is designated 正月 instead of 一月. -2. Other calendars related to the Chinese calendar (Japanese, Korean, Vietnamese) -Similar luni-solar calendars are used in Japanese, Korean, and Vietnamese, with the computations based respectively on meridians near Tokyo, Seoul, and Hanoi. For the Japanese version, the date typically used for disambiguation would be a Japanese calendar date, not a Gregorian date. The Vietnamese calendar uses a different set of animals for the branch names in years, and the marker for intercalary month is inserted *after* the month name, not before. -3. Hebrew calendar -The Hebrew calendar is another luni-solar calendar, with months of 29 or 30 days beginning at a new moon. 
Intercalary months are inserted during specific years of a 19-year cycle, always by doubling the month of Adar: A leap year has months Adar I and Adar II (Adar I is considered to be the extra inserted month). -Month numbering is interesting. Traditionally, the month of Nisan was numbered 1, and Adar was 12 (thus Adar I and II were 12 and 13). However, this puts the month of Tishri, which begins with the new year (Rosh HaShanah), as month 7. A more modern numbering has Tishri as month 1 (to coincide with the new year) which leads to different schemes for numbering Adar and the subsequent months (see discussion below on what ICU does). -4. Coptic and Ethiopic solar calendars -These always have 13 months; 12 months of 30 days each and a 13th month of 5 days (6 in a leap year). There is no leap month per se. -5. Hindu luni-solar calendar (old or new, with several variants): -Months are 29 or 30 days, beginning at new moon (south India) or full moon (north India). Months are named based on which zodiac sign the sun transits into during the course of the lunar month. An intercalary month occurs when the sun does not transit into a zodiac sign during the lunar month, and it takes the name for the zodiac transit of the following month with a marker to indicate “extra”/“added”; the following month *also* takes a marker to indicate “original”/”regular”/”clean” (a bit like Adar I and Adar II, except that it can apply to any month). If the sun transits into two zodiac signs during a lunar month, then two months are collapsed into one; the resulting month takes the name associated with both zodiac signs, with a marker indicating “lost”. A year when this occurs must also have at least one added month, since the year must have 12 or 13 lunar months. 
Occasionally an added month with no transits is immediately followed by a collapsed month with two; in this case the first month takes the name of the first transit in the second month plus the marker “extra”/“added”, while the second month takes the names of both transits plus the marker “lost”. -This calendar also uses a 60-year cycle of year names, but they are not derived as combinations of sub-cycle names (as with the Chinese calendar). -6. The Tibetan luni-solar calendar -The Tibetan luni-solar calendar handles months like the Hindu calendar. Two different 60-year naming cycles are in use, one derived from the Chinese calendar and one derived from the Hindu calendar. In addition, three different cardinal year numbering schemes are used, with three different epochs (like the distinction between ethiopic and ethiopic-amete-alem calendars). -B. Other features of the Chinese calendar, not for this proposal -The Chinese calendar divides the solar year into 24 solar terms— 12 major terms and 12 minor terms—each associated with divisions along the sun’s course through the zodiac. These are usually shown on printed calendars, and are used for agriculture and astrological purposes. The data could be derived from existing calendar fields, or a new field could be added. -Months and days are also named in cycles of 60 using the stem-branch names, and days are subdivided into 12 two-hour periods named according to the earthly branches. The combination of year name, month, day name and day period name (年月日時) is important for many purposes, including picking children’s names and arranging weddings, moves, travel, and funerals. This data could also be derived from existing calendar fields, or a new field added. -Festivals and holidays are shown on printed Chinese calendars, as well as on many other calendars. ICU4J has a preliminary framework for holiday support. ICU4C does not, and there is currently no commitment in ICU to move this along. 
Support for marking festivals and holidays is thus beyond the scope of this proposal. -Nothing in this proposal prevents or makes more difficult adding any of these other features later on; this proposal just focuses on features that can be implemented in the near term. -C. ICU behavior -Here is how ICU currently handles the calendar behaviors above: -1. Chinese calendar -Months are numbered 0-11 (the zero-based value of UCAL_MONTH). When an intercalary month is added, it has the same number as the preceding month, but the value of UCAL_IS_LEAP_MONTH is 1 instead of 0 (this seems to be the only supported calendar that ever sets UCAL_IS_LEAP_MONTH to anything other than 0). -For purposes of add and set operations, month is treated as a tuple represented by UCAL_MONTH and UCAL_IS_LEAP_MONTH. If UCAL_IS_LEAP_MONTH is 0 for a month that has a leap month following, then adding 1 month, or setting UCAL_IS_LEAP_MONTH to 1, sets the calendar to the leap month (which has the same value for UCAL_MONTH). If a month does not have a leap month following, then a set of UCAL_IS_LEAP_MONTH to 1 is ignored. -Years are numbered 1-60 (the value of UCAL_YEAR) for each 60-year cycle. The era is incremented for each 60-year cycle, so we are currently in era 78. -Current ICU4C formatting for the Chinese calendar is completely broken. For example, the short date format in root and zh is currently “y'x'G-Ml-d”; the result this produces for Chinese era 78, year 29, month 4 (non-leap or leap), day 2 is “29x-4-”: There is no era value or leap month indicator, and non-literal fields after the ‘l’ pattern character are skipped. -In ICU4J the existing situation is a bit better. Via data in data/xml/main/root.xml, ICU inserts its own "isLeapMonth" resource into the calendar bundle for "chinese"; this provides a leapMonthMarker of "*". 
There is a public ChineseDateFormatSymbols subclass of DateFormatSymbols which uses the "isLeapMonth" resource, and a public ChineseDateFormat subclass of SimpleDateFormat; using ChineseDateFormat, Chinese calendar date formats using 'G' and 'l' can be formatted and parsed successfully. -2. Hebrew calendar -In a non-leap year, months run 0-4 (for months Tishri-Shevat), skip 5 (“Adar I”), then continue 6-12 (Adar-Elul). In a leap year, 5 is not skipped (“Adar I”), and CLDR data provides an alternate “leap” name for month 6 as “Adar II”. -3. Coptic and Ethiopic calendars -Months are numbered 0-12. -4. Other calendars listed above -ICU does not currently support the Hindu, Vietnamese, or Tibetan calendars (it does support the quite different Indian Civil calendar). -D. Problems with the current ICU behavior: -For the Chinese and Hebrew calendars, there is no a priori way to know for a given year whether it is a leap year. You have to run through the dates in the year to check the behavior (and the way you have to do this depends on the calendar). -The current model for UCAL_IS_LEAP_MONTH (ICU4J IS_LEAP_MONTH) as a boolean cannot directly indicate the "normal-month-after-leap-month" and "compressed-month" used in Hindu and Tibetan calendars. However, those special months can be inferred from looking at month data before and after the month of interest. Another alternative might be to re-interpret the IS_LEAP_MONTH field to take more than two values. -It is a bit too bad that completely different models are used for leap months in the Hebrew and Chinese calendars. It would have been nice to have a more unified model that could also support the usage in Hindu and Tibetan calendars. -Calendar::add (ucal_add) for UCAL_MONTH gives different strange results for the Hebrew and Chinese calendars. For the Hebrew calendar, in a non-leap year, adding 1 month to month 4 produces month 6. 
For the Chinese calendar, in a leap year, adding 1 month to month n (before a leap month) produces month n (but with IS_LEAP_MONTH set). This is similar to what happens to hours around daylight savings time transitions, except in that case there is no IS_EXTRA_HOUR field to provide disambiguation (we should add one, see below). -E. Current CLDR support -CLDR currently provides the following: -1. yeartype attribute -The yeartype attribute for month name elements allows an alternate month name to be selected for leap years (current legal values are just “standard”—the default—and “leap”). It is only used for the Hebrew calendar, as follows: -Shevat Adar I Adar Adar II -This works with the normal MMM+/LLL+ pattern characters for months; the choice of which name to use is managed by ICU date formatting code. -Note that this yeartype month is currently mapped into ICU month name data as the 14th element in the array of Hebrew month names, which seems a bit hacky. -2. special pattern character ‘l’ -The special pattern character ‘l’ (small L) is described as: “Special symbol for Chinese leap month, used in combination with M. Only used with the Chinese calendar.” It is intended to indicate where the leap month marker (when needed) should go in a date format. This is a bit odd: -It is not clear how (or whether) this is supposed to work with availableFormats items and DateTimePatternGenerator. -There is currently no structure in CLDR to provide the value for ‘l’. But assuming we added it… -It is not clear how a client who wants month symbol names can get the name for a leap month - do they need to assemble it from two pieces? How would they know what order to use? -It is not clear why this mechanism needs to be different than the mechanism used for the Hebrew calendar. 
-It seems unnecessary; the month naming could just be handled via the MMM+/LLL+ pattern, and CLDR data could provide complete month names both with and without the marker (distinguished using something like the yeartype attribute). This would fit more smoothly into existing mechanisms. -F. Proposal -Items 1-2 and 5-8 below are probably do-able for CLDR 21 and ICU 49. The others may come later. -1. ICU behavior for months -The Hebrew model of explicitly numbering all month names and skipping leap months in non-leap years does not work well for calendars like Chinese and Hindu that may insert leap months anywhere (and may combine months, etc.). The use of the UCAL_IS_LEAP_MONTH field is better suited to this. -For choosing the correct month name variant, I had proposed the idea of enhancing the UCAL_IS_LEAP_MONTH field to have 4 values, and adding an enum for these values: -normal month, this is currently value 0 for UCAL_IS_LEAP_MONTH -leap month (for Chinese, this has the same month number as the month before; for Hindu & Tibetan, it has the same number as the month after), this is currently value 1 for UCAL_IS_LEAP_MONTH -normal month after leap month (needed for Hindu & Tibetan); this could be value -1 for UCAL_IS_LEAP_MONTH (it is not a leap month, but does need a special name) -compressed month (needed for Hindu & Tibetan); this could be value 2 for UCAL_IS_LEAP_MONTH -While this was agreed in ICU PMC on 2011-11-09, I now think this idea should be withdrawn (agreed in PMC). For purposes of determining the variant month names, there are other approaches, e.g. for relevant calendars we can see whether subtracting a month gives the same month number (in which case we have a normal month after leap), or adding a month skips a month number (in which case we have a combined month). 
For calendrical calculations, however, the current UCAL_IS_LEAP_MONTH values of 0 and 1 are adequate (since that is all that is needed to disambiguate month numbering); and in fact the extra values would complicate the calendrical calculations: if we set a month to be compressed, what does that mean? -For a unified model we could also change the Hebrew calendar to use this approach (since in a leap year it inserts Adar I before Adar, whose name then changes to Adar II - the form for normal after leap), but that might be a compatibility issue. We can at least set UCAL_IS_LEAP_MONTH appropriately, even if we do not change the month numbering. -2. CLDR data for leap months -The yeartype attribute for month names cannot support different month name types for each month in a year, or for different months in a year. -Old ideas -The first version of this proposal suggested defining for the month name element a new attribute “monthtype” which could have the values “standard”, “leap”, “standardAfterLeap”, or “combined”, and then supplying explicit names for each needed type for each month (rather than a mechanism to combine markers). The thought was that this would permit handling of special forms for e.g. the first month of the year. However, it is only the first month of the lunar year that may have a special form in the Chinese calendar, and that can never have a leap month anyway. -The second idea was to permit inside each element (i.e. at the same level as the elements) zero or more elements, which could have a type attribute of "leap", "standardAfterLeap", or "combined", and whose value would be a pattern showing how to combine a marker with a month name {0} (and possibly {1} for combined months) - e.g. "闰{0}" or "kshay {0}-{1}". This was approved in CLDR 2011-11-16. However, it does not address the problem of specifying a month type marker with numeric months as well. 
For this we need a separate structure that parallels monthContext… -Current idea -(approved in CLDR meeting 2011-11-30) -Alongside the element, permit an optional parallel element (only present for calendars that need it). The structure under this is similar to that for , except that: -The element's type attribute that takes one of three values: "format", "stand-alone", or the added "numeric" (pattern to use with numeric months). -The element's type attribute can take an additional value "all" for use with the "numeric" context (since there is no width distinction for numeric months). -The elements can have type "leap", "standardAfterLeap", or "combined"; the value is the pattern used for modifying the month name(s) to indicate that month type. A Chinese calendar example (marker before the month name) in root: - (default alias to format/wide) (default alias to stand-alone/narrow) {0}bis (default alias to format/abbreviated) {0}bis (default alias to format/wide) {0}bis -And in the Chinese locale: - 闰{0} 闰{0} 闰{0} -For other calendars, the elements above could be replaced by others such as the following: -For the Hebrew calendar, in the Hebrew locale, one could have (for Adar I and II): -{0} א׳ {0} ב׳ -For the Hindu calendar, in root (for a combined month, the name will be an affix plus a combination of two month names): -adhik {0} nija {0} kshay {0}-{1} -For the time being, at least, I don't think that we need to present this in the Survey Tool, and that may prove too complex and confusing anyway. -3. Month name styles -(mostly about data, some ideas for future structure requirements): -Japanese locale month name styles, all for either Gregorian or lunar calendar (except as noted). The distinction among them is not just format vs standalone. -The style 1月, 2月, 3月... is almost always used for horizontal text and for yMd formats. This is by far the most common. -The style 一月, 二月, 三月... can be used for vertical text, as a special style e.g. 
on New Year cards, and rarely for government documents. -The traditional naming, which is still used sometimes for titles on calendar pages: 睦月, 如月, 弥生, 卯月, 皐月, 水無月, 文月, 葉月, 長月, ... -The name 正月 is formally an alternate name for Gregorian January, but in common usage means just the New Year holidays (first few days of Jan.). -Chinese locale month names and alternates (applies to both traditional/simplified, mainland/Taiwan unless noted): -Gregorian calendar: The style 1月, 2月, 3月 is preferred for complete yMd dates (all of whose components should use 0-9 digits), especially when Gregorian dates are shown together with Chinese calendar dates. The style 一月, 二月, 三月 can also be used for month names by themselves, either in running text or as an isolated element on a calendar page. -Lunar calendar: The first month is always designated 正月. For the remaining months, the style 二月, 三月, 四月 is preferred (especially when Chinese calendar dates are shown together with Gregorian dates), except that 冬月 and 腊月 are sometimes used for months 11 and 12. Chinese numerals should be used for the other portions of complete dates as well. -The “monthType” attribute in the first version of this proposal might have also provided a means to address variants such as some of the above, as well as the following: -For parsing, it would be useful to have multiple forms for month names—e.g. “Sep.” and “Sept.” -4. Day names -Will need some way to specify the special day numbering forms used in Chinese for the Chinese calendar - TBD, can be a future enhancement. -5. Deprecate the pattern character ‘l’ (small L). -If it occurs in a pattern it should be ignored. -6. CLDR data for year names -Option 1, element -(The following was originally agreed in CLDR 2011-11-16; however, it has been superseded by option 2, which was approved on 2011-11-30). 
-Add a element and sub-elements parallel to the current structure for , , and , as follows (with similar structure in ICU): - Jia-Zi Yi-Chou … Gui-Hai (defaults to abbreviated) (defaults to abbreviated) -Only the “format” context would be supported initially; other contexts could be added if needed. -Option 2, element -(approved in CLDR meeting 2011-11-30) -As noted above, the cycle of 60 stem-branch names is used for months and days as well as years. Years are also known according to the cycle of 12 zodiac animals associated with the branch portion of the stem-branch name. A cycle of 12 branch names is also used for subdivisions of a day. Thus, it would be beneficial to have a more general representation of such name cycles, even though cyclic names for months, days, and day subdivisions are not part of the current proposal. -In one of his comments on #1507, Philippe Verdy mentions that the cycle of 60 names is also used for some non-calendrical enumerations in Chinese such as measurement of angles, and suggests that data for this should be independent of the calendar structure. These notions are specific to the Chinese locale, and are not notions that CLDR would support across multiple locales (unlike the Chinese calendar, which is supported across multiple locales), so it probably does not make sense to add CLDR structure for them. -The following proposes a way to support cyclic names for years, zodiac mappings, months, days, and dayParts (not really the same as dayPeriods), with the currently-known cycles of length 60 or 12 (for the Chinese, Hindu, and related calendars); this structure would be just below the element: - jia-zi yi-chou … gui-hai < cyclicNameWidth type=”narrow”> (defaults to abbreviated) < cyclicNameWidth type=”wide”> (defaults to abbreviated) (root aliases to years) (root aliases to dayParts) …data for branch names... 
(root aliases to dayParts, some locales will supply separate data) -As with the leap month data, this may not be appropriate for the Survey Tool. -7. New pattern character(s) -We would need to add a pattern character to indicate year name. A natural choice is ‘U’ since it is currently unused and ‘u’ is already used for a different year type. -8. ICU implementation changes -Formatting... (to be supplied) -Parsing (month names, year names)... (to be supplied) -ICU4J ChineseDateFormat class, move relevant behaviors into SimpleDateFormat, leaving this as mostly a shell. Remove ChineseDateFormatSymbols use of "isLeapMonth" resource; instead derive the necessary data (needed only for backwards compatibility) from the monthPatterns data. -9. ICU API enhancements -Add a calendar field IS_EXTRA_HOUR or IS_REPEATED_HOUR to disambiguate the hour added/repeated during DST transitions that set the clock back. -Work out how and whether to map the modified month names (for leap month types) onto APIs that get date format symbols — use additional options to specify month symbol types? What about symbols for year names? -Add Calendar API to answer the following questions for a given year and era: -Is it a leap year? And if so… -Of what type - does it adjust days or months? -When are the beginnings and ends (perhaps expressed as UDates) of the portions of the year that are affected by the adjustments? Note that calendars like Hindu could have in a single year up to two nonadjacent added months plus one combined month. -10. Supporting the Vietnamese / Korean / Japanese variants of the Chinese lunar calendar -These variants behave in a similar way, using different ways of designating leap months and different names for the stem-branch cycle, the branch cycle, and the zodiac cycle, and using a different meridian as the basis for astronomical calculations. 
We could support these in several ways: -Treat them as separate calendars with different names, and potentially support all of them in each locale. -Treat them each as the locale-specific variant of the Chinese lunar calendar. In that case the meridian for calculation needs to be supplied as part of the locale data. Need to clarify that the "Chinese" calendar means the locale-adapted version, not the historic imported one. -An in-between approach in which there is just one locale-specific set of data for the Chinese-style lunar calendar, but the meridian for computation is specified independently—perhaps as another locale tag value. -A combination of the above two approaches: Have locale data specify the default meridian, but also allow specification of meridian in the locale name to override this. This is the recommended approach. -11. Chinese calendar ambiguous dates, and handling of 'y' pattern character -For the Chinese calendar, the value within a Calendar object's YEAR field is the year number within a 60-year cycle. However, this year is never displayed numerically in a Chinese calendar date format; it is always displayed using the cyclic name, i.e. using pattern character 'U'. The Calendar object's ERA field is the cycle number, but this also is never used in a formatted date. Hence formatted dates that use only elements from the Chinese calendar itself are ambiguous as to which era/cycle they are associated with. For real-world usage, that is not a problem; the Chinese calendar is not intended to unambiguously represent a date, and is normally displayed in association with a date (at least a year) in one or more additional calendars that do provide that disambiguation. -As noted above, in Taiwan this other calendar is typically the Minguo/ROC calendar; in Japan it is typically the Japanese calendar; in mainland China and elsewhere it is typically the Gregorian calendar, often with additional calendars such as Islamic. 
-In the long run, CLDR calendar data for the Chinese calendar should specify which other calendar should be used as the associated calendar. Then it may be that for formatting and parsing Chinese calendar dates, the 'y' and 'G' pattern characters would be interpreted according to this associated calendar, rather than the Chinese calendar. -In the short term, ICU should specify that parse methods that do not take an associated Calendar object may not produce the expected results for the Chinese calendar. Such methods create a work Calendar object and then clear() it, which for the Chinese calendar will set it to era 1; since there is no era in the format, the parsed result will have era 1, producing a date in the range of Gregorian 2600 BCE (probably not what is expected). -Note that the convention of using a secondary calendar associated with a traditional calendar is not unique to the Chinese calendar. Real-world Japanese conventions for formatting dates often use both a Gregorian and Japanese Emperor year, e.g. "2012(平成24)年1月". -G. Tickets -The old CLDR and ICU tickets related to this are: -CLDR #1507: intercalary marker missing from chinese calendar (refile) [includes detailed comments from Philippe Verdy, some anticipating ideas above] -CLDR #2430: Illegal date-time field "l". -CLDR #2558: Chinese calendar formatting does not look right -CLDR #3672: Value inherited from "root" is in error -ICU #6049: Intercalary markers -New tickets related to this, which supersede the above, are: -CLDR #4230: Add element for Chinese calendar support -CLDR #4231: Add cyclic year name support for Chinese calendar -CLDR #4232: Deprecate pattern character 'l', add pattern character 'U' -CLDR #4237: Change Chinese calendar formats to use 'U' pattern char [must wait for ICU format/parse support] -CLDR #4259: Chinese calendar data updates -CLDR #4277: Remove processing from LDML2ICUConverter -CLDR #4282: Chinese calendar formats need era, Gregorian year, or ? 
-CLDR #4302: Fix spurious errors with chinese calendar (monthPatterns narrow, prettyPaths) -CLDR #4321: Skip some date pattern tests for Chinese calendar ('U' and other differences from std) -CLDR #4322: Fix test errors for monthPatterns -CLDR #4325: Fix test errors for date patterns using U (availableFormats...) -ICU #8958: Use CLDR data to format/parse Chinese cal dates -ICU #8977: Use CLDR data to format/parse Chinese cal dates (J) -ICU #8959: Support new 'U' pattern char for formatting/parsing Chinese cal dates -ICU #9034: Delete obsolete in data/xml/main/root.xml -ICU #9035: More ICU4C chinese calendar format/parse fixes -ICU #9043: Chinese cal dates can't always be parsed - design 2-cal solution -ICU #9044: Chinese cal dates can't always be parsed - document & fix tests -ICU #9055: Integrate Chinese cal pattern updates (cldrbug 4237), update tests \ No newline at end of file diff --git a/docs/site/TEMP-TEXT-FILES/consistent-casing.txt b/docs/site/TEMP-TEXT-FILES/consistent-casing.txt deleted file mode 100644 index 48bb22f35dc..00000000000 --- a/docs/site/TEMP-TEXT-FILES/consistent-casing.txt +++ /dev/null @@ -1,79 +0,0 @@ -Consistent Casing -Rough Draft -We know that we need to improve the way we do casing in CLDR. We want the casing to be consistent, so that we don't see, for example, some language names with titlecase and some with lowercase. -Current Status -We have the inText and inList items, but they are not consistently applied - and we haven't had tests for problems. Here is some text from http://unicode.org/reports/tr35 (I added notes in italic): - -The following element controls whether display names (language, territory, etc) are title cased in GUI menu lists and the like. It is only used in languages where the normal display is lower case, but title case is used in lists. There are two options: - - -In both cases, the title case operation is the default title case function defined by Chapter 3 of [Unicode]. 
In the second case, only the first word (using the word boundaries for that locale) will be title cased. The results can be fine-tuned by using alt="list" on any element where titlecasing as defined by the Unicode Standard will produce the wrong value. For example, suppose that "turc de Crimée" is a value, and the title case should be "Turc de Crimée". Then that can be expressed using the alt="list" value. -Note: we have inList items currently for: -cs.xml -da.xml -es.xml -hr.xml -hu.xml -nl.xml -ro.xml -root.xml -ru.xml -sk.xml -uk.xml - -This element indicates the casing of the data in the category identified by the inText type attribute, when that data is written in text or how it would appear in a dictionary. For example: -lowercase-words -indicates that language names embedded in text are normally written in lower case. The possible values and their meanings are: -titlecase-words : all words in the phrase should be title case -titlecase-firstword : the first word should be title case -lowercase-words : all words in the phrase should be lower case -mixed : a mixture of upper and lower case is permitted; generally used when the correct value is unknown. -Note: we have inText items currently in: -cs.xml -da.xml (20 matches) -es.xml (9 matches) -hr.xml (11 matches) -hu.xml (7 matches) -nl.xml (8 matches) -ro.xml (4 matches) -root.xml (13 matches) -uk.xml (6 matches) -For example, for Dutch we have (excluding draft items): -1,043: lowercase-words -1,045: titlecase-firstword -1,047: titlecase-firstword -1,049: titlecase-firstword -... -In certain circumstances, one or more elements do not follow the rule of the majority, as indicated by the inText element. In this case, the allow attribute is used: -The example below indicates that variant names are normally lower case with one exception. 
-lowercase-words - - ortografia tradizionale tedesca - ortografia tedesca del 1996 - dialetto del Natisone - -Improved Testing -As a part of bug http://www.unicode.org/cldr/bugs-private/locale-bugs-private/data?id=2227, I added a consistency test for casing. It just generates warnings for now, and the test is very simple: given a bucket of translations (eg language names), verify that everything has the same first-letter casing as the first item. Although simple (and not bulletproof!), it is revealing... -cs [Czech] warning names|language|lb 〈Luxembourgish〉 【】 〈Lucemburština〉 «=» 【】 Warning: First letter case of =upper doesn't match that of =lower (names|language|aa). -cs [Czech] warning names|language|om 〈Oromo〉 【】 〈Oromo (Afan)〉 «=» 【】 Warning: First letter case of =upper doesn't match that of =lower (names|language|aa). -cs [Czech] warning names|language|ps 〈Pashto〉 【】 〈Pashto (Pushto)〉 «=» 【】 Warning: First letter case of =upper doesn't match that of =lower (names|language|aa). -I didn't use the inText or inList data, because I don't think we have enough data, nor that it has been vetted enough, to be reliable. Moreover, I don't think the buckets it uses are fine-grained enough. I put the test output in 3 different files in http://www.unicode.org/cldr/data/dropbox/casing/ -The code is at http://www.unicode.org/cldr/data/tools/java/org/unicode/cldr/test/CheckConsistentCasing.java. Note that the buckets I used are defined in typesICareAbout in the code. -Feedback and Open Issues -It would be useful to get people's feedback on how the tests can be improved. -In particular, whether the "buckets" should be done differently. For example, it would reduce the warnings if we put the abbreviated format months in a different bucket than the wide format months. But I don't know whether it is right to suppress this warning, or whether it indicates a true problem. 
-az [Azerbaijani] warning calendar-gregorian|day|sunday:format-wide 〈Sunday〉 【】 〈bazar〉 «=» 【】 Warning: First letter case of =lower doesn't match that of =upper (calendar-buddhist|day|sunday:format-abbreviated). -A second issue is how to pick the "paradigm" casing for each bucket. The algorithm I use now is to just use the first item in each bucket. -A third issue is how to "turn off" the warning; some way for the user to add data that says "it is ok for this item to have different case" (This is a more general issue regarding errors/warnings.) -A broader issue is what we should do with inText and inList in order to deal with casing, and how to deal with the fact that sometimes items in a bucket should undergo a case transformation in a particular environment (eg should be titlecased in menus but otherwise lowercased). -Determining whether current structure is sufficient -(from Peter E, 2009 Nov 18) -The attached "CasingContexts.pdf" is a first draft of a doc providing examples of various contexts for usage of date formats and date elements, language names, region names, and names of various other CLDR keys. This document is somewhat oriented to Mac OS X (since those were the examples at hand), so I would like to solicit other types of examples that may cover situations not depicted here. Suggestions welcome! -What I hope to do with this (and very soon) is to send it out to localizers to solicit either sample translations for each item in context (so I can infer the various grammatical cases and capitalization cases that CLDR may need to support), or better yet, have the localizers let me know about all of the cases that are necessary. -With that information I hope to: -Determine what additional types (beyond format and standalone) may be necessary for date formatting, and -See whether inText/inList are adequate to cover the capitalization cases, and if not, try to come up with something better. 
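The first-letter consistency test described under Improved Testing, with the first item of each bucket taken as the "paradigm", can be sketched roughly as follows. This is an illustration only, not the code in CheckConsistentCasing.java: the function names, the (path, value) bucket layout, and the warning wording are assumptions made for the example.

```python
def first_letter_case(text):
    """Return 'upper' or 'lower' for the first cased character, or None if there is none."""
    for ch in text:
        if ch.isupper():
            return "upper"
        if ch.islower():
            return "lower"
    return None


def check_bucket(bucket):
    """Check one bucket of (path, value) pairs against its first item.

    Hypothetical helper: the first item is the paradigm, as in the note above.
    Returns a list of warning strings, one per mismatching item.
    """
    if not bucket:
        return []
    paradigm_path, paradigm_value = bucket[0]
    expected = first_letter_case(paradigm_value)
    if expected is None:
        # The paradigm has no cased letter, so this sketch has nothing to compare against.
        return []
    warnings = []
    for path, value in bucket[1:]:
        actual = first_letter_case(value)
        if actual is None or actual == expected:
            continue  # uncased values (digits, symbols) are not flagged
        warnings.append(
            f"{path}: first letter case of {value!r} = {actual} "
            f"doesn't match that of {paradigm_value!r} = {expected} ({paradigm_path})"
        )
    return warnings


# Example bucket of Czech language names (values chosen for illustration).
czech_languages = [
    ("names|language|aa", "afarština"),
    ("names|language|cs", "čeština"),
    ("names|language|lb", "Lucemburština"),
]
for warning in check_bucket(czech_languages):
    print(warning)
```

Because the paradigm is simply the first item, a single miscased first entry flags every other item in the bucket, which is one reason the choice of paradigm is listed as an open issue.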
-(from Peter E., 2009 Nov 22) -I have attached "CasingContextsV2.pdf" which fixes the calendar menu example (thanks Kent!) and adds examples of currencies in text and various examples of units in text. I still need to add an example of currency in a dialog, along with overall instructions. -(from Peter E., 2009 Nov 25) -Updated to "CasingContextsV3.pdf" which adds an overall explanation of the purpose of this document as well as instructions for localizers to provide feedback. \ No newline at end of file diff --git a/docs/site/TEMP-TEXT-FILES/coverage-revision.txt b/docs/site/TEMP-TEXT-FILES/coverage-revision.txt deleted file mode 100644 index e5d06534ee5..00000000000 --- a/docs/site/TEMP-TEXT-FILES/coverage-revision.txt +++ /dev/null @@ -1,34 +0,0 @@ -Coverage Revision -Propose changing CoverageLevel to be data driven. Rough thoughts. -Have a list of paths -- ​/​/ldml​/identity​/version => NONE -- ... -+ /​/ldml​/localeDisplayNames​/languages​/language[@type="*"] language:​type=​[en, und] => Minimal -+ ... -- ... -Have special variables for -this language -this language's countries -this language's territories -... -Current coverage: needs review -I put the tentative results in http://spreadsheets.google.com/pub?key=t5UzIpaSqcYBksSMtZp-f7Q&output=html -The more detailed files are in http://unicode.org/repos/cldr-tmp/trunk/dropbox/mark/coverage/. In particular: -http://unicode.org/repos/cldr-tmp/trunk/dropbox/mark/coverage/summary.txt -http://unicode.org/repos/cldr-tmp/trunk/dropbox/mark/coverage/samples.txt -http://unicode.org/repos/cldr-tmp/trunk/dropbox/mark/coverage/fullpaths.txt -There is more to do, but I wanted to give a snapshot -- tune the weighting -- weight by draft level -- add collations, rbnf, plural rules, transforms (if non-Latin), etc. -From John -I've been doing some more thinking about how to deal with coverage in CLDR. 
It seems to me that we already have the notion that every XPath in CLDR should have some predefined number associated with it between 0 and 100 that denotes its relative importance in terms of coverage, with 0 being absolutely critical, and 100 being not critical at all. See Appendix M of TR35 for a brief description of the levels. I think that if we could accurately quantify this using metadata, then it would be relatively easy for us to accomplish two things: -1). Filter out fields from the ST that we don't really need, since everything we do would be filtered based on a desired coverage level. -2). Allow individual users to set the filtering in the survey tool based on one of the predefined coverage levels as we already have in the spec, or actually any other numeric coverage level that they desire. -So with this in mind, I would like to propose the following structure to be added to the supplemental metadata: - -  -  -.... - -Finding the appropriate coverage level value would then be a matter of searching the coverageLevel entries in numeric order by value looking for a match of the path vs. "//ldml/" + "regular expression". In other words, we would not specifically include "//ldml" in the expressions, since they would all start with that. Once a given xpath's coverage level value was determined, it shouldn't be too hard for us to simply filter out fields whose coverage level was higher than the requested level. I suppose that we will need some wildcards similar to what Mark has started working on in his path filtering proposal. \ No newline at end of file diff --git a/docs/site/TEMP-TEXT-FILES/currency-code-fallback.txt b/docs/site/TEMP-TEXT-FILES/currency-code-fallback.txt deleted file mode 100644 index fab2b6c6485..00000000000 --- a/docs/site/TEMP-TEXT-FILES/currency-code-fallback.txt +++ /dev/null @@ -1,17 +0,0 @@ -Currency Code Fallback -The basic problem is that we can't use currency codes that our users won't have fonts for. 
It was fine to have, say, the shekel sign in Hebrew, as in CLDR 1.6, because we could presume that anyone using the Hebrew locale would have a Hebrew font, and that Hebrew fonts would have a shekel sign. But by putting it into root, we are presuming that *everybody* would have that character, which is not true. Our users/customers think there is something wrong when they scan a list of currencies, and some of them are black boxes. -Here are some examples of behavior I think we'd like to support. -We show £ if the font is available, otherwise £, otherwise currency code. -We show ₧ if the font is available, otherwise Pts, otherwise currency code. -For reference, here are the currency signs in Unicode: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%3Asc%3A]. Also see http://en.wikipedia.org/wiki/Currency_sign and http://www.unicode.org/cldr/data/charts/by_type/names.currency.html. -For more details on the problem, see the email thread titled "Problems with currency codes". -Here are some recommendations. -We should not have a character fallback element for any currency symbol that is used for different currency codes. That is, it is ok to have EUR for €, but not ok to use KRW for ₩, since ₩ is also used for KPW, and not ok to have JPY for ¥, since ¥ is also used for CNY. -Even with this, we don't really want to use character fallback elements for currency substitution in general, since it is too coarse. -We should try to remove all the currency symbols that use Unicode symbol characters from the locales, except where they have special plurals, or where we have symbol reversals (eg in the US, $ for USD and C$ for CAD, while in CA, $ for CAD and US$ for USD). -Options -We then just make sure that all currency symbols in root are widely understood and in common fonts (eg in Windows Arial), or -We enhance the currency symbols so that we have a fallback list. 
We put the symbols that are in typical fonts in each locale in the currencySymbols exemplar list for that locale. When formatting, we walk through the fallback list until we hit one that works. If we don't get any, we use the currency code. If a smart client has font information, then it could also walk the fallback list using the font information instead of the currencySymbol exemplars. -We have something like "commonly used" that lets the application provider choose to force the symbol; otherwise only the commonly used symbols appear. So in root, commonly used would be on EUR, etc. Could be turned on in locales, or by application. -I'm leaning towards #1, just for simplicity. -However, see also: http://www.unicode.org/cldr/bugs/locale-bugs?findid=2244 \ No newline at end of file
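
The fallback-list option (the second option above) can be sketched as follows. This is a minimal illustration under stated assumptions, not an existing CLDR or ICU API: the function name, the shape of the fallback list, and the `displayable` set are hypothetical. `displayable` stands in for either the locale's currencySymbols exemplar set or, for a smart client, the characters its font can actually render.

```python
def pick_currency_symbol(fallbacks, currency_code, displayable):
    """Walk the fallback list and return the first candidate that can be shown.

    fallbacks: candidate symbols in preference order (hypothetical data shape).
    currency_code: the code used when no candidate works, e.g. "ESP".
    displayable: set of characters assumed to be renderable.
    """
    for candidate in fallbacks:
        if all(ch in displayable for ch in candidate):
            return candidate
    return currency_code


# Hypothetical fallback list for the peseta, following the example in the text:
# show the sign if available, otherwise "Pts", otherwise the currency code.
esp_fallbacks = ["₧", "Pts"]

ascii_only = {chr(cp) for cp in range(0x20, 0x7F)}
with_peseta_sign = ascii_only | {"₧"}

print(pick_currency_symbol(esp_fallbacks, "ESP", with_peseta_sign))
print(pick_currency_symbol(esp_fallbacks, "ESP", ascii_only))
```

The same walk works whether the set comes from exemplar data or from font coverage; only the source of `displayable` changes.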