From 822451ce9ef73c0e82043d186da10ca750280ffd Mon Sep 17 00:00:00 2001
From: Chris Pyle
Date: Thu, 4 Jul 2024 18:15:58 -0400
Subject: [PATCH] CLDR-17566 removing text files

---
 .../TEMP-TEXT-FILES/fractional-plurals.txt    |   3 -
 .../TEMP-TEXT-FILES/generic-calendar-data.txt |  65 -----------
 ...forms-for-datetime-elements-and-others.txt | 106 ------------------
 docs/site/TEMP-TEXT-FILES/grapheme-usage.txt  |  40 -------
 docs/site/TEMP-TEXT-FILES/hebrew-months.txt   |  46 --------
 5 files changed, 260 deletions(-)
 delete mode 100644 docs/site/TEMP-TEXT-FILES/fractional-plurals.txt
 delete mode 100644 docs/site/TEMP-TEXT-FILES/generic-calendar-data.txt
 delete mode 100644 docs/site/TEMP-TEXT-FILES/grammar-capitalization-forms-for-datetime-elements-and-others.txt
 delete mode 100644 docs/site/TEMP-TEXT-FILES/grapheme-usage.txt
 delete mode 100644 docs/site/TEMP-TEXT-FILES/hebrew-months.txt

diff --git a/docs/site/TEMP-TEXT-FILES/fractional-plurals.txt b/docs/site/TEMP-TEXT-FILES/fractional-plurals.txt
deleted file mode 100644
index 517bb560c6a..00000000000
--- a/docs/site/TEMP-TEXT-FILES/fractional-plurals.txt
+++ /dev/null
@@ -1,3 +0,0 @@
-Fractional Plurals
-Fraction Samples
-Fractional Plurals Design Doc
\ No newline at end of file
diff --git a/docs/site/TEMP-TEXT-FILES/generic-calendar-data.txt b/docs/site/TEMP-TEXT-FILES/generic-calendar-data.txt
deleted file mode 100644
index 6efc0b91686..00000000000
--- a/docs/site/TEMP-TEXT-FILES/generic-calendar-data.txt
+++ /dev/null
@@ -1,65 +0,0 @@
-Generic calendar data
-Authors Peter Edberg, Mark Davis
-Date 2012-Dec-10, 2012-Dec-18 update from 2012-Dec-12 TC discussion
-Status Proposal
-Feedback to pedberg (at) apple (dot) com
-Bugs #5385 , Add generic calendar as the base for all non-gregorian calendars
-(and related bugs to at least partly address at the same time:)
-#5421 , Fix era positions
-#5490 , Clean up stock date/time formats
-Problem: The non-Gregorian calendars never receive the same attention from translators as the Gregorian one. This has led to a number of problems: gaps in the data, and consistency in the data. The problem with the gaps are that the root locale data “leaks through”, giving incorrect results in many cases. The consistency problems are because there is a large amount of data for people to review, and often the data entered is from different people. So there are inconsistencies within the same calendar, and gratuitous differences across calendars.
-Yet a very large part of the data for different calendars could be shared, because the calendar format data tends to basically differ only by month and era names, and whether the era is included or not.
-Proposal: I propose to use under as a way to provide generic calendar format data; all other calendars will inherit most data (directly or indirectly) from this calendar. The main benefit of this is the ability to have a single place for a locale to specify date format patterns (standard formats, availableFormat, intervalFormats) for (nearly) all non-Gregorian calendars, to minimize effort and to ensure consistency.
-Inheritance
-The inheritance relationships among the various calendars are as follows; these are specified in root but are set up to follow paths starting in each locale and the proceeding up to root. For example, lookup for Japanese calendar data in the German locale will first check for the data in calendar "japanese"; if not there, it will check for month names in the German data for calendar "gregorian", and for other items in the German data for calendar "generic"; if the latter is not present, it will look for the items in the root locale.
-In a given locale, vetters should provide:
-Most data items for the generic calendar—except month and era names—using formats with eras.
-Selected data items for the gregorian calendar: month and era names, and formats without eras. They may customize other items but generally will not need to.
-For the chinese calendar, if they choose to support it, they will need to supply formats with U, plus month names, cyclicNameSets, and monthPatterns.
-For most other calendars they choose to support, they should only need to provide month and era names (if the root names are inappropriate). For buddhist/japanese/roc, they can skip the month names (which come from gregorian). Of course they can choose to customize other items for the calendar.
-For the generic calendar, the month and era name data in root should truly be generic, with wide names such as “month1”…“month12” (and “era0”, “era1”. The weekday names should be non-generic and can mirror the types: "sun".."sat".
-Per TC discussion: weekdays should inherit directly from generic, not from gregorian; in fact everything should inherit as much as possible directly from generic. We assume that every locale will provide generic calendar data.
-Special considerations for "generic" calendar
-The generic calendar data is not intended to be associated with a calendar type in ICU, and is not intended to be used in a locale specification. Thus:
-No entry for it should be added in common/bcp47/calendar.xml
-There should be no entry for it under in English or any other locale, and the Survey Tool should not allow providing a localized name for it (coverage = 101 ?)
-ICU should not create a Calendar subclass "Generic".
-This begs the question: Instead of making "generic" another calendar type, should we instead have special structure for it?
-Adding "generic" in : The best argument for this is that it can be implemented easily with no additional structure and (probably) no change to the Survey Tool. Unless there is a strong argument for adding special structure, I think this is what we should do. If we decide later to add special structure, it should be relatively easy to move the "generic" calendar data into it.
-Adding special structure for generic calendar: The advantage of this is that it would make clear that the "generic" calendar data is special. While CLDR inheritance could probably be made to work without much difficulty, this would require some effort to allow Survey Tool editing of the data. One concern is that various places in ICU code may currently assume that all CLDR calendar data is accessible in the "calendars" bundle as one type or another.
-Agreed in TC that no special structure is necessary, just use .
-Handling generic calendar in Survey Tool
-I think that all we need to do is adjust coverage levels appropriately. We need to ensure that items for the "generic" calendar have a reasonably low coverage level, except for month and era names which should not be editable. Then for most non-Gregorian calendars, everything other than the month and era names should have a high coverage level (100)? so they are less likely to be seen.
-Agreed in TC.
-The role of calendar data in root
-There are two ways to approach data for calendars in root.
-Treat root as a valid locale in its own right, so that the formats in root are appropriate for "root" behavior. In this case we assume that other locales always override the root behavior.
-Treat root data primarily as a fallback. In this case root should provide the best fallback behavior for the majority of locales, since locales may not always override the root behavior.
-I think the intent for the "gregorian" calendar data in root is #1 above. The formats for this calendar are intended to match ISO 8601 style, with GyMdHms order, hyphen for separator, etc. (This is certainly not the most common order in CLDR locales, but is found in CJK locales). If that is the intent, it fails in a couple of ways; changes for these are suggested in section E below:
-Currently the weekday name is first, e.g. "EEEE, G y MMMM dd". While the ISO 8601 order does not include weekday name, it is generally big-endian which would put EEEE next to d, more like "G y MMMM dd, EEEE", which is also the ordering most common in CJK locales. It also leads to clearer intervalFormats.
-Currently all widths of month and weekday names are purely numeric. When formats—especially intervalFormats—include month, day number and weekday name, this can make them very cryptic and impossible to understand. I think we should have non-numeric wide/abbreviated month and day names in root. Yes, these would be in English, but we already have other root "gregorian" data in English such as , and without this it is difficult to use the gregorian calendar with root as a valid locale.
-Currently the weekday numeric name strings use "1" for Sunday, "2' for Monday, etc. This is inconsistent with ISO 8601, which uses "7" for Sunday, "1" for Monday. Supplemental data also has firstDay=mon for 001, which would include root.
-I think the intent for the "generic" calendar data in root should be #2 above. Since many locales will only localize the gregorian data and not customize the generic calendar data, we should provide the best possible fallback in root. This will also lead to some data reduction, since many locales may then not need to customize the generic calendar format data anyway. To this end, the "generic" calendar formats should use the ordering "EdMyG Hms" or subsets thereof, which is by far the most common ordering among all CLDR locales. This would mean some inconsistency in root between the "gregorian" and "generic" formats, which could make error checking more difficult.
-The consensus from TC discussion was that neither #1 nor #2 apply. That is, root should not be treated as a locale in its own right, though some felt that the formats should still be tied to a standard such as ISO 8601. At the same time, root data will not be treated primarily as a fallback, we assume that all locales will override both "generic" data and the specific "gregorian" data items. However, we should use a consistent ordering for root data: "generic" and "gregorian" should use the same order, and if that order is GyMd..., then E should come after d: GyMdE. Also, we should provide month and weekday names in root that are not just numeric strings; the weekdays can be "sun".."sat", the months can be "month1".."month12".
-Specific changes/suggestions for root data
-Based on the discussion above, the following changes/suggestions are proposed for root calendar data:
-"gregorian" formats should change from "EGyMd Hms" order to "GyMdE Hms" order (and same for formats that use subsets of this). Note that this would be more consistent with the "d E" order already used for the Ed skeleton.
-"generic" formats should use "EdMyG Hms" order and subsets thereof ("chinese" formats should use "EdMU Hms" etc.).
-"generic" should provide non-numeric wide/abbreviated weekday names, probably "sunday"/"sun".."saturday"/"sat".
-If "generic" provides numeric strings for e.g. narrow weekday names, it should probably use "1" for Monday to be consistent with ISO 8601.
-"gregorian" should provide non-numeric wide/abbreviated month names (it inherits the weekday names from "generic"). These could be e.g. "month1".."month12"/"mo1".."mo12" or "january".."december"/"jan".."dec". This will vastly improve legibility of some formats.
-"generic" should have generic but non-numeric wide/abbreviated month and era names, e.g. "month1"/"mo1".."month12"/"mo12", "era0".."era1".
-"chinese" should also provide generic but non-numeric month names.
-No formats in root should use "yyyy", any such occurrences should be changed to "y". And in fact, for the "generic" calendar, we probably want availableFormats items that convert a skeleton with "yyyy" into a pattern with just "y".
-For "generic" and "gregorian" I am adding availableFormats entries to cover several skeletons with G as in #5421 "Fix era positions".
-Currently most non-gregorian calendars in root do not provide at all—not as an alias to "gregorian" data, and not as real data. I think for these we should now add an alias to the "generic" data.
-Should all calendars inherit from *gregorian* weekday names? This will produce better results if locales only localize gregorian.
-Based on the TC discussion:
-For #1-2 above, root "generic" and "gregorian" formats should use the same ordering, either "GyMdE Hms" (8601) or "EdMyG Hms" (common). Does not impact other locales since all locales should override both "generic" and "gregorian".
-For #3-7 above, root data for "generic" should provide weekday names of just "sun".."sat"; root data for "generic", "gregorian", and "chinese" should provide month names of "month1".."month12".
-For #8-9 abover, agreed.
-For #10 above, decided to move to top level of ; this is a dtd change covered by new bug #5512
-For #11 above, agreed that all calendars should inherit weekday names directly from "generic"
-Also, agreed that the following should be addressed as much as possible while implementing the above:
-#5421, Fix era positions
-#5490, Clean up stock date/time formats
\ No newline at end of file
diff --git a/docs/site/TEMP-TEXT-FILES/grammar-capitalization-forms-for-datetime-elements-and-others.txt b/docs/site/TEMP-TEXT-FILES/grammar-capitalization-forms-for-datetime-elements-and-others.txt
deleted file mode 100644
index a6ca44183c0..00000000000
--- a/docs/site/TEMP-TEXT-FILES/grammar-capitalization-forms-for-datetime-elements-and-others.txt
+++ /dev/null
@@ -1,106 +0,0 @@
-Grammar & capitalization forms for date/time elements and others
-Author Peter Edberg, comments from others as noted
-Date 2011-11-17 through 2011-11-30, 2012-01-10to17
-Status Proposal
-Feedback to pedberg (at) apple (dot) com
-Bugs See list of tickets in section D
-A. Issues
-There are at least 4 axes of variation for choosing the correct form of a date (or fragment thereof). Many of these issues also apply to choosing to other tasks such as choosing the correct form for the name of a language or region, or the correct form for a plural unit.
-In order to format a date/time (for example) properly, CLDR must support the necessary forms, a CLDR client such as ICU must have adequate information (e.g. from API options), and it must have a way to map this information to the appropriate CLDR data.
-1. Capitalization context
-Several capitalization contexts can be distinguished for which different languages use different capitalization behavior for dates, date elements, names of languages/regions/currencies
-In the middle of normal running text (complete sentences).
-In a UI list or menu.
-As an isolated UI element. For dates, this might further break down as (a) an isolated complete date, or (b) an isolated single date element (e.g. a month or day name), though I don’t have clear evidence for this yet.
-At the beginning of complete sentences. As far as I know, all bicameral writing systems capitalize the first word in a complete sentence, so no separate data is needed; this can be done programmatically given the necessary context information.
-Certainly for month and day names we need separate data for cases 1, 2, and 3 above, since they may behave independently in some languages, though it may be the case that for other types of data such as language & region names, cases 2 & 3 above can be merged. Here are some examples of desired behavior for capitalization of month & day names:
-contexts → languages ↓ 1, running text 2, UI list or menu 3b, isolated name (e.g. calendar heading)
-de, el, en, ms capitalized capitalized capitalized
-da, fi, nb lower lower lower
-cs, hr lower capitalized lower
-fr lower lower (after a menu/list title ending in colon, e.g. “First day of week:”) capitalized
-Mark: We need to know what the current data has. We have told vetters to use the style for a UI list or menu, but many have not followed this approach [Peter: A table in section E at the end of this document provides some data]. Mechanically it might be easiest to go from the middle-of-text style to the others.
-Steven: For collecting the data, if we had good scenarios, we could present current data on contexts: 1, 2, 3 and ask vetters which are correct/incorrect.
-2. Grammatical case (noun case)
-Slavic languages, Greek, Finnish /Hungarian, and probably many other languages have multiple noun cases for day name and especially for month name (as well as multiple forms for time zone names, plural units, language names, etc.).
-For month and day names, one dimension of variation is whether the name is used by itself (nominative form) or is part of an expression like “on Monday” or “in January” (some variety of locative/ablative form). These forms are quite important. Expressions like “before January” or “about January” require additional grammatical forms of month names in some languages (e.g. Polish), but those are probably not as important and won’t be considered here.
-For month names, a second dimension is whether a day number is used with the month; all of the languages mentioned above have different grammatical forms for month name (genitive, partitive, locative, etc. depending on the language) when used with a day number.
-In current CLDR data, standalone is used for the nominative form, and format is used for one of the other possibilities. Here are examples for Finnish and Czech (bold forms are currently in CLDR):
-EEEE MMMM MMMMd
-“Saturday” = “ lauantai ” (fi, nominative) " sobota " (cs, nominative) “January” = “ tammikuu ” (fi, nominative) “ leden ” (cs, nominative) “January 31” = “31. tammikuuta ” (fi, partitive) “31. ledna ” (cs, genitive)
-“on Saturday” = “ lauantaina ” (fi, essive) "v sobotu" (cs, abla./locative) “in January” = “tammikuussa” (fi, inessive) “v lednu” (cs, abla./locative) “on January 31” = “tammikuun 31.” (fi, genitive) “31. ledna ” (cs, genitive)
-Many of these languages also have two grammatical forms for a time zone name, depending on whether it occurs by itself or with an actual time.
-Finally, for language names, some of these languages have many noun forms. Most of these are not relevant for the types of usage that CLDR supports. However, for some languages like Czech, names in a list of languages need to be either in noun form or adjective form depending on how the list is being used (e.g. adjective form in a list of keyboard languages). Mark: The adjective form is ugly, since it will need to agree with the noun.
-Note that there are similar issues for plural units, e.g. “3 hours” versus “in 3 hours”. Mark: Need to ensure that CLDR vetters and clients understand that the plural units are for durations (e.g. of a video), not a relative time.
-3. Particles
-In some languages such as Greek, French, and Catalan, day and month names in running text need an associated article or other particle, and this may depend on gender and/or spelling of the name. In Greek, Saturday is neuter but other weekdays are feminine, so in text one has “η Δευτέρα” (Monday) but “το Σάββατο” (Saturday). For “on Monday” etc., the particle for the feminine days is either “τη” or “την” depending on the spelling of the day. Lack of support for these articles is considered a major problem.
-Note that current CLDR Catalan data uses format/standalone to distinguish month names with and without articles.
-4. Degree of ambiguity allowed
-Narrow day and month names may be ambiguous. Depending on how there are used, this may be acceptable. For example, if narrow weekday names are shown at the tops of columns on a calendar page, there is enough context to disambiguate which day is which. However, a narrow weekday name shown in an isolated date format may not be adequate. ICU needs to provide enough context information to allow use of ambiguous names only when appropriate.
-Mark: This is really a distinction in width, not between format and standalone.
-B. Current solution
-1. Format vs standalone
-The current choice between “format” and “standalone” forms conflates all of the distinct issues above (except perhaps degree of ambiguity) into a single choice. CLDR vetters need to pick two points in the 4-D space as representing the “format” and “standalone” forms. For Slavic month names, typically the standalone form is a capitalized nominative (for use without day number), while the format form is in lowercase and intended for use with day number. This results in incorrect capitalization for many situations, and also does not address the important “on/in Xxx” forms.
-2. inList, inText
- “controls whether display names (language, territory, etc) are title cased in GUI menu lists and the like,” for languages that normally use lowercase for these; the options are "titlecase-words" or "titlecase-firstword", with special handling supposedly available by using alt="list" on elements that need it (what it does is not specified).
- “indicates the casing of the data in the category identified by the ... type attribute, when that data is written in text or how it would appear in a dictionary. For example, shows how to lettercase language names. The possible values are “lowercase-words”, “titlecase-words”, “titlecase-firstword”, and “mixed” (unknown).
-3. Related docs
-Mark had some discussion and ideas in #2269.
-C. Outline of proposed solution
-1. Grammatical form for month, weekday
-The basic idea: Have additional and types, e.g. “monthNoDate”, “monthWithDate”, “inMonthNoDate”, “onMonthWithDate”, etc. In root, alias these to “format” or “standalone” as appropriate. Any locale can provide explicit data to override these if necessary (i.e. for most locales there would be no change, but Slavic locales (for example) could provide the necessary additional forms. ICU would have new APIs that would take additional usage parameters, and would map to the appropriate context when retrieving data. There would not be additional pattern characters for these contexts.
-Note, this still does not address the multiple forms needed in some languages for time zone names and other types of names.
-2. Letter casing / capitalization
-(This section originally proposed a element, with subelements. This was changed to to allow for the possibility of supporting other types of context-based transforms in the future.)
-The basic idea: Have a new element, with subelements. The latter takes a "type" attribute whose values include (for now) “uiListOrMenu” or “stand-alone”. There is no need for a "text" context since that should be the default form for the actual data (see section 4 below). The values are analogous to those for the element; however, the only one currently needed is “titlecase-firstword” ("lowercase" should be the default form for the data, and “titlecase-words” does not seem to be useful).
-Note that the “titlecase-firstword” behavior does not exactly match any current ICU behavior; it should probably do something like capitalize the first non-punctuation non-symbol character in a string (if that character can be capitalized). Whether it should skip digits and capitalize the first letter after initial digits needs to be discussed (and may need to be an option).
-If no context-based name transforms are needed, the element can be absent.
-My initial thought was to include these elements (as many as necessary) inside each relevant name element: , , , etc. As an example for Czech:
- titlecase-firstword
-This would involve additions to the DTD everywhere we wanted to add these, which is a bit cumbersome. An initial list of where these should be added:
-
-
-
-
-
-
-
-
-
-A better alternative is to collect all of these elements in one place, analogous to the element in the Casing Structure proposal (though that is meant to describe existing data, not desired behavior in different contexts), as implemented per cldrbug 4151 (r6311 and later). In that case we need an additional element - say - with a “type” attribute that indicates the set of items to which it applies, or the type "all" to indicate that it applies to all usages. The other sets for the type attribute would be the same as the buckets used for the element mentioned above, as described in this spreadsheet (with some fixes) - for example, "language", "month-format-except-narrow", "calendar-field" (the list of these, and the mapping between them and paths, is currently hardcoded in tools/java/org/unicode/cldr/test/CheckConsistentCasing.java). This would just require a single addition to the DTD and be more expandable:
- titlecase-firstword titlecase-firstword titlecase-firstword titlecase-firstword
-The data could be located in the same files as the data: common/casing/xx.xml. Note that the data is currently under a element that cannot co-exist with other LDML elements; this needs to be changed, and the element should be renamed something like .
-As part of this, the existing inText and inList elements would be deprecated; they do not cover enough contexts and are not easily extensible (as would be by adding more type attribute values).
-Again, ICU would need APIs that would take additional usage parameters indicating capitalization context, and would transform the data as appropriate for formatting.
-3. ICU changes
-We would need new forms of date formatting APIs, with additional parameters that indicated
-grammatical context (primarily whether there is a prepositional notion such as “on/in” associated with the date; eventually these might be expanded to include notions such as “before” or “after”.
-capitalization context (whether the date was for usage in running text, a UI list or menu, or as an isolated UI element).
-Date parsing APIs would have to parse all of the relevant forms.
-We would also need new forms of APIs that retrieve names such as language, script, region, etc.; these would also need an additional parameter indicating capitalization context.
-4. CLDR data changes and capitalization guidance
-In the locales I looked at (see section E), all of the language and region names already have the appropriate capitalization for running text (rather than for a UI list/menu), and nearly all of the format forms for month and day names have the appropriate capitalization for running text. Thus I think we should change our guidance to suggest that this is the preferred form for these, change the existing data as necessary, and programmatically capitalize as necessary for other contexts.
-Some languages currently use the standalone month/day context to provide capitalized forms. If the language does not need different grammatical forms for these contexts, and if all standalone forms should always be capitalized, then we can leave that data alone. Otherwise (as with Russian) we should switch to using standalone for grammatical distinctions, and perform any necessary capitalization programmatically.
-D. Bugs
-Old:
-CLDR #1665, Case Used To Translate Languages, Scripts, Territories, Currencies, [closed, has some background]
-CLDR #1678, Review and fix inText as part of casing revision
-CLDR #1855, Case used to translate languages, scripts,...
-CLDR #1924, InList casing is deprecated, but tech standard still uses it
-CLDR #2265, Need design proposal: "stand alone" is overloaded
-CLDR #2269, Consistent Casing [has proposal from Mark]
-CLDR #2383, Have model language as a preference and do casing test on a per item basis [Kent]
-CLDR #2392, LDML2ICUConverter - convert ldml / layout / inList & intext elements [comment from Kent about these elements]
-CLDR #2701, Consider having 3 forms for Months/Days
-CLDR #3347, Update capitalization guidance
-CLDR #3406, Language Titlecasing
-CLDR #3733, Consider changing the case of "Yesterday", "In {0} days", etc.
-ICU #4836, RFE: ICU needs API to access inList CLDR data
-New:
-CLDR #4284, Add elements, deprecate /
-E. Capitalization data
-For several languages, the table below shows the desired capitalization behavior in various contexts, and how the current data might need to be altered (statically or programmatically) to match the desired behavior.
-Second row title key: M = middle of running text, U = UI list or menu, I = isolated UI element which can be (a) part of a complete isolated date, or (b) isolated single date symbol name (month or day name)
-Table body cells, key for desired casing (upper section of cell for language): L = lower case in this context, U = capitalized, U1 = capitalize just the first word of the date. The rest of the cell may have notes on case of CLDR data, whether an article must be used, or whether both noun & adjective forms are needed.
-Table body cells, color code:
-Here is the table:
\ No newline at end of file
diff --git a/docs/site/TEMP-TEXT-FILES/grapheme-usage.txt b/docs/site/TEMP-TEXT-FILES/grapheme-usage.txt
deleted file mode 100644
index c12ba5e517c..00000000000
--- a/docs/site/TEMP-TEXT-FILES/grapheme-usage.txt
+++ /dev/null
@@ -1,40 +0,0 @@
-Grapheme Usage
-Draft
-The goal is to allow the use of the appropriate grapheme clusters for given tasks, for a given language. See http://unicode.org/cldr/trac/ticket/2142. Please leave any feedback as comments on that ticket.
-The idea is that we have explicit boundaries that represent certain common behaviors (codepoint breaks, or legacy grapheme cluster breaks), and we also have associations for a given language between a particular function and the explicit boundaries that should be used in that language for that function.
-Here is a proposal for the structure in LDML:
-
-...
-extended
-legacy
-aksara
-codepoint
-extended
-...
-
-The above would be tailorable per locale.
-In segments/root.xml we have GraphemeClusterBreak. We interpret that as extended grapheme clusters for compatibility. We then add rules for:
-LegacyGraphemeClusterBreak // as per UAX#29
-AksaraGraphemeClusterBreak // the virama character connects extended clusters
-CodepointGraphemeClusterBreak // constant, trivial, probably usually implemented in code
-ExemplarGraphemeClusterBreak // uses the CLDR exemplar set in addition to extended clusters.
-These would also be tailorable per locale (except CodePoint), but should be more rarely done.
-Clients like ICU would add new constants for getting BreakIterators (or equivalents). These would be both corresponding to the new explicit rules:
-legacy
-extended = 'user-character'
-aksara
-codepoint
-exemplar
-And to the new 'function-based' breaks:
-character_count
-character_drop_cap
-character_selection
-character_backspace
-character_delete
-Related bugs
-#2142, Alternate Grapheme Clusters (pedberg, 2.0)
-#2975, Support legacy grapheme break (pedberg, 2.0)
-#2825, Add aksha grapheme break (pedberg, 2.0)
-#2992, Grapheme Clusters or a new break type - TR29 vs TR18? [about language-specific treatment of digraphs as clusters - ]
-#2406, Add locale keywords to specify the type (or variant) of word & grapheme break (pedberg, 2.0)
-There is also the suggestion to add another type which is beyond the scope of CLDR - a cluster type that treats ligatures as single clusters. This depends on font behavior.
\ No newline at end of file
diff --git a/docs/site/TEMP-TEXT-FILES/hebrew-months.txt b/docs/site/TEMP-TEXT-FILES/hebrew-months.txt
deleted file mode 100644
index 28bb9846075..00000000000
--- a/docs/site/TEMP-TEXT-FILES/hebrew-months.txt
+++ /dev/null
@@ -1,46 +0,0 @@
-Hebrew Months
-Here's what our Hebrew contacts are telling us about month names and numbers in the Hebrew calendar:
-From an end-user's point of view, the numbering of Hebrew months is always consecutive. Even though the numbers are seldom ( if ever ) used in practice. That is to say, in a non-leap year:
-Shevat = month 5, Adar = month 6, Nisan = Month 7
-while in a leap year:
-Shevat = month 5, Adar I = month 6, Adar II = month 7, and Nisan = month 8.
-According to Wikipedia, "Adar II" in a leap year is the "real" Adar, and "Adar I" is considered to be the "extra" month.
-I think we can get the desired representation without having to make it overly complex.
-To sum up. Currently we have:
-
- Tishri
- Heshvan
- Kislev
- Tevet
- Shevat
- Adar I
- Adar
- Nisan
- Iyar
- Sivan
- Tamuz
- Av
- Elul
-
-I propose that we add a distinguishing attribute called "yeartype" to the month element, and then simply add "Adar II" as follows:
-
- Tishri
- Heshvan
- Kislev
- Tevet
- Shevat
- Adar I
- Adar
- Adar II
- Nisan
- Iyar
- Sivan
- Tamuz
- Av
- Elul
-
-This approach has a number of advantages:
-a). It is only a one line change from the existing data, which means minimal disruption to anyone using the existing data.
-b). It is technically more accurate according to the Wikipedia, since "Adar II" in a leap year is considered the equivalent month as "Adar" in a non-leap year. That is to say, "Adar II" is the "real" Adar, not "Adar I".
-c). Calendaring applications have a relatively easy way to go through the data in numeric order. In a non-leap year, just use 1-5 and 7-12. In a leap year, use 1-6, + 7 alt + 8-12.
-The new attribute "yeartype" was chosed as opposed to using "alt", since ICU's build process excludes all "@alt" data by default.
\ No newline at end of file
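
For readers of the removed hebrew-months.txt notes, the month walk described in item c) can be sketched roughly as below. This is only an illustration under stated assumptions: the numeric `type` values are not present in the text and are assumed here to follow list position, with "Adar II" sharing number 7 and carrying `yeartype="leap"`; the `Month` tuple and both helper functions are hypothetical names, not CLDR or ICU API. The note gives the ranges as 1-5, 7-12 and 1-6, 7 alt, 8-12; because thirteen names are listed, this sketch runs through the last entry, which is also an assumption.

```python
from typing import List, NamedTuple, Optional


class Month(NamedTuple):
    type: int                       # assumed numeric id, taken from list position
    name: str
    yeartype: Optional[str] = None  # "leap" marks the leap-year alternate entry


# Proposed month list from the note; the numbering is an assumption.
MONTHS = [
    Month(1, "Tishri"), Month(2, "Heshvan"), Month(3, "Kislev"),
    Month(4, "Tevet"), Month(5, "Shevat"), Month(6, "Adar I"),
    Month(7, "Adar"), Month(7, "Adar II", "leap"),
    Month(8, "Nisan"), Month(9, "Iyar"), Month(10, "Sivan"),
    Month(11, "Tamuz"), Month(12, "Av"), Month(13, "Elul"),
]


def months_in_order(leap: bool) -> List[str]:
    """Return month names in calendar order for a leap or non-leap year."""
    names = []
    for m in MONTHS:
        if leap:
            # Leap year: use the yeartype="leap" entry instead of plain "Adar".
            if m.type == 7 and m.yeartype is None:
                continue
        else:
            # Non-leap year: skip "Adar I" (6) and the leap-only alternate.
            if m.type == 6 or m.yeartype == "leap":
                continue
        names.append(m.name)
    return names


def user_facing_number(name: str, leap: bool) -> int:
    """Consecutive month number as an end user would count it (1-based)."""
    return months_in_order(leap).index(name) + 1
```

With these assumptions the consecutive numbering matches the examples in the note: Nisan is month 7 in a non-leap year and month 8 in a leap year.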