From 007e68eabd1c444f5a378a5d46922e0975b36f92 Mon Sep 17 00:00:00 2001
From: Chris Pyle <cpyle@unicode.org>
Date: Wed, 4 Sep 2024 16:12:25 -0400
Subject: [PATCH 1/3] CLDR-17566 initial txt and md files

---
 .../cldr-data-retention-policy.txt            |   5 +
 .../TEMP-TEXT-FILES/collation-guidelines.txt  | 152 +++++++++++
 .../site/TEMP-TEXT-FILES/currency-process.txt |   3 +
 docs/site/TEMP-TEXT-FILES/definitions.txt     |   9 +
 .../TEMP-TEXT-FILES/faq-and-known-bugs.txt    |  34 +++
 docs/site/TEMP-TEXT-FILES/references.txt      |  45 ++++
 .../index/cldr-spec/collation-guidelines.md   | 248 ++++++++++++++++++
 docs/site/index/cldr-spec/currency-process.md |  18 ++
 docs/site/index/cldr-spec/definitions.md      |  23 ++
 .../process/cldr-data-retention-policy.md     |  14 +
 .../index/survey-tool/faq-and-known-bugs.md   |  72 +++++
 .../translation-guide-general/references.md   |  73 ++++++
 12 files changed, 696 insertions(+)
 create mode 100644 docs/site/TEMP-TEXT-FILES/cldr-data-retention-policy.txt
 create mode 100644 docs/site/TEMP-TEXT-FILES/collation-guidelines.txt
 create mode 100644 docs/site/TEMP-TEXT-FILES/currency-process.txt
 create mode 100644 docs/site/TEMP-TEXT-FILES/definitions.txt
 create mode 100644 docs/site/TEMP-TEXT-FILES/faq-and-known-bugs.txt
 create mode 100644 docs/site/TEMP-TEXT-FILES/references.txt
 create mode 100644 docs/site/index/cldr-spec/collation-guidelines.md
 create mode 100644 docs/site/index/cldr-spec/currency-process.md
 create mode 100644 docs/site/index/cldr-spec/definitions.md
 create mode 100644 docs/site/index/process/cldr-data-retention-policy.md
 create mode 100644 docs/site/index/survey-tool/faq-and-known-bugs.md
 create mode 100644 docs/site/translation/translation-guide-general/references.md

diff --git a/docs/site/TEMP-TEXT-FILES/cldr-data-retention-policy.txt b/docs/site/TEMP-TEXT-FILES/cldr-data-retention-policy.txt
new file mode 100644
index 00000000000..abf2bf2f7e3
--- /dev/null
+++ b/docs/site/TEMP-TEXT-FILES/cldr-data-retention-policy.txt
@@ -0,0 +1,5 @@
+CLDR Data Retention Policy
+Certain types of CLDR data can become obsolete, often due to political reorganization or changes in policy within the various countries.  When such changes occur, we leave the obsolete data in CLDR for a certain period of time in order to make it easier for applications to migrate to the newer codes.  However, eventually it becomes necessary to remove obsolete data from the CLDR in order to keep the data from growing uncontrollably.
+The following guidelines have been discussed by the CLDR technical committee and serve as the basis for decision making about when obsolete codes and data are to be removed from the CLDR.
+1). Territory Names ( //ldml/localeDisplayNames/territories/territory[@type="XX"] ) - Data is to remain in the CLDR for a period of 5 years after the territory code for territory "XX" is deprecated in the IANA Subtag Registry.
+2). Metazone Names ( //ldml/dates/timeZoneNames/metazone[@type="ZoneName"] ) - Data is to remain in the CLDR for a period of 20 years after the metazone becomes "inactive" ( i.e. the zone name is not used in ANY country ).  A spreadsheet listing the Inactive Metazones in CLDR and the dates when they became inactive can be found here.
\ No newline at end of file
diff --git a/docs/site/TEMP-TEXT-FILES/collation-guidelines.txt b/docs/site/TEMP-TEXT-FILES/collation-guidelines.txt
new file mode 100644
index 00000000000..42a8e62c486
--- /dev/null
+++ b/docs/site/TEMP-TEXT-FILES/collation-guidelines.txt
@@ -0,0 +1,152 @@
+Collation Guidelines
+Collation sequences can be quite tricky to specify.
+The locale-based collation rules in Unicode CLDR specify customizations of the standard data for UTS #10: Unicode Collation Algorithm (UCA). Requests to change the collation order for a given locale, or to supply additional variants, need to follow the guidelines in this document.
+Filing a Request
+Requests to change the collation order for a given locale, or to supply additional variants, should be filed as CLDR bug tickets. See CLDR Change Requests.
+Rules
+The request should present the precise change expressed as rules. The rules must be supplied in the syntax as specified in http://www.unicode.org/reports/tr35/tr35-collation.html#Rules. (This used to be called the "basic syntax".) The rules must also be Minimal Rules as described below: only differences from http://unicode.org/charts/uca/ should be specified.
+& c < cs
+& cs <<< ccs / cs
+Normally CLDR does not accept submissions that reorder particular digits, punctuation, or other symbols, following instead the UCA ordering for those characters. However, if punctuation, general symbols, currency symbols, or digits as a class all sort after letters, that change can be accommodated. Similarly, if the letters in a particular script sort ahead of others (such as Greek characters ahead of Latin), that can also be accommodated. Both of these are done with a reorder setting. Note: For a given language, CLDR normally sorts the language's native script before other scripts, via the reorder setting.
+Test Data
+Please supply short test cases that illustrate the correct sorting behavior as a list of lines in sorted order. Try to include cases that show the boundary behavior by including suffixes, such as the following to illustrate that "cs" and "ccs" sort specially.
+c
+cy
+cs
+cscs
+ccs
+cscsy
+ccsy
+csy
+d
+Justification
+Provide justification for your change. Citations should be to authoritative pages on the web, in English.
+Testing Your Request
+Please test out any suggested rules before filing a bug.
+Go to the ICU Collation Demo.
+Pick the language for which you want to change the rules, or keep it on "und" (root) if you want to start from the Unicode/CLDR default sort order.
+Put your rules into the "Append rules" box.
+Put an interesting list of strings into the Input box.
+Click "sort" and verify the sort order and levels of differences.
+Or
+Go to the ICU Locale Explorer.
+Pick the appropriate locale.
+Follow the instructions at the bottom to use your suggested rules on your suggested test data.
+Verify that the proper order results.
+Determining the Order
+The exact collation sequence for a given language may be difficult to determine. The base ordering of characters can be fairly straightforward, but there are quite a few other complications involved.
+Most standards that specify collation, such as DIN or CS, are not targeted at algorithmic sorting, and are not complete algorithmic specifications. For example, CSN 97 6030 requires transliteration of foreign scripts, but there are many choices as to how to transliterate, and the exact mechanism is not specified. It also specifies that geometric shapes are sorted by the number of vertices and edges, which is, at a minimum, difficult to determine, and is subject to variation in glyphs.
+The CLDR goals are to match the sorting of exemplar letters and common punctuation and leave everything else to the standard UCA ordering. For more information, see UTS #10: Unicode Collation Algorithm (UCA).
+Determining Level Differences
+It is often tricky to determine the exact relationship between characters. In the UCA, case and similar variant differences are at a third (tertiary) level, while accent and similar differences are at a second (secondary) level, and base letter differences are at the first (primary) level. That results in an order like the following:
+cina
+Cina
+çina
+Çina
+dina
+That is, the difference between c and C is weaker than the difference between c and ç, which in turn is weaker than the difference between c and d. For any two characters α and β, it may be very clear that α < β, but it may not be clear what the right level difference is. To establish this, see if you can find examples of two words of the following form.
+Primary Test
+...α...Z
+...β...A
+That is, the words are identical except for α, β, A, and Z, and you know that A and Z have a clear primary difference. If you get the above ordering in dictionaries and other sources, you know that the difference between α and β is a primary difference. If you get the opposite of the ordering shown above, then you only know that the difference between α and β is not a primary difference: it may be secondary or tertiary.
+You now need to distinguish which of the non-primary level differences you could have. So try again, this time seeing if you can find examples of two words of the following form, where you know that A and Á have a clear secondary difference in the script.
+Secondary Test1
+...α...Á
+...β...A
+Now the ordering of these two strings tells you whether the difference between α and β is a secondary difference, or not. Alternatively, you can look for words of the form:
+Secondary Test2
+...B...α
+...b...β
+where b < B at a tertiary level. If you get the above ordering for the secondary test2, you also know that the difference between α and β is at a secondary level. The Test2 form is often easier to find examples for.
+If you have established that the characters have neither a primary nor secondary difference, the following can be used in a similar fashion to test whether the difference is at a tertiary level or not.
+Tertiary Test
+...α...B
+...β...b
+If there is no tertiary difference, then the difference is not significant enough for CLDR to take it into account, so they will be treated as equals (unless someone sorts with a final, codepoint level).
+Contractions
+Characters may behave differently in different contexts. For example, "ch" in Slovak sorts after H. A sequence of characters that behaves that way is called a contraction. Another common case of contractions is in the case of syllabaries, where a sequence of characters forming a syllable collates as a unit.
+Note that contractions are typically rather expensive in implementations: they take more storage, and are much slower to compare. So they should be avoided where possible. For example, suppose that we have the following sequence in a dictionary (where the uppercase characters represent characters in the target script):
+KB
+... // combinations of K with consonants
+KZ
+KA
+KE
+KI
+KO
+KU
+LB
+...
+There are two ways to produce this ordering. One is to have KA, KE, KI, etc. be contractions. The other is to order all the vowels after all the consonants. Where the latter is sufficient, it is strongly preferred.
+Minimal Rules
+The goal is always to specify the minimal differences from the DUCET. For example, take the case of Slovak, where everything sorts as in DUCET except for certain characters. The following rules place the characters ä, č, đ, and the sequence "ch" (and their case variants) at the appropriate positions in the sorting sequence, and with the appropriate strengths:
+Minimal Rules
+& A
+< ä <<< Ä
+& C
+< č <<< Č
+& D
+< đ <<< Đ
+& H
+< ch <<< cH <<< Ch <<< CH
+...
+It would be possible instead to have rules that list every letter used by Slovak [a á ä b c č d ď e é f-h {ch} i í j-l ĺ ľ m n ň o ó ô p-r ŕ s š t ť u ú v-y ý z ž], looking something like the following.
+Maximal Rules
+& A << á <<< Á
+< ä <<< Ä
+< b <<< B
+< c <<< C
+< č <<< Č
+< d
+...
+The Maximal Rules format is not accepted in CLDR. The reasons are:
+Every time a character is tailored, the data for that character takes up more room in typical implementations. That means that the data for collation is larger, downloads of collation libraries with that data are slower, sort keys are longer, and performance is slower; sometimes very much so.
+Related characters in the same script are in a peculiar order. For example, if the Slovak tailoring omits ƀ, then it would show up as after z.
+You can see what the UCA currently does with a given script by looking at the charts at Unicode Collation Charts, or at the UCA in ICU-style rules. For example, suppose that U+0D89 SINHALA LETTER IYANNA and U+0D8A SINHALA LETTER IIYANNA needed to come after U+0D96 SINHALA LETTER AUYANNA, in primary order, and that otherwise DUCET was ok. Then you would give the following rules:
+& ඖ # U+0D96 SINHALA LETTER AUYANNA
+< ඉ # U+0D89 SINHALA LETTER IYANNA
+< ඊ # U+0D8A SINHALA LETTER IIYANNA
+Pitfalls
+There are a number of pitfalls with collation, so be careful. In some cases, such as Hungarian or Japanese, the rules can be fairly complicated (of course, reflecting that the sorting sequence for those languages is complicated).
+Only tailor expected data. We focus on the required collation sequence for a given language with normal data. So we don't include full-width characters for a European collation sequence, such as
+... CSCS <<< ＣＳＣＳ ...
+...  CSCS <<< \uFF23\uFF33\uFF23\uFF33 ... (equivalently)
+Tailor trailing contractions. If a sequence of characters is treated as a unit for collation, it should be entered as a contraction.
+& c < ch
+One might think that a sequence like "dz" doesn't require that, since it would always come after "d" followed by any other letter; it is a "trailing contraction". But in unusual cases, that wouldn't be true; if "dz" is a unit sorted as if it were a distinct letter after "d", one should get the ordering "dα" < "dz". The correct behavior will only happen if "dz" is a contraction, such as
+& d < dz
+Watch out for Expansions. If you have a rule like &cs < d, and "cs" has not occurred in a previous rule as a contraction, then this is automatically considered to be the same as &c < d / s; that is, the d expands as if it were a "cs" (actually, primary greater than a "cs", since we wrote "<"). This expansion takes effect until the next primary difference.
+So suppose that "ccs" is to behave as if it were "cscs", and take case differences into account. You might try to do this with the rules on the left:
+Rules (Wrong)
+& C < cs <<< Cs <<< CS
+& cscs <<< ccs
+<<< Cscs <<< Ccs
+<<< CSCS <<< CCS
+Actual Effect
+& C < cs <<< Cs <<< CS
+& cs <<< ccs / cs
+<<< Cscs  / cs <<< Ccs  / cs
+<<< CSCS  / cs <<< CCS / cs
+But since the CSCS has not been made a contraction in previous rules, this produces an automatic expansion, one that continues through the entire sequence of non-primary differences, as shown on the right. This is not what is wanted: each item acts like it expands compared to the previous item. So CCS, for example, will act like it expands to CSCScs!
+What you actually want is the following:
+Rules (Right)
+& C < cs <<< Cs <<< CS
+& cscs <<< ccs
+& Cscs <<< Ccs
+& CSCS <<< CCS
+Actual Effect
+& C < cs <<< Cs <<< CS
+& cs <<< ccs / cs
+& Cs <<< Ccs / cs
+& CS <<< CCS / CS
+In short, when you have expansions, it is always safer and clearer to express them with separate resets. There are only a few exceptions to this, notably when CJK characters are interleaved with Hangul Syllables.
+Minimal Rules. Example: Maltese was sorting character sequences before a base character using the following style:
+& B
+< ċ
+<<<Ċ
+< c
+<<<C
+The correct rules should be the minimal ones.
+& [before 1] c < ċ <<< Ċ
+This finds the highest primary (that's what the 1 is for) character less than c, and uses that as the reset point. For Maltese, the same technique needs to be used for ġ and ż.
+Blocking Contractions. Contractions can be blocked with CGJ, as described in the Unicode Standard and in the Characters and Combining Marks FAQ.
+Case Combinations. The lowercase, titlecase, and uppercase variants of contractions need to be supplied, with tertiary differences in that order (regardless of the caseFirst setting). That is, if ch is a contraction, then you would have the rules ... ch <<< Ch <<< CH. Other case variants such as cH are excluded because they are unlikely to represent the contraction, for example in McHugh. (Therefore, mchugh and McHugh will be primary different if ch adds a primary difference.) [#8248]
\ No newline at end of file
diff --git a/docs/site/TEMP-TEXT-FILES/currency-process.txt b/docs/site/TEMP-TEXT-FILES/currency-process.txt
new file mode 100644
index 00000000000..5bf003accbd
--- /dev/null
+++ b/docs/site/TEMP-TEXT-FILES/currency-process.txt
@@ -0,0 +1,3 @@
+Currency Process
+There are three stages for new currency symbols (such as the recent Russian, Indian, and Turkish symbols). The following shows the stage and the disposition in CLDR data:
+For more information, see Currency Symbols & Names.
\ No newline at end of file
diff --git a/docs/site/TEMP-TEXT-FILES/definitions.txt b/docs/site/TEMP-TEXT-FILES/definitions.txt
new file mode 100644
index 00000000000..3424c8e1a89
--- /dev/null
+++ b/docs/site/TEMP-TEXT-FILES/definitions.txt
@@ -0,0 +1,9 @@
+Definitions
+literate percent - indicates the percentage of the country's population that is literate, based on literacy information from the World Bank, CIA Factbook, and others.
+language population - the number of people fluent in that language in that country, including both first and second language speakers. The level of fluency is that necessary to use a UI on a computer, smartphone, or similar devices. Reliable information is difficult to obtain; the information in CLDR is an estimate culled from different sources.
+writing percent (writingPercent) - percentage of the population fluent in that language in that country who regularly read or write a significant amount in that language. Ideally, the regularity would be measured as "7-day actives". Reliable information is difficult to obtain; the information in CLDR is a best estimate culled from different sources. If it is known that the language is not widely written, but there are no solid figures, the value is typically given as 1%-5%.
+customary modern usage - The terms or characters commonly used in modern contexts: newspapers, journals, lay publications, street signs, commercial signage, common geographic names, company names, and so on. It does not include terms or characters that are only commonly used in technical or academic contexts such as mathematical expressions, archaic or historic texts, citations of archaic words, liturgical texts, or pedagogical use.
+official language - as used in CLDR, a language that can generally be used in communications with a central government. That is, people can expect that essentially all communication from the government is available in that language (ballots, information pamphlets, legal documents, …) and that they can use that language in communicating to the central government (petitions, forms, …).
+Official languages for a country are not necessarily the same as those with official legal status in the country. For example, Irish is declared to be an official language in Ireland, but English has no such formal status in the United States. Languages such as the latter are called de facto official languages. As another example, German has legal status in Italy, but cannot be used in all communications with the central government, and is thus not an official language of Italy for CLDR purposes. Such languages are official regional or official minority languages.
+official regional language - a language that is official (de jure or de facto) in a major region within a country, but does not qualify as an official language of the country as a whole. For example, it can be used in an official petition to a provincial government, but not the central government. The term "major" is meant to distinguish from smaller-scale usage, such as for a town or village.
+official minority language - a language that has some official governmental status, but is not an official language of the country or of a substantial region.
\ No newline at end of file
diff --git a/docs/site/TEMP-TEXT-FILES/faq-and-known-bugs.txt b/docs/site/TEMP-TEXT-FILES/faq-and-known-bugs.txt
new file mode 100644
index 00000000000..b0e87302297
--- /dev/null
+++ b/docs/site/TEMP-TEXT-FILES/faq-and-known-bugs.txt
@@ -0,0 +1,34 @@
+FAQ and Known Bugs
+Survey Tool | Accounts | Guide | FAQ and Known Bugs
+FAQ (Frequently Asked Questions)
+Q. Should I preserve the case of English words, like names of languages?
+A. Beginning with CLDR 22, the new guidance is that names of items such as languages, regions, calendar and collation types, as well as names of months and weekdays in calendar data, should be capitalized as appropriate for the middle of body text. For more information, see the Capitalization section in the Translation Guidelines.
+Q. What about the warning about parentheses being discouraged in cases such as "(other)"?
+A. You need to remove "(other)" or the equivalent from language names. In general, you should avoid using parentheses in the names of languages, scripts, or regions if at all possible. There is more information about this in the zoomed view.
+Q. Why is the tool slow?
+A. The performance of the Survey Tool has been greatly improved compared to previous versions. However, we are constantly striving to improve performance and our ability to accommodate a larger user base.
+If you feel a task is taking an unusual amount of time, and it is a consistent problem, please file a bug at newticket. In the ticket, describe exactly what operation is being attempted and approximately how long it is taking to receive a response.
+Q. How are votes weighted and the "best" item picked?
+A. You basically want to get multiple organizations to agree on the best value. For details on the voting process, see Resolution Procedure.
+Q. In the key, it says that the red box is a fallback. What does that mean?
+A. The Unicode CLDR data uses inheritance. That means that if you are looking at English (United Kingdom) (a "sublocale") most of the data is inherited from English (which contains data for the US), called the "parent locale". Such data will show up as red. You only need to have different data in the sublocale where there are important differences in usage from the parent locale.
+Data in a sublocale may be spuriously different; that is, the parent's data may be perfectly acceptable in the sublocale, but somehow a difference has crept in. In that case, you should vote for the parent's data to reduce the gratuitous differences.
+Q. But what I see is a funny symbol like Zxxx?
+A. If there is no other translation available, what you will see is a "neutral" code, typically an ISO code. In cases where there is no such code available, such as for labels like "Month", then you may see English -- which needs to be translated.
+Q. How do I delete an item?
+A. You can only delete an item if you yourself have entered it, and there are no other votes. Click on the "Abstain" button for that row.
+To remove a spurious difference in a sublocale, vote for the red fallback item.
+Q. What if I can't delete it?
+A. It doesn't really matter much. What is really important is to make sure the right item is voted for; so try to get consensus as described above. If all the alternatives are really wrong, and you really don't know what the right item would be, vote for the red fallback item.
+Q. What if I want to just try out some changes, but don't want to affect the data?
+A. Everyone can add data to "Unknown or Invalid Language" (und), so you can try out the Survey Tool there without worry.
+Q. What if I have questions?
+A. You should click on the items you have questions about, and read the information in the right-hand information panel.
+In many cases, even seemingly straightforward translations like the language, script, and territory names have issues.
+You can also go directly to the Translation Guidelines.
+If you have further questions, or problems with the Survey Tool, send a message to cldr-users@unicode.org.
+Known Bugs, Issues, Restrictions
+The following are general known bugs and issues. For known issues in the current release, see Translation Guidelines.
+The description of bulk uploading (http://cldr.unicode.org/index/survey-tool/upload) has not yet been updated for the new UI.
+The description of managing users (http://cldr.unicode.org/index/survey-tool/managing-users) has not yet been updated for the new UI.
+If you find additional problems, please file a ticket.
\ No newline at end of file
diff --git a/docs/site/TEMP-TEXT-FILES/references.txt b/docs/site/TEMP-TEXT-FILES/references.txt
new file mode 100644
index 00000000000..8f741fdf6f1
--- /dev/null
+++ b/docs/site/TEMP-TEXT-FILES/references.txt
@@ -0,0 +1,45 @@
+References
+Sources and references may be standards or can also be dictionaries, journal style guides (such as The Economist Style Guide for English), and other available sources that provide guidance as to common practice. Online sources are preferred where available, since they can be more easily checked.
+The goal is to follow common, customary practice. For example, language or territory display names should use the most recognizable name in common usage. This is generally not the official name. For example, one would use "Switzerland" not "Swiss Confederation".
+Here are some possible resources for comparison of locale data. This is not an endorsement of the sources, merely a collation of possibly-useful links. To suggest additions to this list, file a Bug Report.
+General
+CIA World Factbook
+For English, The Economist Style Guide (unfortunately only hard copy):
+http://www.amazon.com/exec/obidos/tg/detail/-/186197535X
+For other languages, there should be similar guides for major publications.
+Exemplar Characters
+https://developer.mimer.com
+http://www.eki.ee/letter/
+http://en.wikipedia.org/wiki/Alphabets_derived_from_the_Latin
+http://www.omniglot.com/writing/
+http://www.geonames.de/alphab.html
+UNGEGN: Working Group on Romanization Systems
+Language Names
+http://www.geonames.de/
+Territory / Region Names
+http://unstats.un.org/unsd/geoinfo/
+http://www.eki.ee/knn/lingid2.htm#WRLD
+http://www.p.lodz.pl/I35/personal/jw37/EUROPE/europe.html
+http://www.geonames.de/
+http://www.worldlanguage.com/Arabic/Countries/ (Use the links at the top to switch languages)
+Currencies
+http://publications.eu.int/code/es/es-5000500.htm (Replace es with desired language code)
+http://publications.eu.int/code/es/es-5000700.htm
+http://publications.eu.int/
+http://www.geonames.de/
+http://www.globalfindata.com/gh/index.html
+Collation
+http://www.omniglot.com/writing/
+http://www.alphabets-world.com/
+https://developer.mimer.com
+Dates and Times
+https://docs.microsoft.com/en-us/dotnet/standard/globalization-localization/globalization-icu
+http://www.geonames.de/
+Transliteration
+UNGEGN: Working Group on Romanization Systems
+Transliteration of Non-Roman Alphabets and Scripts
+Standards for Archival Description: Romanization
+ISO-15915 (Hindi)
+ISO-15915 (Gujarati)
+ISO-15915 (Kannada)
+ISCII-91
\ No newline at end of file
diff --git a/docs/site/index/cldr-spec/collation-guidelines.md b/docs/site/index/cldr-spec/collation-guidelines.md
new file mode 100644
index 00000000000..54624617124
--- /dev/null
+++ b/docs/site/index/cldr-spec/collation-guidelines.md
@@ -0,0 +1,248 @@
+---
+title: Collation Guidelines
+---
+
+# Collation Guidelines
+
+Collation sequences can be quite tricky to specify. 
+
+The locale\-based collation rules in Unicode CLDR specify customizations of the standard data for [UTS \#10: Unicode Collation Algorithm](http://www.unicode.org/reports/tr10/#Introduction) (UCA). Requests to change the collation order for a given locale, or to supply additional variants, need to follow the guidelines in this document. 
+
+## Filing a Request
+
+Requests to change the collation order for a given locale, or to supply additional variants, should be filed as CLDR bug tickets. See [CLDR Change Requests](https://cldr.unicode.org/index/bug-reports).
+
+### Rules
+
+The request should present the precise change expressed as rules. The rules must be supplied in the syntax as specified in [http://www.unicode.org/reports/tr35/tr35\-collation.html\#Rules](http://www.unicode.org/reports/tr35/tr35-collation.html#Rules). (This used to be called the "basic syntax".) The rules must also be [Minimal Rules](https://cldr.unicode.org/index/cldr-spec/collation-guidelines) as described below: *only* differences from [http://unicode.org/charts/uca/](http://unicode.org/charts/uca/) should be specified.
+
+\& c \< cs
+
+\& cs \<\<\< ccs / cs
+
+Normally CLDR does not accept submissions that reorder *particular* digits, punctuation, or other symbols, following instead the UCA ordering for those characters. However, if punctuation, general symbols, currency symbols, or digits *as a class* all sort after letters, that change can be accommodated. Similarly, if the letters in a particular script sort ahead of others (such as Greek characters ahead of Latin), that can also be accommodated. Both of these are done with a reorder setting. Note: For a given language, CLDR normally sorts the language's native script before other scripts, via the reorder setting.
+
+### Test Data
+
+Please supply short test cases that illustrate the correct sorting behavior as a list of lines in sorted order. Try to include cases that show the boundary behavior by including suffixes, such as the following to illustrate that "cs" and "ccs" sort specially.
+
+c
+
+cy
+
+cs
+
+cscs
+
+ccs
+
+cscsy
+
+ccsy
+
+csy
+
+d
+
+### Justification
+
+Provide justification for your change. Citations should be to authoritative pages on the web, in English.
+
+### Testing Your Request
+
+Please test out any suggested rules before filing a bug.
+
+1. Go to the [ICU Collation Demo](http://demo.icu-project.org/icu-bin/collation.html).
+2. Pick the language for which you want to change the rules, or keep it on "und" (root) if you want to start from the Unicode/CLDR default sort order.
+3. Put your rules into the "Append rules" box.
+4. Put an interesting list of strings into the Input box.
+5. Click "sort" and verify the sort order and levels of differences.
+
+Or
+
+1. Go to the [ICU Locale Explorer](http://demo.icu-project.org/icu-bin/locexp).
+2. Pick the appropriate locale.
+3. Follow the instructions at the bottom to use your suggested rules on your suggested test data.
+4. Verify that the proper order results.
+
+## Determining the Order
+
+The exact collation sequence for a given language may be difficult to determine. The base ordering of characters can be fairly straightforward, but there are quite a few other complications involved.
+
+Most standards that specify collation, such as DIN or CS, are not targeted at algorithmic sorting, and are not complete algorithmic specifications. For example, CSN 97 6030 requires transliteration of foreign scripts, but there are many choices as to how to transliterate, and the exact mechanism is not specified. It also specifies that geometric shapes are sorted by the number of vertices and edges, which is, at a minimum, difficult to determine, and is subject to variation in glyphs.
+
+The CLDR goals are to match the sorting of exemplar letters and common punctuation and leave everything else to the standard UCA ordering. For more information, see [UTS \#10: Unicode Collation Algorithm](http://www.unicode.org/reports/tr10/#Introduction) (UCA).
+
+### Determining Level Differences
+
+It is often tricky to determine the exact relationship between characters. In the UCA, case and similar variant differences are at a third (tertiary) level, while accent and similar differences are at a second (secondary) level, and base letter differences are at the first (primary) level. That results in an order like the following:
+
+1. cina
+2. Cina
+3. çina
+4. Çina
+5. **d**ina
+
+That is, the difference between c and C is weaker than the difference between c and ç, which in turn is weaker than the difference between c and d. For any two characters α and β, it may be very clear that α \< β, but it may not be clear what the right level difference is. To establish this, see if you can find examples of two words of the following form.
+
+Primary Test
+
+1. ...α...Z
+2. ...β...A
+
+That is, the words are identical except for α, β, A, and Z, *and* you know that A and Z have a clear primary difference. If you get the above ordering in dictionaries and other sources, you know that the difference between α and β is a primary difference. If you get the opposite of the ordering 1, 2 above, then you only know that the difference between α and β is *not* a primary difference: it may be secondary or tertiary.
+
+You now need to distinguish which of the non\-primary level differences you could have. So try again, this time seeing if you can find examples of two words of the following form, where you know that A and Á have a clear secondary difference in the script.
+
+Secondary Test1
+
+1. ...α...Á
+2. ...β...A
+
+Now the ordering of these two strings tells you whether the difference between α and β is a secondary difference, or not. Alternatively, you can look for words of the form:
+
+Secondary Test2
+
+1. ...B...α
+2. ...b...β
+
+where b \< B at a tertiary level. If you get the above ordering for Secondary Test2, you also know that the difference between α and β is at a secondary level. The Test2 form is often easier to find examples for.
+
+If you have established that the characters have neither a primary nor secondary difference, the following can be used in a similar fashion to test whether the difference is at a tertiary level or not.
+
+Tertiary Test
+
+1. ...α...B
+2. ...β...b
+
+If there is no tertiary difference, then the difference is not significant enough for CLDR to take it into account, so they will be treated as equals (unless someone sorts with a final, codepoint level).
+
+### Contractions
+
+Characters may behave differently in different contexts. For example, "ch" in Slovak sorts after H. A sequence of characters that behaves that way is called a contraction. Another common case is syllabaries, where a sequence of characters forming a syllable collates as a unit.
+
+Note that contractions are typically rather expensive in implementations: they take more storage, and are much slower to compare. So they should be avoided where possible. For example, suppose that we have the following sequence in a dictionary (where the uppercase characters represent characters in the target script):
+
+KB
+
+... // combinations of K with consonants
+
+KZ
+
+KA
+
+KE
+
+KI
+
+KO
+
+KU
+
+LB
+
+...
+
+There are two ways to produce this ordering. One is to have KA, KE, KI, etc. be contractions. The other is to order all the vowels after all the consonants. Where the latter is sufficient, it is strongly preferred.
+
+## Minimal Rules
+
+The goal is always to specify the ***minimal*** differences from the DUCET. For example, take the case of Slovak, where everything sorts as in DUCET except for certain characters. The following rules place the characters ä, č, đ, and the sequence "ch" (and their case variants) at the appropriate positions in the sorting sequence, and with the appropriate strengths:
+
+**Minimal Rules**
+
+\& A
+
+\< ä \<\<\< Ä
+
+\& C
+
+\< č \<\<\< Č
+
+\& D
+
+\< đ \<\<\< Đ
+
+\& H
+
+\< ch \<\<\< cH \<\<\< Ch \<\<\< CH
+
+...
+
+It would be possible instead to have rules that list every letter used by Slovak \[a á ä b c č d ď e é f\-h {ch} i í j\-l ĺ ľ m n ň o ó ô p\-r ŕ s š t ť u ú v\-y ý z ž], looking something like the following.
+
+**Maximal Rules**
+
+\& A \<\< á \<\<\< Á
+
+\< ä \<\<\< Ä
+
+\< b \<\<\< B
+
+\< c \<\<\< C
+
+\< č \<\<\< Č
+
+\< d
+
+...
+
+***The Maximal Rules format is not accepted in CLDR.*** The reasons are:
+
+1. Every time a character is tailored, the data for that character takes up more room in typical implementations. That means that the data for collation is larger, downloads of collation libraries with that data are slower, sort keys are longer, and performance is slower; sometimes very much so.
+2. Related characters in the same script are in a peculiar order. For example, if the Slovak tailoring omits ƀ, then it would show up as after z.
+
+You can see what the UCA currently does with a given script by looking at the charts at[Unicode Collation Charts](http://www.unicode.org/charts/collation/), or at the[UCA in ICU\-style rules](http://unicode.org/cldr/data/diff/collation/UCA.txt). For example, suppose that U\+0D89 SINHALA LETTER IYANNA and U\+0D8A SINHALA LETTER IIYANNA needed to come after U\+0D96 SINHALA LETTER AUYANNA, in primary order, and that otherwise DUCET was ok. Then you would give the following rules:
+
+\& ඖ \# U\+0D96 SINHALA LETTER AUYANNA
+
+\< ඉ \# U\+0D89 SINHALA LETTER IYANNA
+
+\< ඊ \# U\+0D8A SINHALA LETTER IIYANNA
+
+## Pitfalls
+
+There are a number of pitfalls with collation, so be careful. In some cases, such as Hungarian or Japanese, the rules can be fairly complicated (of course, reflecting that the sorting sequence for those languages is complicated).
+
+1. **Only tailor expected data.** We focus on the required collation sequence for a given language with normal data. So we don't include full\-width characters for a European collation sequence, such as
+	- ... CSCS \<\<\< ＣＳＣＳ ...
+	- ... CSCS \<\<\< \\uFF23\\uFF33\\uFF23\\uFF33 ... (equivalently)
+2. **Tailor trailing contractions.** If a sequence of characters is treated as a unit for collation, it should be entered as a contraction.
+	1. \& c \< ch
+	2. One might think that a sequence like "dz" doesn't require that, since it would always come after "d" followed by any other letter; it is a "trailing contraction". But in unusual cases, that wouldn't be true; if "dz" is a unit sorted as if it were a distinct letter after "d", one should get the ordering "dα" \< "dz". The correct behavior will only happen if "dz" is a contraction, such as
+	3. \& d \< dz
+3. **Watch out for Expansions.** If you have a rule like \&cs \< d, and "cs" has not occurred in a previous rule as a contraction, then this is automatically considered to be the same as \&c \< d / s; that is, the d *expands* as if it were a "cs" (actually, primary greater than a "cs", since we wrote "\<"). This expansion takes effect until the next primary difference.
+	1. So suppose that "ccs" is to behave as if it were "cscs", and take case differences into account. You might try to do this with the rules on the left:
+
+| Rules (Wrong) | Actual Effect |
+|---|---|
+| \& C \< cs \<\<\< Cs \<\<\< CS | \& C \< cs \<\<\< Cs \<\<\< CS |
+| \& cscs \<\<\< ccs | **\& cs \<\<\< ccs / cs** |
+| \<\<\< Cscs \<\<\< Ccs | **\<\<\< Cscs / cs \<\<\< Ccs / cs** |
+| \<\<\< CSCS \<\<\< CCS | **\<\<\< CSCS / cs \<\<\< CCS / cs** |
+
+1. But since the CSCS has not been made a contraction in previous rules, this produces an automatic expansion, one that continues through the entire sequence of non\-primary differences, as shown on the right. This is *not* what is wanted: each item acts like it expands compared to the previous item. So CCS, for example, will act like it expands to CSCScs!
+2. What you actually want is the following:
+
+| Rules (Right) | Actual Effect |
+|---|---|
+| \& C \< cs \<\<\< Cs \<\<\< CS | \& C \< cs \<\<\< Cs \<\<\< CS |
+| \& cscs \<\<\< ccs | \& cs \<\<\< ccs / cs |
+| \& Cscs \<\<\< Ccs | \& Cs \<\<\< Ccs / cs |
+| \& CSCS \<\<\< CCS | \& CS \<\<\< CCS / CS |
+
+1. In short, when you have expansions, it is always safer and clearer to express them with separate resets. There are only a few exceptions to this, notably when CJK characters are interleaved with Hangul Syllables.
+
+1. **Minimal Rules.** Example: Maltese was sorting character sequences *before* a base character using the following style:
+	1. \& B
+	2. \< ċ
+	3. \<\<\<Ċ
+	4. \< c
+	5. \<\<\<C
+	6. The correct rules should be the minimal ones.
+	7. \& \[before 1] c \< ċ \<\<\< Ċ
+	8. This finds the highest primary (that's what the 1 is for) character less than c, and uses that as the reset point. For Maltese, the same technique needs to be used for ġ and ż.
+2. **Blocking Contractions.** Contractions can be blocked with CGJ, as described in the Unicode Standard and in the [Characters and Combining Marks FAQ](http://www.unicode.org/faq/char_combmark.html).
+3. **Case Combinations.** The lowercase, titlecase, and uppercase variants of contractions need to be supplied, with tertiary differences in that order (regardless of the caseFirst setting). That is, if *ch* is a contraction, then you would have the rules `... ch <<< Ch <<< CH`. Other case variants such as *cH* are excluded because they are unlikely to represent the contraction, for example in *McHugh*. (Therefore, *mchugh* and *McHugh* will differ at the primary level if *ch* adds a primary difference.) \[[\#8248](http://unicode.org/cldr/trac/ticket/8248)]
+
+![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
\ No newline at end of file
diff --git a/docs/site/index/cldr-spec/currency-process.md b/docs/site/index/cldr-spec/currency-process.md
new file mode 100644
index 00000000000..c721c516b7c
--- /dev/null
+++ b/docs/site/index/cldr-spec/currency-process.md
@@ -0,0 +1,18 @@
+---
+title: Currency Process
+---
+
+# Currency Process
+
+There are three stages for new currency symbols (such as the recent Russian, Indian, and Turkish symbols). The following shows the stage and the disposition in CLDR data:
+
+| Stage | Adoption | Disposition in CLDR data |
+|---|---|---|
+| 1 | Not widely adopted. | It is added as an alt value in the relevant locales (based on language and country codes). That means that it won't be the "stock" symbol for those locales, but will be accessible to implementations that support alt values. |
+| 2 | Adopted widely in fonts and keyboards used in the relevant locales. | It is added to the relevant locales as the standard version. |
+| 3 | Widely recognized outside of the locales, and in most operating systems (Android, iOS, Windows, Mac — not just the latest versions, but also older ones that have significant market share). | Added to root as the standard version. |
+
+For more information, see [Currency Symbols \& Names](https://cldr.unicode.org/translation/currency-names-and-symbols/currency-names).
+
+
+![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
\ No newline at end of file
diff --git a/docs/site/index/cldr-spec/definitions.md b/docs/site/index/cldr-spec/definitions.md
new file mode 100644
index 00000000000..a3b45a895ea
--- /dev/null
+++ b/docs/site/index/cldr-spec/definitions.md
@@ -0,0 +1,23 @@
+---
+title: Definitions
+---
+
+# Definitions
+
+***literate percent*** \- indicates the percentage of the country's population that is literate, based on literacy information from the World Bank, CIA Factbook, and others.
+
+***language population*** \- the number of people fluent in that language in that country, including both first and second language speakers. The level of fluency is that necessary to use a UI on a computer, smartphone, or similar devices. Reliable information is difficult to obtain; the information in CLDR is an estimate culled from different sources.
+
+***writing percent*** (writingPercent) \- percentage of the population fluent in that language in that country who regularly read or write a significant amount in that language. Ideally, the regularity would be measured as "7\-day actives". Reliable information is difficult to obtain; the information in CLDR is a best estimate culled from different sources. If it is known that the language is not widely written, but there are no solid figures, the value is typically given as 1%\-5%.
+
+***customary modern usage*** \- The terms or characters commonly used in modern contexts: newspapers, journals, lay publications, street signs, commercial signage, common geographic names, company names, and so on. It does not include terms or characters that are only commonly used in technical or academic contexts such as mathematical expressions, archaic or historic texts, citations of archaic words, liturgical texts, or pedagogical use.
+
+***official language*** \- as used in CLDR, a language that can generally be used in communications with a central government. That is, people can expect that essentially all communication from the government is available in that language (ballots, information pamphlets, legal documents, …) and that they can use that language in communicating to the central government (petitions, forms, …).
+
+Official languages for a country are not necessarily the same as those with official legal status in the country. For example, Irish is declared to be an official language in Ireland, but English has no such formal status in the United States. Languages such as the latter are called *de facto* official languages. As another example, German has legal status in Italy, but cannot be used in all communications with the central government, and is thus not an official language of Italy for CLDR purposes. Such languages are *official regional* or *official minority languages*.
+
+***official regional language*** \- a language that is official (de jure or de facto) in a major region within a country, but does not qualify as an official language of the country as a whole. For example, it can be used in an official petition to a provincial government, but not the central government. The term "major" is meant to distinguish from smaller\-scale usage, such as for a town or village.
+
+***official minority language*** \- a language that has some official governmental status, but is not an official language of the country or of a substantial region.
+
+![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
\ No newline at end of file
diff --git a/docs/site/index/process/cldr-data-retention-policy.md b/docs/site/index/process/cldr-data-retention-policy.md
new file mode 100644
index 00000000000..e18d9065c24
--- /dev/null
+++ b/docs/site/index/process/cldr-data-retention-policy.md
@@ -0,0 +1,14 @@
+---
+title: CLDR Data Retention Policy
+---
+
+# CLDR Data Retention Policy
+
+Certain types of CLDR data can become obsolete, often due to political reorganization or changes in policy within the various countries. When such changes occur, we leave the obsolete data in CLDR for a certain period of time in order to make it easier for applications to migrate to the newer codes. However, eventually it becomes necessary to remove obsolete data from the CLDR in order to keep the data from growing uncontrollably.
+
+The following guidelines have been discussed by the CLDR technical committee and serve as the basis for decision making about when obsolete codes and data are to be removed from the CLDR.
+
+1. Territory Names ( //ldml/localeDisplayNames/territories/territory\[@type\="XX"] ) \- Data is to remain in the CLDR for a period of 5 years after the territory code for territory "XX" is deprecated in the IANA Subtag Registry.
+2. Metazone Names ( //ldml/dates/timeZoneNames/metazone\[@type\="ZoneName"] ) \- Data is to remain in the CLDR for a period of 20 years after the metazone becomes "inactive" (i.e., the zone name is not used in ANY country). A spreadsheet listing the Inactive Metazones in CLDR and the dates when they became inactive can be found [here](https://docs.google.com/spreadsheets/d/1Oj1IVo2Vg6wtAhk0Xd3HcA04HKZmSPxksIpvduvSYw8/edit#gid=0).
+
+![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
\ No newline at end of file
diff --git a/docs/site/index/survey-tool/faq-and-known-bugs.md b/docs/site/index/survey-tool/faq-and-known-bugs.md
new file mode 100644
index 00000000000..a64956a3471
--- /dev/null
+++ b/docs/site/index/survey-tool/faq-and-known-bugs.md
@@ -0,0 +1,72 @@
+---
+title: FAQ and Known Bugs
+---
+
+# FAQ and Known Bugs
+
+[**Survey Tool**](http://st.unicode.org/cldr-apps/survey) **\|** [**Accounts**](https://cldr.unicode.org/index/survey-tool/survey-tool-accounts) **\|** [**Guide**](https://cldr.unicode.org/translation/getting-started/guide) **\|** [**FAQ and Known Bugs**](https://cldr.unicode.org/index/survey-tool/faq-and-known-bugs)
+
+## FAQ (Frequently Asked Questions)
+
+***Q. Should I preserve the case of English words, like names of languages?***
+
+A. Beginning with CLDR 22, the new guidance is that names of items such as languages, regions, calendar and collation types, as well as names of months and weekdays in calendar data, should be capitalized as appropriate for the middle of body text. For more information, see the [Capitalization](https://cldr.unicode.org/translation/translation-guide-general/capitalization) section in the [Translation Guidelines](http://cldr.unicode.org/translation/).
+
+***Q. What about the warning about parentheses being discouraged in cases such as "(other)"?***
+
+A. You need to remove "(other)" or the equivalent from language names. In general, you should avoid using parentheses in the names of languages, scripts, or regions if at all possible. There is more information about this in the zoomed view.
+
+***Q. Why is the tool slow?***
+
+A. The performance of the Survey Tool has been greatly improved compared to previous versions. However, we are constantly striving to improve performance and our ability to accommodate a larger user base.
+
+If you feel a task is taking an unusual amount of time, and it is a consistent problem, please file a bug at [newticket](http://unicode.org/cldr/trac/newticket). In the ticket, describe exactly what operation is being attempted and approximately how long it is taking to receive a response.
+
+***Q. How are votes weighted and the "best" item picked?***
+
+A. You basically want to get multiple organizations to agree on the best value. For details on the voting process, see [Resolution Procedure](https://cldr.unicode.org/index/process).
+
+***Q. In the key, it says that the red box is a fallback. What does that mean?***
+
+A. The Unicode CLDR data uses inheritance. That means that if you are looking at *English (United Kingdom)* (a "sublocale") most of the data is inherited from *English* (which contains data for the US), called the "parent locale". Such data will show up as red. You only need to have different data in the sublocale where there are important differences in usage from the parent locale.
+
+Data in a sublocale may be *spuriously different*; that is, the parent's data may be perfectly acceptable in the sublocale, but somehow a difference has crept in. In that case, you should vote for the parent's data to reduce the gratuitous differences.
+
+***Q. But what I see is a funny symbol like Zxxx?***
+
+A. If there is no other translation available, what you will see is a "neutral" code, typically an ISO code. In cases where there is no such code available, such as for labels like "Month", then you may see English \-\- which needs to be translated.
+
+***Q. How do I delete an item?***
+
+A. You can only delete an item if you yourself have entered it, and there are no other votes. Click on the "Abstain" button for that row.
+
+To remove a spurious difference in a sublocale, vote for the red fallback item.
+
+***Q. What if I can't delete it?***
+
+A. It doesn't really matter much. What is really important is to make sure that the *right* item is voted for; so try to get consensus as described above. If all the alternatives are really wrong, and you really don't know what the right item would be, vote for the red fallback item.
+
+***Q. What if I want to just try out some changes, but don't want to affect the data?***
+
+A. Everyone can add data to "**Unknown or Invalid Language**" (und), so you can try out the Survey Tool there without worry.
+
+***Q. What if I have questions?***
+
+A. You should click on the items you have questions about, and read the information in the right\-hand information panel.
+
+*In many cases, even seemingly straightforward translations like the language, script, and territory names have issues.*
+
+You can also go directly to the [Translation Guidelines](https://cldr.unicode.org/translation).
+
+If you have further questions, or problems with the Survey Tool, send a message to [cldr\-users@unicode.org](mailto:cldr-users@unicode.org).
+
+## Known Bugs, Issues, Restrictions
+
+The following are general known bugs and issues. For known issues in the current release, see [Translation Guidelines](https://cldr.unicode.org/translation). 
+
+1. The description of bulk uploading (http://cldr.unicode.org/index/survey-tool/upload) has not yet been updated for the new UI.
+2. The description of managing users (http://cldr.unicode.org/index/survey-tool/managing-users) has not yet been updated for the new UI.
+
+If you find additional problems, please [file a ticket](http://unicode.org/cldr/trac/newticket).
+
+![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
\ No newline at end of file
diff --git a/docs/site/translation/translation-guide-general/references.md b/docs/site/translation/translation-guide-general/references.md
new file mode 100644
index 00000000000..78b38bfe7eb
--- /dev/null
+++ b/docs/site/translation/translation-guide-general/references.md
@@ -0,0 +1,73 @@
+---
+title: References
+---
+
+# References
+
+Sources and references may be standards, dictionaries, journal style guides (such as *The Economist Style Guide for English*), or other available sources that provide guidance as to common practice. Online sources are preferred where available, since they can be more easily checked.
+
+The goal is to follow common, customary practice. For example, language or territory display names should use the most recognizable name in common usage. *This is generally not the official name.* For example, one would use "Switzerland" not "Swiss Confederation".
+
+Here are some possible resources for comparison of locale data. *This is* ***not*** *an endorsement of the sources, merely a collation of possibly\-useful links.* To suggest additions to this list, file a [Bug Report](http://www.unicode.org/cldr/filing_bug_reports.html).
+
+### General
+
+- [CIA World Factbook](https://www.cia.gov/the-world-factbook/)
+
+For English, *The Economist Style Guide* (unfortunately only hard copy):
+
+- http://www.amazon.com/exec/obidos/tg/detail/-/186197535X
+
+For other languages, there should be similar guides for major publications.
+
+### Exemplar Characters
+
+- https://developer.mimer.com
+- http://www.eki.ee/letter/
+- http://en.wikipedia.org/wiki/Alphabets_derived_from_the_Latin
+- http://www.omniglot.com/writing/
+- http://www.geonames.de/alphab.html
+- [UNGEGN: Working Group on Romanization Systems](http://www.eki.ee/wgrs/)
+
+### Language Names
+
+- http://www.geonames.de/
+
+### Territory / Region Names
+
+- http://unstats.un.org/unsd/geoinfo/
+- http://www.eki.ee/knn/lingid2.htm#WRLD
+- http://www.p.lodz.pl/I35/personal/jw37/EUROPE/europe.html
+- http://www.geonames.de/
+	- http://www.worldlanguage.com/Arabic/Countries/ (Use the links at the top to switch languages)
+
+### Currencies
+
+- http://publications.eu.int/code/es/es-5000500.htm (Replace es with desired language code)
+- http://publications.eu.int/code/es/es-5000700.htm
+- http://publications.eu.int/
+- http://www.geonames.de/
+- http://www.globalfindata.com/gh/index.html
+
+### Collation
+
+- http://www.omniglot.com/writing/
+	- http://www.alphabets-world.com/ 
+- https://developer.mimer.com
+
+### Dates and Times
+
+- https://docs.microsoft.com/en-us/dotnet/standard/globalization-localization/globalization-icu
+- http://www.geonames.de/
+
+### Transliteration
+
+- [UNGEGN: Working Group on Romanization Systems](http://www.eki.ee/wgrs/)
+- [Transliteration of Non\-Roman Alphabets and Scripts](http://www.eki.ee/transliteration/)
+- [Standards for Archival Description: Romanization](http://www.archivists.org/catalog/stds99/chapter8.html)
+- [ISO\-15919 (Hindi)](http://ee.www.ee/transliteration/pdf/Hindi-Marathi-Nepali.pdf)
+- [ISO\-15919 (Gujarati)](http://ee.www.ee/transliteration/pdf/Gujarati.pdf)
+- [ISO\-15919 (Kannada)](http://ee.www.ee/transliteration/pdf/Kannada.pdf)
+- [ISCII\-91](http://www.cdacindia.com/html/gist/down/iscii_d.asp)
+
+![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
\ No newline at end of file

From 521519e4a82496116058ef549e62e1bbd0e759c0 Mon Sep 17 00:00:00 2001
From: Chris Pyle <cpyle@unicode.org>
Date: Wed, 4 Sep 2024 16:17:46 -0400
Subject: [PATCH 2/3] CLDR-17566 txt diffs and minor changes

---
 .../cldr-data-retention-policy.txt            |  6 ++--
 .../TEMP-TEXT-FILES/collation-guidelines.txt  | 32 +++++++------------
 .../site/TEMP-TEXT-FILES/currency-process.txt |  3 ++
 .../index/cldr-spec/collation-guidelines.md   |  8 ++---
 4 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/docs/site/TEMP-TEXT-FILES/cldr-data-retention-policy.txt b/docs/site/TEMP-TEXT-FILES/cldr-data-retention-policy.txt
index abf2bf2f7e3..8a8914b2a32 100644
--- a/docs/site/TEMP-TEXT-FILES/cldr-data-retention-policy.txt
+++ b/docs/site/TEMP-TEXT-FILES/cldr-data-retention-policy.txt
@@ -1,5 +1,5 @@
 CLDR Data Retention Policy
-Certain types of CLDR data can become obsolete, often due to political reorganization or changes in policy within the various countries.  When such changes occur, we leave the obsolete data in CLDR for a certain period of time in order to make it easier for applications to migrate to the newer codes.  However, eventually it becomes necessary to remove obsolete data from the CLDR in order to keep the data from growing uncontrollably.
+Certain types of CLDR data can become obsolete, often due to political reorganization or changes in policy within the various countries. When such changes occur, we leave the obsolete data in CLDR for a certain period of time in order to make it easier for applications to migrate to the newer codes. However, eventually it becomes necessary to remove obsolete data from the CLDR in order to keep the data from growing uncontrollably.
 The following guidelines have been discussed by the CLDR technical committee and serve as the basis for decision making about when obsolete codes and data are to be removed from the CLDR.
-1). Territory Names ( //ldml/localeDisplayNames/territories/territory[@type="XX"] ) - Data is to remain in the CLDR for a period of 5 years after the territory code for territory "XX" is deprecated in the IANA Subtag Registry.
-2). Metazone Names ( //ldml/dates/timeZoneNames/metazone[@type="ZoneName"] - Data is to remain in the CLDR for a period of 20 years after the metazone becomes "inactive" ( i.e. The zone name is not used in ANY country ).  A spreadsheet listing the Inactive Metazones in CLDR and the dates when they became inactive can be found here.
\ No newline at end of file
+Territory Names ( //ldml/localeDisplayNames/territories/territory[@type="XX"] ) - Data is to remain in the CLDR for a period of 5 years after the territory code for territory "XX" is deprecated in the IANA Subtag Registry.
+Metazone Names ( //ldml/dates/timeZoneNames/metazone[@type="ZoneName"] - Data is to remain in the CLDR for a period of 20 years after the metazone becomes "inactive" ( i.e. The zone name is not used in ANY country ). A spreadsheet listing the Inactive Metazones in CLDR and the dates when they became inactive can be found here.
\ No newline at end of file
diff --git a/docs/site/TEMP-TEXT-FILES/collation-guidelines.txt b/docs/site/TEMP-TEXT-FILES/collation-guidelines.txt
index 42a8e62c486..7b4d2d07b6d 100644
--- a/docs/site/TEMP-TEXT-FILES/collation-guidelines.txt
+++ b/docs/site/TEMP-TEXT-FILES/collation-guidelines.txt
@@ -109,35 +109,25 @@ Pitfalls
 There are a number of pitfalls with collation, so be careful. In some cases, such as Hungarian or Japanese, the rules can be fairly complicated (of course, reflecting that the sorting sequence for those languages is complicated).
 Only tailor expected data. We focus on the required collation sequence for a given language with normal data. So we don't include full-width characters for a European collation sequence, such as
 ... CSCS <<< ＣＳＣＳ ...
-...  CSCS <<< \uFF23\uFF33\uFF23\uFF33 ... (equivalently)
+... CSCS <<< \uFF23\uFF33\uFF23\uFF33 ... (equivalently)
 Tailor trailing contractions. If a sequence of characters is treated as a unit for collation, it should be entered as a contraction.
 & c < ch
 One might think that sequence like "dz" doesn't require that, since it would always come after "d" followed by any other letter; it is a "trailing contraction". But in unusual cases, that wouldn't be true; if "dz" is a unit sorted as if it were a distinct letter after "d", one should get the ordering "dα" < "dz". The correct behavior will only happen if "dz" is a contraction, such as
 & d < dz
 Watch out for Expansions. If you have a rule like &cs < d, and "cs" has not occurred in a previous rule as a contraction, then this is automatically considered to be the same as &c < d / s; that is, the d expands as if it were a "cs" (actually, primary greater than a "cs", since we wrote "<"). This expansion takes effect until the next primary difference.
 So suppose that "ccs" is to behave as if it were "cscs", and take case differences into account. You might try to do this with the rules on the left:
-Rules (Wrong)
-& C < cs <<< Cs <<< CS
-& cscs <<< ccs
-<<< Cscs <<< Ccs
-<<< CSCS <<< CCS
-Actual Effect
-& C < cs <<< Cs <<< CS
-& cs <<< ccs / cs
-<<< Cscs  / cs <<< Ccs  / cs
-<<< CSCS  / cs <<< CCS / cs
+Rules (Wrong)	Actual Effect
+& C < cs <<< Cs <<< CS	& C < cs <<< Cs <<< CS
+& cscs <<< ccs	& cs <<< ccs / cs
+<<< Cscs <<< Ccs	<<< Cscs / cs <<< Ccs / cs
+<<< CSCS <<< CCS	<<< CSCS / cs <<< CCS / cs
 But since the CSCS has not been made a contraction in previous rules, this produces an automatic expansion, one that continues through the entire sequence of non-primary differences, as shown on the right. This is not what is wanted: each item acts like it expands compared to the previous item. So CCS, for example, will act like it expands to CSCScs!
 What you actually want is the following:
-Rules (Right)
-& C < cs <<< Cs <<< CS
-& cscs <<< ccs
-& Cscs <<< Ccs
-& CSCS <<< CCS
-Actual Effect
-& C < cs <<< Cs <<< CS
-& cs <<< ccs / cs
-& Cs <<< Ccs / cs
-& CS <<< CCS / CS
+Rules (Right)	Actual Effect
+& C < cs <<< Cs <<< CS	& C < cs <<< Cs <<< CS
+& cscs <<< ccs	& cs <<< ccs / cs
+& Cscs <<< Ccs	& Cs <<< Ccs / cs
+& CSCS <<< CCS	& CS <<< CCS / CS
 In short, when you have expansions, it is always safer and clearer to express them with separate resets. There are only a few exceptions to this, notably when CJK characters are interleaved with Hangul Syllables.
 Minimal Rules. Example: Maltese was sorting character sequences before a base character using the following style:
 & B
diff --git a/docs/site/TEMP-TEXT-FILES/currency-process.txt b/docs/site/TEMP-TEXT-FILES/currency-process.txt
index 5bf003accbd..6b1a380eda6 100644
--- a/docs/site/TEMP-TEXT-FILES/currency-process.txt
+++ b/docs/site/TEMP-TEXT-FILES/currency-process.txt
@@ -1,3 +1,6 @@
 Currency Process
 There are three stages for new currency symbols (such as the recent Russian, Indian, and Turkish symbols). The following shows the stage and the disposition in CLDR data:
+1	Not widely adopted.	It is added as an alt value in the relevant locales (based on language and country codes). That means that it won't be the "stock" symbol for those locales, but will be accessible to implementations that support alt values.
+2	Adopted widely in fonts and keyboards used in the relevant locales.	It is added it to the relevant locales as the standard version.
+3	Widely recognized outside of the locales, and in most operating systems (Android, iOS, Windows, Mac — not just the latest versions, but also older ones that have significant market share).	Added to root as the standard version.
 For more information, see Currency Symbols & Names.
\ No newline at end of file
diff --git a/docs/site/index/cldr-spec/collation-guidelines.md b/docs/site/index/cldr-spec/collation-guidelines.md
index 54624617124..e9c65468728 100644
--- a/docs/site/index/cldr-spec/collation-guidelines.md
+++ b/docs/site/index/cldr-spec/collation-guidelines.md
@@ -52,7 +52,7 @@ Provide justification for your change. Citations should be to authoritative page
 
 Please test out any suggested rules before filing a bug.
 
-1. Go to the[ICU Collation Demo](http://demo.icu-project.org/icu-bin/collation.html).
+1. Go to the [ICU Collation Demo](http://demo.icu-project.org/icu-bin/collation.html).
 2. Pick the language for which you want to change the rules, or keep it on "und" (root) if you want to start from the Unicode/CLDR default sort order.
 3. Put your rules into the "Append rules" box.
 4. Put an interesting list of strings into the Input box.
@@ -60,7 +60,7 @@ Please test out any suggested rules before filing a bug.
 
 Or
 
-1. Go to the[ICU Locale Explorer](http://demo.icu-project.org/icu-bin/locexp).
+1. Go to the [ICU Locale Explorer](http://demo.icu-project.org/icu-bin/locexp).
 2. Pick the appropriate locale.
 3. Follow the instructions at the bottom to use your suggested rules on your suggested test data.
 4. Verify that the proper order results.
@@ -192,7 +192,7 @@ It would be possible instead to have rules that list every letter used by Slovak
 1. Every time a character is tailored, the data for that character takes up more room in typical implementations. That means that the data for collation is larger, downloads of collation libraries with that data are slower, sort keys are longer, and performance is slower; sometimes very much so.
 2. Related characters in the same script are in a peculiar order. For example, if the Slovak tailoring omits ƀ, then it would show up as after z.
 
-You can see what the UCA currently does with a given script by looking at the charts at[Unicode Collation Charts](http://www.unicode.org/charts/collation/), or at the[UCA in ICU\-style rules](http://unicode.org/cldr/data/diff/collation/UCA.txt). For example, suppose that U\+0D89 SINHALA LETTER IYANNA and U\+0D8A SINHALA LETTER IIYANNA needed to come after U\+0D96 SINHALA LETTER AUYANNA, in primary order, and that otherwise DUCET was ok. Then you would give the following rules:
+You can see what the UCA currently does with a given script by looking at the charts at [Unicode Collation Charts](http://www.unicode.org/charts/collation/), or at the [UCA in ICU\-style rules](http://unicode.org/cldr/data/diff/collation/UCA.txt). For example, suppose that U\+0D89 SINHALA LETTER IYANNA and U\+0D8A SINHALA LETTER IIYANNA needed to come after U\+0D96 SINHALA LETTER AUYANNA, in primary order, and that otherwise DUCET was ok. Then you would give the following rules:
 
 \& ඖ \# U\+0D96 SINHALA LETTER AUYANNA
 
@@ -243,6 +243,6 @@ There are a number of pitfalls with collation, so be careful. In some cases, suc
 	7. \& \[before 1] c \< ċ \<\<\< Ċ
 	8. This finds the highest primary (that's what the 1 is for) character less than c, and uses that as the reset point. For Maltese, the same technique needs to be used for ġ and ż.
 2. **Blocking Contractions.** Contractions can be blocked with CGJ, as described in the Unicode Standard and in the [Characters and Combining Marks FAQ](http://www.unicode.org/faq/char_combmark.html).
-3. **Case Combinations.** The lowercase, titlecase, and uppercase variants of contractions need to be supplied, with tertiary differences in that order (regardless of the caseFirst setting). That is, if *ch* is a contraction, then you would have the rules `... ch \<\<\< Ch \<\<\< CH`. Other case variants such as *cH* are excluded because they are unlikely to represent the contraction, for example in *McHugh*. (Therefore, *mchugh* and *McHugh* will be primary different if *ch* adds a primary difference.) \[[\#8248](http://unicode.org/cldr/trac/ticket/8248)]
+3. **Case Combinations.** The lowercase, titlecase, and uppercase variants of contractions need to be supplied, with tertiary differences in that order (regardless of the caseFirst setting). That is, if *ch* is a contraction, then you would have the rules `... ch <<< Ch <<< CH`. Other case variants such as *cH* are excluded because they are unlikely to represent the contraction, for example in *McHugh*. (Therefore, *mchugh* and *McHugh* will be primary different if *ch* adds a primary difference.) \[[\#8248](http://unicode.org/cldr/trac/ticket/8248)]
 
 ![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
\ No newline at end of file

From e38bcf97fdcf3f592dfdc8a0d4ef4993318183d4 Mon Sep 17 00:00:00 2001
From: Chris Pyle <cpyle@unicode.org>
Date: Wed, 4 Sep 2024 16:18:37 -0400
Subject: [PATCH 3/3] CLDR-17566 removing txt files

---
 .../cldr-data-retention-policy.txt            |   5 -
 .../TEMP-TEXT-FILES/collation-guidelines.txt  | 142 ------------------
 .../site/TEMP-TEXT-FILES/currency-process.txt |   6 -
 docs/site/TEMP-TEXT-FILES/definitions.txt     |   9 --
 .../TEMP-TEXT-FILES/faq-and-known-bugs.txt    |  34 -----
 docs/site/TEMP-TEXT-FILES/references.txt      |  45 ------
 6 files changed, 241 deletions(-)
 delete mode 100644 docs/site/TEMP-TEXT-FILES/cldr-data-retention-policy.txt
 delete mode 100644 docs/site/TEMP-TEXT-FILES/collation-guidelines.txt
 delete mode 100644 docs/site/TEMP-TEXT-FILES/currency-process.txt
 delete mode 100644 docs/site/TEMP-TEXT-FILES/definitions.txt
 delete mode 100644 docs/site/TEMP-TEXT-FILES/faq-and-known-bugs.txt
 delete mode 100644 docs/site/TEMP-TEXT-FILES/references.txt

diff --git a/docs/site/TEMP-TEXT-FILES/cldr-data-retention-policy.txt b/docs/site/TEMP-TEXT-FILES/cldr-data-retention-policy.txt
deleted file mode 100644
index 8a8914b2a32..00000000000
--- a/docs/site/TEMP-TEXT-FILES/cldr-data-retention-policy.txt
+++ /dev/null
@@ -1,5 +0,0 @@
-CLDR Data Retention Policy
-Certain types of CLDR data can become obsolete, often due to political reorganization or changes in policy within the various countries. When such changes occur, we leave the obsolete data in CLDR for a certain period of time in order to make it easier for applications to migrate to the newer codes. However, eventually it becomes necessary to remove obsolete data from the CLDR in order to keep the data from growing uncontrollably.
-The following guidelines have been discussed by the CLDR technical committee and serve as the basis for decision making about when obsolete codes and data are to be removed from the CLDR.
-Territory Names ( //ldml/localeDisplayNames/territories/territory[@type="XX"] ) - Data is to remain in the CLDR for a period of 5 years after the territory code for territory "XX" is deprecated in the IANA Subtag Registry.
-Metazone Names ( //ldml/dates/timeZoneNames/metazone[@type="ZoneName"] - Data is to remain in the CLDR for a period of 20 years after the metazone becomes "inactive" ( i.e. The zone name is not used in ANY country ). A spreadsheet listing the Inactive Metazones in CLDR and the dates when they became inactive can be found here.
\ No newline at end of file
diff --git a/docs/site/TEMP-TEXT-FILES/collation-guidelines.txt b/docs/site/TEMP-TEXT-FILES/collation-guidelines.txt
deleted file mode 100644
index 7b4d2d07b6d..00000000000
--- a/docs/site/TEMP-TEXT-FILES/collation-guidelines.txt
+++ /dev/null
@@ -1,142 +0,0 @@
-Collation Guidelines
-Collation sequences can be quite tricky to specify.
-The locale-based collation rules in Unicode CLDR specify customizations of the standard data for UTS #10: Unicode Collation Algorithm (UCA). Requests to change the collation order for a given locale, or to supply additional variants, need to follow the guidelines in this document.
-Filing a Request
-Requests to change the collation order for a given locale, or to supply additional variants should be filed as CLDR bug tickets. See CLDR Change Requests
-Rules
-The request should present the precise change expressed as rules. The rules must be supplied in the syntax as specified in http://www.unicode.org/reports/tr35/tr35-collation.html#Rules. (This used to be called the "basic syntax".) The rules must also be Minimal Rules as described below: only differences from http://unicode.org/charts/uca/ should be specified.
-& c < cs
-& cs <<< ccs / cs
-Normally CLDR does not accept submissions that reorder particular digits, punctuation, or other symbols, following instead the UCA ordering for those characters. However, if punctuation, general symbols, currency symbols, or digits as a class all sort after letters, that change can be accommodated. Similarly, if the letters in a particular script sort ahead of others (such as Greek characters ahead of Latin), that can also be accommodated. Both of these are done with a reorder setting. Note: For a given language, CLDR normally sorts the language's native script before other scripts, via the reorder setting.
-Test Data
-Please supply short test cases that illustrate the correct sorting behavior as a list of lines in sorted order. Try to include cases that show the boundary behavior by including suffixes, such as the following to illustrate that "cs" and "ccs" sort specially.
-c
-cy
-cs
-cscs
-ccs
-cscsy
-ccsy
-csy
-d
-Justification
-Provide justification for your change. Citations should be to authoritative pages on the web, in English.
-Testing Your Request
-Please test out any suggested rules before filing a bug.
-Go to the ICU Collation Demo.
-Pick the language for which you want to change the rules, or keep it on "und" (root) if you want to start from the Unicode/CLDR default sort order.
-Put your rules into the "Append rules" box.
-Put an interesting list of strings into the Input box.
-Click "sort" and verify the sort order and levels of differences.
-Or
-Go to the ICU Locale Explorer.
-Pick the appropriate locale.
-Follow the instructions at the bottom to use your suggested rules on your suggested test data.
-Verify that the proper order results.
-Determining the Order
-The exact collation sequence for a given language may be difficult to determine. The base ordering of characters can be fairly straightforward, but there are quite a few other complications involved.
-Most standards that specify collation, such as DIN or CS, are not targeted at algorithmic sorting, and are not complete algorithmic specifications. For example, CSN 97 6030 requires transliteration of foreign scripts, but there are many choices as to how to transliterate, and the exact mechanism is not specified. It also specifies that geometric shapes are sorted by the number of vertices and edges, which is, at a minimum, difficult to determine; and are subject to variation in glyphs.
-The CLDR goals are to match the sorting of exemplar letters and common punctuation and leave everything else to the standard UCA ordering. For more information, see UTS #10: Unicode Collation Algorithm (UCA).
-Determining Level Differences
-It is often tricky to determine the exact relationship between characters. In the UCA, case and similar variant differences are at a third (tertiary) level, while accent and similar differences are at a second (secondary) level, and base letter differences are at the first (primary) level. That results in an order like the following:
-cina
-Cina
-çina
-Çina
-dina
-That is, the difference between c and C is weaker than the difference between c and ç, which in turn is weaker than the difference between c and d. For any two characters α and β, it may be very clear that α < β, but not be clear what the right level difference is. To establish this, see if you can find examples of two words that of the following form.
-Primary Test
-...α...Z
-...β...A
-That is, the words are identical except for α, β, A, and Z, and you know that A and Z have a clear primary difference. If we get the above ordering in dictionaries and other sources, you know that the difference between α and β is a primary difference. If we get the opposite ordering than 1,2 above, then you only know that the difference between α and β is not a primary difference: it may be secondary or tertiary.
-You now need to distinguish which of the non-primary level differences you could have. So try again, this time seeing if you can find examples of two words that of the following form, where you know that A and Á have a clear secondary difference in the script.
-Secondary Test1
-...α...Á
-...β...A
-Now the ordering of these two strings tells you whether the difference between α and β is a secondary difference, or not. Alternatively, you can look for words of the form:
-Secondary Test2
-...B...α
-...b...β
-where b < B at a tertiary level. If you get the above ordering for the secondary test2, you also know that the difference between α and β is at a secondary level. The Test2 form is often easier to find examples for.
-If you have established that the characters have neither a primary nor secondary difference, the following can be used in a similar fashion to test whether the difference is at a tertiary level or not.
-Tertiary Test
-...α...B
-...β...b
-If there is no tertiary difference, then the difference is not significant enough for CLDR to take it into account, so they will be treated as equals (unless someone sorts with a final, codepoint level).
-Contractions
-Characters may behave differently in different contexts. For example, "ch" in Slovak sorts after H. A sequence of characters that behaves that way is called a contraction. Another common case of contractions is in the case of syllabaries, where a sequence of characters forming a syllable collates as a unit.
-Note that contractions are typically rather expensive in implementations: they take more storage, and are much slower to compare. So they should be avoided where possible. For example, suppose that we have the following sequence in a dictionary (where the uppercase characters represent characters in the target script):
-KB
-... // combinations of K with consonants
-KZ
-KA
-KE
-KI
-KO
-KU
-LB
-...
-There are two ways to produce this ordering. One is to have KA, KE, KI, etc be contractions. The other is to order all the vowels after all the consonants. Where the latter is sufficient, it is strongly preferred.
-Minimal Rules
-The goal is always specify the minimal differences from the DUCET. For example, take the case of Slovak, where everything sorts as in DUCET except for certain characters. The following rules place the characters ä, č, đ, and the sequence "ch" (and their case variants) at the appropriate positions in the sorting sequence, and with the appropriate strengths:
-Minimal Rules
-& A
-< ä <<< Ä
-& C
-< č <<< Č
-& D
-< đ <<< Đ
-& H
-< ch <<< cH <<< Ch <<< CH
-...
-It would be possible instead to have rules that list every letter used by Slovak [a á ä b c č d ď e é f-h {ch} i í j-l ĺ ľ m n ň o ó ô p-r ŕ s š t ť u ú v-y ý z ž], looking something like the following.
-Maximal Rules
-& A << á <<< Á
-< ä <<< Ä
-< b <<< B
-< c <<< C
-< č <<< Č
-< d
-...
-The Maximal Rules format is not accepted in CLDR. The reasons are:
-Every time a character is tailored, the data for that character takes up more room in typical implementations. That means that the data for collation is larger, downloads of collation libraries with that data are slower, sort keys are longer, and performance is slower; sometimes very much so.
-Related characters in the same script are in a peculiar order. For example, if the Slovak tailoring omits ƀ, then it would show up as after z.
-You can see what the UCA currently does with a given script by looking at the charts at Unicode Collation Charts, or at the UCA in ICU-style rules. For example, suppose that U+0D89 SINHALA LETTER IYANNA and U+0D8A SINHALA LETTER IIYANNA needed to come after U+0D96 SINHALA LETTER AUYANNA, in primary order, and that otherwise DUCET was ok. Then you would give the following rules:
-& ඖ # U+0D96 SINHALA LETTER AUYANNA
-< ඉ # U+0D89 SINHALA LETTER IYANNA
-< ඊ # U+0D8A SINHALA LETTER IIYANNA
-Pitfalls
-There are a number of pitfalls with collation, so be careful. In some cases, such as Hungarian or Japanese, the rules can be fairly complicated (of course, reflecting that the sorting sequence for those languages is complicated).
-Only tailor expected data. We focus on the required collation sequence for a given language with normal data. So we don't include full-width characters for a European collation sequence, such as
-... CSCS <<< ＣＳＣＳ ...
-... CSCS <<< \uFF23\uFF33\uFF23\uFF33 ... (equivalently)
-Tailor trailing contractions. If a sequence of characters is treated as a unit for collation, it should be entered as a contraction.
-& c < ch
-One might think that sequence like "dz" doesn't require that, since it would always come after "d" followed by any other letter; it is a "trailing contraction". But in unusual cases, that wouldn't be true; if "dz" is a unit sorted as if it were a distinct letter after "d", one should get the ordering "dα" < "dz". The correct behavior will only happen if "dz" is a contraction, such as
-& d < dz
-Watch out for Expansions. If you have a rule like &cs < d, and "cs" has not occurred in a previous rule as a contraction, then this is automatically considered to be the same as &c < d / s; that is, the d expands as if it were a "cs" (actually, primary greater than a "cs", since we wrote "<"). This expansion takes effect until the next primary difference.
-So suppose that "ccs" is to behave as if it were "cscs", and take case differences into account. You might try to do this with the rules on the left:
-Rules (Wrong)	Actual Effect
-& C < cs <<< Cs <<< CS	& C < cs <<< Cs <<< CS
-& cscs <<< ccs	& cs <<< ccs / cs
-<<< Cscs <<< Ccs	<<< Cscs / cs <<< Ccs / cs
-<<< CSCS <<< CCS	<<< CSCS / cs <<< CCS / cs
-But since the CSCS has not been made a contraction in previous rules, this produces an automatic expansion, one that continues through the entire sequence of non-primary differences, as shown on the right. This is not what is wanted: each item acts like it expands compared to the previous item. So CCS, for example, will act like it expands to CSCScs!
-What you actually want is the following:
-Rules (Right)	Actual Effect
-& C < cs <<< Cs <<< CS	& C < cs <<< Cs <<< CS
-& cscs <<< ccs	& cs <<< ccs / cs
-& Cscs <<< Ccs	& Cs <<< Ccs / cs
-& CSCS <<< CCS	& CS <<< CCS / CS
-In short, when you have expansions, it is always safer and clearer to express them with separate resets. There are only a few exceptions to this, notably when CJK characters are interleaved with Hangul Syllables.
-Minimal Rules. Example: Maltese was sorting character sequences before a base character using the following style:
-& B
-< ċ
-<<<Ċ
-< c
-<<<C
-The correct rules should be the minimal ones.
-& [before 1] c < ċ <<< Ċ
-This finds the highest primary (that's what the 1 is for) character less than c, and uses that as the reset point. For Maltese, the same technique needs to be used for ġ and ż.
-Blocking Contractions. Contractions can be blocked with CGJ, as described in the Unicode Standard and in the Characters and Combining Marks FAQ.
-Case Combinations. The lowercase, titlecase, and uppercase variants of contractions need to be supplied, with tertiary differences in that order (regardless of the caseFirst setting). That is, if ch is a contraction, then you would have the rules ... ch <<< Ch <<< CH. Other case variants such as cH are excluded because they are unlikely to represent the contraction, for example in McHugh. (Therefore, mchugh and McHugh will be primary different if ch adds a primary difference.) [#8248]
\ No newline at end of file
diff --git a/docs/site/TEMP-TEXT-FILES/currency-process.txt b/docs/site/TEMP-TEXT-FILES/currency-process.txt
deleted file mode 100644
index 6b1a380eda6..00000000000
--- a/docs/site/TEMP-TEXT-FILES/currency-process.txt
+++ /dev/null
@@ -1,6 +0,0 @@
-Currency Process
-There are three stages for new currency symbols (such as the recent Russian, Indian, and Turkish symbols). The following shows the stage and the disposition in CLDR data:
-1	Not widely adopted.	It is added as an alt value in the relevant locales (based on language and country codes). That means that it won't be the "stock" symbol for those locales, but will be accessible to implementations that support alt values.
-2	Adopted widely in fonts and keyboards used in the relevant locales.	It is added it to the relevant locales as the standard version.
-3	Widely recognized outside of the locales, and in most operating systems (Android, iOS, Windows, Mac — not just the latest versions, but also older ones that have significant market share).	Added to root as the standard version.
-For more information, see Currency Symbols & Names.
\ No newline at end of file
diff --git a/docs/site/TEMP-TEXT-FILES/definitions.txt b/docs/site/TEMP-TEXT-FILES/definitions.txt
deleted file mode 100644
index 3424c8e1a89..00000000000
--- a/docs/site/TEMP-TEXT-FILES/definitions.txt
+++ /dev/null
@@ -1,9 +0,0 @@
-Definitions
-literate percent - indicates the percentage of the country's population that is literate, based on literacy information from the World Bank, CIA Factbook, and others.
-language population - the number of people fluent in that language in that country, including both first and second language speakers. The level of fluency is that necessary to use a UI on a computer, smartphone, or similar devices. Reliable information is difficult to obtain; the information in CLDR is an estimate culled from different sources.
-writing percent (writingPercent) - percentage of the population fluent in that language in that country who regularly read or write a significant amount in that language. Ideally, the regularity would be measured as "7-day actives". Reliable information is difficult to obtain; the information in CLDR is a best estimate culled from different sources. If it is know that the language is not widely written, but there are no solid figures, the value is typically given 1%-5%.
-customary modern usage - The terms or characters commonly used in modern contexts: newspapers, journals, lay publications, street signs, commercial signage, common geographic names, company names, and so on. It does not include terms or characters that are only commonly used in technical or academic contexts such as mathematical expressions, archaic or historic texts, citations of archaic words, liturgical texts, or pedagogical use.
-official language - as used in CLDR, a language that can generally be used in communications with a central government. That is, people can expect that essentially all communication from the government is available in that language (ballots, information pamphlets, legal documents, …) and that they can use that language in communicating to the central government (petitions, forms, …).
-Official languages for a country are not necessarily the same as those with official legal status in the country. For example, Irish is declared to be an official language in Ireland, but English has no such formal status in the United States. Languages such as the latter are called de facto official languages. As another example, German has legal status in Italy, but cannot be used in all communications with the central government, and is thus not an official language of Italy for CLDR purposes. Such languages are official regional or official minority languages.
-official regional language - a language that is official (de jure or de facto) in a major region within a country, but does not qualify as an official language of the country as a whole. For example, it can be used in an official petition to a provincial government, but not the central government. The term "major" is meant to distinguish from smaller-scale usage, such as for a town or village.
-official minority language - a language that has some official governmental status, but is not an official language of the country or of a substantial region.
\ No newline at end of file
diff --git a/docs/site/TEMP-TEXT-FILES/faq-and-known-bugs.txt b/docs/site/TEMP-TEXT-FILES/faq-and-known-bugs.txt
deleted file mode 100644
index b0e87302297..00000000000
--- a/docs/site/TEMP-TEXT-FILES/faq-and-known-bugs.txt
+++ /dev/null
@@ -1,34 +0,0 @@
-FAQ and Known Bugs
-Survey Tool | Accounts | Guide | FAQ and Known Bugs
-FAQ (Frequently Asked Questions)
-Q. Should I preserve the case of English words, like names of languages?
-A. Beginning with CLDR 22, the new guidance is that names of items such as languages, regions, calendar and collation types, as well as names of months and weekdays in calendar data, should be capitalized as appropriate for the middle of body text. For more information, see the Capitalization section in the Translation Guidelines.
-Q. What about the warning about parentheses being discouraged in cases such as "(other)"
-A. You need to remove "(other)" or the equivalent from language names. In general, you should avoid using parentheses in the names of languages, scripts, or regions if at all possible. There is more information about this in the zoomed view.
-Q. Why is the tool slow?
-A. The performance of the Survey Tool has been greatly improved compared to previous versions. However, we are constantly striving to improve performance and our ability to accommodate a larger user base.
-If you feel a task is taking an unusual amount of time, and it is a consistent problem, please please file a bug at newticket. In the ticket, describe exactly what operation is being attempted and approximately how long it is taking to receive a response.
-Q. How are votes weighted and the "best" item picked?
-A. You basically want to get multiple organizations to agree on the best value. For details on the voting process, see Resolution Procedure.
-Q. In the key, it says that the red box is a fallback. What does that mean?
-A. The Unicode CLDR data uses inheritance. That means that if you are looking at English (United Kingdom) (a "sublocale") most of the data is inherited from English (which contains data for the US), called the "parent locale". Such data will show up as red. You only need to have different data in the sublocale where there are important differences in usage from the parent locale.
-Data in a sublocale may be spuriously different; that is, the parent's data may be perfectly acceptable in the sublocale, but somehow a difference has crept in. In that case, you should vote for the parent's data to reduce the gratuitous differences.
-Q. But what I see is a funny symbol like Zxxx?
-A. If there is no other translation available, what you will see is a "neutral" code, typically an ISO code. In cases where there is no such code available, such as for labels like "Month", then you may see English -- which needs to be translated.
-Q. How do I delete an item?
-A. You can only delete an item if you yourself have entered it, and there are no other votes. Click on the "Abstain" button for that row.
-To remove a spurious difference in a sublocale, vote for the red fallback item.
-Q. What if I can't delete it?
-A. It doesn't really matter much. What is really important is to make sure the the right item is voted for; so try to get consensus as described above. If all the alternatives are really wrong, and you really don't know what the right item would be, vote for the red fallback item.
-Q. What if I want to just try out some changes, but don't want to affect the data?
-A. Everyone can add data to "Unknown or Invalid Language" (und), so you can try out the Survey tool there without worry.
-Q. What if I have questions?
-A. You should click on the items you have questions about, and read the information in the right-hand information panel.
-In many cases, even seemingly straightforward translations like the language, script, and territory names have issues.
-You can also go directly to the Translation Guidelines.
-If you have further questions, or problems with the Survey Tool, send a message to cldr-users@unicode.org.
-Known Bugs, Issues, Restrictions
-The following are general known bugs and issues. For known issues in the current release, see Translation Guidelines.
-The description of bulk uploading (http://cldr.unicode.org/index/survey-tool/upload) has not yet been updated for the new UI.
-The description of managing users (http://cldr.unicode.org/index/survey-tool/managing-users) has not yet been updated for the new UI.
-If you find additional problems, please file a ticket.
\ No newline at end of file
diff --git a/docs/site/TEMP-TEXT-FILES/references.txt b/docs/site/TEMP-TEXT-FILES/references.txt
deleted file mode 100644
index 8f741fdf6f1..00000000000
--- a/docs/site/TEMP-TEXT-FILES/references.txt
+++ /dev/null
@@ -1,45 +0,0 @@
-References
-Sources and references may be standards or can also be dictionaries, journal style guides (such as The Economist Style Guide for English), and other available sources that provide guidance as to common practice. Online sources are preferred where available, since they can be more easily checked.
-The goal is to follow common, customary practice. For example, language or territory display names should use the most recognizable name in common usage. This is generally not the official name. For example, one would use "Switzerland" not "Swiss Confederation".
-Here are some possible resources for comparison of locale data. This is not an endorsement of the sources, merely a collation of possibly-useful links. To suggest additions to this list, file a Bug Report.
-General
-CIA World Factbook
-For English, The Economist Style Guide (unfortunately only hard copy):
-http://www.amazon.com/exec/obidos/tg/detail/-/186197535X
-For other languages, there should be similar guides for major publications.
-Exemplar Characters
-https://developer.mimer.com
-http://www.eki.ee/letter/
-http://en.wikipedia.org/wiki/Alphabets_derived_from_the_Latin
-http://www.omniglot.com/writing/
-http://www.geonames.de/alphab.html
-UNGEGN: Working Group on Romanization Systems
-Language Names
-http://www.geonames.de/
-Territory / Region Names
-http://unstats.un.org/unsd/geoinfo/
-http://www.eki.ee/knn/lingid2.htm#WRLD
-http://www.p.lodz.pl/I35/personal/jw37/EUROPE/europe.html
-http://www.geonames.de/
-http://www.worldlanguage.com/Arabic/Countries/ (Use the links at the top switch languages)
-Currencies
-http://publications.eu.int/code/es/es-5000500.htm (Replace es with desired language code)
-http://publications.eu.int/code/es/es-5000700.htm
-http://publications.eu.int/
-http://www.geonames.de/
-http://www.globalfindata.com/gh/index.html
-Collation
-http://www.omniglot.com/writing/
-http://www.alphabets-world.com/
-https://developer.mimer.com
-Dates and Times
-https://docs.microsoft.com/en-us/dotnet/standard/globalization-localization/globalization-icu
-http://www.geonames.de/
-Transliteration
-UNGEGN: Working Group on Romanization Systems
-Transliteration of Non-Roman Alphabets and Scripts
-Standards for Archival Description: Romanization
-ISO-15915 (Hindi)
-ISO-15915 (Gujarati)
-ISO-15915 (Kannada)
-ISCII-91
\ No newline at end of file