CLDR-10478 Add latest data from Macau census
This updates the Macau population language data with up-to-date census information.

CLDR-10478 Update GenerateLikelyTestData

 java -jar tools/cldr-code/target/cldr-code.jar GenerateLikelyTestData

CLDR-10478 Fix official languages for Macau

CLDR-10478 Remove `cmn` from Macau because of overlap with `zh`

Even though 45% of Macau's population knows `cmn`, `zh` is already implied to be `cmn`, so listing both would double count those speakers. Potentially we can separate `zh` from `cmn`, but that's a whole new discussion that's best saved for later.
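A minimal sketch of the double counting, assuming (as the message above says) that a bare `zh` code is read as `cmn`. The 45% figure is from the message and the 98% `zh_Hant` figure is from the Macau `territory` element in `supplementalData.xml`; the class and method names are hypothetical and are not part of the CLDR tools.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration: folding zh into cmn counts Mandarin speakers twice. */
final class CmnDoubleCount {

    /** Assumed mapping: any zh locale (e.g. zh_Hant) is treated as Mandarin (cmn). */
    static String effectiveLanguage(String code) {
        String base = code.split("_")[0];
        return base.equals("zh") ? "cmn" : base;
    }

    /** Sums populationPercent values per effective language. */
    static Map<String, Double> aggregate(Map<String, Double> rows) {
        Map<String, Double> totals = new LinkedHashMap<>();
        for (Map.Entry<String, Double> row : rows.entrySet()) {
            totals.merge(effectiveLanguage(row.getKey()), row.getValue(), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Double> rows = new LinkedHashMap<>();
        rows.put("zh_Hant", 98.0); // official written Chinese
        rows.put("cmn", 45.0);     // spoken Mandarin knowledge
        // Both rows land on "cmn", giving 143% of the population: an impossible share.
        System.out.println(aggregate(rows)); // {cmn=143.0}
    }
}
```

Dropping the `cmn` row, as this commit does, leaves only the `zh` entry and avoids the inflated total.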

CLDR-10478 Add Cantonese (Macau) locale xml

Since I added a new locale that has "de_facto_official" status, I need to add a new XML file for it. That's easy enough: for now it will just inherit from root.

I also re-generated the test data with `java -jar tools/cldr-code/target/cldr-code.jar GenerateLikelyTestData`
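The new `common/main/yue_Hant_MO.xml` contains only an `<identity>` block, so every lookup falls through to its ancestors. The sketch below shows plain truncation fallback only; it is an assumption that ignores CLDR's explicit `parentLocales` overrides, and `TruncationFallback` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of truncation-based locale fallback (ignores parentLocales). */
final class TruncationFallback {

    /** Returns the locale itself, each shorter underscore-delimited prefix, then "root". */
    static List<String> chain(String localeId) {
        List<String> result = new ArrayList<>();
        String current = localeId;
        while (true) {
            result.add(current);
            int cut = current.lastIndexOf('_');
            if (cut < 0) {
                break;
            }
            current = current.substring(0, cut);
        }
        result.add("root");
        return result;
    }

    public static void main(String[] args) {
        // An empty yue_Hant_MO.xml contributes no data, so values come from later entries.
        System.out.println(chain("yue_Hant_MO")); // [yue_Hant_MO, yue_Hant, yue, root]
    }
}
```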

conradarcturus committed Oct 29, 2024
1 parent 8ed7fe4 commit 78a78b0
Showing 6 changed files with 65 additions and 12 deletions.
15 changes: 15 additions & 0 deletions common/main/yue_Hant_MO.xml
@@ -0,0 +1,15 @@
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ldml SYSTEM "../../common/dtd/ldml.dtd">
<!-- Copyright © 1991-2024 Unicode, Inc.
For terms of use, see http://www.unicode.org/copyright.html
SPDX-License-Identifier: Unicode-3.0
CLDR data files are interpreted according to the LDML specification (http://unicode.org/reports/tr35/)
-->
<ldml>
<identity>
<version number="$Revision$"/>
<language type="yue"/>
<script type="Hant"/>
<territory type="MO"/>
</identity>
</ldml>
2 changes: 1 addition & 1 deletion common/supplemental/likelySubtags.xml
@@ -1202,7 +1202,7 @@ not be patched by hand, as any changes made in that fashion may be lost.
<likelySubtag from="und_Latn_MA" to="fr_Latn_MA"/> <!--?‧Latin‧Morocco ➡ French‧Latin‧Morocco-->
<likelySubtag from="und_Latn_MK" to="sq_Latn_MK"/> <!--?‧Latin‧North Macedonia ➡ Albanian‧Latin‧North Macedonia-->
<likelySubtag from="und_Latn_MM" to="kac_Latn_MM"/> <!--?‧Latin‧Myanmar (Burma) ➡ Kachin‧Latin‧Myanmar (Burma)-->
<likelySubtag from="und_Latn_MO" to="pt_Latn_MO"/> <!--?‧Latin‧Macao SAR China ➡ Portuguese‧Latin‧Macao SAR China-->
<likelySubtag from="und_Latn_MO" to="en_Latn_MO"/> <!--?‧Latin‧Macao SAR China ➡ English‧Latin‧Macao SAR China-->
<likelySubtag from="und_Latn_MR" to="fr_Latn_MR"/> <!--?‧Latin‧Mauritania ➡ French‧Latin‧Mauritania-->
<likelySubtag from="und_Latn_MV" to="en_Latn_MV"/> <!--?‧Latin‧Maldives ➡ English‧Latin‧Maldives-->
<likelySubtag from="und_Latn_NP" to="en_Latn_NP"/> <!--?‧Latin‧Nepal ➡ English‧Latin‧Nepal-->
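One plausible reading of the `und_Latn_MO` change, offered as an assumption rather than a description of the generator's actual logic: among Macau's Latin-script languages, pick the one with the highest `populationPercent`. With the old figures Portuguese (5) outranked English (2.3); with the new census figures English (23) outranks Filipino (3.1) and Portuguese (2.3). `LikelyLatinLanguage` is a hypothetical name.

```java
import java.util.Map;

/** Hypothetical: choose the most widely known Latin-script language for a territory. */
final class LikelyLatinLanguage {

    /** Returns the key with the largest percentage, or "und" if the map is empty. */
    static String mostPopulous(Map<String, Double> latinLanguages) {
        String best = "und";
        double bestPercent = -1;
        for (Map.Entry<String, Double> e : latinLanguages.entrySet()) {
            if (e.getValue() > bestPercent) {
                best = e.getKey();
                bestPercent = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Latin-script rows for MO in supplementalData.xml, before and after this commit.
        Map<String, Double> before = Map.of("pt", 5.0, "en", 2.3);
        Map<String, Double> after = Map.of("en", 23.0, "fil", 3.1, "pt", 2.3);
        System.out.println("und_Latn_MO -> " + mostPopulous(before) + "_Latn_MO"); // und_Latn_MO -> pt_Latn_MO
        System.out.println("und_Latn_MO -> " + mostPopulous(after) + "_Latn_MO");  // und_Latn_MO -> en_Latn_MO
    }
}
```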
15 changes: 9 additions & 6 deletions common/supplemental/supplementalData.xml
@@ -1541,7 +1541,7 @@ XXX Code for transations where no currency is involved
<language type="eky" scripts="Kali"/>
<language type="el" scripts="Grek" territories="CY GR"/>
<language type="en" scripts="Latn" territories="AG AI AS AU BB BI BM BS BW BZ CA CC CK CM CQ CX DG DM ER FJ FK FM GB GD GG GH GI GM GU GY HK IE IM IN IO JE JM KE KI KN KY LC LR LS MG MH MP MS MT MU MW NA NF NG NR NU NZ PG PH PK PN PR PW RW SB SC SD SG SH SL SS SX SZ TC TK TO TT TV TZ UG UM US VC VG VI VU WS ZA ZM ZW"/>
<language type="en" scripts="Dsrt Shaw" territories="AC AE AR AT BA BD BE BG BR CH CL CY CZ DE DK DZ EE EG ES ET FI FR GR HR HU IL IQ IT JO KZ LB LK LT LU LV MA MV MX MY NL PL PT RO SE SI SK TA TH TR YE" alt="secondary"/>
<language type="en" scripts="Dsrt Shaw" territories="AC AE AR AT BA BD BE BG BR CH CL CY CZ DE DK DZ EE EG ES ET FI FR GR HR HU IL IQ IT JO KZ LB LK LT LU LV MA MO MV MX MY NL PL PT RO SE SI SK TA TH TR YE" alt="secondary"/>
<language type="enm" scripts="Latn" alt="secondary"/>
<language type="eo" scripts="Latn"/>
<language type="es" scripts="Latn" territories="AR BO CL CO CR CU DO EA EC ES GQ GT HN IC MX NI PA PE PR PY SV UY VE"/>
@@ -2410,7 +2410,7 @@ XXX Code for transations where no currency is involved
<language type="yrk" scripts="Cyrl"/>
<language type="yrl" scripts="Latn"/>
<language type="yua" scripts="Latn"/>
<language type="yue" scripts="Hans Hant"/>
<language type="yue" scripts="Hans Hant" territories="MO"/>
<language type="yue" territories="CN HK" alt="secondary"/>
<language type="za" scripts="Latn"/>
<language type="za" scripts="Hans" territories="CN" alt="secondary"/>
@@ -3630,9 +3630,12 @@ XXX Code for transations where no currency is involved
</territory>
<territory type="MO" gdp="71840000000" literacyPercent="95.6" population="644426"> <!--Macao SAR China-->
<languagePopulation type="zh_Hant" populationPercent="98" officialStatus="official"/> <!--Chinese (Traditional)-->
<languagePopulation type="pt" populationPercent="5" officialStatus="official" references="R1085"/> <!--Portuguese-->
<languagePopulation type="yue" populationPercent="86" officialStatus="de_facto_official" references="R1085"/> <!--Cantonese-->
<languagePopulation type="en" populationPercent="23" references="R1085"/> <!--English-->
<languagePopulation type="zh" populationPercent="5" references="R1143"/> <!--Chinese-->
<languagePopulation type="en" populationPercent="2.3" references="R1273"/> <!--English-->
<languagePopulation type="nan" populationPercent="3.7" references="R1273"/> <!--Min Nan Chinese-->
<languagePopulation type="fil" populationPercent="3.1" references="R1085"/> <!--Filipino-->
<languagePopulation type="pt" populationPercent="2.3" officialStatus="official" references="R1085"/> <!--Portuguese-->
</territory>
<territory type="MP" gdp="1242000000" literacyPercent="97" population="51118"> <!--Northern Mariana Islands-->
<languagePopulation type="en" populationPercent="97" officialStatus="de_facto_official"/> <!--English-->
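Each `populationPercent` is a share of the territory's `population` attribute, and one person can know several languages, so the Macau rows need not sum to 100. A sketch converting the new percentages into approximate speaker counts using the `population="644426"` value above; `SpeakerEstimate` is a hypothetical helper that rounds to the nearest person.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: turn populationPercent values into approximate head counts. */
final class SpeakerEstimate {

    static long speakers(long population, double populationPercent) {
        return Math.round(population * populationPercent / 100.0);
    }

    public static void main(String[] args) {
        long population = 644_426; // population attribute of territory MO
        Map<String, Double> percents = new LinkedHashMap<>();
        percents.put("zh_Hant", 98.0);
        percents.put("yue", 86.0);
        percents.put("en", 23.0);
        percents.put("zh", 5.0);
        percents.put("nan", 3.7);
        percents.put("fil", 3.1);
        percents.put("pt", 2.3);

        double total = 0;
        for (Map.Entry<String, Double> e : percents.entrySet()) {
            total += e.getValue();
            System.out.println(e.getKey() + ": ~" + speakers(population, e.getValue()));
        }
        // Multilingual speakers are counted once per language, so this exceeds 100.
        System.out.printf("sum of percentages: %.1f%n", total);
    }
}
```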
@@ -5572,7 +5575,7 @@ XXX Code for transations where no currency is involved
<reference type="R1082" uri="http://www.ethnologue.com/show_language.asp?code=eng">Ethnologue lists 1 million 2nd lang users of English; no other good figures found.</reference>
<reference type="R1083" uri="http://www.bhas.ba/index.php?option=com_content&amp;view=article&amp;id=52&amp;itemid=80&amp;lang=en&amp;Itemid="> also: http://en.wikipedia.org/wiki/Bosnian_language</reference>
<reference type="R1084" uri="http://www.nationsonline.org/oneworld/equatorial_guinea.htm">French is a minority official language. Crude estimate of usage based on import partner data.</reference>
<reference type="R1085" uri="http://en.wikipedia.org/wiki/Geographic_distribution_of_Portuguese">Macao reported 5% native Portuguese speakers.</reference>
<reference type="R1085" uri="https://www.dsec.gov.mo/getAttachment/6cb29f2f-524a-488f-aed3-4d7207bb109e/E_CEN_PUB_2021_Y.aspx">2021 Census, counting people who are fluent in the language</reference>
<reference type="R1086">5% writing pop estimated in absence of other data</reference>
<reference type="R1087" uri="http://www.ethnologue.com/show_language.asp?code=rkt">[missing]</reference>
<reference type="R1088" uri="http://www.nationsonline.org/oneworld/syria.htm">Crude estimate based on import partner data.</reference>
@@ -5760,7 +5763,7 @@ XXX Code for transations where no currency is involved
<reference type="R1270">Mainly unwritten</reference>
<reference type="R1271">Vai script is the main script for this language.</reference>
<reference type="R1272">Latin listed as being used (Scriptsource) but no pop figures available.</reference>
<reference type="R1273" uri="https://www.cia.gov/library/publications/the-world-factbook/geos/mc.html">and https://en.wikipedia.org/wiki/Macau</reference>
<reference type="R1273" uri="https://www.dsec.gov.mo/getAttachment/7a3b17c2-22cc-4197-9bd5-ccc6eec388a2/E_CEN_PUB_2011_Y.aspx">2011 Census -- the language is not distinguished in the 2021 census</reference>
<reference type="R1274" uri="http://www.ethnologue.com/language/mgh">but no literacy data</reference>
<reference type="R1275" uri="http://www.ethnologue.com/language/pcm">Including 1st and 2nd lang speakers</reference>
<reference type="R1276" uri="http://www.ethnologue.com/show_language.asp?code=bin">[missing]</reference>
8 changes: 7 additions & 1 deletion common/testData/localeIdentifiers/likelySubtags.txt
@@ -512,6 +512,11 @@ hsb-AQ ; hsb-Latn-AQ ; hsb-AQ ;
hsb-DE ; hsb-Latn-DE ; hsb ;
hsb-Egyp ; hsb-Egyp-DE ; hsb-Egyp ;
hsb-Latn ; hsb-Latn-DE ; hsb ;
ht ; ht-Latn-HT ; ht ;
ht-AQ ; ht-Latn-AQ ; ht-AQ ;
ht-Egyp ; ht-Egyp-HT ; ht-Egyp ;
ht-HT ; ht-Latn-HT ; ht ;
ht-Latn ; ht-Latn-HT ; ht ;
hu ; hu-Latn-HU ; hu ;
hu-AQ ; hu-Latn-AQ ; hu-AQ ;
hu-Egyp ; hu-Egyp-HU ; hu-Egyp ;
@@ -1435,7 +1440,7 @@ und-Latn-MG ; mg-Latn-MG ; mg ;
und-Latn-MH ; en-Latn-MH ; en-MH ;
und-Latn-MK ; sq-Latn-MK ; sq-MK ;
und-Latn-ML ; bm-Latn-ML ; bm ;
und-Latn-MO ; pt-Latn-MO ; pt-MO ;
und-Latn-MO ; en-Latn-MO ; en-MO ;
und-Latn-MP ; en-Latn-MP ; en-MP ;
und-Latn-MQ ; fr-Latn-MQ ; fr-MQ ;
und-Latn-MR ; fr-Latn-MR ; fr-MR ;
@@ -1738,6 +1743,7 @@ yue-Egyp ; yue-Egyp-HK ; yue-Egyp ;
yue-HK ; yue-Hant-HK ; yue ;
yue-Hans ; yue-Hans-CN ; yue-Hans ; yue-CN
yue-Hant ; yue-Hant-HK ; yue ;
yue-MO ; yue-Hant-MO ; yue-MO ;
za ; za-Latn-CN ; za ;
za-AQ ; za-Latn-AQ ; za-AQ ;
za-CN ; za-Latn-CN ; za ;
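Each row of `likelySubtags.txt` is a semicolon-separated list of locale IDs. Assuming the columns are, in order, the source, the result of adding likely subtags, the minimized form favoring script, and the minimized form favoring region, with an empty last column meaning "same as the previous column" (the `yue-Hans` row is the only one above that fills it), a hypothetical parser could read the new `yue-MO` row like this:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for one row of likelySubtags.txt (column meanings assumed). */
final class LikelySubtagsRow {
    final String source;
    final String addLikely;
    final String removeFavorScript;
    final String removeFavorRegion;

    private LikelySubtagsRow(List<String> cols) {
        source = cols.get(0);
        addLikely = cols.get(1);
        removeFavorScript = cols.get(2);
        removeFavorRegion = cols.get(3);
    }

    static LikelySubtagsRow parse(String line) {
        String[] raw = line.split(";", -1); // -1 keeps trailing empty fields
        List<String> cols = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            String value = i < raw.length ? raw[i].trim() : "";
            // Assumption: an empty column repeats the previous column's value.
            cols.add(value.isEmpty() && i > 0 ? cols.get(i - 1) : value);
        }
        return new LikelySubtagsRow(cols);
    }

    public static void main(String[] args) {
        LikelySubtagsRow row = parse("yue-MO ; yue-Hant-MO ; yue-MO ;");
        System.out.println(row.addLikely + " / " + row.removeFavorRegion); // yue-Hant-MO / yue-MO
    }
}
```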
26 changes: 26 additions & 0 deletions common/testData/localeIdentifiers/localeDisplayName.txt
@@ -1310,6 +1310,32 @@ nl-Latn-BE; flamšćina (łaćonsce)
zh-Hans-fonipa; chinšćina [zjednorjena] (FONIPA)


@locale=ht
@languageDisplay=standard

en-MM; anglais (Myanmar [Birmanie])
es; espagnol
es-419; espagnol (Amérique latine)
es-Cyrl-MX; espagnol (cyrillique, Mexique)
hi-Latn; hindi (latin)
nl-BE; néerlandais (Belgique)
nl-Latn-BE; néerlandais (latin, Belgique)
zh-Hans-fonipa; chinois (simplifié, alphabet phonétique international)


@locale=ht
@languageDisplay=dialect

en-MM; anglais (Myanmar [Birmanie])
es; espagnol
es-419; espagnol d’Amérique latine
es-Cyrl-MX; espagnol du Mexique (cyrillique)
hi-Latn; hindi (latin)
nl-BE; flamand
nl-Latn-BE; flamand (latin)
zh-Hans-fonipa; chinois simplifié (alphabet phonétique international)
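The two `@locale=ht` blocks differ only in `@languageDisplay`. In `dialect` mode a language+region pair with its own name (`nl-BE` → "flamand", `es-MX` → "espagnol du Mexique") absorbs the region and any remaining subtags go in parentheses; in `standard` mode script and region always go in parentheses. A hypothetical sketch that reproduces those lines using only the French names listed above; it is not ICU's or CLDR's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of standard vs dialect display names (names from the ht test data). */
final class DisplayNameSketch {
    static final Map<String, String> LANGUAGES = Map.of("es", "espagnol", "nl", "néerlandais");
    static final Map<String, String> SCRIPTS = Map.of("Cyrl", "cyrillique", "Latn", "latin");
    static final Map<String, String> REGIONS =
            Map.of("419", "Amérique latine", "BE", "Belgique", "MX", "Mexique");
    // Names for language+region pairs, used only in dialect mode.
    static final Map<String, String> DIALECTS = Map.of(
            "es-419", "espagnol d’Amérique latine",
            "es-MX", "espagnol du Mexique",
            "nl-BE", "flamand");

    /** Accepts lang[-Script][-Region]; other subtags are not handled in this sketch. */
    static String displayName(String tag, boolean dialect) {
        String[] parts = tag.split("-");
        String lang = parts[0];
        String script = null;
        String region = null;
        for (int i = 1; i < parts.length; i++) {
            if (parts[i].length() == 4) {
                script = parts[i];
            } else {
                region = parts[i];
            }
        }
        String base = LANGUAGES.get(lang);
        if (dialect && region != null && DIALECTS.containsKey(lang + "-" + region)) {
            base = DIALECTS.get(lang + "-" + region);
            region = null; // the region is now part of the base name
        }
        List<String> qualifiers = new ArrayList<>();
        if (script != null) qualifiers.add(SCRIPTS.get(script));
        if (region != null) qualifiers.add(REGIONS.get(region));
        return qualifiers.isEmpty() ? base : base + " (" + String.join(", ", qualifiers) + ")";
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"es-419", "es-Cyrl-MX", "nl-Latn-BE"}) {
            System.out.println(displayName(tag, false) + " | " + displayName(tag, true));
        }
    }
}
```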


@locale=hu
@languageDisplay=standard

@@ -791,10 +791,13 @@ Luxembourg LU "605,764" 100% "62,110,000,000" official French fr 87%
Luxembourg LU "605,764" 100% "62,110,000,000" official German de 63%
Luxembourg LU "605,764" 100% "62,110,000,000" official Luxembourgish lb 67% 5% "http://www.ethnologue.com/show_language.asp?code=ltz Some 99% of users are literate in French or German. For languages not customarily written, the writing population is artificially set to 5% in the absence of better information."
Luxembourg LU "605,764" 100% "62,110,000,000" Portuguese pt 16% https://en.wikipedia.org/wiki/Portuguese_Luxembourger
Macao SAR China MO "606,340" 96% "71,820,000,000" Chinese zh 5% Hans literacy is unknown; set to 5% artificially pending better or official figures.
Macao SAR China MO "606,340" 96% "71,820,000,000" official Chinese (Traditional) zh_Hant 98%
Macao SAR China MO "606,340" 96% "71,820,000,000" English en "13,900" https://www.cia.gov/library/publications/the-world-factbook/geos/mc.html and https://en.wikipedia.org/wiki/Macau
Macao SAR China MO "606,340" 96% "71,820,000,000" official Portuguese pt 5% http://en.wikipedia.org/wiki/Geographic_distribution_of_Portuguese Macao reported 5% native Portuguese speakers.
Macao SAR China MO "682,070" 96% "71,820,000,000" Chinese zh 5% Hans literacy is unknown; set to 5% artificially pending better or official figures.
Macao SAR China MO "682,070" 96% "71,820,000,000" official Chinese (Traditional) zh_Hant 98%
Macao SAR China MO "682,070" 96% "71,820,000,000" English en 22.7% https://www.dsec.gov.mo/getAttachment/6cb29f2f-524a-488f-aed3-4d7207bb109e/E_CEN_PUB_2021_Y.aspx 2021 Census, counting people who are fluent in the language
Macao SAR China MO "682,070" 96% "71,820,000,000" official Portuguese pt 2.3% https://www.dsec.gov.mo/getAttachment/6cb29f2f-524a-488f-aed3-4d7207bb109e/E_CEN_PUB_2021_Y.aspx 2021 Census, counting people who are fluent in the language
Macao SAR China MO "682,070" 96% "71,820,000,000" de_facto_official Cantonese yue 86.2% https://www.dsec.gov.mo/getAttachment/6cb29f2f-524a-488f-aed3-4d7207bb109e/E_CEN_PUB_2021_Y.aspx 2021 Census, counting people who are fluent in the language
Macao SAR China MO "682,070" 96% "71,820,000,000" Filipino fil "20,879" https://www.dsec.gov.mo/getAttachment/6cb29f2f-524a-488f-aed3-4d7207bb109e/E_CEN_PUB_2021_Y.aspx 2021 Census, counting people who are fluent in the language
Macao SAR China MO "682,070" 96% "71,820,000,000" Hokkien nan 3.7% https://www.dsec.gov.mo/getAttachment/7a3b17c2-22cc-4197-9bd5-ccc6eec388a2/E_CEN_PUB_2011_Y.aspx 2011 Census -- the language is not distinguished in the 2021 census
Madagascar MG "25,683,610" 65% "39,850,000,000" official English en 18% No literacy figure available for English in Madagascar; newly adopted official language; 5% is an estimate.
Madagascar MG "25,683,610" 65% "39,850,000,000" official French fr 69%
Madagascar MG "25,683,610" 65% "39,850,000,000" official Malagasy mg 90% http://www.wildmadagascar.org/overview/loc/27-minorities.html
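The new TSV rows mix percentages with one absolute count (Filipino, 20,879 speakers out of 682,070). The `populationPercent` values in the Macau `territory` element (23, 86, 3.7, 3.1, 2.3) are consistent with rounding these figures to two significant digits; that rule is inferred from the numbers, not taken from the CLDR tooling, and `PercentFromCensus` is a hypothetical name.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

/** Hypothetical: derive a populationPercent from a TSV count or percentage. */
final class PercentFromCensus {

    /** Rounds to two significant digits (rule inferred from the MO values, not CLDR code). */
    static double twoSignificant(double value) {
        return new BigDecimal(Double.toString(value))
                .round(new MathContext(2, RoundingMode.HALF_UP))
                .doubleValue();
    }

    /** Converts an absolute speaker count into a percentage of the territory population. */
    static double percentOf(long speakers, long population) {
        return 100.0 * speakers / population;
    }

    public static void main(String[] args) {
        long population = 682_070; // MO population in the new TSV rows
        System.out.println(twoSignificant(percentOf(20_879, population))); // 3.1 (fil)
        System.out.println(twoSignificant(22.7)); // 23.0 (en)
        System.out.println(twoSignificant(86.2)); // 86.0 (yue)
        System.out.println(twoSignificant(2.3));  // 2.3 (pt)
    }
}
```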