CLDR-11888 Update French speakers
https://unicode-org.atlassian.net/browse/CLDR-11888 was created to update the number of French speakers in Djibouti, but while researching it I found many other Francophone countries whose French-speaking populations were significantly underestimated. Most of those gaps probably arise because the existing numbers count only L1 users, whereas this file is meant to record L1+L2 users: roughly, how many people in each country could use an interface in this language.

See the original data in:
https://www.francophonie.org/sites/default/files/2021-04/LFDM-20Edition-2019-La-langue-fran%C3%A7aise-dans-le-monde.pdf

mvn package -DskipTests=true
java -jar tools/cldr-code/target/cldr-code.jar ConvertLanguageData
java -jar tools/cldr-code/target/cldr-code.jar GenerateLikelySubtags

CLDR-11888 Redo automated scripts after merge conflicts
conradarcturus committed Oct 29, 2024
1 parent 8ed7fe4 commit 7c74957
Showing 5 changed files with 53 additions and 23 deletions.
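
To make the size of the correction concrete, the following minimal sketch converts a few of the `populationPercent` values changed in this commit into approximate head counts. The populations and percentages are copied from the supplementalData.xml hunks in this diff; the names `FRENCH_DATA`, `estimated_speakers`, and `speaker_change` are hypothetical helpers for illustration and are not part of the CLDR tooling.

```python
# French populationPercent values before and after this commit, together with
# the territory populations, copied from the supplementalData.xml hunks.
FRENCH_DATA = {
    "DJ": {"population": 994_974, "old_percent": 2.1, "new_percent": 50},
    "HT": {"population": 11_753_900, "old_percent": 4.7, "new_percent": 42},
    "MU": {"population": 1_310_500, "old_percent": 3, "new_percent": 73},
}


def estimated_speakers(population, population_percent):
    """Return the approximate number of users implied by a populationPercent."""
    return round(population * population_percent / 100)


def speaker_change(territory):
    """Return (old, new) estimated French users for a territory in FRENCH_DATA."""
    row = FRENCH_DATA[territory]
    return (
        estimated_speakers(row["population"], row["old_percent"]),
        estimated_speakers(row["population"], row["new_percent"]),
    )


if __name__ == "__main__":
    for code in FRENCH_DATA:
        old, new = speaker_change(code)
        print(f"{code}: {old:,} -> {new:,} estimated French users")
```

These are rough L1+L2 estimates derived only from the percentages in the data file; they are not figures quoted from the source report.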
4 changes: 2 additions & 2 deletions common/supplemental/likelySubtags.xml
@@ -895,7 +895,7 @@ not be patched by hand, as any changes made in that fashion may be lost.
<likelySubtag from="und_CY" to="el_Grek_CY"/> <!--?‧?‧Cyprus ➡ Greek‧Greek‧Cyprus-->
<likelySubtag from="und_CZ" to="cs_Latn_CZ"/> <!--?‧?‧Czechia ➡ Czech‧Latin‧Czechia-->
<likelySubtag from="und_DE" to="de_Latn_DE"/> <!--?‧?‧Germany ➡ German‧Latin‧Germany-->
<likelySubtag from="und_DJ" to="aa_Latn_DJ"/> <!--?‧?‧Djibouti ➡ Afar‧Latin‧Djibouti-->
<likelySubtag from="und_DJ" to="fr_Latn_DJ"/> <!--?‧?‧Djibouti ➡ French‧Latin‧Djibouti-->
<likelySubtag from="und_DK" to="da_Latn_DK"/> <!--?‧?‧Denmark ➡ Danish‧Latin‧Denmark-->
<likelySubtag from="und_DO" to="es_Latn_DO"/> <!--?‧?‧Dominican Republic ➡ Spanish‧Latin‧Dominican Republic-->
<likelySubtag from="und_DZ" to="ar_Arab_DZ"/> <!--?‧?‧Algeria ➡ Arabic‧Arabic‧Algeria-->
@@ -924,7 +924,7 @@ not be patched by hand, as any changes made in that fashion may be lost.
<likelySubtag from="und_HK" to="zh_Hant_HK"/> <!--?‧?‧Hong Kong SAR China ➡ Chinese‧Traditional‧Hong Kong SAR China-->
<likelySubtag from="und_HN" to="es_Latn_HN"/> <!--?‧?‧Honduras ➡ Spanish‧Latin‧Honduras-->
<likelySubtag from="und_HR" to="hr_Latn_HR"/> <!--?‧?‧Croatia ➡ Croatian‧Latin‧Croatia-->
<likelySubtag from="und_HT" to="ht_Latn_HT"/> <!--?‧?‧Haiti ➡ Haitian Creole‧Latin‧Haiti-->
<likelySubtag from="und_HT" to="fr_Latn_HT"/> <!--?‧?‧Haiti ➡ French‧Latin‧Haiti-->
<likelySubtag from="und_HU" to="hu_Latn_HU"/> <!--?‧?‧Hungary ➡ Hungarian‧Latin‧Hungary-->
<likelySubtag from="und_IC" to="es_Latn_IC"/> <!--?‧?‧Canary Islands ➡ Spanish‧Latin‧Canary Islands-->
<likelySubtag from="und_ID" to="id_Latn_ID"/> <!--?‧?‧Indonesia ➡ Indonesian‧Latin‧Indonesia-->
19 changes: 9 additions & 10 deletions common/supplemental/supplementalData.xml
@@ -1574,7 +1574,7 @@ XXX Code for transations where no currency is involved
<language type="fon" scripts="Latn"/>
<language type="fon" territories="BJ" alt="secondary"/>
<language type="fr" scripts="Latn" territories="BE BF BI BJ BL CA CD CF CG CH CI CM DJ DZ FR GA GF GN GP GQ HT KM LU MA MC MF MG ML MQ MU NC NE PF PM RE RW SC SN SY TD TG TN VU WF YT"/>
<language type="fr" scripts="Dupl" territories="DE GB IT NL PT RO TF US" alt="secondary"/>
<language type="fr" scripts="Dupl" territories="DE GB IT LB NL PT RO TF US" alt="secondary"/>
<language type="frc" scripts="Latn"/>
<language type="frm" scripts="Latn" alt="secondary"/>
<language type="fro" scripts="Latn" alt="secondary"/>
@@ -2739,9 +2739,9 @@ XXX Code for transations where no currency is involved
</territory>
<territory type="CH" gdp="733800000000" literacyPercent="99" population="8860570"> <!--Switzerland-->
<languagePopulation type="de" populationPercent="73" officialStatus="official"/> <!--German-->
<languagePopulation type="fr" populationPercent="67" officialStatus="official" references="R1030"/> <!--French-->
<languagePopulation type="gsw" writingPercent="5" populationPercent="65" officialStatus="de_facto_official" references="R1006"/> <!--Swiss German-->
<languagePopulation type="en" populationPercent="61" references="R1137"/> <!--English-->
<languagePopulation type="fr" populationPercent="21" officialStatus="official" references="R1137"/> <!--French-->
<languagePopulation type="it" populationPercent="4.3" officialStatus="official"/> <!--Italian-->
<languagePopulation type="lmo" writingPercent="5" populationPercent="4.1" references="R1086"/> <!--Lombard-->
<languagePopulation type="pt" populationPercent="3.4" references="R1316"/> <!--Portuguese-->
@@ -2899,10 +2899,10 @@ XXX Code for transations where no currency is involved
<languagePopulation type="en" populationPercent="99" officialStatus="de_facto_official" references="R1065"/> <!--English-->
</territory>
<territory type="DJ" gdp="7380000000" literacyPercent="67.9" population="994974"> <!--Djibouti-->
<languagePopulation type="fr" populationPercent="50" officialStatus="official" references="R1030"/> <!--French-->
<languagePopulation type="aa" populationPercent="42"/> <!--Afar-->
<languagePopulation type="so" populationPercent="41"/> <!--Somali-->
<languagePopulation type="ar" populationPercent="7.3" officialStatus="official"/> <!--Arabic-->
<languagePopulation type="fr" populationPercent="2.1" officialStatus="official"/> <!--French-->
</territory>
<territory type="DK" gdp="428400000000" literacyPercent="99" population="5973140"> <!--Denmark-->
<languagePopulation type="da" populationPercent="93" officialStatus="official"/> <!--Danish-->
@@ -3188,7 +3188,7 @@ XXX Code for transations where no currency is involved
</territory>
<territory type="HT" gdp="34410000000" literacyPercent="48.7" population="11753900"> <!--Haiti-->
<languagePopulation type="ht" populationPercent="81" officialStatus="official" references="R1029"/> <!--Haitian Creole-->
<languagePopulation type="fr" literacyPercent="100" populationPercent="4.7" officialStatus="official" references="R1030"/> <!--French-->
<languagePopulation type="fr" literacyPercent="100" populationPercent="42" officialStatus="official" references="R1030"/> <!--French-->
</territory>
<territory type="HU" gdp="388900000000" literacyPercent="99" population="9855750"> <!--Hungary-->
<languagePopulation type="hu" populationPercent="100" officialStatus="official"/> <!--Hungarian-->
@@ -3495,9 +3495,9 @@ XXX Code for transations where no currency is involved
<languagePopulation type="apc" populationPercent="100" references="R1173"/> <!--Levantine Arabic-->
<languagePopulation type="ar" populationPercent="86" officialStatus="official"/> <!--Arabic-->
<languagePopulation type="en" populationPercent="40"/> <!--English-->
<languagePopulation type="fr" populationPercent="38" references="R1030"/> <!--French-->
<languagePopulation type="hy" populationPercent="5.2"/> <!--Armenian-->
<languagePopulation type="ku_Arab" populationPercent="1.7"/> <!--Kurdish (Arabic)-->
<languagePopulation type="fr" populationPercent="0.37"/> <!--French-->
</territory>
<territory type="LC" gdp="4083000000" literacyPercent="90.1" population="168038"> <!--St. Lucia-->
<languagePopulation type="en" populationPercent="90" officialStatus="official"/> <!--English-->
@@ -3554,8 +3554,8 @@ XXX Code for transations where no currency is involved
<territory type="MA" gdp="337500000000" literacyPercent="67.1" population="37387600"> <!--Morocco-->
<languagePopulation type="ary" populationPercent="87"/> <!--Moroccan Arabic-->
<languagePopulation type="ar" populationPercent="62" officialStatus="official"/> <!--Arabic-->
<languagePopulation type="fr" populationPercent="35" officialStatus="de_facto_official" references="R1030"/> <!--French-->
<languagePopulation type="zgh" populationPercent="22" references="R1254"/> <!--Standard Moroccan Tamazight-->
<languagePopulation type="fr" populationPercent="20" officialStatus="de_facto_official" references="R1050"/> <!--French-->
<languagePopulation type="en" populationPercent="14" references="R1050"/> <!--English-->
<languagePopulation type="tzm" literacyPercent="25" populationPercent="9.8" officialStatus="official"/> <!--Central Atlas Tamazight-->
<languagePopulation type="shi" populationPercent="8.7"/> <!--Tachelhit-->
@@ -3659,10 +3659,10 @@ XXX Code for transations where no currency is involved
</territory>
<territory type="MU" gdp="33530000000" literacyPercent="88.8" population="1310500"> <!--Mauritius-->
<languagePopulation type="mfe" populationPercent="90"/> <!--Morisyen-->
<languagePopulation type="fr" populationPercent="73" officialStatus="official" references="R1030"/> <!--French-->
<languagePopulation type="en" populationPercent="72" officialStatus="official" references="R1152"/> <!--English-->
<languagePopulation type="bho" populationPercent="27"/> <!--Bhojpuri-->
<languagePopulation type="ur" populationPercent="5.2"/> <!--Urdu-->
<languagePopulation type="fr" populationPercent="3" officialStatus="official"/> <!--French-->
<languagePopulation type="ta" populationPercent="2.5"/> <!--Tamil-->
</territory>
<territory type="MV" gdp="11650000000" literacyPercent="98.4" population="388858"> <!--Maldives-->
@@ -4225,7 +4225,7 @@ XXX Code for transations where no currency is involved
<territory type="TN" gdp="153600000000" literacyPercent="79.1" population="12048800"> <!--Tunisia-->
<languagePopulation type="aeb" populationPercent="90"/> <!--Tunisian Arabic-->
<languagePopulation type="ar" populationPercent="90" officialStatus="official"/> <!--Arabic-->
<languagePopulation type="fr" populationPercent="74" officialStatus="official" references="R1132"/> <!--French-->
<languagePopulation type="fr" populationPercent="52" officialStatus="official" references="R1030"/> <!--French-->
</territory>
<territory type="TO" gdp="700400000" literacyPercent="99" population="104889"> <!--Tonga-->
<languagePopulation type="to" populationPercent="95" officialStatus="official"/> <!--Tongan-->
@@ -5517,7 +5517,7 @@ XXX Code for transations where no currency is involved
<reference type="R1027" uri="https://www.cia.gov/cia/publications/factbook/geos/pu.html">Many minor langs; Portuguese official</reference>
<reference type="R1028" uri="http://www.seasite.niu.edu/tagalog/essays_on_philippine_languages.htm">In this and other sources, such as Ethnologue, there is no estimate for number of users. http://en.wikipedia.org/wiki/Filipino_language http://www.ethnologue.com/show_language.asp?code=fil </reference>
<reference type="R1029" uri="http://www.ethnologue.com/show_language.asp?code=hat">Most of the population uses Creole; see also http://www.country-studies.com/haiti/creole,-literacy,-and-education.html http://en.wikipedia.org/wiki/French_language#Haiti</reference>
<reference type="R1030" uri="http://www.ethnologue.com/show_language.asp?code=fra">400k 2nd language speakers</reference>
<reference type="R1030" uri="https://www.francophonie.org/sites/default/files/2021-04/LFDM-20Edition-2019-La-langue-fran%C3%A7aise-dans-le-monde.pdf">[missing]</reference>
<reference type="R1031" uri="http://www.ethnologue.com/show_country.asp?name=CV">Official language, 37-77% literacy</reference>
<reference type="R1032" uri="http://www.ethnologue.com/show_country.asp?name=ER">Official language, used in some schools.</reference>
<reference type="R1033" uri="http://www.ciil.org/Main/Announcement/MBE_Programme/paper/paper2.htm">http://www.censusindia.net/cendat/datatable26.html</reference>
@@ -5619,7 +5619,6 @@ XXX Code for transations where no currency is involved
<reference type="R1129" uri="http://www.ethnologue.com/show_language.asp?code=skr">[missing]</reference>
<reference type="R1130" uri="http://en.wikipedia.org/wiki/R%C3%A9union">- Education is in French; using literacy rate * pop for French-using population</reference>
<reference type="R1131" uri="http://en.wikipedia.org/wiki/Singapore"> English is the first language learned by half the children by the time they reach preschool age; using 92.6% of pop for the English figure</reference>
<reference type="R1132" uri="http://en.wikipedia.org/wiki/Tunisia#Language">- using pop * literacy rate</reference>
<reference type="R1133" uri="http://en.wikipedia.org/wiki/Swahili_language">- 90 percent of approximately 39 million Tanzanians speak Swahili</reference>
<reference type="R1134" uri="http://en.wikipedia.org/wiki/Swahili_language">- Baganda generally don't speak Swahili, but it is in common use among the 25 million people elsewhere in the country, and is currently being implemented in schools nationwide (use 75% of Cpop for this figure)</reference>
<reference type="R1135" uri="https://en.wikipedia.org/wiki/Talian_dialect">[missing]</reference>
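
The `und_DJ` and `und_HT` changes in likelySubtags.xml are consistent with ranking each territory's languages by literate population, that is `populationPercent` multiplied by the language's own `literacyPercent` when present and by the territory's `literacyPercent` otherwise. The sketch below checks that reading against the Haiti and Djibouti values shown in this diff. It is an assumed heuristic for illustration only: the real `GenerateLikelySubtags` tool may apply other weights or overrides, and `LanguagePopulation`, `literate_share`, and `top_language` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LanguagePopulation:
    language: str
    population_percent: float
    # Per-language literacyPercent override; None means "use the territory's".
    literacy_percent: Optional[float] = None


def literate_share(entry: LanguagePopulation, territory_literacy: float) -> float:
    """Percent of the territory's population assumed literate in this language."""
    literacy = (
        entry.literacy_percent
        if entry.literacy_percent is not None
        else territory_literacy
    )
    return entry.population_percent * literacy / 100


def top_language(entries: List[LanguagePopulation], territory_literacy: float) -> str:
    """Return the language with the largest literate share (assumed ranking rule)."""
    return max(entries, key=lambda e: literate_share(e, territory_literacy)).language


# Values copied from the HT and DJ territory hunks in this diff.
HAITI_LITERACY = 48.7
HAITI_OLD = [LanguagePopulation("ht", 81), LanguagePopulation("fr", 4.7, 100)]
HAITI_NEW = [LanguagePopulation("ht", 81), LanguagePopulation("fr", 42, 100)]

DJIBOUTI_LITERACY = 67.9
DJIBOUTI_OLD = [
    LanguagePopulation("aa", 42),
    LanguagePopulation("so", 41),
    LanguagePopulation("ar", 7.3),
    LanguagePopulation("fr", 2.1),
]
DJIBOUTI_NEW = [
    LanguagePopulation("fr", 50),
    LanguagePopulation("aa", 42),
    LanguagePopulation("so", 41),
    LanguagePopulation("ar", 7.3),
]

if __name__ == "__main__":
    print("HT:", top_language(HAITI_OLD, HAITI_LITERACY), "->",
          top_language(HAITI_NEW, HAITI_LITERACY))
    print("DJ:", top_language(DJIBOUTI_OLD, DJIBOUTI_LITERACY), "->",
          top_language(DJIBOUTI_NEW, DJIBOUTI_LITERACY))
```

Under this assumption Haitian Creole's 81% is scaled by the territory literacy of 48.7%, while French keeps its full 42% because of `literacyPercent="100"`, which would explain why French overtakes Haitian Creole despite the smaller raw percentage.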
13 changes: 9 additions & 4 deletions common/testData/localeIdentifiers/likelySubtags.txt
@@ -512,6 +512,11 @@ hsb-AQ ; hsb-Latn-AQ ; hsb-AQ ;
hsb-DE ; hsb-Latn-DE ; hsb ;
hsb-Egyp ; hsb-Egyp-DE ; hsb-Egyp ;
hsb-Latn ; hsb-Latn-DE ; hsb ;
ht ; ht-Latn-HT ; ht ;
ht-AQ ; ht-Latn-AQ ; ht-AQ ;
ht-Egyp ; ht-Egyp-HT ; ht-Egyp ;
ht-HT ; ht-Latn-HT ; ht ;
ht-Latn ; ht-Latn-HT ; ht ;
hu ; hu-Latn-HU ; hu ;
hu-AQ ; hu-Latn-AQ ; hu-AQ ;
hu-Egyp ; hu-Egyp-HU ; hu-Egyp ;
@@ -1173,7 +1178,7 @@ und-Cyrl-UZ ; uz-Cyrl-UZ ; uz-Cyrl ;
und-Cyrl-XK ; sr-Cyrl-XK ; sr-XK ;
und-DE ; de-Latn-DE ; de ;
und-DG ; en-Latn-DG ; en-DG ;
und-DJ ; aa-Latn-DJ ; aa-DJ ;
und-DJ ; fr-Latn-DJ ; fr-DJ ;
und-DK ; da-Latn-DK ; da ;
und-DM ; en-Latn-DM ; en-DM ;
und-DO ; es-Latn-DO ; es-DO ;
@@ -1236,7 +1241,7 @@ und-Guru-IN ; pa-Guru-IN ; pa ;
und-HK ; zh-Hant-HK ; zh-HK ;
und-HN ; es-Latn-HN ; es-HN ;
und-HR ; hr-Latn-HR ; hr ;
und-HT ; ht-Latn-HT ; ht ;
und-HT ; fr-Latn-HT ; fr-HT ;
und-HU ; hu-Latn-HU ; hu ;
und-Hans ; zh-Hans-CN ; zh ;
und-Hans-AQ ; zh-Hans-AQ ; zh-AQ ;
@@ -1365,7 +1370,7 @@ und-Latn-CY ; tr-Latn-CY ; tr-CY ;
und-Latn-CZ ; cs-Latn-CZ ; cs ;
und-Latn-DE ; de-Latn-DE ; de ;
und-Latn-DG ; en-Latn-DG ; en-DG ;
und-Latn-DJ ; aa-Latn-DJ ; aa-DJ ;
und-Latn-DJ ; fr-Latn-DJ ; fr-DJ ;
und-Latn-DK ; da-Latn-DK ; da ;
und-Latn-DM ; en-Latn-DM ; en-DM ;
und-Latn-DO ; es-Latn-DO ; es-DO ;
@@ -1401,7 +1406,7 @@ und-Latn-GY ; en-Latn-GY ; en-GY ;
und-Latn-HK ; en-Latn-HK ; en-HK ;
und-Latn-HN ; es-Latn-HN ; es-HN ;
und-Latn-HR ; hr-Latn-HR ; hr ;
und-Latn-HT ; ht-Latn-HT ; ht ;
und-Latn-HT ; fr-Latn-HT ; fr-HT ;
und-Latn-HU ; hu-Latn-HU ; hu ;
und-Latn-IC ; es-Latn-IC ; es-IC ;
und-Latn-ID ; id-Latn-ID ; id ;
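
The rows in likelySubtags.txt are semicolon-separated, so the effect of this commit on the test data can be checked mechanically. The sketch below parses old and new rows copied from the hunks above and reports which sources now maximize to a different locale. The column names (`source`, `add_likely`, `remove_favor_script`, `remove_favor_region`) are an assumption about the file's layout rather than something shown in this diff, and `parse_row` and `changed_defaults` are hypothetical helpers.

```python
FIELD_NAMES = ("source", "add_likely", "remove_favor_script", "remove_favor_region")


def parse_row(line):
    """Split one 'a ; b ; c ;' test-data row into a dict of stripped fields."""
    fields = [field.strip() for field in line.split(";")]
    # Pad short rows so every assumed column is present (possibly empty).
    fields += [""] * (len(FIELD_NAMES) - len(fields))
    return dict(zip(FIELD_NAMES, fields))


def changed_defaults(old_rows, new_rows):
    """Map each source tag to (old, new) maximized locale where the two differ."""
    old_by_source = {row["source"]: row for row in map(parse_row, old_rows)}
    changes = {}
    for row in map(parse_row, new_rows):
        before = old_by_source.get(row["source"])
        if before is not None and before["add_likely"] != row["add_likely"]:
            changes[row["source"]] = (before["add_likely"], row["add_likely"])
    return changes


# Rows copied from the removed and added lines in the hunks above.
OLD_ROWS = [
    "und-DJ ; aa-Latn-DJ ; aa-DJ ;",
    "und-HT ; ht-Latn-HT ; ht ;",
    "und-DK ; da-Latn-DK ; da ;",
]
NEW_ROWS = [
    "und-DJ ; fr-Latn-DJ ; fr-DJ ;",
    "und-HT ; fr-Latn-HT ; fr-HT ;",
    "und-DK ; da-Latn-DK ; da ;",
]

if __name__ == "__main__":
    for source, (old, new) in changed_defaults(OLD_ROWS, NEW_ROWS).items():
        print(f"{source}: {old} -> {new}")
```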
26 changes: 26 additions & 0 deletions common/testData/localeIdentifiers/localeDisplayName.txt
@@ -1310,6 +1310,32 @@ nl-Latn-BE; flamšćina (łaćonsce)
zh-Hans-fonipa; chinšćina [zjednorjena] (FONIPA)


@locale=ht
@languageDisplay=standard

en-MM; anglais (Myanmar [Birmanie])
es; espagnol
es-419; espagnol (Amérique latine)
es-Cyrl-MX; espagnol (cyrillique, Mexique)
hi-Latn; hindi (latin)
nl-BE; néerlandais (Belgique)
nl-Latn-BE; néerlandais (latin, Belgique)
zh-Hans-fonipa; chinois (simplifié, alphabet phonétique international)


@locale=ht
@languageDisplay=dialect

en-MM; anglais (Myanmar [Birmanie])
es; espagnol
es-419; espagnol d’Amérique latine
es-Cyrl-MX; espagnol du Mexique (cyrillique)
hi-Latn; hindi (latin)
nl-BE; flamand
nl-Latn-BE; flamand (latin)
zh-Hans-fonipa; chinois simplifié (alphabet phonétique international)


@locale=hu
@languageDisplay=standard
