CLDR-18095 Remove Japonic languageGroup from Altaic #4239
base: main
CLDR-18095 The Altaic hypothesis, while really fun, is largely discredited. This change organizes the Japonic languages into a group, and removes them & similar languages from the Altaic language group.
The data is generated directly from Wikipedia. If the change is desired, it has to be done by tweaking the tool and regenerating the data.
On Wed, Dec 11, 2024 at 10:00 AM Tom Bishop wrote:
LGTM. The ticket isn't accepted yet, though. I'm curious: how is this data used?
Whoops, I left it in Investigate. I'll move it to Investigate Complete after I check out Mark's comment about the data being generated from Wikipedia; the file doesn't say how it's generated, so I didn't know. I may not be quick about it, since I'm making these changes in my downtime during a trip in Europe.
I don't believe we currently use it for anything else in CLDR or ICU; it's there so that consumers of CLDR have the data in XML format.
@conradarcturus FYI, I see that a PR of mine, with changes to languageGroup.xml among others, was merged back in March 2024: #3538. I don't remember whether those changes were made directly by a tool or by me hand-editing in response to messages shown when running a tool. Seemingly the only tool that writes languageGroup.xml is GenerateLanguageContainment. Searching through the notes I made at the time, I don't see a record that I ever ran GenerateLanguageContainment, so it was probably hand-editing to satisfy a consistency-checking tool. I wish auto-generated files always had comments at the top saying "Do not edit! This file is auto-generated by ..."
Actually, looking at #3538, I see that I wrote there about hand-editing likelySubtags.xml in discussion with @macchiati, and GenerateMaximalLocales is disabled (not to be confused with GenerateLanguageContainment)...
I agree: always generating an XML comment for every generated file (or, in some cases, for an element or range of elements in an otherwise non-generated file*) is a good idea**. It should point to the generating program or, perhaps even better, to a site .md file that names the generating program and documents how to use it and how to modify it.

Mark

* Ideally we would move these out into separate files.
** My rule of thumb was to look at the BRS for all the tasks that generate files to check, but explicit comments are better.
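The proposal above can be illustrated with a small helper. This is a minimal Java sketch, assuming a generator that holds its whole output as a string before writing it. The class `GeneratedHeader` and its methods are hypothetical and are not part of CLDR's existing tooling. The comment goes after the XML declaration, because nothing, not even a comment, may precede `<?xml ...?>`.

```java
import java.util.Objects;

// Hypothetical helper, not part of CLDR's tooling: builds and places a
// "do not edit" comment that names the generator and its documentation page.
final class GeneratedHeader {
    private GeneratedHeader() {}

    /** Builds an XML comment naming the generating program and a page documenting it. */
    static String doNotEditComment(String generator, String docPage) {
        Objects.requireNonNull(generator, "generator");
        Objects.requireNonNull(docPage, "docPage");
        String body = "DO NOT EDIT. This file is generated by " + generator
                + "; see " + docPage + " for how to run and modify the generator.";
        // XML 1.0 does not allow "--" inside a comment.
        if (body.contains("--")) {
            throw new IllegalArgumentException("text is not legal inside an XML comment: " + body);
        }
        return "<!-- " + body + " -->";
    }

    /** Inserts the comment after the XML declaration (which must stay first), or at the top. */
    static String withHeader(String xml, String generator, String docPage) {
        String comment = doNotEditComment(generator, docPage);
        if (xml.startsWith("<?xml")) {
            int end = xml.indexOf("?>");
            if (end < 0) {
                throw new IllegalArgumentException("unterminated XML declaration");
            }
            int split = end + 2; // position just past "?>"
            return xml.substring(0, split) + "\n" + comment + xml.substring(split);
        }
        return comment + "\n" + xml;
    }
}
```

Pointing at a documentation page rather than only at the program name follows the suggestion above: the page can explain how to run and modify the generator without the comment itself going stale.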
On Thu, Dec 12, 2024 at 9:39 PM Tom Bishop wrote:
I wish auto-generated files always had comments at the top saying "Do not edit! This file is auto-generated by ..."
GenerateMaximalLocales is now GenerateLikelySubtags. Looking into it, GenerateLanguageContainment is the file we are looking for. There are a number of hard-coded relations in that file (like ja -> jpx -> mul), so to fulfill this ticket I may have to add new hard-coded relations. But first, let's see whether the Wikipedia data has changed. However, for the life of me, I cannot get
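To make the idea of hard-coded relations concrete, here is a minimal Java sketch of a child-to-parent table that holds only the ja -> jpx -> mul relation quoted in the comment, plus a walk from a language up to the root. The class `ContainmentSketch` and its `chain` method are hypothetical. They do not reflect how GenerateLanguageContainment is actually written.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of hard-coded containment relations; not CLDR's actual code.
final class ContainmentSketch {
    private ContainmentSketch() {}

    // Only the one relation quoted in the discussion; real data would have many more.
    static final Map<String, String> PARENT = Map.of(
            "ja", "jpx",   // Japanese -> Japonic languages (ISO 639-5 jpx)
            "jpx", "mul"); // Japonic -> mul, the root in this sketch

    /** Returns the chain from code up to the root, following PARENT links. */
    static List<String> chain(String code) {
        List<String> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String current = code; current != null; current = PARENT.get(current)) {
            if (!seen.add(current)) {
                throw new IllegalStateException("cycle at " + current);
            }
            result.add(current);
        }
        return result;
    }
}
```

A code with no entry, such as `ko` in this toy table, yields a one-element chain, meaning it belongs to no group here.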
This seems to work for me:
I haven't done that before and I'm not even sure what RDF stands for; maybe "Resource Description Framework"? I get this output:
@btangmu I edited your comment to put the long log behind a 'details' section.
CLDR-18095
The Altaic hypothesis, while really fun, is largely discredited. This change organizes the Japonic languages into a group, and removes them & Koreanic languages from the Altaic language group. FWIW there is no ISO 639-5 Koreanic group.
I'm happy to make more languageGroup changes; I just limited this to the discussion in the specific ticket.
ALLOW_MANY_COMMITS=true