-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Relax node version (18.16.1 → 18.x) #967
Conversation
The saga of this bug: We observed a bug when updating to Node.js 18.17.0. This test failed: ``` 0 passing (362ms) 1 failing 1) label #localizedNameWithLocalGloss deduplicates matching anglicized and local names: AssertionError: expected [ 'L\'Aquila', undefined ] to deeply equal [ 'L’Aquila', 'L\'Aquila' ] + expected - actual [ + "L’Aquila" "L'Aquila" - [undefined] ] at expectGloss (file:///home/jleedev/pkg/openstreetmap-americana/test/spec/label.js:309:71) at Context.<anonymous> (file:///home/jleedev/pkg/openstreetmap-americana/test/spec/label.js:379:7) ``` The label gloss code rendered "L’Aquila\n(L'Aquila)", reflecting a different label in Wikidata and OSM. We added a test case to enforce this. There are a few layers of abstraction at play, so let's unpack them one by one. In src/constants/label.js, we set the collator with "case-sensitive" to false, and "diacritic-sensitive" to true in all languages but false in English. (The intent here is to _not_ write the same name twice, with and without diacritics, which is clutter when the map is set to English.) This translates to an Intl.Collator instance having a "sensitivity" of, let's see, "base" for English (and "accent" otherwise), in maplibre-style-spec/src/expression/types/collator.ts. The fact that these two characters - ' (U+0027 APOSTROPHE) - ’ (U+2019 RIGHT SINGLE QUOTATION MARK) were _not_ folded together was not seen as desirable or optimal, but merely documenting a corner case amongst what we did want. So anyway, Unicode was updated, Node.js 18.17.0 was released, and this test case failed. ``` ¶ ASDF_NODEJS_VERSION=18.17.0 node > new Intl.Collator(["en"], {sensitivity:"base"}).compare("'", "’") 0 > process.versions.icu '73.1' ¶ ASDF_NODEJS_VERSION=18.16.1 node > new Intl.Collator(["en"], {sensitivity:"base"}).compare("'", "’") -1 > process.versions.icu '72.1' ``` Who said anything about diacritics or accents? Those are punctuation marks. Oh well, thse aren't very precise words. The point is, the change was made. https://github.com/unicode-org/icu/releases/tag/release-73-1 > We are pleased to announce the release of Unicode® ICU 73. It updates to CLDR 43 locale data with various additions and corrections. https://blog.unicode.org/2023/04/the-unicode-cldr-v43-released.html > 5. *Collation & Searching* > - Treat various quote marks as equivalent at a Primary strength, also including Geresh and Gershayim. https://cldr.unicode.org/index/downloads/cldr-43 > - *Collation & Searching* > - The default collation and searching now treats various quote marks as equivalent at a Primary strength, also including Geresh and Gershayim. In searching they are treated as identical when ignoring case and accents; in collation they are ignored unless there are no primary differences (such as a vs b) and no preceding secondary differences (like a vs â). And the actual data change is somewhere in here: unicode-org/cldr@85c70ce Back to the map, it looks like the OSM node (n70990193) and Wikidata entry (Q3476) have the same names they used to, but I'm not seeing this in the tiles right now. Perhaps Planetiler picked up the CLDR change as well, who knows. Anyway, remove this test case, it doesn't reflect what browsers will do _now_, as they will all have the latest versions of Unicode. Allow the workflows to run on Node.js 18.x. Perhaps it would be nice to prefer the typographic apostrophe, but that's no different from the gloss logic as a whole: collate the labels together if they are the same modulo diacritic and case folding. We should generally prefer a label source (such as Wikidata) that supports such tendencies in order to get what we want.
Planetiler since switched to preferring OSM’s |
Either way, I don’t think it would be feasible for the client to prefer curly apostrophes on its own. We couldn’t for example replace straight apostrophes with curly ones, because there would be too many edge cases even if we had context about the name’s language. (Unicode conflates the apostrophe with a closing quotation mark in the same codepoint, but this is not a closing quotation mark in some languages.) To the extent that curly apostrophes are a good thing – and I personally think they are – it should be the data source’s responsibility to provide them in the appropriate names. |
The saga of this bug:
We observed a bug when updating to Node.js 18.17.0. This test failed:
The label gloss code rendered "L’Aquila\n(L'Aquila)", reflecting a different label in Wikidata and OSM. We added a test case to enforce this.
There are a few layers of abstraction at play, so let's unpack them one by one.
In src/constants/label.js, we set the collator with "case-sensitive" to false, and "diacritic-sensitive" to true in all languages but false in English. (The intent here is to not write the same name twice, with and without diacritics, which is clutter when the map is set to English.)
This translates to an Intl.Collator instance having a "sensitivity" of, let's see, "base" for English (and "accent" otherwise), in maplibre-style-spec/src/expression/types/collator.ts.
The fact that these two characters
were not folded together was not seen as desirable or optimal, but merely documenting a corner case amongst what we did want.
So anyway, Unicode was updated, Node.js 18.17.0 was released, and this test case failed.
Who said anything about diacritics or accents? Those are punctuation marks. Oh well, thse aren't very precise words. The point is, the change was made.
https://github.com/unicode-org/icu/releases/tag/release-73-1
https://blog.unicode.org/2023/04/the-unicode-cldr-v43-released.html
https://cldr.unicode.org/index/downloads/cldr-43
And the actual data change is somewhere in here: unicode-org/cldr@85c70ce
Back to the map, it looks like the OSM node (n70990193) and Wikidata entry (Q3476) have the same names they used to, but I'm not seeing this in the tiles right now. Perhaps Planetiler picked up the CLDR change as well, who knows.
Anyway, remove this test case, it doesn't reflect what browsers will do now, as they will all have the latest versions of Unicode. Allow the workflows to run on Node.js 18.x.
Perhaps it would be nice to prefer the typographic apostrophe, but that's no different from the gloss logic as a whole: collate the labels together if they are the same modulo diacritic and case folding. We should generally prefer a label source (such as Wikidata) that supports such tendencies in order to get what we want.