Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLDR-15923 load UN Literacy data from new location #3512

Merged
merged 2 commits into from
Feb 23, 2024

Conversation

srl295
Copy link
Member

@srl295 srl295 commented Feb 22, 2024

CLDR-15923

  • update docs (new location)
  • includes updated data
  • includes unit tests that will fail if parsing fails
  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

@srl295 srl295 self-assigned this Feb 22, 2024
Copy link
Contributor

@pedberg-icu pedberg-icu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely organized!

@@ -416,6 +425,7 @@ public void startElement(
startElements.push(tempPath.toString());
chars.setLength(0); // clear garbage
lastIsStart = true;
simpleHandler.handleElement(tempPath);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little surprising, that its absence was never a problem before... oh, I think I see, the new SimpleHandler.handleElement doesn't currently do anything, and this is in case it eventually does

Copy link
Member Author

@srl295 srl295 Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a new (internal) API. There wasn't a way (in this API) to register for notification on a new element. The issue comes because the data is of this form:

<a>
   <b c="3"/>
   <b d="4"/>
</a>
<a>
   <b c="5"/>
   <b d="6"/>
</a>

the XPATH=based api would only produce:

//a/b[@c="3"]
//a/b[@d="4"]
 (need notification that we're on a new 'a' here)
//a/b[@c="5"]
//a/b[@c="6"]

since there's no identity on <a> this new API lets me reset the parser. Anyway, that's what it's for.

This isn't needed for CLDR's own data, because the <a> above always has a clear identity <a id="something"/> vs <a id="somethingElse"/> so produces a unique xpath. This minor API update lets me reuse our existing convenience functions without writing something entirely new for this single file.

Copy link
Member

@macchiati macchiati Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For normal CLDR there must be a distinguishing attribute for the first a and the second a to be different.

In CLDR, the mechanism for such distinctions is to add an _q distinguishing element. That also preserves order where needed. That could be done here as well, for consistency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@macchiati this is the internal low-level XMLFileReader interface. _q is handled at the next layer up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normal CLDR data doesn't go through this interface, see other comment.

Copy link
Member

@macchiati macchiati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like the _q discussion settled first. I'm ok with a special case, but I just don't want it to interfere with other tooling.

@srl295
Copy link
Member Author

srl295 commented Feb 23, 2024

I'd like the _q discussion settled first. I'm ok with a special case, but I just don't want it to interfere with other tooling.

take a look at the code change in context. This is just a new function on SimpleHandler. Normal CLDR data is loaded with XMLNormalizingLoader.loadXMLFile() which calls XMLFileReader.read(fullFileName, fis, -1, true, XML_HANDLER); - not SimpleHandler.

Copy link
Member

@macchiati macchiati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks for the background

@srl295 srl295 merged commit c9ab6de into unicode-org:main Feb 23, 2024
10 checks passed
@srl295 srl295 deleted the cldr-15923/un-literacy branch February 23, 2024 21:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants