Can we replace unicode_norm.rs with the unicode_norm crate? #14

Open
LaurenzV opened this issue Sep 26, 2024 · 7 comments

Comments

@LaurenzV
Contributor

LaurenzV commented Sep 26, 2024

I've attempted this in rustybuzz before. The reason I didn't pursue the idea further is that, from what I gathered, the unicode_norm crate always decomposes a character as far as possible, while harfbuzz (and currently rustybuzz) uses a decomposition table that always splits it into exactly two components.

Not sure if that makes any difference in the end, but since rustybuzz should stay as similar to harfbuzz as possible, I didn't actually try it. Maybe we can try it for harfruzz, though?
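To make the difference concrete, here is a minimal Rust sketch (not rustybuzz's or harfbuzz's actual code). The three-entry `DECOMP_PAIRS` table and both function names are hypothetical; the entries themselves are canonical decompositions from UnicodeData.txt. A 1:2 lookup splits U+1EA5 (ấ) into U+00E2 (â) plus U+0301 COMBINING ACUTE ACCENT, while a full decomposition keeps recursing down to a + U+0302 + U+0301.

```rust
/// Hypothetical 1:2 table of (composite, first, second), sorted by composite.
/// Entries are canonical decompositions copied from UnicodeData.txt.
static DECOMP_PAIRS: &[(char, char, char)] = &[
    ('\u{00C5}', 'A', '\u{030A}'),        // Å -> A + combining ring above
    ('\u{00E2}', 'a', '\u{0302}'),        // â -> a + combining circumflex
    ('\u{1EA5}', '\u{00E2}', '\u{0301}'), // ấ -> â + combining acute
];

/// One-step (1:2) decomposition: exactly two components, or None.
fn decompose_pair(c: char) -> Option<(char, char)> {
    DECOMP_PAIRS
        .binary_search_by_key(&c, |&(composite, _, _)| composite)
        .ok()
        .map(|i| (DECOMP_PAIRS[i].1, DECOMP_PAIRS[i].2))
}

/// "As far as possible" decomposition, built by applying the 1:2 step
/// recursively. Canonical reordering and Hangul are deliberately omitted.
fn decompose_full(c: char, out: &mut Vec<char>) {
    match decompose_pair(c) {
        Some((a, b)) => {
            decompose_full(a, out);
            decompose_full(b, out);
        }
        None => out.push(c),
    }
}

fn main() {
    let c = '\u{1EA5}';
    println!("1:2  -> {:?}", decompose_pair(c));
    let mut full = Vec::new();
    decompose_full(c, &mut full);
    println!("full -> {:?}", full);
}
```

Keeping the intermediate step matters for shaping: as I understand harfbuzz's normalizer, it checks the font for glyphs at each level, so it can keep â + U+0301 when the font has a glyph for â instead of only ever seeing the fully decomposed sequence.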

@behdad
Member

behdad commented Sep 26, 2024

Yeah, HarfBuzz needs the 1:2 decomposition, which some libraries don't expose. In my opinion, it would be easier to add it to the unicode_norm crate.

@dfrg
Collaborator

dfrg commented Oct 23, 2024

My plan here is to just use icu4x, which already has the low-level composition functions (seemingly added in anticipation of supporting HarfBuzz :)

@behdad
Member

behdad commented Oct 23, 2024

I think having an alternative to ICU would be nice, since that's a YUGE crate IIUC.

@dfrg
Collaborator

dfrg commented Oct 23, 2024

No disagreement from me. One thing I’ve considered is adding a build script that pulls in the icu4x crates and extracts the necessary properties into a compact data structure. This would be a nice option for a standalone shaper for users who are not already consuming the icu4x crates.
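A rough sketch of the build-script idea in Rust: whatever the data source (icu4x or raw UCD files), the build step boils down to emitting a sorted static table that the runtime can binary-search. Here `extracted_pairs` is a hypothetical stand-in for the icu4x queries and simply returns hardcoded values; `emit_table` and `DECOMP_PAIRS` are likewise illustrative names.

```rust
use std::fmt::Write;

/// Hypothetical input: (composite, first, second) triples that a real build
/// script would obtain by querying icu4x; hardcoded here for illustration.
fn extracted_pairs() -> Vec<(u32, u32, u32)> {
    vec![
        (0x1EA5, 0x00E2, 0x0301),
        (0x00C5, 0x0041, 0x030A),
        (0x00E2, 0x0061, 0x0302),
    ]
}

/// Render the triples as Rust source for a `static` array, sorted by the
/// composite code point so runtime lookups can binary-search it.
fn emit_table(name: &str, mut pairs: Vec<(u32, u32, u32)>) -> String {
    pairs.sort_unstable();
    let mut src = String::new();
    writeln!(src, "pub static {}: &[(u32, u32, u32)] = &[", name).unwrap();
    for (c, a, b) in pairs {
        writeln!(src, "    (0x{:04X}, 0x{:04X}, 0x{:04X}),", c, a, b).unwrap();
    }
    src.push_str("];\n");
    src
}

fn main() {
    // In build.rs this string would be written under $OUT_DIR, not printed.
    print!("{}", emit_table("DECOMP_PAIRS", extracted_pairs()));
}
```

In an actual `build.rs`, the output would go to a file in the directory named by the `OUT_DIR` environment variable and be pulled into the crate with include!(concat!(env!("OUT_DIR"), "/norm_table.rs")); the file name is an arbitrary example.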

@behdad
Member

behdad commented Oct 24, 2024

Or do what everyone else does and roll your own Python code to read the UCD data and spew out code. Given that HB uses this:

https://github.com/harfbuzz/harfbuzz/blob/main/src/gen-ucd-table.py

and that script mostly uses packTab to pack its tables, and I've started adding Rust output to packTab:

harfbuzz/packtab#5

it looks like you might get a replacement for free.
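For readers unfamiliar with packTab, the core idea is to split a large per-code-point array into blocks and store each distinct block only once. The Rust sketch below is a simplified two-level table built on that idea, not packTab's algorithm or output; packTab itself is more sophisticated (as I understand it, it searches for good split points and packs values into narrow integer types). The 64-entry block size, `TwoLevel`, and the sample "combining mark" property are arbitrary choices for illustration.

```rust
use std::collections::HashMap;

const SHIFT: u32 = 6; // 64-entry blocks (arbitrary for this sketch)
const BLOCK: usize = 1 << SHIFT;

/// `index[cp >> SHIFT]` selects a deduplicated block in `data`; the low
/// bits of `cp` select the entry within that block.
struct TwoLevel {
    index: Vec<u16>, // u16 ids limit this sketch to 65536 distinct blocks
    data: Vec<u8>,
}

impl TwoLevel {
    fn build(values: &[u8]) -> TwoLevel {
        // Pad with zeros so the array splits into whole blocks.
        let mut padded = values.to_vec();
        padded.resize((values.len() + BLOCK - 1) / BLOCK * BLOCK, 0);
        let mut seen: HashMap<&[u8], u16> = HashMap::new();
        let mut index = Vec::new();
        let mut data = Vec::new();
        for chunk in padded.chunks(BLOCK) {
            let next_id = seen.len() as u16;
            // Reuse an identical earlier block, or append this one to `data`.
            let id = *seen.entry(chunk).or_insert_with(|| {
                data.extend_from_slice(chunk);
                next_id
            });
            index.push(id);
        }
        TwoLevel { index, data }
    }

    fn get(&self, cp: u32) -> u8 {
        match self.index.get((cp >> SHIFT) as usize) {
            Some(&id) => self.data[id as usize * BLOCK + ((cp as usize) & (BLOCK - 1))],
            None => 0, // code points past the table default to 0
        }
    }
}

/// Hypothetical property: 1 for U+0300..=U+036F (Combining Diacritical Marks).
fn sample_values() -> Vec<u8> {
    let mut v = vec![0u8; 0x400];
    for cp in 0x300..=0x36F {
        v[cp] = 1;
    }
    v
}

fn main() {
    let values = sample_values();
    let table = TwoLevel::build(&values);
    println!(
        "dense: {} bytes, packed: {} index + {} data bytes",
        values.len(),
        table.index.len() * 2,
        table.data.len()
    );
    println!("U+0301 -> {}", table.get(0x0301));
}
```

With the sample data, the 1024-entry dense array collapses to 16 index entries plus three distinct 64-byte blocks (all zeros, all ones, and the mixed block covering U+0340..U+037F), which is where the savings come from for sparse Unicode properties.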

@LaurenzV
Contributor Author

We already have that, no? 😄 https://github.com/harfbuzz/harfruzz/blob/main/scripts/gen-unicode-norm-table.py

Although this one is not using packTab yet.

@dfrg
Collaborator

dfrg commented Oct 24, 2024

My primary concern is that I’d like to avoid pulling in a bunch of arbitrary unicode-* crates.

I’m 100% on board with bundling our own UCD data, and I don’t have strong feelings on whether it is generated with Rust or Python.

However, since Chrome (and the various Linebender projects) are planning on using icu4x for other things, it would be nice to feature-gate our bundled blobs and allow external implementations, to avoid duplication. I suppose we just need HB-style unicode funcs :)
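One way to read "HB-style unicode funcs" in Rust is a trait that the shaper calls, with the bundled tables as one implementation and user-supplied backends (for example, one wrapping icu4x) as another. Everything below is hypothetical: `UnicodeFuncs`, `Bundled`, `External` and `fully_decompose` are illustrative names, and only the compose/decompose pair of HarfBuzz's callbacks is modeled.

```rust
/// Hypothetical HB-style callback interface; the shaper only talks to this.
pub trait UnicodeFuncs {
    fn decompose(&self, ab: char) -> Option<(char, char)>;
    fn compose(&self, a: char, b: char) -> Option<char>;
}

/// Bundled backend. In a real crate this impl and its table would sit behind
/// a Cargo feature so icu4x users could opt out of shipping the data twice.
pub struct Bundled;

static PAIRS: &[(char, char, char)] = &[
    ('\u{00C5}', 'A', '\u{030A}'),
    ('\u{00E2}', 'a', '\u{0302}'),
    ('\u{1EA5}', '\u{00E2}', '\u{0301}'),
];

impl UnicodeFuncs for Bundled {
    fn decompose(&self, ab: char) -> Option<(char, char)> {
        PAIRS.iter().find(|p| p.0 == ab).map(|p| (p.1, p.2))
    }
    // Real composition must also honour the Unicode composition exclusions;
    // this toy table contains none, so a reverse lookup is enough here.
    fn compose(&self, a: char, b: char) -> Option<char> {
        PAIRS.iter().find(|p| p.1 == a && p.2 == b).map(|p| p.0)
    }
}

/// Stand-in for a user-supplied backend: forwards to plain function pointers.
pub struct External {
    pub decompose: fn(char) -> Option<(char, char)>,
    pub compose: fn(char, char) -> Option<char>,
}

impl UnicodeFuncs for External {
    fn decompose(&self, ab: char) -> Option<(char, char)> {
        (self.decompose)(ab)
    }
    fn compose(&self, a: char, b: char) -> Option<char> {
        (self.compose)(a, b)
    }
}

/// Shaper-side code never knows which backend supplies the data.
fn fully_decompose(funcs: &dyn UnicodeFuncs, c: char, out: &mut Vec<char>) {
    match funcs.decompose(c) {
        Some((a, b)) => {
            fully_decompose(funcs, a, out);
            out.push(b);
        }
        None => out.push(c),
    }
}

fn main() {
    // A trivial external backend that knows no decompositions at all.
    let none = External { decompose: |_| None, compose: |_, _| None };
    let backends: [(&str, &dyn UnicodeFuncs); 2] = [("bundled", &Bundled), ("external", &none)];
    for (name, funcs) in backends {
        let mut out = Vec::new();
        fully_decompose(funcs, '\u{1EA5}', &mut out);
        println!("{name}: {out:?}");
    }
}
```

Because the shaper depends only on the trait, swapping the bundled data for an icu4x-backed implementation would not touch the normalization code itself.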
