Improve latin normalization · rutar-forks/nucleo@8e1318c


Improve latin normalization

This improves the normalization for Latin characters, mainly to
address the concerns in helix-editor#51. This adds a very large number of new
normalizations, especially in the 'Latin Extended Additional' block
which for some reason was missing every capital letter.
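The sketch below shows roughly what such a normalization looks like. It folds a handful of characters from the 'Latin Extended Additional' block (U+1E00 to U+1EFF) to their ASCII base letter. The function name `normalize_latin_ext_additional` is hypothetical, and only a few entries are shown. Listing each capital next to its small form is one way to make a gap like the missing capitals easy to spot.

```rust
/// Hypothetical sketch: fold a few characters from the 'Latin Extended
/// Additional' block (U+1E00..=U+1EFF) to their ASCII base letter.
/// Only a handful of entries are shown; a full table covers the whole block.
fn normalize_latin_ext_additional(c: char) -> char {
    match c {
        // Each capital is listed next to its small form, so a missing
        // capital letter stands out when reading the table.
        '\u{1E00}' => 'A', // LATIN CAPITAL LETTER A WITH RING BELOW
        '\u{1E01}' => 'a', // LATIN SMALL LETTER A WITH RING BELOW
        '\u{1E02}' => 'B', // LATIN CAPITAL LETTER B WITH DOT ABOVE
        '\u{1E03}' => 'b', // LATIN SMALL LETTER B WITH DOT ABOVE
        '\u{1EA0}' => 'A', // LATIN CAPITAL LETTER A WITH DOT BELOW
        '\u{1EA1}' => 'a', // LATIN SMALL LETTER A WITH DOT BELOW
        '\u{1EF2}' => 'Y', // LATIN CAPITAL LETTER Y WITH GRAVE
        '\u{1EF3}' => 'y', // LATIN SMALL LETTER Y WITH GRAVE
        // Anything not in the table is returned unchanged.
        _ => c,
    }
}
```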

I did not add normalizations in any new Unicode blocks, but I did
slightly extend the 'Latin 1' block to also capture some of the
subscripts; this is for consistency with the 'Subscripts and
Superscripts' block which was previously handled. I also preserved
the actual implementation of the `normalize` function in terms of
the check order, etc. In particular, the generated code should be
approximately the same. To verify this, I ran some crude
benchmarks on a variety of inputs (all ASCII, sparse Unicode, heavy
Unicode, all outside normalization ranges) and there was no
observable difference, though these benchmarks were definitely not rigorous.
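The sketch below makes "check order" concrete under one assumed layout: ASCII is tested first, and each Unicode block is then dispatched by code-point range. All function names (`normalize_sketch`, `latin1`, `sub_super`) are hypothetical and the tables are truncated. This is not the crate's actual code. Input that lies outside every range falls through after a few cheap comparisons, which is why the benchmark inputs above include that case separately.

```rust
/// Hypothetical sketch of a check order: ASCII fast path first, then one
/// range test per Unicode block, then fall through unchanged.
fn normalize_sketch(c: char) -> char {
    // ASCII never needs normalization and is the most common input.
    if c.is_ascii() {
        return c;
    }
    match c as u32 {
        0x00A0..=0x00FF => latin1(c),    // Latin-1 Supplement
        0x2070..=0x209F => sub_super(c), // Superscripts and Subscripts
        // Outside every normalization range: returned as-is.
        _ => c,
    }
}

/// Truncated Latin-1 table, including the superscript digits
/// that live in this block.
fn latin1(c: char) -> char {
    match c {
        '\u{00B2}' => '2', // SUPERSCRIPT TWO
        '\u{00B3}' => '3', // SUPERSCRIPT THREE
        '\u{00B9}' => '1', // SUPERSCRIPT ONE
        '\u{00C0}' => 'A', // LATIN CAPITAL LETTER A WITH GRAVE
        '\u{00E9}' => 'e', // LATIN SMALL LETTER E WITH ACUTE
        _ => c,
    }
}

/// Truncated table for the Superscripts and Subscripts block.
fn sub_super(c: char) -> char {
    match c {
        '\u{2070}' => '0', // SUPERSCRIPT ZERO
        '\u{2080}' => '0', // SUBSCRIPT ZERO
        '\u{2082}' => '2', // SUBSCRIPT TWO
        _ => c,
    }
}
```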

Finally, I inlined all of the char blocks, rather than relying on
the 'sparse table' static generation which was implemented
earlier. In particular, `normalize` is now a `const fn`. At
least in my mind it is a bit easier to read in this form. It also
makes it much clearer when characters are missed.
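Writing the tables inline as `match` arms is also what makes a `const fn` possible. A `match` over `char` literals and ranges is allowed in const contexts, so the compiler can evaluate calls at compile time. The sketch below illustrates the idea with the hypothetical name `normalize_const` and covers only a few characters.

```rust
/// Hypothetical sketch: with the table written inline as `match` arms,
/// the function can be a `const fn`.
const fn normalize_const(c: char) -> char {
    match c {
        '\u{00C0}'..='\u{00C5}' => 'A', // À Á Â Ã Ä Å
        '\u{00E0}'..='\u{00E5}' => 'a', // à á â ã ä å
        '\u{1E00}' => 'A',              // Ḁ
        '\u{1E01}' => 'a',              // ḁ
        _ => c,
    }
}

// Both of these are evaluated at compile time.
const FOLDED_UPPER: char = normalize_const('\u{00C4}'); // Ä
const FOLDED_LOWER: char = normalize_const('\u{1E01}'); // ḁ
```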


alexrutar committed Nov 18, 2024

1 parent ef24853 commit 8e1318c
