Speed up Unicode Data lookups and remove reflection. #281

JimBobSquarePants · 2022-06-24T13:39:14Z

Prerequisites

I have written a descriptive pull-request title
I have verified that there are no overlapping pull-requests open
I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
I have provided test coverage for my change (where applicable)

Description

This PR does a couple of things.

Removes reflection from UnicodeTrie reading. Fixes UnicodeData class is not working without reflection #279
Removes bounds checks when performing Unicode property lookups. Touches Measuring text with TextMeasurer.Measure is slow #268
Adds a benchmark project

I chose to remove the compression from the trie packing for performance since the binaries do not grow out of hand and the benefit is great. @Gillibald maybe this is something to consider for Avalonia?

There's no before/after benchmarks unfortunately because I added them as an afterthought, however I took a sampling snapshot early in the process. As you can see, loading Unicode data is no longer a hotspot.

Before

After

I don't want to go crazy optimizing much further in this PR but will have a good scan through the sampling data to see if there's some low hanging fruit at another time.

CC @Sergio0694 @kant2002 @0xced

codecov · 2022-06-24T13:57:22Z

Codecov Report

Merging #281 (7438b74) into main (1b80769) will increase coverage by 1%.
The diff coverage is 99%.

@@          Coverage Diff           @@
##            main    #281    +/-   ##
======================================
+ Coverage     81%     83%    +1%     
======================================
  Files        215     222     +7     
  Lines      11278   12200   +922     
  Branches    1757    1756     -1     
======================================
+ Hits        9244   10167   +923     
+ Misses      1607    1606     -1     
  Partials     427     427

Flag	Coverage Δ
unittests	`83% <99%> (+1%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
...bors.Fonts/Unicode/Resources/BidiTrie.Generated.cs	`100% <ø> (ø)`
....Fonts/Unicode/Resources/GraphemeTrie.Generated.cs	`100% <ø> (ø)`
...Fonts/Unicode/Resources/LineBreakTrie.Generated.cs	`100% <ø> (ø)`
...rs.Fonts/Unicode/Resources/ScriptTrie.Generated.cs	`100% <ø> (ø)`
...Unicode/Resources/UnicodeCategoryTrie.Generated.cs	`100% <ø> (ø)`
src/SixLabors.Fonts/Unicode/UnicodeTrieBuilder.cs	`85% <50%> (ø)`
src/SixLabors.Fonts/Unicode/BidiClass.cs	`62% <100%> (ø)`
src/SixLabors.Fonts/Unicode/CodePoint.cs	`91% <100%> (+<1%)`	⬆️
src/SixLabors.Fonts/Unicode/JoiningClass.cs	`100% <100%> (ø)`
...s/Unicode/Resources/ArabicShapingTrie.Generated.cs	`100% <100%> (ø)`
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1b80769...7438b74. Read the comment docs.

Gillibald · 2022-06-24T15:46:21Z

Nice work @JimBobSquarePants

kant2002 · 2022-06-24T16:39:32Z

I'm not sure how do you get trie files, but I think it's better to keep source trie files in source control and document generation process. I understand that nobody want run file generation process on the build, but that process easy to forget, and maybe somebody would like to improve it, so better to have source files around.
Or maybe I miss something in the changes ?

JimBobSquarePants · 2022-06-24T17:00:39Z

Thanks @Gillibald your idea to create classes was a great one!

@kant2002 The trie files are created via the Generator project and simply contained a header and the parsed bytes. That project reads a bunch of Unicode spec data files and creates the output.

Here's the project in the main branch.
https://github.com/SixLabors/Fonts/tree/main/src/UnicodeTrieGenerator

If you look at Generator.cs in this PR we're simply switching out the file generation, instead opting to create classes containing the trie data. All the source files required to create those classes are in the solution.

We definitely do not need to build them every time, though the process is pretty quick since the spec files are stored locally. We actually won't need to build these again until Unicode 15 is released and that will require a dedicated PR to do so there's no opportunity to forget.

Sergio0694 · 2022-06-24T17:06:44Z

Very nice! I assume this will also be easier to maintain than a locally referenced source generator, so we can shelf that idea for now. I think this approach makes sense given you're now leveraging this to also do some more custom preprocessing when generating the blobs, and not just serializing an already existing file. Very nice work James! 😄

kant2002 · 2022-06-24T18:47:25Z

@JimBobSquarePants thanks for explanation. Yeah, now it make sense to me. Thanks you for caring about such niche workloads 😄

Gillibald · 2022-06-24T19:32:08Z

I will adapt your changes for Avalonia. Really appreciate your work.

JimBobSquarePants · 2022-06-29T09:03:03Z

Imma gonna merge this.

DaZombieKiller · 2022-06-29T13:44:58Z

I just thought it'd be worth pointing out that the UnicodeTrie-related code won't work on big endian systems (and will potentially fail or throw DataMisalignedException on systems where aligned access is required). Something more along the lines of this would solve both of those issues:

struct UnicodeTrieHeader
{
    public int HighStart;
    public uint ErrorValue;
    public int DataLength;
}

// ...

readonly uint[] data;
readonly int highStart;
readonly uint errorValue;

public UnicodeTrie(ReadOnlySpan<byte> rawData)
{
    // MemoryMarshal.Read instead of MemoryMarshal.Cast/Unsafe.As
    // because it performs an unaligned read and a bounds check.
    var header = MemoryMarshal.Read<UnicodeTrieHeader>(rawData);

    if (!BitConverter.IsLittleEndian)
    {
        header.HighStart  = BinaryPrimitives.ReverseEndianness(header.HighStart);
        header.ErrorValue = BinaryPrimitives.ReverseEndianness(header.ErrorValue);
        header.DataLength = BinaryPrimitives.ReverseEndianness(header.DataLength);
    }

    highStart  = header.HighStart;
    errorValue = header.ErrorValue;
    data       = new uint[header.DataLength / sizeof(uint)];

    // Avoid MemoryMarshal.Cast because we don't know if the rawData buffer is aligned
    // to uint, which would potentially throw DataMisalignedException if it weren't.
    rawData[Unsafe.SizeOf<UnicodeTrieHeader>()..].CopyTo(MemoryMarshal.AsBytes(data.AsSpan()));

    if (!BitConverter.IsLittleEndian)
    {
        for (int i = 0; i < data.Length; i++)
        {
            data[i] = BinaryPrimitives.ReverseEndianness(data[i]);
        }
    }
}

JimBobSquarePants · 2022-06-29T13:51:38Z

We know that the buffer is aligned and we also know the endianness though? We create the data.

DaZombieKiller · 2022-06-29T13:55:47Z

I wasn't able to find any actual usages of the ReadOnlySpan<byte> constructor, so that wasn't immediately clear to me and it seemed like a potential oversight. If that's the case then you can disregard my points above.

TechPizzaDev · 2022-06-29T14:54:07Z

The data is coming from the ReadOnlySpan<byte> constants isn't it? That means that casting the bytes (which are most likely little-endian) into integers will break on big-endian.

JimBobSquarePants · 2022-06-29T15:23:26Z

Ooh. You're right! The endianness definitely matters since it's a compiled constant. I'll update unless you want to create a PR @DaZombieKiller ?

DaZombieKiller · 2022-06-30T05:00:31Z

@JimBobSquarePants Sure! Opened one here: #284

JimBobSquarePants added 4 commits June 16, 2022 23:42

Remove bounds checks when fetching Unicode data

0763b8c

Add benchmarks

f109eca

Remove reflection namespace

00d86f5

Use generated classes over embedded resources

3d2e5b4

JimBobSquarePants added area:aot performance labels Jun 24, 2022

JimBobSquarePants requested review from brianpopow and tocsoft June 24, 2022 13:39

JimBobSquarePants added this to the 1.0.0-rc1 milestone Jun 24, 2022

JimBobSquarePants added 3 commits June 24, 2022 23:42

Update MeasureTextBenchmark.cs

08ca6a6

Merge branch 'main' into js/perf-tweaks

17ed231

Cleanup

3526242

Merge branch 'main' into js/perf-tweaks

7438b74

Faolan-Rad mentioned this pull request Jun 29, 2022

Add Bench mark for text render optimized TryGetGlyphMetricsAtOffset #282

Closed

tocsoft approved these changes Jun 29, 2022

View reviewed changes

JimBobSquarePants merged commit fc97cb1 into main Jun 29, 2022

JimBobSquarePants deleted the js/perf-tweaks branch June 29, 2022 09:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up Unicode Data lookups and remove reflection. #281

Speed up Unicode Data lookups and remove reflection. #281

JimBobSquarePants commented Jun 24, 2022 •

edited

Loading

codecov bot commented Jun 24, 2022 •

edited

Loading

Gillibald commented Jun 24, 2022

kant2002 commented Jun 24, 2022

JimBobSquarePants commented Jun 24, 2022

Sergio0694 commented Jun 24, 2022

kant2002 commented Jun 24, 2022

Gillibald commented Jun 24, 2022

JimBobSquarePants commented Jun 29, 2022

DaZombieKiller commented Jun 29, 2022

JimBobSquarePants commented Jun 29, 2022 •

edited

Loading

DaZombieKiller commented Jun 29, 2022

TechPizzaDev commented Jun 29, 2022

JimBobSquarePants commented Jun 29, 2022

DaZombieKiller commented Jun 30, 2022

Speed up Unicode Data lookups and remove reflection. #281

Speed up Unicode Data lookups and remove reflection. #281

Conversation

JimBobSquarePants commented Jun 24, 2022 • edited Loading

Prerequisites

Description

codecov bot commented Jun 24, 2022 • edited Loading

Codecov Report

Gillibald commented Jun 24, 2022

kant2002 commented Jun 24, 2022

JimBobSquarePants commented Jun 24, 2022

Sergio0694 commented Jun 24, 2022

kant2002 commented Jun 24, 2022

Gillibald commented Jun 24, 2022

JimBobSquarePants commented Jun 29, 2022

DaZombieKiller commented Jun 29, 2022

JimBobSquarePants commented Jun 29, 2022 • edited Loading

DaZombieKiller commented Jun 29, 2022

TechPizzaDev commented Jun 29, 2022

JimBobSquarePants commented Jun 29, 2022

DaZombieKiller commented Jun 30, 2022

JimBobSquarePants commented Jun 24, 2022 •

edited

Loading

codecov bot commented Jun 24, 2022 •

edited

Loading

JimBobSquarePants commented Jun 29, 2022 •

edited

Loading