Speed up Unicode Data lookups and remove reflection. #281
Conversation
Codecov Report
@@ Coverage Diff @@
## main #281 +/- ##
======================================
+ Coverage 81% 83% +1%
======================================
Files 215 222 +7
Lines 11278 12200 +922
Branches 1757 1756 -1
======================================
+ Hits 9244 10167 +923
+ Misses 1607 1606 -1
Partials 427 427
Nice work @JimBobSquarePants
I'm not sure how you generate the trie files, but I think it's better to keep the source trie files in source control and document the generation process. I understand that nobody wants to run file generation as part of the build, but that process is easy to forget, and maybe somebody would like to improve it, so it's better to have the source files around.
Thanks @Gillibald, your idea to create classes was a great one! @kant2002 The trie files are created via the Generator project and simply contain a header and the parsed bytes. That project reads a bunch of Unicode spec data files and creates the output. We definitely do not need to build the trie files every time, though the process is pretty quick since the spec files are stored locally. We actually won't need to build these again until Unicode 15 is released, and that will require a dedicated PR, so there's no opportunity to forget.
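For readers unfamiliar with the blob layout described above ("a header and the parsed bytes"), here is a minimal sketch of how such a generator might pack a trie into bytes. The `TrieBlob.Write` helper and its 12-byte layout are assumptions for illustration; the field order mirrors the `UnicodeTrieHeader` struct quoted later in this thread, and everything is written little-endian to match the constructor's byte-swap logic.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical sketch, NOT the actual Generator project code: packs a
// little-endian header (HighStart, ErrorValue, DataLength in bytes)
// followed by the raw uint trie data.
static class TrieBlob
{
    public static byte[] Write(int highStart, uint errorValue, uint[] data)
    {
        byte[] blob = new byte[12 + (data.Length * sizeof(uint))];
        Span<byte> span = blob;

        // 12-byte header, always little-endian on disk.
        BinaryPrimitives.WriteInt32LittleEndian(span, highStart);
        BinaryPrimitives.WriteUInt32LittleEndian(span.Slice(4), errorValue);
        BinaryPrimitives.WriteInt32LittleEndian(span.Slice(8), data.Length * sizeof(uint));

        // Payload: each uint written little-endian after the header.
        for (int i = 0; i < data.Length; i++)
        {
            BinaryPrimitives.WriteUInt32LittleEndian(span.Slice(12 + (i * 4)), data[i]);
        }

        return blob;
    }
}
```

Because the on-disk order is fixed, a consumer on any architecture can load the blob and byte-swap only when the host is big-endian, which is exactly the shape of the constructor discussed below in this thread.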
Very nice! I assume this will also be easier to maintain than a locally referenced source generator, so we can shelf that idea for now. I think this approach makes sense given you're now leveraging this to also do some more custom preprocessing when generating the blobs, and not just serializing an already existing file. Very nice work James! 😄 |
@JimBobSquarePants thanks for the explanation. Yeah, now it makes sense to me. Thank you for caring about such niche workloads 😄
I will adapt your changes for Avalonia. Really appreciate your work. |
Imma gonna merge this. |
I just thought it'd be worth pointing out that the trie header and data can be read without reflection, along these lines:
struct UnicodeTrieHeader
{
public int HighStart;
public uint ErrorValue;
public int DataLength;
}
// ...
readonly uint[] data;
readonly int highStart;
readonly uint errorValue;
public UnicodeTrie(ReadOnlySpan<byte> rawData)
{
// MemoryMarshal.Read instead of MemoryMarshal.Cast/Unsafe.As
// because it performs an unaligned read and a bounds check.
var header = MemoryMarshal.Read<UnicodeTrieHeader>(rawData);
if (!BitConverter.IsLittleEndian)
{
header.HighStart = BinaryPrimitives.ReverseEndianness(header.HighStart);
header.ErrorValue = BinaryPrimitives.ReverseEndianness(header.ErrorValue);
header.DataLength = BinaryPrimitives.ReverseEndianness(header.DataLength);
}
highStart = header.HighStart;
errorValue = header.ErrorValue;
data = new uint[header.DataLength / sizeof(uint)];
// Avoid MemoryMarshal.Cast because we don't know if the rawData buffer is aligned
// to uint, which would potentially throw DataMisalignedException if it weren't.
rawData[Unsafe.SizeOf<UnicodeTrieHeader>()..].CopyTo(MemoryMarshal.AsBytes(data.AsSpan()));
if (!BitConverter.IsLittleEndian)
{
for (int i = 0; i < data.Length; i++)
{
data[i] = BinaryPrimitives.ReverseEndianness(data[i]);
}
}
}
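The alignment comment in the snippet above is easy to miss, so here is a small self-contained illustration of the point: `MemoryMarshal.Read` performs an unaligned, bounds-checked read, so it is safe even when the value does not sit on a natural boundary inside the buffer. The buffer and offset below are made up for the demonstration.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

// Deliberately misalign the value: slicing the byte buffer by one means
// the uint no longer starts on a 4-byte boundary.
byte[] buffer = new byte[8];
BinaryPrimitives.WriteUInt32LittleEndian(buffer.AsSpan(1), 0xDEADBEEF);

// MemoryMarshal.Read copies the bytes with an unaligned read, so this is
// safe at any offset. Reinterpreting the span in place (MemoryMarshal.Cast
// or Unsafe.As) instead could fault on platforms that trap unaligned access.
uint value = MemoryMarshal.Read<uint>(buffer.AsSpan(1));
```

Note that `MemoryMarshal.Read` reads in the host's native byte order, which is why the snippet above still needs the `BitConverter.IsLittleEndian` check afterwards.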
We know that the buffer is aligned and we also know the endianness though? We create the data. |
I wasn't able to find any actual usages of the
The data is coming from the
Ooh. You're right! The endianness definitely matters since it's a compiled constant. I'll update, unless you want to create a PR @DaZombieKiller?
@JimBobSquarePants Sure! Opened one here: #284 |
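For reference, the byte-swap primitive the snippet above relies on behaves like this (a standalone demonstration, not code from the PR): `BinaryPrimitives.ReverseEndianness` reverses the byte order of an integer, which is how the constructor normalizes the little-endian blob on big-endian hosts.

```csharp
using System.Buffers.Binary;

// 0x12345678 has bytes 12 34 56 78; reversing them yields 78 56 34 12.
uint le = 0x12345678u;
uint be = BinaryPrimitives.ReverseEndianness(le); // 0x78563412

// The operation is its own inverse, so applying it twice is a no-op.
uint roundTrip = BinaryPrimitives.ReverseEndianness(be);
```

This is why writing the blobs with a fixed (little-endian) byte order is enough: only hosts whose native order differs ever pay for the swap.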
Description
This PR does a couple of things.

- Fixes TextMeasurer.Measure is slow #268

I chose to remove the compression from the trie packing for performance, since the binaries do not grow out of hand and the benefit is great. @Gillibald maybe this is something to consider for Avalonia?
There are no before/after benchmarks, unfortunately, because I added them as an afterthought; however, I took a sampling snapshot early in the process. As you can see, loading Unicode data is no longer a hotspot.
Before: [sampling snapshot image]
After: [sampling snapshot image]
I don't want to go crazy optimizing much further in this PR, but I will have a good scan through the sampling data at another time to see if there's some low-hanging fruit.
CC @Sergio0694 @kant2002 @0xced