-
-
Notifications
You must be signed in to change notification settings - Fork 21.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize String _find_upper
and _find_lower
by handling low-bit characters (including normal latin) explicitly.
#99971
base: master
Are you sure you want to change the base?
Conversation
378b6d4
to
9c9e7d3
Compare
Let me get this. Is the memory footprint roughly the same in this PR? The chart shows interesting results, but it's so minimal in most scenarios I'm not sure it's worth it. |
I added an amendment to my statement; due to removed entries in the other two tables it should actually be about 370b less total even after the change to 32-bit tables :) |
#93360 Seems to do the same |
Interesting PR! It uses a different technique, switching to hashes instead of a binary search, which improves the speed for different reasons. My PR is likely to gain more speed than the other for most latin texts, but they are not mutually exclusive and could be combined. |
9c9e7d3
to
4daea36
Compare
_find_upper
and _find_lower
by including a dense ASCII-range conversion table.
4daea36
to
6469b52
Compare
Since we just realized the tables are using |
_find_upper
and _find_lower
by including a dense ASCII-range conversion table._find_upper
and _find_lower
by including a dense ASCII-range conversion table. Reduce size of caps tables by half (about 5kb).
Are those suppose to be functions that find the upper/lower versions for every unicode codepoints ? And what you did is extract the ASCII range into its own look up table to have direct look up for that range ? Do I get that right ? |
Exactly right! |
Would be nice to see how it compares to old |
I actually tried this first. I don't have a benchmark for it but I'd wager the branch mispredictions would make it quite slow. |
This PR stands as an opposite of what I did in #90726 😅 |
Well that's what I wrote, for example for to lower
You should cover ascii ranges here (I think) without additional if cost |
I don't think that's true, they are doing quite orthogonal things and could both be merged :) |
Please keep in mind rapid CI runs on this as well, make sure things compile on your own branch before opening a PR and ensure it compiles locally to avoid taxing the CI too much |
I tested the if based code. Performance is very similar to the one based on dense tables1. I am currently without internet but i will update the PR to use ifs for the ASCII range when that changes :) Footnotes
|
Oh, only a portion of the table that fits a narrower data type got converted - my bad, at first glance I thought you removed other matchings 😁 |
int is 4 bytes on x64 on all major compilers, so there is no gain in size here I suspect. But being explicit on the size is a plus, no more ambiguity. |
Wow... I can't believe I didn't know that. Makes the change less critical (though still good to be specific). |
6469b52
to
ebbb09c
Compare
_find_upper
and _find_lower
by including a dense ASCII-range conversion table. Reduce size of caps tables by half (about 5kb)._find_upper
and _find_lower
by handling low-bit characters (including normal latin) explicitly.
ebbb09c
to
fa40766
Compare
Alright, the PR is ready for review again. I did some more tests with different We also don't save 5kb; the tables were already in 32 bit mode, though i've kept the change from |
fa40766
to
e8195a1
Compare
e8195a1
to
355dfad
Compare
#90726 was just merged. This is good, because it fixed the tables being outdated, but it also decreased performance of I have updated the PR accordingly (moving the methods to Compare this to the old benchmark, and you can see the importance of this PR increased a fair bit: |
355dfad
to
aa5200f
Compare
…haracters (including normal latin) explicitly. Move `_find_upper` and `_find_lower` to non-generated `char_utils.h`.
aa5200f
to
97a14e5
Compare
I ran across this and wanted to try if it can't be improved. The throughput of
_find_upper
and_find_lower
for normal latin is improved severalfold, making measurable difference in several benchmarks.Here are the benchmark results of 355dfad (this branch) vs ee4acfb (current master):
The change also removes 896 bytes from the caps table (while adding only a few more in code).
Explanation
This is taking advantage of the fact that most common (latin) text is in the lower-bit range (ASCII). This is exploited by running an if-based algorithm instead of a binary search for this case.
As can be seen in the benchmarks (e.g.
find
vsfindn
), conversion of capitalization of a single common latin char is now almost free, compared to other parts of the algorithms.I have also adjusted moved the functions out of the generator script to make them easier to maintain (as plain C++ rather than strings).
char_utils
seemed like a fitting place since it already harbours similar functions.