Performance Optimisation for String Literal Matching #32
This PR adds a performance optimisation that skips compiling a regex for string literal matching.
Motivation:
Issue #25 points out that we are about ~6x slower than `filecheck` 0.24 in the worst case, and about 3.3x slower on average. We are also about 34x slower than LLVM's FileCheck, but we can't get that number down too far due to Python's limitations: FileCheck is usually done before CPython has finished loading the runtime.
Approach:
After some digging through traces (thanks to viztracer), I found that we spend a lot of time compiling regexes, even when they are only needed for fancy string literals (most of them are of the form `test\s+string\s...`, which is regular enough to special-case). This time dominates everything else by a huge margin: the regex compile is about 135us of 156us total time spent, so about 85%. We then spend only ~0.8us on average in the actual matching logic. This made me wonder how "slow" a non-regex implementation would actually be.
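For intuition only (this is not the PR's benchmark), a throwaway micro-benchmark along these lines shows the flavour of the gap. The pattern and input below are made up; `re.purge()` is called to defeat the `re` module's pattern cache, since in a real run every check line is a distinct pattern that has to be compiled anyway.

```python
import re
import timeit

line = "label: some test   string with trailing words"


def with_regex() -> object:
    # Clear re's internal cache so every call pays the compile cost,
    # just like each distinct check line does in a real run.
    re.purge()
    return re.compile(r"test\s+string\s+with").search(line)


def with_find() -> int:
    # Plain substring scan, no compilation step at all.
    return line.find("test string with")


print("regex compile+search:", timeit.timeit(with_regex, number=10_000))
print("plain str.find:      ", timeit.timeit(with_find, number=10_000))
```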
I added logic in the existing check compiler that detects if the check is made up only of string literals, and returns a new `LiteralMatcher` that duck-types `re.Pattern` for all cases that matter for our implementation. As it turns out, that is just `find` and `match`. Sadly we can't just replace `re.search` with `string.find` in all cases, as we need to handle whitespace normalisation, which bloats the code below a bit. Otherwise it's quite readable though. `LiteralMatcher` returns a special duck-typed version of `re.Match` called `LiteralMatch` that only has a single group. This is all that's needed for this little hack, and the other code can be left unmodified, thanks to the power of duck typing (and modifying some type hints).
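Here is a minimal sketch of the idea. The names `LiteralMatcher` and `LiteralMatch` are from this description, but the exact method surface and the whitespace handling shown here are illustrative assumptions, not the code in this PR.

```python
from dataclasses import dataclass


@dataclass
class LiteralMatch:
    """Duck-typed stand-in for re.Match that only exposes group 0."""

    string: str
    _start: int
    _end: int

    def group(self, idx: int = 0) -> str:
        # Only group 0 exists, mirroring a pattern without capture groups.
        return self.string[self._start:self._end]

    def start(self, idx: int = 0) -> int:
        return self._start

    def end(self, idx: int = 0) -> int:
        return self._end


class LiteralMatcher:
    """Duck-typed stand-in for re.Pattern for literal-only checks.

    Whitespace between literal tokens is treated as "one or more
    whitespace characters", i.e. the same as \\s+ in the regex it
    replaces. Assumes a non-empty, non-whitespace-only pattern.
    """

    def __init__(self, pattern: str):
        self.tokens = pattern.split()

    def search(self, text: str, pos: int = 0) -> "LiteralMatch | None":
        # Try every occurrence of the first token as a candidate start.
        start = text.find(self.tokens[0], pos)
        while start != -1:
            end = self._match_from(text, start)
            if end is not None:
                return LiteralMatch(text, start, end)
            start = text.find(self.tokens[0], start + 1)
        return None

    def match(self, text: str, pos: int = 0) -> "LiteralMatch | None":
        # Anchored variant: the first token must start exactly at pos.
        end = self._match_from(text, pos)
        return LiteralMatch(text, pos, end) if end is not None else None

    def _match_from(self, text: str, pos: int) -> "int | None":
        for i, token in enumerate(self.tokens):
            if i > 0:
                # Require and skip at least one whitespace character
                # between consecutive literal tokens.
                skip = pos
                while skip < len(text) and text[skip].isspace():
                    skip += 1
                if skip == pos:
                    return None
                pos = skip
            if not text.startswith(token, pos):
                return None
            pos += len(token)
        return pos
```

Because `LiteralMatch.group(0)` and the `start`/`end` accessors look like `re.Match`, code that previously held an `re.Pattern` can keep calling `search`/`match` and read the result unchanged; only the type hints need to widen.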
that only has a single group. This is all that's needed for this little hack, and the other code can be left unmodified, thanks to the power of duck typing (and modifying some type hints).Results:
The optimisation gives an average speedup of 1.6x in our benchmarks, making the new implementation only about 2.1x slower on average. This understates the effect though, as the optimisation cuts the time of the longest benchmark (4.7k lines of `CHECK-NEXT` statements) by more than 3x. See the chart below for overall results:
The new trace shows us that we have indeed removed a bottleneck:
The new timing shows that compilation time is down to 5.5us on average, while the matching has grown to ~14.7us on average. Still, the average `CHECK-DAG` statement is now down to 21.6us, a reduction of about 7x.