-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize performance of blocklist filtering and checking by using Regex #17
base: main
Are you sure you want to change the base?
Conversation
eb3338e
to
9559ba4
Compare
9559ba4
to
8b1d826
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! @4kimov will take a look who is the blocklist expert.
if ($id == $word) { | ||
return true; | ||
} | ||
} elseif (preg_match('/~[0-9]+~/', (string) $word)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regex is not matching anything as the words never contain the tilde ~
char.
src/Sqids.php
Outdated
protected MathInterface $math; | ||
|
||
protected ?string $blocklist = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changing the type of a protected property (from array
to ?string
is a BC break. I don't know what is the Backward Compatiblity policy of this project (I got here because of your toot about performance improvements, by curiosity) but it might make sense to be care about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I reverted to use an other property name and leave this one. Even if I don't see any reason this class would be extended. It should be final, it has an interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right @stof, we don't want to introduce any breaking changes.
A single call to
preg_match
can replace a lot of lines of code, and is executed in optimized C code instead of PHP.1.
Blocklist filterThe regex/^[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789]+$/i
tell if all the characters of the blocked work are in the alphabet.(cancelled because I removed the blocklist filter in 3).
2. Check if the ID contains a blocked word
A regex is generated with all the blocked words
/(1d10t|b0ob)/i
and run with in case-insensitive mode.3. Apply leet transformation to blocked word
The blocklist if full of leet variations of the same words. Using regex, we can check directly for alternative way of writting the same word.
/(ahole)/i
becomes/(ah[oO][l1]e)/i
to checkah0le
,aho1e
,ah01e
and all other case variations.Benchmark
PHPBench code
In
phpbench.json
In
tests/SqidsBench.php
Before
After 2
After 3