Optimize performance of blocklist filtering and checking by using Regex #17

Open
wants to merge 4 commits into
base: main

GromNaN
Contributor

@GromNaN GromNaN commented Nov 28, 2024

A single call to preg_match can replace a lot of lines of code, and is executed in optimized C code instead of PHP.

1. Blocklist filter

The regex /^[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789]+$/i tells whether all the characters of a blocked word are in the alphabet.

(cancelled because I removed the blocklist filter in 3).
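Although this step was dropped, a minimal sketch of the idea may help reviewers. The helper name `filterBlocklist` is hypothetical and not part of this PR; it only assumes an ASCII alphabet and uses `preg_quote` so that special characters in a custom alphabet stay literal inside the character class.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not from the PR): keep only the blocked words whose
 * characters all belong to $alphabet, compared case-insensitively.
 *
 * @param list<string> $blocklist
 * @return list<string>
 */
function filterBlocklist(array $blocklist, string $alphabet): array
{
    // preg_quote() escapes regex metacharacters and the "/" delimiter,
    // so the alphabet is always treated as a set of literal characters.
    $pattern = '/^[' . preg_quote($alphabet, '/') . ']+$/i';

    return array_values(array_filter(
        $blocklist,
        static fn (string $word): bool => preg_match($pattern, $word) === 1,
    ));
}
```

One `preg_match` call per word replaces a character-by-character loop written in PHP.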

2. Check if the ID contains a blocked word

A single regex is generated from all the blocked words, e.g. /(1d10t|b0ob)/i, and run in case-insensitive mode.
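As a rough illustration, the pattern could be assembled as below. Both function names are hypothetical, and the sketch does plain substring matching only; it does not reproduce any extra rules the library may apply to short words or words containing digits.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: build one case-insensitive alternation from the words.
 * Returns null for an empty list, because "/()/i" would match every ID.
 *
 * @param list<string> $words
 */
function buildBlocklistPattern(array $words): ?string
{
    if ($words === []) {
        return null;
    }

    // Escape each word so it is matched literally.
    $quoted = array_map(
        static fn (string $word): string => preg_quote($word, '/'),
        $words,
    );

    return '/(' . implode('|', $quoted) . ')/i';
}

/** Hypothetical helper: true if the ID contains any blocked word. */
function isBlocked(string $id, ?string $pattern): bool
{
    return $pattern !== null && preg_match($pattern, $id) === 1;
}
```

Building the pattern once in the constructor and reusing it on every `encode()` call is what would move the cost out of the hot path.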

3. Apply leet transformation to blocked word

The blocklist is full of leet variations of the same words. Using a regex, we can check directly for alternative ways of writing the same word. /(ahole)/i becomes /(ah[o0][l1]e)/i to match ah0le, aho1e, ah01e and all other case variations.
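A minimal sketch of such a transformation follows. The function name `leetPattern` and the substitution table are assumptions made for illustration (only the o/0 and l/1 pairs from the example above are covered), and the code assumes ASCII words.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: turn a blocked word into a pattern that also matches
 * its leet spellings. The table is illustrative, not the PR's actual map.
 */
function leetPattern(string $word): string
{
    // Each letter and its leet digit map to the same character class.
    $classes = [
        'o' => '[o0]',
        '0' => '[o0]',
        'l' => '[l1]',
        '1' => '[l1]',
    ];

    $body = '';
    foreach (str_split(strtolower($word)) as $char) {
        // Characters without a leet variant are escaped and kept literal.
        $body .= $classes[$char] ?? preg_quote($char, '/');
    }

    return '/(' . $body . ')/i';
}
```

With this approach one entry per word covers every variant, so the explicit leet duplicates in the blocklist become redundant.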

Benchmark

PHPBench code
composer req --dev phpbench/phpbench

In phpbench.json

{
    "$schema": "./vendor/phpbench/phpbench/phpbench.schema.json",
    "runner.bootstrap": "vendor/autoload.php",
    "runner.file_pattern": "*Bench.php",
    "runner.path": "tests",
    "runner.iterations": 3
}

In tests/SqidsBench.php

<?php

namespace Sqids\Tests;

use PhpBench\Attributes\ParamProviders;
use PhpBench\Attributes\Revs;
use PhpBench\Attributes\Warmup;
use Sqids\Sqids;

#[Warmup(1)]
final class SqidsBench
{
    #[Revs(1_000)]
    public static function benchInit(): void
    {
        new Sqids();
    }

    #[Revs(1_000)]
    #[ParamProviders('provideSqids')]
    public static function benchEncode(array $params): void
    {
        $params[0]->encode([1_000_000, 2_000_000]);
    }

    public static function provideSqids(): \Generator
    {
        yield 'default' => [
            new Sqids()
        ];
        yield 'custom blocklist' => [
            new Sqids(blocklist: [
                'JSwXFaosAN',
                'OCjV9JK64o',
                'rBHf',
                '79SM',
                '7tE6',
            ])
        ];
    }
}

Before

    benchInit...............................I2 - Mo1.617ms (±0.23%)
    benchEncode # default...................I2 - Mo280.402μs (±0.16%)
    benchEncode # custom blocklist..........I2 - Mo194.539μs (±0.34%)

After 2

    benchInit...............................I2 - Mo164.020μs (±0.44%)
    benchEncode # default...................I2 - Mo34.207μs (±0.46%)
    benchEncode # custom blocklist..........I2 - Mo183.985μs (±0.22%)

After 3

    benchInit...............................I2 - Mo77.698μs (±0.70%)
    benchEncode # default...................I2 - Mo32.270μs (±0.44%)
    benchEncode # custom blocklist..........I2 - Mo183.796μs (±0.63%)

Collaborator

@vinkla vinkla left a comment

LGTM! @4kimov, who is the blocklist expert, will take a look.

if ($id == $word) {
return true;
}
} elseif (preg_match('/~[0-9]+~/', (string) $word)) {
Contributor Author

@GromNaN GromNaN Nov 29, 2024

This regex never matches anything, because the words never contain the tilde (~) character.

src/Sqids.php Outdated
protected MathInterface $math;

protected ?string $blocklist = null;

Changing the type of a protected property (from array to ?string) is a BC break. I don't know what the backward compatibility policy of this project is (I got here out of curiosity, because of your toot about performance improvements), but it might make sense to be careful about this.

Contributor Author

You are right. I reverted the change: the regex now lives in a property with another name and this one is left untouched, even though I don't see any reason for this class to be extended. It should be final, since it has an interface.

Collaborator

You're right @stof, we don't want to introduce any breaking changes.
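For readers following the thread, the compatible approach could look roughly like this. The class names and the property name `$blocklistRegex` are hypothetical stand-ins, not the names used in the PR; the point is only that the existing `array` property keeps its type while the compiled pattern goes into a new property.

```php
<?php

declare(strict_types=1);

// Stand-in for the library class; names here are illustrative only.
class SqidsLike
{
    /** @var list<string> Type left untouched so subclasses keep working. */
    protected array $blocklist = [];

    /** Hypothetical new property holding the compiled pattern. */
    protected ?string $blocklistRegex = null;

    /** @param list<string> $blocklist */
    public function __construct(array $blocklist = [])
    {
        $this->blocklist = $blocklist;

        if ($blocklist !== []) {
            $quoted = array_map(
                static fn (string $word): string => preg_quote($word, '/'),
                $blocklist,
            );
            $this->blocklistRegex = '/(' . implode('|', $quoted) . ')/i';
        }
    }
}

// A pre-existing subclass that relies on the array type is unaffected.
class LegacySubclass extends SqidsLike
{
    public function countBlockedWords(): int
    {
        return count($this->blocklist);
    }
}
```

Had `$blocklist` itself become `?string`, the `count()` call in the subclass would throw a `TypeError` under PHP 8.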


3 participants