Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use regex that can be slow #295

Merged
merged 1 commit into from
Dec 14, 2024
Merged

Conversation

tats-u
Copy link
Contributor

@tats-u tats-u commented Dec 13, 2024

The regex introduced in #289 can be slow.
For example,

('a' + ' '.repeat(65534) + 'a').replace(/^[\t \r\n]+|[\t \r\n]+$/g,"").length

takes a long time to be done. (a few seconds)

/^[\t \r\n]+(?![\t \r\n])|(?<![\t \r\n])[\t \r\n]+$/g,

fixes the problem, but the naive implementation of this PR seems to be faster than it when the number of trimmed spaces are not many according to my private bench by mitata.
Using charCodeAt is fastest but reduces the explainability due to the raw code point number literals (e.g. 0x20 for " ").
Sorry for having submitted problematic code.

import { run, bench, boxplot } from "mitata";

const PAD = " \n".repeat(16777216)
const PAD2 = " \n\t".repeat(1024)
const CH = "abc".repeat(16777216)
const DATA = `${PAD2}${CH}${PAD}${CH}${PAD2}`

export function useIfFn(str) {
    let start = 0;
    for(; start < str.length; start++) {
        if (!isSpace(str[start])) {
            break;
        }
    }
    let end = str.length - 1;
    for(; end >= start; end--) {
        if (!isSpace(str[end])) {
            break;
        }
    }
    return str.slice(start, end + 1);

    function isSpace(c) {
        return c === " " || c === "\t" || c === "\n" || c === "\r";
    }
}

export function useIfFnCc(str) {
    let start = 0;
    for(; start < str.length; start++) {
        if (!isSpace(str.charCodeAt(start))) {
            break;
        }
    }
    let end = str.length - 1;
    for(; end >= start; end--) {
        if (!isSpace(str.charCodeAt(end))) {
            break;
        }
    }
    return str.slice(start, end + 1);

    function isSpace(c) {
        return c === 0x20 || c === 0x09 || c === 0x0a || c === 0x0d;
    }
}
export function useRegex(str) {
    return str.replace(/^[ \t\n]+(?![ \t\n])|(?<![ \t\n])[ \t\n]+$/g, "");
}


boxplot(() => {
    bench("useIfFn", () => {
        useIfFn(DATA);
    });
    bench("useRegex", () => {
        useRegex(DATA);
    });
    bench("useIfFnCc", () => {
        useIfFnCc(DATA);
    });
});

await run();
clk: ~4.33 GHz
cpu: 13th Gen Intel(R) Core(TM) i7-1355U
runtime: node 22.12.0 (x64-win32)

benchmark              avg (min … max) p75   p99    (min … top 1%)
-------------------------------------- -------------------------------
useIfFn                  10.54 µs/iter   9.90 µs █
                 (8.80 µs … 419.80 µs)  37.40 µs █▇▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
useRegex                106.56 ms/iter 107.16 ms       █   ▃
               (101.05 ms … 115.18 ms) 110.98 ms ▆▁▁▁▆▁█▁▁▁█▆▆▁▁▁▁▁▁▆▁
useIfFnCc                 9.94 µs/iter   8.80 µs █
                 (7.10 µs … 200.50 µs)  47.80 µs █▇▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

                        ┌                                            ┐
                        ┬
                useIfFn │
                        ┴
                                                                 ╷┌┬ ╷
               useRegex                                          ├┤│─┤
                                                                 ╵└┴ ╵
                        ┬
              useIfFnCc │
                        ┴
                        └                                            ┘
                        7.10 µs           55.49 ms           110.98 ms

@jgm
Copy link
Member

jgm commented Dec 13, 2024

Thanks! useIfFnCc looks good. Perhaps add a comment on isSpace to indicate that 0x20 = space, etc.
Want to make a PR?

@tats-u
Copy link
Contributor Author

tats-u commented Dec 14, 2024

I did. It looks like you prefer the "Rebase and Merge" strategy, so I amended the commit and did a force-push.

@tats-u
Copy link
Contributor Author

tats-u commented Dec 14, 2024

letvar

@tats-u
Copy link
Contributor Author

tats-u commented Dec 14, 2024

Improved the comment.

@jgm jgm merged commit 38d2938 into commonmark:master Dec 14, 2024
6 checks passed
@jgm
Copy link
Member

jgm commented Dec 14, 2024

Thanks!

@tats-u tats-u deleted the regex-buster branch December 15, 2024 00:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants