Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix excessive backtracking in regex engine by introducing backtrackig limit #311

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

renatahodovan
Copy link
Contributor

The regex engine was prone to excessive backtracking, leading to timeouts and infinite loops, particularly with patterns involving nested quantifiers.

This commit introduces a backtracking counter and a limit of 1000 backtracking steps. When this limit is exceeded, the regex engine aborts to prevent excessive backtracking.

@renatahodovan
Copy link
Contributor Author

The patch fixes a 4 years old issue reported by ClusterFuzz. This issue is possibly responsible (among others) for the poor fuzzing results.

@renatahodovan
Copy link
Contributor Author

Minimal qjs repro of the infinite loop that tries to match the empty string non-greedily:

var r = new RegExp("()*?a");
r.exec(",");

Execute: ./qjs test.js

Copy link
Contributor

@saghul saghul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment! @chqrlie you probably want to have a look.

@@ -1995,6 +1996,10 @@ static intptr_t lre_exec_backtrack(REExecContext *s, uint8_t **capture,

for(;;) {
// printf("top=%p: pc=%d\n", th_list.top, (int)(pc - (bc_buf + RE_HEADER_LEN)));
if (++s->backtrack_count > 1000) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO this should likely be a define with some sane default value, but overridable at compile time.

@bellard
Copy link
Owner

bellard commented May 30, 2024

I think it is better to fix the non greedy matching first. This patch only hides the real problem.

@bellard
Copy link
Owner

bellard commented May 30, 2024

The commit 36911f0d should fix the issue. Do you have other cases where the backtracking limitation workaround is necessary ?

@renatahodovan
Copy link
Contributor Author

@bellard Thanks for the fix, it really fixed the recursion in the previous test case! However, I have another repro:

var r = new RegExp("(.+)+a");
r.exec("VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV");

@renatahodovan
Copy link
Contributor Author

This example is not about infinite recursion. It is just demonstrate valid regexp behavior for given pattern, for which regexp execution time growing very fast with growing length of input string

You are right (I think), but I believe we also agree that such inputs are either useless or malicious. Therefore, if we limit the number of recursive calls they trigger, we won't be restricting the actual capabilities of the tool, but we can prevent malicious loops. (I killed the above code after 6 minutes, and who knows how long it would have run. And the input isn't even that long...)

@renatahodovan
Copy link
Contributor Author

Oh, it seems that the comment I replied to has disappeared in the meantime.

@bellard
Copy link
Owner

bellard commented May 30, 2024

I see no problem with your example because it really requires a large number of backtracking (try it in another JS engine). It is the same problem as having a JS function which takes a long time to execute.

However, there is a real problem that should be addressed: the interrupt handler registered with JS_SetInterruptHandler() is not called during regexp execution which prevents the user from easily interrupting their execution.

@renatahodovan
Copy link
Contributor Author

@bellard Is there any update on the correct method for registering an interrupt handler during regex execution? Alternatively, could the above patch serve as a temporary solution to address the frequent timeout issues during fuzzing, which are causing early exits and poor performance?

@renatahodovan
Copy link
Contributor Author

@bellard: I think I found a solution: we can create a fuzzer-friendly build, as suggested by the libFuzzer documentation, using the FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION macro. This way, the backtracking limit will only be applied for fuzzing and will not affect the normal mode operation of the JS engine, while also preventing the fuzz session from continuously timing out.

TooTallNate pushed a commit to TooTallNate/quickjs that referenced this pull request Jul 3, 2024
Clone the full repo, otherwise the known-good commit we want to test
against won't be available.
@renatahodovan
Copy link
Contributor Author

Gentle ping

The regex engine is prone to excessive backtracking, leading to
timeouts, especially while fuzzing.
This commit introduces a backtracking counter and a limit of 1000
backtracking steps. When this limit is exceeded during fuzzing, the
regex engine aborts to prevent excessive backtracking. For this, the
FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION macro is used, as per
suggested by the documentation of libFuzzer.
@renatahodovan
Copy link
Contributor Author

@saghul @bellard what do you think of the new fuzzer-specific approach?

@renatahodovan renatahodovan requested a review from saghul July 23, 2024 18:02
@saghul
Copy link
Contributor

saghul commented Aug 7, 2024

The change LGTM (perhaps the name could use some bikesheding) but I have no commit rights here :-)

@renatahodovan
Copy link
Contributor Author

@saghul Thanks for the LGTM! The question about the name of the environment variable is an "easy" one, since it is a constant set by OSS-Fuzz and it is also the officially recommended name by LLVM for creating fuzzer friendly build from the target. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants