Add Pattern.programSize() and Matcher.programSize() #180

Merged
merged 2 commits into from
Jan 5, 2025

sergiitk

This PR exposes Pattern.programSize() and Matcher.programSize() as public API.

The program size represents a very approximate measure of a regexp's "cost". Larger numbers are more expensive than smaller numbers.

Similar to the canonical C++ implementation (see RE2.ProgramSize()), re2j will return the program size as the number of instructions of the regex program without making any promises or claims except "larger is more expensive".

Context: The need for this change arose from cross-language projects, such as gRPC and CEL. gRPC needs to configure the maximum size of regex programs in CEL to the same number across all languages. While this is possible in gRPC-Cpp via CEL-Cpp options.regex_max_program_size, which uses re2's ProgramSize, CEL-Java doesn't provide the same configuration option simply because the program size is not available in re2j. In Go, the number of instructions is available as len(syntax.Prog.Inst).
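For reference, here is a minimal sketch of the Go side of that comparison, using only the standard `regexp/syntax` package. The helper names `programSize` and `checkProgramSize` are hypothetical and exist only for illustration; they are not part of re2j, CEL, or gRPC. The absolute numbers come from Go's compiler and should not be assumed to match what RE2 or re2j report for the same pattern.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// programSize (hypothetical helper) parses expr with Perl-style syntax,
// compiles it, and returns the number of instructions in the program.
func programSize(expr string) (int, error) {
	re, err := syntax.Parse(expr, syntax.Perl)
	if err != nil {
		return 0, err
	}
	// Simplify expands counted repetitions such as x{3} before compiling.
	prog, err := syntax.Compile(re.Simplify())
	if err != nil {
		return 0, err
	}
	return len(prog.Inst), nil
}

// checkProgramSize (hypothetical helper) rejects patterns whose program
// size exceeds maxSize, mirroring a "max program size" style option.
func checkProgramSize(expr string, maxSize int) error {
	n, err := programSize(expr)
	if err != nil {
		return err
	}
	if n > maxSize {
		return fmt.Errorf("regex program size %d exceeds limit %d", n, maxSize)
	}
	return nil
}

func main() {
	for _, expr := range []string{"a", "a+b", "(a|b){10}"} {
		n, err := programSize(expr)
		if err != nil {
			fmt.Printf("%q: parse error: %v\n", expr, err)
			continue
		}
		fmt.Printf("%q: %d instructions\n", expr, n)
	}
}
```

A limit enforced this way is only meaningful relative to the same engine, which is consistent with the "larger is more expensive" contract described above.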

Internal ref: go/grpc-cel-integration
CCs:
@ejona86 (grpc-java), @markdroth (grpc)
@l46kok (cel-java), @TristonianJones (cel)

This PR exposes `Pattern.programSize()` and `Matcher.programSize()`
as public API.

The program size represents a very approximate measure of a
regexp's "cost". Larger numbers are more expensive than smaller
numbers.

Similar to the canonical C++ implementation, re2j will return the
program size as the number of instructions of the regex program
without making any promises or claims except "larger is more
expensive".

Context: The need for this change arose from cross-language projects,
such as gRPC and CEL. gRPC needs to configure the maximum size of
regex programs in CEL to the same number across all languages. While
it's possible in CEL-Cpp and gRPC-Cpp, CEL-Java doesn't provide the
same configuration option simply because the program size is not
available in re2j. In Go, the number of instructions is available
via the length of https://pkg.go.dev/regexp/syntax#Prog.Inst.
@sjamesr sjamesr merged commit 2757238 into google:master Jan 5, 2025
3 checks passed