Require unique names for named captures #264

Closed
rxwei opened this issue Apr 11, 2022 · 4 comments · Fixed by #379

@rxwei
Contributor

rxwei commented Apr 11, 2022

According to regex101, PCRE2 requires a unique name for each capture, but the current implementation of Regex.init(compiling:) doesn't throw an error when there are duplicate names.
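To make the requested check concrete, here is a minimal sketch of a duplicate-name scan. It is not the library's parser: `checkUniqueCaptureNames(in:)` and `DuplicateCaptureName` are hypothetical names, it only recognizes the `(?<name>...)` and `(?P<name>...)` spellings, and it does not special-case character classes.

```swift
// Hypothetical helper, not part of the Regex API: collects group names written
// as (?<name>...) or (?P<name>...) and throws on the first name that repeats.
struct DuplicateCaptureName: Error, Equatable {
    let name: String
}

func checkUniqueCaptureNames(in pattern: String) throws -> [String] {
    let chars = Array(pattern)
    var names: [String] = []
    var i = 0
    while i < chars.count {
        if chars[i] == "\\" {
            i += 2 // skip the backslash and the character it escapes
            continue
        }
        // Look for "(?<" or "(?P<" introducing a named group.
        if chars[i] == "(", i + 2 < chars.count, chars[i + 1] == "?" {
            var j = i + 2
            if chars[j] == "P" { j += 1 }
            // "(?<=" and "(?<!" are lookbehinds, not named groups.
            if j + 1 < chars.count, chars[j] == "<",
               chars[j + 1] != "=", chars[j + 1] != "!" {
                // Read the name up to the closing ">".
                var name = ""
                var k = j + 1
                while k < chars.count, chars[k] != ">" {
                    name.append(chars[k])
                    k += 1
                }
                if names.contains(name) {
                    throw DuplicateCaptureName(name: name)
                }
                names.append(name)
                i = k
                continue
            }
        }
        i += 1
    }
    return names
}
```

With this sketch, `try checkUniqueCaptureNames(in: "(?<a>x)(?<a>y)")` throws `DuplicateCaptureName(name: "a")`, while `"(?<a>x)(?<b>y)"` returns `["a", "b"]`.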

@hamishknight
Contributor

Note that there is a mode (?J) that allows duplicate capture names, so we will also need to handle the coalescing of names for the purposes of typed captures. Perhaps the parser could mark capture groups that have names that have been used before.
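A rough sketch of what marking reused names could look like, assuming the parser can hand back the capture names in order. `CaptureInfo`, `markReusedNames`, and `indicesByName` are hypothetical and do not reflect the actual AST types.

```swift
// Hypothetical model of a parsed capture list; `nil` means an unnamed group.
struct CaptureInfo: Equatable {
    let index: Int              // 1-based capture index
    let name: String?
    let reusesEarlierName: Bool // true if an earlier capture has the same name
}

func markReusedNames(_ names: [String?]) -> [CaptureInfo] {
    var seen = Set<String>()
    var result: [CaptureInfo] = []
    for (offset, name) in names.enumerated() {
        var reused = false
        if let name = name {
            // insert(_:) reports whether the name was newly added.
            reused = !seen.insert(name).inserted
        }
        result.append(
            CaptureInfo(index: offset + 1, name: name, reusesEarlierName: reused))
    }
    return result
}

// Coalescing for (?J): each name maps to every capture index that carries it.
func indicesByName(_ captures: [CaptureInfo]) -> [String: [Int]] {
    var table: [String: [Int]] = [:]
    for capture in captures {
        if let name = capture.name {
            table[name, default: []].append(capture.index)
        }
    }
    return table
}
```

For a pattern such as `(?J)(?<year>\d{4})-(\d\d)|(?<year>\d\d)`, the input would be `["year", nil, "year"]`. The third capture is marked as reusing a name, and `year` coalesces to indices `[1, 3]`. Without `(?J)`, any capture with `reusesEarlierName == true` could be reported as an error instead.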

@hamishknight hamishknight self-assigned this Apr 11, 2022
@benlings

A slight relaxation of requiring unique names for captures would be to only allow a name to correspond to a single capture index.

This would be following the advice given here, and would be consistent with how captures are available in Tuple outputs.

In Perl 5.10, PCRE 8.00, PHP 5.2.14, and Boost 1.42 (or later versions of these) it is best to use a branch reset group when you want groups in different alternatives to have the same name, as in (?|a(?<digit>[0-5])|b(?<digit>[4-7]))c\k<digit>. With this special syntax, where the group is opened with (?| instead of (?:, the two groups named “digit” really are one and the same group. Then backreferences to that group are always handled correctly and consistently between these flavors.
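To illustrate the relaxed rule, here is a small model of branch reset numbering. It assumes each alternative is described only by the names of its capture groups. `numberBranchReset` and `namesWithConflictingIndices` are hypothetical names, and nested groups are not modeled.

```swift
// Hypothetical model: each alternative of a branch reset group (?|...|...)
// is given as the names of its capture groups, in order (`nil` = unnamed).
func numberBranchReset(
    alternatives: [[String?]],
    startingAt firstIndex: Int
) -> (indicesByName: [String: Set<Int>], nextIndex: Int) {
    var indicesByName: [String: Set<Int>] = [:]
    var widest = 0
    for alternative in alternatives {
        // Numbering restarts at `firstIndex` in every alternative.
        for (offset, name) in alternative.enumerated() {
            if let name = name {
                indicesByName[name, default: []].insert(firstIndex + offset)
            }
        }
        widest = max(widest, alternative.count)
    }
    // Groups after the branch reset continue past the widest alternative.
    return (indicesByName, firstIndex + widest)
}

// The relaxed rule: a name is acceptable only if it maps to one index.
func namesWithConflictingIndices(_ table: [String: Set<Int>]) -> [String] {
    table.filter { $0.value.count > 1 }.keys.sorted()
}
```

For `(?|a(?<digit>[0-5])|b(?<digit>[4-7]))c\k<digit>`, the input is `[["digit"], ["digit"]]` starting at index 1. Both groups land on index 1, so `digit` passes the single-index check even though the name appears twice in the pattern.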

@milseman
Member

A slight relaxation of requiring unique names for captures would be to only allow a name to correspond to a single capture index.

This would be following the advice given here, and would be consistent with how captures are available in Tuple outputs.

In Perl 5.10, PCRE 8.00, PHP 5.2.14, and Boost 1.42 (or later versions of these) it is best to use a branch reset group when you want groups in different alternatives to have the same name, as in (?|a(?<digit>[0-5])|b(?<digit>[4-7]))c\k<digit>. With this special syntax, where the group is opened with (?| instead of (?:, the two groups named “digit” really are one and the same group. Then backreferences to that group are always handled correctly and consistently between these flavors.

@hamishknight are we tracking branch reset groups anywhere? I believe we were parsing them but I don't know if we implemented renumbering logic or not.

@hamishknight
Contributor

They're in #370

4 participants