Incorrect regex translation #32

agentzh · 2015-03-27T05:19:34Z

Hi Ingy!

I ran into a bug in the Pegex grammar translator that took me a while to figure out. Basically the following Pegex regex

    / 'a' | 'b' /

is translated into the following Perl regex:

    qr/\Ga|b/

whereas the expected Perl regex is

    qr/\G(?:a|b)/

Adding (: ...) to the original Pegex regex worked around this bug.
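The first translation misbehaves because alternation binds more loosely than concatenation in Perl regex syntax. `\Ga|b` therefore parses as `(?:\Ga)|(?:b)`, and only the `a` branch is anchored. The minimal Python sketch below shows the effect. Python's `re` module has no `\G`, so the sketch assumes `\A` applied to a slice of the input that starts at the parse position as a stand-in. `search_from` is a hypothetical helper written for this illustration.

```python
import re


def search_from(pattern: str, text: str, pos: int):
    """Search text[pos:] for pattern; return (start offset in text, matched text) or None."""
    m = re.search(pattern, text[pos:])
    if m is None:
        return None
    return (pos + m.start(), m.group(0))


text = "xb"  # the parser is at offset 0, which holds "x", not "a" or "b"

# Buggy translation: alternation binds loosest, so this means (?:\Aa)|(?:b).
# The unanchored "b" branch skips ahead and matches at offset 1.
print(search_from(r"\Aa|b", text, 0))      # (1, 'b')

# Correct translation: the group puts both branches under the anchor.
print(search_from(r"\A(?:a|b)", text, 0))  # None
```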

ingydotnet · 2015-03-27T13:44:36Z

@agentzh I don't see this as a bug. Pegex cannot know when to add parens in a regex.

    / (: 'a' | 'b' ) /

or:

    / (:a|b) /

Would be the proper way to do it.

https://github.com/ingydotnet/pegex-pgx/blob/master/pegex.pgx#L53 is the regex parsing rule.

It just splits the stuff between the two slashes into ws, strings, refs and raw. It ignores whitespace, escapes strings, expands refs and leaves raw intact. Then it joins them together and slaps a \G in front (for Perl 5).
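The translation steps described above can be sketched roughly as follows. This is a hypothetical Python approximation, not Pegex's actual code. The token pattern, the `<name>` ref syntax and the `refs` dictionary are assumptions made for illustration.

```python
import re

# Hypothetical tokens: whitespace, a 'quoted' string, a <ref>, or one raw character.
TOKEN = re.compile(r"(?P<ws>\s+)|'(?P<str>[^']*)'|<(?P<ref>\w+)>|(?P<raw>.)", re.S)


def translate(body: str, refs: dict) -> str:
    """Turn the text between the two slashes into Perl 5 regex source."""
    out = []
    for m in TOKEN.finditer(body):
        if m.group("ws") is not None:
            continue                               # ignore whitespace
        if m.group("str") is not None:
            out.append(re.escape(m.group("str")))  # escape strings
        elif m.group("ref") is not None:
            out.append(refs[m.group("ref")])       # expand refs
        else:
            out.append(m.group("raw"))             # leave raw text intact
    return r"\G" + "".join(out)                    # join, then slap \G in front


print(translate(" 'a' | 'b' ", {}))  # \Ga|b
print(translate("a|b", {}))          # \Ga|b
print(translate(" 'a' x 'b' ", {}))  # \Gaxb
```

In this sketch the `|` and the `x` are handled identically, as raw text. Nothing in the token stream tells the translator that a group is needed around the alternation.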

Maybe a doc patch is in order somewhere?

agentzh · 2015-03-27T15:15:54Z

@ingydotnet I can understand the implementation complications here, but mandating a (: ...) on the user side looks counter-intuitive and error-prone. The simplest / 'a' | 'b' / form becomes a pitfall that is tricky to debug. Alas. IMHO, Pegex could do better here than just patching the documentation :)

ingydotnet · 2015-03-27T22:05:10Z

The problem here is scope of concern. I'd have to special-case that. And how would that compare to

    / 'a' x 'b' /

To Pegex, the | is just raw text. Pegex doesn't try to understand what the regex means.

Also, these two are equivalent:

    / 'a' | 'b' /
    /a|b/

Now what could work is this:

    'a' | 'b'

Here the pipe is Pegex syntax. It sits between two regexes, which can be optimized into:

    /(?:a|b)/

The optimizer would know that it must add parens.
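An optimizer pass of this kind might look roughly like the hypothetical Python sketch below. When it fuses a Pegex-level alternation of regexes into a single regex, it wraps the result in a non-capturing group, so the anchor added afterwards covers every branch. `combine_alternation` and `anchor` are illustrative names, not Pegex functions.

```python
import re


def combine_alternation(alternatives: list) -> str:
    """Fuse the regex bodies of a Pegex-level alternation ('a' | 'b') into one
    regex body, grouped so that a later anchor binds to every branch."""
    if len(alternatives) == 1:
        return alternatives[0]
    return "(?:" + "|".join(alternatives) + ")"


def anchor(body: str) -> str:
    """Prefix \\G, as the translator does for Perl 5."""
    return r"\G" + body


print(anchor(combine_alternation(["a", "b"])))  # \G(?:a|b)

# Python's re has no \G; swapping in \A shows that both branches stay anchored.
print(re.search(r"\A" + combine_alternation(["a", "b"]), "xb"))  # None
```

Because the optimizer itself produces the alternation, it knows the parens are needed. The raw-regex translator has no such information.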

Maybe we can add debugging output that shows the generated regex or something.

I'm happy to make Pegex more user friendly, but (unless I don't understand what you are wanting) I see this as leading to a world of special cases.

agentzh · 2015-03-27T23:06:05Z

@ingydotnet Yeah, it requires more work on the regex translator to get exactly right. I'll try living with it for now :) Not a big deal for me.

ingydotnet · 2015-04-01T14:13:33Z

The optimizer should be made to handle this correctly (I now believe).

agentzh · 2015-04-01T19:48:11Z

@ingydotnet Yay!
