Difference in multi-line match/pattern between ugrep and ripgrep/pcregrep/pcre2grep? #391
-
I have a pattern that I use with the non-ugrep tools mentioned to print the entire paragraph/block containing a matched term.
6 7 * . \ 42 ok
1360 23 - . \ 1337 ok
12 12 / . \ 1 ok
13 2 mod . \ 1 ok

99 negate . \ -99 ok
-99 abs . \ 99 ok
52 23 max . \ 52 ok
52 23 min . \ 23 ok
$ rg --multiline '(^[^\n]+\n)*[^\n]*'1337'[^\n]*(\n[^\n]+)*' sample.forth
1:6 7 * . \ 42 ok
2:1360 23 - . \ 1337 ok
3:12 12 / . \ 1 ok
4:13 2 mod . \ 1 ok
$ pcregrep --multiline '(^[^\n]+\n)*[^\n]*'1337'[^\n]*(\n[^\n]+)*' sample.forth
6 7 * . \ 42 ok
1360 23 - . \ 1337 ok
12 12 / . \ 1 ok
13 2 mod . \ 1 ok
$ pcre2grep --multiline '(^[^\n]+\n)*[^\n]*'1337'[^\n]*(\n[^\n]+)*' sample.forth
6 7 * . \ 42 ok
1360 23 - . \ 1337 ok
12 12 / . \ 1 ok
13 2 mod . \ 1 ok
$ ugrep '(^[^\n]+\n)*[^\n]*'1337'[^\n]*(\n[^\n]+)*' sample.forth
6 7 * . \ 42 ok
1360 23 - . \ 1337 ok
Is the pattern syntax different, especially when it comes to multiline matching? Or is this a bug in ugrep? Thanks for any help understanding!
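For readers who want to reproduce the intended result without any of the grep tools, here is a minimal sketch using Python's `re` module. It rests on two assumptions: that `sample.forth` holds two blocks separated by a single blank line (inferred from the four-line outputs above), and that Python's backtracking engine behaves like the Perl-style matchers for this pattern. The helper name `block_containing` is made up for illustration.

```python
import re

# Assumed contents of sample.forth: two blocks separated by one blank line.
SAMPLE = r"""6 7 * . \ 42 ok
1360 23 - . \ 1337 ok
12 12 / . \ 1 ok
13 2 mod . \ 1 ok

99 negate . \ -99 ok
-99 abs . \ 99 ok
52 23 max . \ 52 ok
52 23 min . \ 23 ok
"""


def block_containing(term, text):
    """Return the run of non-empty lines around the first line containing term."""
    pattern = (
        r"(^[^\n]+\n)*"          # preceding non-empty lines of the block
        r"[^\n]*" + re.escape(term) + r"[^\n]*"  # the line holding the term
        r"(\n[^\n]+)*"           # following non-empty lines of the block
    )
    # re.MULTILINE lets ^ match at the start of every line, not only at offset 0.
    m = re.search(pattern, text, flags=re.MULTILINE)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(block_containing("1337", SAMPLE))
```

This only illustrates the Perl-style behaviour that ripgrep, pcregrep and pcre2grep show above; it says nothing about how ugrep's default POSIX matcher treats the same pattern.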
Replies: 7 comments
-
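The anchor ^ is part of the initial parenthesized repetition, which causes some ambiguity, so take it outside:
$ ugrep '^([^\n]+\n)*[^\n]*'1337'[^\n]*(\n[^\n]+)*' sample.forth
6 7 * . \ 42 ok
1360 23 - . \ 1337 ok
12 12 / . \ 1 ok
13 2 mod . \ 1 ok
Why is this? Please note that ugrep's default pattern matching is POSIX, which puts some restrictions on regex and anchors because of the internal matching machinery used. I may be able to work around the ^ anchor placement issue in a future update, but I'm not 100% sure.
Simpler is to write this regex with a dot . instead of [^\n] (because dot doesn't match newlines unless explicitly forced to do so with --dotall):
$ ugrep '^(.+\n)*.*'1337'.*(\n.+)*' sample.forth
6 7 * . \ 42 ok
1360 23 - . \ 1337 ok
12 12 / . \ 1 ok
13 2 mod . \ 1 ok
Use -P for Perl matching, which accepts the anchor inside the group:
$ ugrep -P '(^.+\n)*.*'1337'.*(\n.+)*' sample.forth
6 7 * . \ 42 ok
1360 23 - . \ 1337 ok
12 12 / . \ 1 ok
13 2 mod . \ 1 ok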
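The three pattern shapes in this reply can be compared side by side in a Perl-style engine. The sketch below uses Python's `re` module as a stand-in. That is an assumption, since Python is neither PCRE2 nor ugrep's POSIX matcher, so it shows only that the rewrites are equivalent under backtracking semantics and that a dot stops at newlines unless a dot-all mode is switched on. The sample text assumes the same blank-line-separated layout of `sample.forth`, and the names `VARIANTS` and `first_match` are illustrative.

```python
import re

# Assumed contents of sample.forth: two blocks separated by one blank line.
SAMPLE = r"""6 7 * . \ 42 ok
1360 23 - . \ 1337 ok
12 12 / . \ 1 ok
13 2 mod . \ 1 ok

99 negate . \ -99 ok
-99 abs . \ 99 ok
52 23 max . \ 52 ok
52 23 min . \ 23 ok
"""

VARIANTS = {
    # Original form: anchor inside the repeated group.
    "inside": r"(^[^\n]+\n)*[^\n]*1337[^\n]*(\n[^\n]+)*",
    # Anchor hoisted out of the group, as suggested for ugrep's default mode.
    "hoisted": r"^([^\n]+\n)*[^\n]*1337[^\n]*(\n[^\n]+)*",
    # Same idea written with a dot instead of [^\n].
    "dot": r"^(.+\n)*.*1337.*(\n.+)*",
}


def first_match(pattern, text, flags=re.MULTILINE):
    """Return the text of the first match, or None if there is none."""
    m = re.search(pattern, text, flags)
    return m.group(0) if m else None


# All three variants select the same four-line block here.
matches = {name: first_match(p, SAMPLE) for name, p in VARIANTS.items()}

# With DOTALL the dot also matches newlines, so the block boundary is lost
# and the match runs to the end of the text.
dotall_match = first_match(VARIANTS["dot"], SAMPLE, re.MULTILINE | re.DOTALL)

if __name__ == "__main__":
    for name, text in matches.items():
        print(name, len(text.splitlines()))
    print("dotall", len(dotall_match.splitlines()))
```

The dot form relies on the dot excluding newlines, which is why forcing dot-all matching changes the result.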
-
Thank you for this. I will experiment with my use cases and these patterns to see if I can use a single pattern for ripgrep+pcregrep+pcre2grep+ugrep.
-
Yes, it is an interesting little twist in how POSIX and Perl matching differ, and it can be a bit surprising. Note that the …
-
Thanks so much for this!
-
Everything seems to be straightened out now, even in my more complicated use cases. I'll just note while I'm here that when using … Thanks again!
-
Yes, explicit newlines in formats with …