Some regex bugs #63

gfredericks · 2019-06-15T19:29:55Z

A Java 8 bug that I somehow missed originally was reported in c228537. It might be tricky (assuming the regex really has no matches), because I don't know whether we currently parse anything that has no matches. The debugging approach is probably to run it through the QE parsing method in the Pattern class and see what comes out the other end.
There are at least two new bugs on Java 9 or later, which I mentioned in Allow named groups in regex generation #62: \X and \N{WHITE SMILING FACE}. \X can probably be parsed but not supported (unless its definition turns out to be super easy to implement), and \N{...} might be an easy lookup on the Character class or something; we'll see.

lvh · 2019-06-18T14:50:38Z

I think the answer to \N is:

(Character/codePointOf "WHITE SMILING FACE")
(Character/codePointOf "some nonexistent nonsense")

... which only exists in JDK9+. If you need it to work below that, there's CharacterName/getCodePoint but that appears to be a package-scoped class.
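
As a rough sketch of how that lookup behaves (assuming JDK 9+; `name->code-point` is a hypothetical helper name, not something from the library), the call throws IllegalArgumentException for an unknown name, so a caller would probably want to catch that:

```clojure
;; Sketch only: requires JDK 9+, where Character/codePointOf was added.
;; name->code-point is a hypothetical helper, not part of any existing API.
(defn name->code-point
  "Returns the code point for a Unicode character name, or nil if the
  name is not a valid Unicode character name."
  [^String char-name]
  (try
    (Character/codePointOf char-name)
    ;; codePointOf throws IllegalArgumentException for unknown names
    (catch IllegalArgumentException _ nil)))

(name->code-point "WHITE SMILING FACE")        ; 0x263A (9786)
(name->code-point "some nonexistent nonsense") ; nil
```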

gfredericks · 2019-06-18T16:09:02Z

I don't think the \N construct is a valid regex pre-JDK9 -- my goal with this functionality is to correctly parse/interpret things according to re-pattern's behavior -- i.e., parsing and interpreting relative to the jvm you're running on.

There's already one or two variable features for things that differ between 7 and 8. I just did all this work prior to 9.

We probably don't need to support Java 7 anymore (I don't think Clojure does either), so some of that variability can be removed.

gfredericks · 2019-06-18T16:46:42Z

and yes, Character/codePointOf looks like exactly what we'd need, thanks for looking that up

gfredericks · 2019-06-18T16:47:22Z

(I'm planning on digging into this in early July if nobody else gets to it first)

gfredericks · 2019-08-11T00:57:33Z

Just pushed fixes for both of these. \X is parsed but unsupported, \c\Q0 does the correct (insane) thing, and \N{...} is fully supported. Additionally, large code-points are now supported with \x and \u literals.
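
For reference, a small sketch of how the JVM itself treats two of these constructs. It only exercises re-pattern and re-matches, not the generator; the var names are made up for illustration, and the \N{...} line assumes JDK 9+:

```clojure
;; Sketch only: shows plain java.util.regex behavior via re-pattern.
;; Patterns are built from strings so that compilation happens at runtime
;; rather than at read time.

;; \N{...} names a single character by its Unicode name (JDK 9+ only).
(def smiley-pattern (re-pattern "\\N{WHITE SMILING FACE}"))

;; \x{...} accepts code points above U+FFFF.
(def grinning-pattern (re-pattern "\\x{1F600}"))

;; U+1F600 is outside the BMP, so build the string from its code point.
(def grinning-str (String. ^chars (Character/toChars 0x1F600)))

(re-matches smiley-pattern "\u263A")        ; returns the matched string
(re-matches grinning-pattern grinning-str)  ; returns the matched string
```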

gfredericks mentioned this issue Jun 15, 2019

Allow named groups in regex generation #62

Merged

gfredericks closed this as completed Aug 11, 2019
