Some regex bugs #63

Closed
gfredericks opened this issue Jun 15, 2019 · 5 comments

Comments

@gfredericks
Owner

  1. A Java 8 bug that I somehow missed originally was reported in c228537. It might be tricky (assuming the regex really has no matches), because I don't know whether we currently parse anything that has no matches. The debugging approach is probably to run it through the \Q...\E parsing method in the Pattern class to see what comes out the other end.
  2. There are at least two new bugs for Java 9 or later, which I mentioned in Allow named groups in regex generation #62: \X and \N{WHITE SMILING FACE}. \X can probably be parsed but not supported (unless the definition turns out to be super easy to implement), and the other one might be an easy lookup on the Character class or something; we'll see.
@lvh
Contributor

lvh commented Jun 18, 2019

I think the answer to \N is:

(Character/codePointOf "WHITE SMILING FACE")
(Character/codePointOf "some nonexistent nonsense")

... which only exists in JDK9+. If you need it to work below that, there's CharacterName/getCodePoint but that appears to be a package-scoped class.
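To make the two calls above concrete, here is a minimal sketch of how the lookup might be wrapped. It assumes JDK 9+ (on an older JVM the form will not even compile, since the method is missing), and `name->code-point` is a hypothetical helper name, not something from this project. As far as I know, `Character/codePointOf` throws IllegalArgumentException for a name it does not recognize, so the sketch turns that into nil.

```clojure
;; Sketch only: requires JDK 9+, where Character/codePointOf was added.
(defn name->code-point
  "Hypothetical helper: returns the code point for a Unicode character
  name, or nil if the JDK does not recognize the name."
  [^String char-name]
  (try
    (Character/codePointOf char-name)
    ;; Unrecognized names are reported with IllegalArgumentException.
    (catch IllegalArgumentException _ nil)))

(name->code-point "WHITE SMILING FACE")        ;; => 9786 (U+263A)
(name->code-point "some nonexistent nonsense") ;; => nil
```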

@gfredericks
Owner Author

I don't think the \N construct is a valid regex pre-JDK9 -- my goal with this functionality is to correctly parse/interpret things according to re-pattern's behavior -- i.e., parsing and interpreting relative to the jvm you're running on.

There's already one or two variable features for things that differ between 7 and 8. I just did all this work prior to 9.

Probably don't need to support 7 anymore (since Clojure doesn't either, I think), so some of that variability can be removed.

@gfredericks
Owner Author

and yes, Character/codePointOf looks like exactly what we'd need, thanks for looking that up

@gfredericks
Owner Author

(I'm planning on digging into this in early July if nobody else gets to it first)

@gfredericks
Owner Author

gfredericks commented Aug 11, 2019

Just pushed fixes for both of these. \X is parsed but unsupported, \c\Q0 does the correct (insane) thing, and \N{...} is fully supported. Additionally, large code-points are now supported with \x and \u literals.
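For reference, a rough sketch of the underlying JVM behavior that the \N{...} and large-code-point fixes need to agree with. It uses only `re-pattern` and `re-matches`, not this library's generators, and assumes JDK 9+ for \N{...}. The names `smiley-pattern` and `grinning` are made up for illustration.

```clojure
;; Sketch only: \N{...} needs JDK 9+; \x{h...h} also works on older JVMs.

;; Built from a string, so the pattern is compiled when this form is
;; evaluated rather than when the file is read.
(def smiley-pattern (re-pattern "\\N{WHITE SMILING FACE}"))

;; The named character is U+263A, so the pattern matches exactly that string.
(re-matches smiley-pattern "\u263A") ;; => "☺"

;; A code point above U+FFFF, held in a string as a surrogate pair.
(def grinning (String. (Character/toChars 0x1F600)))

;; The braced \x form accepts code points outside the BMP.
(re-matches (re-pattern "\\x{1F600}") grinning) ;; => the same one-code-point string
```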
