Unicode support for keywords #133
Comments
const moo = require('moo')
const KW = ['ban', 'this']
const lexer = moo.compile({
  kw: {match: KW, type: moo.keywords({kw: KW})},
  w: /[A-Za-z_][\w]*/,
  ws: / +/,
})
lexer.reset('banana ban')
lexer.next() // {type: 'kw', value: 'ban'}
lexer.next() // {type: 'w', value: 'ana'}
The normal use case for
const moo = require('moo')
const KW = ['ban', 'this']
const lexer = moo.compile({
  w: {match: /[A-Za-z_][\w]*/, type: moo.keywords({kw: KW})},
  ws: / +/,
})
lexer.reset('banana ban')
lexer.next() // {type: 'w', value: 'banana'}
lexer.next() // {type: 'ws', value: ' '}
lexer.next() // {type: 'kw', value: 'ban'}
It actually works fine with Unicode as-is:
const moo = require('moo')
const KW = ['η', 'ο', 'το', 'οι', 'τα']
const lexer = moo.compile({
  w: {match: /\p{XIDS}\p{XIDC}*/u, type: moo.keywords({kw: KW})},
  ws: {match: /\p{White_Space}+/u, lineBreaks: true},
})
lexer.reset('η ηθική')
lexer.next() // {type: 'kw', value: 'η'}
lexer.next() // {type: 'ws', value: ' '}
lexer.next() // {type: 'w', value: 'ηθική'}
We also already allow string literal and array matches to be combined with (Some of these changes haven't been published to npm yet [@tjvr]; maybe that's where the confusion is coming from?) |
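For readers following along, the idea behind the second and third examples (match the whole identifier first, then decide whether it is a keyword) can be sketched in plain JavaScript without moo. This is only an illustration under stated assumptions: `nextWord`, `KEYWORDS` and `IDENT` are hypothetical names, and this is not how moo implements `moo.keywords` internally.

```javascript
// Hypothetical sketch, not moo's implementation: match a full identifier,
// then classify it by exact lookup in a keyword set.
const KEYWORDS = new Set(['η', 'ο', 'το', 'οι', 'τα'])

// Sticky (y) so the match must start at lastIndex; unicode (u) enables \p{...}.
const IDENT = /\p{XID_Start}\p{XID_Continue}*/uy

function nextWord(input, offset) {
  IDENT.lastIndex = offset
  const m = IDENT.exec(input)
  if (m === null) return null // no identifier starts at this offset
  const value = m[0]
  // Exact lookup on the whole word, so 'η' inside 'ηθική' is never a keyword.
  return {type: KEYWORDS.has(value) ? 'kw' : 'w', value}
}

console.log(nextWord('η ηθική', 0)) // the standalone article
console.log(nextWord('η ηθική', 2)) // the longer word starting with the same letter
```

Because the lookup happens after the longest identifier has been consumed, the 'banana' / 'ban' split from the first example cannot occur in this arrangement.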
Thanks, Nathan. After seeing the first two examples it became much clearer. Regarding the array match combined with |
We should probably have a test for that. The |
When’s the next npm publish planned? |
I've published 0.5.1. 👍 |
Since /u is supported now, is there a convenient way to define a rule using an array of keywords with Unicode enabled? Something like:
In my understanding, moo.keywords in the Unicode scenario only works if the "match" is a pattern with the /u flag.
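One possible way to get a /u pattern out of a keyword array, sketched in plain JavaScript: escape each word, join the words into an alternation, and forbid a trailing identifier character so a keyword never matches as a prefix of a longer word. `keywordPattern` is a hypothetical helper, not part of moo, and whether the resulting regex is accepted as a rule's `match` has not been checked against moo here.

```javascript
// Hypothetical helper, not part of moo: build one /u regex from keywords.
function keywordPattern(words) {
  const escaped = words
    .slice()
    // Longest first keeps the alternation predictable when one keyword
    // is a prefix of another.
    .sort((a, b) => b.length - a.length)
    // Escape regex syntax characters only; other escapes are errors under /u.
    .map(w => w.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
  // The negative lookahead rejects a keyword that is followed by another
  // identifier character, e.g. 'η' at the start of 'ηθική'.
  return new RegExp('(?:' + escaped.join('|') + ')(?!\\p{XID_Continue})', 'u')
}

const KW = ['η', 'ο', 'το', 'οι', 'τα']
const kwRegex = keywordPattern(KW)
console.log(kwRegex.test('το'))    // a bare keyword
console.log(kwRegex.test('ηθική')) // a longer word that starts with a keyword
```

The approach shown earlier in the thread (an identifier rule with `type: moo.keywords(...)`) avoids the lookahead entirely, so this sketch is mainly useful when a standalone pattern is wanted.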