parser/lexer: correct ID_Start & ID_Continue checks #524

filips · 2024-06-10T13:55:53Z

unicode.IsLetter and unicode.IsDigit do not cover the complete sets of ID_Start and ID_Continue characters defined in UAX #31: https://www.unicode.org/reports/tr31/.
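
To make the gap concrete, here is a minimal sketch that compares the old checks with table-based checks built from the UAX #31 definitions. Only `includeIDStart` and the `unicode.In` call appear in the diff under review; `includeIDContinue`, `exclude`, `idStart`, `idContinue`, `oldStart` and `oldPart` are names assumed for illustration, and the exact tables used in the PR may differ.

```go
package main

import (
	"fmt"
	"unicode"
)

// Tables assembled from the UAX #31 definitions of ID_Start and ID_Continue.
// Apart from includeIDStart, these names are hypothetical and not from the PR.
var (
	includeIDStart = []*unicode.RangeTable{
		unicode.L, unicode.Nl, unicode.Other_ID_Start,
	}
	includeIDContinue = []*unicode.RangeTable{
		unicode.L, unicode.Nl, unicode.Other_ID_Start,
		unicode.Mn, unicode.Mc, unicode.Nd, unicode.Pc,
		unicode.Other_ID_Continue,
	}
	exclude = []*unicode.RangeTable{
		unicode.Pattern_Syntax, unicode.Pattern_White_Space,
	}
)

// idStart reports whether r is in ID_Start under the assumptions above.
func idStart(r rune) bool {
	return !unicode.In(r, exclude...) && unicode.In(r, includeIDStart...)
}

// idContinue reports whether r is in ID_Continue under the assumptions above.
func idContinue(r rune) bool {
	return !unicode.In(r, exclude...) && unicode.In(r, includeIDContinue...)
}

// oldStart and oldPart mirror the checks the lexer used before this change.
func oldStart(r rune) bool { return unicode.IsLetter(r) }
func oldPart(r rune) bool  { return unicode.IsLetter(r) || unicode.IsDigit(r) }

func main() {
	// U+2160 (Nl), U+2118 (Other_ID_Start), U+0301 (Mn), U+203F (Pc).
	for _, r := range []rune{'\u2160', '\u2118', '\u0301', '\u203F'} {
		fmt.Printf("%U oldStart=%t idStart=%t oldPart=%t idContinue=%t\n",
			r, oldStart(r), idStart(r), oldPart(r), idContinue(r))
	}
}
```

The four sample code points are rejected by the letter/digit checks but belong to ID_Start (the first two) or ID_Continue (all four), which is the kind of difference the change addresses.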

stevenh

Looks reasonable, but let's get some tests which exercise this.

filips · 2024-06-10T23:52:28Z

> Looks reasonable, but let's get some tests which exercise this.

I added a few tests for the areas where the allowed character sets differ. Please let me know if you want me to add further tests.

filips · 2024-06-11T15:17:12Z

@stevenh Linter warnings should be fixed now.

stevenh

Thanks for this, just a minor nit and then good to go.

stevenh · 2024-06-11T16:08:03Z

parser/lexer.go

+	unicode.Pattern_White_Space,
+}
+
+func UnicodeIDStart(r rune) bool {


Just a minor tweak: can we unexport these?

Suggested change

-func UnicodeIDStart(r rune) bool {
+func unicodeIDStart(r rune) bool {

stevenh · 2024-06-11T16:08:11Z

parser/lexer.go

+	return unicode.In(r, includeIDStart...)
+}
+
+func UnicodeIDContinue(r rune) bool {


Suggested change

-func UnicodeIDContinue(r rune) bool {
+func unicodeIDContinue(r rune) bool {

stevenh · 2024-06-11T16:08:19Z

parser/lexer.go

 func isDigit(chr rune, base int) bool {
 	return digitValue(chr) < base
 }

 func isIdentifierStart(chr rune) bool {
 	return chr == '$' || chr == '_' || chr == '\\' ||
 		'a' <= chr && chr <= 'z' || 'A' <= chr && chr <= 'Z' ||
-		chr >= utf8.RuneSelf && unicode.IsLetter(chr)
+		chr >= utf8.RuneSelf && UnicodeIDStart(chr)


Suggested change

-		chr >= utf8.RuneSelf && UnicodeIDStart(chr)
+		chr >= utf8.RuneSelf && unicodeIDStart(chr)

stevenh · 2024-06-11T16:08:29Z

parser/lexer.go

 }

 func isIdentifierPart(chr rune) bool {
 	return chr == '$' || chr == '_' || chr == '\\' ||
 		'a' <= chr && chr <= 'z' || 'A' <= chr && chr <= 'Z' ||
 		'0' <= chr && chr <= '9' ||
-		chr >= utf8.RuneSelf && (unicode.IsLetter(chr) || unicode.IsDigit(chr))
+		chr >= utf8.RuneSelf && UnicodeIDContinue(chr)


Suggested change

-		chr >= utf8.RuneSelf && UnicodeIDContinue(chr)
+		chr >= utf8.RuneSelf && unicodeIDContinue(chr)

filips · 2024-06-11T16:22:01Z

> Thanks for this, just a minor nit and then good to go.

Sure thing, I pushed with the functions unexported.

stevenh requested changes Jun 10, 2024

filips force-pushed the lexer-unicode-start-continue branch from fc0cdaf to a14eb3c · June 10, 2024 23:50

filips requested a review from stevenh June 10, 2024 23:51

filips force-pushed the lexer-unicode-start-continue branch from a14eb3c to c1a6c0a · June 11, 2024 15:16

stevenh requested changes Jun 11, 2024

filips added 2 commits June 11, 2024 18:20

parser/lexer: correct ID_Start & ID_Continue checks

3def71a

parser/lexer: add tests for special identifiers

8237717

filips force-pushed the lexer-unicode-start-continue branch from c1a6c0a to 8237717 · June 11, 2024 16:20

stevenh approved these changes Jun 12, 2024

stevenh merged commit d4edd51 into robertkrimen:master Jun 12, 2024
2 checks passed
