-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extended identifiers #818
base: master
Are you sure you want to change the base?
Extended identifiers #818
Conversation
Identifiers may contain $, ? or |, but not /. They may not begin with $ or ?. The identifier | is not valid (since it's the VBAR).
### Changed | ||
|
||
- Identifiers may contain `$` or `?` but may not begin with | ||
them. They may contain `|` as well. Identifiers must not contain `/`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they can contain VBAR, this may conflict with constructor declarations in inductive type definitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not allowing /
as (single-letter) identifier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there would be a conflict with constructors: a constructor is declared with | Foo
so you parse the vbar and Foo
. We force that there is a space between the vbar and the identifier, which is rather natural since tokens are separated by spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what if the user forgets a space and write a|b ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it's incorrect! Tokens are separated by spaces, so it's incorrect as much as symbolfoo
is not symbol foo
, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could try to follow standards and do like Rust https://doc.rust-lang.org/reference/identifiers.html, that is using standard sets of codepoints defined in https://www.unicode.org/reports/tr31/tr31-33.html.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, math symbols are not included in identifiers. But according to this source, it is as good to keep the identifiers and maths symbols as separate classes. which we can do. We can say that an identifier is either xid_start, Star xid_continue
or Plus math
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it seems that either we want to work around something like this
let ident = [%sedlex.regexp? xid_start, Star (xid_continue | Chars "'")]
let regid = [%sedlex.regexp? ident | Plus math]
One of the main advantage is that people have taken care of removing ambiguous characters from these sets (such as the duplication between the Greek mu μ
and the micro sign μ
), and we can rely on the unicode standards to keep the set up to date, and our specification is handled by the standard (plus or minus some implementation specific details which shouldn't be that numerous).
Very fun things can appear with this signature: |
What do you mean? |
Identifiers may contain $, ? or |, but not /. They may not begin with $ or ?. The identifier | is not valid (since it's the VBAR).