Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extended identifiers #818

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

gabrielhdt
Copy link
Member

Identifiers may contain $, ? or |, but not /. They may not begin with $ or ?. The identifier | is not valid (since it's the VBAR).

Identifiers may contain $, ? or |, but not /. They may not begin with $
or ?. The identifier | is not valid (since it's the VBAR).
### Changed

- Identifiers may contain `$` or `?` but may not begin with
them. They may contain `|` as well. Identifiers must not contain `/`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they can contain VBAR, this may conflict with constructor declarations in inductive type definitions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not allowing / as (single-letter) identifier?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there would be a conflict with constructors: a constructor is declared with | Foo so you parse the vbar and Foo. We force that there is a space between the vbar and the identifier, which is rather natural since tokens are separated by spaces.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what if the user forgets a space and write a|b ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it's incorrect! Tokens are separated by spaces, so it's incorrect as much as symbolfoo is not symbol foo, isn't it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could try to follow standards and do like Rust https://doc.rust-lang.org/reference/identifiers.html, that is using standard sets of codepoints defined in https://www.unicode.org/reports/tr31/tr31-33.html.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, math symbols are not included in identifiers. But according to this source, it is as good to keep the identifiers and maths symbols as separate classes. which we can do. We can say that an identifier is either xid_start, Star xid_continue or Plus math.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it seems that either we want to work around something like this

let ident = [%sedlex.regexp? xid_start, Star (xid_continue | Chars "'")]
let regid = [%sedlex.regexp? ident | Plus math]

One of the main advantage is that people have taken care of removing ambiguous characters from these sets (such as the duplication between the Greek mu μ and the micro sign μ), and we can rely on the unicode standards to keep the set up to date, and our specification is handled by the standard (plus or minus some implementation specific details which shouldn't be that numerous).

@amelieled
Copy link
Contributor

Very fun things can appear with this signature: type -> type -> type.

@gabrielhdt
Copy link
Member Author

Very fun things can appear with this signature: type -> type -> type.

What do you mean?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants