Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extended identifiers #818

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,13 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/),
and this project adheres to [Semantic Versioning](https://semver.org/).

## Unreleased

### Changed

- Identifiers may contain `$` or `?` but may not begin with
them. They may contain `|` as well. Identifiers must not contain `/`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they can contain VBAR, this may conflict with constructor declarations in inductive type definitions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not allowing / as (single-letter) identifier?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there would be a conflict with constructors: a constructor is declared with | Foo so you parse the vbar and Foo. We force that there is a space between the vbar and the identifier, which is rather natural since tokens are separated by spaces.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what if the user forgets a space and write a|b ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it's incorrect! Tokens are separated by spaces, so it's incorrect as much as symbolfoo is not symbol foo, isn't it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could try to follow standards and do like Rust https://doc.rust-lang.org/reference/identifiers.html, that is using standard sets of codepoints defined in https://www.unicode.org/reports/tr31/tr31-33.html.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, math symbols are not included in identifiers. But according to this source, it is as good to keep the identifiers and maths symbols as separate classes. which we can do. We can say that an identifier is either xid_start, Star xid_continue or Plus math.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it seems that either we want to work around something like this

let ident = [%sedlex.regexp? xid_start, Star (xid_continue | Chars "'")]
let regid = [%sedlex.regexp? ident | Plus math]

One of the main advantage is that people have taken care of removing ambiguous characters from these sets (such as the duplication between the Greek mu μ and the micro sign μ), and we can rely on the unicode standards to keep the set up to date, and our specification is handled by the standard (plus or minus some implementation specific details which shouldn't be that numerous).


## 2.1.0 (2022-01-17)

### Added
Expand Down
9 changes: 6 additions & 3 deletions src/parsing/lpLexer.ml
Original file line number Diff line number Diff line change
Expand Up @@ -153,8 +153,11 @@ let string = [%sedlex.regexp? '"', Star (Compl '"'), '"']

(** Unqualified regular identifiers are any non-empty sequence of characters
not among: *)
let forbidden_letter = [%sedlex.regexp? Chars " ,;\r\t\n(){}[]:.`\"@$|/"]
let regid = [%sedlex.regexp? '/' | '*' | Plus (Compl forbidden_letter)]
let forbidden_letter = [%sedlex.regexp? Chars " ,;\r\t\n(){}[]:.`\"@/"]
let forbidden_first_letter = [%sedlex.regexp? (forbidden_letter | Chars "$?")]
let regid =
[%sedlex.regexp?
Compl forbidden_first_letter, Star (Compl forbidden_letter)]

let is_regid : string -> bool = fun s ->
let lb = Utf8.from_string s in
Expand Down Expand Up @@ -280,7 +283,7 @@ let rec token lb =
| '_' -> UNDERSCORE

(* identifiers *)
| '/' | regid -> UID(Utf8.lexeme lb)
| regid -> UID(Utf8.lexeme lb)
| escid -> UID(remove_useless_escape(Utf8.lexeme lb))
| '@', regid -> UID_EXPL(remove_first lb)
| '@', escid -> UID_EXPL(remove_useless_escape(remove_first lb))
Expand Down
1 change: 1 addition & 0 deletions tests/KO/identifiers.lp
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
symbol foo/bar: TYPE;
1 change: 1 addition & 0 deletions tests/KO/identifiers_bar.lp
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
symbol | : TYPE;
1 change: 1 addition & 0 deletions tests/KO/identifiers_start_dollar.lp
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
symbol $foo: TYPE;
1 change: 0 additions & 1 deletion tests/OK/comment_in_qid.lp
Original file line number Diff line number Diff line change
@@ -1,5 +1,4 @@
require tests.OK.bool;
assert ⊢ tests./*comment*/OK.bool.true : tests.OK.bool.B;
symbol A:TYPE;
symbol /:A → A → A;
symbol *:A → A → A;
4 changes: 3 additions & 1 deletion tests/OK/identifiers.lp
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
symbol /:TYPE;
symbol *:TYPE;
symbol **:TYPE;
symbol foo*: TYPE;
symbol foo$: TYPE;
symbol foo|bar: TYPE;