PEG syntax considerations #17

Merged (4 commits) on Oct 19, 2023
46 changes: 23 additions & 23 deletions src/peg/adql2.1.peg
@@ -1,11 +1,6 @@
# Note: in the actual PEG definition comments start with #
# =========================== Configurables for deployers

# additional prefixes to be added here
udf_prefix <-
'ivo_'

# ============================ The Gramma's root symbol
# ============================ The Grammar's root symbol

query_specification <-
with_clause? _
@@ -245,13 +240,13 @@ identifier <-
(regular_identifier / delimited_identifier)

delimited_identifier <-
'"' ('""' / '[^"]')+ '"'
'"' ('""' / !["])+ '"'

regular_identifier <-
(!(keyword) letter (letter / digit / '_')*)

character_string_literal <-
("'" ("''" / r"[^']")* "'" (Space+ comment _)*)+
("'" ("''" / !['])* "'" (Space+ comment _)*)+

fold_function <-
('UPPER' / 'LOWER') _
@@ -280,7 +275,7 @@ geometry_function <-
/ extract_coord_sys

bitwise_op <-
'&' / '|' / '^'
[&|^]

bitwise_expression <-
'~' numeric_value_expression
@@ -456,13 +451,13 @@ point <-
_ ',' _ coordinates _ ')'

numeric_value_expression <-
term (_ ('+' / '-') _ numeric_value_expression)*
term (_ [-+] _ numeric_value_expression)*

term <-
factor (_ ('*' / '/') _ term)*
factor (_ [*/] _ term)*

factor <-
('+' / '-')? numeric_primary
[-+]? numeric_primary

numeric_value_function <-
math_function
@@ -482,10 +477,6 @@ user_defined_function <-
(_ value_expression (_ ',' _ value_expression)* _)?
')'

numeric_primary <-
value_expression_primary
/ numeric_value_function
Contributor

Huh? How can this disappear without breaking the grammar?

Member Author

@jontxu, Oct 18, 2023

The rule was duplicated for some unknown reason: it appeared at both line 485 and line 558.
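
A duplicate like this one could be caught mechanically before review. The following is only a minimal sketch, not part of this PR: the helper name `find_duplicate_rules` is hypothetical, and it assumes that every rule definition starts at the beginning of a line with `name <-`, as the rules in `adql2.1.peg` appear to do.

```python
import re
from collections import defaultdict

# Assumption: a rule definition is a line that starts with an identifier
# followed by "<-". Continuation lines of a rule body do not match.
RULE_START = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*<-")


def find_duplicate_rules(peg_text):
    """Return {rule_name: [line numbers]} for names defined more than once."""
    seen = defaultdict(list)
    for lineno, line in enumerate(peg_text.splitlines(), start=1):
        match = RULE_START.match(line)
        if match:
            seen[match.group(1)].append(lineno)
    return {name: lines for name, lines in seen.items() if len(lines) > 1}


if __name__ == "__main__":
    sample = (
        "numeric_primary <-\n"
        "  value_expression_primary\n"
        "\n"
        "term <-\n"
        "  factor\n"
        "\n"
        "numeric_primary <-\n"
        "  value_expression_primary\n"
    )
    print(find_duplicate_rules(sample))
```

Run against the real grammar file, a check of this kind would report every rule name that is defined twice, together with the lines involved.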


# We need to seriously re-write value_expression because PEG
# doesn't have an actual longest-match operator. Thus, we
# cannot decide on the type of the first operand.
@@ -531,7 +522,7 @@ numeric_expression_operand <-
numeric_value_expression

numeric_expression_rest <-
('+' / '-' / '*' / '/') _ numeric_expression_operand
[-+*/] _ numeric_expression_operand

approximate_numeric_literal <-
exact_numeric_literal 'E'
@@ -541,7 +532,7 @@ exact_numeric_literal <-
(unsigned_integer '.')* unsigned_integer

signed_integer <-
('+' / '-')? unsigned_integer
[-+]? unsigned_integer

# TODO: We should take out character_string_literal here, MD thinks --
# what sort of use case did people have in mind here?
@@ -566,13 +557,13 @@ unsigned_hexadecimal <-
'0x' hex_digit+

digit <-
'[0-9]'
[0-9]

hex_digit <-
'[0-9A-F]'
[0-9A-F]

letter <-
'[a-zA-Z]'
[a-zA-Z]

# Reserved words

@@ -684,7 +675,7 @@ ANY_CHAR <-
letter / digit / ' ' / '\t' / ',' / '' / '.'

comment <-
'--' '[^\n\r]*'
'--' (![\n\r])*

_ <-
(comment / Space / EOL)*
@@ -693,10 +684,19 @@ __ <-
(comment / Space / EOL)+

_a <-
!'[A-Z0-9_]'
![A-Z0-9_]

Space <-
' '+ / '\t'

EOL <-
'\r\n' / '\n' / '\r'

EOF <-
!.

# =========================== Configurables for deployers
# additional prefixes to be added here
udf_prefix <-
'ivo_'

5 changes: 4 additions & 1 deletion src/peg/testpeg.py
@@ -48,7 +48,10 @@ def get_parser(debug=False, root='query_specification'):
peg_rules = re.sub('#', '// ', peg_rules)

# adapt character range syntax
peg_rules = re.sub("'\\[", "r'[", peg_rules)
peg_rules = re.sub("\\[", "r'[", peg_rules)
peg_rules = re.sub("\\!r'\\[", "r'[^", peg_rules)
peg_rules = re.sub("\\]", "]'", peg_rules)
peg_rules = re.sub("EOF <-[^;]*;", "", peg_rules)
Contributor

I'm not too enthused about hacking the grammar syntax with regular expressions. For now, that's probably all right, but I guess we should have a plan to parse our grammar into a tree and then serialise it in the various syntaxes. I just wonder whether someone else has already written something like that...

Member

That's indeed a very interesting idea, and much more stable and reliable than regular expressions. Still, the regular expressions are apparently enough for now, and they allowed us to quickly test this grammar with different parsers.
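
As a rough illustration of the direction suggested above, here is a token-level sketch in Python. It is not a full grammar tree and not something this PR implements. The names `TOKEN_RE`, `tokenize` and `to_arpeggio` are hypothetical. It assumes that the only constructs needing translation are `[...]` and `![...]`, and that the target accepts both `r'...'` and `r"..."` regex literals (the pre-PR grammar already used `r"[^']"`).

```python
import re

# Quoted literals are matched first, so brackets inside quotes stay untouched.
TOKEN_RE = re.compile(
    r"""
    (?P<literal>'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*")   # quoted literal
  | (?P<negclass>!\[(?:[^\]\\]|\\.)*\])                # ![...]
  | (?P<cls>\[(?:[^\]\\]|\\.)*\])                      # [...]
  | (?P<other>[^'"\[!]+|.)                             # everything else
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(expr):
    """Split PEG text into (kind, text) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(expr)]


def to_arpeggio(expr):
    """Serialise tokens, turning character classes into r'...' regexes."""
    out = []
    for kind, text in tokenize(expr):
        if kind == "cls":
            body = text[1:-1]            # strip "[" and "]"
        elif kind == "negclass":
            body = "^" + text[2:-1]      # strip "![" and "]"
        else:
            out.append(text)             # literals and operators pass through
            continue
        # Pick the quote character that does not occur in the class body.
        quote = '"' if "'" in body else "'"
        out.append(f"r{quote}[{body}]{quote}")
    return "".join(out)


if __name__ == "__main__":
    print(to_arpeggio("digit <- [0-9]"))
    print(to_arpeggio("x <- '[' [0-9]+ ']'"))
```

Because quoted literals are recognised as tokens of their own, a rule containing `'['` would survive the translation, and a second serialiser for another parser's syntax could reuse `tokenize` unchanged.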

Member Author

The hacking is there to fit Arpeggio, which asks for regular expressions. From its documentation on grammars:

Regex matches are given as strings with prefix r (e.g. r'\d*\.\d*|\d+').

We'd need to think about an alternate toolset which doesn't rely on them to test the ADQL examples.
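
To make the effect of the character-class substitutions concrete, the sketch below replays the three `re.sub` calls from the `testpeg.py` diff on single rules. The wrapper name `adapt_char_classes` is hypothetical. The patterns are the ones from the diff, written as raw strings. The other preprocessing steps in `get_parser` (comment conversion, the `EOF` removal) are left out.

```python
import re


def adapt_char_classes(peg_rules):
    """Apply the three character-class substitutions from the diff, in order."""
    peg_rules = re.sub(r"\[", "r'[", peg_rules)      # [0-9]    -> r'[0-9]
    peg_rules = re.sub(r"!r'\[", "r'[^", peg_rules)  # !r'[...  -> r'[^...
    peg_rules = re.sub(r"\]", "]'", peg_rules)       # ...]     -> ...]'
    return peg_rules


if __name__ == "__main__":
    for rule in ("digit <- [0-9]", "_a <- ![A-Z0-9_]", "x <- '['"):
        print(rule, "=>", adapt_char_classes(rule))
```

The first two rules come out as `r'[0-9]'` and `r'[^A-Z0-9_]'`, as intended. The third shows why the approach is fragile: a bracket inside a quoted literal is rewritten as well, so it only works as long as the grammar contains no such literal.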


return ParserPEG(peg_rules,
root,