Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treatment of the inclusion command #85

Open
benjub opened this issue May 20, 2022 · 7 comments
Open

Treatment of the inclusion command #85

benjub opened this issue May 20, 2022 · 7 comments

Comments

@benjub
Copy link
Collaborator

benjub commented May 20, 2022

Curently, metamath.c (and mmverify.py) treat inclusion commands ($[ FILENAME $]) as exact copy-pasting, that is, the content of the file FILENAME is pasted in place of $[ FILENAME $] in the input stream before lexing. This permits jokes like the following:

benoit@ordi$ cat joke.mm
$[ joke1.mm $] $.
benoit@ordi$ cat joke1.mm
$c a
benoit@ordi$ ./metamath "read joke.mm" "quit"
Metamath - Version 0.198 7-Aug-2021           Type HELP for help, EXIT to exit.
MM> read joke.mm
Reading source file "joke.mm"... 18 bytes
Reading included file "joke1.mm"... 5 bytes
60 bytes were read into the source buffer.
The source has 1 statements; 0 are $a and 0 are $p.
No errors were found.  However, proofs were not checked.  Type VERIFY PROOF *
if you want to check them.
MM> quit

On the other hand, the EBNF in the book does not permit it. Here, I think that the EBNF way is preferable, and metamath.c is a bit too laxist. Thoughts ? @digama0 ?

@digama0
Copy link
Member

digama0 commented May 20, 2022

Yes, this is a limitation of the current implementation in metamath.c. The way it is written doesn't make it easy to fix...

@digama0 digama0 transferred this issue from metamath/metamath-book May 20, 2022
@benjub
Copy link
Collaborator Author

benjub commented May 23, 2022

@digama0 : Is this, roughly writing, because in metamath.c (and mmverify.py), FILENAME is treated by the lexer (I mean: the lexer, upon encountering '$[', immediately lexes the content of FILENAME), whereas this should be delayed (the lexer simply returns the sequence of tokens '$['; IDENT(FILENAME); '$]') and treated only by the parser ?
I am writing a new verifier and I took the second approach. Is this what you would recommend ? Is this what metamath-knife does ? (the rust code is a bit hard to read for me) (I guess @tirix has ideas on this matter)

@tirix
Copy link

tirix commented May 23, 2022

metamath-knife does a first pass to prepare for parallel parsing, which detects file includes and top-level chapters, as these are good places to split the database.
Because of that, in metamath-knife, file includes are only permitted at the top level, outside of any statement or frame.

There is a note about it in parser.rs:

We require file inclusions to be between statements at the top nesting level (this has been approved by Norman Megill).

So it is actually even more restrictive that the EBNF definition.

@benjub
Copy link
Collaborator Author

benjub commented May 23, 2022

Thanks @tirix. After a rapid look at https://github.com/david-a-wheeler/metamath-knife/blob/071292ef3a10350d0dae9e409b4921f91a08ce95/src/parser.rs it looks that lexing and parsing are kind of merged, with no clear distinction ?

Concerning my first question above, if inclusion commands are simply tokenized as '$['; IDENT(FILENAME); '$]' and the work is deferred to the parser, this makes my lexer simpler since it is then not "multifile" so does not need to keep track of the file it is reading... In progress on my side...

Actually, I think that having no reference at all to the "top nesting level" or "outermost scope" in the EBNF would make things clearer, and actually even easier to code. But I haven't thought about multithreading.

@digama0
Copy link
Member

digama0 commented May 23, 2022

I don't think you need to handle "outermost scope" at the parser level, but it is totally possible to do so - this is a context-free property. I think the majority of metamath verifiers treat $c scoping at verification time, not parse time: at verification time you have a stack of frames so it's easy to tell if you are at the top level.

@tirix
Copy link

tirix commented May 24, 2022

it looks that lexing and parsing are kind of merged

You could say so, in the sense that the parser.rs module handles both lexing and parsing. The lexing part is inside Scanner's get_raw function, which isolates a single token and creates a span, which is just noting its beginning and end, without copying.

Handling file inclusion at the parsing level seems sensible to me, and I would also agree to to forbid in our standard the "jokes" you're referring to. I've now had a look at the Book, and indeed it states in chapter 4.1.2:

A file inclusion command consists of $[ followed by a file name followed by $]. It is only allowed in the outermost scope (i.e., not between ${ and $}) and must not be inside a statement (e.g., it may not occur between the label of a $a statement and its $.).

The specification would have to be relaxed here, for the "outermost scope" part.

@tirix
Copy link

tirix commented May 24, 2022

...so to come back at the initial title of this issue, metamath.exe does not seem to be compliant with the Book here. Appendices are usually non-normative, and the actual specification shall be in the main chapters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants