Scanner trims the last newline #58

alexschneider · 2015-03-03T19:32:02Z

The last newline in the scanner doesn't really convey any information to the parser or program, and we have to handle it specially in multiple locations in the parser to ensure that programs with and without a trailing newline both get parsed properly. Is there any issue with trimming the last token from the scanned tokens if it's a newline?

@rachelriv specifically because you worked on the scanner

rachelriv · 2015-03-03T19:38:49Z

According to our grammar, a block is a sequence of statements followed by newlines (and optionally a return statement at the end).

Program ::= Block
Block   ::= (Stmt newline)* (ReturnStmt newline)?

If we remove the final newline token, then the final statement doesn't fit our grammar.
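To make the grammar constraint concrete, here is a minimal sketch in Python (not the project's actual parser). It assumes a hypothetical token stream where each statement has already been collapsed to the string "stmt" or "return" and each line break to "newline"; `fits_block` is an illustrative name, not something from the codebase.

```python
def fits_block(tokens):
    """Check tokens against: Block ::= (Stmt newline)* (ReturnStmt newline)?

    Hypothetical simplification: every statement is the single token
    "stmt" or "return", and every line break is the token "newline".
    """
    i = 0
    # (Stmt newline)*
    while tokens[i:i + 2] == ["stmt", "newline"]:
        i += 2
    # (ReturnStmt newline)?
    if tokens[i:i + 2] == ["return", "newline"]:
        i += 2
    # The block is valid only if every token was consumed.
    return i == len(tokens)


with_newline = ["stmt", "newline", "stmt", "newline"]
without_newline = ["stmt", "newline", "stmt"]

print(fits_block(with_newline))
print(fits_block(without_newline))
```

Under these assumptions, the stream that lacks the final newline leaves a dangling "stmt" that no production can consume, so the block is rejected.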

alexschneider · 2015-03-03T19:39:38Z

So what about adding a newline prior to the end of the file if it doesn't exist? That way we can assume it exists.

rachelriv · 2015-03-03T19:39:54Z

Why would we do that?

alexschneider · 2015-03-03T19:40:48Z

The alternative is just not at all parsing files that look like this:

if exp:
  xyz
end<EOF token>

rachelriv · 2015-03-03T19:42:41Z

Well we are the ones adding in the EOF token. I'm really not sure what you are getting at.

alexschneider · 2015-03-03T19:43:40Z

Some files don't end with a newline - there's an implicit EOF marker supplied by the operating system so we know where the file ends. Though it's best practice, not everyone puts a newline before the end of the file.

rachelriv · 2015-03-04T09:09:42Z

I understand what you are saying now! Thanks for the explanation.

If you can think of an elegant way to fix this, go ahead and implement it and submit a PR. However, I think this issue should be low on our priority list. I'd really like to get some more tests and a fully working parser first!

rtoal · 2015-03-04T15:13:23Z

Because your scripts are just lines of code, the need for the classic EOF token isn't really there. For files that are bracketed with, say, "program" and "end" (like Pascal), or that are allowed only one class, say, the EOF is important to ensure there is no additional source after the single syntactic structure allowed in the compilation unit. I believe in your case that emitting a newline when you hit the end of your stream will suffice. It would be a shame to put newline | eof everywhere in your grammar.

This is a great issue. Good find, Alex. Agree with Rachel that it can be postponed a bit. If emitting a newline at the end of file works for you, though, you can do it sooner.
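The "emit a newline at the end of the stream" idea could be sketched roughly as follows. This is an illustration in Python, not the project's scanner: it assumes the scanner produces a plain list of token strings, that the line-break token is the string "newline", and that no separate EOF token is appended. The name `ensure_trailing_newline` is hypothetical.

```python
NEWLINE = "newline"  # assumed name for the scanner's line-break token


def ensure_trailing_newline(tokens):
    """Return a copy of tokens that is guaranteed to end with NEWLINE.

    An empty stream is left empty (an assumption made here), so that an
    empty program does not turn into a lone newline with no statement.
    """
    result = list(tokens)
    if result and result[-1] != NEWLINE:
        result.append(NEWLINE)
    return result


# A file whose last line ("end") is not followed by a line break.
scanned = ["if", "exp", ":", NEWLINE, "xyz", NEWLINE, "end"]
print(ensure_trailing_newline(scanned))
```

With a normalization step like this at the end of scanning, the parser can assume every statement is followed by a newline, and the grammar does not need a newline | eof alternative.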
