Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct grammar of in-out expressions #194

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

amartini51
Copy link
Member

Fixes: #193

@amartini51
Copy link
Member Author

amartini51 commented Oct 10, 2023

The current grammar dates back to commit b0bc885 for Swift 1. The original RST langref grammar doesn't have specific discussion of in-out expressions; the older HTML langref says:

inout indicates that the argument will be passed as an "in-out" parameter. The caller must pass an lvalue decorated with the & prefix operator as the argument. Semantically, the value of the argument is passed "in" to the callee to a local value, and that local value is stored back "out" to the lvalue when the callee exits. This is normally indistinguishable from pass-by-reference semantics.

Neither of the langref formal grammars include a production for in-out-expression.

@amartini51 amartini51 requested a review from lorentey October 10, 2023 00:34
@amartini51
Copy link
Member Author

@lorentey Is this something you can review? You and I looked at some corrections and additions around in-out in 2020 (merged in commit 7c9e800).

This grammar avoids productions like &x+y -- even though x+y is a valid
expression, you can't use it this way in an in-out expression.  However,
it does still overproduce -- for example, a.b() is a valid postfix
expression but &a.b() isn't a valid in-out expression.

Co-authored-by: Joe Groff <[email protected]>
@lorentey
Copy link
Member

Interesting! I am not a Swift grammar expert, but to my naive, innocent eyes this does look like a better match with InOutExpr in AST. Not sure who would be a good person to confirm -- let's try @ahoppen, maybe?

Copy link
Member

@ahoppen ahoppen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks correct to me.

Moving up one level in the grammar introduces additional overgeneration,
but otherwise code like f(&(x)) doesn't match.
@amartini51
Copy link
Member Author

@ahoppen Thanks for taking a look. Reading it again this morning, I see that 'postfix-expression' doesn't include parenthesized-expression, but you can use parentheses in an in-out expression.

One option is to define 'in-out-expression' as follows:

in-out-expression& primary-expression

This overproduces even more — allowing literals, closures, and conditional expressions inside the in-out expression where the aren't actually allowed. Elsewhere in the reference, we often handle places where the grammar overproduces by explaining the limitations in prose.

Do we have a list of what we specifically do support after the & for an in-out expression? For example, are key paths ever allowed?

Trying code out in the REPL, we allow parentheses for grouping, dot for member lookup, square brackets for array subscripts. An alternate approach that might be feasible here, if the list of allowed in-out things is small, is to build them up in the in-out-expression grammar specifically by repeating/re-using bits of other grammar rules:

Grammar of an in-out expression:

in-out-expression& in-out-expression-tail
in-out-expression-tailin-out-expression-tail . decimal-digits
in-out-expression-tailin-out-expression-tail . identifier
in-out-expression-tailin-out-expression-tail [ function-call-argument-list ]
in-out-expression-tail( in-out-expression-tail )
in-out-expression-tailidentifier

It doesn't look like we support tuple expressions — but the error I get suggests maybe we do and I need to spell something more explicitly:

var x = 10; var y = 10
func f(_ z: inout (Int, Int)) { z = (100, 200) }
f( &(x, y) )
// error: repl.swift:3:1: error: type of expression is ambiguous without more context
// f( &(x, y) )
// ^~~~~~~~~~~~

There isn't a clear point in the expression grammar where everything
produced by that rule is allowed in an in-out expression, but nothing
else, so we will always either over- or under-produce if we write
in-out-expression in terms of existing productions.

Combined rules from the parts of the grammar that are definitely
supported to make these production rules, although this list might be
incomplete:

- The grammar for . is borrowed from explicit-member-expression
- The grammar for [] is borrowed from subscript-expression
- The grammar for () is adapted from parenthesized-expression
@ahoppen
Copy link
Member

ahoppen commented Oct 11, 2023

The Swift parser itself allows any expression after &. parseExprUnary calls back into parseExprUnary after consuming the &.

https://github.com/apple/swift/blob/e298fb5a046cde1b588a8e39ba622ac6e17b5dba/lib/Parse/ParseExpr.cpp#L616-L626

Any enforcement that you can’t pass a literal after the ampersand is semantic. For example the diagnostic of passing an integer literal is the same as when passing a let constant.

func foo(_ x: inout Int) {}

foo(&1) // error: Cannot pass immutable value as inout argument: literals are not mutable

let myInt = 1
foo(&myInt) // error: Cannot pass immutable value as inout argument: 'myInt' is a 'let' constant

I’m not entirely sure what’s used to represent parseExprUnary in the grammar but I think that’s what we should use as a child here.

CC @rintaro In case you have opinions / thoughts as well.

@ahoppen ahoppen requested a review from rintaro October 11, 2023 20:06
@rintaro
Copy link
Member

rintaro commented Oct 11, 2023

Grammar of an in-out expression:

in-out-expression → & in-out-expression-tail
in-out-expression-tail → in-out-expression-tail . decimal-digits
in-out-expression-tail → in-out-expression-tail . identifier
in-out-expression-tail → in-out-expression-tail [ function-call-argument-list ]
in-out-expression-tail → ( in-out-expression-tail )
in-out-expression-tail → identifier

This is not enough. At least it's missing

in-out-expression-tail → 'self'
in-out-expression-tail → in-out-expression-tail . 'self'

And somewhat surprisingly, the following code also works

struct S {
  static var shared: Self = S()
}

func foo(_ x: inout S) {}
foo(&.shared)

As @ahoppen mentioned, the parser parses any prefix-expression (parseExprUnary). And the semantic analysis decides if it accepts the operand. IMO, as the grammar, it's OK to say just

in-out-expression → '&' postfix-expression

NOTE: not prefix-expression because both &&foo or &!foo is parsed as && or &! prefix-operator respectively.

@rintaro
Copy link
Member

rintaro commented Oct 12, 2023

On second thought,

the parser parses any prefix-expression (parseExprUnary). And the semantic analysis decides if it accepts the operand.

This is similar to keypath expression. Parser parses any postfix components, but in the current grammar, key-path-components only accepts certain suffixes. I don't have strong opinion on whether the grammar should reflect technically-semantic things, or not . But at least we should be consistent.

@amartini51
Copy link
Member Author

amartini51 commented Oct 12, 2023

Thanks, @rintaro.

TSPL doesn't generally distinguish between parsing, lexing, semantic analysis, etc. — so its formal grammar doesn't specifically try to include content that parses but is semantically invalid. If there's something that isn't valid Swift code, we try to exclude it from the grammar, regardless of which stage of compilation rejects it.

Defining 'in-out-expression' in terms of 'prefix-expression' as in your comment doesn't entirely work, because grouping parentheses appear in the grammar in the production for 'primary-expression', but they're valid after the &. I see three approaches we could take:

  • Define 'in-out-expression' in terms of 'postfix-expression' and re-implement the productions for grouping parentheses as part of 'in-out-expression'. I don't particularly like this — it still overproduces, and it's less than ideal to have grouping parentheses defined in multiple places in the grammar. [EDIT: That's incorrect; 'postfix-expression' includes 'primary-expression', which includes 'parenthesized-expression'. No need to re-implement grouping.]
  • Define 'in-out-expression' in terms of 'primary-expression' and add a prose description of what's actually allowed. We've used this to handle overproduction elsewhere in the grammar, where it's not practical to reduce it in the grammar.
  • Define 'in-out-expression' in terms of an explicit list of what grouping and postfixes are allowed, similar to how 'key-path-expression' handles things. (That approach was added by commit 2757f7a and accumulated additional pieces over time.)

If the third option produces a reasonable grammar, that's my initial preference. That said, it might not be practical — adding implicit member expressions and self expressions might make the list of productions here unwieldy.

For the second option, I'd add a paragraph before the "for more information" line like the following:

The expression is a combination of the following kinds of expression:

  • An identifier or self
  • Access to a member, including properties, tuple elements, and .self
  • Access to an array subscript

Member access can be explicit like f(&x.y) or implicit like f(&.z).
The expression can also include grouping parentheses.

The previous grammar didn't include `self` or `.self` or implicit member
expressions, as Rintaro Ishizaki <[email protected]> noted on the PR.
Duplicating those productions here, with modifications to prevent them
from pulling in all of primary-expression or postfix-expression, would
make this grammar a bit too long and unwieldy.
amartini51 added a commit that referenced this pull request Oct 20, 2023
The old grammar didn't allow things like f(&x.y) which are legal in
Swift. The new grammar allows a bunch of extra stuff -- literals,
closures, and conditional expressions -- but that's a better problem to
have.

See also #194, where we're discussing a more specific fix.

Fixes: #193 (partially)
@sarahcantohyatt
Copy link
Contributor

sarahcantohyatt commented Oct 26, 2023

Hi everyone.
I'm the person who originally posted the issue about this in #193

I've read through this thread along with the partial fix that was merged.
I just wanted to leave my thoughts here:

I think that @rintaro proposed the fix that I believe is the most correct with & postfix-expression
This allows everything that is syntactically valid and pushes rejection to semantic analysis.
A language grammar is really a syntactic representation of the language, not semantic.
And of course we want the grammar to be readable and not too complex thus we end up with this mix between concrete and abstract grammar that defines the Swift grammar.
A side effect of this is that there's going to be overproduction and lots of it.
I could provide dozens of examples here of overproductions in other areas of the grammar.

With that being said, I also see the concern of consistency when looking at how keypath-expression is handled.
It's an idea of "If overproduction is limited there, where else should it be limited?"

I'm not sure it's my place to be chiming in on that decision since it's really a design decision.
I just wanted to share my thoughts and also point out that the partial fix doesn't address the original issue that I brought up (&s.a).
s.a was produced through the production postfix-expression. So if we limit it to & primary-expression the original example is still not producible.
Thus, defining it in terms of primary-expression and adding a prose specification would not work.

Thanks for the consideration.

@amartini51
Copy link
Member Author

amartini51 commented Nov 2, 2023

This got incorrectly (and accidentally) closed as part of merging #199 because both PRs resolve the same GitHub issue. EDIT: Or possibly because commit 028a2ec links to this PR.

@amartini51 amartini51 reopened this Nov 2, 2023
amartini51 and others added 2 commits November 2, 2023 14:05
The grammar for postfix-expression produces this case, and includes
primary-expression.  This grammar also allows for production of
parentheses because primary-expression produces parenthesized-expression.

Co-authored-by: Sarah Canto <[email protected]>
Kept this branch's changes to the grammar.
@amartini51 amartini51 marked this pull request as draft December 19, 2023 23:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error in grammar regarding in-out-expression
5 participants