Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug(sqlsmith): generator needs to consider expression precedence #7671

Closed
kwannoel opened this issue Feb 2, 2023 · 3 comments · Fixed by #7672
Closed

bug(sqlsmith): generator needs to consider expression precedence #7671

kwannoel opened this issue Feb 2, 2023 · 3 comments · Fixed by #7672
Assignees
Labels
type/bug Something isn't working

Comments

@kwannoel
Copy link
Contributor

kwannoel commented Feb 2, 2023

Describe the bug

#7665 (comment)

LGTM, it is failing because of an issue on sqlsmith side I think.

select 'a' = lower('jRvXgnxBXI') NOT IN (true, false, false, true, false, false, false, true, true);

The following query is generated by sqlsmith, but it is invalid.

To Reproduce

No response

Expected behavior

No response

Additional context

No response

@kwannoel kwannoel added the type/bug Something isn't working label Feb 2, 2023
@github-actions github-actions bot added this to the release-0.1.17 milestone Feb 2, 2023
@kwannoel kwannoel self-assigned this Feb 2, 2023
@mergify mergify bot closed this as completed in #7672 Feb 2, 2023
@xiangjinwu
Copy link
Contributor

IIUC, the problem may be deeper and more widespread.

Is it possible that sqlsmith generate an AST like Plus(1, Plus(1, 2))?

Given input 1 + 2 + 3, the parser will build Plus(Plus(1, 2), 3), which unparses back to 1 + 2 + 3.
Given input 1 + (2 + 3), the parser will build Plus(1. Nested(Plus(2, 3))), which unparses back to 1 + (2 + 3).

Th parser itself would never build Plus(1, Plus(2, 3)), which cannot unparse and parse to the same AST.

Maybe related to #7635.

One idea is that parsed AST should not be aware of parentheses (i.e. remove Expr::Nested, also related #6717 (comment)), but I have not checked why it was introduced in upstream initially.

@kwannoel
Copy link
Contributor Author

kwannoel commented Feb 3, 2023

IIUC, the problem may be deeper and more widespread.

Is it possible that sqlsmith generate an AST like Plus(1, Plus(1, 2))?

Given input 1 + 2 + 3, the parser will build Plus(Plus(1, 2), 3), which unparses back to 1 + 2 + 3. Given input 1 + (2 + 3), the parser will build Plus(1. Nested(Plus(2, 3))), which unparses back to 1 + (2 + 3).

Th parser itself would never build Plus(1, Plus(2, 3)), which cannot unparse and parse to the same AST.

Maybe related to #7635.

One idea is that parsed AST should not be aware of parentheses (i.e. remove Expr::Nested, also related #6717 (comment)), but I have not checked why it was introduced in upstream initially.

Makes sense. We currently evade this problem in sqlsmith because of #7635.

infix binop gets unparsed with parenthesis, hence we are able to parse and unparse (Plus(1, Plus(1,2)) back to same expression.

After #7636, this "feature" of sqlsmith will be fixed.

Additionally in 7636 we will also parenthesize infix and postfix expressions. Sqlsmith will then avoid generating invalid forms as described.


I recommend we keep Nesting given that it mainly affects sqlsmith (and not much else?), but there's already a workaround. Without it, it will be much harder to generate expressions. Unparsing a valid parenthesized expression may result in an invalid sql string:
e.g.

1 + (2 + 3)

Can't be displayed unambiguously without Nested being preserved in AST.

@xiangjinwu
Copy link
Contributor

xiangjinwu commented Feb 7, 2023

Basically 2 different approaches:

  • Plus(1, Plus(1, 2)) is an invalid AST. fmt/unparse should not add extra parentheses. sqlsmith should avoid generating the invalid AST, possibly via wrapping with Expr::Nested aggressively.
  • Remove Expr::Nested. fmt/unparse need to wrap subexpr with parentheses aggressively. Plus(1, Plus(1, 2)) is valid and canonical AST of 1 + (2 + 3) and sqlsmith can just generate this AST. (To elaborate, 1 + 2 + 3 -> Plus(Plus(1, 2), 3) -> (1 + 2) + 3 -> Plus(Plus(1, 2), 3))

The upstream sqlparser is following the former approach, so does the recent PR #7636. But we used the latter parentheses approach when introducing sqlsmith (#3305). This may be how the bug arose and there may be no problem as long as we stick to one approach consistently rather than mixing them.

it mainly affects sqlsmith (and not much else?)

Another motivation to make parentheses insignificant in AST (i.e. remove Expr::Nested) is #6717 (comment) . But as mentioned above, the latter approach diverges from upstream significantly and likely requires refactoring of existing code. There may also be ways to support op ALL(subquery) regardless of number of parentheses following the first approach. We can defer it until then. For now, it is true that this only affects sqlsmith and I agree fixing it on sqlsmith side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants