Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add ANY/ALL/SOME(array) operator to support get_indexes in sqlalchemy #6717

Merged
merged 17 commits into from
Jan 9, 2023

Conversation

HuaHuaY
Copy link
Contributor

@HuaHuaY HuaHuaY commented Dec 3, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

As the title.
These operators are used in the right side of binary operators which return boolean value. They require that there are some expressions in the parentheses, which return arrays. SOME is a synonym for ANY.
In PostgreSQL, there are some other operators named ANY/ALL/SOME(subquery). They are not supported in this PR.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

Types of user-facing changes

  • SQL commands, functions, and operators

Release note

support ANY/ALL/SOME(array) operator.
PostgreSQL's doc:
ANY/SOME (array): https://www.postgresql.org/docs/current/functions-comparisons.html#id-1.5.8.30.16
ALL (array): https://www.postgresql.org/docs/current/functions-comparisons.html#id-1.5.8.30.17

Refer to a related PR or issue link (optional)

related #6636

@codecov
Copy link

codecov bot commented Dec 3, 2022

Codecov Report

Merging #6717 (6175e39) into main (268df0d) will decrease coverage by 0.05%.
The diff coverage is 34.88%.

@@            Coverage Diff             @@
##             main    #6717      +/-   ##
==========================================
- Coverage   73.02%   72.96%   -0.06%     
==========================================
  Files        1063     1064       +1     
  Lines      169728   169983     +255     
==========================================
+ Hits       123937   124028      +91     
- Misses      45791    45955     +164     
Flag Coverage Δ
rust 72.96% <34.88%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/expr/src/expr/build_expr_from_prost.rs 42.73% <0.00%> (-7.10%) ⬇️
src/expr/src/expr/expr_some_all.rs 0.00% <0.00%> (ø)
src/expr/src/expr/mod.rs 40.35% <0.00%> (-0.73%) ⬇️
src/frontend/src/expr/mod.rs 80.08% <ø> (ø)
src/stream/src/executor/dynamic_filter.rs 93.42% <ø> (ø)
src/stream/src/from_proto/dynamic_filter.rs 0.00% <ø> (ø)
src/frontend/src/binder/expr/binary_op.rs 75.67% <77.27%> (+13.17%) ⬆️
src/frontend/src/expr/function_call.rs 87.30% <81.81%> (-0.53%) ⬇️
src/sqlparser/src/parser.rs 92.07% <90.00%> (-0.03%) ⬇️
src/frontend/src/expr/type_inference/func.rs 88.54% <94.28%> (+0.27%) ⬆️
... and 13 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@HuaHuaY HuaHuaY requested a review from BowenXiao1999 December 5, 2022 05:18
Comment on lines +1059 to +1062
// TODO: support `all/any/some(subquery)`.
if let Expr::Subquery(_) = &right {
parser_err!("ANY/SOME/ALL(Subquery) is not implemented")?;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this only covers part of the cases:

select 1 < ALL(select 1); -- parse_expr error above
select 1 < ALL((select 1)); -- this check
select 1 < ALL(((select 1))); -- allowed by PG but rejected for mismatching type by us
select 1 < ALL(((select array[1]))); -- disallowed by PG but evaluated as scalar subquery by us

A full fix may take a bigger effort. Just pointing it out and lets focus on non-subquery first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create a new issue to implement all/any/some(subquery) and avoid inconsistent behavior.

src/sqlparser/tests/sqlparser_common.rs Outdated Show resolved Hide resolved
Comment on lines 125 to 128
self.compare_left_right(
left.map(|s| s.into_scalar_impl()),
right.map(|s| s.into_scalar_impl()),
)?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, we should avoid relying on repeated eval_row calls inside eval. Especially when we are already considering deprecate it. cc @TennyZhuang

Ideally, SomeAllExpression (or ScalarArrayOpExpr in PostgreSQL) takes 3 generic parameter <L, R, F> similar to BinaryExpression<L, R, O, F>, where output array type is omitted because it is always bool. That is, we store the inner operator in the form of a rust function (func: F) rather than a BoxedExpression. But this seems requires a refactor of our way of building expressions, similar to the frontend binder refactor of introducing resolve_binary_operator - we just want the match arms without building the whole expr.

Alternatively, we can still embed BoxExpression but avoid eval_row. Borrowing from the idea of short-circuit evaluation (#6208), we can first eval over a chunk of 2 columns consisting (lhs, rhs[0]), mask out with visibility for rows already determined (ALL-false/SOME-true/array-len-reached), then continue with another chunk of (lhs, rhs[1])...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this what we will change in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But I'm doing other work now, so this modification will be delayed for a while.

&bound_right,
)?);

FunctionCall::new_binary_op_func(func_types, vec![bound_left, bound_right])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refactor and introduction of dedicated resolve_binary_operator/new_binary_op_func/infer_some_all seems have some space of improvement. But compared to the backend eval_row issue, let's focus on that first and then unblock future development of sqlalchemy.

Talking about the better solution, note that:

  • Concat never returns a boolean and can be excluded / treated specially.
  • PGRegexMatch/PGRegexNotMatch actually needs to have their dedicated ExprType rather than current RegexpMatch+IsNull. fix(expr): regex operator ~ / !~ should return null when either input is null #5136
  • After the fix above, resolve_binary_operator no longer needs to return Vec<ExprType> for arbitrary levels of nesting. It only needs (ExprType, bool) for one level and an optional negation.
  • PostgreSQL also forbids x SIMILAR TO ALL(y) simply because SIMILAR TO is not a native operator but rewritten into multiple exprs.

The fact of operator being a single ExprType may help simplify these parts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create a new issue to track this.

Copy link
Contributor

@BowenXiao1999 BowenXiao1999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the query part. I'm not so sure about the impl in binder and expression so pls follow @xiangjinwu

@HuaHuaY HuaHuaY requested a review from xiangjinwu December 22, 2022 08:54
Copy link
Contributor

@xiangjinwu xiangjinwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most lgtm. Just found more edge cases:

src/frontend/src/expr/function_call.rs Outdated Show resolved Hide resolved
src/expr/src/expr/expr_some_all.rs Outdated Show resolved Hide resolved
@xiangjinwu xiangjinwu self-assigned this Jan 4, 2023
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 2607 files.

Valid Invalid Ignored Fixed
1248 1 1358 0
Click to see the invalid file list
  • src/expr/src/expr/expr_some_all.rs

src/expr/src/expr/expr_some_all.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@xiangjinwu xiangjinwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed remaining issues on his behalf.

@mergify mergify bot merged commit c30aaae into main Jan 9, 2023
@mergify mergify bot deleted the zehua/some_any_all branch January 9, 2023 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants