feat(binder): correctly bind rcte in bind_with & bind_relation_by_name #16023

Merged
merged 13 commits into from
Apr 3, 2024

Conversation

@xzhseh xzhseh commented Mar 30, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

follow-up work to #15522 for binding recursive CTEs (rcte).

related: #15681.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

// https://www.postgresql.org/docs/16/sql-select.html#:~:text=the%20recursive%20self%2Dreference%20must%20appear%20on%20the%20right%2Dhand%20side%20of%20the%20UNION
let bound_base = self.bind_set_expr(*left)?;
// todo: to be further reviewed
fn gen_query(s: SetExpr) -> Query {
Contributor Author

@xzhseh xzhseh Mar 30, 2024

ps. the below example is copied from https://github.com/risingwavelabs/risingwave/pull/15522/files#r1524367781.

with recursive t as
    ((select 1 limit 1)
    union all
    (select n+1 from t as t(n) where n < 5 limit 3));

under the current definition of union, we can only get a SetExpr, rather than an entire Query, for left and right.

thus, should we use BoundSetExpr as a workaround (and also KISS) for RecursiveUnion at present, or conform to postgres's behavior from the start, which would of course require modifying more code?

Contributor

Using BoundSetExpr LGTM, because BoundQuery and BoundSetExpr actually contain each other. 🥵

Contributor Author

then let's stick with BoundSetExpr first.
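
To make the "contain each other" point concrete, here is a minimal sketch, assuming heavily simplified stand-ins for the binder types. The variants, fields, and the helpers `limited` and `example_union` are hypothetical and only illustrate why an arm with its own `limit` still fits into a BoundSetExpr.

```rust
// Simplified, hypothetical stand-ins for the binder types discussed above;
// the real definitions carry many more fields and variants.
#[derive(Debug, Clone, PartialEq)]
pub enum BoundSetExpr {
    /// A bare `SELECT ...`, reduced here to its SQL text.
    Select(String),
    /// A parenthesized sub-query: a set expression holding a full query.
    Query(Box<BoundQuery>),
}

/// A full query: a set expression body plus query-level clauses.
#[derive(Debug, Clone, PartialEq)]
pub struct BoundQuery {
    pub body: BoundSetExpr,
    pub limit: Option<u64>,
}

/// Both arms are set expressions, so an arm that needs its own `LIMIT`
/// is stored as `BoundSetExpr::Query`.
#[derive(Debug, Clone, PartialEq)]
pub struct BoundRecursiveUnion {
    pub base: Box<BoundSetExpr>,
    pub recursive: Box<BoundSetExpr>,
}

/// Hypothetical helper: wraps a select body and a limit into a set expression.
pub fn limited(select: &str, limit: u64) -> BoundSetExpr {
    BoundSetExpr::Query(Box::new(BoundQuery {
        body: BoundSetExpr::Select(select.to_string()),
        limit: Some(limit),
    }))
}

/// Mirrors the `limit 1` / `limit 3` example quoted earlier in this thread.
pub fn example_union() -> BoundRecursiveUnion {
    BoundRecursiveUnion {
        base: Box::new(limited("select 1", 1)),
        recursive: Box::new(limited("select n+1 from t as t(n) where n < 5", 3)),
    }
}
```

In this sketch, nothing is lost by keeping set expressions on both sides: the per-arm limits survive inside the nested query.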

@@ -377,16 +378,23 @@ impl Binder {
        Ok(Relation::BackCteRef(Box::new(BoundBackCteRef { share_id })))
    }
    BindingCteState::Bound { query } => {
        let schema = match query.clone() {
            Left(normal) => normal.body.schema().clone(),
            Right(recursive) => recursive.recursive.body.schema().clone(),
Contributor Author

same, could we just use the recursive schema here?

Contributor

RecursiveUnion is also a union, so we need to align its inputs' schemas; after that, we can use either input's schema as the union's schema.
See bind_set_expr for more details:

pub(super) fn bind_set_expr(&mut self, set_expr: SetExpr) -> Result<BoundSetExpr> {
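
As a rough illustration of "align the inputs, then use either schema", here is a minimal sketch. It assumes a toy `DataType` with a made-up widening order and a hypothetical `align_schemas` function; the real alignment logic in bind_set_expr is more involved and is not reproduced here.

```rust
/// Toy column types, declared from narrowest to widest so that the derived
/// `Ord` can stand in for a (hypothetical) implicit-cast order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum DataType {
    Int32,
    Int64,
    Float64,
}

/// Hypothetical helper: computes one schema that both union inputs can be
/// cast to. Once both sides are cast to it, either side's schema can serve
/// as the schema of the union.
pub fn align_schemas(left: &[DataType], right: &[DataType]) -> Result<Vec<DataType>, String> {
    if left.len() != right.len() {
        return Err(format!(
            "each UNION input must have the same number of columns: {} vs {}",
            left.len(),
            right.len()
        ));
    }
    // Column by column, pick the wider of the two types.
    Ok(left.iter().zip(right).map(|(l, r)| *l.max(r)).collect())
}
```

For example, aligning `[Int32, Int64]` with `[Int64, Int32]` gives `[Int64, Int64]` under this toy ordering, and a column-count mismatch is reported as an error.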

Contributor Author

@xzhseh xzhseh Apr 2, 2024

plus, for a case like the one below (e.g., <value> union all <select stmt>),
we might directly use the rhs's schema as the final schema.

  • since align_schema only handles select expr(s).
with recursive t(n) as (
    values (1)
    union all
    select n + 1 from t where n < 100
)
select * from t;

@TennyZhuang
Contributor

What happens if we write a CTE with the recursive keyword but without an actual recursive reference? It seems to be accepted in pg.

dev=# with recursive t AS (select 1) select * from t;
 ?column?
----------
        1
(1 row)

dev=#
dev=# with recursive t AS (select 1 UNION ALL select 1) select * from t;
 ?column?
----------
        1
        1
(2 rows)

cc @xiangjinwu Can you give some ideas?

@xzhseh
Contributor Author

xzhseh commented Apr 1, 2024

What happens if we write a CTE with the recursive keyword but without an actual recursive reference? It seems to be accepted in pg.

dev=# with recursive t AS (select 1) select * from t;
 ?column?
----------
        1
(1 row)

dev=#
dev=# with recursive t AS (select 1 UNION ALL select 1) select * from t;
 ?column?
----------
        1
        1
(2 rows)

cc @xiangjinwu Can you give some ideas?

one solution might be - treat it as normal union iff we find the base and recursive parts are both constant-like expr;

but,

  1. this involves extra (and potentially large) conditional checking paths to detect these cases - since they are already a BoundQuery.
  2. will there be any other corner case?

@TennyZhuang
Contributor

treat it as normal union iff we find the base and recursive parts are both constant-like expr;

  1. Currently, we reject a recursive CTE without a top-level UNION. I guess we can try to bind it as a non-recursive CTE when we meet this case.
  2. For a CTE with a recursive definition but a non-recursive body, maybe we can convert it to a normal UNION in the logical planner?

cc @dylan

Not necessary to resolve the case in the PR.

@@ -58,6 +58,11 @@ impl Planner {
    Relation::BackCteRef(..) => {
        bail_not_implemented!(issue = 15135, "recursive CTE is not supported")
    }
    // todo: ensure this will always be wrapped in a `Relation::Share`
Contributor Author

correct me if this is not true.

@chenzl25
Contributor

chenzl25 commented Apr 2, 2024

treat it as normal union iff we find the base and recursive parts are both constant-like expr;

  1. Currently, we reject a recursive CTE without a top-level UNION. I guess we can try to bind it as a non-recursive CTE when we meet this case.
  2. For a CTE with a recursive definition but a non-recursive body, maybe we can convert it to a normal UNION in the logical planner?

cc @dylan

Not necessary to resolve the case in the PR.

Yes, I think we can just bind it to a normal UNION operator when it is proved that it is not a recursive CTE.

@xiangjinwu
Contributor

What happens if we write a CTE with the recursive keyword but without an actual recursive reference? It seems to be accepted in pg.

cc @xiangjinwu Can you give some ideas?

#15522 (comment)

If we know, at the binding phase, it is not recursive, we can just bind to a normal union (wrapped inside SetExpr and Query). As per my comment in that PR, the top level would be Either<BoundQuery, RecursiveUnion>.
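
A minimal sketch of that top-level shape, assuming a hand-rolled `Either` and string-based stand-ins for BoundQuery and RecursiveUnion. The function `bind_cte_body` and its `is_recursive` parameter are hypothetical; how the binder would actually learn that the body is non-recursive is the open question in this thread.

```rust
/// Hand-rolled two-way sum type, standing in for whatever `Either` the
/// binder uses.
#[derive(Debug, Clone, PartialEq)]
pub enum Either<L, R> {
    Left(L),
    Right(R),
}

/// Stand-in for a bound query; the body is kept as SQL text for brevity.
#[derive(Debug, Clone, PartialEq)]
pub struct BoundQuery {
    pub body: String,
}

/// Stand-in for the recursive union with its base and recursive terms.
#[derive(Debug, Clone, PartialEq)]
pub struct RecursiveUnion {
    pub base: String,
    pub recursive: String,
}

/// Hypothetical: chooses the top-level representation once we know whether
/// the right-hand side really references the CTE being defined.
pub fn bind_cte_body(
    base: &str,
    rhs: &str,
    is_recursive: bool,
) -> Either<BoundQuery, RecursiveUnion> {
    if is_recursive {
        Either::Right(RecursiveUnion {
            base: base.to_string(),
            recursive: rhs.to_string(),
        })
    } else {
        // Not recursive after all: fall back to an ordinary union query.
        Either::Left(BoundQuery {
            body: format!("{} UNION ALL {}", base, rhs),
        })
    }
}
```

With this shape, `with recursive t AS (select 1 UNION ALL select 1)` would end up on the `Left` side as a plain union.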

@TennyZhuang
Contributor

If we know, at the binding phase, it is not recursive

I guess it’s a little hard for our binder to know this, since we don’t have visitor on binder structure?

Or we can just store a flag in context.

@xzhseh
Contributor Author

xzhseh commented Apr 2, 2024

I guess it’s a little hard for our binder to know this, since we don’t have visitor on binder structure?

a somewhat "hacky" solution would be to check (with the recursive flag enabled) whether a pattern like select (...) from <cte_name> appears in the rhs (a.k.a. the recursive term) of the union we are currently binding; if so, treat this as a recursive cte, otherwise as a normal one.

the problem is that this is definitely error-prone and adds extra (potentially ugly) hard-coded checks to bind_with; also, the semantics may not be exactly aligned with postgres's.

Or we can just store a flag in context.

could you elaborate on this approach? I don't quite get the idea.
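
One possible reading of the "flag in context" idea, sketched under assumptions: the context remembers which CTE names are currently being bound, and name resolution flips a flag when one of them is hit (roughly the point where a BackCteRef would be produced). `BindContext`, its methods, and `is_recursive` are hypothetical names, not the binder's actual API.

```rust
use std::collections::HashMap;

/// Hypothetical binder context: tracks CTEs that are still being bound and
/// whether each has been referenced from inside its own definition.
#[derive(Debug, Default)]
pub struct BindContext {
    binding_ctes: HashMap<String, bool>,
}

impl BindContext {
    /// Called when binding of the CTE definition starts.
    pub fn start_cte(&mut self, name: &str) {
        self.binding_ctes.insert(name.to_string(), false);
    }

    /// Called whenever a relation name is resolved while binding the body.
    pub fn resolve_relation(&mut self, name: &str) {
        if let Some(seen) = self.binding_ctes.get_mut(name) {
            *seen = true; // a self-reference was found
        }
    }

    /// Called when the definition is fully bound; reports the flag.
    pub fn finish_cte(&mut self, name: &str) -> bool {
        self.binding_ctes.remove(name).unwrap_or(false)
    }
}

/// Drives the context over the relation names found in the recursive term.
pub fn is_recursive(cte_name: &str, relations_in_rhs: &[&str]) -> bool {
    let mut ctx = BindContext::default();
    ctx.start_cte(cte_name);
    for rel in relations_in_rhs.iter().copied() {
        ctx.resolve_relation(rel);
    }
    ctx.finish_cte(cte_name)
}
```

Compared with matching on a select (...) from <cte_name> pattern, this piggybacks on name resolution that the binder performs anyway, so no extra traversal of the bound structure would be needed.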

@xzhseh xzhseh changed the title feat(binder): update RecursiveUnion and related function(s) feat(binder): correctly bind rcte in bind_with & bind_relation_by_name Apr 2, 2024
}

impl RewriteExprsRecursive for BoundRecursiveUnion {
fn rewrite_exprs_recursive(&mut self, _rewriter: &mut impl crate::expr::ExprRewriter) {}
Contributor

It seems the implementation should be blanket?

Contributor Author

what does "blanket" mean here?

ps. this part is the same as BoundBackRef.

Contributor

The operator should be similar to BoundSetExpr, but not BoundBackRef

Contributor Author

updated - now we will rewrite the two BoundSetExpr respectively.
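
A minimal sketch of what "rewrite the two BoundSetExpr respectively" could look like, assuming toy versions of the traits and types. `Expr`, the `exprs` field, and the `AddOne` rewriter are hypothetical and exist only to make the delegation observable.

```rust
/// Toy expression: just an integer literal.
#[derive(Debug, Clone, PartialEq)]
pub struct Expr(pub i64);

/// Toy counterpart of the expression rewriter.
pub trait ExprRewriter {
    fn rewrite_expr(&mut self, expr: Expr) -> Expr;
}

/// Toy counterpart of the trait implemented in the diff above.
pub trait RewriteExprsRecursive {
    fn rewrite_exprs_recursive(&mut self, rewriter: &mut impl ExprRewriter);
}

/// Stand-in set expression that simply owns a list of expressions.
#[derive(Debug, Clone, PartialEq)]
pub struct BoundSetExpr {
    pub exprs: Vec<Expr>,
}

impl RewriteExprsRecursive for BoundSetExpr {
    fn rewrite_exprs_recursive(&mut self, rewriter: &mut impl ExprRewriter) {
        // Take the expressions out, rewrite each one, and put them back.
        let exprs = std::mem::take(&mut self.exprs);
        self.exprs = exprs.into_iter().map(|e| rewriter.rewrite_expr(e)).collect();
    }
}

pub struct BoundRecursiveUnion {
    pub base: Box<BoundSetExpr>,
    pub recursive: Box<BoundSetExpr>,
}

impl RewriteExprsRecursive for BoundRecursiveUnion {
    fn rewrite_exprs_recursive(&mut self, rewriter: &mut impl ExprRewriter) {
        // Not a no-op: delegate to both inputs, like a set operation would.
        self.base.rewrite_exprs_recursive(rewriter);
        self.recursive.rewrite_exprs_recursive(rewriter);
    }
}

/// Example rewriter that increments every literal.
pub struct AddOne;

impl ExprRewriter for AddOne {
    fn rewrite_expr(&mut self, expr: Expr) -> Expr {
        Expr(expr.0 + 1)
    }
}
```

Running `AddOne` over a union whose base holds `Expr(1)` and whose recursive term holds `Expr(2)` and `Expr(3)` rewrites both sides, which an empty implementation would silently skip.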

Contributor

@TennyZhuang TennyZhuang left a comment

It's really hard to test the binder in our codebase; the code LGTM. We can refine it later when we finish the planner part.

e.key()
))
.into());
if let BindingCteState::Bound { .. } =
Contributor Author

@xzhseh xzhseh Apr 3, 2024

the special check here is needed because the lateral context typically contains the same content as the new bound context when binding the final statement for a rcte; that's why we don't want to return an error for a case like this.

with recursive t(n) as (
    values (1)
    union all
    select n + 1 from t where n < 100
)
select * from t;
______________^ here

ps. need further review, cc @chenzl25 @TennyZhuang.
pss. I'll merge this pr first.

Contributor

Updated here: #16282

@xzhseh xzhseh merged commit 78a2422 into bind_rcte Apr 3, 2024
5 of 15 checks passed
@xzhseh xzhseh deleted the bind_rcte_tmp_111 branch April 3, 2024 18:38
@xzhseh xzhseh mentioned this pull request Apr 3, 2024
9 tasks