Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(optimizer): reduce expr tree depth when merge logical operations #17342

Merged
merged 11 commits into from
Jun 21, 2024

Conversation

BugenZhao
Copy link
Member

@BugenZhao BugenZhao commented Jun 19, 2024

Signed-off-by: Bugen Zhao [email protected]I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

When creating an expression with a series of logical operations OR or AND together, instead of chaining it simply with fold or reduce, try to build a perfect binary tree.

There are no significant benefits to doing this, but it reduces the depth of the expression tree, making it more readable and less likely to cause a stack overflow (see #17224 for the background).

A real customer example for stack overflow is a complicated full outer join after #8520, where we apply a NotNull filter on all key columns chained with OR. Added as a test case.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@github-actions github-actions bot added the type/fix Bug fix label Jun 19, 2024
/// Merge the given expressions by the a logical operation.
///
/// The `op` must be commutative and associative, typically `And` or `Or`.
pub(super) fn merge_expr_by_logical<I>(exprs: I, op: ExprType, identity_elem: ExprImpl) -> ExprImpl
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core change is in this function. Also provide wrappers like ExprImpl::and, ExprImpl::or to adopt it more.

@chenzl25
Copy link
Contributor

chenzl25 commented Jun 19, 2024

I am confused by Postgres's behavior of expression evaluation order. @xiangjinwu Do you have any idea? Any potential risk if we reorder the expression among conjunctions or disjunctions?
Postgres result:

dev=#  select false and (1/0 is null);
 ?column?
----------
 f
(1 row)

dev=# select (1/0 is null) false and ;
ERROR:  division by zero
dev=# \timing
Timing is on.
tpch=# select  (pg_sleep(20) is null) and false;
 ?column?
----------
 f
(1 row)

Time: 0,470 ms
dev=# select  false and (pg_sleep(20) is null);
 ?column?
----------
 f
(1 row)

Time: 0,584 ms

@chenzl25 chenzl25 requested a review from xiangjinwu June 19, 2024 09:22
Signed-off-by: Bugen Zhao <[email protected]>
@BugenZhao
Copy link
Member Author

BugenZhao commented Jun 19, 2024

Any potential risk if we reorder the expression among conjunctions or disjunctions?

I think we're already doing some reordering during Condition::simplify. Specific to this PR, the post-order traversal during the evaluation of binary tree remains the same as the left-deep tree.

@xiangjinwu
Copy link
Contributor

xiangjinwu commented Jun 19, 2024

  • ((A and B) and C) and D is NOT equivalent to (A and B) and (C and D), if short-circuiting is taken into account.
  • We could try implementing and/or as vararg exprs rather than binary exprs, to avoid a deep tree.
  • Alternatively, we could just give up on the short-circuiting / evaluation order guarantee for and/or. PostgreSQL only provides it for case when/coalesce (which we already implemented as vararg and there is no depth problem) but not and/or.

related: #6202

Copy link
Contributor

@chenzl25 chenzl25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're already doing some reordering during Condition::simplify. Specific to this PR, the post-order traversal during the evaluation of binary tree remains the same as the left-deep tree.

I see.

Alternatively, we could just give up on the short-circuiting / evaluation order guarantee for and/or.

Yes, give up the guarantee LGTM.


# This query used to overflow the stack during optimization as it generated a left-deep tree
# of `OR xx IS NOT NULL` expression in the filter after each full outer join.
skipif madsim
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link #17347

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean why it doesn't work in madsim?

Copy link
Member Author

@BugenZhao BugenZhao Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if cfg!(madsim) {
f() // madsim does not support stack growth
} else {
stacker::maybe_grow(RED_ZONE, STACK_SIZE, f)
}

thread '<unnamed>' panicked at /risingwave/.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.15/src/lib.rs:415:13:
assertion `left == right` failed
  left: -1
 right: 0
stack backtrace:
   0: rust_begin_unwind
             at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/core/src/panicking.rs:72:14
   2: core::panicking::assert_failed_inner
             at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/core/src/panicking.rs:342:17
   3: core::panicking::assert_failed
             at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/core/src/panicking.rs:297:5
   4: stacker::guess_os_stack_limit
   5: stacker::STACK_LIMIT::__init
             at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.15/src/lib.rs:119:9
   6: stacker::STACK_LIMIT::__getit::{{closure}}
             at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/std/src/sys/common/thread_local/fast_local.rs:99:25
   7: std::sys::common::thread_local::lazy::LazyKeyInner<T>::initialize
             at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/std/src/sys/common/thread_local/mod.rs:54:25
   8: std::sys::common::thread_local::fast_local::Key<T>::try_initialize
             at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/std/src/sys/common/thread_local/fast_local.rs:190:27
   9: stacker::STACK_LIMIT::__getit
             at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/std/src/sys/common/thread_local/fast_local.rs:91:21
  10: std::thread::local::LocalKey<T>::try_with
             at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/std/src/thread/local.rs:269:32
  11: std::thread::local::LocalKey<T>::with
             at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/std/src/thread/local.rs:246:9
  12: stacker::get_stack_limit
             at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.15/src/lib.rs:125:17
  13: stacker::remaining_stack
             at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.15/src/lib.rs:92:5
  14: stacker::maybe_grow
             at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.15/src/lib.rs:50:30
  15: risingwave_common::util::recursive::Tracker::recurse
             at ./src/common/src/util/recursive.rs:95:9

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsafe fn guess_os_stack_limit() -> Option<usize> {
    let mut attr = std::mem::MaybeUninit::<libc::pthread_attr_t>::uninit();
    assert_eq!(libc::pthread_attr_init(attr.as_mut_ptr()), 0);
    assert_eq!(libc::pthread_getattr_np(libc::pthread_self(),
                                        attr.as_mut_ptr()), 0);
    let mut stackaddr = std::ptr::null_mut();
    let mut stacksize = 0;
    assert_eq!(libc::pthread_attr_getstack(
        attr.as_ptr(), &mut stackaddr, &mut stacksize
    ), 0);
    assert_eq!(libc::pthread_attr_destroy(attr.as_mut_ptr()), 0);
    Some(stackaddr as usize)
}

Some assertion failed. Possibly due to libc overridden. Haven't dived into it since it's not a big deal.

BugenZhao and others added 2 commits June 20, 2024 11:33
@BugenZhao BugenZhao requested a review from a team as a code owner June 20, 2024 04:58
Copy link
Member

@fuyufjh fuyufjh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve for the Cargo.toml change


# This query used to overflow the stack during optimization as it generated a left-deep tree
# of `OR xx IS NOT NULL` expression in the filter after each full outer join.
skipif madsim
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link #17347

@@ -50,7 +50,7 @@ risingwave_sqlsmith = { workspace = true }
serde = "1.0.188"
serde_derive = "1.0.188"
serde_json = "1.0.107"
sqllogictest = "0.20"
sqllogictest = "0.20.5"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bumped sqllogictest library to include risinglightdb/sqllogictest-rs#215 to make skipif madsim work.

@BugenZhao BugenZhao added this pull request to the merge queue Jun 21, 2024
Merged via the queue into main with commit 8a4bd3f Jun 21, 2024
29 of 30 checks passed
@BugenZhao BugenZhao deleted the bz/avoid-deep-tree-full-outer-join branch June 21, 2024 03:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/fix Bug fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants