fix(optimizer): reduce expr tree depth when merge logical operations #17342

BugenZhao · 2024-06-19T07:46:26Z

Signed-off-by: Bugen Zhao [email protected]I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

When creating an expression with a series of logical operations OR or AND together, instead of chaining it simply with fold or reduce, try to build a perfect binary tree.

There are no significant benefits to doing this, but it reduces the depth of the expression tree, making it more readable and less likely to cause a stack overflow (see #17224 for the background).

A real customer example for stack overflow is a complicated full outer join after #8520, where we apply a NotNull filter on all key columns chained with OR. Added as a test case.

Checklist

I have written necessary rustdoc comments
I have added necessary unit tests and integration tests
All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Signed-off-by: Bugen Zhao <[email protected]>

BugenZhao · 2024-06-19T07:47:15Z

src/frontend/src/expr/utils.rs

+/// Merge the given expressions by the a logical operation.
+///
+/// The `op` must be commutative and associative, typically `And` or `Or`.
+pub(super) fn merge_expr_by_logical<I>(exprs: I, op: ExprType, identity_elem: ExprImpl) -> ExprImpl


The core change is in this function. Also provide wrappers like ExprImpl::and, ExprImpl::or to adopt it more.

Signed-off-by: Bugen Zhao <[email protected]>

e2e_test/streaming/bug_fixes/stack_overflow_17342.slt

Co-authored-by: Dylan <[email protected]>

chenzl25 · 2024-06-19T09:22:24Z

I am confused by Postgres's behavior of expression evaluation order. @xiangjinwu Do you have any idea? Any potential risk if we reorder the expression among conjunctions or disjunctions?
Postgres result:

dev=#  select false and (1/0 is null);
 ?column?
----------
 f
(1 row)

dev=# select (1/0 is null) false and ;
ERROR:  division by zero

dev=# \timing
Timing is on.
tpch=# select  (pg_sleep(20) is null) and false;
 ?column?
----------
 f
(1 row)

Time: 0,470 ms
dev=# select  false and (pg_sleep(20) is null);
 ?column?
----------
 f
(1 row)

Time: 0,584 ms

Signed-off-by: Bugen Zhao <[email protected]>

BugenZhao · 2024-06-19T09:39:58Z

Any potential risk if we reorder the expression among conjunctions or disjunctions?

I think we're already doing some reordering during Condition::simplify. Specific to this PR, the post-order traversal during the evaluation of binary tree remains the same as the left-deep tree.

xiangjinwu · 2024-06-19T09:40:13Z

~~((A and B) and C) and D is NOT equivalent to (A and B) and (C and D), if short-circuiting is taken into account.~~
We could try implementing and/or as vararg exprs rather than binary exprs, to avoid a deep tree.
Alternatively, we could just give up on the short-circuiting / evaluation order guarantee for and/or. PostgreSQL only provides it for case when/coalesce (which we already implemented as vararg and there is no depth problem) but not and/or.

related: #6202

chenzl25

I think we're already doing some reordering during Condition::simplify. Specific to this PR, the post-order traversal during the evaluation of binary tree remains the same as the left-deep tree.

I see.

Alternatively, we could just give up on the short-circuiting / evaluation order guarantee for and/or.

Yes, give up the guarantee LGTM.

xxchan · 2024-06-19T12:01:52Z

e2e_test/streaming/bug_fixes/stack_overflow_17342.slt

+
+# This query used to overflow the stack during optimization as it generated a left-deep tree
+# of `OR xx IS NOT NULL` expression in the filter after each full outer join.
+skipif madsim


link #17347

I mean why it doesn't work in madsim?

risingwave/src/common/src/util/recursive.rs

Lines 98 to 102 in 14b2f9c

if cfg!(madsim) {

f() // madsim does not support stack growth

} else {

stacker::maybe_grow(RED_ZONE, STACK_SIZE, f)

}

thread '<unnamed>' panicked at /risingwave/.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.15/src/lib.rs:415:13: assertion `left == right` failed left: -1 right: 0 stack backtrace: 0: rust_begin_unwind at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/std/src/panicking.rs:645:5 1: core::panicking::panic_fmt at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/core/src/panicking.rs:72:14 2: core::panicking::assert_failed_inner at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/core/src/panicking.rs:342:17 3: core::panicking::assert_failed at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/core/src/panicking.rs:297:5 4: stacker::guess_os_stack_limit 5: stacker::STACK_LIMIT::__init at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.15/src/lib.rs:119:9 6: stacker::STACK_LIMIT::__getit::{{closure}} at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/std/src/sys/common/thread_local/fast_local.rs:99:25 7: std::sys::common::thread_local::lazy::LazyKeyInner<T>::initialize at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/std/src/sys/common/thread_local/mod.rs:54:25 8: std::sys::common::thread_local::fast_local::Key<T>::try_initialize at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/std/src/sys/common/thread_local/fast_local.rs:190:27 9: stacker::STACK_LIMIT::__getit at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/std/src/sys/common/thread_local/fast_local.rs:91:21 10: std::thread::local::LocalKey<T>::try_with at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/std/src/thread/local.rs:269:32 11: std::thread::local::LocalKey<T>::with at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/std/src/thread/local.rs:246:9 12: stacker::get_stack_limit at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.15/src/lib.rs:125:17 13: stacker::remaining_stack at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.15/src/lib.rs:92:5 14: stacker::maybe_grow at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.15/src/lib.rs:50:30 15: risingwave_common::util::recursive::Tracker::recurse at ./src/common/src/util/recursive.rs:95:9

unsafe fn guess_os_stack_limit() -> Option<usize> { let mut attr = std::mem::MaybeUninit::<libc::pthread_attr_t>::uninit(); assert_eq!(libc::pthread_attr_init(attr.as_mut_ptr()), 0); assert_eq!(libc::pthread_getattr_np(libc::pthread_self(), attr.as_mut_ptr()), 0); let mut stackaddr = std::ptr::null_mut(); let mut stacksize = 0; assert_eq!(libc::pthread_attr_getstack( attr.as_ptr(), &mut stackaddr, &mut stacksize ), 0); assert_eq!(libc::pthread_attr_destroy(attr.as_mut_ptr()), 0); Some(stackaddr as usize) }

Some assertion failed. Possibly due to libc overridden. Haven't dived into it since it's not a big deal.

src/frontend/src/expr/utils.rs

Co-authored-by: xxchan <[email protected]>

Signed-off-by: Bugen Zhao <[email protected]>

fuyufjh

Approve for the Cargo.toml change

Signed-off-by: Bugen Zhao <[email protected]>

BugenZhao · 2024-06-20T09:48:37Z

e2e_test/streaming/bug_fixes/stack_overflow_17342.slt

+
+# This query used to overflow the stack during optimization as it generated a left-deep tree
+# of `OR xx IS NOT NULL` expression in the filter after each full outer join.
+skipif madsim


link #17347

BugenZhao · 2024-06-20T09:49:36Z

src/tests/simulation/Cargo.toml

@@ -50,7 +50,7 @@ risingwave_sqlsmith = { workspace = true }
 serde = "1.0.188"
 serde_derive = "1.0.188"
 serde_json = "1.0.107"
-sqllogictest = "0.20"
+sqllogictest = "0.20.5"


Bumped sqllogictest library to include risinglightdb/sqllogictest-rs#215 to make skipif madsim work.

BugenZhao added 4 commits June 19, 2024 15:01

fix(optimizer): reduce expr tree depth when merge logical operations

c148f06

Signed-off-by: Bugen Zhao <[email protected]>

refactor api

30032c7

Signed-off-by: Bugen Zhao <[email protected]>

apply to more

b2ea565

Signed-off-by: Bugen Zhao <[email protected]>

try leveled

2bbe2c9

Signed-off-by: Bugen Zhao <[email protected]>

github-actions bot added the type/fix Bug fix label Jun 19, 2024

BugenZhao requested review from chenzl25, fuyufjh and st1page June 19, 2024 08:21

BugenZhao commented Jun 19, 2024

View reviewed changes

BugenZhao added 2 commits June 19, 2024 16:44

add e2e test for customer case

e042798

Signed-off-by: Bugen Zhao <[email protected]>

apply to null safe eq

447d3c0

Signed-off-by: Bugen Zhao <[email protected]>

chenzl25 reviewed Jun 19, 2024

View reviewed changes

e2e_test/streaming/bug_fixes/stack_overflow_17342.slt Outdated Show resolved Hide resolved

Update stack_overflow_17342.slt

ce8d29e

Co-authored-by: Dylan <[email protected]>

fuyufjh mentioned this pull request Jun 19, 2024

Division by zero exception #17131

Closed

chenzl25 requested a review from xiangjinwu June 19, 2024 09:22

skip in madsim

7f293c4

Signed-off-by: Bugen Zhao <[email protected]>

chenzl25 approved these changes Jun 19, 2024

View reviewed changes

xxchan reviewed Jun 19, 2024

View reviewed changes

src/frontend/src/expr/utils.rs Outdated Show resolved Hide resolved

BugenZhao and others added 2 commits June 20, 2024 11:33

Update src/frontend/src/expr/utils.rs

782975f

Co-authored-by: xxchan <[email protected]>

bump sqllogictest

b51a40c

Signed-off-by: Bugen Zhao <[email protected]>

BugenZhao requested a review from a team as a code owner June 20, 2024 04:58

fuyufjh approved these changes Jun 20, 2024

View reviewed changes

use released version sqllogictest

14b2f9c

Signed-off-by: Bugen Zhao <[email protected]>

BugenZhao commented Jun 20, 2024

View reviewed changes

BugenZhao added this pull request to the merge queue Jun 21, 2024

Merged via the queue into main with commit 8a4bd3f Jun 21, 2024
29 of 30 checks passed

BugenZhao deleted the bz/avoid-deep-tree-full-outer-join branch June 21, 2024 03:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(optimizer): reduce expr tree depth when merge logical operations #17342

fix(optimizer): reduce expr tree depth when merge logical operations #17342

BugenZhao commented Jun 19, 2024 •

edited

Loading

BugenZhao Jun 19, 2024

chenzl25 commented Jun 19, 2024 •

edited

Loading

BugenZhao commented Jun 19, 2024 •

edited

Loading

xiangjinwu commented Jun 19, 2024 •

edited

Loading

chenzl25 left a comment

xxchan Jun 19, 2024

BugenZhao Jun 20, 2024

xxchan Jun 20, 2024

BugenZhao Jun 20, 2024 •

edited

Loading

BugenZhao Jun 20, 2024

fuyufjh left a comment

BugenZhao Jun 20, 2024

BugenZhao Jun 20, 2024

	if cfg!(madsim) {
	f() // madsim does not support stack growth
	} else {
	stacker::maybe_grow(RED_ZONE, STACK_SIZE, f)
	}

fix(optimizer): reduce expr tree depth when merge logical operations #17342

fix(optimizer): reduce expr tree depth when merge logical operations #17342

Conversation

BugenZhao commented Jun 19, 2024 • edited Loading

What's changed and what's your intention?

Checklist

Documentation

Release note

BugenZhao Jun 19, 2024

Choose a reason for hiding this comment

chenzl25 commented Jun 19, 2024 • edited Loading

BugenZhao commented Jun 19, 2024 • edited Loading

xiangjinwu commented Jun 19, 2024 • edited Loading

chenzl25 left a comment

Choose a reason for hiding this comment

xxchan Jun 19, 2024

Choose a reason for hiding this comment

BugenZhao Jun 20, 2024

Choose a reason for hiding this comment

xxchan Jun 20, 2024

Choose a reason for hiding this comment

BugenZhao Jun 20, 2024 • edited Loading

Choose a reason for hiding this comment

BugenZhao Jun 20, 2024

Choose a reason for hiding this comment

fuyufjh left a comment

Choose a reason for hiding this comment

BugenZhao Jun 20, 2024

Choose a reason for hiding this comment

BugenZhao Jun 20, 2024

Choose a reason for hiding this comment

BugenZhao commented Jun 19, 2024 •

edited

Loading

chenzl25 commented Jun 19, 2024 •

edited

Loading

BugenZhao commented Jun 19, 2024 •

edited

Loading

xiangjinwu commented Jun 19, 2024 •

edited

Loading

BugenZhao Jun 20, 2024 •

edited

Loading