feat(optimizer): support PullUpCorrelatedPredicateAggRule #15026

chenzl25 · 2024-02-06T07:27:54Z

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Linked issue: perf: improve tpc-h q20 performance (single-topic) #14797
Add PullUpCorrelatedPredicateAggRule to unnest a common subquery pattern in TPC-H. The rule pulls correlated predicates up from the aggregation on the right side of an Apply into the on clause of the Join.

Checklist

I have written necessary rustdoc comments
I have added necessary unit tests and integration tests
I have added test labels as necessary. See details.
I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
All checks passed in ./risedev check (or alias, ./risedev c)
My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)

My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

chenzl25 · 2024-02-06T07:44:00Z

src/frontend/planner_test/tests/testdata/output/tpch_variant.yaml

-      └─LogicalProject { exprs: [ps_partkey, ps_suppkey, (0.5:Decimal * sum(l_quantity)) as $expr2] }
-        └─LogicalAgg { group_key: [ps_partkey, ps_suppkey], aggs: [sum(l_quantity)] }
-          └─LogicalJoin { type: LeftOuter, on: IsNotDistinctFrom(ps_partkey, l_partkey) AND IsNotDistinctFrom(ps_suppkey, l_suppkey), output: [ps_partkey, ps_suppkey, l_quantity] }
-            ├─LogicalAgg { group_key: [ps_partkey, ps_suppkey], aggs: [] }
-            │ └─LogicalJoin { type: LeftSemi, on: (ps_partkey = p_partkey), output: [ps_partkey, ps_suppkey] }
-            │   ├─LogicalSource { source: partsupp, columns: [ps_partkey, ps_suppkey, ps_availqty, ps_supplycost, ps_comment, _row_id], time_range: (Unbounded, Unbounded) }
-            │   └─LogicalProject { exprs: [p_partkey] }
-            │     └─LogicalSource { source: part, columns: [p_partkey, p_name, p_mfgr, p_brand, p_type, p_size, p_container, p_retailprice, p_comment, _row_id], time_range: (Unbounded, Unbounded) }
-            └─LogicalProject { exprs: [l_partkey, l_suppkey, l_quantity] }
-              └─LogicalFilter { predicate: IsNotNull(l_partkey) AND IsNotNull(l_suppkey) }
-                └─LogicalSource { source: lineitem, columns: [l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment, _row_id], time_range: (Unbounded, Unbounded) }
+      └─LogicalProject { exprs: [(0.5:Decimal * sum(l_quantity)) as $expr2, l_partkey, l_suppkey] }
+        └─LogicalAgg { group_key: [l_partkey, l_suppkey], aggs: [sum(l_quantity)] }
+          └─LogicalSource { source: lineitem, columns: [l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment, _row_id], time_range: (Unbounded, Unbounded) }


This is the TPC-H Q20 plan we want to optimize in this PR.

chenzl25 · 2024-02-06T07:45:09Z

src/frontend/src/optimizer/rule/pull_up_correlated_predicate_agg_rule.rs

+/// Pull up correlated predicates from the right agg side of Apply to the `on` clause of Join.
+///
+/// Before:
+///
+/// ```text
+///     LogicalApply
+///    /            \
+///  LHS          Project
+///                 |
+///                Agg [group by nothing]
+///                 |
+///               Project
+///                 |
+///               Filter [correlated_input_ref(yyy) = xxx]
+/// ```
+///
+/// After:
+///
+/// ```text
+///     LogicalApply [yyy = xxx]
+///    /            \
+///  LHS          Project
+///                 |
+///                Agg [group by xxx]
+///                 |
+///               Project
+///                 |
+///               Filter
+/// ```


A diagram explaining how this rule works: it pulls the correlated expression up from the Filter into the Apply.
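For readers who want something executable, here is a minimal sketch of the transformation in the diagram on a toy plan model. It is not the code from this PR: `Predicate`, `AggOverFilter`, `Apply`, and `pull_up_correlated_predicates` are hypothetical names, the two Project nodes are left out, and predicates are plain strings. It only shows the shape of the rewrite: each correlated equality leaves the Filter, joins the Apply's `on` condition, and its inner column becomes a group key.

```rust
/// Toy predicate: a correlated equality `outer_col = inner_col`,
/// or an opaque predicate that only touches the inner side.
#[derive(Debug, Clone, PartialEq)]
enum Predicate {
    CorrelatedEq { outer_col: String, inner_col: String },
    Local(String),
}

/// Toy model of the right side of the Apply: an Agg on top of a Filter.
#[derive(Debug, Clone, PartialEq)]
struct AggOverFilter {
    group_key: Vec<String>,
    agg_calls: Vec<String>,
    filter: Vec<Predicate>,
}

/// Toy Apply: only the `on` equalities and the right side are modeled.
#[derive(Debug, Clone, PartialEq)]
struct Apply {
    on: Vec<(String, String)>, // (outer_col, inner_col)
    right: AggOverFilter,
}

/// Hypothetical helper: returns the rewritten Apply, or None if the rule
/// does not apply.
fn pull_up_correlated_predicates(apply: &Apply) -> Option<Apply> {
    // Same restriction as discussed in the review: empty group key only.
    if !apply.right.group_key.is_empty() {
        return None;
    }
    // Bail out on count; a plain string check stands in for the real AggKind test.
    if apply.right.agg_calls.iter().any(|c| c.starts_with("count")) {
        return None;
    }
    let mut on = apply.on.clone();
    let mut group_key: Vec<String> = Vec::new();
    let mut remaining = Vec::new();
    for p in &apply.right.filter {
        match p {
            Predicate::CorrelatedEq { outer_col, inner_col } => {
                // The predicate moves up to the Apply...
                on.push((outer_col.clone(), inner_col.clone()));
                // ...and its inner column becomes a group key.
                if !group_key.contains(inner_col) {
                    group_key.push(inner_col.clone());
                }
            }
            Predicate::Local(_) => remaining.push(p.clone()),
        }
    }
    if group_key.is_empty() {
        return None; // nothing was correlated, so there is nothing to pull up
    }
    Some(Apply {
        on,
        right: AggOverFilter {
            group_key,
            agg_calls: apply.right.agg_calls.clone(),
            filter: remaining,
        },
    })
}
```

With a Q20-like input (`sum(l_quantity)` filtered on `l_partkey = ps_partkey` and `l_suppkey = ps_suppkey`), the sketch yields an Apply with two `on` equalities and an Agg grouped by `l_partkey, l_suppkey`, which matches the shape of the new plan in the test output above.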

lmatz

The new Q20 plan LGTM, thanks.
It is better than Flink's: this plan is now bushier, while Flink's is left-deep and one level deeper.

st1page

LGTM!

st1page · 2024-02-06T09:13:21Z

src/frontend/src/optimizer/rule/pull_up_correlated_predicate_agg_rule.rs

+        // It could be too restrictive to require the group key to be empty. We can relax this in the future if necessary.
+        if !group_key.is_empty() {
+            return None;
+        }


Is simply adding the correlated key (xxx) to the group key correct?

No, simply adding those group keys is not correct. We need to handle the new_agg parent input reference in a more sophisticated way instead of the current simple shifting.
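One possible reading of "simple shifting", sketched under assumptions: suppose the Agg outputs its group-key columns first and its aggregate results after them, and the pulled-up keys are appended after any existing group keys. With an empty original group key, every parent reference then moves by the number of new keys. With `m` existing keys, only references at index `m` or above move. Both helpers below are hypothetical and are not taken from the PR.

```rust
/// Empty original group key: every parent input ref moves past the new keys.
fn shift_all(refs: &[usize], new_keys: usize) -> Vec<usize> {
    refs.iter().map(|&i| i + new_keys).collect()
}

/// `m` original group keys (assumed layout: old keys, new keys, agg calls):
/// refs to the old keys stay put, refs to agg calls move past the new keys.
fn shift_after_existing_keys(refs: &[usize], m: usize, new_keys: usize) -> Vec<usize> {
    refs.iter()
        .map(|&i| if i < m { i } else { i + new_keys })
        .collect()
}
```

This covers only the index bookkeeping. It does not address whether adding keys to a non-empty group key preserves the query's semantics.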

st1page · 2024-02-06T09:17:27Z

src/frontend/src/optimizer/rule/pull_up_correlated_predicate_agg_rule.rs

+        // If there is a count aggregate, bail out and leave for general subquery unnesting to deal.
+        if agg_calls
+            .iter()
+            .any(|agg_call| agg_call.agg_kind == AggKind::Count)
+        {
+            return None;
+        };


Why? Is count special here?

Yes, here is a corner case. I haven't come up with a way to handle it yet. If you have ideas, feel free to share. It is also related to TPC-H Q17.

create table t (a int, b int);
create table t2 (c int, d int);
insert into t values (1, 2);
flush;
select * from t where t.a > (select count(*) from t2 where b = d);

When the group key is empty, count returns 0 over empty input rather than null.
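The corner case can be reproduced outside the optimizer with a small simulation. This is an illustrative sketch only: `correlated` and `naive_rewrite` are hypothetical helpers, rows are `(i32, i32)` tuples, NULLs in the data are ignored, and a missing group stands in for the NULL that the outer join would produce.

```rust
use std::collections::HashMap;

/// Original semantics: for each row of t, evaluate
/// `select count(*) from t2 where b = d` and keep the row if a > count.
fn correlated(t: &[(i32, i32)], t2: &[(i32, i32)]) -> Vec<(i32, i32)> {
    t.iter()
        .copied()
        .filter(|&(a, b)| {
            // With no matching rows, count(*) is 0, not NULL.
            let count = t2.iter().filter(|&&(_, d)| d == b).count() as i64;
            i64::from(a) > count
        })
        .collect()
}

/// Naive rewrite: group t2 by d, then outer-join the groups to t on b = d.
fn naive_rewrite(t: &[(i32, i32)], t2: &[(i32, i32)]) -> Vec<(i32, i32)> {
    let mut counts: HashMap<i32, i64> = HashMap::new();
    for &(_, d) in t2 {
        *counts.entry(d).or_insert(0) += 1;
    }
    t.iter()
        .copied()
        .filter(|&(a, b)| match counts.get(&b) {
            Some(&count) => i64::from(a) > count,
            // No group for this key: the join yields NULL, `a > NULL` is NULL,
            // and the row is filtered out.
            None => false,
        })
        .collect()
}
```

With `t = [(1, 2)]` and an empty `t2`, `correlated` keeps the row because `1 > 0`, while `naive_rewrite` drops it. An aggregate such as sum does not hit this problem, since it returns NULL over empty input in both forms.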

Yes, I cannot find a way to rewrite it... But it should be OK for Q17 because the aggregate there is an AVG. We should choose one of the following to optimize #14799:

Keep AVG and similar aggregates in the plan node and delay their rewriting (currently RW rewrites them when creating the Agg plan node).

Consider the Project and Agg together in the rule later.

chenzl25 added 2 commits February 6, 2024 13:49

support PullUpCorrelatedPredicateAggRule

853722f

forbid count agg in PullUpCorrelatedPredicateAggRule

f792457

github-actions bot added the type/feature label Feb 6, 2024

chenzl25 commented Feb 6, 2024

chenzl25 requested review from stdrc, xiangjinwu, st1page, lmatz and fuyufjh February 6, 2024 07:45

fmt

1fa68b0

lmatz approved these changes Feb 6, 2024

st1page approved these changes Feb 6, 2024

add comments

c709cb4

chenzl25 enabled auto-merge February 7, 2024 03:59

chenzl25 added this pull request to the merge queue Feb 7, 2024

Merged via the queue into main with commit 8e3c526 Feb 7, 2024
26 of 27 checks passed

chenzl25 deleted the dylan/support_tpch_subquery_unnest branch February 7, 2024 04:47

yezizp2012 pushed a commit that referenced this pull request Feb 7, 2024

feat(optimizer): support PullUpCorrelatedPredicateAggRule (#15026)

abc633e

st1page mentioned this pull request Feb 8, 2024

2024-02-07 nexmark performance degradation #15054

Closed

lmatz mentioned this pull request Feb 26, 2024

better query rewrite for TPC-H q17 #15247

Closed

st1page mentioned this pull request Mar 1, 2024

feat(optimizer): improve pull_up_correlated_predicate_agg_rule to optimize TPCH Q17 #15383

Merged

9 tasks
