feat(optimizer): support PullUpCorrelatedPredicateAggRule #15026
└─LogicalProject { exprs: [ps_partkey, ps_suppkey, (0.5:Decimal * sum(l_quantity)) as $expr2] }
  └─LogicalAgg { group_key: [ps_partkey, ps_suppkey], aggs: [sum(l_quantity)] }
    └─LogicalJoin { type: LeftOuter, on: IsNotDistinctFrom(ps_partkey, l_partkey) AND IsNotDistinctFrom(ps_suppkey, l_suppkey), output: [ps_partkey, ps_suppkey, l_quantity] }
      ├─LogicalAgg { group_key: [ps_partkey, ps_suppkey], aggs: [] }
      │ └─LogicalJoin { type: LeftSemi, on: (ps_partkey = p_partkey), output: [ps_partkey, ps_suppkey] }
      │   ├─LogicalSource { source: partsupp, columns: [ps_partkey, ps_suppkey, ps_availqty, ps_supplycost, ps_comment, _row_id], time_range: (Unbounded, Unbounded) }
      │   └─LogicalProject { exprs: [p_partkey] }
      │     └─LogicalSource { source: part, columns: [p_partkey, p_name, p_mfgr, p_brand, p_type, p_size, p_container, p_retailprice, p_comment, _row_id], time_range: (Unbounded, Unbounded) }
      └─LogicalProject { exprs: [l_partkey, l_suppkey, l_quantity] }
        └─LogicalFilter { predicate: IsNotNull(l_partkey) AND IsNotNull(l_suppkey) }
          └─LogicalSource { source: lineitem, columns: [l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment, _row_id], time_range: (Unbounded, Unbounded) }
└─LogicalProject { exprs: [(0.5:Decimal * sum(l_quantity)) as $expr2, l_partkey, l_suppkey] }
  └─LogicalAgg { group_key: [l_partkey, l_suppkey], aggs: [sum(l_quantity)] }
    └─LogicalSource { source: lineitem, columns: [l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment, _row_id], time_range: (Unbounded, Unbounded) }
This is TPC-H Q20, the query we want to optimize in this PR.
/// Pull up correlated predicates from the right agg side of Apply to the `on` clause of Join.
///
/// Before:
///
/// ```text
///     LogicalApply
///    /            \
///  LHS          Project
///                 |
///               Agg [group by nothing]
///                 |
///               Project
///                 |
///               Filter [correlated_input_ref(yyy) = xxx]
/// ```
///
/// After:
///
/// ```text
///     LogicalApply [yyy = xxx]
///    /            \
///  LHS          Project
///                 |
///               Agg [group by xxx]
///                 |
///               Project
///                 |
///               Filter
/// ```
A diagram to explain how this rule works: it pulls the correlated expression up from the Filter into the Apply.
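To make the equivalence concrete, here is a minimal, standalone sketch (not RisingWave code) of why the pull-up is safe for an aggregate like `sum`. The function names and the toy `(key, value)` row layout are assumptions made for illustration. Evaluating the correlated subquery once per outer row gives the same answer as aggregating once with the correlation column as the group key and then looking the result up per outer row, with a lookup miss playing the role of SQL NULL.

```rust
use std::collections::HashMap;

/// Correlated evaluation: for one outer key, scan the inner rows,
/// keep those whose key matches, and sum their values.
/// Returns None (standing in for SQL NULL) when no row matches.
fn correlated_sum(inner: &[(i32, i64)], outer_key: i32) -> Option<i64> {
    let mut acc = None;
    for &(k, v) in inner {
        if k == outer_key {
            acc = Some(acc.unwrap_or(0) + v);
        }
    }
    acc
}

/// Decorrelated evaluation: aggregate once, grouped by the column
/// that used to be compared against the correlated input ref.
fn grouped_sum(inner: &[(i32, i64)]) -> HashMap<i32, i64> {
    let mut groups = HashMap::new();
    for &(k, v) in inner {
        *groups.entry(k).or_insert(0) += v;
    }
    groups
}

fn main() {
    // Hypothetical data: inner rows are (xxx, value); outer rows carry yyy.
    let inner = [(1, 10), (1, 5), (2, 7)];
    let outer = [1, 2, 3];

    let groups = grouped_sum(&inner);
    for key in outer {
        let per_row = correlated_sum(&inner, key);
        // A missing group behaves like the NULL side of an outer join.
        let joined = groups.get(&key).copied();
        assert_eq!(per_row, joined);
        println!("key {key}: {per_row:?}");
    }
}
```

Both evaluation strategies agree for every outer key, including key 3, which has no matching inner rows and yields `None` on both sides.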
The new Q20 plan LGTM, thanks.
It is better than Flink's: this plan is now bushier, while Flink's is left-deep and one level deeper.
LGTM!
// It could be too restrictive to require the group key to be empty. We can relax this in the future if necessary.
if !group_key.is_empty() {
    return None;
}
Is simply adding the correlated key (`xxx`) into the group key correct?
No, simply adding those group keys is not correct. We would need to handle the input references in the parent of `new_agg` in a more sophisticated way than the current simple shifting.
// If there is a count aggregate, bail out and leave it to general subquery unnesting.
if agg_calls
    .iter()
    .any(|agg_call| agg_call.agg_kind == AggKind::Count)
{
    return None;
};
Why? Is count special here?
Yes, here is the corner case. I haven't come up with a way to handle it yet. If you have ideas, feel free to share. It is also related to TPC-H Q17.
create table t (a int, b int);
create table t2 (c int, d int);
insert into t values (1, 2);
flush;
select * from t where t.a > (select count(*) from t2 where b = d);
With an empty group key, count over empty input returns 0 instead of NULL.
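The corner case can be reproduced outside the optimizer with a small standalone sketch (not RisingWave code; the function names are hypothetical). It mirrors the SQL example above: `t` holds the single row `(a, b) = (1, 2)` and `t2` is empty. The correlated count evaluates to 0, so `a > 0` keeps the row. After a naive pull-up there is no group for `d = 2`, so the outer join would produce NULL and the comparison would drop the row.

```rust
use std::collections::HashMap;

/// Correlated evaluation of `select count(*) from t2 where b = d`
/// for one outer value of `b`. Count over no rows is 0, never NULL.
fn correlated_count(t2_d: &[i32], b: i32) -> i64 {
    t2_d.iter().filter(|&&d| d == b).count() as i64
}

/// Naive decorrelation: count once, grouped by `d`.
/// Groups with no rows simply do not exist in the result.
fn grouped_count(t2_d: &[i32]) -> HashMap<i32, i64> {
    let mut groups = HashMap::new();
    for &d in t2_d {
        *groups.entry(d).or_insert(0) += 1;
    }
    groups
}

fn main() {
    // t = {(a = 1, b = 2)}, t2 is empty (only column d is modeled here).
    let t2_d: [i32; 0] = [];
    let (a, b) = (1i64, 2i32);

    // Original query: 1 > count(*) = 0, so the row is returned.
    let original_keeps_row = a > correlated_count(&t2_d, b);

    // Rewritten plan: the lookup misses, which stands in for NULL,
    // and `a > NULL` is not true, so the row is filtered out.
    let rewritten_keeps_row = grouped_count(&t2_d).get(&b).map_or(false, |&c| a > c);

    assert!(original_keeps_row);
    assert!(!rewritten_keeps_row);
    println!("original: {original_keeps_row}, rewritten: {rewritten_keeps_row}");
}
```

The two plans disagree on this input, which is why the rule bails out when it sees a count aggregate.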
Yes, I cannot find a way to rewrite it. It should be OK for Q17, though, because the aggregate there is an AVG. We should choose one of the following to optimize #14799:
- keep AVG and similar aggregates in the plan node and delay their rewriting (currently RW rewrites them when creating the Agg plan node)
- consider the Project and Agg together in the rule later
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Support `PullUpCorrelatedPredicateAggRule` to unnest a common subquery pattern in TPC-H. Pull up correlated predicates from the right agg side of Apply to the `on` clause of Join.
Checklist
`./risedev check` (or alias, `./risedev c`)
Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.