-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: (with Semi-join)support equal condition in temporal filter with complex expression #14098
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #14098 +/- ##
=======================================
Coverage 67.90% 67.90%
=======================================
Files 1554 1554
Lines 268655 268666 +11
=======================================
+ Hits 182417 182450 +33
+ Misses 86238 86216 -22
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
- name: Temporal filter with equal condition | ||
sql: |- | ||
create table t1 (ts timestamp with time zone); | ||
select * from t1 where date_trunc('week', now()) = date_trunc('week',ts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the condition changes, it looks like we will generate a large amount of data within a single barrier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but we do not ban it for other hash join or dynamic filter with the scalar subquery, so at least we need to support it and solve those common issue in another way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure. This condition somehow looks like a nested loop join or no partition over window cc @fuyufjh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words if we allow this query, we should allow this too...
dev=> explain create materialized view mv as
select * from t1 where date_trunc('week', ts) = (select max(date_trunc('week', ts)) from t1);
NOTICE: Your session timezone is UTC. It was used in the interpretation of timestamps and dates in your query. If this is unintended, change your timezone to match that of your data's with `set timezone = [timezone]` or rewrite your query with an explicit timezone conversion, e.g. with `AT TIME ZONE`.
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
StreamMaterialize { columns: [ts, t1._row_id(hidden), $expr1(hidden)], stream_key: [t1._row_id, $expr1], pk_columns: [t1._row_id, $expr1], pk_conflict: NoCheck }
└─StreamExchange { dist: HashShard(t1._row_id, $expr1) }
└─StreamHashJoin { type: Inner, predicate: $expr1 = max(max($expr2)) }
├─StreamExchange { dist: HashShard($expr1) }
│ └─StreamProject { exprs: [t1.ts, DateTrunc('week':Varchar, t1.ts, 'UTC':Varchar) as $expr1, t1._row_id] }
│ └─StreamTableScan { table: t1, columns: [ts, _row_id] }
└─StreamExchange { dist: HashShard(max(max($expr2))) }
└─StreamProject { exprs: [max(max($expr2))] }
└─StreamSimpleAgg { aggs: [max(max($expr2)), count] }
└─StreamExchange { dist: Single }
└─StreamHashAgg { group_key: [$expr3], aggs: [max($expr2), count] }
└─StreamProject { exprs: [DateTrunc('week':Varchar, t1.ts, 'UTC':Varchar) as $expr2, t1._row_id, Vnode(t1._row_id) as $expr3] }
└─StreamTableScan { table: t1, columns: [ts, _row_id] }
(13 rows)```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that we should support this... Even though it's very likely to become a performance issue... We may warn the users explicitly.
But I am confused by the title: "temporal filter with complex expression". Here you actually run the query with HashJoin (left-semi)
, and it has nothing to do with temporal filter, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but in the user's view, what he write in the SQL is a "temporal filter". SemiJoin or dynamic filter are just internal implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Additionally, for "temporal filter", I would also expect range conditions e.g.
select * from t1 where date_trunc('minute', now()) - interval '5 minutes' <= date_trunc('minute', ts);
Shall we unify them with unique implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Additionally, for "temporal filter", I would also expect range conditions e.g.
select * from t1 where date_trunc('minute', now()) - interval '5 minutes' <= date_trunc('minute', ts);
Shall we unify them with unique implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Could you plz fill the PR description? So that later developers can understand what the PR was doing.
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
a part of #14084
This PR conver the equal temporal filter to SemiJoin and support expression on the
now()
if the expression can derived a watermark that can be used by StreamHashjoinExecutor for state cleaning.Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.