-
Notifications
You must be signed in to change notification settings - Fork 598
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(streaming): plan asof join #18683
Conversation
…hao/plan-asof-join
…hao/plan-asof-join
Can you add a PR description about semantics of ASOF JOIN, key implementation details etc... |
Please add some description, thx. |
create table t2 (v1 int, v2 int, v3 int primary key); | ||
|
||
statement ok | ||
create materialized view mv1 as SELECT t1.v1 t1_v1, t1.v2 t1_v2, t1.v3 t1_v3, t2.v1 t2_v1, t2.v2 t2_v2, t2.v3 t2_v3 FROM t1 ASOF LEFT JOIN t2 ON t1.v1 = t2.v1 and t1.v2 > t2.v2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to use "LEFT ASOF JOIN" instead of ASOF LEFT JOIN
, because clickhouse uses this syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But duckdb use ASOF LEFT JOIN
https://duckdb.org/docs/guides/sql_features/asof_join.html. @st1page any insight on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, then I have no preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both can be easily supported, so no need to block merging on this 😆 . We can always support the other syntax later.
Can we enhance the example for the release note? The current example contains only one id which is not enough to illustrate the ASOF join behavior. BTW, for inner and outer join, I think we can use the same example to illustrate the difference as how duckdb document does. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Could we also provide the output of the query in the release note? It will be referenced by users as an example.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General LGTM. Would you please add more test for those unsupported cases?
I found it will panic when their is only one inequal condition but no equal condition. We expect a error here.
Panicked when handling the request: assertion failed: predicate.has_eq()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
use crate::utils::ColIndexMappingRewriteExt; | ||
use crate::PlanRef; | ||
|
||
pub struct StreamJoinCommon; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh seems a module is enough.
if left_input_ref.index() < left_input_len && right_input_ref.index() >= left_input_len | ||
{ | ||
Ok(AsOfJoinDesc { | ||
left_idx: left_input_ref.index() as u32, | ||
right_idx: (right_input_ref.index() - left_input_len) as u32, | ||
inequality_type: Self::expr_type_to_comparison_type(expr_type)?.into(), | ||
}) | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean the expression must be like left_col [<|<=|>|>=] right_col
rather than right_col [<|<=|>|>=] left_col
? Here left/right_col
means a column from left/right side of the join.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both should be valid expression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
#17765
implement parser to parse new join types
AsOfInner
andAsofLeftOuter
.implement
StreamAsOfJoin
plan node, which should be created fromLogicalJoin::to_stream_asof_join
.Checklist
./risedev check
(or alias,./risedev c
)Documentation
ASOF JOIN
Having a set of event data, to find and join the nearest record in a reference table by the event time or any ordered properties is call
ASOF JOIN
.Inner join
CREATE MATERIALIZED VIEW mv AS SELECT t1.v1 t1_v1 FROM t1 ASOF JOIN t2 ON t1.v1 = t2.v1 and t1.v2 <= t2.v2;
Outer join
CREATE MATERIALIZED VIEW mv AS SELECT t1.v1 t1_v1 FROM t1 ASOF LEFT JOIN t2 ON t1.v1 = t2.v1 and t1.v2 <= t2.v2;
Constraint
The join condition must contains at least 1 equal condition (e.g.
t1.v1 = t2.v1
) and exactly 1 inequality condition (>=, >, <=, <). The inequality condition applies for all data types that support inequality comparison while a time related type is more commonly used.Note: only streaming AsOf join is supported by now. The batch asof join is WIP.
Example Scenario:
We have two tables:
We want to join the stock prices with the nearest preceding market sentiment for each stock price based on time.
Tables:
stock_prices
market_data
We can use a
ASOF JOIN
to find the latest matching record inmarket_data
where themarket_time
is less than or equal to thestock_time
:We can use a
ASOF LEFT JOIN
to output records in the left table that have no matches in the right table.Result:
explanation:
TSLA and AMZN have matching records in market_data, so they show the closest preceding sentiment.
GOOG has no corresponding data in market_data, so the sentiment column is NULL.
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.