Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: Use ORDER BY as part of MV's distribution key #19321

Open
xxchan opened this issue Nov 8, 2024 · 1 comment
Open

Discussion: Use ORDER BY as part of MV's distribution key #19321

xxchan opened this issue Nov 8, 2024 · 1 comment

Comments

@xxchan
Copy link
Member

xxchan commented Nov 8, 2024

context: https://risingwave-labs.slack.com/archives/C034TRPKN1F/p1731041466238309?thread_ts=1731032250.800469&cid=C034TRPKN1F

Some nodes:

  • dist_key just need to be subset of pk, not prefix.
  • MV pk is order_by | stream_key
  • before feat(optimizer): change stream join mv distribution key #13022, MV dist_key is upstream_dist_key (subset of stream_key). And we changed to whole stream_key.
  • For Index, we support distributed by. Since index is essentially the same as MV, there seem to be no reason not to also support it for MV
@github-actions github-actions bot added this to the release-2.2 milestone Nov 8, 2024
@xxchan xxchan changed the title Discussion: Use ORDER BY as MV's distribution key Discussion: Use ORDER BY as part of MV's distribution key Nov 8, 2024
@hzxa21
Copy link
Collaborator

hzxa21 commented Nov 13, 2024

An example to show what the dist key looks like now on main:

create table test_dist (v1 int, v2 int, v3 int, primary key (v1, v2));
create table test_dist2 (v1 int, v2 int, v3 int, primary key (v1, v2));
create table test_dist3 (v1 int, v2 int, v3 int, primary key (v1, v2));


describe test_dist;
-- primary key v1, v2
-- distribution key v1, v2

create materialized view test_dist_mv as select * from test_dist order by v1, v3;

describe test_dist_mv;
-- primary key v1, v3, v2
-- distribution key v1, v2

drop MATERIALIZED view test_join_dist;

create MATERIALIZED view test_join_dist as
select t1.v3 as a, t2.v3 as b, t3.v3 as c
from test_dist as t1
join test_dist2 as t2
on t1.v1 = t2.v1
join test_dist3 as t3
on t2.v1 = t3.v1;

describe test_join_dist;
-- primary key test_dist.v1, test_dist.v2, test_dist2.v2, test_dist3.v2, test_dist2.v1
-- distribution test_dist.v1, test_dist.v2, test_dist2.v2, test_dist3.v2, test_dist2.v1


create MATERIALIZED view test_join_dist2 as
select t1.v3 as a, t2.v3 as b, t3.v3 as c
from test_dist as t1
join test_dist2 as t2
on t1.v1 = t2.v1
join test_dist3 as t3
on t2.v1 = t3.v1
order by t1.v3, t1.v2;

describe test_join_dist2;
-- primary key test_dist.v3, test_dist.v2, test_dist.v1, test_dist2.v2, test_dist3.v2, test_dist2.v1
-- distribution key test_dist2.v1

ideally the dist key for test_dist and dist_join_dist2 can be:

describe test_dist_mv;
-- primary key v1, v3, v2
-- distribution key v1, v3, ...


describe test_join_dist2;
-- primary key test_dist.v3, test_dist.v2, test_dist.v1, test_dist2.v2, test_dist3.v2, test_dist2.v1
-- distribution key test_dist.v3, test_dist.v2, ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants