feat(optimizer): support agg group by simplify rule #12349

Merged
merged 2 commits into from
Sep 18, 2023

chenzl25
Contributor

@chenzl25 chenzl25 commented Sep 15, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Example:

before:
select count(*) from sbtest1 group by id, k, c;

apply the functional dependency (id) ==> (k, c), which makes the group keys k and c redundant

after:
select count(*) from sbtest1 group by id;
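
The example above can be sketched in Rust as follows. This is only an illustration of the idea, not the code added by this PR: the `FunctionalDependency` struct and the `simplify_group_key` function are hypothetical names, and columns are represented by plain indices (here assumed to be id = 0, k = 1, c = 2).

```rust
use std::collections::BTreeSet;

/// A functional dependency `from ==> to` over column indices.
/// Hypothetical type for this sketch only.
struct FunctionalDependency {
    from: BTreeSet<usize>,
    to: BTreeSet<usize>,
}

/// Splits `group_key` into (kept keys, redundant keys).
/// A key is redundant when a dependency whose determinant consists only of
/// keys that are still kept also determines that key.
fn simplify_group_key(
    group_key: &[usize],
    fds: &[FunctionalDependency],
) -> (Vec<usize>, Vec<usize>) {
    let mut kept: BTreeSet<usize> = group_key.iter().copied().collect();
    let mut redundant = Vec::new();
    for fd in fds {
        // Only apply a dependency whose determinant is fully covered by kept keys.
        if fd.from.is_subset(&kept) {
            for col in &fd.to {
                // Never drop a column that is part of the determinant itself.
                if !fd.from.contains(col) && kept.remove(col) {
                    redundant.push(*col);
                }
            }
        }
    }
    (kept.into_iter().collect(), redundant)
}
```

With the dependency (id) ==> (k, c) and the group key [id, k, c], this sketch keeps only id and reports k and c as redundant. Per the review discussion, the redundant keys are then replaced by first_value so that later column pruning can drop them if they are unused.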

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@codecov

codecov bot commented Sep 15, 2023

Codecov Report

Merging #12349 (d973896) into main (7baa27f) will increase coverage by 0.00%.
Report is 7 commits behind head on main.
The diff coverage is 98.33%.

@@           Coverage Diff           @@
##             main   #12349   +/-   ##
=======================================
  Coverage   69.90%   69.91%           
=======================================
  Files        1415     1416    +1     
  Lines      235524   235583   +59     
=======================================
+ Hits       164644   164708   +64     
+ Misses      70880    70875    -5     
Flag Coverage Δ
rust 69.91% <98.33%> (+<0.01%) ⬆️

Files Changed Coverage Δ
src/frontend/src/optimizer/rule/mod.rs 100.00% <ø> (ø)
...d/src/optimizer/rule/agg_group_by_simplify_rule.rs 98.14% <98.14%> (ø)
src/frontend/src/optimizer/logical_optimization.rs 97.96% <100.00%> (+0.01%) ⬆️

... and 4 files with indirect coverage changes

Contributor

@st1page st1page left a comment

Furthermore, will we support that?

CREATE TABLE t(v int, k int primary key);
SELECT v, k from t group by k;
ERROR:  QueryError: Invalid input syntax: column must appear in the GROUP BY clause or be used in an aggregate function

@chenzl25
Contributor Author

Furthermore, will we support that?

CREATE TABLE t(v int, k int primary key);
SELECT v, k from t group by k;
ERROR:  QueryError: Invalid input syntax: column must appear in the GROUP BY clause or be used in an aggregate function

To be compatible with PostgreSQL, I think we don't need to support it, although other databases like MySQL support this feature.

@chenzl25 chenzl25 added this pull request to the merge queue Sep 18, 2023
Contributor

@kwannoel kwannoel left a comment

Use first_value to replace redundant group by key, so that first_value could be purged by the following column pruning optimization.

How does column pruning purge the first_value? It seems to still be present in some of the output plans.

Merged via the queue into main with commit cedaec9 Sep 18, 2023
6 of 7 checks passed
@chenzl25 chenzl25 deleted the dylan/support_agg_group_by_simplify_rule branch September 18, 2023 08:15
@stdrc
Member

stdrc commented Sep 18, 2023

Maybe we need an any_value (it exists in DuckDB, where its semantics are "first non-null value") or arbitrary_value instead of first_value. Currently, the streaming implementation of first_value requires an order and thus a state table...

@chenzl25
Contributor Author

Use first_value to replace redundant group by key, so that first_value could be purged by the following column pruning optimization.

How does column pruning purge the first_value? It seems to still be present in some of the output plans.

If those columns aren't used, they can be pruned; if they are still being used, they remain in the plan.

@chenzl25
Contributor Author

Maybe we need an any_value (it exists in DuckDB, where its semantics are "first non-null value") or arbitrary_value instead of first_value. Currently, the streaming implementation of first_value requires an order and thus a state table...

Yes, but for each group key, this rule can ensure that there is always one row in the first_value state table corresponding to it.

@stdrc
Member

stdrc commented Sep 18, 2023

Maybe we need an any_value (it exists in DuckDB, where its semantics are "first non-null value") or arbitrary_value instead of first_value. Currently, the streaming implementation of first_value requires an order and thus a state table...

Yes, but for each group key, this rule can ensure that there is always one row in the first_value state table corresponding to it.

Aren't there as many rows in the state table as there are input rows in each group, all with the same value within the group?

@st1page
Contributor

st1page commented Sep 18, 2023

Yes, but for each group key, this rule can ensure that there is always one row in the first_value state table corresponding to it.

Not exactly 🥵 there are multiple records in each group:

CREATE TABLE User (
    id int PRIMARY KEY,
    name varchar
);

CREATE TABLE Events (
    user_id int
);

SELECT user_id, name, count(*) FROM User
JOIN Events ON Events.user_id = User.id
GROUP BY user_id, name;

Btw, this case needs an agg-join push-down rule 🤔
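
To make the point about multiple records per group concrete, here is a small Rust sketch that counts the joined rows per user_id for the User and Events example. It is a plain in-memory nested-loop join written for illustration only; the function name `rows_per_group` and the tuple layout are assumptions, and nothing here reflects how the streaming executors actually store state.

```rust
use std::collections::BTreeMap;

/// Counts joined rows per `user_id` for
/// `User JOIN Events ON Events.user_id = User.id`.
/// `users` holds (id, name) pairs with a unique id; `events` holds user_id values.
fn rows_per_group(users: &[(i32, &str)], events: &[i32]) -> BTreeMap<i32, usize> {
    let mut counts = BTreeMap::new();
    for &(id, _name) in users {
        for &user_id in events {
            if user_id == id {
                // Every matching event adds one more row to the group of this user.
                *counts.entry(id).or_insert(0) += 1;
            }
        }
    }
    counts
}
```

For a user with three events, the group for that user_id contains three joined rows that all carry the same name, so an aggregate over name in that group sees three input rows rather than one.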

Little-Wallace added a commit that referenced this pull request Sep 18, 2023
commit c82fc9c
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Sep 18 08:37:33 2023 +0000

    chore(deps): Bump chrono from 0.4.30 to 0.4.31 (#12359)

    Signed-off-by: dependabot[bot] <[email protected]>
    Signed-off-by: Runji Wang <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    Co-authored-by: Runji Wang <[email protected]>
    Co-authored-by: TennyZhuang <[email protected]>

commit cbdc1ac
Author: Huangjw <[email protected]>
Date:   Mon Sep 18 16:22:35 2023 +0800

    chore(ci): move release jobs to main-cron pipeline (#12339)

commit b37a19c
Author: Yuhao Su <[email protected]>
Date:   Mon Sep 18 16:18:01 2023 +0800

    feat(dashboard): add memory profiling (#12052)

commit 71d8170
Author: TennyZhuang <[email protected]>
Date:   Mon Sep 18 15:58:26 2023 +0800

    refactor(expr): allow defining functions in frontend (#12287)

    Signed-off-by: TennyZhuang <[email protected]>
    Co-authored-by: zwang28 <[email protected]>
    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

commit cedaec9
Author: Dylan <[email protected]>
Date:   Mon Sep 18 15:54:10 2023 +0800

    feat(optimizer): support agg group by simplify rule (#12349)

commit 71d9b0b
Author: Noel Kwan <[email protected]>
Date:   Mon Sep 18 15:32:00 2023 +0800

    feat(meta): update StreamJob status on finish (#12342)

commit 784fe56
Author: zwang28 <[email protected]>
Date:   Mon Sep 18 14:47:49 2023 +0800

    fix(backup): ensure correct delta log order (#12371)

commit 711ecd5
Author: congyi wang <[email protected]>
Date:   Mon Sep 18 14:11:24 2023 +0800

    feat(state_table): add iterator sub range under a certain pk prefix (#12251)

commit 1877aed
Author: xiangjinwu <[email protected]>
Date:   Mon Sep 18 13:49:15 2023 +0800

    refactor(sink): impl SinkFormatter for AppendOnly and Upsert (#12321)

commit f304ed2
Author: xxchan <[email protected]>
Date:   Sun Sep 17 20:20:17 2023 +0800

    revert: Revert "chore: add platforms to hakari (#12333)" (#12363)

commit a975d93
Author: Bohan Zhang <[email protected]>
Date:   Sun Sep 17 19:04:24 2023 +0800

    fix: handle kafka sink message timeout error (#12350)

commit 8ef74ad
Author: Runji Wang <[email protected]>
Date:   Sat Sep 16 12:16:02 2023 +0800

    fix(udf): handle visibility of input chunks in UDTF (#12357)

    Signed-off-by: Runji Wang <[email protected]>

commit 31fdc26
Author: Xu <[email protected]>
Date:   Fri Sep 15 21:01:14 2023 -0400

    feat(expr): switch to `fancy-regex` crate & update the original version (#12329)

    Co-authored-by: xzhseh <[email protected]>

commit 0032145
Author: Runji Wang <[email protected]>
Date:   Fri Sep 15 16:57:25 2023 +0800

    refactor(expr): support variadic function in `#[function]` macro (#12178)

    Signed-off-by: Runji Wang <[email protected]>

commit 467ba4b
Author: stonepage <[email protected]>
Date:   Fri Sep 15 16:28:13 2023 +0800

    fix: stream backfill executor use correct schema (#12314)

    Co-authored-by: Noel Kwan <[email protected]>

commit c443197
Author: Dylan <[email protected]>
Date:   Fri Sep 15 16:22:13 2023 +0800

    feat(optimizer): support correlated column in order by (#12341)

commit 8a36ca3
Author: Noel Kwan <[email protected]>
Date:   Fri Sep 15 16:11:03 2023 +0800

    feat(meta): Add `creating_status` field for stream jobs (#12330)

commit bf5b14e
Author: zwang28 <[email protected]>
Date:   Fri Sep 15 16:06:17 2023 +0800

    chore: lift decoding message size limit for ddl client (#12340)

commit c0060b2
Author: zwang28 <[email protected]>
Date:   Fri Sep 15 15:32:14 2023 +0800

    feat(meta): add hummock config relevant tables to rw_catalog (#12337)

commit 59bb645
Author: xxchan <[email protected]>
Date:   Fri Sep 15 14:54:54 2023 +0800

    chore: add platforms to hakari (#12333)

    Signed-off-by: Runji Wang <[email protected]>
    Co-authored-by: Runji Wang <[email protected]>

commit 7baa27f
Author: Bugen Zhao <[email protected]>
Date:   Fri Sep 15 14:00:14 2023 +0800

    chore: split full debug info for release build (#12255)

    Signed-off-by: Bugen Zhao <[email protected]>

commit a99e6f3
Author: Richard Chien <[email protected]>
Date:   Fri Sep 15 13:58:19 2023 +0800

    fix(stream): fix pk indices of GroupTopN executors (#12304)

    Signed-off-by: Richard Chien <[email protected]>

commit 43c010e
Author: Croxx <[email protected]>
Date:   Fri Sep 15 11:59:41 2023 +0800

    chore: fix comment and metrics (#12331)

    Signed-off-by: MrCroxx <[email protected]>

commit 214118b
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Fri Sep 15 10:03:14 2023 +0800

    chore(deps): Bump serde_json from 1.0.106 to 1.0.107 (#12322)

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 41ebb2a
Author: Xu <[email protected]>
Date:   Thu Sep 14 22:02:08 2023 -0400

    fix(regexp): substraction overflow when incorrectly speicifying `start` (#12325)

commit a566cfe
Author: Xu <[email protected]>
Date:   Thu Sep 14 12:58:35 2023 -0400

    feat(expr): add `array_sum` (#12162)

    Signed-off-by: Runji Wang <[email protected]>
    Co-authored-by: Runji Wang <[email protected]>

commit 28bbf10
Author: Croxx <[email protected]>
Date:   Fri Sep 15 00:40:27 2023 +0800

    fix(ci): exclude tikv-jemalloc-sys in hakari check (#12320)

    Signed-off-by: MrCroxx <[email protected]>

commit 5aa5a47
Author: zwang28 <[email protected]>
Date:   Thu Sep 14 21:02:01 2023 +0800

    feat(meta): add hummock version relevant tables to rw_catalog (#12309)

commit a740364
Author: Huangjw <[email protected]>
Date:   Thu Sep 14 19:11:04 2023 +0800

    chore(ci): install locales in prebuilt image (#12311)

    Signed-off-by: Bugen Zhao <[email protected]>
    Co-authored-by: Bugen Zhao <[email protected]>

commit 0e72056
Author: StrikeW <[email protected]>
Date:   Thu Sep 14 18:42:34 2023 +0800

    refactor(jdbc-sink): execute statements in batch and set isolation level to RC (#12250)

commit 827ed5e
Author: Dylan <[email protected]>
Date:   Thu Sep 14 17:31:41 2023 +0800

    refactor(connector): migrate cdc source metric from connector to compute (#12283)

commit a934185
Author: Dylan <[email protected]>
Date:   Thu Sep 14 17:31:04 2023 +0800

    fix(optimizer): relax scan predicate pull up mapping inverse restriction (#12308)

commit db0c099
Author: Dylan <[email protected]>
Date:   Thu Sep 14 17:30:28 2023 +0800

    feat(stream): handling watermark in temporal join (#12302)

commit 1ecea63
Author: Bugen Zhao <[email protected]>
Date:   Thu Sep 14 16:43:14 2023 +0800

    refactor(risedev): split the steps for building and running playground (#12279)

    Signed-off-by: Bugen Zhao <[email protected]>
    Co-authored-by: xxchan <[email protected]>

commit ae4b1f8
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Thu Sep 14 08:41:29 2023 +0000

    chore(deps): Bump clap from 4.4.2 to 4.4.3 (#12245)

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    Co-authored-by: Bugen Zhao <[email protected]>

commit 7ca370a
Author: Croxx <[email protected]>
Date:   Thu Sep 14 16:24:19 2023 +0800

    feat(refill): fetch whole sst file when refilling (#12265)

    Signed-off-by: MrCroxx <[email protected]>

commit ec129b6
Author: Yuhao Su <[email protected]>
Date:   Thu Sep 14 16:04:37 2023 +0800

    chore: use cfg! to instead of #cfg[] for jemalloc control policy (#12307)

commit 9814af8
Author: Runji Wang <[email protected]>
Date:   Thu Sep 14 14:45:14 2023 +0800

    feat(expr): add `pg_sleep` function (#12294)

    Signed-off-by: Runji Wang <[email protected]>

commit 4525e67
Author: Noel Kwan <[email protected]>
Date:   Thu Sep 14 14:38:03 2023 +0800

    feat(stream): support source throttling (#12295)

commit 5ffd58d
Author: Dylan <[email protected]>
Date:   Thu Sep 14 14:35:03 2023 +0800

    refactor(connector): replace validate source rpc with jni (#12270)

commit 888f2dd
Author: Eric Fu <[email protected]>
Date:   Thu Sep 14 14:32:59 2023 +0800

    fix: panic when dumping memory profile (#12276)

Signed-off-by: Little-Wallace <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Oct 24, 2023
Development

Successfully merging this pull request may close these issues.

feat: Use functional dependencies to simplify aggregation's group by
4 participants