feat(optimizer): improve scalar subqueries optimization time #16966

chenzl25 · 2024-05-28T09:44:56Z

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Resolves #16952 (Improve: The optimization time grows exponentially as the number of subqueries increases).
Queries with multiple scalar subqueries are very common. However, in some special cases, optimization time grows exponentially with the number of scalar subqueries. The root cause is that multiple scalar subqueries are planned as a chain of Apply operators, which makes it hard to find the original domain for the upper Apply operators. This PR translates the Apply operators in top-down order and simplifies the domain by extracting it from the Apply input. With this change, optimization time should grow linearly with the number of scalar subqueries.
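To make the "chain of Apply" shape concrete, here is a minimal toy sketch. The `Plan` enum and the helper functions are hypothetical and are not RisingWave's planner types or the code in this PR; they only model how each extra scalar subquery adds one more Apply on top of the plan built so far, and where the base input sits at the bottom of that chain.

```rust
// Toy plan tree. These types are hypothetical and are NOT RisingWave's planner types.
#[derive(Debug)]
enum Plan {
    Scan(&'static str),
    // Apply evaluates `right` once per row of `left`.
    Apply { left: Box<Plan>, right: Box<Plan> },
}

// Build the left-deep chain that `n` scalar subqueries over table `t` would produce
// in this toy model: each subquery wraps the plan so far in one more Apply.
fn chain_of_applies(n: usize) -> Plan {
    let mut plan = Plan::Scan("t");
    for _ in 0..n {
        plan = Plan::Apply {
            left: Box::new(plan),
            right: Box::new(Plan::Scan("subquery_input")),
        };
    }
    plan
}

// Number of Apply operators stacked on the left spine above the base scan.
fn apply_depth(plan: &Plan) -> usize {
    match plan {
        Plan::Scan(_) => 0,
        Plan::Apply { left, .. } => 1 + apply_depth(left),
    }
}

// The base input at the bottom of the left spine. In this toy model it stands in
// for the input that the correlated columns of the Applies above it refer to.
fn base_input(plan: &Plan) -> &Plan {
    match plan {
        Plan::Apply { left, .. } => base_input(left),
        other => other,
    }
}
```

In this model, `chain_of_applies(3)` has three Apply nodes on its left spine, and the topmost one only reaches the scan of `t` through the two Applies beneath it. That is the situation in which the description above says the original domain is hard to find for the upper Apply.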

Checklist

I have written necessary rustdoc comments
I have added necessary unit tests and integration tests
I have added test labels as necessary. See details.
I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
All checks passed in ./risedev check (or alias, ./risedev c)
My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)

My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

fuyufjh · 2024-05-28T13:53:03Z

src/frontend/src/optimizer/rule/translate_apply_rule.rs

+                    if let Some(join) = plan.as_logical_join() {
+                        Self::rewrite_join(
+                            join,
+                            left_idxs,
+                            offset,
+                            index_mapping,
+                            data_types,
+                            index,
+                        )
+                    } else if let Some(apply) = plan.as_logical_apply() {
+                        Self::rewrite_apply(
+                            apply,
+                            left_idxs,
+                            offset,
+                            index_mapping,
+                            data_types,
+                            index,
+                        )
+                    } else if let Some(scan) = plan.as_logical_scan() {
+                        Self::rewrite_scan(
+                            scan,
+                            left_idxs,
+                            offset,
+                            index_mapping,
+                            data_types,
+                            index,
+                        )


Are these lines duplicated with the lambda rewrite?

Let me refactor that. Thanks for pointing it out.

fuyufjh

LGTM

fuyufjh · 2024-05-28T13:55:51Z

src/frontend/planner_test/tests/testdata/output/with_ordinality.yaml

-                  └─BatchExchange { order: [], dist: HashShard(t.arr) }
-                    └─BatchScan { table: t, columns: [t.arr], distribution: SomeShard }
+            └─BatchExchange { order: [], dist: HashShard(t.arr) }
+              └─BatchScan { table: t, columns: [t.arr], distribution: SomeShard }


Curious why this plan gets improved?

Because the initial logical plan of this case is:

LogicalProject { exprs: [t.x, t.arr, unnest, ordinality, unnest, ordinality] }
└─LogicalApply { type: Inner, on: true, correlated_id: 1 }
  ├─LogicalApply { type: Inner, on: true, correlated_id: 2 }
  │ ├─LogicalScan { table: t, columns: [x, arr, _row_id] }
  │ └─LogicalTableFunction { table_function: Unnest(CorrelatedInputRef { index: 1, correlated_id: 2 }) }
  └─LogicalTableFunction { table_function: Unnest(CorrelatedInputRef { index: 1, correlated_id: 1 }) }

It has the same shape as a plan with multiple scalar subqueries: a chain of Apply operators.

improve scalar subquery optimization time

2b6ebcd

chenzl25 requested review from xiangjinwu, fuyufjh and xxchan May 28, 2024 09:45

github-actions bot added the type/feature label May 28, 2024

chenzl25 requested a review from st1page May 28, 2024 09:45

typo

7e7ed83

fuyufjh reviewed May 28, 2024

fuyufjh approved these changes May 28, 2024

chenzl25 added 2 commits May 29, 2024 13:15

fmt

5006a66

refactor

0c794e9

chenzl25 enabled auto-merge May 29, 2024 05:22

chenzl25 added this pull request to the merge queue May 29, 2024

Merged via the queue into main with commit bd6454e May 29, 2024
27 of 28 checks passed

chenzl25 deleted the dylan/improve_scalar_subquery_optimization_time branch May 29, 2024 06:10
