-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(frontend): ban jsonb in aggregation stream key #14442
Merged
Merged
Changes from 1 commit
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
29 changes: 29 additions & 0 deletions
29
src/frontend/planner_test/tests/testdata/input/jsonb_in_stream_key.yaml
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
- name: jsonb in group by | ||
sql: | | ||
create table t1 (v1 jsonb, v2 int); | ||
create table t2 (v3 jsonb, v4 int); | ||
select v2 from t1 group by v2, v1; | ||
expected_outputs: | ||
- stream_error | ||
- name: jsonb in union | ||
sql: | | ||
create table t1 (v1 jsonb, v2 int); | ||
create table t2 (v3 jsonb, v4 int); | ||
select v1, v2 from t1 union select v3, v4 from t2; | ||
expected_outputs: | ||
- stream_error | ||
- name: jsonb in distinct | ||
sql: | | ||
create table t1 (v1 jsonb, v2 int); | ||
select distinct v1 from t1; | ||
expected_outputs: | ||
- stream_error | ||
- name: jsonb in TopN by group | ||
sql: | | ||
create table t1 (v1 jsonb, v2 int); | ||
SELECT v1 FROM ( | ||
SELECT v1, rank() OVER (PARTITION BY v1 ORDER BY v2) AS rank | ||
FROM t1) | ||
WHERE rank <= 2; | ||
expected_outputs: | ||
- stream_error |
34 changes: 34 additions & 0 deletions
34
src/frontend/planner_test/tests/testdata/output/jsonb_in_stream_key.yaml
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# This file is automatically generated. See `src/frontend/planner_test/README.md` for more information. | ||
- name: jsonb in group by | ||
sql: | | ||
create table t1 (v1 jsonb, v2 int); | ||
create table t2 (v3 jsonb, v4 int); | ||
select v2 from t1 group by v2, v1; | ||
stream_error: |- | ||
Not supported: Column t1.v1 should not be in the aggregation group key because it has data type Jsonb | ||
HINT: Use a large Jsonb as part of the stream key is unexpected. If you are sure your Jsonb is small, use `set XXX true` | ||
- name: jsonb in union | ||
sql: | | ||
create table t1 (v1 jsonb, v2 int); | ||
create table t2 (v3 jsonb, v4 int); | ||
select v1, v2 from t1 union select v3, v4 from t2; | ||
stream_error: |- | ||
Not supported: Column t1.v1 should not be in the union because it has data type Jsonb | ||
HINT: Use a large Jsonb as part of the stream key is unexpected. If you are sure your Jsonb is small, use `set XXX true` | ||
- name: jsonb in distinct | ||
sql: | | ||
create table t1 (v1 jsonb, v2 int); | ||
select distinct v1 from t1; | ||
stream_error: |- | ||
Not supported: Column t1.v1 should not be in the aggregation group key because it has data type Jsonb | ||
HINT: Use a large Jsonb as part of the stream key is unexpected. If you are sure your Jsonb is small, use `set XXX true` | ||
- name: jsonb in TopN by group | ||
sql: | | ||
create table t1 (v1 jsonb, v2 int); | ||
SELECT v1 FROM ( | ||
SELECT v1, rank() OVER (PARTITION BY v1 ORDER BY v2) AS rank | ||
FROM t1) | ||
WHERE rank <= 2; | ||
stream_error: |- | ||
Not supported: Column t1.v1 should not be in the over window partition key because it has data type Jsonb | ||
HINT: Use a large Jsonb as part of the stream key is unexpected. If you are sure your Jsonb is small, use `set XXX true` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
129 changes: 129 additions & 0 deletions
129
src/frontend/src/optimizer/plan_visitor/jsonb_stream_key_checker.rs
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,129 @@ | ||
// Copyright 2024 RisingWave Labs | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
use risingwave_common::catalog::FieldDisplay; | ||
use risingwave_common::types::DataType; | ||
|
||
use super::{DefaultBehavior, Merge}; | ||
use crate::optimizer::plan_node::generic::GenericPlanRef; | ||
use crate::optimizer::plan_node::{PlanNode, *}; | ||
use crate::optimizer::plan_visitor::PlanVisitor; | ||
|
||
#[derive(Debug, Clone, Default)] | ||
pub struct StreamKeyChecker; | ||
|
||
impl StreamKeyChecker { | ||
fn visit_inputs(&mut self, plan: &impl PlanNode) -> Option<String> { | ||
let results = plan.inputs().into_iter().map(|input| self.visit(input)); | ||
Self::default_behavior().apply(results) | ||
} | ||
} | ||
|
||
impl PlanVisitor for StreamKeyChecker { | ||
type Result = Option<String>; | ||
|
||
type DefaultBehavior = impl DefaultBehavior<Self::Result>; | ||
|
||
fn default_behavior() -> Self::DefaultBehavior { | ||
Merge(|a: Option<String>, b| a.or(b)) | ||
} | ||
|
||
fn visit_logical_dedup(&mut self, plan: &LogicalDedup) -> Self::Result { | ||
let input = plan.input(); | ||
let schema = input.schema(); | ||
let data_types = schema.data_types(); | ||
for idx in plan.dedup_cols() { | ||
if data_types[*idx] == DataType::Jsonb { | ||
return Some(format!( | ||
"Column {} should not be in the distinct key because it has data type Jsonb", | ||
FieldDisplay(&schema[*idx]) | ||
)); | ||
} | ||
} | ||
self.visit_inputs(plan) | ||
} | ||
|
||
fn visit_logical_top_n(&mut self, plan: &LogicalTopN) -> Self::Result { | ||
let input = plan.input(); | ||
let schema = input.schema(); | ||
let data_types = schema.data_types(); | ||
for idx in plan.group_key() { | ||
if data_types[*idx] == DataType::Jsonb { | ||
return Some(format!( | ||
"Column {} should not be in the TopN group key because it has data type Jsonb", | ||
FieldDisplay(&schema[*idx]) | ||
)); | ||
} | ||
} | ||
for idx in plan | ||
.topn_order() | ||
.column_orders | ||
.iter() | ||
.map(|c| c.column_index) | ||
{ | ||
if data_types[idx] == DataType::Jsonb { | ||
return Some(format!( | ||
"Column {} should not be in the TopN order key because it has data type Jsonb", | ||
FieldDisplay(&schema[idx]) | ||
)); | ||
} | ||
} | ||
self.visit_inputs(plan) | ||
} | ||
|
||
fn visit_logical_union(&mut self, plan: &LogicalUnion) -> Self::Result { | ||
for field in &plan.inputs()[0].schema().fields { | ||
if field.data_type() == DataType::Jsonb { | ||
return Some(format!( | ||
"Column {} should not be in the union because it has data type Jsonb", | ||
FieldDisplay(field) | ||
)); | ||
} | ||
} | ||
self.visit_inputs(plan) | ||
} | ||
|
||
fn visit_logical_agg(&mut self, plan: &LogicalAgg) -> Self::Result { | ||
let input = plan.input(); | ||
let schema = input.schema(); | ||
let data_types = schema.data_types(); | ||
for idx in plan.group_key().indices() { | ||
if data_types[idx] == DataType::Jsonb { | ||
return Some(format!("Column {} should not be in the aggregation group key because it has data type Jsonb", FieldDisplay(&schema[idx]))); | ||
yuhao-su marked this conversation as resolved.
Show resolved
Hide resolved
|
||
} | ||
} | ||
self.visit_inputs(plan) | ||
} | ||
|
||
fn visit_logical_over_window(&mut self, plan: &LogicalOverWindow) -> Self::Result { | ||
let input = plan.input(); | ||
let schema = input.schema(); | ||
let data_types = schema.data_types(); | ||
|
||
for func in plan.window_functions() { | ||
for idx in func.partition_by.iter().map(|e| e.index()) { | ||
if data_types[idx] == DataType::Jsonb { | ||
return Some(format!("Column {} should not be in the over window partition key because it has data type Jsonb", FieldDisplay(&schema[idx]))); | ||
} | ||
} | ||
|
||
for idx in func.order_by.iter().map(|c| c.column_index) { | ||
if data_types[idx] == DataType::Jsonb { | ||
return Some(format!("Column {} should not be in the over window order by key because it has data type Jsonb", FieldDisplay(&schema[idx]))); | ||
} | ||
} | ||
} | ||
self.visit_inputs(plan) | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should check the stream key at least after the logical optimization phase because after column pruning, some unnecessary columns check could be skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plan nodes after optimization can be different from the plan after binding and cause inaccurate error messages.
The columns checked are
group by
order by
partition by
, which is unlikely to be pruned.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But for union, it seems hard to say? e.g.
select a, b, c from ( select * from t1 union all select * from t2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean `union` here ? actually I am not sure about the behavior here... can we prune the column here? in other words, I consider they are not equalevant ```SQL SELECT a,b FROM ( SELECT distinct a,b,c FROM t ) t;SELECT distinct a,b FROM FROM t;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed an issue about union all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agg can be created during subquery unnesting. See the sample query here: #7981 (comment)
Btw is join key a concern as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this or only handles user explicitly write agg queries. We need a fallback to handle other cases.
Yes, join can be a concern.