-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NO MERGE]fix: expression ownership in the temp table of subquery #168
Conversation
Have you tried this before? CREATE TABLE t1(pk int primary key, a int, b int);
CREATE TABLE t2(pk int primary key, a int, c int);
explain SELECT b FROM t1 WHERE a in (SELECT a FROM t2 where a > 1);
PLAN
-------------------------------------------------------------------
Projection [t1.b] [Project] +
LeftSemi Join On t1.a = (t1.a) as (_temp_table_1_.a) [HashJoin]+
Scan t1 -> [a, b] [SeqScan] +
Projection [t1.a] [Project] +
Filter (t1.a > 1), Is Having: false [Filter] +
Scan t2 -> [] [SeqScan] The problem is here: First bind t1 and t1.b, t1.a, then the subquery. When binding the subquery, it hit these code, and get t1.a but not t2.a. Lines 291 to 299 in f8488f4
|
You are a genius! Found so many issues and I fixed them, please check them out sir! let _ = fnck_sql
.run("CREATE TABLE t1(pk int primary key, a int, b int);")
.await?;
let _ = fnck_sql
.run("CREATE TABLE t2(pk int primary key, a int, c int);")
.await?;
let (schema, tuples) = fnck_sql
.run("explain SELECT b FROM t1 WHERE a in (SELECT a FROM t2 where a > 1);")
.await?; +-------------------------------------------------------------------+
| PLAN |
+===================================================================+
| Projection [t1.b] [Project] |
| LeftSemi Join On t1.a = (t2.a) as (_temp_table_1_.a) [HashJoin] |
| Scan t1 -> [a, b] [SeqScan] |
| Projection [t2.a] [Project] |
| Filter (t2.a > 1), Is Having: false [Filter] |
| Scan t2 -> [a] [SeqScan] |
+-------------------------------------------------------------------+ |
@crwen |
consider this also: explain select a from t1 where a in (select t2.a from t1 as t2 where t2.pk > 10);
PLAN
-------------------------------------------------------------------
Projection [t1.a] [Project] +
LeftSemi Join On t1.a = (t1.a) as (_temp_table_1_.a) [HashJoin]+
Scan t1 -> [a] [SeqScan] +
Projection [t1.a] [Project] +
Filter (t1.pk > 10), Is Having: false [Filter] +
Scan t1 -> [pk, a] [SeqScan] When calling Lines 131 to 149 in f8488f4
|
Great, your observation skills are amazing. I fixed the problem. Now the pos in the case you gave seems to be the same as before, but it is correct. I checked it in the new commit. The corresponding relationship of pos. By the way, their pos may be the same, but they actually mean the positions in the respective left tuple and right tuple. This is because expr needs to be used in left_build: https://github.com/KipData/FnckSQL/blob/main/src/execution/volcano/dql/join/hash_join.rs#L119 best_plan plan: LogicalPlan {
operator: Project(
ProjectOperator {
exprs: [
Reference {
expr: ColumnRef(
ColumnCatalog {
summary: ColumnSummary {
id: Some(
1,
),
name: "a",
table_name: Some(
"t1",
),
},
nullable: true,
desc: ColumnDesc {
column_datatype: Integer,
is_primary: false,
is_unique: false,
default: None,
},
},
),
pos: 0,
},
],
},
),
childrens: [
LogicalPlan {
operator: Join(
JoinOperator {
on: On {
on: [
(
Reference {
expr: ColumnRef(
ColumnCatalog {
summary: ColumnSummary {
id: Some(
1,
),
name: "a",
table_name: Some(
"t1",
),
},
nullable: true,
desc: ColumnDesc {
column_datatype: Integer,
is_primary: false,
is_unique: false,
default: None,
},
},
),
pos: 0,
},
Reference {
expr: Alias {
expr: ColumnRef(
ColumnCatalog {
summary: ColumnSummary {
id: Some(
1,
),
name: "a",
table_name: Some(
"t1",
),
},
nullable: true,
desc: ColumnDesc {
column_datatype: Integer,
is_primary: false,
is_unique: false,
default: None,
},
},
),
alias: Expr(
ColumnRef(
ColumnCatalog {
summary: ColumnSummary {
id: Some(
1,
),
name: "a",
table_name: Some(
"_temp_table_1_",
),
},
nullable: true,
desc: ColumnDesc {
column_datatype: Integer,
is_primary: false,
is_unique: false,
default: None,
},
},
),
),
},
pos: 0,
},
),
],
filter: None,
},
join_type: LeftSemi,
},
),
childrens: [
LogicalPlan {
operator: Scan(
ScanOperator {
table_name: "t1",
primary_key: 0,
columns: [
(
1,
ColumnCatalog {
summary: ColumnSummary {
id: Some(
1,
),
name: "a",
table_name: Some(
"t1",
),
},
nullable: true,
desc: ColumnDesc {
column_datatype: Integer,
is_primary: false,
is_unique: false,
default: None,
},
},
),
],
limit: (
None,
None,
),
index_infos: [
IndexInfo {
meta: IndexMeta {
id: 0,
column_ids: [
0,
],
table_name: "t1",
pk_ty: Integer,
name: "pk_pk",
ty: PrimaryKey,
},
range: None,
},
],
},
),
childrens: [],
physical_option: Some(
SeqScan,
),
_output_schema_ref: None,
},
LogicalPlan {
operator: Project(
ProjectOperator {
exprs: [
Alias {
expr: Reference {
expr: ColumnRef(
ColumnCatalog {
summary: ColumnSummary {
id: Some(
1,
),
name: "a",
table_name: Some(
"t1",
),
},
nullable: true,
desc: ColumnDesc {
column_datatype: Integer,
is_primary: false,
is_unique: false,
default: None,
},
},
),
pos: 0,
},
alias: Expr(
ColumnRef(
ColumnCatalog {
summary: ColumnSummary {
id: Some(
1,
),
name: "a",
table_name: Some(
"_temp_table_1_",
),
},
nullable: true,
desc: ColumnDesc {
column_datatype: Integer,
is_primary: false,
is_unique: false,
default: None,
},
},
),
),
},
],
},
),
childrens: [
LogicalPlan {
operator: Project(
ProjectOperator {
exprs: [
Reference {
expr: ColumnRef(
ColumnCatalog {
summary: ColumnSummary {
id: Some(
1,
),
name: "a",
table_name: Some(
"t1",
),
},
nullable: true,
desc: ColumnDesc {
column_datatype: Integer,
is_primary: false,
is_unique: false,
default: None,
},
},
),
pos: 1,
},
],
},
),
childrens: [
LogicalPlan {
operator: Filter(
FilterOperator {
predicate: Binary {
op: Gt,
left_expr: Reference {
expr: ColumnRef(
ColumnCatalog {
summary: ColumnSummary {
id: Some(
0,
),
name: "pk",
table_name: Some(
"t1",
),
},
nullable: false,
desc: ColumnDesc {
column_datatype: Integer,
is_primary: true,
is_unique: false,
default: None,
},
},
),
pos: 0,
},
right_expr: Constant(
Int32(0),
),
ty: Boolean,
},
having: false,
},
),
childrens: [
LogicalPlan {
operator: Scan(
ScanOperator {
table_name: "t1",
primary_key: 0,
columns: [
(
0,
ColumnCatalog {
summary: ColumnSummary {
id: Some(
0,
),
name: "pk",
table_name: Some(
"t1",
),
},
nullable: false,
desc: ColumnDesc {
column_datatype: Integer,
is_primary: true,
is_unique: false,
default: None,
},
},
),
(
1,
ColumnCatalog {
summary: ColumnSummary {
id: Some(
1,
),
name: "a",
table_name: Some(
"t1",
),
},
nullable: true,
desc: ColumnDesc {
column_datatype: Integer,
is_primary: false,
is_unique: false,
default: None,
},
},
),
],
limit: (
None,
None,
),
index_infos: [
IndexInfo {
meta: IndexMeta {
id: 0,
column_ids: [
0,
],
table_name: "t1",
pk_ty: Integer,
name: "pk_pk",
ty: PrimaryKey,
},
range: Some(
Scope {
min: Excluded(
Int32(0),
),
max: Unbounded,
},
),
},
],
},
),
childrens: [],
physical_option: Some(
SeqScan,
),
_output_schema_ref: None,
},
],
physical_option: Some(
Filter,
),
_output_schema_ref: None,
},
],
physical_option: Some(
Project,
),
_output_schema_ref: None,
},
],
physical_option: Some(
Project,
),
_output_schema_ref: None,
},
],
physical_option: Some(
HashJoin,
),
_output_schema_ref: None,
},
],
physical_option: Some(
Project,
),
_output_schema_ref: None,
} |
It seems to project twice on temp table. And I don't think it can handle more complex situations. explain SELECT b FROM t1 WHERE a in (SELECT a FROM t2 where (select a from t1));
PLAN
-------------------------------------------------------------------
Projection [t1.b] [Project] +
LeftSemi Join On t1.a = (t2.a) as (_temp_table_2_.a) [HashJoin]+
Scan t1 -> [a, b] [SeqScan] +
Projection [(t2.a) as (_temp_table_2_.a)] [Project] +
Projection [t2.a] [Project] +
Inner Join Where _temp_table_1_.a [HashJoin] +
Scan t2 -> [a] [SeqScan] +
Projection [(t2.a) as (_temp_table_1_.a)] [Project] +
Projection [t2.a] [Project] +
Scan t1 -> [] [SeqScan] I think every query using their own binder context may be a solution, but it may be a big work. |
Yes, let me create an issue for this problem to avoid affecting your PR. |
For the two Projects, I referred to Datafusion’s TableAlias and Project. |
close by #164 |
What problem does this PR solve?
Fix code for:#164
Code changes
Check List
Tests
Side effects
Note for reviewer