-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support CORRESPONDING
specification in set operations
#17891
Conversation
src/frontend/src/binder/set_expr.rs
Outdated
"At least one column of the left side shall have a <column name> that is the \ | ||
<column name> of some column of the right side" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"At least one column of the left side shall have a <column name> that is the \ | |
<column name> of some column of the right side" | |
"When using Corresponding keywork, at least one column of the left side shall have a <column name> that is the \ | |
<column name> of some column of the right side" |
What does "spec" mean here? |
CORRESPONDING
clause in set operations
https://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt CORRESPONDING is not a clause. It's a specification. |
src/frontend/src/binder/set_expr.rs
Outdated
for (idx, field) in set_expr.schema().fields.iter().enumerate() { | ||
if name2idx.insert(field.name.clone(), idx).is_some() { | ||
return Err(ErrorCode::InvalidInputSyntax(format!( | ||
"duplicated column name `{}` in a select list of a set operation", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to indicate which side of which operation, instead of "a select list of a set operation"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides, better to add a test case for this error. I didn't see one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can be more than 2 side. If we have a A union corresponding B union corresponding C
. It can be confusing to if B is a left or right side
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. Would that still be better than nothing, since we can exclude C
? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's useful for debugging. But I'm just worried users will be confused about which one is left when they see three unioned query.
I do think printing the whole column list in error msg makes sense know.
src/frontend/src/binder/set_expr.rs
Outdated
"Every <column name> in the <corresponding column list> shall be a \ | ||
<column name> of both left and right side. Missing column: `{}`", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- "corresponding column list" seems a little verbose. "corresponding clause"?
- Why using
<>
? This seems unnecessary and inconsistent with our other error messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And better to indicate which side is missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And better to indicate which side is missing.
Ditto
src/frontend/src/binder/set_expr.rs
Outdated
"When CORRESPONDING is specified, at least one column of the left side shall have a <column name> that is the \ | ||
<column name> of some column of the right side" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean
"When CORRESPONDING is specified, at least one column of the left side shall have a <column name> that is the \ | |
<column name> of some column of the right side" | |
"When CORRESPONDING is specified, left and right side should have at least one common column name" |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left or right side does not have column name, the columns in left or right have column names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why using <>?
I can remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left or right side does not have column name, the columns in left or right have column names.
Sorry, I do not understand. Can you explain to me like I'm a normal user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left or right side are query
s. A query
does not have column name. The columns/select items in a query have column name.
left and right side should have at least one common column name
This can be rephrased to left and right side should have at least one column with identical column names
The original one I use is from the SQL 92 doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A query does not have column name. The columns/select items in a query have column name.
Well, this sounds too language lawyer to me.. 🤡
I think "A query's column name" is clear enough to me. Do you really think its vague? 🤔 (Specification needs to be more accurate though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really think its vague?
I can definitely understand that expression perfectly. But using more accurate language can avoid ambiguity for users, especially CORRESPONDING is not a wildly available feature is other DB.
I doubt the release note is not understandable. Could you try to improve it using clearer language? I can imagine some possible questions asked by users:
Besides, the error messages can be |
Thanks for the reviewing for the doc part. |
CORRESPONDING
clause in set operationsCORRESPONDING
specification in set operations
Do you think print the column name list in the error msg can help? @xxchan |
Well, it depends.. You can decide yourself according to the principles. (Make it easier for users to locate and fix the problem.) Or give up if you don't find any good ways. |
BTW, just searched a little bit why PG doesn't have this, and found a long thread about adding the feature and it was abandoned due to complexity: https://www.postgresql.org/message-id/flat/CAFj8pRCspwFhF1JVD-GKJS0rsojBC1r6okFmDtfVKiSu2XwD0g%40mail.gmail.com#f19dd33222923fc2310b4bd2c1b15c25
Not sure whether we have such kind of problems. |
I guess not, because we just inserted And I found an interesting sentence in the discussion:
https://arc.net/l/quote/elyhevcv So I guess we won't have problems since our executors don't have such level of optimizations. 🤣 |
db error: ERROR: Failed to run the query | ||
|
||
Caused by: | ||
Invalid input syntax: When CORRESPONDING is specified, at least one column of the left side shall have a column name that is the column name of some column of the right side in a UNION operation. Left side query column list: ("v1", "v2", "v4"). Right side query column list: ("vxx"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xxchan should remove the column list of this too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can decide it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll keep this considering it can be fairly difficult to identify the faulty query without any information.
…hao/corresponding
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
support corresponding spec in set operation
#17451
Checklist
./risedev check
(or alias,./risedev c
)Documentation
set operation SQL syntax
query1 UNION/EXPECT/INTERSECT query2
requires query1 and query2 return the same number of columns and match the columns in left-to-right order (by column index).With corresponding specification, the set operation matches columns by the column names. Only columns that are in both side will be overlayed. It will suppress columns that are not in both side of the set operation. Columns are considered in both side if they have the same name or alias
You can specify the columns need to be overlayed in the set operation by putting the column names in
<corresponding column list>
. Only columns that are in both side and the corresponding column list will be overlayed.<corresponding spec>
can be used afterUNION/EXPECT/INTERSECT [ALL]
<corresponding spec> ::= CORRESPONDING [ BY <left paren> <corresponding column list> <right parent> ]
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.