-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite EARLIEST/LATEST query operators to EARLIEST_BY/LATEST_BY #15095
Rewrite EARLIEST/LATEST query operators to EARLIEST_BY/LATEST_BY #15095
Conversation
|
||
SqlAggFunction aggFunction; | ||
|
||
switch (getName()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this switch is really needed:
AggregatorType
is an enum; you could probably save it in the constructor - so that no string based switch is necessary- and probably the enum could have a method or something which returns the other
- or you may also accept a second
AggregatorType
in the constructor ( what should it be rewritten to) so there will be no question what the "other" should be....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified to take the latter approach of passing the replacement SqlAggFunction in the constructor itself.
|
||
switch (operands.size()) { | ||
case 1: | ||
return aggFunction.createCall(pos, operands.get(0), new SqlIdentifier( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note:
you could probably use the list constructor of the call; so there is less duplication
List<SqlNode> newOperands = new ArrayList<SqlNode>();
newOperands.add(operands.get(0));
newOperands.add(new SqlIdentifier("__time", pos));
if (operands.size() == 2)
newOperands.add(operands.get(1));
return aggFunction.createCall(pos, newOperands);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - updated the code likewise.
@@ -340,5 +345,47 @@ private static class EarliestLatestSqlAggFunction extends SqlAggFunction | |||
Optionality.FORBIDDEN | |||
); | |||
} | |||
|
|||
@Override | |||
public SqlNode rewriteCall( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if after this patch constructors of classes like LongLastAggregatorFactory
should still accept timeColumn
as null
or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I think it shouldn't. Maybe can take up those modifications in a separate PR.
return aggFunction.createCall(pos, operands.get(0), new SqlIdentifier( | ||
"__time", pos)); | ||
case 2: | ||
return aggFunction.createCall(pos, operands.get(0), new SqlIdentifier( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should validate the new call created as well - since we're hard coding the __time
now in the function call. It may or may not be present in the aggregation scope - which is actually a bad thing that the LATEST/EARLIEST invocations have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new call will be validated post the rewrite. But there was still the issue of validator treating rewritten query as originating from the user and flagging unrelatable col __time not found in any table (row x, col y)
error. I've now addressed it by introducing a custom SQLIdentifier class.
…ad creating a custom SQLIdentifier class for the Time column
} | ||
catch (CalciteContextException e) { | ||
if (e.getCause() instanceof SqlValidatorException) { | ||
throw DruidException.forPersona(DruidException.Persona.ADMIN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw DruidException.forPersona(DruidException.Persona.ADMIN) | |
throw DruidException.forPersona(DruidException.Persona.USER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chose ADMIN
Persona here since the exceptions in existing tests such as testTimeColumnAggregationsOnLookups
had had the same -- not sure why though. If that's incorrect, shall update the tests as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the reason you are looking for: https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/planner/QueryHandler.java#L692
However, in this case, it seems like we do know the exact reason why the query failed - It's due to the absence of a time column. So perhaps change the wording to be more assertive. Also, we should change it to user, because of the reason highlighted in the link above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if the reason will always be absence of time column - what if there's a case of a join where __time
column is coming from both the tables, in that case I think it could also be an ambiguous
column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what will admin do with these planning errors even if they are hints. The hints are meant for the end-user. I don't want to hold this PR so would let it merge as it is.
...in/java/org/apache/druid/sql/calcite/aggregation/builtin/EarliestLatestAnySqlAggregator.java
Show resolved
Hide resolved
...in/java/org/apache/druid/sql/calcite/aggregation/builtin/EarliestLatestAnySqlAggregator.java
Show resolved
Hide resolved
...in/java/org/apache/druid/sql/calcite/aggregation/builtin/EarliestLatestAnySqlAggregator.java
Show resolved
Hide resolved
…che#15095) EARLIEST and LATEST operators implicitly reference the __time column for calculation of the aggregate value. Since the reference isn't explicit, Calcite sometimes fails to update the __time column name when there's column renaming --such as in the case of nested queries -- resulting in column not found errors. This change rewrites these operators to EARLIEST_BY and LATEST_BY during query processing to make the reference explicit to Calcite. (cherry picked from commit c6ca990)
) EARLIEST and LATEST operators implicitly reference the __time column for calculation of the aggregate value. Since the reference isn't explicit, Calcite sometimes fails to update the __time column name when there's column renaming --such as in the case of nested queries -- resulting in column not found errors. This change rewrites these operators to EARLIEST_BY and LATEST_BY during query processing to make the reference explicit to Calcite. (cherry picked from commit c6ca990)
…che#15095) EARLIEST and LATEST operators implicitly reference the __time column for calculation of the aggregate value. Since the reference isn't explicit, Calcite sometimes fails to update the __time column name when there's column renaming --such as in the case of nested queries -- resulting in column not found errors. This change rewrites these operators to EARLIEST_BY and LATEST_BY during query processing to make the reference explicit to Calcite.
…che#15095) EARLIEST and LATEST operators implicitly reference the __time column for calculation of the aggregate value. Since the reference isn't explicit, Calcite sometimes fails to update the __time column name when there's column renaming --such as in the case of nested queries -- resulting in column not found errors. This change rewrites these operators to EARLIEST_BY and LATEST_BY during query processing to make the reference explicit to Calcite.
Description
EARLIEST
andLATEST
operators implicitly reference the__time
column for calculation of the aggregate value. Since the reference isn't explicit, Calcite sometimes fails to update the__time
column name when there's column renaming --such as in the case of nested queries -- resulting in column not found errors.This change rewrites these operators to
EARLIEST_BY
andLATEST_BY
during query processing to make the reference explicit to Calcite.This PR has: