Rewrite EARLIEST/LATEST query operators to EARLIEST_BY/LATEST_BY #15095

gargvishesh · 2023-10-05T09:49:40Z

Description

EARLIEST and LATEST operators implicitly reference the __time column when calculating the aggregate value. Since the reference isn't explicit, Calcite sometimes fails to update the __time column name when columns are renamed, such as in nested queries, resulting in "column not found" errors.

This change rewrites these operators to EARLIEST_BY and LATEST_BY during query processing to make the reference explicit to Calcite.
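To illustrate the failure mode described above, here is a toy model in plain Java. It is not how Calcite tracks fields; the class `ImplicitTimeReferenceSketch`, the method `applyRenames`, and the aliases `t` and `d` are all hypothetical. It only shows that a column named as an operand can follow an alias, while a column referenced implicitly cannot.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy model only: a "planner" that can rename just the columns listed as operands.
final class ImplicitTimeReferenceSketch {
  // Applies an inner query's column aliases to the operands the planner can see.
  static List<String> applyRenames(List<String> operands, Map<String, String> renames) {
    return operands.stream()
        .map(name -> renames.getOrDefault(name, name))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Hypothetical inner query: SELECT __time AS t, dim1 AS d FROM ...
    Map<String, String> renames = Map.of("__time", "t", "dim1", "d");

    // EARLIEST(dim1): __time is not an operand, so nothing renames it.
    System.out.println(applyRenames(List.of("dim1"), renames));
    // EARLIEST_BY(dim1, __time): the time column is an operand and follows the alias.
    System.out.println(applyRenames(List.of("dim1", "__time"), renames));
  }
}
```

In the first call the aggregator would still look for `__time` in a scope that only exposes `t`, which is the kind of "column not found" error the description mentions.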


kgyrtkirk · 2023-10-05T10:44:05Z

...in/java/org/apache/druid/sql/calcite/aggregation/builtin/EarliestLatestAnySqlAggregator.java

+
+      SqlAggFunction aggFunction;
+
+      switch (getName()) {


I wonder if this switch is really needed:

AggregatorType is an enum; you could probably save it in the constructor - so that no string based switch is necessary

and probably the enum could have a method or something which returns the other

Or you could accept a second AggregatorType in the constructor (what it should be rewritten to), so there is no question what the "other" should be.

Modified to take the latter approach of passing the replacement SqlAggFunction in the constructor itself.
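A minimal sketch of that approach, assuming a simplified stand-in for the aggregate function class. `AggFunction`, `rewriteTarget`, and the constants below are hypothetical and are not the PR's actual code. The point is that each function carries its replacement as a field, so no string-based switch on the name is needed.

```java
// Hypothetical stand-in; the real class extends Calcite's SqlAggFunction.
final class ReplacementInConstructorSketch {
  static final class AggFunction {
    private final String name;
    private final AggFunction replacement; // null when no rewrite applies

    AggFunction(String name, AggFunction replacement) {
      this.name = name;
      this.replacement = replacement;
    }

    String name() {
      return name;
    }

    // Returns the function this one is rewritten to, or itself if there is none.
    AggFunction rewriteTarget() {
      return replacement == null ? this : replacement;
    }
  }

  // The *_BY variants are declared first so they can be passed as replacements.
  static final AggFunction EARLIEST_BY = new AggFunction("EARLIEST_BY", null);
  static final AggFunction LATEST_BY = new AggFunction("LATEST_BY", null);
  static final AggFunction EARLIEST = new AggFunction("EARLIEST", EARLIEST_BY);
  static final AggFunction LATEST = new AggFunction("LATEST", LATEST_BY);

  public static void main(String[] args) {
    System.out.println(EARLIEST.rewriteTarget().name());
    System.out.println(LATEST.rewriteTarget().name());
  }
}
```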

kgyrtkirk · 2023-10-05T10:49:04Z

...in/java/org/apache/druid/sql/calcite/aggregation/builtin/EarliestLatestAnySqlAggregator.java

+
+      switch (operands.size()) {
+        case 1:
+          return aggFunction.createCall(pos, operands.get(0), new SqlIdentifier(


note:
you could probably use the list constructor of the call; so there is less duplication

    List<SqlNode> newOperands = new ArrayList<SqlNode>();
    newOperands.add(operands.get(0));
    newOperands.add(new SqlIdentifier("__time", pos));
    if (operands.size() == 2) newOperands.add(operands.get(1));
    return aggFunction.createCall(pos, newOperands);

Thanks - updated the code likewise.
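The operand handling suggested above can be sketched without Calcite on the classpath. In this hypothetical version, plain strings stand in for `SqlNode`, and `RewriteOperandsSketch` and `rewriteOperands` are illustrative names. It assumes the call has either one operand or two, as in the switch shown in the diff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the rewrite; plain strings replace Calcite's SqlNode.
final class RewriteOperandsSketch {
  static final String TIME_COLUMN = "__time";

  // Inserts the time column as the second operand and keeps any trailing operand.
  static List<String> rewriteOperands(List<String> operands) {
    if (operands.isEmpty() || operands.size() > 2) {
      throw new IllegalArgumentException("expected 1 or 2 operands, got " + operands.size());
    }
    List<String> newOperands = new ArrayList<>();
    newOperands.add(operands.get(0));
    newOperands.add(TIME_COLUMN);
    if (operands.size() == 2) {
      newOperands.add(operands.get(1));
    }
    return newOperands;
  }

  public static void main(String[] args) {
    System.out.println(rewriteOperands(List.of("dim1")));         // [dim1, __time]
    System.out.println(rewriteOperands(List.of("dim1", "1024"))); // [dim1, __time, 1024]
  }
}
```

Building one list and branching only on the optional operand avoids repeating the `createCall` invocation per case.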

kgyrtkirk · 2023-10-05T10:57:42Z

...in/java/org/apache/druid/sql/calcite/aggregation/builtin/EarliestLatestAnySqlAggregator.java

@@ -340,5 +345,47 @@ private static class EarliestLatestSqlAggFunction extends SqlAggFunction
          Optionality.FORBIDDEN
      );
    }
+
+    @Override
+    public SqlNode rewriteCall(


I wonder if after this patch constructors of classes like LongLastAggregatorFactory should still accept timeColumn as null or not

Good point! I think it shouldn't. Maybe we can take up those modifications in a separate PR.

rohangarg · 2023-10-05T11:19:53Z

...in/java/org/apache/druid/sql/calcite/aggregation/builtin/EarliestLatestAnySqlAggregator.java

+          return aggFunction.createCall(pos, operands.get(0), new SqlIdentifier(
+              "__time", pos));
+        case 2:
+          return aggFunction.createCall(pos, operands.get(0), new SqlIdentifier(


I wonder if we should validate the new call created as well - since we're hard coding the __time now in the function call. It may or may not be present in the aggregation scope - which is actually a bad thing that the LATEST/EARLIEST invocations have.

The new call is validated after the rewrite. However, there was still the issue of the validator treating the rewritten query as if it originated from the user and flagging a hard-to-relate "col __time not found in any table (row x, col y)" error. I've now addressed it by introducing a custom SQLIdentifier class.
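One way to picture the custom identifier idea is a marker subclass that lets error handling tell an injected `__time` reference apart from one the user typed. The sketch below is an assumption about the general shape only: `Identifier`, `InjectedTimeIdentifier`, `describeMissingColumn`, and both messages are hypothetical, and the PR's actual class and wording are not shown in this conversation.

```java
// Hypothetical model of a marker identifier; not the PR's actual class.
final class MarkerIdentifierSketch {
  static class Identifier {
    final String name;

    Identifier(String name) {
      this.name = name;
    }
  }

  // Marks an identifier that a rewrite injected rather than one the user wrote.
  static final class InjectedTimeIdentifier extends Identifier {
    InjectedTimeIdentifier() {
      super("__time");
    }
  }

  // Chooses an error message based on where the identifier came from.
  static String describeMissingColumn(Identifier id, String functionName) {
    if (id instanceof InjectedTimeIdentifier) {
      return functionName + " requires a " + id.name + " column in scope";
    }
    return "Column '" + id.name + "' not found in any table";
  }

  public static void main(String[] args) {
    System.out.println(describeMissingColumn(new InjectedTimeIdentifier(), "LATEST"));
    System.out.println(describeMissingColumn(new Identifier("foo"), "LATEST"));
  }
}
```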


abhishekagarwal87 · 2023-10-11T09:17:48Z

...in/java/org/apache/druid/sql/calcite/aggregation/builtin/EarliestLatestAnySqlAggregator.java

+      }
+      catch (CalciteContextException e) {
+        if (e.getCause() instanceof SqlValidatorException) {
+          throw DruidException.forPersona(DruidException.Persona.ADMIN)


Suggested change

throw DruidException.forPersona(DruidException.Persona.ADMIN)

throw DruidException.forPersona(DruidException.Persona.USER)

Chose the ADMIN persona here since the exceptions in existing tests such as testTimeColumnAggregationsOnLookups had the same; not sure why, though. If that's incorrect, I'll update the tests as well.

This is the reason you are looking for: https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/planner/QueryHandler.java#L692

However, in this case, it seems like we do know the exact reason why the query failed - It's due to the absence of a time column. So perhaps change the wording to be more assertive. Also, we should change it to user, because of the reason highlighted in the link above.

I'm not sure if the reason will always be absence of time column - what if there's a case of a join where __time column is coming from both the tables, in that case I think it could also be an ambiguous column.

I don't know what an admin will do with these planning errors, even if they are hints. The hints are meant for the end user. I don't want to hold this PR, so I'll let it merge as is.



gargvishesh added 5 commits October 4, 2023 15:30

Add tests for week, month and year granularities

7760674

Fix code style

Revert "Add tests for week, month and year granularities"

9100e22

This reverts commit 7760674.

Rewrite EARLIEST and LATEST functions to _%-BY counterparts.

521d1ea

Add comment

2af7ede

Add test

178bee6

github-actions bot added the Area - Querying label Oct 5, 2023

gargvishesh added 2 commits October 5, 2023 15:39

Merge branch 'master' into 37580-latest-to-latest-by

b4154b3

Update recent function change in Master notMsqCompatible -> msqIncompatible()

8d36064

kgyrtkirk reviewed Oct 5, 2023


rohangarg reviewed Oct 5, 2023


gargvishesh added 5 commits October 6, 2023 15:52

Address review comments and fix tests

14a0d64

Retain exceptions corresponding to EARLIEST/LATEST operators

6e23061

Introduce and use a new conversion boolean in Validator

354029e

Move convert field boolean to DruidSqlValidator from BaseDruidSqlValidator

b98ffcf

Reverting previous changes of a new DruidSqlValidator field and instead creating a custom SQLIdentifier class for the Time column

295d4a7

kgyrtkirk approved these changes Oct 10, 2023


Fix style errors

0d9cba9

abhishekagarwal87 reviewed Oct 11, 2023


rohangarg reviewed Oct 11, 2023



abhishekagarwal87 approved these changes Oct 11, 2023


abhishekagarwal87 merged commit c6ca990 into apache:master Oct 11, 2023
81 checks passed

cryptoe mentioned this pull request Oct 12, 2023

28.0.0 backports #15139

Merged

abhishekagarwal87 added this to the 28.0 milestone Oct 19, 2023

LakshSingla mentioned this pull request Nov 4, 2023

[DRAFT] 28.0.0 release notes #15326

Closed

kgyrtkirk mentioned this pull request Jan 12, 2024

Update the docs for EARLIEST_BY/LATEST_BY aggregators with the newly added numeric capabilities #15670

Merged

10 tasks
