Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand order by field #1080

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

shivanesabharwal
Copy link

@shivanesabharwal shivanesabharwal commented Apr 10, 2023

Fix #949

The order by field would throw an internal error as it was missing from the FieldInstanceResult field list. Seems that processing of those fields happens in expandFields, and should explicitly check whether the query has an order by field.

The core issue was that only fields in the segment field list were added to the result's field list, and the order by field is not in that list.

The resulting SQL for the test case provided is:

 SELECT 
       models."manufacturer" as "manufacturer",
       COUNT( 1) as "model_count"
    FROM malloytest.aircraft_models as models
    GROUP BY 1
    ORDER BY 2 desc NULLS LAST
    LIMIT 1

@shivanesabharwal shivanesabharwal changed the title malloy Expand order by field Apr 10, 2023
@shivanesabharwal shivanesabharwal marked this pull request as ready for review April 10, 2023 22:34
@@ -81,6 +81,8 @@ import {

import {Connection} from '../runtime_types';
import {AndChain, generateHash, indent} from './utils';
import { isReduceSegment } from './malloy_types';
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extraneous imports

@@ -1055,7 +1057,7 @@ class FieldInstanceResult implements FieldInstance {
getField(name: string): FieldInstanceField {
const fi = this.allFields.get(name);
if (fi === undefined) {
throw new Error(`Internal Error, field Not defined ${name}`);
throw new Error(`Internal Error, field not defined ${name}`);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lowercase this not

@@ -1929,7 +1931,18 @@ class QueryQuery extends QueryField {

expandFields(resultStruct: FieldInstanceResult) {
let resultIndex = 1;
for (const f of this.expandWildCards(resultStruct.firstSegment.fields)) {
let fields: QueryFieldDef[] = resultStruct.firstSegment.fields;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Core premise of the PR is that we should be processing order by fields explicitly into the resultStruct, otherwise we only get them by chance if they are also included into the query via some other parameter

@lloydtabb
Copy link
Collaborator

@christopherswenson you are working in this space can you help?

@christopherswenson
Copy link
Contributor

Thanks @shivanesabharwal for your contribution! This PR (re?)raises some interesting/difficult questions.

I think this is actually a case where the error message about the field being missing from the FieldInstanceResult is slightly misleading.

Currently there isn't perfect agreement between the parser/translator/typechecker and the compiler/SQL generator, and this is one case where that manifests. Today, the translator doesn't actually check that fields referenced in an order_by follow "the rules." In other words, from the translator's perspective (in your example test case), order_by: manufacturer, order_by: 1, order_by: 100, order_by: model_count, and order_by: adfadfasd are all equally valid; however, that doesn't mean that they're all legal, as evidenced by the fact that only the first two actually compile.

There is active discussion going on between me, @lloydtabb, and @mtoy-googly-moogly about how we validate order_by.

Today my understanding is that an order_by must be the name of a field in the output space of the query, which is essentially a name of a column of the returned table.

According to that understanding, the query in your test case should give an error like "field model_count not found in the output space." The work I'm doing right now on SQL functions actually implements this check.

We're inheriting some of the baggage of SQL here, in that we can't order by a field without including it in the query (though we could theoretically generate an extra query stage to remove those fields after the fact). And it's not clear to me whether it should be possible to automatically add fields to a query by including them in an order_by like you're expecting in your test case.

For example, in this query:

query: models -> {
  group_by: manufacturer
  order_by: seats desc
}

It would certainly be confusing for seats to be automatically added as a group_by; and the language currently doesn't have a way of including a field in a "reduce" query (one with grouping) without actually grouping by that field. Furthermore, it's not even obvious that the query should include seats in the output table necessarily (or in your test case, model_count).

Some possible avenues that seem reasonable to pursue:

  • Add a select: (or similar) query operation to use a scalar field without grouping by it (design question: should this have the same keyword as project:?), then add the ability to reference a field name from the input space in an order_by, which would have the effect of adding that field to the query (as a select: for scalar fields or an aggregate: for measures)
  • Same as above, but also automatically add an additional intermediate stage to the query to remove the fields which are only referenced in an order_by.
  • Exactly what you've done in your PR, except that we only allow aggregates to be referenced this way.

@christopherswenson
Copy link
Contributor

Regarding #949, I'm not sure if @lloydtabb had any one of these options in mind (or something else) when he created the issue.

From previous discussions with him, I would find it a little surprising if he meant anything other than the second option.

@shivanesabharwal
Copy link
Author

Thanks so much for the thorough review Chris!

We're inheriting some of the baggage of SQL here, in that we can't order by a field without including it in the query (though we could theoretically generate an extra query stage to remove those fields after the fact). And it's not clear to me whether it should be possible to automatically add fields to a query by including them in an order_by like you're expecting in your test case.

Including it in the query SELECT seems to be a malloy feature in this case. I admit I expected the generated SQL to be something like

SELECT manufacturer
FROM aircraft_manufacturers 
GROUP BY 1 
ORDER BY COUNT(1) as model_count ;;

It seems this is running into the language limitation you outline:

the language currently doesn't have a way of including a field in a "reduce" query (one with grouping) without actually grouping by that field

but the SQL query above is valid, it's just that Malloy doesn't have a way of doing that right now? Similarly for the non-aggregate order_by case with order_by: seats desc I would have expected:

SELECT manufacturer
FROM aircraft_manufacturers
ORDER BY seats desc

no grouping, since there's no aggregation. I guess what I am saying is that if a user adds an aggregate to a query, they are implicitly asking us to choose a group by set for them, but I can understand that that carries a lot of baggage :)

I know Looker often generates order by fields and puts them in the SELECT stmt, but then simply does not return the row in the server response (it includes it as a sort_value). Seems similar to option 2.

My 2c is that any option that allows the user to avoid typing the field twice, i.e. in both a select: and an order_by would be preferred, but I understand that the user's convenience often trades off with code complexity.

In any case, it's clear that the scope of the problem is beyond this PR. Would you like me to pivot this in any way? It sounds like it's best to just wait until the semantics are settled.

@mtoy-googly-moogly
Copy link
Collaborator

Hmmm ... another thing we could do, for dialects which allow an expression in the order_by is to simply copy the definition of the order_by field into the order_by expression, SQL doesn't seem to mind expressions which reference columns in the FROM

WITH t as (SELECT 1 as n, 2 as m UNION ALL SELECT 3,4) SELECT n from t order by 10-m

It is weird parsing these though ...

group_by: x,y,z
order_by: x.  // in this example `x` is in the output space
group_by: x,y,z
order_by: 1.0 / w      // 'w' would be in this proposal, expected to be in the input space.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Order by field appears to be broken.
4 participants