Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot nest: { group_by: array_column } #2065

Open
PeatBoy opened this issue Dec 24, 2024 · 4 comments · May be fixed by #2090
Open

Cannot nest: { group_by: array_column } #2065

PeatBoy opened this issue Dec 24, 2024 · 4 comments · May be fixed by #2090

Comments

@PeatBoy
Copy link

PeatBoy commented Dec 24, 2024

What happens?

nest run bug

To Reproduce

malloy:

source: tests is duckdb.table('../data/flights.parquet') -> {
    aggregate: tests_count is count()
    nest: nest_view is {
        group_by: carrier
    }
} extend {
    dimension: test_dim is @2024
}

run: tests -> {
    nest: nest_view
}

error:

Parser Error: syntax error at or near "as"
LINE 29: ...
    group_set,
    CASE WHEN group_set=0 THEN
      COUNT(1)
      END as "tests_count__0",
    CASE WHEN group_set=1 THEN
      base."carrier"
      END as "carrier__1"
  FROM '../data/flights.parquet' as base
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  GROUP BY 1,3
)
, __stage1 AS (
  SELECT
    MAX(CASE WHEN group_set=0 THEN "tests_count__0" END) as "tests_count",
    COALESCE(LIST({
      "carrier": "carrier__1"}  ORDER BY  "carrier__1" asc NULLS LAST) FILTER (WHERE group_set=1),[]) as "nest_view"
  FROM __stage0
)
, __stage2 AS (
  SELECT
    group_set
  FROM __stage1 as base
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  GROUP BY 1
)
SELECT
  ANY_VALUE(CASE WHEN group_set=1 THEN STRUCT_PACK("nest_view__1" as "nest_view")) as "nest_view"

SQL:

WITH __stage0 AS (
  SELECT
    group_set,
    CASE WHEN group_set=0 THEN
      COUNT(1)
      END as "tests_count__0",
    CASE WHEN group_set=1 THEN
      base."carrier"
      END as "carrier__1"
  FROM '../data/flights.parquet' as base
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  GROUP BY 1,3
)
, __stage1 AS (
  SELECT
    MAX(CASE WHEN group_set=0 THEN "tests_count__0" END) as "tests_count",
    COALESCE(LIST({
      "carrier": "carrier__1"}  ORDER BY  "carrier__1" asc NULLS LAST) FILTER (WHERE group_set=1),[]) as "nest_view"
  FROM __stage0
)
, __stage2 AS (
  SELECT
    group_set
  FROM __stage1 as base
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  GROUP BY 1
)
SELECT
  ANY_VALUE(CASE WHEN group_set=1 THEN STRUCT_PACK("nest_view__1" as "nest_view")) as "nest_view"
FROM __stage2

OS:

macos

Malloy Client:

web-based VSCode

Malloy Client Version:

0.2.1733519779

Database Connection:

DuckDB

@mtoy-googly-moogly
Copy link
Collaborator

Ok, there are a number of problems. The first problem is that no matter what else is going on, THIS is not how things should fail. There is no clue what is going on here.

image

@mtoy-googly-moogly
Copy link
Collaborator

mtoy-googly-moogly commented Dec 26, 2024

The next thing that is happening is that the OP appear to be new to Malloy, welcome! ... and not entirely clear on what a source and a view and a query are, as Malloy concepts.

Running run: tests -> { select: * } you will see an output with three columns, the count, the nested list of carriers, and test_dim. nest_thing in the run query is a result of running nest thing in source query, it is an array of records. It is NOT reference to the nest view, it is reference to the result. The run query, expanded out a little, is this ...

run: tests -> { nest: cause_error is ( group_by: ANY_COLUMN_WHICH_IS_AN_ARRAY }};

The bug is we can't have a group_by on an array column value, in a nest statement, which I am working on fixing.

I think OP probably means something like this ...

// add a measure, a view, and a dimension to flights
source: tests2 is duckdb.table('../data/flights.parquet') extend {
    measure: tests_count is count()
    view: nest_view is {
        group_by: carrier
     }   
    dimension: test_dim is @2024
}

// run a query using the measure and the view
run: tests2 -> {
  aggregate: tests_count
  nest: nest_view
}

Feel free to bug us on slack, we'd be happy to walk with you on the learning Malloy journey.

@mtoy-googly-moogly
Copy link
Collaborator

mtoy-googly-moogly commented Dec 26, 2024

Here's a pared down example

source: one_rec is duckdb.sql("SELECT {num:1} as num_rec")
run: one_rec -> { nest: next_rec is {group_by: num_rec }}

source: one_num is duckdb.sql("SELECT 1 as num")
run: one_num -> { nest: next_rec is {group_by: num }}

The first example gets the same "Parser Error: syntax error at or near "as" ad the OP report. The second one works.

The problem is the group_set is not generarted properly because the num_rec is a FieldInstanceResult and so the stage zero field generation skips it.

@mtoy-googly-moogly mtoy-googly-moogly changed the title A problem with nest view: Error in defining nest view in source and re-referencing NestView in run. Cannot nest: { group_by: array_column } Dec 26, 2024
@mtoy-googly-moogly mtoy-googly-moogly linked a pull request Jan 11, 2025 that will close this issue
@lloydtabb
Copy link
Collaborator

source: tests is duckdb.table('../data/flights.parquet') -> {
    aggregate: tests_count is count()
    nest: nest_view is {
        group_by: carrier
    }
} extend {
    dimension: test_dim is @2024
}

This feels wrong.

run: tests -> {
    nest: nest_view
}

Wouldn't you run:

run: tests -> {
   group_by: nest_view. -- or 'select: nest_view'
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants