Skip to content

Window Functions

Paul Rogers edited this page Oct 19, 2022 · 3 revisions

References

Syntax

Postgres

Postgres syntax:

function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER window_name
function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER window_name
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )

where window_definition has syntax:

[ existing_window_name ]
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ frame_clause ]

frame_clause has the syntax of one of the following:

{ RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ]
{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ]

From Postgres docs:

where frame_start and frame_end can be one of

UNBOUNDED PRECEDING
offset PRECEDING
CURRENT ROW
offset FOLLOWING
UNBOUNDED FOLLOWING

and frame_exclusion can be one of

EXCLUDE CURRENT ROW
EXCLUDE GROUP
EXCLUDE TIES
EXCLUDE NO OTHERS

Apache Drill

Drill uses Calcite, so this is likely to be the Calcite syntax as well. (Double check.) From the docs:

   window_function (expression) OVER (
   [ PARTITION BY expr_list ]
   [ ORDER BY order_list ][ frame_clause ] )

where function is one of the functions described, such as AVG(), and expr_list is:

   expression | column_name [, expr_list ]

and order_list is:

   expression | column_name [ASC | DESC] [ NULLS { FIRST | LAST } ] [, order_list ]

and the optional frame_clause is one of the following frames:

   { RANGE | ROWS } frame_start
   { RANGE | ROWS } BETWEEN frame_start AND frame_end

where frame_start is one of the following choices:

   UNBOUNDED PRECEDING
   CURRENT ROW

and frame_end is one of the following choices:

   CURRENT ROW
   UNBOUNDED FOLLOWING

Calcite

Per the docs:

window:
      windowName
  |   windowSpec

windowSpec:
      '('
      [ windowName ]
      [ ORDER BY orderItem [, orderItem ]* ]
      [ PARTITION BY expression [, expression ]* ]
      [
          RANGE numericOrIntervalExpression { PRECEDING | FOLLOWING }
      |   ROWS numericExpression { PRECEDING | FOLLOWING }
      ]
      ')'

BigQuery

Per the docs:

function_name ( [ argument_list ] ) OVER over_clause

[over_clause](https://cloud.google.com/bigquery/docs/reference/standard-sql/window-function-calls#def_over_clause):
  { named_window | ( [ [window_specification](https://cloud.google.com/bigquery/docs/reference/standard-sql/window-function-calls#def_window_spec) ] ) }

window_specification:
  [ named_window ]
  [ PARTITION BY partition_expression [, ...] ]
  [ ORDER BY expression [ { ASC | DESC }  ] [, ...] ]
  [ [window_frame_clause](https://cloud.google.com/bigquery/docs/reference/standard-sql/window-function-calls#def_window_frame) ]

window_frame_clause:
  { rows_range } { [frame_start](https://cloud.google.com/bigquery/docs/reference/standard-sql/window-function-calls#def_window_frame) | [frame_between](https://cloud.google.com/bigquery/docs/reference/standard-sql/window-function-calls#def_window_frame) }

[rows_range](https://cloud.google.com/bigquery/docs/reference/standard-sql/window-function-calls#def_window_frame):
  { ROWS | RANGE }

Structure

  • One window-related function
  • Window definition
    • Frame
    • Partition
    • Order By

The window definition defines a range of relative to the current row. To make that work, "future" rows have to be buffered back to the current row.

It appears that, conceptually, we

  • Gather the list of rows within the frame
  • Compute aggregates for the frame, if any
  • Iterate over the rows
  • Fill in aggregates
  • Compute per-row values (those that depend on offsets)

Since a window allows only one expression per window, the expression will be either an aggregate, or a relative expression. That is true at the syntax level. It would seem a good planner would combine functions with the same window into a single operator.

From Leis:

  • Window function evaluation is based on three simple and orthogonal concepts: partitioning, ordering, and framing.
  • The partition by clause partitions the input by one or more expressions into independent groups, and thereby restricts the window of a tuple.
  • Within each partition, the rows can be ordered using the order by clause.

General Approach

  • Define Calcite syntax
  • Understand Calcite logical plan
  • Define implementation design
  • Operator-based implementation
  • Wire up to Calcite plan
  • Optimize buffering: discard rows ASAP
  • Partition by

Worry about named, table window functions later.

Design Sketch

Window Op
|
Sort Op

Tasks

  • Common row format (Frames with column accessors?) - Done
    • Simple Java object array
    • With column accessor wrapper
    • Define Batch, BatchSchema concepts from earlier discussion
    • Readers and writers
  • Source of data, maybe from CSV.
    • Reuse existing CSV reader?
    • Output is a frame, or the above format?
  • Buffering solution
    • Buffer batches, with row offsets.
  • Ordering solution (sort- and pqueue-based)
    • Ordering in terms of column names or ordinals
    • Translated to ordinals for access
    • Comparators based on column accessors
  • Aggregations
  • Assemble output row
  • Convert from incoming row/batch format
    • From native (many) to batch format (single
  • Convert to outgoing row/batch format
    • From batch format to native (many)
  • Window with entire rows.
  • Agg function over rows.
  • Relative function.
  • Assemble output row.
  • Partitioned window function: streaming and buffered
Clone this wiki locally