Window Functions

References

Syntax

Postgres

function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER window_name
function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER window_name
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )

where window_definition has syntax:

[ existing_window_name ]
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ frame_clause ]

frame_clause has the syntax of one of the following:

{ RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ]
{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ]

From Postgres docs:

where frame_start and frame_end can be one of

UNBOUNDED PRECEDING
offset PRECEDING
CURRENT ROW
offset FOLLOWING
UNBOUNDED FOLLOWING

and frame_exclusion can be one of

EXCLUDE CURRENT ROW
EXCLUDE GROUP
EXCLUDE TIES
EXCLUDE NO OTHERS

Apache Drill

Drill uses Calcite, so this is likely to be the Calcite syntax as well. (Double check.) From the docs:

   window_function (expression) OVER (
   [ PARTITION BY expr_list ]
   [ ORDER BY order_list ][ frame_clause ] )

where function is one of the functions described, such as AVG(), and expr_list is:

   expression | column_name [, expr_list ]

and order_list is:

   expression | column_name [ASC | DESC] [ NULLS { FIRST | LAST } ] [, order_list ]

and the optional frame_clause is one of the following frames:

   { RANGE | ROWS } frame_start
   { RANGE | ROWS } BETWEEN frame_start AND frame_end

where frame_start is one of the following choices:

   UNBOUNDED PRECEDING
   CURRENT ROW

and frame_end is one of the following choices:

   CURRENT ROW
   UNBOUNDED FOLLOWING

Calcite

Per the docs:

window:
      windowName
  |   windowSpec

windowSpec:
      '('
      [ windowName ]
      [ ORDER BY orderItem [, orderItem ]* ]
      [ PARTITION BY expression [, expression ]* ]
      [
          RANGE numericOrIntervalExpression { PRECEDING | FOLLOWING }
      |   ROWS numericExpression { PRECEDING | FOLLOWING }
      ]
      ')'

BigQuery

Per the docs:

function_name ( [ argument_list ] ) OVER over_clause

[over_clause](https://cloud.google.com/bigquery/docs/reference/standard-sql/window-function-calls#def_over_clause):
  { named_window | ( [ [window_specification](https://cloud.google.com/bigquery/docs/reference/standard-sql/window-function-calls#def_window_spec) ] ) }

window_specification:
  [ named_window ]
  [ PARTITION BY partition_expression [, ...] ]
  [ ORDER BY expression [ { ASC | DESC }  ] [, ...] ]
  [ [window_frame_clause](https://cloud.google.com/bigquery/docs/reference/standard-sql/window-function-calls#def_window_frame) ]

window_frame_clause:
  { rows_range } { [frame_start](https://cloud.google.com/bigquery/docs/reference/standard-sql/window-function-calls#def_window_frame) | [frame_between](https://cloud.google.com/bigquery/docs/reference/standard-sql/window-function-calls#def_window_frame) }

[rows_range](https://cloud.google.com/bigquery/docs/reference/standard-sql/window-function-calls#def_window_frame):
  { ROWS | RANGE }

Structure

One window-related function
Window definition
- Frame
- Partition
- Order By

The window definition defines a range of relative to the current row. To make that work, "future" rows have to be buffered back to the current row.

It appears that, conceptually, we

Gather the list of rows within the frame
Compute aggregates for the frame, if any
Iterate over the rows
Fill in aggregates
Compute per-row values (those that depend on offsets)

Since a window allows only one expression per window, the expression will be either an aggregate, or a relative expression. That is true at the syntax level. It would seem a good planner would combine functions with the same window into a single operator.

From Leis:

Window function evaluation is based on three simple and orthogonal concepts: partitioning, ordering, and framing.
The partition by clause partitions the input by one or more expressions into independent groups, and thereby restricts the window of a tuple.
Within each partition, the rows can be ordered using the order by clause.

General Approach

Define Calcite syntax
Understand Calcite logical plan
Define implementation design
Operator-based implementation
Wire up to Calcite plan
Optimize buffering: discard rows ASAP
Partition by

Worry about named, table window functions later.

Design Sketch

Window Op
|
Sort Op

Tasks

Common row format (Frames with column accessors?) - Done
- Simple Java object array
- With column accessor wrapper
- Define Batch, BatchSchema concepts from earlier discussion
- Readers and writers
Source of data, maybe from CSV.
- Reuse existing CSV reader?
- Output is a frame, or the above format?
Buffering solution
- Buffer batches, with row offsets.
Ordering solution (sort- and pqueue-based)
- Ordering in terms of column names or ordinals
- Translated to ordinals for access
- Comparators based on column accessors
Aggregations
Assemble output row
Convert from incoming row/batch format
- From native (many) to batch format (single
Convert to outgoing row/batch format
- From batch format to native (many)
Window with entire rows.
Agg function over rows.
Relative function.
Assemble output row.
Partitioned window function: streaming and buffered

Provide feedback

Saved searches

Use saved searches to filter your results more quickly