Skip to content

Latest commit

 

History

History
311 lines (221 loc) · 10.4 KB

statistical_aggregate_functions.md

File metadata and controls

311 lines (221 loc) · 10.4 KB

Statistical aggregate functions

ZetaSQL supports the following statistical aggregate functions.

CORR

CORR(X1, X2 [HAVING {MAX | MIN} expression2])  [OVER (...)]

Description

Returns the Pearson coefficient of correlation of a set of number pairs. For each number pair, the first number is the dependent variable and the second number is the independent variable. The return result is between -1 and 1. A result of 0 indicates no correlation.

All numeric types are supported. If the input is NUMERIC or BIGNUMERIC then the internal aggregation is stable with the final output converted to a DOUBLE. Otherwise the input is converted to a DOUBLE before aggregation, resulting in a potentially unstable result.

This function ignores any input pairs that contain one or more NULL values. If there are fewer than two input pairs without NULL values, this function returns NULL.

Optional Clauses

The clauses are applied in the following order:

  1. OVER: Specifies a window. See Analytic Functions.
  2. HAVING MAX or HAVING MIN: Restricts the set of rows that the function aggregates by a maximum or minimum value. See HAVING MAX and HAVING MIN clause for details.

Return Data Type

DOUBLE

COVAR_POP

COVAR_POP(X1, X2 [HAVING {MAX | MIN} expression2])  [OVER (...)]

Description

Returns the population covariance of a set of number pairs. The first number is the dependent variable; the second number is the independent variable. The return result is between -Inf and +Inf.

All numeric types are supported. If the input is NUMERIC or BIGNUMERIC then the internal aggregation is stable with the final output converted to a DOUBLE. Otherwise the input is converted to a DOUBLE before aggregation, resulting in a potentially unstable result.

This function ignores any input pairs that contain one or more NULL values. If there is no input pair without NULL values, this function returns NULL. If there is exactly one input pair without NULL values, this function returns 0.

Optional Clauses

The clauses are applied in the following order:

  1. OVER: Specifies a window. See Analytic Functions.
  2. HAVING MAX or HAVING MIN: Restricts the set of rows that the function aggregates by a maximum or minimum value. See HAVING MAX and HAVING MIN clause for details.

Return Data Type

DOUBLE

COVAR_SAMP

COVAR_SAMP(X1, X2 [HAVING {MAX | MIN} expression2])  [OVER (...)]

Description

Returns the sample covariance of a set of number pairs. The first number is the dependent variable; the second number is the independent variable. The return result is between -Inf and +Inf.

All numeric types are supported. If the input is NUMERIC or BIGNUMERIC then the internal aggregation is stable with the final output converted to a DOUBLE. Otherwise the input is converted to a DOUBLE before aggregation, resulting in a potentially unstable result.

This function ignores any input pairs that contain one or more NULL values. If there are fewer than two input pairs without NULL values, this function returns NULL.

Optional Clauses

The clauses are applied in the following order:

  1. OVER: Specifies a window. See Analytic Functions.
  2. HAVING MAX or HAVING MIN: Restricts the set of rows that the function aggregates by a maximum or minimum value. See HAVING MAX and HAVING MIN clause for details.

Return Data Type

DOUBLE

STDDEV_POP

STDDEV_POP([DISTINCT] expression [HAVING {MAX | MIN} expression2])  [OVER (...)]

Description

Returns the population (biased) standard deviation of the values. The return result is between 0 and +Inf.

All numeric types are supported. If the input is NUMERIC or BIGNUMERIC then the internal aggregation is stable with the final output converted to a DOUBLE. Otherwise the input is converted to a DOUBLE before aggregation, resulting in a potentially unstable result.

This function ignores any NULL inputs. If all inputs are ignored, this function returns NULL.

If this function receives a single non-NULL input, it returns 0.

Optional Clauses

The clauses are applied in the following order:

  1. OVER: Specifies a window. See Analytic Functions.
  2. DISTINCT: Each distinct value of expression is aggregated only once into the result.
  3. HAVING MAX or HAVING MIN: Restricts the set of rows that the function aggregates by a maximum or minimum value. See HAVING MAX and HAVING MIN clause for details.

Return Data Type

DOUBLE

STDDEV_SAMP

STDDEV_SAMP([DISTINCT] expression [HAVING {MAX | MIN} expression2])  [OVER (...)]

Description

Returns the sample (unbiased) standard deviation of the values. The return result is between 0 and +Inf.

All numeric types are supported. If the input is NUMERIC or BIGNUMERIC then the internal aggregation is stable with the final output converted to a DOUBLE. Otherwise the input is converted to a DOUBLE before aggregation, resulting in a potentially unstable result.

This function ignores any NULL inputs. If there are fewer than two non-NULL inputs, this function returns NULL.

Optional Clauses

The clauses are applied in the following order:

  1. OVER: Specifies a window. See Analytic Functions.
  2. DISTINCT: Each distinct value of expression is aggregated only once into the result.
  3. HAVING MAX or HAVING MIN: Restricts the set of rows that the function aggregates by a maximum or minimum value. See HAVING MAX and HAVING MIN clause for details.

Return Data Type

DOUBLE

STDDEV

STDDEV([DISTINCT] expression [HAVING {MAX | MIN} expression2])  [OVER (...)]

Description

An alias of STDDEV_SAMP.

VAR_POP

VAR_POP([DISTINCT] expression [HAVING {MAX | MIN} expression2])  [OVER (...)]

Description

Returns the population (biased) variance of the values. The return result is between 0 and +Inf.

All numeric types are supported. If the input is NUMERIC or BIGNUMERIC then the internal aggregation is stable with the final output converted to a DOUBLE. Otherwise the input is converted to a DOUBLE before aggregation, resulting in a potentially unstable result.

This function ignores any NULL inputs. If all inputs are ignored, this function returns NULL.

If this function receives a single non-NULL input, it returns 0.

Optional Clauses

The clauses are applied in the following order:

  1. OVER: Specifies a window. See Analytic Functions.
  2. DISTINCT: Each distinct value of expression is aggregated only once into the result.
  3. HAVING MAX or HAVING MIN: Restricts the set of rows that the function aggregates by a maximum or minimum value. See HAVING MAX and HAVING MIN clause for details.

Return Data Type

DOUBLE

VAR_SAMP

VAR_SAMP([DISTINCT] expression [HAVING {MAX | MIN} expression2])  [OVER (...)]

Description

Returns the sample (unbiased) variance of the values. The return result is between 0 and +Inf.

All numeric types are supported. If the input is NUMERIC or BIGNUMERIC then the internal aggregation is stable with the final output converted to a DOUBLE. Otherwise the input is converted to a DOUBLE before aggregation, resulting in a potentially unstable result.

This function ignores any NULL inputs. If there are fewer than two non-NULL inputs, this function returns NULL.

Optional Clauses

The clauses are applied in the following order:

  1. OVER: Specifies a window. See Analytic Functions.
  2. DISTINCT: Each distinct value of expression is aggregated only once into the result.
  3. HAVING MAX or HAVING MIN: Restricts the set of rows that the function aggregates by a maximum or minimum value. See HAVING MAX and HAVING MIN clause for details.

Return Data Type

DOUBLE

VARIANCE

VARIANCE([DISTINCT] expression [HAVING {MAX | MIN} expression2])  [OVER (...)]

Description

An alias of VAR_SAMP.