docs: Restore SQL function examples (apache#17293)
* docs: add examples for SQL functions (apache#16745)

* updating first batch of numeric functions

* First batch of functions

* addressing first few comments

* alphabetize list

* draft with suggestions applied

* minor discrepancy expr -> <NUMERIC>

* changed raises to calculates

* Update docs/querying/sql-functions.md

* switch to underscore

* changed to exp(1) to match slack message

* adding html text for trademark symbol to .spelling

* fixed discrepancy between description and example

---------

Co-authored-by: Benedict Jin <[email protected]>
(cherry picked from commit 721a650)

* [docs] batch02 of updating functions (apache#16761)

* applying changes

* ensuring batch is updated

* Update docs/querying/sql-functions.md

* raise -> raises

* addressing review

* Apply suggestions from code review

Co-authored-by: Charles Smith <[email protected]>

---------

Co-authored-by: Benedict Jin <[email protected]>
Co-authored-by: Charles Smith <[email protected]>
(cherry picked from commit ca78788)

* [Docs] batch 03 - trig functions (apache#16795)

* batch 03 - trig functions

* Apply suggestions from code review

Co-authored-by: Charles Smith <[email protected]>

* applying suggestions and corrections

---------

Co-authored-by: Charles Smith <[email protected]>
(cherry picked from commit 028ee23)

* [Docs]Batch04 - Bitwise numeric functions (apache#16805)

* Batch04 - Bitwise numeric functions

* Batch04 - Bitwise numeric functions

* minor fixes

* rewording bitwise_shift functions

* rewording bitwise_shift functions

* Update docs/querying/sql-functions.md

* applying suggestions

---------

Co-authored-by: Benedict Jin <[email protected]>
(cherry picked from commit 85a8a1d)

* [docs] batch 5 updating functions (apache#16812)

* batch 5

* Update docs/querying/sql-functions.md

* applying suggestions

---------

Co-authored-by: Benedict Jin <[email protected]>
(cherry picked from commit 3bb6d40)

* [Docs] Batch06: starting string functions (apache#16838)

* batch06, starting string functions

* adding space after Syntax

* quick change

* correcting spelling

* Update docs/querying/sql-functions.md

* Update sql-functions.md

* applying suggestions

* Update docs/querying/sql-functions.md

* Update docs/querying/sql-functions.md

---------

Co-authored-by: Benedict Jin <[email protected]>
Co-authored-by: Charles Smith <[email protected]>
(cherry picked from commit ebea34a)

* [Docs] Batch08: adding examples to string functions (apache#16871)

* batch08 completed

* reviewing batch08

* apply corrections suggestions by @FrankChen021

(cherry picked from commit 5b94839)

* [Docs] Batch07: adding examples to string functions (apache#16862)

* Lower,Upper,Lpad,Rpad,Parse_long

* up to REGEXP_EXTRACT

* batch 07 ready for review

* updated definitions in scalar

* Apply suggestions from code review

Co-authored-by: Charles Smith <[email protected]>

* rpad and lpad

* addressing comments

* minor fixes

* improving examples based on suggestions

* matched -> matches

* correcting typo

* Apply suggestions from code review

Co-authored-by: Charles Smith <[email protected]>

---------

Co-authored-by: Charles Smith <[email protected]>
(cherry picked from commit 7256953)

* [Docs] Batch09: only `lookup` (apache#16878)

* [Docs] Batch09: only `lookup`

* slight changes

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <[email protected]>

* applying suggestions

* Apply suggestions from code review

Co-authored-by: Victoria Lim <[email protected]>

* otherwise null -> otherwise returns null

* updating definition in sql-scalar.md

* Apply suggestions from code review

Co-authored-by: Charles Smith <[email protected]>

* hoping to re-run web checks

* change replaceMissingValueWith -> defaultValue

* Update docs/querying/sql-scalar.md

Co-authored-by: Katya Macedo  <[email protected]>

* acronym_to_name -> airportcode_to_name

* shortens `airportcode_to_name` to `code_to_name`

---------

Co-authored-by: Katya Macedo <[email protected]>
Co-authored-by: Victoria Lim <[email protected]>
Co-authored-by: Charles Smith <[email protected]>
(cherry picked from commit fda2d19)

* [docs] Batch10 date and time functions (apache#16900)

* just starting

* TIME_PARSE and TIME_FORMAT remaining

* fixing typo

* adding last two functions

* review sql-functions.md

* Apply suggestions from code review

Suggestions that were accepted as is

Co-authored-by: Katya Macedo  <[email protected]>

* Update docs/querying/sql-functions.md

Co-authored-by: Katya Macedo  <[email protected]>

* Update docs/querying/sql-functions.md

needed to confirm that it did indeed return as a number

Co-authored-by: Katya Macedo  <[email protected]>

* reviewing remaining suggestions

* addressing review for time_format

* Apply suggestions from code review

Accepted as is

Co-authored-by: Katya Macedo  <[email protected]>

* addressing final suggestion

* time_zone -> timezone

* timezone fix

---------

Co-authored-by: Katya Macedo <[email protected]>
(cherry picked from commit c4981e3)

* [docs] batch 12: reduction functions (apache#16930)

* [docs] batch 12: reduction functions

* Update docs/querying/sql-functions.md

* Update docs/querying/sql-functions.md

* Update docs/querying/sql-functions.md

* applying suggestions

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <[email protected]>

---------

Co-authored-by: Benedict Jin <[email protected]>
Co-authored-by: Katya Macedo <[email protected]>
(cherry picked from commit c49dc83)

* [docs] Batch13 IP functions (apache#16947)

* new datasource

* reviewing before pr

* Update docs/querying/sql-functions.md

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <[email protected]>

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <[email protected]>

* Apply suggestions from code review

Co-authored-by: Charles Smith <[email protected]>

* Applying suggestions to IPV4_PARSE

---------

Co-authored-by: Benedict Jin <[email protected]>
Co-authored-by: Katya Macedo <[email protected]>
Co-authored-by: Charles Smith <[email protected]>
(cherry picked from commit ed81126)

* [docs] Batch11 date and time functions (apache#16926)

* first draft of functions

* minor improvements

* Update docs/querying/sql-functions.md

* Update docs/querying/sql-scalar.md

* Apply suggestions from code review

Accepted as is

Co-authored-by: Katya Macedo  <[email protected]>

* applying next round of suggestions

* fixing missing column name

* addressing floor and ceil functions

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <[email protected]>

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <[email protected]>

* re-wording TIMESTAMPADD

---------

Co-authored-by: Benedict Jin <[email protected]>
Co-authored-by: Katya Macedo <[email protected]>
(cherry picked from commit 2d9e92c)

* Update docs/querying/sql-functions.md

* Update docs/querying/sql-functions.md

Co-authored-by: Benedict Jin <[email protected]>

* [docs] Batches 14-16, 18: HLL, Theta, Quantiles, other (apache#93)

Co-authored-by: Katya Macedo  <[email protected]>
Co-authored-by: edgar2020 <[email protected]>
Co-authored-by: Katya Macedo <[email protected]>
Co-authored-by: Katya Macedo <[email protected]>
Co-authored-by: Charles Smith <[email protected]>

* batches 20 21 24 25

* fix unnest list

* Add LISTAGG to spelling

* cherry pick batch 21

* cherry pick batch 21

---------

Co-authored-by: Edgar Melendrez <[email protected]>
Co-authored-by: Edgar Melendrez <[email protected]>
Co-authored-by: Benedict Jin <[email protected]>
Co-authored-by: Katya Macedo <[email protected]>
Co-authored-by: Katya Macedo <[email protected]>
Co-authored-by: Charles Smith <[email protected]>
7 people authored and 317brian committed Jan 28, 2025
1 parent 60f234c commit a7c28ff
Showing 7 changed files with 4,413 additions and 842 deletions.
2 changes: 1 addition & 1 deletion docs/ingestion/schema-design.md
@@ -259,7 +259,7 @@ When performing type-aware schema discovery, Druid can discover all the columns
the exclusion list). Druid automatically chooses the most appropriate native Druid type among `STRING`, `LONG`,
`DOUBLE`, `ARRAY<STRING>`, `ARRAY<LONG>`, `ARRAY<DOUBLE>`, or `COMPLEX<json>` for nested data. For input formats with
native boolean types, Druid ingests these values as longs. Array typed columns can be queried using
the [array functions](../querying/sql-array-functions.md) or [UNNEST](../querying/sql-functions.md#unnest). Nested
the [array functions](../querying/sql-array-functions.md) or [UNNEST](../querying/sql.md#unnest). Nested
columns can be queried with the [JSON functions](../querying/sql-json-functions.md).

Mixed type columns follow the same rules for schema differences between segments, and present as the _least_ restrictive
32 changes: 17 additions & 15 deletions docs/querying/sql-aggregations.md
@@ -37,6 +37,19 @@ sidebar_label: "Aggregation functions"

You can use aggregation functions in the SELECT clause of any [Druid SQL](./sql.md) query.

In the aggregation functions supported by Druid, only `COUNT`, `ARRAY_AGG`, and `STRING_AGG` accept the DISTINCT keyword.

:::info
The order of aggregation operations across segments is not deterministic. This means that non-commutative aggregation
functions can produce inconsistent results across the same query.

Functions that operate on an input type of "float" or "double" may also see these differences in aggregation
results across multiple query runs because of this. If precisely the same value is desired across multiple query runs,
consider using the `ROUND` function to smooth out the inconsistencies between queries.
:::
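The note above can be illustrated outside Druid. The following is a minimal Python sketch, not Druid code, which assumes that a simple left-to-right running sum stands in for one possible order in which an aggregator combines values; `sum_in_order` is a hypothetical helper. Summing the same doubles in two orders gives results that differ in the last bit, and rounding both to a fixed precision makes them agree, which is the idea behind the `ROUND` suggestion.

```python
def sum_in_order(values):
    """Left-to-right running sum, standing in for one possible merge order."""
    total = 0.0
    for value in values:
        total += value
    return total


# The same three doubles, visited in two different orders.
first_order = sum_in_order([0.1, 0.2, 0.3])
second_order = sum_in_order([0.3, 0.2, 0.1])

print(first_order)                   # 0.6000000000000001
print(second_order)                  # 0.6
print(first_order == second_order)   # False

# Rounding both results to a fixed precision hides the last-bit difference.
print(round(first_order, 6) == round(second_order, 6))  # True
```

Druid's actual merge across segments is more involved than a single fold; the sketch only shows why a non-associative operation such as floating-point addition can vary with processing order.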

## Filter aggregations

Filter any aggregator using the FILTER clause, for example:

```
@@ -56,16 +69,7 @@ When no rows are selected, aggregation functions return their initial value. Thi
The initial value varies by aggregator. `COUNT` and the approximate count distinct sketch functions
always return 0 as the initial value.

In the aggregation functions supported by Druid, only `COUNT`, `ARRAY_AGG`, and `STRING_AGG` accept the DISTINCT keyword.

:::info
The order of aggregation operations across segments is not deterministic. This means that non-commutative aggregation
functions can produce inconsistent results across the same query.

Functions that operate on an input type of "float" or "double" may also see these differences in aggregation
results across multiple query runs because of this. If precisely the same value is desired across multiple query runs,
consider using the `ROUND` function to smooth out the inconsistencies between queries.
:::
## General aggregation functions

|Function|Notes|Default|
|--------|-----|-------|
@@ -92,10 +96,8 @@ In the aggregation functions supported by Druid, only `COUNT`, `ARRAY_AGG`, and
|`LATEST_BY(expr, timestampExpr, [maxBytesPerValue])`|Returns the latest value of `expr`.<br />The latest value of `expr` is taken from the row with the overall latest non-null value of `timestampExpr`.<br />If the overall latest non-null value of `timestampExpr` appears in multiple rows, the `expr` may be taken from any of those rows.<br /><br />If `expr` is a string or complex type `maxBytesPerValue` amount of space is allocated for the aggregation. Strings longer than this limit are truncated. The `maxBytesPerValue` parameter should be set as low as possible, since high values will lead to wasted memory.<br/>If `maxBytesPerValue`is omitted; it defaults to `1024`.<br /><br />Use `LATEST` instead of `LATEST_BY` on a table that has rollup enabled and was created with any variant of `EARLIEST`, `LATEST`, `EARLIEST_BY`, or `LATEST_BY`. In these cases, the intermediate type already stores the timestamp, and Druid ignores the value passed in `timestampExpr`. |`null`|
|`ANY_VALUE(expr, [maxBytesPerValue, [aggregateMultipleValues]])`|Returns any value of `expr` including null. This aggregator can simplify and optimize the performance by returning the first encountered value (including `null`).<br /><br />If `expr` is a string or complex type `maxBytesPerValue` amount of space is allocated for the aggregation. Strings longer than this limit are truncated. The `maxBytesPerValue` parameter should be set as low as possible, since high values will lead to wasted memory.<br/>If `maxBytesPerValue` is omitted; it defaults to `1024`. `aggregateMultipleValues` is an optional boolean flag controls the behavior of aggregating a [multi-value dimension](./multi-value-dimensions.md). `aggregateMultipleValues` is set as true by default and returns the stringified array in case of a multi-value dimension. By setting it to false, function will return first value instead. |`null`|
|`GROUPING(expr, expr...)`|Returns a number to indicate which groupBy dimension is included in a row, when using `GROUPING SETS`. Refer to [additional documentation](aggregations.md#grouping-aggregator) on how to infer this number.|N/A|
|`ARRAY_AGG(expr, [size])`|Collects all values of `expr` into an ARRAY, including null values, with `size` in bytes limit on aggregation size (default of 1024 bytes). If the aggregated array grows larger than the maximum size in bytes, the query will fail. Use of `ORDER BY` within the `ARRAY_AGG` expression is not currently supported, and the ordering of results within the output array may vary depending on processing order.|`null`|
|`ARRAY_AGG(DISTINCT expr, [size])`|Collects all distinct values of `expr` into an ARRAY, including null values, with `size` in bytes limit on aggregation size (default of 1024 bytes) per aggregate. If the aggregated array grows larger than the maximum size in bytes, the query will fail. Use of `ORDER BY` within the `ARRAY_AGG` expression is not currently supported, and the ordering of results will be based on the default for the element type.|`null`|
|`ARRAY_CONCAT_AGG(expr, [size])`|Concatenates all array `expr` into a single ARRAY, with `size` in bytes limit on aggregation size (default of 1024 bytes). Input `expr` _must_ be an array. Null `expr` will be ignored, but any null values within an `expr` _will_ be included in the resulting array. If the aggregated array grows larger than the maximum size in bytes, the query will fail. Use of `ORDER BY` within the `ARRAY_CONCAT_AGG` expression is not currently supported, and the ordering of results within the output array may vary depending on processing order.|`null`|
|`ARRAY_CONCAT_AGG(DISTINCT expr, [size])`|Concatenates all distinct values of all array `expr` into a single ARRAY, with `size` in bytes limit on aggregation size (default of 1024 bytes) per aggregate. Input `expr` _must_ be an array. Null `expr` will be ignored, but any null values within an `expr` _will_ be included in the resulting array. If the aggregated array grows larger than the maximum size in bytes, the query will fail. Use of `ORDER BY` within the `ARRAY_CONCAT_AGG` expression is not currently supported, and the ordering of results will be based on the default for the element type.|`null`|
|`ARRAY_AGG([DISTINCT] expr, [size])`|Collects all values of the specified expression into an array. To include only unique values, specify `DISTINCT`. `size` determines the maximum aggregation size in bytes and defaults to 1024 bytes. If the resulting array exceeds the size limit, the query fails. `ORDER BY` is not supported. The order of elements in the output array may vary depending on the processing order.|`null`|
|`ARRAY_CONCAT_AGG([DISTINCT] expr, [size])`|Concatenates array inputs into a single array. To include only unique values, specify `DISTINCT`. `expr` must be an array. `size` determines the maximum aggregation size in bytes and defaults to 1024 bytes. If the resulting array exceeds the size limit, the query fails. Druid ignores null array expressions, but null values within arrays are included in the output. `ORDER BY` is not supported. The order of elements in the output array may vary depending on the processing order.|`null`|
|`STRING_AGG([DISTINCT] expr, [separator, [size]])`|Collects all values (or all distinct values) of `expr` into a single STRING, ignoring null values. Each value is joined by an optional `separator`, which must be a literal STRING. If the `separator` is not provided, strings are concatenated without a separator.<br /><br />An optional `size` in bytes can be supplied to limit aggregation size (default of 1024 bytes). If the aggregated string grows larger than the maximum size in bytes, the query will fail. Use of `ORDER BY` within the `STRING_AGG` expression is not currently supported, and the ordering of results within the output string may vary depending on processing order.|`null`|
|`LISTAGG([DISTINCT] expr, [separator, [size]])`|Synonym for `STRING_AGG`.|`null`|
|`BIT_AND(expr)`|Performs a bitwise AND operation on all input values.|`null`|
@@ -106,7 +108,7 @@ In the aggregation functions supported by Druid, only `COUNT`, `ARRAY_AGG`, and

These functions create sketch objects that you can use to perform fast, approximate analyses.
For advice on choosing approximate aggregation functions, check out our [approximate aggregations documentation](aggregations.md#approx).
To operate on sketch objects, also see the [DataSketches post aggregator functions](sql-scalar.md#sketch-functions).
To operate on sketch objects, see the scalar [DataSketches post aggregator functions](sql-scalar.md#sketch-functions).

### HLL sketch functions

28 changes: 14 additions & 14 deletions docs/querying/sql-array-functions.md
@@ -48,19 +48,19 @@ The following table describes array functions. To learn more about array aggrega

|Function|Description|
|--------|-----|
|`ARRAY[expr1, expr2, ...]`|Constructs a SQL `ARRAY` literal from the expression arguments, using the type of the first argument as the output array type.|
|`ARRAY_LENGTH(arr)`|Returns length of the array expression.|
|`ARRAY_OFFSET(arr, long)`|Returns the array element at the 0-based index supplied, or null for an out of range index.|
|`ARRAY_ORDINAL(arr, long)`|Returns the array element at the 1-based index supplied, or null for an out of range index.|
|`ARRAY_CONTAINS(arr, expr)`|If `expr` is a scalar type, returns true if `arr` contains `expr`. If `expr` is an array, returns true if `arr` contains all elements of `expr`. Otherwise returns false.|
|`ARRAY_OVERLAP(arr1, arr2)`|Returns true if `arr1` and `arr2` have any elements in common, else false.|
|`SCALAR_IN_ARRAY(expr, arr)`|Returns true if the scalar `expr` is present in `arr`. Otherwise, returns false if the scalar `expr` is non-null or `UNKNOWN` if the scalar `expr` is `NULL`.|
|`ARRAY[expr1, expr2, ...]`|Constructs a SQL `ARRAY` literal from the provided expression arguments. All arguments must be of the same type.|
|`ARRAY_APPEND(arr, expr)`|Appends the expression to the array. The source array type determines the resulting array type.|
|`ARRAY_CONCAT(arr1, arr2)`|Concatenates two arrays. The type of `arr1` determines the resulting array type.|
|`ARRAY_CONTAINS(arr, expr)`|Checks if the array contains the specified expression. If the specified expression is a scalar value, returns true if the source array contains the value. If the specified expression is an array, returns true if the source array contains all elements of the expression.|
|`ARRAY_LENGTH(arr)`|Returns the length of the array.|
|`ARRAY_OFFSET(arr, long)`|Returns the array element at the specified zero-based index. Returns null if the index is out of bounds.|
|`ARRAY_OFFSET_OF(arr, expr)`|Returns the 0-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null`.|
|`ARRAY_ORDINAL(arr, long)`|Returns the array element at the specified one-based index. Returns null if the index is out of bounds.|
|`ARRAY_ORDINAL_OF(arr, expr)`|Returns the 1-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `null`.|
|`ARRAY_PREPEND(expr, arr)`|Adds `expr` to the beginning of `arr`, the resulting array type determined by the type of `arr`.|
|`ARRAY_APPEND(arr, expr)`|Appends `expr` to `arr`, the resulting array type determined by the type of `arr`.|
|`ARRAY_CONCAT(arr1, arr2)`|Concatenates `arr2` to `arr1`. The resulting array type is determined by the type of `arr1`.|
|`ARRAY_SLICE(arr, start, end)`|Returns the subarray of `arr` from the 0-based index `start` (inclusive) to `end` (exclusive). Returns `null`, if `start` is less than 0, greater than length of `arr`, or greater than `end`.|
|`ARRAY_TO_STRING(arr, str)`|Joins all elements of `arr` by the delimiter specified by `str`.|
|`STRING_TO_ARRAY(str1, str2)`|Splits `str1` into an array on the delimiter specified by `str2`, which is a regular expression.|
|`ARRAY_TO_MV(arr)`|Converts an `ARRAY` of any type into a multi-value string `VARCHAR`.|
|`ARRAY_OVERLAP(arr1, arr2)`|Returns true if two arrays have any elements in common. Treats `NULL` values as known elements.|
|`ARRAY_PREPEND(expr, arr)`|Prepends the expression to the array. The source array type determines the resulting array type.|
|`ARRAY_SLICE(arr, start, end)`|Returns a subset of the array from the zero-based index `start` (inclusive) to `end` (exclusive). Returns null if `start` is less than 0, greater than the length of the array, or greater than `end`.|
|`ARRAY_TO_MV(arr)`|Converts an array of any type into a [multi-value string](sql-data-types.md#multi-value-strings).|
|`ARRAY_TO_STRING(arr, delimiter)`|Joins all elements of the array into a string using the specified delimiter.|
|`SCALAR_IN_ARRAY(expr, arr)`|Checks if the scalar value is present in the array. Returns false if the value is non-null, or `UNKNOWN` if the value is `NULL`. Returns `UNKNOWN` if the array is `NULL`.|
|`STRING_TO_ARRAY(string, delimiter)`|Splits the string into an array of substrings using the specified delimiter. The delimiter must be a valid regular expression.|
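A minimal Python sketch of the index and null semantics described in the table for `ARRAY_OFFSET`, `ARRAY_ORDINAL`, `ARRAY_SLICE`, and `SCALAR_IN_ARRAY`. The helper names are hypothetical, Python `None` stands in for SQL `NULL` and `UNKNOWN`, and the code models the table's descriptions rather than Druid's implementation. Two cases the table does not spell out are marked as assumptions in the comments: a slice whose `end` exceeds the array length, and a `NULL` scalar checked against an array that itself contains a null.

```python
from typing import Any, Optional, Sequence


def array_offset(arr: Optional[Sequence[Any]], index: int) -> Any:
    """ARRAY_OFFSET: zero-based lookup, None when the index is out of bounds."""
    if arr is None or index < 0 or index >= len(arr):
        return None
    return arr[index]


def array_ordinal(arr: Optional[Sequence[Any]], index: int) -> Any:
    """ARRAY_ORDINAL: one-based lookup, None when the index is out of bounds."""
    if arr is None or index < 1 or index > len(arr):
        return None
    return arr[index - 1]


def array_slice(arr: Optional[Sequence[Any]], start: int, end: int) -> Optional[list]:
    """ARRAY_SLICE: elements from start (inclusive) to end (exclusive)."""
    if arr is None or start < 0 or start > len(arr) or start > end:
        return None
    # Python slicing stops at len(arr) when end is larger; this sketch does
    # not try to model what Druid returns in that case (assumption).
    return list(arr[start:end])


def scalar_in_array(value: Any, arr: Optional[Sequence[Any]]) -> Optional[bool]:
    """SCALAR_IN_ARRAY with three-valued logic; None plays the role of UNKNOWN."""
    if arr is None:
        return None  # documented: UNKNOWN when the array is NULL
    if value is None:
        return None  # assumption: UNKNOWN for a NULL value, even if arr holds a null
    return value in arr


print(array_offset([10, 20, 30], 0))      # 10
print(array_ordinal([10, 20, 30], 1))     # 10
print(array_offset([10, 20, 30], 3))      # None
print(array_slice([1, 2, 3, 4], 1, 3))    # [2, 3]
print(array_slice([1, 2, 3, 4], 3, 2))    # None
print(scalar_in_array("c", ["a", "b"]))   # False
print(scalar_in_array(None, ["a", "b"]))  # None
```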