Merge pull request #4447 from soerenwolfers/quirks

Quirks
duckdb · Jan 8, 2025 · d65c44b · d65c44b
2 parents 8852f95 + d16f8ab
commit d65c44b
Show file tree

Hide file tree

Showing 3 changed files with 81 additions and 2 deletions.
diff --git a/_data/menu_docs_dev.json b/_data/menu_docs_dev.json
@@ -939,6 +939,10 @@
                 {
                   "page": "PostgreSQL Compatibility",
                   "url": "postgresql_compatibility"
+                },
+                {
+                  "page": "SQL Quirks",
+                  "url": "sql_quirks"
                 }
               ]
             },

diff --git a/docs/sql/dialect/sql_quirks.md b/docs/sql/dialect/sql_quirks.md
@@ -0,0 +1,75 @@
+---
+layout: docu
+title: SQL Quirks
+---
+
+Like all programming languages and libraries, DuckDB has its share of idiosyncrasies and inconsistencies.  
+Some are vestiges of our feathered friend's evolution; others are inevitable because we strive to adhere to the [SQL Standard](https://blog.ansi.org/sql-standard-iso-iec-9075-2023-ansi-x3-135/) and specifically to PostgreSQL's dialect (see the [“PostgreSQL Compatibility”]({% link docs/sql/dialect/postgresql_compatibility.md %}) page for exceptions).
+The rest may simply come down to different preferences, or we may even agree on what _should_ be done but just haven’t gotten around to it yet.
+
+Acknowledging these quirks is the best we can do, which is why we have compiled below a list of examples.
+
+## Aggregating Empty Groups
+
+On empty groups, the aggregate functions `sum`, `list`, and `string_agg` all return `NULL` instead of `0`, `[]` and `''`, respectively. This is dictated by the SQL Standard and obeyed by all SQL implementations we know. This behavior is inherited by the list aggregate [`list_sum`]({% link docs/sql/functions/list.md %}#list_-rewrite-functions), but not by the DuckDB original [`list_dot_product`]({% link docs/sql/functions/list.md %}#list_dot_productlist1-list2) which returns `0` on empty lists.
+
+## Indexing
+
+To comply with standard SQL, one-based indexing is used almost everywhere, e.g., array and string indexing and slicing, and window functions (`row_number`, `rank`, `dense_rank`). However, similarly to PostgreSQL, [JSON features use a zero-based indexing]({% link docs/data/json/overview.md %}#indexing).
+
+> While list functions use a 1-based indexing, `list_reduce` uses a 0-based indexing. This is a [known issue](https://github.com/duckdb/duckdb/issues/14619).
+
+## Expressions
+
+### Results That May Surprise You
+
+<!-- markdownlint-disable MD056 -->
+
+| Expression                 | Result  | Note                                                                          |
+|----------------------------|---------|-------------------------------------------------------------------------------|
+| `-1^2`                     | `1`     | PostgreSQL compatibility means the unary minus has higher precedence than the exponentiation operator. Use additional parentheses, e.g., `-(1^2)` or the [`pow` function]({% link docs/sql/functions/numeric.md %}#powx-y) to avoid mistakes. |
+| `'t' = true`               | `true`  | Compatible with PostgreSQL.                                                   |
+| `1 = true`                 | `true`  | Not compatible with PostgreSQL.                                               |
+| `1 = '1.1'`                | `true`  | Not compatible with PostgreSQL.                                               |
+| `1 IN (0, NULL)`           | `NULL`  | Makes sense if you think of the `NULL`s in the input and output as `UNKNOWN`. |
+| `1 in [0, NULL]`           | `false` |                                                                               |
+| `if(NULL > 1, 2, 3)`       | `3`     | Even though `NULL > 1` is `NULL`.                                             |
+| `concat('abc', NULL)`      | `abc`   | `list_concat` behaves similarly.                                              |
+| `'abc' || NULL`            | `NULL`  |                                                                               |
+
+<!-- markdownlint-enable MD056 -->
+
+### `NaN` Values
+
+`'NaN'::FLOAT = 'NaN'::FLOAT` and `'NaN'::FLOAT > 3` violate IEEE-754 but mean floating point data types have a total order, like all other datatypes (beware the consequences for `greatest`/`least`)
+
+### `age` Function
+
+`age(x)` is `current_date - x` instead of `current_timestamp - x`. Another quirk inherited from PostgreSQL.
+
+### Extract Functions
+
+`list_extract` / `map_extract` return `NULL` on non-existing keys. `struct_extract` throws an error because keys of structs are like columns.
+
+## Clauses
+
+### Automatic Column Deduplication in `SELECT`
+
+Column names are deduplicated with the first occurrence shadowing the others:
+
+```sql
+CREATE TABLE tbl AS SELECT 1 AS a;
+SELECT a FROM (SELECT *, 2 AS a FROM tbl);
+```
+
+| a |
+|--:|
+| 1 |
+
+### Case Insensitivity for `SELECT`ing Columns
+
+Due to case-insensitivity, it's not possible to use `SELECT a FROM 'file.parquet'` when a column called `A` appears before the desired column `a` in `file.parquet`.
+
+### `USING SAMPLE`
+
+The `USING SAMPLE` clause is syntactically placed after the `WHERE` and `GROUP BY` clauses (same as the `LIMIT` clause) but is semantically applied before both (unlike the `LIMIT` clause).
diff --git a/docs/sql/functions/list.md b/docs/sql/functions/list.md
@@ -33,7 +33,7 @@ title: List Functions
 | [`list_intersect(list1, list2)`](#list_intersectlist1-list2) | Returns a list of all the elements that exist in both `l1` and `l2`, without duplicates. |
 | [`list_position(list, element)`](#list_positionlist-element) | Returns the index of the element if the list contains the element. If the element is not found, it returns `NULL`. |
 | [`list_prepend(element, list)`](#list_prependelement-list) | Prepends `element` to `list`. |
-| [`list_reduce(list, lambda)`](#list_reducelist-lambda) | Returns a single value that is the result of applying the lambda function to each element of the input list. See the [Lambda Functions]({% link docs/sql/functions/lambda.md %}#reduce) page for more details. |
+| [`list_reduce(list, lambda)`](#list_reducelist-lambda) | Returns a single value that is the result of applying the lambda function to each element of the input list. See the [Lambda Functions]({% link docs/sql/functions/lambda.md %}#reduce) page for more details. While list functions use a 1-based indexing, `list_reduce` uses a 0-based indexing. This is a [known issue](https://github.com/duckdb/duckdb/issues/14619). |
 | [`list_resize(list, size[, value])`](#list_resizelist-size-value) | Resizes the list to contain `size` elements. Initializes new elements with `value` or `NULL` if `value` is not set. |
 | [`list_reverse_sort(list)`](#list_reverse_sortlist) | Sorts the elements of the list in reverse order. See the [Sorting Lists]({% link docs/sql/functions/list.md %}#sorting-lists) section for more details about the `NULL` sorting order. |
 | [`list_reverse(list)`](#list_reverselist) | Reverses the list. |
@@ -279,7 +279,7 @@ title: List Functions
 
 <div class="nostroke_table"></div>
 
-| **Description** | Returns a single value that is the result of applying the lambda function to each element of the input list. See the [Lambda Functions]({% link docs/sql/functions/lambda.md %}#reduce) page for more details. |
+| **Description** | Returns a single value that is the result of applying the lambda function to each element of the input list. See the [Lambda Functions]({% link docs/sql/functions/lambda.md %}#reduce) page for more details. While list functions use a 1-based indexing, `list_reduce` uses a 0-based indexing. This is a [known issue](https://github.com/duckdb/duckdb/issues/14619). |
 | **Example** | `list_reduce([4, 5, 6], (x, y) -> x + y)` |
 | **Result** | `15` |
 | **Aliases** | `array_reduce`, `reduce` |