Skip to content

Commit

Permalink
Merge pull request #4447 from soerenwolfers/quirks
Browse files Browse the repository at this point in the history
Quirks
  • Loading branch information
szarnyasg authored Jan 8, 2025
2 parents 8852f95 + d16f8ab commit d65c44b
Show file tree
Hide file tree
Showing 3 changed files with 81 additions and 2 deletions.
4 changes: 4 additions & 0 deletions _data/menu_docs_dev.json
Original file line number Diff line number Diff line change
Expand Up @@ -939,6 +939,10 @@
{
"page": "PostgreSQL Compatibility",
"url": "postgresql_compatibility"
},
{
"page": "SQL Quirks",
"url": "sql_quirks"
}
]
},
Expand Down
75 changes: 75 additions & 0 deletions docs/sql/dialect/sql_quirks.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
---
layout: docu
title: SQL Quirks
---

Like all programming languages and libraries, DuckDB has its share of idiosyncrasies and inconsistencies.
Some are vestiges of our feathered friend's evolution; others are inevitable because we strive to adhere to the [SQL Standard](https://blog.ansi.org/sql-standard-iso-iec-9075-2023-ansi-x3-135/) and specifically to PostgreSQL's dialect (see the [“PostgreSQL Compatibility”]({% link docs/sql/dialect/postgresql_compatibility.md %}) page for exceptions).
The rest may simply come down to different preferences, or we may even agree on what _should_ be done but just haven’t gotten around to it yet.

Acknowledging these quirks is the best we can do, which is why we have compiled below a list of examples.

## Aggregating Empty Groups

On empty groups, the aggregate functions `sum`, `list`, and `string_agg` all return `NULL` instead of `0`, `[]` and `''`, respectively. This is dictated by the SQL Standard and obeyed by all SQL implementations we know. This behavior is inherited by the list aggregate [`list_sum`]({% link docs/sql/functions/list.md %}#list_-rewrite-functions), but not by the DuckDB original [`list_dot_product`]({% link docs/sql/functions/list.md %}#list_dot_productlist1-list2) which returns `0` on empty lists.

## Indexing

To comply with standard SQL, one-based indexing is used almost everywhere, e.g., array and string indexing and slicing, and window functions (`row_number`, `rank`, `dense_rank`). However, similarly to PostgreSQL, [JSON features use a zero-based indexing]({% link docs/data/json/overview.md %}#indexing).

> While list functions use a 1-based indexing, `list_reduce` uses a 0-based indexing. This is a [known issue](https://github.com/duckdb/duckdb/issues/14619).
## Expressions

### Results That May Surprise You

<!-- markdownlint-disable MD056 -->

| Expression | Result | Note |
|----------------------------|---------|-------------------------------------------------------------------------------|
| `-1^2` | `1` | PostgreSQL compatibility means the unary minus has higher precedence than the exponentiation operator. Use additional parentheses, e.g., `-(1^2)` or the [`pow` function]({% link docs/sql/functions/numeric.md %}#powx-y) to avoid mistakes. |
| `'t' = true` | `true` | Compatible with PostgreSQL. |
| `1 = true` | `true` | Not compatible with PostgreSQL. |
| `1 = '1.1'` | `true` | Not compatible with PostgreSQL. |
| `1 IN (0, NULL)` | `NULL` | Makes sense if you think of the `NULL`s in the input and output as `UNKNOWN`. |
| `1 in [0, NULL]` | `false` | |
| `if(NULL > 1, 2, 3)` | `3` | Even though `NULL > 1` is `NULL`. |
| `concat('abc', NULL)` | `abc` | `list_concat` behaves similarly. |
| `'abc' || NULL` | `NULL` | |

<!-- markdownlint-enable MD056 -->

### `NaN` Values

`'NaN'::FLOAT = 'NaN'::FLOAT` and `'NaN'::FLOAT > 3` violate IEEE-754 but mean floating point data types have a total order, like all other datatypes (beware the consequences for `greatest`/`least`)

### `age` Function

`age(x)` is `current_date - x` instead of `current_timestamp - x`. Another quirk inherited from PostgreSQL.

### Extract Functions

`list_extract` / `map_extract` return `NULL` on non-existing keys. `struct_extract` throws an error because keys of structs are like columns.

## Clauses

### Automatic Column Deduplication in `SELECT`

Column names are deduplicated with the first occurrence shadowing the others:

```sql
CREATE TABLE tbl AS SELECT 1 AS a;
SELECT a FROM (SELECT *, 2 AS a FROM tbl);
```

| a |
|--:|
| 1 |

### Case Insensitivity for `SELECT`ing Columns

Due to case-insensitivity, it's not possible to use `SELECT a FROM 'file.parquet'` when a column called `A` appears before the desired column `a` in `file.parquet`.

### `USING SAMPLE`

The `USING SAMPLE` clause is syntactically placed after the `WHERE` and `GROUP BY` clauses (same as the `LIMIT` clause) but is semantically applied before both (unlike the `LIMIT` clause).
4 changes: 2 additions & 2 deletions docs/sql/functions/list.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ title: List Functions
| [`list_intersect(list1, list2)`](#list_intersectlist1-list2) | Returns a list of all the elements that exist in both `l1` and `l2`, without duplicates. |
| [`list_position(list, element)`](#list_positionlist-element) | Returns the index of the element if the list contains the element. If the element is not found, it returns `NULL`. |
| [`list_prepend(element, list)`](#list_prependelement-list) | Prepends `element` to `list`. |
| [`list_reduce(list, lambda)`](#list_reducelist-lambda) | Returns a single value that is the result of applying the lambda function to each element of the input list. See the [Lambda Functions]({% link docs/sql/functions/lambda.md %}#reduce) page for more details. |
| [`list_reduce(list, lambda)`](#list_reducelist-lambda) | Returns a single value that is the result of applying the lambda function to each element of the input list. See the [Lambda Functions]({% link docs/sql/functions/lambda.md %}#reduce) page for more details. While list functions use a 1-based indexing, `list_reduce` uses a 0-based indexing. This is a [known issue](https://github.com/duckdb/duckdb/issues/14619). |
| [`list_resize(list, size[, value])`](#list_resizelist-size-value) | Resizes the list to contain `size` elements. Initializes new elements with `value` or `NULL` if `value` is not set. |
| [`list_reverse_sort(list)`](#list_reverse_sortlist) | Sorts the elements of the list in reverse order. See the [Sorting Lists]({% link docs/sql/functions/list.md %}#sorting-lists) section for more details about the `NULL` sorting order. |
| [`list_reverse(list)`](#list_reverselist) | Reverses the list. |
Expand Down Expand Up @@ -279,7 +279,7 @@ title: List Functions

<div class="nostroke_table"></div>

| **Description** | Returns a single value that is the result of applying the lambda function to each element of the input list. See the [Lambda Functions]({% link docs/sql/functions/lambda.md %}#reduce) page for more details. |
| **Description** | Returns a single value that is the result of applying the lambda function to each element of the input list. See the [Lambda Functions]({% link docs/sql/functions/lambda.md %}#reduce) page for more details. While list functions use a 1-based indexing, `list_reduce` uses a 0-based indexing. This is a [known issue](https://github.com/duckdb/duckdb/issues/14619). |
| **Example** | `list_reduce([4, 5, 6], (x, y) -> x + y)` |
| **Result** | `15` |
| **Aliases** | `array_reduce`, `reduce` |
Expand Down

0 comments on commit d65c44b

Please sign in to comment.