-
Notifications
You must be signed in to change notification settings - Fork 357
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #4447 from soerenwolfers/quirks
Quirks
- Loading branch information
Showing
3 changed files
with
81 additions
and
2 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
--- | ||
layout: docu | ||
title: SQL Quirks | ||
--- | ||
|
||
Like all programming languages and libraries, DuckDB has its share of idiosyncrasies and inconsistencies. | ||
Some are vestiges of our feathered friend's evolution; others are inevitable because we strive to adhere to the [SQL Standard](https://blog.ansi.org/sql-standard-iso-iec-9075-2023-ansi-x3-135/) and specifically to PostgreSQL's dialect (see the [“PostgreSQL Compatibility”]({% link docs/sql/dialect/postgresql_compatibility.md %}) page for exceptions). | ||
The rest may simply come down to different preferences, or we may even agree on what _should_ be done but just haven’t gotten around to it yet. | ||
|
||
Acknowledging these quirks is the best we can do, which is why we have compiled below a list of examples. | ||
|
||
## Aggregating Empty Groups | ||
|
||
On empty groups, the aggregate functions `sum`, `list`, and `string_agg` all return `NULL` instead of `0`, `[]` and `''`, respectively. This is dictated by the SQL Standard and obeyed by all SQL implementations we know. This behavior is inherited by the list aggregate [`list_sum`]({% link docs/sql/functions/list.md %}#list_-rewrite-functions), but not by the DuckDB original [`list_dot_product`]({% link docs/sql/functions/list.md %}#list_dot_productlist1-list2) which returns `0` on empty lists. | ||
|
||
## Indexing | ||
|
||
To comply with standard SQL, one-based indexing is used almost everywhere, e.g., array and string indexing and slicing, and window functions (`row_number`, `rank`, `dense_rank`). However, similarly to PostgreSQL, [JSON features use a zero-based indexing]({% link docs/data/json/overview.md %}#indexing). | ||
|
||
> While list functions use a 1-based indexing, `list_reduce` uses a 0-based indexing. This is a [known issue](https://github.com/duckdb/duckdb/issues/14619). | ||
## Expressions | ||
|
||
### Results That May Surprise You | ||
|
||
<!-- markdownlint-disable MD056 --> | ||
|
||
| Expression | Result | Note | | ||
|----------------------------|---------|-------------------------------------------------------------------------------| | ||
| `-1^2` | `1` | PostgreSQL compatibility means the unary minus has higher precedence than the exponentiation operator. Use additional parentheses, e.g., `-(1^2)` or the [`pow` function]({% link docs/sql/functions/numeric.md %}#powx-y) to avoid mistakes. | | ||
| `'t' = true` | `true` | Compatible with PostgreSQL. | | ||
| `1 = true` | `true` | Not compatible with PostgreSQL. | | ||
| `1 = '1.1'` | `true` | Not compatible with PostgreSQL. | | ||
| `1 IN (0, NULL)` | `NULL` | Makes sense if you think of the `NULL`s in the input and output as `UNKNOWN`. | | ||
| `1 in [0, NULL]` | `false` | | | ||
| `if(NULL > 1, 2, 3)` | `3` | Even though `NULL > 1` is `NULL`. | | ||
| `concat('abc', NULL)` | `abc` | `list_concat` behaves similarly. | | ||
| `'abc' || NULL` | `NULL` | | | ||
|
||
<!-- markdownlint-enable MD056 --> | ||
|
||
### `NaN` Values | ||
|
||
`'NaN'::FLOAT = 'NaN'::FLOAT` and `'NaN'::FLOAT > 3` violate IEEE-754 but mean floating point data types have a total order, like all other datatypes (beware the consequences for `greatest`/`least`) | ||
|
||
### `age` Function | ||
|
||
`age(x)` is `current_date - x` instead of `current_timestamp - x`. Another quirk inherited from PostgreSQL. | ||
|
||
### Extract Functions | ||
|
||
`list_extract` / `map_extract` return `NULL` on non-existing keys. `struct_extract` throws an error because keys of structs are like columns. | ||
|
||
## Clauses | ||
|
||
### Automatic Column Deduplication in `SELECT` | ||
|
||
Column names are deduplicated with the first occurrence shadowing the others: | ||
|
||
```sql | ||
CREATE TABLE tbl AS SELECT 1 AS a; | ||
SELECT a FROM (SELECT *, 2 AS a FROM tbl); | ||
``` | ||
|
||
| a | | ||
|--:| | ||
| 1 | | ||
|
||
### Case Insensitivity for `SELECT`ing Columns | ||
|
||
Due to case-insensitivity, it's not possible to use `SELECT a FROM 'file.parquet'` when a column called `A` appears before the desired column `a` in `file.parquet`. | ||
|
||
### `USING SAMPLE` | ||
|
||
The `USING SAMPLE` clause is syntactically placed after the `WHERE` and `GROUP BY` clauses (same as the `LIMIT` clause) but is semantically applied before both (unlike the `LIMIT` clause). |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters