Skip to content

Commit

Permalink
Update array-functions-as-window.md
Browse files Browse the repository at this point in the history
Fixed Description
  • Loading branch information
TheraniAA authored Sep 21, 2024
1 parent 93334bb commit bb72b84
Showing 1 changed file with 31 additions and 33 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,12 @@ description: >-
Using array functions to mimic window-functions alike behavior.
---

There are some usecases when you may want to mimic window functions using Arrays - as an optimization step, or to control the memory better / use on-disk spiling, or just if you have old ClickHouse® version.
There are cases where you may need to mimic window functions using arrays in ClickHouse. This could be for optimization purposes, to better manage memory, or to enable on-disk spilling, especially if you’re working with an older version of ClickHouse that doesn't natively support window functions.

## Running difference sample
Here’s an example demonstrating how to mimic a window function like runningDifference() using arrays:

#### Step 1: Create Sample Data
We’ll start by creating a test table with some sample data:

```sql
DROP TABLE IS EXISTS test_running_difference
Expand All @@ -22,10 +25,8 @@ SELECT
FROM numbers(100)


SELECT * FROM test_running_difference
```
SELECT * FROM test_running_difference;

```text
┌─id─┬──────────────────ts─┬────val─┐
02010-01-01 00:00:00-1209
12010-01-01 00:00:0043
Expand Down Expand Up @@ -132,13 +133,15 @@ SELECT * FROM test_running_difference
100 rows in set. Elapsed: 0.003 sec.

```
This table contains IDs, timestamps (ts), and values (val), where each id appears multiple times with different timestamps.

#### Step 2: Running Difference Example
If you try using runningDifference directly, it works block by block, which can be problematic when the data needs to be ordered or when group changes occur.


runningDifference works only in blocks & require ordered data & problematic when group changes
```sql
select id, val, runningDifference(val) from (select * from test_running_difference order by id, ts);
```

```
┌─id─┬────val─┬─runningDifference(val)─┐
0-12090
06683968048
Expand Down Expand Up @@ -246,13 +249,15 @@ select id, val, runningDifference(val) from (select * from test_running_differen
```


## Arrays !
The output may look inconsistent because runningDifference requires ordered data within blocks.

### 1. Group & Collect the data into array
#### Step 3: Using Arrays for Grouping and Calculation
Instead of using runningDifference, we can utilize arrays to group data, sort it, and apply similar logic more efficiently.

**Grouping Data into Arrays** -
You can group multiple columns into arrays by using the groupArray function. For example, to collect several columns as arrays of tuples, you can use the following query:

you can collect several column by building array of tuples:
```
```sql
SELECT
id,
groupArray(tuple(ts, val))
Expand Down Expand Up @@ -283,10 +288,9 @@ GROUP BY id
└────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

### Do needed ordering in each array

For example - by second element of tuple:
```
**Sorting Arrays** -
To sort the arrays by a specific element, for example, by the second element of the tuple, you can use the arraySort function:
```sql
SELECT
id,
arraySort(x -> (x.2), groupArray((ts, val)))
Expand Down Expand Up @@ -319,9 +323,11 @@ GROUP BY id
20 rows in set. Elapsed: 0.004 sec.
```

That can be rewritten like this:
This sorts each array by the val (second element of the tuple) for each id.

```
Simplified Sorting Example - We can rewrite the query in a more concise way using WITH clauses for better readability:

```sql
WITH
groupArray(tuple(ts, val)) as window_rows,
arraySort(x -> x.1, window_rows) as sorted_window_rows
Expand All @@ -332,9 +338,10 @@ FROM test_running_difference
GROUP BY id
```

### Apply needed logic arrayMap / arrayDifference etc
**Applying Calculations with Arrays** -
Once the data is sorted, you can apply array functions like arrayMap and arrayDifference to calculate differences between values in the arrays:

```
```sql
WITH
groupArray(tuple(ts, val)) as window_rows,
arraySort(x -> x.1, window_rows) as sorted_window_rows,
Expand All @@ -345,10 +352,7 @@ SELECT
sorted_window_rows_val_column_diff
FROM test_running_difference
GROUP BY id
```


```
┌─id─┬─sorted_window_rows_val_column_diff─┐
0 │ [0,68048,68243,72389,67860] │
1 │ [0,19397,17905,16978,18345] │
Expand Down Expand Up @@ -378,10 +382,8 @@ GROUP BY id
You can do also a lot of magic with arrayEnumerate and accessing different values by their ids.


### Now you can return you arrays back to rows


use arrayJoin
**Reverting Arrays Back to Rows** -
You can convert the arrays back into rows using arrayJoin:

```sql
WITH
Expand All @@ -396,9 +398,7 @@ SELECT
FROM test_running_difference
GROUP BY id
```


or ARRAY JOIN
Or use ARRAY JOIN to join the arrays back to the original structure:

```sql
SELECT
Expand All @@ -419,8 +419,6 @@ FROM test_running_difference
GROUP BY id
) as t1
ARRAY JOIN sorted_window_rows_val_column_diff as diff, sorted_window_rows_ts_column as ts

```


etc.
This allows you to manipulate and analyze data within arrays effectively, using powerful functions such as arrayMap, arrayDifference, and arrayEnumerate.

0 comments on commit bb72b84

Please sign in to comment.