Commit

Merge pull request #108 from Altinity/ashwini-ahire7-patch-5
Update state-and-merge-combinators.md
Slach authored Dec 3, 2024
2 parents 9d5a59d + 4d3cb21 commit 48d8368
Showing 1 changed file with 25 additions and 2 deletions.
@@ -4,7 +4,11 @@ linkTitle: "-State & -Merge combinators"
description: >
-State & -Merge combinators
---
The ClickHouse® -State combinator doesn't actually store information about -If combinator, so aggregate functions with -If and without have the same serialized data.

The -State combinator in ClickHouse® does not store additional information about the -If combinator, which means that aggregate functions with and without -If have the same serialized data structure. This can be verified through various examples, as demonstrated below.

**Example 1**: maxIfState and maxState
In this example, we use the maxIfState and maxState functions on a dataset of numbers, serialize the result, and merge it using the maxMerge function.

```sql
$ clickhouse-local --query "SELECT maxIfState(number,number % 2) as x, maxState(number) as y FROM numbers(10) FORMAT RowBinary" | clickhouse-local --input-format RowBinary --structure="x AggregateFunction(max,UInt64), y AggregateFunction(max,UInt64)" --query "SELECT maxMerge(x), maxMerge(y) FROM table"
@@ -13,7 +17,11 @@ $ clickhouse-local --query "SELECT maxIfState(number,number % 2) as x, maxState(
9 10
```

-State combinator have the same serialized data footprint regardless of parameters used in definition of aggregate function. That's true for quantile\* and sequenceMatch/sequenceCount functions.
In both cases, the -State combinator results in identical serialized data footprints, regardless of the conditions in the -If variant. The maxMerge function merges the state without concern for the original -If condition.

**Example 2**: quantilesTDigestIfState
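Here, quantilesTDigestIfState shows that the same principle holds for function parameters: a state written with the levels (0.1, 0.9) is read back as a quantileTDigestWeighted state declared with a different parameter and merged at yet another level. Quantile and sequence-matching functions all serialize their state consistently in this way.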


```sql
$ clickhouse-local --query "SELECT quantilesTDigestIfState(0.1,0.9)(number,number % 2) FROM numbers(1000000) FORMAT RowBinary" | clickhouse-local --input-format RowBinary --structure="x AggregateFunction(quantileTDigestWeighted(0.5),UInt64,UInt8)" --query "SELECT quantileTDigestWeightedMerge(0.4)(x) FROM table"
@@ -22,6 +30,12 @@ $ clickhouse-local --query "SELECT quantilesTDigestIfState(0.1,0.9)(number,numbe
$ clickhouse-local --query "SELECT quantilesTDigestIfState(0.1,0.9)(number,number % 2) FROM numbers(1000000) FORMAT RowBinary" | clickhouse-local --input-format RowBinary --structure="x AggregateFunction(quantilesTDigestWeighted(0.5),UInt64,UInt8)" --query "SELECT quantilesTDigestWeightedMerge(0.4,0.8)(x) FROM table"
[400000,800000]

```

**Example 3**: Quantile Functions with -Merge
This example shows how the quantileState and quantileMerge functions work together to calculate a specific quantile.

```sql
SELECT quantileMerge(0.9)(x)
FROM
(
@@ -34,6 +48,9 @@ FROM
└───────────────────────┘
```

**Example 4**: sequenceMatch and sequenceCount Functions with -Merge
Finally, we demonstrate the behavior of sequenceMatchState and sequenceMatchMerge, as well as sequenceCountState and sequenceCountMerge, in ClickHouse.

```sql
SELECT
sequenceMatchMerge('(?2)(?3)')(x) AS `2_3`,
@@ -48,6 +65,11 @@ FROM
┌─2_3─┬─1_4─┬─1_2_3─┐
│   1 │   1 │     0 │
└─────┴─────┴───────┘
```

Similarly, sequenceCountState and sequenceCountMerge functions behave consistently when merging states:

```sql

SELECT
sequenceCountMerge('(?1)(?2)')(x) AS `2_3`,
@@ -64,3 +86,4 @@ FROM
│   3 │   0 │     2 │
└─────┴─────┴───────┘
```
ClickHouse's -State combinator serializes data the same way regardless of the -If condition or the parameters passed to the aggregate function. This holds for many functions, including the quantile\* and sequenceMatch/sequenceCount families, which is why maxMerge, quantileMerge, sequenceMatchMerge, and sequenceCountMerge can read states that were produced with a different condition or different parameters.
