Updated links to various external resources #74

Merged: 1 commit merged on Mar 11, 2024
@@ -9,10 +9,10 @@ Unfortunately not all queries can be killed.
A query pipeline checks this flag before switching to the next block. If the pipeline gets stuck somewhere in the middle of a block, it cannot be killed.
If a query does not stop, the only way to get rid of it is to restart ClickHouse.
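
The flag-checking behaviour described above can be modelled with a minimal Python sketch. It assumes a pipeline is just a loop over blocks and that the kill flag is a `threading.Event`. Both are illustrative stand-ins, not ClickHouse internals. The flag is only checked between blocks, so a block that never finishes never sees the kill request.

```python
import threading
from typing import Callable, Iterable, List


def run_pipeline(blocks: Iterable[int],
                 process_block: Callable[[int], int],
                 kill_flag: threading.Event) -> List[int]:
    """Process blocks one at a time, checking the kill flag only between blocks."""
    results: List[int] = []
    for block in blocks:
        # The flag is checked before switching to the next block...
        if kill_flag.is_set():
            break
        # ...but not while a block is being processed: if process_block()
        # never returns, the kill request is never noticed.
        results.append(process_block(block))
    return results


if __name__ == "__main__":
    kill = threading.Event()

    def process(block: int) -> int:
        # Simulate KILL QUERY arriving while block 2 is being processed.
        if block == 2:
            kill.set()
        return block * 10

    print(run_pipeline(range(5), process, kill))  # prints [0, 10, 20]
```

Block 2 still completes even though the flag was set while it was running. The query stops only at the next check.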

See also
See also:

[https://github.com/ClickHouse/ClickHouse/issues/3964](https://github.com/ClickHouse/ClickHouse/issues/3964)
[https://github.com/ClickHouse/ClickHouse/issues/1576](https://github.com/ClickHouse/ClickHouse/issues/1576)
* [https://github.com/ClickHouse/ClickHouse/issues/3964](https://github.com/ClickHouse/ClickHouse/issues/3964)
* [https://github.com/ClickHouse/ClickHouse/issues/1576](https://github.com/ClickHouse/ClickHouse/issues/1576)

## How to replace a running query

@@ -13,12 +13,12 @@ So that works this way:
3. Here the sampling logic is applied: a) in case of `SAMPLE k` (`k` in the `0..1` range), it adds the condition `WHERE sample_key < k * max_int_of_sample_key_type`; b) in case of `SAMPLE k OFFSET m`, it adds the condition `WHERE sample_key BETWEEN m * max_int_of_sample_key_type AND (m + k) * max_int_of_sample_key_type`; c) in case of `SAMPLE N` (N > 1), it first estimates how many rows are inside the ranges that need to be read and, based on that, converts it to case 3a (calculating `k` from the number of rows in the ranges and the desired number of rows).
4. Other conditions are then applied to the returned data (so the number of rows can decrease at this step).
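
The conditions in step 3 can be sketched in Python as follows. This only illustrates the arithmetic. It assumes a `UInt64` sample key, so `max_int_of_sample_key_type` is `2**64 - 1`. `in_sample` and `rows_to_fraction` are hypothetical helpers, and the real implementation is in the source file linked below.

```python
from typing import Optional

# Assumption: the sample key is UInt64, so max_int_of_sample_key_type = 2**64 - 1.
MAX_UINT64 = 2**64 - 1


def in_sample(sample_key: int, k: float, m: Optional[float] = None,
              max_key: int = MAX_UINT64) -> bool:
    """Would a row with this sample_key pass the condition added for SAMPLE k [OFFSET m]?"""
    if m is None:
        # SAMPLE k  ->  WHERE sample_key < k * max_key
        return sample_key < k * max_key
    # SAMPLE k OFFSET m  ->  WHERE sample_key BETWEEN m * max_key AND (m + k) * max_key
    return m * max_key <= sample_key <= (m + k) * max_key


def rows_to_fraction(n_rows_wanted: int, rows_in_ranges: int) -> float:
    """SAMPLE N (N > 1): derive k from the estimated rows in the ranges to read."""
    if rows_in_ranges <= 0:
        return 1.0
    return min(1.0, n_rows_wanted / rows_in_ranges)


if __name__ == "__main__":
    # 1000 sample keys spread evenly over the whole UInt64 range.
    keys = [i * (MAX_UINT64 // 1000) for i in range(1000)]
    picked = sum(in_sample(key, 0.1) for key in keys)
    print(f"SAMPLE 0.1 keeps {picked} of {len(keys)} keys")  # roughly 10%
    k = rows_to_fraction(n_rows_wanted=10_000, rows_in_ranges=1_000_000)
    print(f"SAMPLE 10000 over ~1,000,000 rows is treated like SAMPLE {k}")
```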

[Source Code](https://github.com/ClickHouse/ClickHouse/blob/92c937db8b50844c7216d93c5c398d376e82f6c3/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp#L355)
* [Source Code](https://github.com/ClickHouse/ClickHouse/blob/92c937db8b50844c7216d93c5c398d376e82f6c3/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp#L355)

## SAMPLE by

[Docs](https://clickhouse.yandex/docs/en/query_language/select/#select-sample-clause)
[Source Code](https://github.com/ClickHouse/ClickHouse/blob/92c937db8b50844c7216d93c5c398d376e82f6c3/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp#L355)
* [Docs](https://clickhouse.yandex/docs/en/query_language/select/#select-sample-clause)
* [Source Code](https://github.com/ClickHouse/ClickHouse/blob/92c937db8b50844c7216d93c5c398d376e82f6c3/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp#L355)

SAMPLE key
Must be:
@@ -56,4 +56,4 @@ SELECT count() FROM table WHERE ... AND cityHash64(some_high_card_key) % 10 = 0;
SELECT count() FROM table WHERE ... AND rand() % 10 = 0; -- Non-deterministic
```

ClickHouse will read more data from disk compared to an example with a good SAMPLE key, but it's more universal and can be used if you can't change table ORDER BY key.
ClickHouse will read more data from disk than in an example with a good SAMPLE key, but this approach is more universal and can be used if you can't change the table's ORDER BY key. (To learn more about ClickHouse internals, [ClickHouse Administrator Training](https://altinity.com/clickhouse-training/) is available.)
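
The difference between the two `WHERE` filters above can be sketched in Python. For illustration only, `hashlib.blake2b` stands in for `cityHash64`, which is not in the Python standard library, and `random.randrange` stands in for `rand()`. The point is that a hash-based filter keeps the same rows on every run, while a random filter does not.

```python
import hashlib
import random
from typing import Iterable, List


def hash64(value: str) -> int:
    """64-bit hash of a key; a stand-in for ClickHouse's cityHash64 (assumption)."""
    return int.from_bytes(hashlib.blake2b(value.encode(), digest_size=8).digest(), "little")


def hash_sample(keys: Iterable[str], modulo: int = 10) -> List[str]:
    """Deterministic, like `cityHash64(some_high_card_key) % 10 = 0`."""
    return [k for k in keys if hash64(k) % modulo == 0]


def random_sample(keys: Iterable[str], modulo: int = 10) -> List[str]:
    """Non-deterministic, like `rand() % 10 = 0`."""
    return [k for k in keys if random.randrange(modulo) == 0]


if __name__ == "__main__":
    keys = [f"user_{i}" for i in range(10_000)]
    first, second = hash_sample(keys), hash_sample(keys)
    print(first == second)  # True: the same keys are selected every time
    print(len(first))       # roughly 1000 (about 10% of the keys)
    print(random_sample(keys) == random_sample(keys))  # almost certainly False
```

Both filters still have to read every row before discarding 90% of them. That is why this approach reads more data than a real SAMPLE key.
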
67 changes: 64 additions & 3 deletions content/en/altinity-kb-queries-and-syntax/joins/_index.md
@@ -3,9 +3,70 @@ title: "JOINs"
linkTitle: "JOINs"
description: >
JOINs
aliases:
- /altinity-kb-queries-and-syntax/joins/join-table-engine/
---
See presentation:
Resources:

[https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup38/join.pdf](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup38/join.pdf)
* [Overview of JOINs (Russian)](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup38/join.pdf) - Presentation from Meetup 38 in 2019
* [Notes on JOIN options](https://excalidraw.com/#json=xX_heZcCu0whsDmC2Mdvo,ppbUVFpPz-flJu5ZDnwIPw)

https://excalidraw.com/#json=xX_heZcCu0whsDmC2Mdvo,ppbUVFpPz-flJu5ZDnwIPw
## Join Table Engine

The main purpose of the Join table engine is to avoid building the right-hand table for a join on every query execution. So it's usually used when you have a large number of fast queries that share the same right-hand table for joining.

### Updates

It's possible to update rows with the `join_any_take_last_row` setting enabled.

```sql
CREATE TABLE id_val_join
(
`id` UInt32,
`val` UInt8
)
ENGINE = Join(ANY, LEFT, id)
SETTINGS join_any_take_last_row = 1

Ok.

INSERT INTO id_val_join VALUES (1,21)(1,22)(3,23);

Ok.

SELECT *
FROM
(
SELECT toUInt32(number) AS id
FROM numbers(4)
) AS n
ANY LEFT JOIN id_val_join USING (id)

┌─id─┬─val─┐
│ 0 │ 0 │
│ 1 │ 22 │
│ 2 │ 0 │
│ 3 │ 23 │
└────┴─────┘

INSERT INTO id_val_join VALUES (1,40)(2,24);

Ok.

SELECT *
FROM
(
SELECT toUInt32(number) AS id
FROM numbers(4)
) AS n
ANY LEFT JOIN id_val_join USING (id)

┌─id─┬─val─┐
│ 0 │ 0 │
│ 1 │ 40 │
│ 2 │ 24 │
│ 3 │ 23 │
└────┴─────┘
```
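
The update behaviour above can be made explicit with a small Python model of an `ANY LEFT JOIN` against a Join-engine table. This is a sketch under simplifying assumptions, not the engine's implementation:

* the right-hand table is a dict keyed by `id`;
* `take_last_row` mirrors `join_any_take_last_row` (when it is disabled, the first row found for a key is kept, following the setting's documented description);
* unmatched keys get the column default `0`, as in the output above with default settings.

`AnyJoinTable` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple


class AnyJoinTable:
    """Toy model of ENGINE = Join(ANY, LEFT, id) with a single UInt8 value column."""

    def __init__(self, take_last_row: bool) -> None:
        self.take_last_row = take_last_row  # mirrors join_any_take_last_row
        self.rows: Dict[int, int] = {}

    def insert(self, rows: Iterable[Tuple[int, int]]) -> None:
        for key, val in rows:
            # With take_last_row, a later row replaces the stored one;
            # otherwise the first row inserted for a key is kept.
            if self.take_last_row or key not in self.rows:
                self.rows[key] = val

    def any_left_join(self, ids: Iterable[int]) -> List[Tuple[int, int]]:
        # Unmatched keys get the default value of UInt8, which is 0.
        return [(i, self.rows.get(i, 0)) for i in ids]


if __name__ == "__main__":
    t = AnyJoinTable(take_last_row=True)
    t.insert([(1, 21), (1, 22), (3, 23)])
    print(t.any_left_join(range(4)))  # [(0, 0), (1, 22), (2, 0), (3, 23)]
    t.insert([(1, 40), (2, 24)])
    print(t.any_left_join(range(4)))  # [(0, 0), (1, 40), (2, 24), (3, 23)]
```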

[Join table engine documentation](https://clickhouse.com/docs/en/engines/table-engines/special/join/)
@@ -3,6 +3,7 @@ title: "JOIN table engine"
linkTitle: "JOIN table engine"
description: >
JOIN table engine
draft: true
---
The main purpose of the Join table engine is to avoid building the right-hand table for a join on every query execution. So it's usually used when you have a large number of fast queries that share the same right-hand table for joining.

@@ -60,4 +61,4 @@ ANY LEFT JOIN id_val_join USING (id)
└────┴─────┘
```

[https://clickhouse.tech/docs/en/engines/table-engines/special/join/](https://clickhouse.tech/docs/en/engines/table-engines/special/join/)
[Join table engine documentation](https://clickhouse.com/docs/en/engines/table-engines/special/join/)
@@ -4,8 +4,9 @@ linkTitle: "Machine learning in ClickHouse"
description: >
Machine learning in ClickHouse
---
[https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup31/ml.pdf](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup31/ml.pdf)

[CatBoost / MindsDB / Fast.ai]({{<ref "../altinity-kb-integrations/catboost-mindsdb-fast.ai.md" >}})
Resources:

[https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup38/forecast.pdf](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup38/forecast.pdf)
* [Machine Learning in ClickHouse](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup31/ml.pdf) - Presentation from 2019 (Meetup 31)
* [ML discussion: CatBoost / MindsDB / Fast.ai](../../altinity-kb-integrations/catboost-mindsdb-fast.ai) - Brief article from 2021
* [Machine Learning Forecast (Russian)](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup38/forecast.pdf) - Presentation from 2019 (Meetup 38)
@@ -63,7 +63,7 @@ Elapsed: 0.005 sec. Processed 22.43 thousand rows

## Emulation of an inverted index using orderby projection

You can create an `orderby projection` and include all columns of a table, but if a table is very wide it will double of stored data. This expample demonstrate a trick, we create an `orderby projection` and include primary key columns and the target column and sort by the target column. This allows using subquery to find primary key values and after that to query the table using the primary key.
You can create an `orderby projection` and include all columns of a table, but if the table is very wide, it will double the amount of stored data. This example demonstrates a trick: we create an `orderby projection` that includes the primary key columns and the target column, sorted by the target column. This allows using a subquery to find [primary key values](../../engines/mergetree-table-engine-family/pick-keys/) and then querying the table by the primary key.
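
The idea behind this trick can be modelled in Python with two sorted lists:

* the main table, sorted by its primary key;
* a narrow "projection" holding only `(target, primary_key)` pairs, sorted by the target column.

A binary search on the projection finds the primary key values, which then drive a second binary search on the main table. This is an illustrative sketch with hypothetical names (`Row`, `build_projection`, `lookup`), not how ClickHouse actually stores projections.

```python
from bisect import bisect_left, bisect_right
from math import inf
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    key: int      # primary key column; the table is sorted by it
    target: str   # the column we want to search by
    payload: str  # stands in for the many other columns of a wide table


def build_projection(table: List[Row]) -> List[Tuple[str, int]]:
    """Narrow 'projection': only (target, key) pairs, sorted by the target column."""
    return sorted((row.target, row.key) for row in table)


def lookup(table: List[Row], table_keys: List[int],
           projection: List[Tuple[str, int]], value: str) -> List[Row]:
    """Find rows with row.target == value via the projection, then by primary key."""
    # Step 1 (the subquery): range-scan the projection for the target value.
    lo = bisect_left(projection, (value,))
    hi = bisect_right(projection, (value, inf))
    keys = [key for _, key in projection[lo:hi]]
    # Step 2: look the rows up in the main table by primary key.
    result: List[Row] = []
    for key in keys:
        i = bisect_left(table_keys, key)
        if i < len(table_keys) and table_keys[i] == key:
            result.append(table[i])
    return result


if __name__ == "__main__":
    table = [Row(i, f"tag{i % 3}", f"data{i}") for i in range(9)]  # sorted by key
    table_keys = [row.key for row in table]
    projection = build_projection(table)
    print([row.key for row in lookup(table, table_keys, projection, "tag1")])  # prints [1, 4, 7]
```

The projection stores only two columns, so its extra storage stays small even when the table itself is very wide.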

```sql
CREATE TABLE test_a
6 changes: 4 additions & 2 deletions content/en/altinity-kb-queries-and-syntax/sampling-example.md
@@ -2,9 +2,11 @@
title: "Sampling Example"
linkTitle: "Sampling Example"
description: >
Clickhouse table sampling example
ClickHouse table sampling example
---
The most important idea about sampling that the primary index must have **low cardinality**. The following example demonstrates how sampling can be setup correctly, and an example if it being set up incorrectly as a comparison.
The most important idea about sampling is that the primary index must have **low cardinality**. (For more information on the related `LowCardinality` data type, see [the Altinity Knowledge Base article on LowCardinality](../../altinity-kb-schema-design/lowcardinality) or [a ClickHouse user's lessons learned from LowCardinality](https://altinity.com/blog/2020-5-20-reducing-clickhouse-storage-cost-with-the-low-cardinality-type-lessons-from-an-instana-engineer).)

The following example demonstrates how to set up sampling correctly and, for comparison, what happens when it is set up incorrectly.

Sampling requires a `SAMPLE BY` expression. This ensures that the values of the sampled column fit within a specified range, which is what satisfies the low cardinality requirement. In this example, I cannot use `transaction_id` because I cannot ensure that the minimum value of `transaction_id` is 0 and the maximum value is `MAX_UINT64`. Instead, I used `cityHash64(transaction_id)` to expand the range to span the minimum and maximum values.
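
Some arithmetic shows why the hash matters. The Python sketch below models `SAMPLE 0.1` as keeping keys below `0.1 * MAX_UINT64`. For illustration it assumes sequential `transaction_id` values and uses `hashlib.blake2b` as a stand-in for `cityHash64`. Raw ids are all tiny compared to `MAX_UINT64`, so every row passes the filter and nothing is actually sampled. Hashed values are spread over the whole range, so roughly 10% pass.

```python
import hashlib
from typing import Iterable

MAX_UINT64 = 2**64 - 1


def hash64(value: int) -> int:
    """Stand-in for cityHash64: spreads keys roughly uniformly over the UInt64 range."""
    digest = hashlib.blake2b(value.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def sampled_fraction(sample_keys: Iterable[int], k: float) -> float:
    """Fraction of rows kept by `SAMPLE k`, modelled as sample_key < k * MAX_UINT64."""
    keys = list(sample_keys)
    kept = sum(1 for key in keys if key < k * MAX_UINT64)
    return kept / len(keys)


if __name__ == "__main__":
    transaction_ids = range(1, 100_001)  # small sequential ids
    print(sampled_fraction(transaction_ids, 0.1))  # 1.0: every row passes
    print(sampled_fraction((hash64(t) for t in transaction_ids), 0.1))  # close to 0.1
```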

@@ -77,9 +77,11 @@ SimpleAggregateFunction can be used for those aggregations when the function sta
</tbody>
</table>

See also
[https://github.com/ClickHouse/ClickHouse/pull/4629](https://github.com/ClickHouse/ClickHouse/pull/4629)
[https://github.com/ClickHouse/ClickHouse/issues/3852](https://github.com/ClickHouse/ClickHouse/issues/3852)
See also:

* [Altinity Knowledge Base article on AggregatingMergeTree](../../engines/mergetree-table-engine-family/aggregatingmergetree/)
* [https://github.com/ClickHouse/ClickHouse/pull/4629](https://github.com/ClickHouse/ClickHouse/pull/4629)
* [https://github.com/ClickHouse/ClickHouse/issues/3852](https://github.com/ClickHouse/ClickHouse/issues/3852)

### Q. How does the maxSimpleState combinator result differ from plain max?

@@ -4,3 +4,4 @@ linkTitle: "Skip indexes"
description: >
Skip indexes
---
ClickHouse provides a type of index that in specific circumstances can significantly improve query speed. These structures are labeled "skip" indexes because they enable ClickHouse to skip reading significant chunks of data that are guaranteed to have no matching values.
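
A `minmax`-style skip index illustrates the idea. In this Python model:

* the data is split into granules;
* the index stores each granule's minimum and maximum;
* a lookup reads only the granules whose `[min, max]` range could contain the value.

Granule size and function names are hypothetical. Other index types such as `set` and `bloom_filter` apply the same skip-or-read decision per granule with different summaries.

```python
from typing import List, Sequence, Tuple


def build_minmax_index(values: Sequence[int], granule_size: int) -> List[Tuple[int, int]]:
    """Store (min, max) for every granule of `granule_size` consecutive rows."""
    return [
        (min(values[i:i + granule_size]), max(values[i:i + granule_size]))
        for i in range(0, len(values), granule_size)
    ]


def granules_to_read(index: List[Tuple[int, int]], needle: int) -> List[int]:
    """Granules whose [min, max] range may contain `needle`; all others are skipped."""
    return [g for g, (lo, hi) in enumerate(index) if lo <= needle <= hi]


def find(values: Sequence[int], granule_size: int, needle: int) -> Tuple[List[int], int]:
    """Return matching row numbers and how many granules had to be read."""
    index = build_minmax_index(values, granule_size)  # normally built when data is written
    to_read = granules_to_read(index, needle)
    rows = [
        row
        for g in to_read
        for row in range(g * granule_size, min((g + 1) * granule_size, len(values)))
        if values[row] == needle
    ]
    return rows, len(to_read)


if __name__ == "__main__":
    # Values grow with the row number, so granule ranges barely overlap.
    values = [row // 3 for row in range(30)]
    print(find(values, granule_size=5, needle=7))  # prints ([21, 22, 23], 1)
```

Only one of the six granules is read here. The index helps only when values are clustered like this. If every granule spans the whole value range, nothing can be skipped.
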
@@ -8,7 +8,7 @@ aliases:
---
tested with 20.8.17.25

[https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/mergetree/\#table_engine-mergetree-data_skipping-indexes](https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/mergetree/#table_engine-mergetree-data_skipping-indexes)
[https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/\#table_engine-mergetree-data_skipping-indexes](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/#table_engine-mergetree-data_skipping-indexes)

### Let's create test data

2 changes: 1 addition & 1 deletion content/en/altinity-kb-queries-and-syntax/time-zones.md
@@ -10,7 +10,7 @@ Important things to know:
2. Conversion from that UNIX timestamp to a human-readable form and back can happen on the client (for native clients) and on the server (for HTTP clients, and for some types of queries, like `toString(ts)`).
3. Depending on where that conversion happens, the rules of different timezones may be applied.
4. You can check the server timezone using `SELECT timezone()`.
5. clickhouse-client also by default tries to use server timezone (see also `--use_client_time_zone` flag)
5. [clickhouse-client](https://docs.altinity.com/altinitycloud/altinity-cloud-connections/clickhouseclient/) also tries to use the server timezone by default (see also the `--use_client_time_zone` flag).
6. If you want, you can store the timezone name inside the data type; in that case, the timestamp <-> human-readable time rules of that timezone will be applied.
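
The points above can be illustrated with Python's standard `datetime` module: one stored UNIX timestamp, rendered under two different timezone rules. Fixed UTC offsets stand in for the named zones you would put in a `DateTime('...')` column type. This is an assumption for the sketch, which does not model daylight saving time.

```python
from datetime import datetime, timedelta, timezone


def render(ts: int, tz: timezone) -> str:
    """Convert a stored UNIX timestamp to human-readable form using one zone's rules."""
    return datetime.fromtimestamp(ts, tz).strftime("%Y-%m-%d %H:%M:%S")


def parse(text: str, tz: timezone) -> int:
    """Reverse conversion: human-readable time in zone `tz` back to a UNIX timestamp."""
    return int(datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz).timestamp())


if __name__ == "__main__":
    ts = 1_700_000_000                     # the value actually stored
    utc = timezone.utc                     # e.g. the server timezone
    plus3 = timezone(timedelta(hours=3))   # e.g. a client timezone (fixed offset)
    print(render(ts, utc))    # 2023-11-14 22:13:20
    print(render(ts, plus3))  # 2023-11-15 01:13:20
    # The same text means different instants depending on which rules are applied.
    print(parse("2023-11-14 22:13:20", plus3) - parse("2023-11-14 22:13:20", utc))  # -10800
```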

```sql
@@ -8,7 +8,7 @@ description: >

Controlled by the session-level setting `send_logs_level`.
Possible values: `'trace', 'debug', 'information', 'warning', 'error', 'fatal', 'none'`
Can be used with clickhouse-client in both interactive and non-interactive mode.
Can be used with [clickhouse-client](https://docs.altinity.com/altinitycloud/altinity-cloud-connections/clickhouseclient/) in both interactive and non-interactive mode.
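
The threshold behaves as follows in a small Python sketch. It assumes the listed values are ordered from most verbose (`trace`) to least verbose (`fatal`), with `none` disabling forwarding. A server log message is sent to the client only if it is at least as severe as the chosen level. `is_sent` is a hypothetical helper.

```python
from typing import List

# Possible values of send_logs_level, from most to least verbose ('none' disables).
LEVELS: List[str] = ["trace", "debug", "information", "warning", "error", "fatal"]


def is_sent(message_level: str, send_logs_level: str) -> bool:
    """Return True if a log message of `message_level` is forwarded to the client."""
    if send_logs_level == "none":
        return False
    return LEVELS.index(message_level) >= LEVELS.index(send_logs_level)


if __name__ == "__main__":
    for setting in ("trace", "warning", "none"):
        sent = [lvl for lvl in LEVELS if is_sent(lvl, setting)]
        print(f"send_logs_level={setting!r}: {sent}")
```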

```bash
$ clickhouse-client -mn --send_logs_level='trace' --query "SELECT sum(number) FROM numbers(1000)"
2 changes: 2 additions & 0 deletions content/en/altinity-kb-queries-and-syntax/ttl/modify-ttl.md
@@ -6,6 +6,8 @@ description: >-
What happens during a MODIFY or ADD TTL query.
---

*For a general overview of TTL, see the article [Putting Things Where They Belong Using New TTL Moves](https://altinity.com/blog/2020-3-23-putting-things-where-they-belong-using-new-ttl-moves).*

## ALTER TABLE tbl MODIFY (ADD) TTL:

It's a 2-step process:
@@ -457,3 +457,5 @@ OPTIMIZE TABLE test_ttl_group_by FINAL;
└────────┴─────────┴────────────┴────────────────┴────────────────┘

```

Also see the [Altinity Knowledge Base pages on the MergeTree table engine family](../../../engines/mergetree-table-engine-family).
@@ -5,6 +5,7 @@ description: >
TTL Recompress example
---

*See also [the Altinity Knowledge Base article on testing different compression codecs](../../../altinity-kb-schema-design/codecs/altinity-kb-how-to-test-different-compression-codecs).*

## Example how to create a table and define recompression rules

@@ -101,7 +101,7 @@ FROM test_update
```

{{% alert title="Info" color="info" %}}
In case of Replicated installation, Dictionary should be created on all nodes and source tables should have ReplicatedMergeTree engine and be replicated across all nodes.
In a replicated installation, the dictionary should be created on all nodes, and source tables should use the [ReplicatedMergeTree](../../altinity-kb-setup-and-maintenance/altinity-kb-converting-mergetree-to-replicated/) engine and be replicated across all nodes.
{{% /alert %}}

{{% alert title="Info" color="info" %}}
13 changes: 4 additions & 9 deletions content/en/altinity-kb-queries-and-syntax/window-functions.md
@@ -4,17 +4,12 @@ linkTitle: "Window functions"
description: >
Window functions
---
| Link | [blog.tinybird.co/2021/03/16/c…](https://blog.tinybird.co/2021/03/16/coming-soon-on-clickhouse-window-functions/) |
| :--- | :--- |
| Date | Mar 26, 2021 |

![Windows Function Slides](https://api.microlink.io/?adblock=false&meta=false&screenshot&element=%23screenshot&embed=screenshot.url&url=https%3A%2F%2Fcards.microlink.io%2F%3Fpreset%3Dtinybird%26subtitle%3Dtips%26text%3DWindow%2Bfunctions%252C%2Bnested%2Bdata%252C%2BA%2BPostgreSQL%2Bengine%2Band%2Bmore)
#### Resources:

[blog.tinybird.co/2021/03/16/c…](https://blog.tinybird.co/2021/03/16/coming-soon-on-clickhouse-window-functions/)

> An exploration on what's possible to do with the most recent experimental feature on ClickHouse - window functions, and an overview of other interesting feat...

[Windows Functions Blog Link](https://blog.tinybird.co/2021/03/16/coming-soon-on-clickhouse-window-functions/)
* [Tutorial: ClickHouse Window Functions](https://altinity.com/blog/clickhouse-window-functions-current-state-of-the-art)
* [Video: Fun with ClickHouse Window Functions](https://www.youtube.com/watch?v=sm_vUdMQz4s)
* [Blog: Battle of the Views: ClickHouse Window View vs. Live View](https://altinity.com/blog/battle-of-the-views-clickhouse-window-view-vs-live-view)

#### How Do I Simulate Window Functions Using Arrays on Older Versions of ClickHouse?
