From 64e83742c70c468b0456a2d421b77b04a689802c Mon Sep 17 00:00:00 2001 From: Doug Tidwell Date: Mon, 11 Mar 2024 16:40:07 -0400 Subject: [PATCH] Updated links to various external resources --- .../altinity-kb-kill-query.md | 6 +- .../altinity-kb-sample-by.md | 8 +-- .../joins/_index.md | 67 ++++++++++++++++++- .../joins/join-table-engine.md | 3 +- .../machine-learning-in-clickhouse.md | 7 +- .../projections-examples.md | 2 +- .../sampling-example.md | 6 +- ...-ifstate-for-simple-aggregate-functions.md | 8 ++- .../skip-indexes/_index.md | 1 + ...kip-index-bloom_filter-for-array-column.md | 2 +- .../time-zones.md | 2 +- .../troubleshooting.md | 2 +- .../ttl/modify-ttl.md | 2 + .../ttl/ttl-group-by-examples.md | 2 + .../ttl/ttl-recompress-example.md | 1 + .../update-via-dictionary.md | 2 +- .../window-functions.md | 13 ++-- 17 files changed, 101 insertions(+), 33 deletions(-) diff --git a/content/en/altinity-kb-queries-and-syntax/altinity-kb-kill-query.md b/content/en/altinity-kb-queries-and-syntax/altinity-kb-kill-query.md index 8cb6d48148..f3feed94c2 100644 --- a/content/en/altinity-kb-queries-and-syntax/altinity-kb-kill-query.md +++ b/content/en/altinity-kb-queries-and-syntax/altinity-kb-kill-query.md @@ -9,10 +9,10 @@ Unfortunately not all queries can be killed. A query pipeline is checking this flag before a switching to next block. If the pipeline has stuck somewhere in the middle it cannot be killed. If a query does not stop, the only way to get rid of it is to restart ClickHouse. 
-See also +See also: -[https://github.com/ClickHouse/ClickHouse/issues/3964](https://github.com/ClickHouse/ClickHouse/issues/3964) -[https://github.com/ClickHouse/ClickHouse/issues/1576](https://github.com/ClickHouse/ClickHouse/issues/1576) +* [https://github.com/ClickHouse/ClickHouse/issues/3964](https://github.com/ClickHouse/ClickHouse/issues/3964) +* [https://github.com/ClickHouse/ClickHouse/issues/1576](https://github.com/ClickHouse/ClickHouse/issues/1576) ## How to replace a running query diff --git a/content/en/altinity-kb-queries-and-syntax/altinity-kb-sample-by.md b/content/en/altinity-kb-queries-and-syntax/altinity-kb-sample-by.md index 6dfa4cecbe..bef2c67819 100644 --- a/content/en/altinity-kb-queries-and-syntax/altinity-kb-sample-by.md +++ b/content/en/altinity-kb-queries-and-syntax/altinity-kb-sample-by.md @@ -13,12 +13,12 @@ So that works this way: 3. Here the sampling logic is applied: a) in case of `SAMPLE k` (`k` in `0..1` range) it adds conditions `WHERE sample_key < k * max_int_of_sample_key_type` b) in case of `SAMPLE k OFFSET m` it adds conditions `WHERE sample_key BETWEEN m * max_int_of_sample_key_type AND (m + k) * max_int_of_sample_key_type`c) in case of `SAMPLE N` (N>1) if first estimates how many rows are inside the range we need to read and based on that convert it to 3a case (calculate k based on number of rows in ranges and desired number of rows) 4. 
on the data returned by those other conditions are applied (so here the number of rows can be decreased here) -[Source Code](https://github.com/ClickHouse/ClickHouse/blob/92c937db8b50844c7216d93c5c398d376e82f6c3/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp#L355) +* [Source Code](https://github.com/ClickHouse/ClickHouse/blob/92c937db8b50844c7216d93c5c398d376e82f6c3/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp#L355) ## SAMPLE by -[Docs](https://clickhouse.yandex/docs/en/query_language/select/#select-sample-clause) -[Source Code](https://github.com/ClickHouse/ClickHouse/blob/92c937db8b50844c7216d93c5c398d376e82f6c3/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp#L355) +* [Docs](https://clickhouse.yandex/docs/en/query_language/select/#select-sample-clause) +* [Source Code](https://github.com/ClickHouse/ClickHouse/blob/92c937db8b50844c7216d93c5c398d376e82f6c3/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp#L355) SAMPLE key Must be: @@ -56,4 +56,4 @@ SELECT count() FROM table WHERE ... AND cityHash64(some_high_card_key) % 10 = 0; SELECT count() FROM table WHERE ... AND rand() % 10 = 0; -- Non-deterministic ``` -ClickHouse will read more data from disk compared to an example with a good SAMPLE key, but it's more universal and can be used if you can't change table ORDER BY key. \ No newline at end of file +ClickHouse will read more data from disk compared to an example with a good SAMPLE key, but it's more universal and can be used if you can't change table ORDER BY key. (To learn more about ClickHouse internals, see [ClickHouse Administrator Training](https://altinity.com/clickhouse-training/).)
\ No newline at end of file diff --git a/content/en/altinity-kb-queries-and-syntax/joins/_index.md b/content/en/altinity-kb-queries-and-syntax/joins/_index.md index 7868e67f86..6f6267594a 100644 --- a/content/en/altinity-kb-queries-and-syntax/joins/_index.md +++ b/content/en/altinity-kb-queries-and-syntax/joins/_index.md @@ -3,9 +3,70 @@ title: "JOINs" linkTitle: "JOINs" description: > JOINs +aliases: + - /altinity-kb-queries-and-syntax/joins/join-table-engine/ --- -See presentation: +Resources: -[https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup38/join.pdf](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup38/join.pdf) +* [Overview of JOINs (Russian)](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup38/join.pdf) - Presentation from Meetup 38 in 2019 +* [Notes on JOIN options](https://excalidraw.com/#json=xX_heZcCu0whsDmC2Mdvo,ppbUVFpPz-flJu5ZDnwIPw) -https://excalidraw.com/#json=xX_heZcCu0whsDmC2Mdvo,ppbUVFpPz-flJu5ZDnwIPw + +## Join Table Engine + +The main purpose of the Join table engine is to avoid rebuilding the right-hand table of a JOIN on every query execution, so it is usually used when many fast queries share the same right-hand table. + +### Updates + +Existing rows can be updated (a later insert with the same key replaces the earlier value) when the `join_any_take_last_row` setting is enabled. + +```sql +CREATE TABLE id_val_join +( + `id` UInt32, + `val` UInt8 +) +ENGINE = Join(ANY, LEFT, id) +SETTINGS join_any_take_last_row = 1 + +Ok. + +INSERT INTO id_val_join VALUES (1,21)(1,22)(3,23); + +Ok. + +SELECT * +FROM +( + SELECT toUInt32(number) AS id + FROM numbers(4) +) AS n +ANY LEFT JOIN id_val_join USING (id) + +┌─id─┬─val─┐ +│ 0 │ 0 │ +│ 1 │ 22 │ +│ 2 │ 0 │ +│ 3 │ 23 │ +└────┴─────┘ + +INSERT INTO id_val_join VALUES (1,40)(2,24); + +Ok.
+ +SELECT * +FROM +( + SELECT toUInt32(number) AS id + FROM numbers(4) +) AS n +ANY LEFT JOIN id_val_join USING (id) + +┌─id─┬─val─┐ +│ 0 │ 0 │ +│ 1 │ 40 │ +│ 2 │ 24 │ +│ 3 │ 23 │ +└────┴─────┘ +``` + +[Join table engine documentation](https://clickhouse.com/docs/en/engines/table-engines/special/join/) diff --git a/content/en/altinity-kb-queries-and-syntax/joins/join-table-engine.md b/content/en/altinity-kb-queries-and-syntax/joins/join-table-engine.md index 86a4453fad..1b0a6fb757 100644 --- a/content/en/altinity-kb-queries-and-syntax/joins/join-table-engine.md +++ b/content/en/altinity-kb-queries-and-syntax/joins/join-table-engine.md @@ -3,6 +3,7 @@ title: "JOIN table engine" linkTitle: "JOIN table engine" description: > JOIN table engine +draft: true --- The main purpose of JOIN table engine is to avoid building the right table for joining on each query execution. So it's usually used when you have a high amount of fast queries which share the same right table for joining. @@ -60,4 +61,4 @@ ANY LEFT JOIN id_val_join USING (id) └────┴─────┘ ``` -[https://clickhouse.tech/docs/en/engines/table-engines/special/join/](https://clickhouse.tech/docs/en/engines/table-engines/special/join/) +[Join table engine documentation](https://clickhouse.com/docs/en/engines/table-engines/special/join/) diff --git a/content/en/altinity-kb-queries-and-syntax/machine-learning-in-clickhouse.md b/content/en/altinity-kb-queries-and-syntax/machine-learning-in-clickhouse.md index 06b52f0340..0fafdb833e 100644 --- a/content/en/altinity-kb-queries-and-syntax/machine-learning-in-clickhouse.md +++ b/content/en/altinity-kb-queries-and-syntax/machine-learning-in-clickhouse.md @@ -4,8 +4,9 @@ linkTitle: "Machine learning in ClickHouse" description: > Machine learning in ClickHouse --- -[https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup31/ml.pdf](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup31/ml.pdf) -[CatBoost / MindsDB / Fast.ai]({{}}) 
+Resources: -[https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup38/forecast.pdf](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup38/forecast.pdf) +* [Machine Learning in ClickHouse](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup31/ml.pdf) - Presentation from 2019 (Meetup 31) +* [ML discussion: CatBoost / MindsDB / Fast.ai](../../altinity-kb-integrations/catboost-mindsdb-fast.ai) - Brief article from 2021 +* [Machine Learning Forecast (Russian)](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup38/forecast.pdf) - Presentation from 2019 (Meetup 38) diff --git a/content/en/altinity-kb-queries-and-syntax/projections-examples.md b/content/en/altinity-kb-queries-and-syntax/projections-examples.md index c8cf845fce..4b3a5f42ec 100644 --- a/content/en/altinity-kb-queries-and-syntax/projections-examples.md +++ b/content/en/altinity-kb-queries-and-syntax/projections-examples.md @@ -63,7 +63,7 @@ Elapsed: 0.005 sec. Processed 22.43 thousand rows ## Emulation of an inverted index using orderby projection -You can create an `orderby projection` and include all columns of a table, but if a table is very wide it will double of stored data. This expample demonstrate a trick, we create an `orderby projection` and include primary key columns and the target column and sort by the target column. This allows using subquery to find primary key values and after that to query the table using the primary key. +You can create an `orderby projection` and include all columns of a table, but if a table is very wide it will double the amount of stored data. This example demonstrates a trick: we create an `orderby projection` that includes only the primary key columns and the target column, sorted by the target column. This allows using a subquery to find [primary key values](../../engines/mergetree-table-engine-family/pick-keys/) and then querying the table by primary key.
```sql CREATE TABLE test_a diff --git a/content/en/altinity-kb-queries-and-syntax/sampling-example.md b/content/en/altinity-kb-queries-and-syntax/sampling-example.md index 4c28707bd6..6548f39fdc 100644 --- a/content/en/altinity-kb-queries-and-syntax/sampling-example.md +++ b/content/en/altinity-kb-queries-and-syntax/sampling-example.md @@ -2,9 +2,11 @@ title: "Sampling Example" linkTitle: "Sampling Example" description: > - Clickhouse table sampling example + ClickHouse table sampling example --- -The most important idea about sampling that the primary index must have **low cardinality**. The following example demonstrates how sampling can be setup correctly, and an example if it being set up incorrectly as a comparison. +The most important idea about sampling is that the primary index must have **low cardinality**. (For more on the related `LowCardinality` data type, see [the Altinity Knowledge Base article on LowCardinality](../../altinity-kb-schema-design/lowcardinality) or [a ClickHouse user's lessons learned from LowCardinality](https://altinity.com/blog/2020-5-20-reducing-clickhouse-storage-cost-with-the-low-cardinality-type-lessons-from-an-instana-engineer).) + +The following example demonstrates how sampling can be set up correctly, along with an incorrect setup for comparison. Sampling requires `sample by expression` . This ensures a range of sampled column types fit within a specified range, which ensures the requirement of low cardinality. In this example, I cannot use `transaction_id` because I can not ensure that the min value of `transaction_id = 0` and `max value = MAX_UINT64`. Instead, I used `cityHash64(transaction_id)`to expand the range within the minimum and maximum values.
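The sampling rewrite described in `altinity-kb-sample-by.md` (cases a, b and c) also explains why this example hashes `transaction_id` instead of sampling on it directly. This can be modeled in a few lines of Python. It is a simplified sketch, not ClickHouse's implementation. It assumes a UInt64 sample key, uses the standard library's `blake2b` as a stand-in for `cityHash64`, and the helper names `in_sample`, `ratio_for_rows` and `hash64` are hypothetical.

```python
import hashlib
from fractions import Fraction

UINT64_MAX = 2**64 - 1  # largest value of a UInt64 sample key such as cityHash64(...)


def in_sample(key, k, m=0, key_max=UINT64_MAX):
    """Simplified model of the SAMPLE rewrite (hypothetical helper, not ClickHouse code).

    case a) SAMPLE k          -> key < k * key_max
    case b) SAMPLE k OFFSET m -> key BETWEEN m * key_max AND (m + k) * key_max
    Pass k and m as strings ("0.1") or Fractions to keep the bounds exact.
    """
    k, m = Fraction(k), Fraction(m)
    if not (0 < k <= 1 and 0 <= m and m + k <= 1):
        raise ValueError("expected 0 < k <= 1, 0 <= m and m + k <= 1")
    if m == 0:
        return key < k * key_max
    return m * key_max <= key <= (m + k) * key_max


def ratio_for_rows(n_rows, estimated_rows):
    """case c) SAMPLE N (N > 1): turn a row count into a ratio k, then apply case a)."""
    return min(Fraction(n_rows, estimated_rows), Fraction(1))


def hash64(value):
    """Stand-in for cityHash64: blake2b truncated to 8 bytes, read as an unsigned int."""
    digest = hashlib.blake2b(str(value).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


ids = range(10_000)
# Raw sequential ids are tiny compared to UINT64_MAX, so SAMPLE 0.1 keeps every row.
raw = sum(in_sample(i, "0.1") for i in ids)
# Hashed ids are spread over the whole UInt64 range, so roughly 10% of rows are kept.
hashed = sum(in_sample(hash64(i), "0.1") for i in ids)
print(raw, hashed)
```

Sequential ids all fall below `0.1 * UINT64_MAX`, so sampling on the raw column keeps every row. Hashing first spreads the keys across the full range, which is what makes the sampling ratio meaningful.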
diff --git a/content/en/altinity-kb-queries-and-syntax/simplestateif-or-ifstate-for-simple-aggregate-functions.md b/content/en/altinity-kb-queries-and-syntax/simplestateif-or-ifstate-for-simple-aggregate-functions.md index f7bbb99eb3..a966920fc0 100644 --- a/content/en/altinity-kb-queries-and-syntax/simplestateif-or-ifstate-for-simple-aggregate-functions.md +++ b/content/en/altinity-kb-queries-and-syntax/simplestateif-or-ifstate-for-simple-aggregate-functions.md @@ -77,9 +77,11 @@ SimpleAggregateFunction can be used for those aggregations when the function sta -See also -[https://github.com/ClickHouse/ClickHouse/pull/4629](https://github.com/ClickHouse/ClickHouse/pull/4629) -[https://github.com/ClickHouse/ClickHouse/issues/3852](https://github.com/ClickHouse/ClickHouse/issues/3852) +See also: + +* [Altinity Knowledge Base article on AggregatingMergeTree](../../engines/mergetree-table-engine-family/aggregatingmergetree/) +* [https://github.com/ClickHouse/ClickHouse/pull/4629](https://github.com/ClickHouse/ClickHouse/pull/4629) +* [https://github.com/ClickHouse/ClickHouse/issues/3852](https://github.com/ClickHouse/ClickHouse/issues/3852) ### Q. How maxSimpleState combinator result differs from plain max? diff --git a/content/en/altinity-kb-queries-and-syntax/skip-indexes/_index.md b/content/en/altinity-kb-queries-and-syntax/skip-indexes/_index.md index 760784d3b0..d44599af2f 100644 --- a/content/en/altinity-kb-queries-and-syntax/skip-indexes/_index.md +++ b/content/en/altinity-kb-queries-and-syntax/skip-indexes/_index.md @@ -4,3 +4,4 @@ linkTitle: "Skip indexes" description: > Skip indexes --- +ClickHouse provides a type of index that in specific circumstances can significantly improve query speed. These structures are labeled "skip" indexes because they enable ClickHouse to skip reading significant chunks of data that are guaranteed to have no matching values. 
\ No newline at end of file diff --git a/content/en/altinity-kb-queries-and-syntax/skip-indexes/skip-index-bloom_filter-for-array-column.md b/content/en/altinity-kb-queries-and-syntax/skip-indexes/skip-index-bloom_filter-for-array-column.md index 6cdf2bbb5e..f43d52a28d 100644 --- a/content/en/altinity-kb-queries-and-syntax/skip-indexes/skip-index-bloom_filter-for-array-column.md +++ b/content/en/altinity-kb-queries-and-syntax/skip-indexes/skip-index-bloom_filter-for-array-column.md @@ -8,7 +8,7 @@ aliases: --- tested with 20.8.17.25 -[https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/mergetree/\#table_engine-mergetree-data_skipping-indexes](https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/mergetree/#table_engine-mergetree-data_skipping-indexes) +[https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/\#table_engine-mergetree-data_skipping-indexes](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/#table_engine-mergetree-data_skipping-indexes) ### Let's create test data diff --git a/content/en/altinity-kb-queries-and-syntax/time-zones.md b/content/en/altinity-kb-queries-and-syntax/time-zones.md index 479b94eb50..dcc443f8eb 100644 --- a/content/en/altinity-kb-queries-and-syntax/time-zones.md +++ b/content/en/altinity-kb-queries-and-syntax/time-zones.md @@ -10,7 +10,7 @@ Important things to know: 2. Conversion from that UNIX timestamp to a human-readable form and reverse can happen on the client (for native clients) and on the server (for HTTP clients, and for some type of queries, like `toString(ts)`) 3. Depending on the place where that conversion happened rules of different timezones may be applied. 4. You can check server timezone using `SELECT timezone()` -5. clickhouse-client also by default tries to use server timezone (see also `--use_client_time_zone` flag) +5. 
[clickhouse-client](https://docs.altinity.com/altinitycloud/altinity-cloud-connections/clickhouseclient/) also by default tries to use server timezone (see also `--use_client_time_zone` flag) 6. If you want you can store the timezone name inside the data type, in that case, timestamp <-> human-readable time rules of that timezone will be applied. ```sql diff --git a/content/en/altinity-kb-queries-and-syntax/troubleshooting.md b/content/en/altinity-kb-queries-and-syntax/troubleshooting.md index d16e7cdd2b..5acdd7e673 100644 --- a/content/en/altinity-kb-queries-and-syntax/troubleshooting.md +++ b/content/en/altinity-kb-queries-and-syntax/troubleshooting.md @@ -8,7 +8,7 @@ description: > Controlled by session level setting `send_logs_level` Possible values: `'trace', 'debug', 'information', 'warning', 'error', 'fatal', 'none'` -Can be used with clickhouse-client in both interactive and non-interactive mode. +Can be used with [clickhouse-client](https://docs.altinity.com/altinitycloud/altinity-cloud-connections/clickhouseclient/) in both interactive and non-interactive mode. ```bash $ clickhouse-client -mn --send_logs_level='trace' --query "SELECT sum(number) FROM numbers(1000)" diff --git a/content/en/altinity-kb-queries-and-syntax/ttl/modify-ttl.md b/content/en/altinity-kb-queries-and-syntax/ttl/modify-ttl.md index 26564d1db2..5b769aaf47 100644 --- a/content/en/altinity-kb-queries-and-syntax/ttl/modify-ttl.md +++ b/content/en/altinity-kb-queries-and-syntax/ttl/modify-ttl.md @@ -6,6 +6,8 @@ description: >- What happening during MODIFY or ADD TTL query. 
--- +*For a general overview of TTL, see the article [Putting Things Where They Belong Using New TTL Moves](https://altinity.com/blog/2020-3-23-putting-things-where-they-belong-using-new-ttl-moves).* + ## ALTER TABLE tbl MODIFY (ADD) TTL: It's 2 step process: diff --git a/content/en/altinity-kb-queries-and-syntax/ttl/ttl-group-by-examples.md b/content/en/altinity-kb-queries-and-syntax/ttl/ttl-group-by-examples.md index c02ec8a046..1f21e82091 100644 --- a/content/en/altinity-kb-queries-and-syntax/ttl/ttl-group-by-examples.md +++ b/content/en/altinity-kb-queries-and-syntax/ttl/ttl-group-by-examples.md @@ -457,3 +457,5 @@ OPTIMIZE TABLE test_ttl_group_by FINAL; └────────┴─────────┴────────────┴────────────────┴────────────────┘ ``` + +Also see the [Altinity Knowledge Base pages on the MergeTree table engine family](../../../engines/mergetree-table-engine-family). \ No newline at end of file diff --git a/content/en/altinity-kb-queries-and-syntax/ttl/ttl-recompress-example.md b/content/en/altinity-kb-queries-and-syntax/ttl/ttl-recompress-example.md index 64539b8fcd..d16fb93f23 100644 --- a/content/en/altinity-kb-queries-and-syntax/ttl/ttl-recompress-example.md +++ b/content/en/altinity-kb-queries-and-syntax/ttl/ttl-recompress-example.md @@ -5,6 +5,7 @@ description: > TTL Recompress example --- +*See also [the Altinity Knowledge Base article on testing different compression codecs](../../../altinity-kb-schema-design/codecs/altinity-kb-how-to-test-different-compression-codecs).* ## Example how to create a table and define recompression rules diff --git a/content/en/altinity-kb-queries-and-syntax/update-via-dictionary.md b/content/en/altinity-kb-queries-and-syntax/update-via-dictionary.md index 2d81cb6847..0a6a1c8343 100644 --- a/content/en/altinity-kb-queries-and-syntax/update-via-dictionary.md +++ b/content/en/altinity-kb-queries-and-syntax/update-via-dictionary.md @@ -101,7 +101,7 @@ FROM test_update ``` {{% alert title="Info" color="info" %}} -In case of Replicated 
installation, Dictionary should be created on all nodes and source tables should have ReplicatedMergeTree engine and be replicated across all nodes. +In a Replicated installation, the dictionary should be created on all nodes, and source tables should use the [ReplicatedMergeTree](../../altinity-kb-setup-and-maintenance/altinity-kb-converting-mergetree-to-replicated/) engine and be replicated across all nodes. {{% /alert %}} {{% alert title="Info" color="info" %}} diff --git a/content/en/altinity-kb-queries-and-syntax/window-functions.md b/content/en/altinity-kb-queries-and-syntax/window-functions.md index 6e55afca69..6b23f8df91 100644 --- a/content/en/altinity-kb-queries-and-syntax/window-functions.md +++ b/content/en/altinity-kb-queries-and-syntax/window-functions.md @@ -4,17 +4,12 @@ linkTitle: "Window functions" description: > Window functions --- -| Link | [blog.tinybird.co/2021/03/16/c…](https://blog.tinybird.co/2021/03/16/coming-soon-on-clickhouse-window-functions/) | -| :--- | :--- | -| Date | Mar 26, 2021 | -![Windows Function Slides](https://api.microlink.io/?adblock=false&meta=false&screenshot&element=%23screenshot&embed=screenshot.url&url=https%3A%2F%2Fcards.microlink.io%2F%3Fpreset%3Dtinybird%26subtitle%3Dtips%26text%3DWindow%2Bfunctions%252C%2Bnested%2Bdata%252C%2BA%2BPostgreSQL%2Bengine%2Band%2Bmore) +#### Resources -[blog.tinybird.co/2021/03/16/c…](https://blog.tinybird.co/2021/03/16/coming-soon-on-clickhouse-window-functions/) - -> An exploration on what's possible to do with the most recent experimental feature on ClickHouse - window functions, and an overview of other interesting feat...
- -[Windows Functions Blog Link](https://blog.tinybird.co/2021/03/16/coming-soon-on-clickhouse-window-functions/) +* [Tutorial: ClickHouse Window Functions](https://altinity.com/blog/clickhouse-window-functions-current-state-of-the-art) +* [Video: Fun with ClickHouse Window Functions](https://www.youtube.com/watch?v=sm_vUdMQz4s) +* [Blog: Battle of the Views: ClickHouse Window View vs. Live View](https://altinity.com/blog/battle-of-the-views-clickhouse-window-view-vs-live-view) #### How Do I Simulate Window Functions Using Arrays on older versions of clickhouse?