Cleaned up page metadata, added links to related resources #117

Merged (1 commit) on Dec 5, 2024
3 changes: 3 additions & 0 deletions content/en/altinity-kb-queries-and-syntax/pivot-unpivot.md
@@ -3,6 +3,9 @@ title: "PIVOT / UNPIVOT"
linkTitle: "PIVOT / UNPIVOT"
description: >
PIVOT / UNPIVOT
keywords:
- clickhouse pivot
- clickhouse unpivot
---
## PIVOT

@@ -1,18 +1,20 @@
---
title: "System tables ate my disk"
linkTitle: "System tables ate my disk"
linkTitle: "Regulating the size of System tables"
description: >
System tables ate my disk
When the ClickHouse® SYSTEM database gets out of hand
keywords:
- clickhouse system tables
---
> **Note 1:** The system database stores virtual tables (**parts**, **tables**, **columns**, etc.) and \***_log** tables.
>
> Virtual tables do not persist on disk. They reflect ClickHouse® memory (C++ structures). They cannot be changed or removed.
>
> Log tables are named with the postfix \***_log** and have the MergeTree engine. ClickHouse does not use the information stored in these tables; this data is for you only.
> Log tables are named with the postfix \***_log** and have the [MergeTree engine](/engines/mergetree-table-engine-family/). ClickHouse does not use the information stored in these tables; this data is for you only.
>
> You can drop / rename / truncate \***_log** tables at any time. ClickHouse will recreate them in about 7 seconds (flush period).

> **Note 2:** Log tables with numeric postfixes (_1 / 2 / 3 ...), e.g. `query_log_1 query_thread_log_3`, are the result of ClickHouse upgrades (or other changes to the schemas of these tables). When a new version of ClickHouse starts and discovers that a system log table's schema is incompatible with the new schema, ClickHouse renames the old *_log table by appending a numeric postfix and creates a table with the new schema. You can drop such tables if you don't need that historical data.
> **Note 2:** Log tables with numeric postfixes (_1 / 2 / 3 ...), e.g. `query_log_1 query_thread_log_3`, are the result of [ClickHouse upgrades](https://altinity.com/clickhouse-upgrade-overview/) (or other changes to the schemas of these tables). When a new version of ClickHouse starts and discovers that a system log table's schema is incompatible with the new schema, ClickHouse renames the old *_log table by appending a numeric postfix and creates a table with the new schema. You can drop such tables if you don't need that historical data.
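
For example, a minimal sketch of finding and dropping such leftover tables (the table name `query_log_1` is illustrative; check what actually exists on your server first):

```sql
-- list leftover log tables that were renamed during an upgrade
SELECT name
FROM system.tables
WHERE database = 'system' AND match(name, '_log_[0-9]+$');

-- drop one of them if the historical data is not needed
DROP TABLE IF EXISTS system.query_log_1;
```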

## You can disable all / any of them
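
As a minimal per-session sketch (assuming you only want to stop writing query logs for the current session; disabling tables server-wide is done in the server config):

```sql
-- stop writing to system.query_log / system.query_thread_log for this session
SET log_queries = 0;
SET log_query_threads = 0;
```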

@@ -99,7 +101,7 @@ Important part here is a daily partitioning `PARTITION BY (event_date)` in this

Usual TTL processing (when the table is partitioned by toYYYYMM and the TTL is by day) is a heavy CPU / disk I/O consuming operation that rewrites data parts without the expired rows.

You can add TTL without ClickHouse restart (and table dropping or renaming):
You can [add TTL without ClickHouse restart](/altinity-kb-queries-and-syntax/ttl/modify-ttl/) (and table dropping or renaming):

```sql
ALTER TABLE system.query_log MODIFY TTL event_date + INTERVAL 14 DAY;
```
@@ -122,7 +124,7 @@ $ cat /etc/clickhouse-server/config.d/query_log_ttl.xml
```xml
</query_log>
</clickhouse>
```
💡 For the clickhouse-operator, the above method of using only the `<engine>` tag without `<ttl>` or `<partition>` is recommended, because of possible configuration clashes.
💡 For the [clickhouse-operator](https://github.com/Altinity/clickhouse-operator/blob/master/README.md), the above method of using only the `<engine>` tag without `<ttl>` or `<partition>` is recommended, because of possible configuration clashes.

After that you need to restart ClickHouse and, *if using old ClickHouse versions (20 or earlier)*, drop or rename the existing system.query_log table; ClickHouse then creates a new table with these settings. This is done automatically in newer versions (21+).
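
A minimal sketch of the rename step for those older versions (the target name `query_log_old` is illustrative):

```sql
-- keep the old data around; ClickHouse recreates system.query_log
-- with the new settings after the restart
RENAME TABLE system.query_log TO system.query_log_old;
```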

33 changes: 15 additions & 18 deletions content/en/engines/mergetree-table-engine-family/pick-keys.md
@@ -1,12 +1,12 @@
---
title: "How to pick an ORDER BY / PRIMARY KEY / PARTITION BY for the MergeTree family table"
linkTitle: "Proper ordering and partitioning MergeTree tables"
linkTitle: "Properly ordering and partitioning MergeTree tables"
keywords:
- order by clickhouse
- clickhouse partition by
- order by clickhouse
- clickhouse partition by
weight: 100
description: >-
How to pick an ORDER BY / PRIMARY KEY / PARTITION BY for MergeTree tables.
Optimizing ClickHouse® MergeTree tables
---

A good `ORDER BY` usually has 3 to 5 columns, from the lowest-cardinality column on the left (and the most important for filtering) to the highest-cardinality (and less important for filtering).
@@ -16,13 +16,13 @@ A practical approach to creating a good ORDER BY for a table:
1. Pick the columns you always use in filtering
2. The most important for filtering and the lowest-cardinality column should be the left-most. Typically it's something like `tenant_id`
3. The next column is higher-cardinality and less important for filtering. It can sometimes be a rounded time, or `site_id`, or `source_id`, or `group_id`, or something similar.
4. repeat p.3 once again (or few times)
5. if you added already all columns important for filtering and you still not addressing a single row with you pk - you can add more columns which can help to put similar records close to each other (to improve the compression)
6. if you have something like hierarchy / tree-like relations between the columns - put there the records from 'root' to 'leaves' for example (continent, country, cityname). This way ClickHouse® can do lookup by country / city even if continent is not specified (it will just 'check all continents')
4. Repeat step 3 once again (or a few times)
5. If you already added all columns important for filtering and you're still not addressing a single row with your pk - you can add more columns which can help to put similar records close to each other (to improve the compression)
6. If you have something like hierarchy / tree-like relations between the columns - put there the records from 'root' to 'leaves' for example (continent, country, cityname). This way ClickHouse® can do lookup by country / city even if continent is not specified (it will just 'check all continents')
Special variants of MergeTree may require a special ORDER BY to make records unique, etc.
7. For [timeseries](https://altinity.com/blog/2019-5-23-handling-variable-time-series-efficiently-in-clickhouse) it usually makes sense to put the timestamp as the last column in ORDER BY; it helps keep the same data close together for better locality. There are only 2 major patterns for timestamps in ORDER BY: (..., toStartOf(Day|Hour|...)(timestamp), ..., timestamp) and (..., timestamp). The first one is useful when you often query a small part of a table partition (e.g. the table is partitioned by month and you read only 1-4 days 90% of the time).

Some examples of good order by
Some examples of good `ORDER BY`:
```
ORDER BY (tenantid, site_id, utm_source, clientid, timestamp)
```
@@ -32,7 +32,7 @@
```
ORDER BY (site_id, toStartOfHour(timestamp), sessionid, timestamp )
PRIMARY KEY (site_id, toStartOfHour(timestamp), sessionid)
```
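
To make the rules above concrete, here is a minimal sketch of a full table definition combining them (the table and column names are illustrative, not from the KB):

```sql
-- low-cardinality, filter-heavy columns first; timestamp last for locality
CREATE TABLE events
(
    tenant_id UInt32,
    site_id   UInt32,
    user_id   UInt64,
    timestamp DateTime,
    payload   String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (tenant_id, site_id, toStartOfHour(timestamp), user_id, timestamp);
```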


(FWIW, the Altinity blog has [a great article on the LowCardinality datatype](https://altinity.com/blog/2019-3-27-low-cardinality).)

### For Summing / Aggregating

@@ -93,7 +93,7 @@
```
ORDER BY col1, col2
FORMAT `Null`
```

Here for the filtering it will use the skipping index to select the parts `WHERE col1 > xxx`, and the result won't need to be ordered because the `ORDER BY` in the query aligns with the `ORDER BY` in the table and the data is already ordered on disk.
Here for the filtering it will use the skipping index to select the parts `WHERE col1 > xxx`, and the result won't need to be ordered because the `ORDER BY` in the query aligns with the `ORDER BY` in the table and the data is already ordered on disk. (FWIW, Alexander Zaitsev and Mikhail Filimonov wrote [a great post on skipping indexes and how they work](https://altinity.com/blog/clickhouse-black-magic-skipping-indices) for the Altinity blog.)
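
One way to check this behavior (a sketch, assuming the `order_test` table from this example exists) is to inspect the query pipeline:

```sql
-- if the table ORDER BY already covers the query ORDER BY, the pipeline
-- should not need a full re-sort of the selected rows
EXPLAIN PIPELINE
SELECT *
FROM order_test
WHERE col1 > toDateTime('2020-10-01')
ORDER BY col1, col2;
```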

```bash
executeQuery: (from [::ffff:192.168.11.171]:39428, user: admin) SELECT * FROM order_test WHERE col1 > toDateTime('2020-10-01') ORDER BY col1,col2 FORMAT Null; (stage: Complete)
```
@@ -199,6 +199,8 @@ Ok.

## PARTITION BY

Things to consider:

* A good size for a single partition is something like 1-300 GB.
* For Summing / Replacing, a bit smaller (400 MB-40 GB).
* It's better to avoid touching more than a few dozen partitions with a typical SELECT query.
@@ -227,12 +229,7 @@ PARTITION BY userid % 16

For small tables (smaller than a few gigabytes) partitioning is usually not needed at all (just skip the `PARTITION BY` expression when you create the table).
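
To see whether your partitions land in the recommended size range, a minimal sketch (the table name `events` is illustrative):

```sql
-- partition sizes for one table, largest first
SELECT partition, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active AND database = currentDatabase() AND table = 'events'
GROUP BY partition
ORDER BY sum(bytes_on_disk) DESC;
```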

### See also

[How to change ORDER BY](/altinity-kb-schema-design/change-order-by/)

### ClickHouse Anti-Patterns. Learning from Users' Mistakes

A short talk by Mikhail Filimonov
## See also

https://youtu.be/DP7l6Swkskw?t=3777
* [How to change ORDER BY](/altinity-kb-schema-design/change-order-by/)
* [ClickHouse Anti-Patterns: Learning from Users\' Mistakes](https://youtu.be/DP7l6Swkskw?t=3777), a short talk by Mikhail Filimonov