Cleaned up page metadata, added links to related resources #117

Merged (1 commit) on Dec 5, 2024
3 changes: 3 additions & 0 deletions content/en/altinity-kb-queries-and-syntax/pivot-unpivot.md
@@ -3,6 +3,9 @@ title: "PIVOT / UNPIVOT"
linkTitle: "PIVOT / UNPIVOT"
description: >
PIVOT / UNPIVOT
keywords:
- clickhouse pivot
- clickhouse unpivot
---
## PIVOT

@@ -1,18 +1,20 @@
---
title: "System tables ate my disk"
linkTitle: "System tables ate my disk"
linkTitle: "Regulating the size of System tables"
description: >
System tables ate my disk
When the ClickHouse® SYSTEM database gets out of hand
keywords:
- clickhouse system tables
---
> **Note 1:** The system database stores virtual tables (**parts**, **tables**, **columns**, etc.) and \***_log** tables.
>
> Virtual tables do not persist on disk. They reflect ClickHouse® memory (C++ structures). They cannot be changed or removed.
>
> Log tables are named with the postfix \***_log** and have the MergeTree engine. ClickHouse does not use the information stored in these tables; this data is for you only.
> Log tables are named with the postfix \***_log** and have the [MergeTree engine](/engines/mergetree-table-engine-family/). ClickHouse does not use the information stored in these tables; this data is for you only.
>
> You can drop / rename / truncate \***_log** tables at any time. ClickHouse will recreate them in about 7 seconds (flush period).

> **Note 2:** Log tables with numeric postfixes (_1 / 2 / 3 ...), e.g. `query_log_1 query_thread_log_3`, are the result of ClickHouse upgrades (or other changes to the schemas of these tables). When a new version of ClickHouse starts and discovers that a system log table's schema is incompatible with the new schema, ClickHouse renames the old *_log table by appending a numeric postfix and creates a table with the new schema. You can drop such tables if you don't need that historical data.
> **Note 2:** Log tables with numeric postfixes (_1 / 2 / 3 ...), e.g. `query_log_1 query_thread_log_3`, are the result of [ClickHouse upgrades](https://altinity.com/clickhouse-upgrade-overview/) (or other changes to the schemas of these tables). When a new version of ClickHouse starts and discovers that a system log table's schema is incompatible with the new schema, ClickHouse renames the old *_log table by appending a numeric postfix and creates a table with the new schema. You can drop such tables if you don't need that historical data.
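
For example, a minimal sketch of finding and dropping such leftover tables (the table name `query_log_1` is illustrative; check what actually exists on your server first):

```sql
-- list leftover log tables that were renamed during an upgrade
SELECT name
FROM system.tables
WHERE database = 'system' AND match(name, '_log_[0-9]+$');

-- drop one of them if the historical data is not needed
DROP TABLE IF EXISTS system.query_log_1;
```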

## You can disable all / any of them
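
As a minimal per-session sketch (assuming you only want to stop writing query logs for the current session; disabling tables server-wide is done in the server config):

```sql
-- stop writing to system.query_log / system.query_thread_log for this session
SET log_queries = 0;
SET log_query_threads = 0;
```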

@@ -99,7 +101,7 @@ Important part here is a daily partitioning `PARTITION BY (event_date)` in this

Usual TTL processing (when the table is partitioned by toYYYYMM and the TTL is by day) is a heavy CPU / disk I/O consuming operation that rewrites data parts without the expired rows.

You can add TTL without ClickHouse restart (and table dropping or renaming):
You can [add TTL without ClickHouse restart](/altinity-kb-queries-and-syntax/ttl/modify-ttl/) (and table dropping or renaming):

```sql
ALTER TABLE system.query_log MODIFY TTL event_date + INTERVAL 14 DAY;
```
@@ -122,7 +124,7 @@ $ cat /etc/clickhouse-server/config.d/query_log_ttl.xml
```xml
</query_log>
</clickhouse>
```
💡 For the clickhouse-operator, the above method of using only the `<engine>` tag without `<ttl>` or `<partition>` is recommended, because of possible configuration clashes.
💡 For the [clickhouse-operator](https://github.com/Altinity/clickhouse-operator/blob/master/README.md), the above method of using only the `<engine>` tag without `<ttl>` or `<partition>` is recommended, because of possible configuration clashes.

After that you need to restart ClickHouse and, *if using old ClickHouse versions (20 or earlier)*, drop or rename the existing system.query_log table; ClickHouse then creates a new table with these settings. This is done automatically in newer versions (21+).
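
A minimal sketch of the rename step for those older versions (the target name `query_log_old` is illustrative):

```sql
-- keep the old data around; ClickHouse recreates system.query_log
-- with the new settings after the restart
RENAME TABLE system.query_log TO system.query_log_old;
```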

33 changes: 15 additions & 18 deletions content/en/engines/mergetree-table-engine-family/pick-keys.md
@@ -1,12 +1,12 @@
---
title: "How to pick an ORDER BY / PRIMARY KEY / PARTITION BY for the MergeTree family table"
linkTitle: "Proper ordering and partitioning MergeTree tables"
linkTitle: "Properly ordering and partitioning MergeTree tables"
keywords:
- order by clickhouse
- clickhouse partition by
- order by clickhouse
- clickhouse partition by
weight: 100
description: >-
How to pick an ORDER BY / PRIMARY KEY / PARTITION BY for MergeTree tables.
Optimizing ClickHouse® MergeTree tables
---

A good `ORDER BY` usually has 3 to 5 columns, from the lowest-cardinality column on the left (and the most important for filtering) to the highest-cardinality (and less important for filtering).
@@ -16,13 +16,13 @@ A practical approach to creating a good ORDER BY for a table:
1. Pick the columns you always use in filtering
2. The most important for filtering and the lowest-cardinality column should be the left-most. Typically it's something like `tenant_id`
3. The next column is higher-cardinality and less important for filtering. It can sometimes be a rounded time, or `site_id`, or `source_id`, or `group_id`, or something similar.
4. repeat p.3 once again (or few times)
5. if you added already all columns important for filtering and you still not addressing a single row with you pk - you can add more columns which can help to put similar records close to each other (to improve the compression)
6. if you have something like hierarchy / tree-like relations between the columns - put there the records from 'root' to 'leaves' for example (continent, country, cityname). This way ClickHouse® can do lookup by country / city even if continent is not specified (it will just 'check all continents')
4. Repeat step 3 once again (or a few times)
5. If you already added all columns important for filtering and you're still not addressing a single row with your pk - you can add more columns which can help to put similar records close to each other (to improve the compression)
6. If you have something like hierarchy / tree-like relations between the columns - put there the records from 'root' to 'leaves' for example (continent, country, cityname). This way ClickHouse® can do lookup by country / city even if continent is not specified (it will just 'check all continents')
Special variants of MergeTree may require a special ORDER BY to make records unique, etc.
7. For [timeseries](https://altinity.com/blog/2019-5-23-handling-variable-time-series-efficiently-in-clickhouse) it usually makes sense to put the timestamp as the last column in ORDER BY; it helps keep the same data close together for better locality. There are only 2 major patterns for timestamps in ORDER BY: (..., toStartOf(Day|Hour|...)(timestamp), ..., timestamp) and (..., timestamp). The first one is useful when you often query a small part of a table partition (e.g. the table is partitioned by month and you read only 1-4 days 90% of the time).

Some examples of good order by
Some examples of good `ORDER BY`:
```
ORDER BY (tenantid, site_id, utm_source, clientid, timestamp)
```
@@ -32,7 +32,7 @@
```
ORDER BY (site_id, toStartOfHour(timestamp), sessionid, timestamp )
PRIMARY KEY (site_id, toStartOfHour(timestamp), sessionid)
```
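
To make the rules above concrete, here is a minimal sketch of a full table definition combining them (the table and column names are illustrative, not from the KB):

```sql
-- low-cardinality, filter-heavy columns first; timestamp last for locality
CREATE TABLE events
(
    tenant_id UInt32,
    site_id   UInt32,
    user_id   UInt64,
    timestamp DateTime,
    payload   String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (tenant_id, site_id, toStartOfHour(timestamp), user_id, timestamp);
```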


(FWIW, the Altinity blog has [a great article on the LowCardinality datatype](https://altinity.com/blog/2019-3-27-low-cardinality).)

### For Summing / Aggregating

@@ -93,7 +93,7 @@
```
ORDER BY col1, col2
FORMAT `Null`
```

Here for the filtering it will use the skipping index to select the parts `WHERE col1 > xxx`, and the result won't need to be ordered because the `ORDER BY` in the query aligns with the `ORDER BY` in the table and the data is already ordered on disk.
Here for the filtering it will use the skipping index to select the parts `WHERE col1 > xxx`, and the result won't need to be ordered because the `ORDER BY` in the query aligns with the `ORDER BY` in the table and the data is already ordered on disk. (FWIW, Alexander Zaitsev and Mikhail Filimonov wrote [a great post on skipping indexes and how they work](https://altinity.com/blog/clickhouse-black-magic-skipping-indices) for the Altinity blog.)
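
One way to check this behavior (a sketch, assuming the `order_test` table from this example exists) is to inspect the query pipeline:

```sql
-- if the table ORDER BY already covers the query ORDER BY, the pipeline
-- should not need a full re-sort of the selected rows
EXPLAIN PIPELINE
SELECT *
FROM order_test
WHERE col1 > toDateTime('2020-10-01')
ORDER BY col1, col2;
```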

```bash
executeQuery: (from [::ffff:192.168.11.171]:39428, user: admin) SELECT * FROM order_test WHERE col1 > toDateTime('2020-10-01') ORDER BY col1,col2 FORMAT Null; (stage: Complete)
```
@@ -199,6 +199,8 @@ Ok.

## PARTITION BY

Things to consider:

* A good size for a single partition is something like 1-300 GB.
* For Summing / Replacing, a bit smaller (400 MB-40 GB).
* It's better to avoid touching more than a few dozen partitions with a typical SELECT query.
@@ -227,12 +229,7 @@ PARTITION BY userid % 16

For small tables (smaller than a few gigabytes) partitioning is usually not needed at all (just skip the `PARTITION BY` expression when you create the table).
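
To see whether your partitions land in the recommended size range, a minimal sketch (the table name `events` is illustrative):

```sql
-- partition sizes for one table, largest first
SELECT partition, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active AND database = currentDatabase() AND table = 'events'
GROUP BY partition
ORDER BY sum(bytes_on_disk) DESC;
```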

### See also

[How to change ORDER BY](/altinity-kb-schema-design/change-order-by/)

### ClickHouse Anti-Patterns. Learning from Users' Mistakes

A short talk by Mikhail Filimonov
## See also

https://youtu.be/DP7l6Swkskw?t=3777
* [How to change ORDER BY](/altinity-kb-schema-design/change-order-by/)
* [ClickHouse Anti-Patterns: Learning from Users\' Mistakes](https://youtu.be/DP7l6Swkskw?t=3777), a short talk by Mikhail Filimonov