Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clickhouse] Enable system.metric_log and system.asynchronous_metric_log tables #7100

Merged
merged 2 commits into from
Nov 19, 2024

Conversation

karencfv
Copy link
Contributor

@karencfv karencfv commented Nov 19, 2024

Overview

In order to reliably roll out a replicated ClickHouse cluster, we'll be running a set of long running testing (stage 1 of RFD #468).

ClickHouse provides a system database with several tables with information about the system. For monitoring purposes, the system.asynchronous_metric_log and system.metric_log tables are particularly useful. With them we can retrieve information about queries per second, CPU usage, memory usage etc. Full lists of available metrics here and here

During our long running testing I'd like to give these tables a TTL of 30 days. Once we are confident the system is stable and we roll out the cluster to all racks, we can reduce TTL to 7 or 14 days.

This PR will only enable the tables themselves. There will be follow up PRs to actually retrieve the data we'll be monitoring

Manual testing

Queries per second

oxz_clickhouse_eecd32cc-ebf2-4196-912f-5bb440b104a0.local :) SELECT toStartOfInterval(event_time, INTERVAL 60 SECOND) AS t, avg(ProfileEvent_Query)
FROM system.metric_log
WHERE event_date >= toDate(now() - 86400) AND event_time >= now() - 86400
GROUP BY t
ORDER BY t WITH FILL STEP 60
SETTINGS date_time_output_format = 'iso'

SELECT
    toStartOfInterval(event_time, toIntervalSecond(60)) AS t,
    avg(ProfileEvent_Query)
FROM system.metric_log
WHERE (event_date >= toDate(now() - 86400)) AND (event_time >= (now() - 86400))
GROUP BY t
ORDER BY t ASC WITH FILL STEP 60
SETTINGS date_time_output_format = 'iso'

Query id: 1b91946b-fe8b-4074-bc94-f071f72f55f5

┌────────────────────t─┬─avg(ProfileEvent_Query)─┐
│ 2024-11-18T06:40:00Z │      1.3571428571428572 │
│ 2024-11-18T06:41:00Z │      1.3666666666666667 │
│ 2024-11-18T06:42:00Z │      1.3666666666666667 │
│ 2024-11-18T06:43:00Z │      1.3666666666666667 │
│ 2024-11-18T06:44:00Z │      1.3666666666666667 │

Disk usage

oxz_clickhouse_eecd32cc-ebf2-4196-912f-5bb440b104a0.local :) SELECT toStartOfInterval(event_time, INTERVAL 60 SECOND) AS t, avg(value)
FROM system.asynchronous_metric_log
WHERE event_date >= toDate(now() - 86400) AND event_time >= now() - 86400
AND metric = 'DiskUsed_default'
GROUP BY t
ORDER BY t WITH FILL STEP 60
SETTINGS date_time_output_format = 'iso'

SELECT
    toStartOfInterval(event_time, toIntervalSecond(60)) AS t,
    avg(value)
FROM system.asynchronous_metric_log
WHERE (event_date >= toDate(now() - 86400)) AND (event_time >= (now() - 86400)) AND (metric = 'DiskUsed_default')
GROUP BY t
ORDER BY t ASC WITH FILL STEP 60
SETTINGS date_time_output_format = 'iso'

Query id: bcf9cd9b-7fea-4aea-866b-d69e60a7c0b6

┌────────────────────t─┬─────────avg(value)─┐
│ 2024-11-18T06:42:00Z │  860941425.7777778 │
│ 2024-11-18T06:43:00Z │  865134523.7333333 │
│ 2024-11-18T06:44:00Z │          871888896 │
│ 2024-11-18T06:45:00Z │  874408891.7333333 │
│ 2024-11-18T06:46:00Z │          878761984 │
│ 2024-11-18T06:47:00Z │  881646933.3333334 │
│ 2024-11-18T06:48:00Z │  883998788.2666667 │

Related #6953

Copy link
Contributor

@andrewjstone andrewjstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome :)

@karencfv karencfv merged commit 7947f75 into oxidecomputer:main Nov 19, 2024
16 checks passed
@karencfv karencfv deleted the enable-monitoring branch November 19, 2024 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants