Add offline schema updates for ClickHouse #4394
Will this PR close #4048 as well?
@karencfv Yep, I think so. I still need to do a bunch of testing and verification of the actual binary, but I've got a good set of tests around the internal implementation of that in the
Nice!!! 🎉
Testing setup
In addition to the unit tests inside the actual database client, I also tested this several times on my local developer machine. I started with the control plane built and installed from
bnaecker@shale : ~/omicron $ pfexec zlogin oxz_clickhouse_de7eb711-5b35-4b64-894c-7d3a4764c189
[Connected to zone 'oxz_clickhouse_de7eb711-5b35-4b64-894c-7d3a4764c189' pts/3]
The illumos Project helios-2.0.22248 October 2023
root@oxz_clickhouse_de7eb711-5b35-4b64-894c-7d3a4764c189:~# ipadm
ADDROBJ TYPE STATE ADDR
lo0/v4 static ok 127.0.0.1/8
lo0/v6 static ok ::1/128
oxControlService19/ll addrconf ok fe80::8:20ff:fe9e:c080%oxControlService19/10
oxControlService19/omicron6 static ok fd00:1122:3344:101::e/64
root@oxz_clickhouse_de7eb711-5b35-4b64-894c-7d3a4764c189:~# /opt/oxide/clickhouse/clickhouse client --database oximeter --host fd00:1122:3344:101::e
ClickHouse client version 22.8.9.1.
Connecting to database oximeter at fd00:1122:3344:101::e:9000 as user default.
Connected to ClickHouse server version 22.8.9 revision 54460.
oxz_clickhouse_de7eb711-5b35-4b64-894c-7d3a4764c189 :) select * from version;
SELECT *
FROM version
Query id: 90d30c65-6927-41ec-a91e-7e9ba5e71c42
┌─value─┬─────────────────────timestamp─┐
│ 2 │ 2023-10-31 19:03:05.000000000 │
└───────┴───────────────────────────────┘
1 row in set. Elapsed: 0.001 sec.
oxz_clickhouse_de7eb711-5b35-4b64-894c-7d3a4764c189 :)
bnaecker@shale : ~/omicron $ pfexec zlogin oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606 'tail $(svcs -L oximeter)'
{"msg":"unrolling 131 total samples","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:09:15.754578608Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":4644,"id":"535d567e-f5d8-42b9-92bc-c24dfc88d607","component":"clickhouse-client","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent"}
{"msg":"inserted rows into table","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:09:15.767781162Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":4644,"id":"535d567e-f5d8-42b9-92bc-c24dfc88d607","component":"clickhouse-client","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","table_name":"oximeter.fields_i64","n_rows":40}
{"msg":"inserted rows into table","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:09:15.772416835Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":4644,"id":"535d567e-f5d8-42b9-92bc-c24dfc88d607","component":"clickhouse-client","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","table_name":"oximeter.fields_string","n_rows":198}
{"msg":"inserted rows into table","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:09:15.776443983Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":4644,"id":"535d567e-f5d8-42b9-92bc-c24dfc88d607","component":"clickhouse-client","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","table_name":"oximeter.fields_uuid","n_rows":325}
{"msg":"inserted rows into table","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:09:15.777773065Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":4644,"id":"535d567e-f5d8-42b9-92bc-c24dfc88d607","component":"clickhouse-client","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","table_name":"oximeter.measurements_bool","n_rows":6}
{"msg":"inserted rows into table","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:09:15.779191948Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":4644,"id":"535d567e-f5d8-42b9-92bc-c24dfc88d607","component":"clickhouse-client","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","table_name":"oximeter.measurements_cumulativei64","n_rows":70}
{"msg":"inserted rows into table","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:09:15.780728493Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":4644,"id":"535d567e-f5d8-42b9-92bc-c24dfc88d607","component":"clickhouse-client","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","table_name":"oximeter.measurements_histogramf64","n_rows":34}
{"msg":"inserted rows into table","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:09:15.782027185Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":4644,"id":"535d567e-f5d8-42b9-92bc-c24dfc88d607","component":"clickhouse-client","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","table_name":"oximeter.measurements_i64","n_rows":21}
{"msg":"collecting from producer","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:09:20.995449246Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":4644,"producer_id":"52028f5e-8f19-461a-8624-1b1450f26654","component":"collection-task","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent"}
{"msg":"collected 3 total results","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:09:20.996114992Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":4644,"producer_id":"52028f5e-8f19-461a-8624-1b1450f26654","component":"collection-task","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent"}
Installing new code
To test the new upgrade code, I then rebuilt One of the new features of
[ Oct 31 19:17:36 Enabled. ]
[ Oct 31 19:17:36 Executing start method ("ctrun -l child -o noorphan,regent /opt/oxide/oximeter/bin/oximeter run /var/svc/manifest/site/oximeter/config.toml --address [fd00:1122:3344:101::d]:12223 --id f3a5ade4-a165-4ecd-a365-e57b3cdf3606 &"). ]
[ Oct 31 19:17:36 Method "start" exited with status 0. ]
note: configured to log to "/dev/stdout"
{"msg":"registered DTrace probes","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:17:36.165385878Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288}
{"msg":"starting oximeter server","v":0,"name":"oximeter","level":30,"time":"2023-10-31T19:17:36.165806662Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"file":"oximeter/collector/src/lib.rs:760"}
{"msg":"new DNS resolver","v":0,"name":"oximeter","level":30,"time":"2023-10-31T19:17:36.165924353Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"component":"DnsResolver","file":"internal-dns/src/resolver.rs:60","addresses":"[[fd00:1122:3344:1::1]:53, [fd00:1122:3344:2::1]:53, [fd00:1122:3344:3::1]:53, [fd00:1122:3344:4::1]:53, [fd00:1122:3344:5::1]:53]"}
{"msg":"creating ClickHouse client","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:17:36.165945493Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288}
{"msg":"lookup srv","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:17:36.165964013Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"component":"DnsResolver","dns_name":"_clickhouse._tcp.control-plane.oxide.internal"}
{"msg":"failed to create ClickHouse client","v":0,"name":"oximeter","level":40,"time":"2023-10-31T19:17:36.24154484Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"file":"oximeter/collector/src/lib.rs:775","error":"Database(DatabaseVersionMismatch { expected: 3, found: 2 })","retry_after":"368.429321ms"}
{"msg":"creating ClickHouse client","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:17:36.611656623Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288}
{"msg":"lookup srv","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:17:36.611706913Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"component":"DnsResolver","dns_name":"_clickhouse._tcp.control-plane.oxide.internal"}
{"msg":"failed to create ClickHouse client","v":0,"name":"oximeter","level":40,"time":"2023-10-31T19:17:36.65835238Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"file":"oximeter/collector/src/lib.rs:775","error":"Database(DatabaseVersionMismatch { expected: 3, found: 2 })","retry_after":"657.142555ms"}
{"msg":"creating ClickHouse client","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:17:37.316457226Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288}
The DB is still at version 2, but the new
Upgrading the schema
Next I tested the actual schema upgrade tool. In the
root@oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606:~# /opt/oxide/oximeter/bin/clickhouse-schema-updater --host [fd00:1122:3344:101::e]:8123 --schema-directory /opt/oxide/oximeter/schema ls
Latest version: 2
Available versions:
2 (reported by database)
3 (expected by oximeter)
Then I applied the update:
root@oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606:~# /opt/oxide/oximeter/bin/clickhouse-schema-updater --host [fd00:1122:3344:101::e]:8123 --schema-directory /opt/oxide/oximeter/schema --log-level debug up 3
Oct 31 19:20:41.189 DEBG starting upgrade to desired version 3, id: 06ba6496-b954-4670-93e4-775bbf85f553, component: clickhouse-client, unit: clickhouse_schema_updater
Oct 31 19:20:41.189 DEBG reading entries from schema dir, dir: /opt/oxide/oximeter/schema/single-node, id: 06ba6496-b954-4670-93e4-775bbf85f553, component: clickhouse-client, unit: clickhouse_schema_updater
Oct 31 19:20:41.189 DEBG skipping non-directory, name: db-wipe.sql, id: 06ba6496-b954-4670-93e4-775bbf85f553, component: clickhouse-client, unit: clickhouse_schema_updater
Oct 31 19:20:41.189 DEBG skipping non-directory, name: db-init.sql, id: 06ba6496-b954-4670-93e4-775bbf85f553, component: clickhouse-client, unit: clickhouse_schema_updater
Oct 31 19:20:41.190 DEBG valid version dir, ver: 3, id: 06ba6496-b954-4670-93e4-775bbf85f553, component: clickhouse-client, unit: clickhouse_schema_updater
Oct 31 19:20:41.190 DEBG valid version dir, ver: 2, id: 06ba6496-b954-4670-93e4-775bbf85f553, component: clickhouse-client, unit: clickhouse_schema_updater
Oct 31 19:20:41.190 DEBG reading SQL files from schema dir, dir: /opt/oxide/oximeter/schema/single-node/3, id: 06ba6496-b954-4670-93e4-775bbf85f553, component: clickhouse-client, unit: clickhouse_schema_updater
Oct 31 19:20:41.195 DEBG apply schema upgrade file, filename: up, path: /opt/oxide/oximeter/schema/single-node/3/up.sql, version: 3, id: 06ba6496-b954-4670-93e4-775bbf85f553, component: clickhouse-client, unit: clickhouse_schema_updater
Oct 31 19:20:41.211 DEBG successfully applied schema upgrade file, name: up, path: /opt/oxide/oximeter/schema/single-node/3/up.sql, version: 3, id: 06ba6496-b954-4670-93e4-775bbf85f553, component: clickhouse-client, unit: clickhouse_schema_updater
Upgrade to oximeter database version 3 complete
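The debug output above suggests that the updater scans the schema directory, skips plain files such as db-init.sql and db-wipe.sql, and treats numerically named directories as versions, each holding an up.sql file. A minimal Rust sketch of that scan, using only the standard library, might look like the following. The function names `list_schema_versions` and `upgrade_file` are hypothetical and not taken from this PR; only the `<dir>/<version>/up.sql` layout comes from the logs.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Return the schema versions found directly under `dir`, sorted ascending.
///
/// An entry counts as a version only if it is a directory whose name parses
/// as an unsigned integer; everything else is skipped.
fn list_schema_versions(dir: &Path) -> io::Result<Vec<u64>> {
    let mut versions = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // Skip plain files such as db-init.sql and db-wipe.sql.
        if !entry.file_type()?.is_dir() {
            continue;
        }
        // Skip directories whose names are not version numbers.
        if let Some(version) =
            entry.file_name().to_str().and_then(|name| name.parse::<u64>().ok())
        {
            versions.push(version);
        }
    }
    versions.sort_unstable();
    Ok(versions)
}

/// Path of the upgrade file for `version`, following the layout in the logs.
fn upgrade_file(dir: &Path, version: u64) -> PathBuf {
    dir.join(version.to_string()).join("up.sql")
}
```

Sorting the result keeps the order deterministic, since `read_dir` makes no ordering guarantee (the log above happens to list version 3 before version 2).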
root@oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606:~# /opt/oxide/oximeter/bin/clickhouse-schema-updater --host [fd00:1122:3344:101::e]:8123 --schema-directory /opt/oxide/oximeter/schema ls
Latest version: 3
Available versions:
2
3 (reported by database) (expected by oximeter)
And over in the
{"msg":"failed to create ClickHouse client","v":0,"name":"oximeter","level":40,"time":"2023-10-31T19:18:59.682117682Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"file":"oximeter/collector/src/lib.rs:775","error":"Database(DatabaseVersionMismatch { expected: 3, found: 2 })","retry_after":"72.772067321s"}
{"msg":"creating ClickHouse client","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:20:12.456604915Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288}
{"msg":"lookup srv","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:20:12.456668875Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"component":"DnsResolver","dns_name":"_clickhouse._tcp.control-plane.oxide.internal"}
{"msg":"failed to create ClickHouse client","v":0,"name":"oximeter","level":40,"time":"2023-10-31T19:20:12.502534124Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"file":"oximeter/collector/src/lib.rs:775","error":"Database(DatabaseVersionMismatch { expected: 3, found: 2 })","retry_after":"214.749169268s"}
{"msg":"creating ClickHouse client","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:47.256377372Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288}
{"msg":"lookup srv","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:47.256941857Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"component":"DnsResolver","dns_name":"_clickhouse._tcp.control-plane.oxide.internal"}
{"msg":"registered endpoint","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:47.293174086Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"local_addr":"[fd00:1122:3344:101::d]:12223","component":"dropshot","path":"/info","method":"GET"}
{"msg":"registered endpoint","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:47.293202856Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"local_addr":"[fd00:1122:3344:101::d]:12223","component":"dropshot","path":"/producers","method":"GET"}
{"msg":"registered endpoint","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:47.293214806Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"local_addr":"[fd00:1122:3344:101::d]:12223","component":"dropshot","path":"/producers","method":"POST"}
{"msg":"registered endpoint","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:47.293224947Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"local_addr":"[fd00:1122:3344:101::d]:12223","component":"dropshot","path":"/producers/{producer_id}","method":"DELETE"}
{"msg":"listening","v":0,"name":"oximeter","level":30,"time":"2023-10-31T19:23:47.293234887Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"local_addr":"[fd00:1122:3344:101::d]:12223","component":"dropshot","file":"/home/bnaecker/.cargo/git/checkouts/dropshot-a4a923d29dccc492/fa728d0/dropshot/src/server.rs:195"}
{"msg":"successfully registered DTrace USDT probes","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:47.293243447Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"local_addr":"[fd00:1122:3344:101::d]:12223","component":"dropshot"}
{"msg":"contacting nexus","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:47.318548253Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288}
{"msg":"lookup_ipv6 srv","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:47.318566804Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"component":"DnsResolver","dns_name":"_nexus._tcp.control-plane.oxide.internal"}
{"msg":"accepted connection","v":0,"name":"oximeter","level":30,"time":"2023-10-31T19:23:47.366801445Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"local_addr":"[fd00:1122:3344:101::d]:12223","component":"dropshot","file":"/home/bnaecker/.cargo/git/checkouts/dropshot-a4a923d29dccc492/fa728d0/dropshot/src/server.rs:769","remote_addr":"[fd00:1122:3344:101::b]:49263"}
{"msg":"registered new metric producer","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:47.372961992Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","address":"[fd00:1122:3344:101::2]:12224","producer_id":"02582ccd-7613-4ca5-81e7-f88b1cc3b76b"}
{"msg":"request completed","v":0,"name":"oximeter","level":30,"time":"2023-10-31T19:23:47.373039013Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"uri":"/producers","method":"POST","req_id":"7430e7a6-dd98-4877-8934-41126ec74229","remote_addr":"[fd00:1122:3344:101::b]:49263","local_addr":"[fd00:1122:3344:101::d]:12223","component":"dropshot","file":"/home/bnaecker/.cargo/git/checkouts/dropshot-a4a923d29dccc492/fa728d0/dropshot/src/server.rs:853","latency_us":2853,"response_code":"204"}
{"msg":"registered new metric producer","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:47.373260895Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","address":"[fd00:1122:3344:101::c]:12221","producer_id":"52028f5e-8f19-461a-8624-1b1450f26654"}
{"msg":"request completed","v":0,"name":"oximeter","level":30,"time":"2023-10-31T19:23:47.373290286Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"uri":"/producers","method":"POST","req_id":"6f8d78cb-1d49-4ef8-90a3-1261ee8af4f2","remote_addr":"[fd00:1122:3344:101::b]:49263","local_addr":"[fd00:1122:3344:101::d]:12223","component":"dropshot","file":"/home/bnaecker/.cargo/git/checkouts/dropshot-a4a923d29dccc492/fa728d0/dropshot/src/server.rs:853","latency_us":61,"response_code":"204"}
{"msg":"registered new metric producer","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:47.373484947Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","address":"[fd00:1122:3344:101::a]:12221","producer_id":"a5869a7a-0e53-485e-9834-71f1b0223c07"}
{"msg":"request completed","v":0,"name":"oximeter","level":30,"time":"2023-10-31T19:23:47.373509848Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"uri":"/producers","method":"POST","req_id":"c1ce20e1-11ec-4541-9b1f-37229001213d","remote_addr":"[fd00:1122:3344:101::b]:49263","local_addr":"[fd00:1122:3344:101::d]:12223","component":"dropshot","file":"/home/bnaecker/.cargo/git/checkouts/dropshot-a4a923d29dccc492/fa728d0/dropshot/src/server.rs:853","latency_us":67,"response_code":"204"}
{"msg":"registered new metric producer","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:47.37376873Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","address":"[fd00:1122:3344:101::b]:12221","producer_id":"d49aee67-71d0-467d-bbcd-037c2c3d8219"}
{"msg":"request completed","v":0,"name":"oximeter","level":30,"time":"2023-10-31T19:23:47.37381235Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"uri":"/producers","method":"POST","req_id":"56be562d-514f-46ce-8e8a-7c21b702e838","remote_addr":"[fd00:1122:3344:101::b]:49263","local_addr":"[fd00:1122:3344:101::d]:12223","component":"dropshot","file":"/home/bnaecker/.cargo/git/checkouts/dropshot-a4a923d29dccc492/fa728d0/dropshot/src/server.rs:853","latency_us":131,"response_code":"204"}
{"msg":"oximeter registered with nexus","v":0,"name":"oximeter","level":30,"time":"2023-10-31T19:23:47.374598988Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"file":"oximeter/collector/src/lib.rs:847","id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606"}
{"msg":"starting oximeter collection task","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:47.450307176Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"producer_id":"02582ccd-7613-4ca5-81e7-f88b1cc3b76b","component":"collection-task","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","interval":"10s"}
{"msg":"starting oximeter collection task","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:47.451321376Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"producer_id":"a5869a7a-0e53-485e-9834-71f1b0223c07","component":"collection-task","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","interval":"10s"}
{"msg":"starting oximeter collection task","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:47.455914459Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"producer_id":"d49aee67-71d0-467d-bbcd-037c2c3d8219","component":"collection-task","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","interval":"10s"}
{"msg":"starting oximeter collection task","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:47.456929018Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"producer_id":"52028f5e-8f19-461a-8624-1b1450f26654","component":"collection-task","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","interval":"10s"}
{"msg":"collecting from producer","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:57.450271514Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"producer_id":"02582ccd-7613-4ca5-81e7-f88b1cc3b76b","component":"collection-task","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent"}
{"msg":"collecting from producer","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:57.450543787Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"producer_id":"a5869a7a-0e53-485e-9834-71f1b0223c07","component":"collection-task","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent"}
{"msg":"collected 1 total results","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:57.453288512Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"producer_id":"02582ccd-7613-4ca5-81e7-f88b1cc3b76b","component":"collection-task","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent"}
{"msg":"collecting from producer","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:57.456045988Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"producer_id":"d49aee67-71d0-467d-bbcd-037c2c3d8219","component":"collection-task","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent"}
{"msg":"collected 3 total results","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:57.456101749Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"producer_id":"a5869a7a-0e53-485e-9834-71f1b0223c07","component":"collection-task","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent"}
{"msg":"collecting from producer","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:57.456612023Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"producer_id":"52028f5e-8f19-461a-8624-1b1450f26654","component":"collection-task","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent"}
{"msg":"collected 3 total results","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:57.457072068Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"producer_id":"d49aee67-71d0-467d-bbcd-037c2c3d8219","component":"collection-task","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent"}
{"msg":"collected 3 total results","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:23:57.457468241Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"producer_id":"52028f5e-8f19-461a-8624-1b1450f26654","component":"collection-task","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent"}
{"msg":"inserting 131 samples into database","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:24:02.287703157Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"component":"results-sink","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent"}
{"msg":"unrolling 131 total samples","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:24:02.287768058Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"id":"ab78e276-a1ea-4002-bfaf-9674f0f00e14","component":"clickhouse-client","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent"}
{"msg":"retrieving timeseries schema from database","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:24:02.302708728Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"id":"ab78e276-a1ea-4002-bfaf-9674f0f00e14","component":"clickhouse-client","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent"}
{"msg":"inserted rows into table","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:24:02.331015343Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"id":"ab78e276-a1ea-4002-bfaf-9674f0f00e14","component":"clickhouse-client","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","table_name":"oximeter.fields_i64","n_rows":40}
{"msg":"inserted rows into table","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:24:02.333399395Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"id":"ab78e276-a1ea-4002-bfaf-9674f0f00e14","component":"clickhouse-client","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","table_name":"oximeter.fields_string","n_rows":198}
{"msg":"inserted rows into table","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:24:02.335948469Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"id":"ab78e276-a1ea-4002-bfaf-9674f0f00e14","component":"clickhouse-client","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","table_name":"oximeter.fields_uuid","n_rows":325}
{"msg":"inserted rows into table","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:24:02.337388872Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"id":"ab78e276-a1ea-4002-bfaf-9674f0f00e14","component":"clickhouse-client","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","table_name":"oximeter.measurements_bool","n_rows":6}
{"msg":"inserted rows into table","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:24:02.338724085Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"id":"ab78e276-a1ea-4002-bfaf-9674f0f00e14","component":"clickhouse-client","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","table_name":"oximeter.measurements_cumulativei64","n_rows":70}
{"msg":"inserted rows into table","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:24:02.340514711Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"id":"ab78e276-a1ea-4002-bfaf-9674f0f00e14","component":"clickhouse-client","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","table_name":"oximeter.measurements_histogramf64","n_rows":34}
{"msg":"inserted rows into table","v":0,"name":"oximeter","level":20,"time":"2023-10-31T19:24:02.341768803Z","hostname":"oxz_oximeter_f3a5ade4-a165-4ecd-a365-e57b3cdf3606","pid":22288,"id":"ab78e276-a1ea-4002-bfaf-9674f0f00e14","component":"clickhouse-client","collector_id":"f3a5ade4-a165-4ecd-a365-e57b3cdf3606","component":"oximeter-agent","table_name":"oximeter.measurements_i64","n_rows":21}
There are unit tests to verify this, but we can indeed see that the schema has been changed to fix #4369:
oxz_clickhouse_de7eb711-5b35-4b64-894c-7d3a4764c189 :) describe table timeseries_schema;
DESCRIBE TABLE timeseries_schema
Query id: cad647e1-4831-49a8-b63b-c99b15145905
┌─name────────────┬─type───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ timeseries_name │ String │ │ │ │ │ │
│ fields.name │ Array(String) │ │ │ │ │ │
│ fields.type │ Array(Enum8('Bool' = 1, 'I64' = 2, 'IpAddr' = 3, 'String' = 4, 'Uuid' = 6, 'I8' = 7, 'U8' = 8, 'I16' = 9, 'U16' = 10, 'I32' = 11, 'U32' = 12, 'U64' = 13)) │ │ │ │ │ │
│ fields.source │ Array(Enum8('Target' = 1, 'Metric' = 2)) │ │ │ │ │ │
│ datum_type │ Enum8('Bool' = 1, 'I64' = 2, 'F64' = 3, 'String' = 4, 'Bytes' = 5, 'CumulativeI64' = 6, 'CumulativeF64' = 7, 'HistogramI64' = 8, 'HistogramF64' = 9, 'I8' = 10, 'U8' = 11, 'I16' = 12, 'U16' = 13, 'I32' = 14, 'U32' = 15, 'U64' = 16, 'F32' = 17, 'CumulativeU64' = 18, 'CumulativeF32' = 19, 'HistogramI8' = 20, 'HistogramU8' = 21, 'HistogramI16' = 22, 'HistogramU16' = 23, 'HistogramI32' = 24, 'HistogramU32' = 25, 'HistogramU64' = 26, 'HistogramF32' = 27) │ │ │ │ │ │
│ created │ DateTime64(9, 'UTC') │ │ │ │ │ │
└─────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
6 rows in set. Elapsed: 0.001 sec.
oxz_clickhouse_de7eb711-5b35-4b64-894c-7d3a4764c189 :)
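The sequence in the oximeter logs above (repeated `DatabaseVersionMismatch { expected: 3, found: 2 }` errors with growing retry delays, then a clean start once the database reports version 3) can be sketched roughly as below. This is an illustration under assumptions, not the code in this PR: the `DbError` enum, `check_version`, and `wait_for_version` are hypothetical names, and the doubling delay is a guess, since the logs only show that the retry interval grows.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error type mirroring the variant printed in the logs.
#[derive(Debug, PartialEq, Eq)]
enum DbError {
    DatabaseVersionMismatch { expected: u64, found: u64 },
}

/// Succeed only when the database reports exactly the expected version.
fn check_version(expected: u64, found: u64) -> Result<(), DbError> {
    if found == expected {
        Ok(())
    } else {
        Err(DbError::DatabaseVersionMismatch { expected, found })
    }
}

/// Poll `read_version` until it matches `expected`.
///
/// Returns the number of attempts made, or the last mismatch once
/// `max_attempts` is reached. At least one attempt is always made.
/// The delay doubles after every failure (an assumed policy).
fn wait_for_version(
    expected: u64,
    mut read_version: impl FnMut() -> u64,
    mut delay: Duration,
    max_attempts: u32,
) -> Result<u32, DbError> {
    let mut attempt = 1;
    loop {
        match check_version(expected, read_version()) {
            Ok(()) => return Ok(attempt),
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

Refusing to start on a mismatch, rather than upgrading in place, is what lets the separate updater tool apply the change offline while the collector simply waits.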
This is great! Really looking forward to having this feature.
Has this been tested manually on a replicated setup as well?
In the past we talked about generating the SQL files dynamically instead of having multiple hardcoded files, which will also aid in testing (#3982). With this approach, we seem to be relying more on hardcoded SQL files. How do you see us moving forward with both schema updates and dynamic SQL?
The replicated tests in this PR are annoyingly flaky. I'm betting there is some port conflict with other tests, since we still hardcode all port numbers in the replicated cluster setup code. I'm not able to reproduce the failure locally though, and can't really verify that since we do not currently store the tempfiles the ClickHouse processes write their logs to on failure. I'll try to repro or insert some code in the tests to dump those files to the test log we do save, but I think the real way to resolve this is to dynamically create the cluster with any free port numbers.
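As a rough illustration of "any free port numbers": one common approach is to bind to port 0 and let the OS choose. The sketch below uses only the Rust standard library; `allocate_free_ports` is a hypothetical helper, not something in this PR, and it says nothing about how the ports would then be written into each node's ClickHouse configuration.

```rust
use std::io;
use std::net::TcpListener;

/// Ask the OS for `n` distinct free TCP ports on localhost.
///
/// All listeners are held open until every port has been chosen, so the OS
/// cannot hand out the same port twice within one call.
fn allocate_free_ports(n: usize) -> io::Result<Vec<u16>> {
    let mut listeners = Vec::with_capacity(n);
    for _ in 0..n {
        // Port 0 asks the OS to pick any currently unused port.
        listeners.push(TcpListener::bind("127.0.0.1:0")?);
    }
    // Collect the assigned port numbers; the listeners are dropped (and the
    // ports released) when this function returns.
    listeners
        .iter()
        .map(|listener| listener.local_addr().map(|addr| addr.port()))
        .collect()
}
```

There is an unavoidable race here: the ports are released before the database process binds them, so another process could grab one in between. That window is usually small, but it means this reduces conflicts rather than eliminating them.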
I think I know what's going on here: the serial_test crate is not compatible with nextest. See #4149 (comment). My best guess is that some of your ports are clashing.
I'm not sure it's a good idea to have some of the tests running against a single cluster and other tests spinning up their own clusters; we should settle on one format or the other.
I'm concerned that by changing the format and spinning up so many clusters serially, the testing times will go through the roof.
That said, if the testing times aren't affected, then sure why not. But then, we should change all tests to avoid inconsistency.
Allocating ports dynamically sounds good, but heads up that there are many complications with this approach. Configuration files for each node are not all the same, there are many ports to allocate, and I had to reallocate some ports due to the fact that we are running all nodes in a single server.
Thanks @karencfv, that's a helpful link! I'll see if I can get test groups working for this PR. I would really like to move everything over to running against their own cluster, the way we do for single-node tests. Looking at the tests themselves, much of the time is spent waiting for the DB to start, find the other cluster members, and possibly distribute the SQL. I don't know for sure, but I don't see an obvious reason why that can't all be parallelized.
Yeah, I'm sure it will be tricky and frustrating to get this working. That said, there is definitely support for dynamic reconfiguration of a cluster, so I think it's technically possible to have each node learn of the IPs and ports of the others. |
Can we move this work to a separate PR? This change in testing format feels out of scope for offline schema updates. What do you think? |
I'm not planning on doing that all now, no, only the tests I have here. |
Thanks for all the work with the tests @bnaecker ! I know testing replicated mode can be pretty frustrating.
Left a few comments and just want to bump a question I had in one of my previous comments:
In the past we talked about generating the SQL files dynamically instead of having multiple hardcoded files, which will also aid in testing (#3982). With this approach, we seem to be relying more on hardcoded SQL files. How do you see us moving forward with both schema updates and dynamic SQL?
async fn test_read_schema_upgrade_sql_files() {
    let logctx = test_setup_log("test_read_schema_upgrade_sql_files");
    let log = &logctx.log;
    const REPLICATED: bool = false;
Let's test for replicated as well
This test doesn't actually require a database, it's just using this to find the right schema subdirectory.
Yes, we want to make sure it also finds the replicated schema subdirectory as well.
I've not thought too much about it, but it doesn't seem to preclude generating the SQL. For example, one could imagine always generating the last version of the SQL from a build.rs script that uses the current supported measurement and field types. |
Do you see us also keeping the SQL upgrade files in the directories as well? |
Probably? I don't know how one would generate upgrade SQL statements programmatically, or if it's worth trying to do so. What I'm trying to prevent is the issues we've seen like missing tables; missing enum variants; distributed tables pointing at the wrong local table; etc. Those seem to apply to the last SQL more than any upgrade statements. |
- Some cleanup around issuing multiple SQL statements from a file
- Create directory structure for storing schema updates modeled after CRDB up.sql files, but using integer versions, and move all existing SQL into version 2
- Add version 3, which fixes #4369, but does not apply it yet
- Add methods in the client for listing, reading, and applying one or more updates to the oximeter database from the upgrade files
- Add tests for upgrade application
- Add `clickhouse-schema-updater` binary for running them on demand
- Modify `oximeter-collector` to _not_ wipe / reinit the DB on startup if the version has changed, but instead wait for the version to be equal to what it is compiled against. This relies on updates from the developer being applied before `oximeter` will continue.
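The "wait for the version to be equal to what it is compiled against" behavior can be pictured as a simple polling loop. This is a synchronous sketch only, not the code in this PR: `wait_for_version` and the `read_version` closure are hypothetical stand-ins for however `oximeter-collector` actually queries the `version` table.

```rust
use std::{thread, time::Duration};

/// Poll `read_version` until it reports `expected`, or give up after
/// `max_attempts` tries. Returns `true` if the expected version was seen.
///
/// Hypothetical helper: `read_version` returns `None` when the database is
/// unreachable or has no version recorded yet.
fn wait_for_version(
    expected: u64,
    max_attempts: usize,
    interval: Duration,
    mut read_version: impl FnMut() -> Option<u64>,
) -> bool {
    for attempt in 0..max_attempts {
        if read_version() == Some(expected) {
            return true;
        }
        // Sleep between attempts, but not after the final one.
        if attempt + 1 < max_attempts {
            thread::sleep(interval);
        }
    }
    false
}
```

The point of the design is that the collector never modifies the schema itself; it only observes the version and proceeds once an out-of-band update has been applied.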
- Schema upgrade tests for replicated tables
- Add missing tables to replicated SQL
- Ensure sequential / contiguous version numbers
- Bump keeper startup timeout
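As an illustration of the "sequential / contiguous version numbers" check, the sketch below validates a list of upgrade directory names. It assumes each directory is named by a bare integer version; the function name `contiguous_versions` and that naming layout are assumptions made for the example, not taken from the PR.

```rust
/// Parse directory names as integer versions and verify that, once sorted,
/// they form a run with no gaps and no duplicates.
///
/// Hypothetical helper for illustration. Returns the sorted versions on
/// success, or a description of the first problem found.
fn contiguous_versions(names: &[&str]) -> Result<Vec<u64>, String> {
    let mut versions = Vec::with_capacity(names.len());
    for name in names {
        let version: u64 = name
            .parse()
            .map_err(|_| format!("not an integer version: {:?}", name))?;
        versions.push(version);
    }
    versions.sort_unstable();
    // Each adjacent pair must differ by exactly one.
    for pair in versions.windows(2) {
        if pair[1] != pair[0] + 1 {
            return Err(format!(
                "gap or duplicate between versions {} and {}",
                pair[0], pair[1]
            ));
        }
    }
    Ok(versions)
}
```

The check deliberately does not fix the starting version, since the existing SQL was moved into version 2 rather than version 1.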
6b3d866 to 725bc99
Gentle ping on this one @smklein, I'd love your feedback before merging! |
Apologies for the delay, looks good to me!
- Static schema validation regex
- Simplify update tool instructions
- Disallow more data-modifying statements during schema updates
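A rough sketch of what rejecting data-modifying statements in an upgrade file could look like follows. The commit mentions a validation regex; this example substitutes a plain keyword check using only the standard library, and both the function name `find_data_modifying` and the list of rejected statements are assumptions rather than the PR's actual rules.

```rust
/// Return the first statement in `sql` that looks data-modifying, if any.
///
/// Hypothetical check for illustration. Statements are split naively on ';',
/// so a semicolon inside a string literal would be mis-split.
fn find_data_modifying(sql: &str) -> Option<String> {
    for stmt in sql.split(';') {
        // Uppercase the tokens so the check is case-insensitive.
        let tokens: Vec<String> = stmt
            .split_whitespace()
            .map(|token| token.to_ascii_uppercase())
            .collect();
        let first = match tokens.first() {
            Some(first) => first,
            None => continue, // empty statement, e.g. after a trailing ';'
        };
        let modifies = match first.as_str() {
            "INSERT" | "DELETE" | "TRUNCATE" => true,
            // ClickHouse mutations are written as ALTER TABLE ... DELETE/UPDATE.
            "ALTER" => tokens.iter().any(|t| t == "DELETE" || t == "UPDATE"),
            _ => false,
        };
        if modifies {
            return Some(stmt.trim().to_string());
        }
    }
    None
}
```

The check is conservative: an `ALTER` statement that merely mentions a token spelled `update` would also be rejected, which seems an acceptable trade-off for schema-only upgrade files.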