Totally unscientific and mostly unrealistic benchmark that the go-faster/ch project uses to understand performance.
The main goal is to measure the minimal client overhead (CPU, RAM) needed to read data, i.e. data block deserialization and transfer.
Please see Notes for more details about the results.
```sql
SELECT number FROM system.numbers_mt LIMIT 500000000
```

```
500000000 rows in set. Elapsed: 0.503 sec.
Processed 500.07 million rows, 4.00 GB (993.26 million rows/s., 7.95 GB/s.)
```
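For reference, the column-oriented read path this benchmark exercises looks roughly like the sketch below. It is a minimal example assuming the go-faster/ch streaming API (`ch.Dial`, `ch.Query` with a per-block `OnResult` callback), not the exact benchmark code from this repository:

```go
package main

import (
	"context"
	"fmt"

	"github.com/go-faster/ch"
	"github.com/go-faster/ch/proto"
)

func main() {
	ctx := context.Background()
	c, err := ch.Dial(ctx, ch.Options{Address: "127.0.0.1:9000"})
	if err != nil {
		panic(err)
	}
	defer c.Close()

	var (
		rows uint64
		data proto.ColUInt64 // the column arrives as a contiguous []uint64
	)
	if err := c.Do(ctx, ch.Query{
		Body: "SELECT number FROM system.numbers_mt LIMIT 500000000",
		Result: proto.Results{
			{Name: "number", Data: &data},
		},
		// OnResult is called once per received data block,
		// not once per row.
		OnResult: func(ctx context.Context, block proto.Block) error {
			rows += uint64(len(data))
			return nil
		},
	}); err != nil {
		panic(err)
	}
	fmt.Println("rows:", rows)
}
```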
Note: due to the row-oriented design of most libraries, the overhead per single row is significantly higher, so results can be slightly surprising (see the row-oriented sketch after the table below).
Name | Time | RAM | Ratio |
---|---|---|---|
go-faster/ch (Go) | 356ms | 9M | ~1x |
clickhouse-client (C++) | 387ms | 91M | ~1x |
vahid-sohrabloo/chconn (Go) | 416ms | 9M | ~1x |
clickhouse-cpp (C++) | 523ms | 6.9M | 1.47x |
clickhouse_driver (Rust) | 614ms | 9M | 1.72x |
clickhouse-go (Go) | 3.1s | 85M | 8.7x |
uptrace (Go) | 9.3s | 5G | 20x |
clickhouse-jdbc (Java, HTTP) | 10s | 702M | 28x |
loyd/clickhouse.rs (Rust, HTTP) | 10s | 7.2M | 28x |
clickhouse-driver (Python) | 37s | 60M | 106x |
mailru/go-clickhouse (Go, HTTP) | 4m13s | 13M | 729x |
See RESULTS.md and RESULTS.slow.md.
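To illustrate the row-oriented overhead mentioned in the note above, here is a rough sketch of the same query read through Go's `database/sql` interface. The driver import and DSN are assumptions and depend on which clickhouse-go version you benchmark:

```go
package main

import (
	"context"
	"database/sql"
	"log"

	// Assumption: clickhouse-go registered as a database/sql driver;
	// the exact import path and DSN depend on the driver version.
	_ "github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	db, err := sql.Open("clickhouse", "clickhouse://127.0.0.1:9000")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows, err := db.QueryContext(context.Background(),
		"SELECT number FROM system.numbers_mt LIMIT 500000000")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	// One Next and one Scan call per row (500M tiny calls with
	// interface conversions), instead of one decode per data block.
	var n, count uint64
	for rows.Next() {
		if err := rows.Scan(&n); err != nil {
			log.Fatal(err)
		}
		count++
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
	log.Println("rows:", count)
}
```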
`go-faster/ch`, `clickhouse-client` and `vahid-sohrabloo/chconn` are kept at `~1x` because they are mostly equal:

Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
---|---|---|---|---|
go-faster | 598.8 ± 92.2 | 356.9 | 792.8 | 1.07 ± 0.33 |
clickhouse-client | 561.9 ± 149.5 | 387.8 | 1114.2 | 1.00 |
clickhouse-cpp | 574.4 ± 35.9 | 523.3 | 707.4 | 1.02 ± 0.28 |
We are selecting the best results; however, the C++ client has lower dispersion across runs.
I've measured my localhost performance using iperf3, getting 10 GiB/s, which correlates with the top results.
For example, one of the go-faster/ch results is `390ms 500000000 rows 4.0 GB 10 GB/s`.
I've also implemented a mock server in Go that simulates the ClickHouse server to reduce overhead, because currently the main bottleneck in this test is the server itself (and probably localhost).
With the mock server, go-faster/ch was able to achieve `257ms 500000000 rows 4.0 GB 16 GB/s`, which should be the maximum possible burst result, but I'm not 100% sure.
In go-faster/ch micro-benchmarks I'm getting up to 27 GB/s, not accounting for any network overhead (i.e. in-memory).
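Those micro-benchmarks measure deserialization alone. A minimal sketch of that kind of benchmark, assuming the `proto` package's `Buffer`, `Reader` and `ColUInt64` encode/decode helpers (not the actual benchmark from this repository):

```go
package chbench

import (
	"bytes"
	"testing"

	"github.com/go-faster/ch/proto"
)

// BenchmarkDecodeColUInt64 measures column deserialization alone:
// one pre-encoded UInt64 block decoded from an in-memory buffer,
// with no server and no network involved.
func BenchmarkDecodeColUInt64(b *testing.B) {
	const rows = 65_536 // one typical data block

	// Encode the block once.
	src := make(proto.ColUInt64, rows)
	for i := range src {
		src[i] = uint64(i)
	}
	var buf proto.Buffer
	src.EncodeColumn(&buf)

	b.SetBytes(int64(len(buf.Buf)))
	b.ResetTimer()

	var dst proto.ColUInt64
	for i := 0; i < b.N; i++ {
		dst = dst[:0]
		r := proto.NewReader(bytes.NewReader(buf.Buf))
		if err := dst.DecodeColumn(r, rows); err != nil {
			b.Fatal(err)
		}
	}
}
```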