feat: adjust WAL purge default configurations (#5107)
* feat: adjust WAL purge default configurations

* fix: config

* feat: change raft engine file_size default to 128MiB
killme2008 authored and evenyag committed Dec 20, 2024
1 parent 85d72a3 commit 3edf231
Showing 6 changed files with 36 additions and 37 deletions.
22 changes: 11 additions & 11 deletions config/config.md
@@ -13,11 +13,11 @@
| Key | Type | Default | Descriptions |
| --- | -----| ------- | ----------- |
| `mode` | String | `standalone` | The running mode of the datanode. It can be `standalone` or `distributed`. |
-| `enable_telemetry` | Bool | `true` | Enable telemetry to collect anonymous usage data. |
| `default_timezone` | String | Unset | The default timezone of the server. |
| `init_regions_in_background` | Bool | `false` | Initialize all regions in the background during the startup.<br/>By default, it provides services after all regions have been initialized. |
| `init_regions_parallelism` | Integer | `16` | Parallelism of initializing regions. |
| `max_concurrent_queries` | Integer | `0` | The maximum current queries allowed to be executed. Zero means unlimited. |
+| `enable_telemetry` | Bool | `true` | Enable telemetry to collect anonymous usage data. Enabled by default. |
| `runtime` | -- | -- | The runtime options. |
| `runtime.global_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
| `runtime.compact_rt_size` | Integer | `4` | The number of threads to execute the runtime for global write operations. |
@@ -61,9 +61,9 @@
| `wal` | -- | -- | The WAL options. |
| `wal.provider` | String | `raft_engine` | The provider of the WAL.<br/>- `raft_engine`: the wal is stored in the local file system by raft-engine.<br/>- `kafka`: it's remote wal that data is stored in Kafka. |
| `wal.dir` | String | Unset | The directory to store the WAL files.<br/>**It's only used when the provider is `raft_engine`**. |
-| `wal.file_size` | String | `256MB` | The size of the WAL segment file.<br/>**It's only used when the provider is `raft_engine`**. |
-| `wal.purge_threshold` | String | `4GB` | The threshold of the WAL size to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
-| `wal.purge_interval` | String | `10m` | The interval to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
+| `wal.file_size` | String | `128MB` | The size of the WAL segment file.<br/>**It's only used when the provider is `raft_engine`**. |
+| `wal.purge_threshold` | String | `1GB` | The threshold of the WAL size to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
+| `wal.purge_interval` | String | `1m` | The interval to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.read_batch_size` | Integer | `128` | The read batch size.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.sync_write` | Bool | `false` | Whether to use sync write.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.enable_log_recycle` | Bool | `true` | Whether to reuse logically truncated log files.<br/>**It's only used when the provider is `raft_engine`**. |
@@ -286,12 +286,12 @@
| `bind_addr` | String | `127.0.0.1:3002` | The bind address of metasrv. |
| `server_addr` | String | `127.0.0.1:3002` | The communication server address for frontend and datanode to connect to metasrv, "127.0.0.1:3002" by default for localhost. |
| `store_addrs` | Array | -- | Store server address default to etcd store. |
+| `store_key_prefix` | String | `""` | If it's not empty, the metasrv will store all data with this key prefix. |
+| `backend` | String | `EtcdStore` | The datastore for meta server. |
| `selector` | String | `round_robin` | Datanode selector type.<br/>- `round_robin` (default value)<br/>- `lease_based`<br/>- `load_based`<br/>For details, please see "https://docs.greptime.com/developer-guide/metasrv/selector". |
| `use_memory_store` | Bool | `false` | Store data in memory. |
-| `enable_telemetry` | Bool | `true` | Whether to enable greptimedb telemetry. |
-| `store_key_prefix` | String | `""` | If it's not empty, the metasrv will store all data with this key prefix. |
| `enable_region_failover` | Bool | `false` | Whether to enable region failover.<br/>This feature is only available on GreptimeDB running on cluster mode and<br/>- Using Remote WAL<br/>- Using shared storage (e.g., s3). |
-| `backend` | String | `EtcdStore` | The datastore for meta server. |
+| `enable_telemetry` | Bool | `true` | Whether to enable greptimedb telemetry. Enabled by default. |
| `runtime` | -- | -- | The runtime options. |
| `runtime.global_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
| `runtime.compact_rt_size` | Integer | `4` | The number of threads to execute the runtime for global write operations. |
@@ -356,14 +356,14 @@
| `node_id` | Integer | Unset | The datanode identifier and should be unique in the cluster. |
| `require_lease_before_startup` | Bool | `false` | Start services after regions have obtained leases.<br/>It will block the datanode start if it can't receive leases in the heartbeat from metasrv. |
| `init_regions_in_background` | Bool | `false` | Initialize all regions in the background during the startup.<br/>By default, it provides services after all regions have been initialized. |
-| `enable_telemetry` | Bool | `true` | Enable telemetry to collect anonymous usage data. |
| `init_regions_parallelism` | Integer | `16` | Parallelism of initializing regions. |
| `max_concurrent_queries` | Integer | `0` | The maximum current queries allowed to be executed. Zero means unlimited. |
| `rpc_addr` | String | Unset | Deprecated, use `grpc.addr` instead. |
| `rpc_hostname` | String | Unset | Deprecated, use `grpc.hostname` instead. |
| `rpc_runtime_size` | Integer | Unset | Deprecated, use `grpc.runtime_size` instead. |
| `rpc_max_recv_message_size` | String | Unset | Deprecated, use `grpc.rpc_max_recv_message_size` instead. |
| `rpc_max_send_message_size` | String | Unset | Deprecated, use `grpc.rpc_max_send_message_size` instead. |
+| `enable_telemetry` | Bool | `true` | Enable telemetry to collect anonymous usage data. Enabled by default. |
| `http` | -- | -- | The HTTP server options. |
| `http.addr` | String | `127.0.0.1:4000` | The address to bind the HTTP server. |
| `http.timeout` | String | `30s` | HTTP request timeout. Set to 0 to disable timeout. |
@@ -398,9 +398,9 @@
| `wal` | -- | -- | The WAL options. |
| `wal.provider` | String | `raft_engine` | The provider of the WAL.<br/>- `raft_engine`: the wal is stored in the local file system by raft-engine.<br/>- `kafka`: it's remote wal that data is stored in Kafka. |
| `wal.dir` | String | Unset | The directory to store the WAL files.<br/>**It's only used when the provider is `raft_engine`**. |
-| `wal.file_size` | String | `256MB` | The size of the WAL segment file.<br/>**It's only used when the provider is `raft_engine`**. |
-| `wal.purge_threshold` | String | `4GB` | The threshold of the WAL size to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
-| `wal.purge_interval` | String | `10m` | The interval to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
+| `wal.file_size` | String | `128MB` | The size of the WAL segment file.<br/>**It's only used when the provider is `raft_engine`**. |
+| `wal.purge_threshold` | String | `1GB` | The threshold of the WAL size to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
+| `wal.purge_interval` | String | `1m` | The interval to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.read_batch_size` | Integer | `128` | The read batch size.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.sync_write` | Bool | `false` | Whether to use sync write.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.enable_log_recycle` | Bool | `true` | Whether to reuse logically truncated log files.<br/>**It's only used when the provider is `raft_engine`**. |
11 changes: 5 additions & 6 deletions config/datanode.example.toml
@@ -13,9 +13,6 @@ require_lease_before_startup = false
## By default, it provides services after all regions have been initialized.
init_regions_in_background = false

-## Enable telemetry to collect anonymous usage data.
-enable_telemetry = true
-
## Parallelism of initializing regions.
init_regions_parallelism = 16

@@ -42,6 +39,8 @@ rpc_max_recv_message_size = "512MB"
## @toml2docs:none-default
rpc_max_send_message_size = "512MB"

+## Enable telemetry to collect anonymous usage data. Enabled by default.
+#+ enable_telemetry = true

## The HTTP server options.
[http]
@@ -143,15 +142,15 @@ dir = "/tmp/greptimedb/wal"

## The size of the WAL segment file.
## **It's only used when the provider is `raft_engine`**.
-file_size = "256MB"
+file_size = "128MB"

## The threshold of the WAL size to trigger a flush.
## **It's only used when the provider is `raft_engine`**.
-purge_threshold = "4GB"
+purge_threshold = "1GB"

## The interval to trigger a flush.
## **It's only used when the provider is `raft_engine`**.
-purge_interval = "10m"
+purge_interval = "1m"

## The read batch size.
## **It's only used when the provider is `raft_engine`**.
16 changes: 8 additions & 8 deletions config/metasrv.example.toml
@@ -10,6 +10,12 @@ server_addr = "127.0.0.1:3002"
## Store server address default to etcd store.
store_addrs = ["127.0.0.1:2379"]

+## If it's not empty, the metasrv will store all data with this key prefix.
+store_key_prefix = ""
+
+## The datastore for meta server.
+backend = "EtcdStore"
+
## Datanode selector type.
## - `round_robin` (default value)
## - `lease_based`
@@ -20,20 +26,14 @@ selector = "round_robin"
## Store data in memory.
use_memory_store = false

-## Whether to enable greptimedb telemetry.
-enable_telemetry = true
-
-## If it's not empty, the metasrv will store all data with this key prefix.
-store_key_prefix = ""
-
## Whether to enable region failover.
## This feature is only available on GreptimeDB running on cluster mode and
## - Using Remote WAL
## - Using shared storage (e.g., s3).
enable_region_failover = false

-## The datastore for meta server.
-backend = "EtcdStore"
+## Whether to enable greptimedb telemetry. Enabled by default.
+#+ enable_telemetry = true

## The runtime options.
#+ [runtime]
12 changes: 6 additions & 6 deletions config/standalone.example.toml
@@ -1,9 +1,6 @@
## The running mode of the datanode. It can be `standalone` or `distributed`.
mode = "standalone"

-## Enable telemetry to collect anonymous usage data.
-enable_telemetry = true
-
## The default timezone of the server.
## @toml2docs:none-default
default_timezone = "UTC"
@@ -18,6 +15,9 @@ init_regions_parallelism = 16
## The maximum current queries allowed to be executed. Zero means unlimited.
max_concurrent_queries = 0

+## Enable telemetry to collect anonymous usage data. Enabled by default.
+#+ enable_telemetry = true
+
## The runtime options.
#+ [runtime]
## The number of threads to execute the runtime for global read operations.
@@ -147,15 +147,15 @@ dir = "/tmp/greptimedb/wal"

## The size of the WAL segment file.
## **It's only used when the provider is `raft_engine`**.
-file_size = "256MB"
+file_size = "128MB"

## The threshold of the WAL size to trigger a flush.
## **It's only used when the provider is `raft_engine`**.
-purge_threshold = "4GB"
+purge_threshold = "1GB"

## The interval to trigger a flush.
## **It's only used when the provider is `raft_engine`**.
-purge_interval = "10m"
+purge_interval = "1m"

## The read batch size.
## **It's only used when the provider is `raft_engine`**.
6 changes: 3 additions & 3 deletions src/common/wal/src/config/raft_engine.rs
@@ -49,9 +49,9 @@ impl Default for RaftEngineConfig {
    fn default() -> Self {
        Self {
            dir: None,
-            file_size: ReadableSize::mb(256),
-            purge_threshold: ReadableSize::gb(4),
-            purge_interval: Duration::from_secs(600),
+            file_size: ReadableSize::mb(128),
+            purge_threshold: ReadableSize::gb(1),
+            purge_interval: Duration::from_secs(60),
            read_batch_size: 128,
            sync_write: false,
            enable_log_recycle: true,
6 changes: 3 additions & 3 deletions tests-integration/tests/http.rs
@@ -886,9 +886,9 @@ with_metric_engine = true
[wal]
provider = "raft_engine"
-file_size = "256MiB"
-purge_threshold = "4GiB"
-purge_interval = "10m"
+file_size = "128MiB"
+purge_threshold = "1GiB"
+purge_interval = "1m"
read_batch_size = 128
sync_write = false
enable_log_recycle = true
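
For a rough sense of what the new defaults change, the sketch below compares the old and new values side by side. `WalPurgeDefaults`, its methods, and the `MIB`/`GIB` constants are hypothetical stand-ins written for this illustration, not GreptimeDB code; the real defaults live in `RaftEngineConfig` and use `ReadableSize`. It assumes binary units, which is consistent with the `MiB`/`GiB` strings in the integration test above.

```rust
use std::time::Duration;

const MIB: u64 = 1024 * 1024;
const GIB: u64 = 1024 * MIB;

/// Hypothetical stand-in for `RaftEngineConfig`: only the three fields this
/// commit touches, with sizes as plain byte counts instead of `ReadableSize`.
#[derive(Debug, Clone, PartialEq)]
struct WalPurgeDefaults {
    file_size: u64,
    purge_threshold: u64,
    purge_interval: Duration,
}

impl WalPurgeDefaults {
    /// Defaults before this commit.
    fn previous() -> Self {
        Self {
            file_size: 256 * MIB,
            purge_threshold: 4 * GIB,
            purge_interval: Duration::from_secs(600),
        }
    }

    /// Defaults after this commit.
    fn current() -> Self {
        Self {
            file_size: 128 * MIB,
            purge_threshold: GIB,
            purge_interval: Duration::from_secs(60),
        }
    }

    /// How many full segment files fit under the purge threshold.
    fn segments_per_threshold(&self) -> u64 {
        self.purge_threshold / self.file_size
    }
}

fn main() {
    for (label, cfg) in [
        ("previous", WalPurgeDefaults::previous()),
        ("current", WalPurgeDefaults::current()),
    ] {
        println!(
            "{label}: {} segments per threshold, purge interval {}s",
            cfg.segments_per_threshold(),
            cfg.purge_interval.as_secs()
        );
    }
}
```

Under these assumptions the purge threshold drops from 16 segment files' worth of WAL to 8, and the purge interval drops from 600 seconds to 60.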