feat!: enable read cache and write cache when using remote object stores (#5093)

* feat: enable read cache and write cache when using remote object stores

* feat: make read cache be aware of remote store names

* chore: docs

* chore: apply review suggestions

* chore: trim write cache path

---------

Co-authored-by: Yingwen <[email protected]>
killme2008 and evenyag authored Dec 10, 2024
1 parent ce86ba3 commit 03a2832
Showing 14 changed files with 132 additions and 50 deletions.
16 changes: 8 additions & 8 deletions config/config.md
@@ -93,7 +93,7 @@
| `storage` | -- | -- | The data storage options. |
| `storage.data_home` | String | `/tmp/greptimedb/` | The working home directory. |
| `storage.type` | String | `File` | The storage type used to store the data.<br/>- `File`: the data is stored in the local file system.<br/>- `S3`: the data is stored in the S3 object storage.<br/>- `Gcs`: the data is stored in the Google Cloud Storage.<br/>- `Azblob`: the data is stored in the Azure Blob Storage.<br/>- `Oss`: the data is stored in the Aliyun OSS. |
| `storage.cache_path` | String | Unset | Cache configuration for object storage such as 'S3' etc. It is recommended to configure it when using object storage for better performance.<br/>The local file cache directory. |
| `storage.cache_path` | String | Unset | Read cache configuration for object storage such as 'S3' etc. It is configured by default when using object storage, and it is recommended to keep it for better performance.<br/>A local file directory, defaulting to `{data_home}/object_cache/read`. An empty string disables the read cache. |
| `storage.cache_capacity` | String | Unset | The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set it larger. |
| `storage.bucket` | String | Unset | The S3 bucket name.<br/>**It's only used when the storage type is `S3`, `Oss` and `Gcs`**. |
| `storage.root` | String | Unset | The S3 data will be stored in the specified prefix, for example, `s3://${bucket}/${root}`.<br/>**It's only used when the storage type is `S3`, `Oss` and `Azblob`**. |
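To make the new read-cache behavior concrete, here is a minimal sketch of a `[storage]` section that relies on the defaults described in the `storage.cache_path` row above; the bucket, root, and capacity values are illustrative placeholders, not values taken from this commit:

```toml
[storage]
data_home = "/tmp/greptimedb/"
type = "S3"
bucket = "example-bucket" # placeholder
root = "greptimedb"       # placeholder
## `cache_path` is omitted on purpose: with this change the read cache
## defaults to `{data_home}/object_cache/read` for object storage.
## Set `cache_path = ""` to disable the read cache instead.
cache_capacity = "5GiB"
```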
@@ -131,9 +131,9 @@
| `region_engine.mito.vector_cache_size` | String | Auto | Cache size for vectors and arrow arrays. Set it to 0 to disable the cache.<br/>If not set, it defaults to 1/16 of OS memory with a max limitation of 512MB. |
| `region_engine.mito.page_cache_size` | String | Auto | Cache size for pages of SST row groups. Set it to 0 to disable the cache.<br/>If not set, it defaults to 1/8 of OS memory. |
| `region_engine.mito.selector_result_cache_size` | String | Auto | Cache size for time series selector (e.g. `last_value()`). Set it to 0 to disable the cache.<br/>If not set, it defaults to 1/16 of OS memory with a max limitation of 512MB. |
| `region_engine.mito.enable_experimental_write_cache` | Bool | `false` | Whether to enable the experimental write cache. It is recommended to enable it when using object storage for better performance. |
| `region_engine.mito.experimental_write_cache_path` | String | `""` | File system path for write cache, defaults to `{data_home}/write_cache`. |
| `region_engine.mito.experimental_write_cache_size` | String | `1GiB` | Capacity for write cache. If your disk space is sufficient, it is recommended to set it larger. |
| `region_engine.mito.enable_experimental_write_cache` | Bool | `false` | Whether to enable the experimental write cache. It is enabled by default when using object storage, and it is recommended to keep it enabled for better performance. |
| `region_engine.mito.experimental_write_cache_path` | String | `""` | File system path for write cache, defaults to `{data_home}/object_cache/write`. |
| `region_engine.mito.experimental_write_cache_size` | String | `5GiB` | Capacity for write cache. If your disk space is sufficient, it is recommended to set it larger. |
| `region_engine.mito.experimental_write_cache_ttl` | String | Unset | TTL for write cache. |
| `region_engine.mito.sst_write_buffer_size` | String | `8MB` | Buffer size for SST writing. |
| `region_engine.mito.parallel_scan_channel_size` | Integer | `32` | Capacity of the channel to send data from parallel scan tasks to the main task. |
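The four write-cache rows above combine as in the following sketch; the values are illustrative, and the enable flag is shown commented out because this commit turns it on automatically for remote object stores:

```toml
[region_engine.mito]
## Enabled automatically when the default store is remote object storage,
## so the flag no longer needs to be set by hand.
#+ enable_experimental_write_cache = true
## An empty path falls back to `{data_home}/object_cache/write`.
experimental_write_cache_path = ""
## The default capacity is now 5GiB; raise it if disk space allows.
experimental_write_cache_size = "5GiB"
## Optional TTL, unset by default (the duration syntax here is assumed).
#+ experimental_write_cache_ttl = "8h"
```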
@@ -420,7 +420,7 @@
| `storage` | -- | -- | The data storage options. |
| `storage.data_home` | String | `/tmp/greptimedb/` | The working home directory. |
| `storage.type` | String | `File` | The storage type used to store the data.<br/>- `File`: the data is stored in the local file system.<br/>- `S3`: the data is stored in the S3 object storage.<br/>- `Gcs`: the data is stored in the Google Cloud Storage.<br/>- `Azblob`: the data is stored in the Azure Blob Storage.<br/>- `Oss`: the data is stored in the Aliyun OSS. |
| `storage.cache_path` | String | Unset | Cache configuration for object storage such as 'S3' etc. It is recommended to configure it when using object storage for better performance.<br/>The local file cache directory. |
| `storage.cache_path` | String | Unset | Read cache configuration for object storage such as 'S3' etc. It is configured by default when using object storage, and it is recommended to keep it for better performance.<br/>A local file directory, defaulting to `{data_home}/object_cache/read`. An empty string disables the read cache. |
| `storage.cache_capacity` | String | Unset | The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set it larger. |
| `storage.bucket` | String | Unset | The S3 bucket name.<br/>**It's only used when the storage type is `S3`, `Oss` and `Gcs`**. |
| `storage.root` | String | Unset | The S3 data will be stored in the specified prefix, for example, `s3://${bucket}/${root}`.<br/>**It's only used when the storage type is `S3`, `Oss` and `Azblob`**. |
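Conversely, the read cache location can be pinned instead of derived; a short sketch with placeholder values:

```toml
[storage]
type = "Gcs"
bucket = "example-bucket" # placeholder
## Overrides the derived `{data_home}/object_cache/read` location.
cache_path = "/mnt/fast_ssd/greptimedb/read_cache"
cache_capacity = "20GiB"
```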
@@ -458,9 +458,9 @@
| `region_engine.mito.vector_cache_size` | String | Auto | Cache size for vectors and arrow arrays. Set it to 0 to disable the cache.<br/>If not set, it defaults to 1/16 of OS memory with a max limitation of 512MB. |
| `region_engine.mito.page_cache_size` | String | Auto | Cache size for pages of SST row groups. Set it to 0 to disable the cache.<br/>If not set, it defaults to 1/8 of OS memory. |
| `region_engine.mito.selector_result_cache_size` | String | Auto | Cache size for time series selector (e.g. `last_value()`). Set it to 0 to disable the cache.<br/>If not set, it defaults to 1/16 of OS memory with a max limitation of 512MB. |
| `region_engine.mito.enable_experimental_write_cache` | Bool | `false` | Whether to enable the experimental write cache. It is recommended to enable it when using object storage for better performance. |
| `region_engine.mito.experimental_write_cache_path` | String | `""` | File system path for write cache, defaults to `{data_home}/write_cache`. |
| `region_engine.mito.experimental_write_cache_size` | String | `1GiB` | Capacity for write cache. If your disk space is sufficient, it is recommended to set it larger. |
| `region_engine.mito.enable_experimental_write_cache` | Bool | `false` | Whether to enable the experimental write cache. It is enabled by default when using object storage, and it is recommended to keep it enabled for better performance. |
| `region_engine.mito.experimental_write_cache_path` | String | `""` | File system path for write cache, defaults to `{data_home}/object_cache/write`. |
| `region_engine.mito.experimental_write_cache_size` | String | `5GiB` | Capacity for write cache. If your disk space is sufficient, it is recommended to set it larger. |
| `region_engine.mito.experimental_write_cache_ttl` | String | Unset | TTL for write cache. |
| `region_engine.mito.sst_write_buffer_size` | String | `8MB` | Buffer size for SST writing. |
| `region_engine.mito.parallel_scan_channel_size` | Integer | `32` | Capacity of the channel to send data from parallel scan tasks to the main task. |
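And a sketch of relocating and expiring the write cache; the path and TTL values are placeholders under the documented option names:

```toml
[region_engine.mito]
## Place the write cache on a dedicated disk instead of the default
## `{data_home}/object_cache/write` under the data home.
experimental_write_cache_path = "/mnt/nvme/greptimedb/write_cache"
experimental_write_cache_size = "20GiB"
experimental_write_cache_ttl = "8h" # assumed duration syntax
```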
14 changes: 7 additions & 7 deletions config/datanode.example.toml
@@ -294,14 +294,14 @@ data_home = "/tmp/greptimedb/"
## - `Oss`: the data is stored in the Aliyun OSS.
type = "File"

## Cache configuration for object storage such as 'S3' etc. It is recommended to configure it when using object storage for better performance.
## The local file cache directory.
## Read cache configuration for object storage such as 'S3' etc. It is configured by default when using object storage, and it is recommended to keep it for better performance.
## A local file directory, defaulting to `{data_home}/object_cache/read`. An empty string disables the read cache.
## @toml2docs:none-default
cache_path = "/path/local_cache"
#+ cache_path = ""

## The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set it larger.
## @toml2docs:none-default
cache_capacity = "1GiB"
cache_capacity = "5GiB"

## The S3 bucket name.
## **It's only used when the storage type is `S3`, `Oss` and `Gcs`**.
@@ -476,14 +476,14 @@ auto_flush_interval = "1h"
## @toml2docs:none-default="Auto"
#+ selector_result_cache_size = "512MB"

## Whether to enable the experimental write cache. It is recommended to enable it when using object storage for better performance.
## Whether to enable the experimental write cache. It is enabled by default when using object storage, and it is recommended to keep it enabled for better performance.
enable_experimental_write_cache = false

## File system path for write cache, defaults to `{data_home}/write_cache`.
## File system path for write cache, defaults to `{data_home}/object_cache/write`.
experimental_write_cache_path = ""

## Capacity for write cache. If your disk space is sufficient, it is recommended to set it larger.
experimental_write_cache_size = "1GiB"
experimental_write_cache_size = "5GiB"

## TTL for write cache.
## @toml2docs:none-default
14 changes: 7 additions & 7 deletions config/standalone.example.toml
@@ -332,14 +332,14 @@ data_home = "/tmp/greptimedb/"
## - `Oss`: the data is stored in the Aliyun OSS.
type = "File"

## Cache configuration for object storage such as 'S3' etc. It is recommended to configure it when using object storage for better performance.
## The local file cache directory.
## Read cache configuration for object storage such as 'S3' etc. It is configured by default when using object storage, and it is recommended to keep it for better performance.
## A local file directory, defaulting to `{data_home}/object_cache/read`. An empty string disables the read cache.
## @toml2docs:none-default
cache_path = "/path/local_cache"
#+ cache_path = ""

## The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set it larger.
## @toml2docs:none-default
cache_capacity = "1GiB"
cache_capacity = "5GiB"

## The S3 bucket name.
## **It's only used when the storage type is `S3`, `Oss` and `Gcs`**.
@@ -514,14 +514,14 @@ auto_flush_interval = "1h"
## @toml2docs:none-default="Auto"
#+ selector_result_cache_size = "512MB"

## Whether to enable the experimental write cache. It is recommended to enable it when using object storage for better performance.
## Whether to enable the experimental write cache. It is enabled by default when using object storage, and it is recommended to keep it enabled for better performance.
enable_experimental_write_cache = false

## File system path for write cache, defaults to `{data_home}/write_cache`.
## File system path for write cache, defaults to `{data_home}/object_cache/write`.
experimental_write_cache_path = ""

## Capacity for write cache. If your disk space is sufficient, it is recommended to set it larger.
experimental_write_cache_size = "1GiB"
experimental_write_cache_size = "5GiB"

## TTL for write cache.
## @toml2docs:none-default
28 changes: 27 additions & 1 deletion src/datanode/src/config.rs
@@ -32,7 +32,7 @@ use servers::heartbeat_options::HeartbeatOptions;
use servers::http::HttpOptions;
use servers::Mode;

pub const DEFAULT_OBJECT_STORE_CACHE_SIZE: ReadableSize = ReadableSize::gb(1);
pub const DEFAULT_OBJECT_STORE_CACHE_SIZE: ReadableSize = ReadableSize::gb(5);

/// Default data home in file storage
const DEFAULT_DATA_HOME: &str = "/tmp/greptimedb";
@@ -60,6 +60,11 @@ impl ObjectStoreConfig {
}
}

/// Returns true when it's a remote object storage such as AWS S3, etc.
pub fn is_object_storage(&self) -> bool {
!matches!(self, Self::File(_))
}

/// Returns the object storage configuration name, or the provider name if it's empty.
pub fn config_name(&self) -> &str {
let name = match self {
@@ -91,6 +96,13 @@ pub struct StorageConfig {
pub providers: Vec<ObjectStoreConfig>,
}

impl StorageConfig {
/// Returns true when the default storage config is a remote object storage service such as AWS S3, etc.
pub fn is_object_storage(&self) -> bool {
self.store.is_object_storage()
}
}

impl Default for StorageConfig {
fn default() -> Self {
Self {
@@ -452,6 +464,20 @@ mod tests {
assert_eq!("S3", s3_config.provider_name());
}

#[test]
fn test_is_object_storage() {
let store = ObjectStoreConfig::default();
assert!(!store.is_object_storage());
let s3_config = ObjectStoreConfig::S3(S3Config::default());
assert!(s3_config.is_object_storage());
let oss_config = ObjectStoreConfig::Oss(OssConfig::default());
assert!(oss_config.is_object_storage());
let gcs_config = ObjectStoreConfig::Gcs(GcsConfig::default());
assert!(gcs_config.is_object_storage());
let azblob_config = ObjectStoreConfig::Azblob(AzblobConfig::default());
assert!(azblob_config.is_object_storage());
}

#[test]
fn test_secstr() {
let toml_str = r#"
8 changes: 7 additions & 1 deletion src/datanode/src/datanode.rs
@@ -428,10 +428,16 @@ impl DatanodeBuilder {
async fn build_mito_engine(
opts: &DatanodeOptions,
object_store_manager: ObjectStoreManagerRef,
config: MitoConfig,
mut config: MitoConfig,
schema_metadata_manager: SchemaMetadataManagerRef,
plugins: Plugins,
) -> Result<MitoEngine> {
if opts.storage.is_object_storage() {
// Enable the write cache when the default store is a remote object storage
config.enable_experimental_write_cache = true;
info!("Configured 'enable_experimental_write_cache=true' for mito engine.");
}

let mito_engine = match &opts.wal {
DatanodeWalConfig::RaftEngine(raft_engine_config) => MitoEngine::new(
&opts.storage.data_home,
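Expressed as configuration, the hook above means that pointing the default store at a remote object storage is enough to get the write cache; a sketch with a placeholder bucket:

```toml
[storage]
type = "S3"
bucket = "example-bucket" # placeholder

[region_engine.mito]
## No longer required: build_mito_engine flips this on automatically
## when the default store is remote object storage.
#+ enable_experimental_write_cache = true
```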
1 change: 1 addition & 0 deletions src/datanode/src/lib.rs
@@ -13,6 +13,7 @@
// limitations under the License.

#![feature(assert_matches)]
#![feature(let_chains)]

pub mod alive_keeper;
pub mod config;
63 changes: 45 additions & 18 deletions src/datanode/src/store.rs
@@ -19,21 +19,20 @@ mod fs;
mod gcs;
mod oss;
mod s3;

use std::path::Path;
use std::sync::Arc;
use std::time::Duration;
use std::{env, path};

use common_base::readable_size::ReadableSize;
use common_telemetry::{info, warn};
use object_store::layers::{LruCacheLayer, RetryInterceptor, RetryLayer};
use object_store::services::Fs;
use object_store::util::{join_dir, normalize_dir, with_instrument_layers};
use object_store::{Access, Error, HttpClient, ObjectStore, ObjectStoreBuilder};
use object_store::{Access, Error, HttpClient, ObjectStore, ObjectStoreBuilder, OBJECT_CACHE_DIR};
use snafu::prelude::*;

use crate::config::{HttpClientConfig, ObjectStoreConfig, DEFAULT_OBJECT_STORE_CACHE_SIZE};
use crate::error::{self, Result};
use crate::error::{self, CreateDirSnafu, Result};

pub(crate) async fn new_raw_object_store(
store: &ObjectStoreConfig,
@@ -68,7 +67,7 @@ pub(crate) async fn new_object_store_without_cache(
) -> Result<ObjectStore> {
let object_store = new_raw_object_store(store, data_home).await?;
// Enable retry layer and cache layer for non-fs object storages
let object_store = if !matches!(store, ObjectStoreConfig::File(..)) {
let object_store = if store.is_object_storage() {
// Adds retry layer
with_retry_layers(object_store)
} else {
@@ -85,8 +84,8 @@
) -> Result<ObjectStore> {
let object_store = new_raw_object_store(&store, data_home).await?;
// Enable retry layer and cache layer for non-fs object storages
let object_store = if !matches!(store, ObjectStoreConfig::File(..)) {
let object_store = if let Some(cache_layer) = build_cache_layer(&store).await? {
let object_store = if store.is_object_storage() {
let object_store = if let Some(cache_layer) = build_cache_layer(&store, data_home).await? {
// Adds cache layer
object_store.layer(cache_layer)
} else {
Expand All @@ -105,44 +104,72 @@ pub(crate) async fn new_object_store(

async fn build_cache_layer(
store_config: &ObjectStoreConfig,
data_home: &str,
) -> Result<Option<LruCacheLayer<impl Access>>> {
let (cache_path, cache_capacity) = match store_config {
let (name, mut cache_path, cache_capacity) = match store_config {
ObjectStoreConfig::S3(s3_config) => {
let path = s3_config.cache.cache_path.as_ref();
let path = s3_config.cache.cache_path.clone();
let name = &s3_config.name;
let capacity = s3_config
.cache
.cache_capacity
.unwrap_or(DEFAULT_OBJECT_STORE_CACHE_SIZE);
(path, capacity)
(name, path, capacity)
}
ObjectStoreConfig::Oss(oss_config) => {
let path = oss_config.cache.cache_path.as_ref();
let path = oss_config.cache.cache_path.clone();
let name = &oss_config.name;
let capacity = oss_config
.cache
.cache_capacity
.unwrap_or(DEFAULT_OBJECT_STORE_CACHE_SIZE);
(path, capacity)
(name, path, capacity)
}
ObjectStoreConfig::Azblob(azblob_config) => {
let path = azblob_config.cache.cache_path.as_ref();
let path = azblob_config.cache.cache_path.clone();
let name = &azblob_config.name;
let capacity = azblob_config
.cache
.cache_capacity
.unwrap_or(DEFAULT_OBJECT_STORE_CACHE_SIZE);
(path, capacity)
(name, path, capacity)
}
ObjectStoreConfig::Gcs(gcs_config) => {
let path = gcs_config.cache.cache_path.as_ref();
let path = gcs_config.cache.cache_path.clone();
let name = &gcs_config.name;
let capacity = gcs_config
.cache
.cache_capacity
.unwrap_or(DEFAULT_OBJECT_STORE_CACHE_SIZE);
(path, capacity)
(name, path, capacity)
}
_ => (None, ReadableSize(0)),
_ => unreachable!("Already checked above"),
};

if let Some(path) = cache_path {
// Enable object cache by default
// Set the cache_path to be `${data_home}/object_cache/read/{name}` by default
// if it's not present
if cache_path.is_none() {
let object_cache_path = join_dir(data_home, OBJECT_CACHE_DIR);
let read_cache_path = join_dir(&object_cache_path, "read");
let read_cache_path = join_dir(&read_cache_path, &name.to_lowercase());
tokio::fs::create_dir_all(Path::new(&read_cache_path))
.await
.context(CreateDirSnafu {
dir: &read_cache_path,
})?;

info!(
"The object storage cache path is not set for '{}', using the default path: '{}'",
name, &read_cache_path
);

cache_path = Some(read_cache_path);
}

if let Some(path) = cache_path.as_ref()
&& !path.trim().is_empty()
{
let atomic_temp_dir = join_dir(path, ".tmp/");
clean_temp_dir(&atomic_temp_dir)?;

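Putting the defaults together, the cache layout that `build_cache_layer` derives looks roughly like the sketch below, assuming the default `data_home`, a provider named `S3`, and that `OBJECT_CACHE_DIR` resolves to `object_cache`:

```toml
## Derived cache locations (comments only, nothing to configure here):
## data_home   -> /tmp/greptimedb/
## read cache  -> /tmp/greptimedb/object_cache/read/s3 (per-store, lowercased name)
## write cache -> /tmp/greptimedb/object_cache/write
## atomic tmp  -> {read_cache_path}/.tmp/ (cleaned on startup)
```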