Merge pull request #14 from DanRoscigno/update2
update
Showing 17 changed files with 963 additions and 372 deletions.
32 changes: 32 additions & 0 deletions
versioned_docs/version-3.1/assets/commonMarkdown/sharedDataCNconf.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
|
||
**Before starting CNs**, add the following configuration items in the CN configuration file **cn.conf**: | ||
|
||
```Properties | ||
starlet_port = <starlet_port> | ||
storage_root_path = <storage_root_path> | ||
``` | ||
|
||
#### starlet_port | ||
|
||
The CN heartbeat service port for the StarRocks shared-data cluster. Default value: `9070`. | ||
|
||
#### storage_root_path | ||
|
||
The storage volume directories used for the local data cache, along with the medium type of each. Separate multiple volumes with semicolons (`;`). If the storage medium is SSD, append `,medium:ssd` to the directory. If the storage medium is HDD, append `,medium:hdd` to the directory. Example: `/data1,medium:hdd;/data2,medium:ssd`. | ||
|
||
The default value for `storage_root_path` is `${STARROCKS_HOME}/storage`. | ||
|
||
The local cache is effective when queries are frequent and the queried data is recent, but in some cases you may want to turn it off completely: | ||
|
||
- In a Kubernetes environment where the number of CN pods scales up and down on demand, the pods may not have storage volumes attached. | ||
- When the queried data is in a data lake in remote storage and most of it is archive (old) data, infrequent queries give the data cache a low hit ratio, and the benefit may not justify maintaining the cache. | ||
|
||
To turn off the data cache, set `storage_root_path` to an empty value: | ||
|
||
```Properties | ||
storage_root_path = | ||
``` | ||
|
||
> **NOTE** | ||
> | ||
> The data is cached under the directory **`<storage_root_path>/starlet_cache`**. |
20 changes: 20 additions & 0 deletions
versioned_docs/version-3.1/assets/commonMarkdown/sharedDataIntro.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
This topic describes how to deploy and use a shared-data StarRocks cluster. This feature is supported from v3.0 for S3-compatible storage and from v3.1 for Azure Blob Storage. | ||
|
||
> **NOTE** | ||
> | ||
> StarRocks version 3.1 brings some changes to the shared-data deployment and configuration. Please use this document if you are running version 3.1 or higher. | ||
> | ||
> If you are running version 3.0, use the [3.0 documentation](https://docs.starrocks.io/en-us/3.0/deployment/deploy_shared_data). | ||
|
||
The shared-data StarRocks cluster is engineered for the cloud on the premise of separating storage from compute. It allows data to be stored in object storage (for example, AWS S3, Google GCS, Azure Blob Storage, and MinIO). You gain not only cheaper storage and better resource isolation but also elastic scalability for your cluster. When the local disk cache is hit, the query performance of a shared-data StarRocks cluster matches that of a shared-nothing StarRocks cluster. | ||
|
||
In version 3.1 and higher, the StarRocks shared-data cluster is made up of Frontend Engines (FEs) and Compute Nodes (CNs). The CNs replace the classic Backend Engines (BEs) in shared-data clusters. | ||
|
||
Compared to the classic shared-nothing StarRocks architecture, separation of storage and compute offers a wide range of benefits. By decoupling these components, StarRocks provides: | ||
|
||
- Inexpensive and seamlessly scalable storage. | ||
- Elastically scalable compute. Because data is not stored in Compute Nodes (CNs), scaling can be done without data migration or shuffling across nodes. | ||
- Local disk cache for hot data to boost query performance. | ||
- Asynchronous data ingestion into object storage, allowing a significant improvement in loading performance. | ||
|
98 changes: 98 additions & 0 deletions
versioned_docs/version-3.1/assets/commonMarkdown/sharedDataUse.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,98 @@ | ||
|
||
For more information on how to create a storage volume for other object storages and set the default storage volume, see [CREATE STORAGE VOLUME](../../sql-reference/sql-statements/Administration/CREATE%20STORAGE%20VOLUME.md) and [SET DEFAULT STORAGE VOLUME](../../sql-reference/sql-statements/Administration/SET%20DEFAULT%20STORAGE%20VOLUME.md). | ||
|
||
### Create a database and a cloud-native table | ||
|
||
After you create a default storage volume, you can then create a database and a cloud-native table using this storage volume. | ||
|
||
Currently, shared-data StarRocks clusters support the following table types: | ||
|
||
- Duplicate Key table | ||
- Aggregate table | ||
- Unique Key table | ||
- Primary Key table (Currently, the primary key persistent index is not supported.) | ||
|
||
The following example creates a database `cloud_db` and a Duplicate Key table `detail_demo`, enables the local disk cache, sets the hot data validity duration to one month, and disables asynchronous data ingestion into object storage: | ||
|
||
```SQL | ||
CREATE DATABASE cloud_db; | ||
USE cloud_db; | ||
CREATE TABLE IF NOT EXISTS detail_demo ( | ||
recruit_date DATE NOT NULL COMMENT "YYYY-MM-DD", | ||
region_num TINYINT COMMENT "range [-128, 127]", | ||
num_plate SMALLINT COMMENT "range [-32768, 32767] ", | ||
tel INT COMMENT "range [-2147483648, 2147483647]", | ||
id BIGINT COMMENT "range [-2^63 + 1 ~ 2^63 - 1]", | ||
password LARGEINT COMMENT "range [-2^127 + 1 ~ 2^127 - 1]", | ||
name CHAR(20) NOT NULL COMMENT "range char(m),m in (1-255) ", | ||
profile VARCHAR(500) NOT NULL COMMENT "upper limit value 65533 bytes", | ||
ispass BOOLEAN COMMENT "true/false") | ||
DUPLICATE KEY(recruit_date, region_num) | ||
DISTRIBUTED BY HASH(recruit_date, region_num) | ||
PROPERTIES ( | ||
"storage_volume" = "def_volume", | ||
"datacache.enable" = "true", | ||
"datacache.partition_duration" = "1 MONTH", | ||
"enable_async_write_back" = "false" | ||
); | ||
``` | ||
|
||
> **NOTE** | ||
> | ||
> The default storage volume is used when you create a database or a cloud-native table in a shared-data StarRocks cluster if no storage volume is specified. | ||
|
||
In addition to the regular table `PROPERTIES`, you need to specify the following `PROPERTIES` when creating a table for a shared-data StarRocks cluster: | ||
|
||
#### datacache.enable | ||
|
||
Whether to enable the local disk cache. | ||
|
||
- `true` (Default) When this property is set to `true`, the data to be loaded is simultaneously written into the object storage and the local disk (as the cache for query acceleration). | ||
- `false` When this property is set to `false`, the data is loaded only into the object storage. | ||
|
||
> **NOTE** | ||
> | ||
> In version 3.0 this property was named `enable_storage_cache`. | ||
> | ||
> To enable the local disk cache, you must specify the directory of the disk in the CN configuration item `storage_root_path`. | ||
#### datacache.partition_duration | ||
|
||
The validity duration of the hot data. When the local disk cache is enabled, all data is loaded into the cache. When the cache is full, StarRocks evicts the least recently used data from the cache. When a query needs to scan evicted data, StarRocks checks whether the data is within the validity duration. If it is, StarRocks loads the data into the cache again. If it is not, StarRocks does not load it into the cache. This property is a string value that can be specified with the following units: `YEAR`, `MONTH`, `DAY`, and `HOUR`, for example, `7 DAY` and `12 HOUR`. If it is not specified, all data is treated as hot data and cached. | ||
|
||
> **NOTE** | ||
> | ||
> In version 3.0 this property was named `storage_cache_ttl`. | ||
> | ||
> This property is available only when `datacache.enable` is set to `true`. | ||
#### enable_async_write_back | ||
|
||
Whether to allow data to be written into object storage asynchronously. Default: `false`. | ||
- `true` When this property is set to `true`, the load task returns success as soon as the data is written into the local disk cache, and the data is written into the object storage asynchronously. This allows better loading performance, but it also risks data reliability under potential system failures. | ||
- `false` (Default) When this property is set to `false`, the load task returns success only after the data is written into both object storage and the local disk cache. This guarantees higher availability but leads to lower loading performance. | ||
|
||
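As a sketch of how these three properties combine, the following statement creates a hypothetical table `events_demo` (the table and its columns are illustrative and do not appear elsewhere in this topic). It keeps seven days of hot data eligible for the local cache and opts into asynchronous write-back. It assumes that the database `cloud_db` exists and that a default storage volume is set:

```SQL
-- Hypothetical table for illustration only.
CREATE TABLE IF NOT EXISTS cloud_db.events_demo (
    event_date DATE NOT NULL,
    user_id    BIGINT
)
DUPLICATE KEY(event_date)
DISTRIBUTED BY HASH(event_date)
PROPERTIES (
    -- Cache loaded data on the CN local disk.
    "datacache.enable" = "true",
    -- Data newer than seven days is reloaded into the cache after eviction.
    "datacache.partition_duration" = "7 DAY",
    -- Faster loads, at the cost of reliability if a failure occurs before
    -- the data reaches object storage.
    "enable_async_write_back" = "true"
);
```

Because no `storage_volume` is given, the table uses the default storage volume.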
### View table information | ||
|
||
You can view the information of tables in a specific database using `SHOW PROC "/dbs/<db_id>"`. See [SHOW PROC](../../sql-reference/sql-statements/Administration/SHOW%20PROC.md) for more information. | ||
|
||
Example: | ||
|
||
```Plain | ||
mysql> SHOW PROC "/dbs/xxxxx"; | ||
+---------+-------------+----------+---------------------+--------------+--------+--------------+--------------------------+--------------+---------------+------------------------------+ | ||
| TableId | TableName | IndexNum | PartitionColumnName | PartitionNum | State | Type | LastConsistencyCheckTime | ReplicaCount | PartitionType | StoragePath | | ||
+---------+-------------+----------+---------------------+--------------+--------+--------------+--------------------------+--------------+---------------+------------------------------+ | ||
| 12003 | detail_demo | 1 | NULL | 1 | NORMAL | CLOUD_NATIVE | NULL | 8 | UNPARTITIONED | s3://xxxxxxxxxxxxxx/1/12003/ | | ||
+---------+-------------+----------+---------------------+--------------+--------+--------------+--------------------------+--------------+---------------+------------------------------+ | ||
``` | ||
|
||
The `Type` of a table in a shared-data StarRocks cluster is `CLOUD_NATIVE`. In the field `StoragePath`, StarRocks returns the object storage directory where the table is stored. | ||
|
||
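`SHOW PROC` takes a database ID rather than a database name. One way to look it up, assuming your account has the privileges required to run `SHOW PROC`, is to list the databases first and read the ID reported for `cloud_db`:

```SQL
-- List all databases together with their IDs.
SHOW PROC '/dbs';

-- Then substitute the ID reported for cloud_db:
-- SHOW PROC '/dbs/<db_id>';
```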
### Load data into a shared-data StarRocks cluster | ||
|
||
Shared-data StarRocks clusters support all loading methods provided by StarRocks. See [Overview of data loading](../../loading/Loading_intro.md) for more information. | ||
|
||
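For a quick smoke test, a single row can be inserted into the `detail_demo` table created above. This is only a sketch with made-up values. For production volumes, one of the bulk loading methods described in the loading overview is the better fit:

```SQL
-- Illustrative values only; the column order follows the CREATE TABLE statement.
INSERT INTO cloud_db.detail_demo VALUES
    ('2022-03-13', 1, 100, 5550100, 1, 123456, 'Alice', 'sample profile', true);
```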
### Query in a shared-data StarRocks cluster | ||
|
||
Tables in a shared-data StarRocks cluster support all types of queries provided by StarRocks. See StarRocks [SELECT](../../sql-reference/sql-statements/data-manipulation/SELECT.md) for more information. |
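
As a minimal sketch, the following query aggregates the `detail_demo` table created above. The date filter is an arbitrary example value:

```SQL
-- Count rows per region for records on or after an example date.
SELECT region_num, COUNT(*) AS recruits
FROM cloud_db.detail_demo
WHERE recruit_date >= '2022-01-01'
GROUP BY region_num
ORDER BY region_num;
```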
13 changes: 13 additions & 0 deletions
versioned_docs/version-3.1/assets/commonMarkdown/sharedDataUseIntro.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
|
||
The usage of shared-data StarRocks clusters is also similar to that of a classic shared-nothing StarRocks cluster, except that the shared-data cluster uses storage volumes and cloud-native tables to store data in object storage. | ||
|
||
### Create default storage volume | ||
|
||
You can use the built-in storage volumes that StarRocks automatically creates, or you can manually create and set the default storage volume. This section describes how to manually create and set the default storage volume. | ||
|
||
> **NOTE** | ||
> | ||
> If your shared-data StarRocks cluster is upgraded from v3.0, you do not need to define a default storage volume because StarRocks created one with the object storage-related properties you specified in the FE configuration file **fe.conf**. You can still create new storage volumes with other object storage resources and set the default storage volume differently. | ||
|
||
To give your shared-data StarRocks cluster permission to store data in your object storage, you must reference a storage volume when you create databases or cloud-native tables. A storage volume consists of the properties and credential information of the remote data storage. If you have deployed a new shared-data StarRocks cluster and prevented StarRocks from creating a built-in storage volume (by setting `enable_load_volume_from_conf` to `false`), you must define a default storage volume before you can create databases and tables in the cluster. | ||
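
The following is a sketch of what defining a default storage volume might look like for AWS S3 with access-key authentication. The volume name `def_volume` matches the table example in this documentation. The bucket path, region, endpoint, and credentials are placeholders, and the property names are assumptions that should be checked against the CREATE STORAGE VOLUME reference for your version:

```SQL
-- Placeholder bucket, region, endpoint, and credentials: replace with your own.
CREATE STORAGE VOLUME def_volume
TYPE = S3
LOCATIONS = ("s3://<bucket_name>/<path>/")
PROPERTIES (
    "enabled" = "true",
    "aws.s3.region" = "<region>",
    "aws.s3.endpoint" = "<endpoint_url>",
    "aws.s3.use_aws_sdk_default_behavior" = "false",
    "aws.s3.use_instance_profile" = "false",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>"
);

-- Make the new volume the default for databases and cloud-native tables.
SET def_volume AS DEFAULT STORAGE VOLUME;
```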
|