From 91b87a0e86e2ec7ec3261fca2f4c180a260ce32a Mon Sep 17 00:00:00 2001 From: DanRoscigno Date: Fri, 6 Oct 2023 10:32:43 -0400 Subject: [PATCH] update Signed-off-by: DanRoscigno --- .../administration/Configuration.md | 2 +- .../administration/Configuration.md | 2 +- .../assets/commonMarkdown/sharedDataCNconf.md | 32 ++ .../assets/commonMarkdown/sharedDataIntro.md | 20 + .../assets/commonMarkdown/sharedDataUse.md | 98 +++++ .../commonMarkdown/sharedDataUseIntro.md | 13 + .../deployment/deploy_shared_data.md | 364 ------------------ .../deployment/deployment_overview.md | 2 +- .../deployment/shared_data/azure.md | 165 ++++++++ .../version-3.1/deployment/shared_data/gcs.md | 172 +++++++++ .../deployment/shared_data/minio.md | 171 ++++++++ .../version-3.1/deployment/shared_data/s3.md | 284 ++++++++++++++ .../Administration/CREATE STORAGE VOLUME.md | 2 +- .../data-definition/CREATE TABLE.md | 2 +- .../table_design/Data_distribution.md | 2 +- .../table_design/expression_partitioning.md | 2 +- .../table_design/list_partitioning.md | 2 +- 17 files changed, 963 insertions(+), 372 deletions(-) create mode 100644 versioned_docs/version-3.1/assets/commonMarkdown/sharedDataCNconf.md create mode 100644 versioned_docs/version-3.1/assets/commonMarkdown/sharedDataIntro.md create mode 100644 versioned_docs/version-3.1/assets/commonMarkdown/sharedDataUse.md create mode 100644 versioned_docs/version-3.1/assets/commonMarkdown/sharedDataUseIntro.md delete mode 100644 versioned_docs/version-3.1/deployment/deploy_shared_data.md create mode 100644 versioned_docs/version-3.1/deployment/shared_data/azure.md create mode 100644 versioned_docs/version-3.1/deployment/shared_data/gcs.md create mode 100644 versioned_docs/version-3.1/deployment/shared_data/minio.md create mode 100644 versioned_docs/version-3.1/deployment/shared_data/s3.md diff --git a/versioned_docs/version-3.0/administration/Configuration.md b/versioned_docs/version-3.0/administration/Configuration.md index c82e5a3967..81462496af 
100644 --- a/versioned_docs/version-3.0/administration/Configuration.md +++ b/versioned_docs/version-3.0/administration/Configuration.md @@ -1769,7 +1769,7 @@ BE static parameters are as follows. #### user_function_dir -- **Default**: `${STARROCKS_HOME}/lib/udfi` +- **Default**: `${STARROCKS_HOME}/lib/udf` - **Unit**: N/A - **Description**: The directory used to store User-defined Functions (UDFs). diff --git a/versioned_docs/version-3.1/administration/Configuration.md b/versioned_docs/version-3.1/administration/Configuration.md index e7c7b2bf2d..b875208855 100644 --- a/versioned_docs/version-3.1/administration/Configuration.md +++ b/versioned_docs/version-3.1/administration/Configuration.md @@ -1752,7 +1752,7 @@ BE static parameters are as follows. #### user_function_dir -- **Default**: `${STARROCKS_HOME}/lib/udfi` +- **Default**: `${STARROCKS_HOME}/lib/udf` - **Unit**: N/A - **Description**: The directory used to store User-defined Functions (UDFs). diff --git a/versioned_docs/version-3.1/assets/commonMarkdown/sharedDataCNconf.md b/versioned_docs/version-3.1/assets/commonMarkdown/sharedDataCNconf.md new file mode 100644 index 0000000000..9517823db7 --- /dev/null +++ b/versioned_docs/version-3.1/assets/commonMarkdown/sharedDataCNconf.md @@ -0,0 +1,32 @@ + +**Before starting CNs**, add the following configuration items in the CN configuration file **cn.conf**: + +```Properties +starlet_port = +storage_root_path = +``` + +#### starlet_port + +The CN heartbeat service port for the StarRocks shared-data cluster. Default value: `9070`. + +#### storage_root_path + +The storage volume directory that the local cached data depends on and the medium type of the storage. Multiple volumes are separated by semicolon (;). If the storage medium is SSD, add `,medium:ssd` at the end of the directory. If the storage medium is HDD, add `,medium:hdd` at the end of the directory. Example: `/data1,medium:hdd;/data2,medium:ssd`. 
+ +The default value for `storage_root_path` is `${STARROCKS_HOME}/storage`. + +Local cache is effective when queries are frequent and the data being queried is recent, but there are cases where you may wish to turn off the local cache completely. + +- In a Kubernetes environment with CN pods that scale up and down in number on demand, the pods may not have storage volumes attached. +- When the data being queried is in a data lake in remote storage and most of it is archive (old) data. If the queries are infrequent, the data cache will have a low hit ratio and may not be worth having. + +To turn off the data cache, set: + +```Properties +storage_root_path = +``` + +> **NOTE** +> +> The data is cached under the directory **`/starlet_cache`**. diff --git a/versioned_docs/version-3.1/assets/commonMarkdown/sharedDataIntro.md b/versioned_docs/version-3.1/assets/commonMarkdown/sharedDataIntro.md new file mode 100644 index 0000000000..7959dbc056 --- /dev/null +++ b/versioned_docs/version-3.1/assets/commonMarkdown/sharedDataIntro.md @@ -0,0 +1,20 @@ +This topic describes how to deploy and use a shared-data StarRocks cluster. This feature is supported from v3.0 for S3-compatible storage and v3.1 for Azure Blob Storage. + +> **NOTE** +> +> StarRocks version 3.1 brings some changes to the shared-data deployment and configuration. Please use this document if you are running version 3.1 or higher. +> +> If you are running version 3.0, please use the +[3.0 documentation](https://docs.starrocks.io/en-us/3.0/deployment/deploy_shared_data). + +The shared-data StarRocks cluster is specifically engineered for the cloud on the premise of separation of storage and compute. It allows data to be stored in object storage (for example, AWS S3, Google Cloud Storage (GCS), Azure Blob Storage, and MinIO). You can achieve not only cheaper storage and better resource isolation, but also elastic scalability for your cluster. 
The query performance of the shared-data StarRocks cluster aligns with that of a shared-nothing StarRocks cluster when the local disk cache is hit. + +In version 3.1 and higher, the StarRocks shared-data cluster is made up of Frontend Engines (FEs) and Compute Nodes (CNs). The CNs replace the classic Backend Engines (BEs) in shared-data clusters. + +Compared to the classic shared-nothing StarRocks architecture, separation of storage and compute offers a wide range of benefits. By decoupling these components, StarRocks provides: + +- Inexpensive and seamlessly scalable storage. +- Elastically scalable compute. Because data is not stored in Compute Nodes (CNs), scaling can be done without data migration or shuffling across nodes. +- Local disk cache for hot data to boost query performance. +- Asynchronous data ingestion into object storage, allowing a significant improvement in loading performance. + diff --git a/versioned_docs/version-3.1/assets/commonMarkdown/sharedDataUse.md b/versioned_docs/version-3.1/assets/commonMarkdown/sharedDataUse.md new file mode 100644 index 0000000000..6810dcf2e4 --- /dev/null +++ b/versioned_docs/version-3.1/assets/commonMarkdown/sharedDataUse.md @@ -0,0 +1,98 @@ + +For more information on how to create a storage volume for other object storages and set the default storage volume, see [CREATE STORAGE VOLUME](../../sql-reference/sql-statements/Administration/CREATE%20STORAGE%20VOLUME.md) and [SET DEFAULT STORAGE VOLUME](../../sql-reference/sql-statements/Administration/SET%20DEFAULT%20STORAGE%20VOLUME.md). + +### Create a database and a cloud-native table + +After you create a default storage volume, you can then create a database and a cloud-native table using this storage volume. + +Currently, shared-data StarRocks clusters support the following table types: + +- Duplicate Key table +- Aggregate table +- Unique Key table +- Primary Key table (Currently, the primary key persistent index is not supported.) 
+ +The following example creates a database `cloud_db` and a table `detail_demo` based on Duplicate Key table type, enables the local disk cache, sets the hot data validity duration to one month, and disables asynchronous data ingestion into object storage: + +```SQL +CREATE DATABASE cloud_db; +USE cloud_db; +CREATE TABLE IF NOT EXISTS detail_demo ( + recruit_date DATE NOT NULL COMMENT "YYYY-MM-DD", + region_num TINYINT COMMENT "range [-128, 127]", + num_plate SMALLINT COMMENT "range [-32768, 32767] ", + tel INT COMMENT "range [-2147483648, 2147483647]", + id BIGINT COMMENT "range [-2^63 + 1 ~ 2^63 - 1]", + password LARGEINT COMMENT "range [-2^127 + 1 ~ 2^127 - 1]", + name CHAR(20) NOT NULL COMMENT "range char(m),m in (1-255) ", + profile VARCHAR(500) NOT NULL COMMENT "upper limit value 65533 bytes", + ispass BOOLEAN COMMENT "true/false") +DUPLICATE KEY(recruit_date, region_num) +DISTRIBUTED BY HASH(recruit_date, region_num) +PROPERTIES ( + "storage_volume" = "def_volume", + "datacache.enable" = "true", + "datacache.partition_duration" = "1 MONTH", + "enable_async_write_back" = "false" +); +``` + +> **NOTE** +> +> The default storage volume is used when you create a database or a cloud-native table in a shared-data StarRocks cluster if no storage volume is specified. + +In addition to the regular table `PROPERTIES`, you need to specify the following `PROPERTIES` when creating a table for shared-data StarRocks cluster: + +#### datacache.enable + +Whether to enable the local disk cache. + +- `true` (Default) When this property is set to `true`, the data to be loaded is simultaneously written into the object storage and the local disk (as the cache for query acceleration). +- `false` When this property is set to `false`, the data is loaded only into the object storage. + +> **NOTE** +> +> In version 3.0 this property was named `enable_storage_cache`. 
+> +> To enable the local disk cache, you must specify the directory of the disk in the CN configuration item `storage_root_path`. + +#### datacache.partition_duration + +The validity duration of the hot data. When the local disk cache is enabled, all data is loaded into the cache. When the cache is full, StarRocks deletes the less recently used data from the cache. When a query needs to scan the deleted data, StarRocks checks if the data is within the duration of validity. If the data is within the duration, StarRocks loads the data into the cache again. If the data is not within the duration, StarRocks does not load it into the cache. This property is a string value that can be specified with the following units: `YEAR`, `MONTH`, `DAY`, and `HOUR`, for example, `7 DAY` and `12 HOUR`. If it is not specified, all data is cached as the hot data. + +> **NOTE** +> +> In version 3.0 this property was named `storage_cache_ttl`. +> +> This property is available only when `datacache.enable` is set to `true`. + +#### enable_async_write_back + +Whether to allow data to be written into object storage asynchronously. Default: `false`. +- `true` When this property is set to `true`, the load task returns success as soon as the data is written into the local disk cache, and the data is written into the object storage asynchronously. This allows better loading performance, but it also risks data reliability under potential system failures. +- `false` (Default) When this property is set to `false`, the load task returns success only after the data is written into both object storage and the local disk cache. This guarantees higher availability but leads to lower loading performance. + +### View table information + +You can view the information of tables in a specific database using `SHOW PROC "/dbs/"`. See [SHOW PROC](../../sql-reference/sql-statements/Administration/SHOW%20PROC.md) for more information. 
+ +Example: + +```Plain +mysql> SHOW PROC "/dbs/xxxxx"; ++---------+-------------+----------+---------------------+--------------+--------+--------------+--------------------------+--------------+---------------+------------------------------+ +| TableId | TableName | IndexNum | PartitionColumnName | PartitionNum | State | Type | LastConsistencyCheckTime | ReplicaCount | PartitionType | StoragePath | ++---------+-------------+----------+---------------------+--------------+--------+--------------+--------------------------+--------------+---------------+------------------------------+ +| 12003 | detail_demo | 1 | NULL | 1 | NORMAL | CLOUD_NATIVE | NULL | 8 | UNPARTITIONED | s3://xxxxxxxxxxxxxx/1/12003/ | ++---------+-------------+----------+---------------------+--------------+--------+--------------+--------------------------+--------------+---------------+------------------------------+ +``` + +The `Type` of a table in shared-data StarRocks cluster is `CLOUD_NATIVE`. In the field `StoragePath`, StarRocks returns the object storage directory where the table is stored. + +### Load data into a shared-data StarRocks cluster + +Shared-data StarRocks clusters support all loading methods provided by StarRocks. See [Overview of data loading](../../loading/Loading_intro.md) for more information. + +### Query in a shared-data StarRocks cluster + +Tables in a shared-data StarRocks cluster support all types of queries provided by StarRocks. See StarRocks [SELECT](../../sql-reference/sql-statements/data-manipulation/SELECT.md) for more information. 
diff --git a/versioned_docs/version-3.1/assets/commonMarkdown/sharedDataUseIntro.md b/versioned_docs/version-3.1/assets/commonMarkdown/sharedDataUseIntro.md new file mode 100644 index 0000000000..72bfb3f8fa --- /dev/null +++ b/versioned_docs/version-3.1/assets/commonMarkdown/sharedDataUseIntro.md @@ -0,0 +1,13 @@ + +The usage of shared-data StarRocks clusters is also similar to that of a classic shared-nothing StarRocks cluster, except that the shared-data cluster uses storage volumes and cloud-native tables to store data in object storage. + +### Create default storage volume + +You can use the built-in storage volumes that StarRocks automatically creates, or you can manually create and set the default storage volume. This section describes how to manually create and set the default storage volume. + +> **NOTE** +> +> If your shared-data StarRocks cluster is upgraded from v3.0, you do not need to define a default storage volume because StarRocks created one with the object storage-related properties you specified in the FE configuration file **fe.conf**. You can still create new storage volumes with other object storage resources and set a different default storage volume. + +To give your shared-data StarRocks cluster permission to store data in your object storage, you must reference a storage volume when you create databases or cloud-native tables. A storage volume consists of the properties and credential information of the remote data storage. If you have deployed a new shared-data StarRocks cluster and have prevented StarRocks from creating a built-in storage volume (by specifying `enable_load_volume_from_conf` as `false`), you must define a default storage volume before you can create databases and tables in the cluster. 
+ diff --git a/versioned_docs/version-3.1/deployment/deploy_shared_data.md b/versioned_docs/version-3.1/deployment/deploy_shared_data.md deleted file mode 100644 index 8614284e19..0000000000 --- a/versioned_docs/version-3.1/deployment/deploy_shared_data.md +++ /dev/null @@ -1,364 +0,0 @@ -# Deploy and use shared-data StarRocks - -This topic describes how to deploy and use a shared-data StarRocks cluster. This feature is supported from v3.0. - -The shared-data StarRocks cluster is specifically engineered for the cloud on the premise of separation of storage and compute. It allows data to be stored in object storage that is compatible with the S3 protocol (for example, AWS S3 and MinIO). You can achieve not only cheaper storage and better resource isolation, but elastic scalability for your cluster. The query performance of the shared-data StarRocks cluster aligns with that of a shared-nothing StarRocks cluster when the local disk cache is hit. - -Compared to the classic StarRocks architecture, separation of storage and compute offers a wide range of benefits. By decoupling these components, StarRocks provides: - -- Inexpensive and seamlessly scalable storage. -- Elastic scalable compute. Because data is not stored in CN nodes, scaling can be done without data migration or shuffling across nodes. -- Local disk cache for hot data to boost query performance. -- Asynchronous data ingestion into object storage, allowing a significant improvement in loading performance. - -The architecture of the shared-data StarRocks cluster is as follows: - -![Shared-data Architecture](../assets/share_data_arch.png) - -## Deploy a shared-data StarRocks cluster - -The deployment of a shared-data StarRocks cluster is similar to that of a shared-nothing StarRocks cluster. The only difference is that you need to deploy CNs instead of BEs in a shared-data cluster. 
This section only lists the extra FE and CN configuration items you need to add in the configuration files of FE and CN **fe.conf** and **cn.conf** when you deploy a shared-data StarRocks cluster. For detailed instructions on deploying a StarRocks cluster, see [Deploy StarRocks](../deployment/deploy_manually.md). - -### Configure FE nodes for shared-data StarRocks - -Before starting FEs, add the following configuration items in the FE configuration file **fe.conf**: - -| **Configuration item** | **Description** | -| ----------------------------------- | ------------------------------------------------------------ | -| run_mode | The running mode of the StarRocks cluster. Valid values: `shared_data` and `shared_nothing` (Default).
`shared_data` indicates running StarRocks in shared-data mode. `shared_nothing` indicates running StarRocks in shared-nothing mode.
**CAUTION**
You cannot adopt the `shared_data` and `shared_nothing` modes simultaneously for a StarRocks cluster. Mixed deployment is not supported.
DO NOT change `run_mode` after the cluster is deployed. Otherwise, the cluster fails to restart. The transformation from a shared-nothing cluster to a shared-data cluster or vice versa is not supported. | -| cloud_native_meta_port | The cloud-native meta service RPC port. Default: `6090`. | -| enable_load_volume_from_conf | Whether to allow StarRocks to create the default storage volume by using the object storage-related properties specified in the FE configuration file. Valid values: `true` (Default) and `false`. Supported from v3.1.0.
  • If you specify this item as `true` when creating a new shared-data cluster, StarRocks creates a built-in storage volume `builtin_storage_volume` using the object storage-related properties in the FE configuration file, and sets it as the default storage volume. However, if you have not specified the object storage-related properties, StarRocks fails to start.
  • If you specify this item as `false` when creating a new shared-data cluster, StarRocks starts directly without creating the built-in storage volume. You must manually create a storage volume and set it as the default storage volume before creating any object in StarRocks. For more information, see [Create the default storage volume](#create-default-storage-volume).
**CAUTION**
We strongly recommend you leave this item as `true` while you are upgrading an existing shared-data cluster from v3.0. If you specify this item as `false`, the databases and tables you created before the upgrade become read-only, and you cannot load data into them. | -| cloud_native_storage_type | The type of object storage you use. In shared-data mode, StarRocks supports storing data in Azure Blob (supported from v3.1.1 onwards), and object storages that are compatible with the S3 protocol (such as AWS S3, Google GCP, and MinIO). Valid value: `S3` (Default) and `AZBLOB`. If you specify this parameter as `S3`, you must add the parameters prefixed by `aws_s3`. If you specify this parameter as `AZBLOB`, you must add the parameters prefixed by `azure_blob`. | -| aws_s3_path | The S3 path used to store data. It consists of the name of your S3 bucket and the sub-path (if any) under it, for example, `testbucket/subpath`. | -| aws_s3_endpoint | The endpoint used to access your S3 bucket, for example, `https://s3.us-west-2.amazonaws.com`. | -| aws_s3_region | The region in which your S3 bucket resides, for example, `us-west-2`. | -| aws_s3_use_aws_sdk_default_behavior | Whether to use the default authentication credential of AWS SDK. Valid values: `true` and `false` (Default). | -| aws_s3_use_instance_profile | Whether to use Instance Profile and Assumed Role as credential methods for accessing S3. Valid values: `true` and `false` (Default).
  • If you use IAM user-based credential (Access Key and Secret Key) to access S3, you must specify this item as `false`, and specify `aws_s3_access_key` and `aws_s3_secret_key`.
  • If you use Instance Profile to access S3, you must specify this item as `true`.
  • If you use Assumed Role to access S3, you must specify this item as `true`, and specify `aws_s3_iam_role_arn`.
  • And if you use an external AWS account, you must also specify `aws_s3_external_id`.
| -| aws_s3_access_key | The Access Key ID used to access your S3 bucket. | -| aws_s3_secret_key | The Secret Access Key used to access your S3 bucket. | -| aws_s3_iam_role_arn | The ARN of the IAM role that has privileges on your S3 bucket in which your data files are stored. | -| aws_s3_external_id | The external ID of the AWS account that is used for cross-account access to your S3 bucket. | -| azure_blob_path | The Azure Blob Storage path used to store data. It consists of the name of the container within your storage account and the sub-path (if any) under the container, for example, `testcontainer/subpath`. | -| azure_blob_endpoint | The endpoint of your Azure Blob Storage Account, for example, `https://test.blob.core.windows.net`. | -| azure_blob_shared_key | The Shared Key used to authorize requests for your Azure Blob Storage. | -| azure_blob_sas_token | The shared access signatures (SAS) used to authorize requests for your Azure Blob Storage. | - -> **CAUTION** -> -> Only credential-related configuration items can be modified after your shared-data StarRocks cluster is created. If you changed the original storage path-related configuration items, the databases and tables you created before the change become read-only, and you cannot load data into them. 
- -If you want to create the default storage volume manually after the cluster is created, you only need to add the following configuration items: - -```Properties -run_mode = shared_data -cloud_native_meta_port = -enable_load_volume_from_conf = false -``` - -If you want to specify the properties of your object storage in the FE configuration file, examples are as follows: - -- If you use AWS S3 - - - If you use the default authentication credential of AWS SDK to access S3, add the following configuration items: - - ```Properties - run_mode = shared_data - cloud_native_meta_port = - cloud_native_storage_type = S3 - - # For example, testbucket/subpath - aws_s3_path = - - # For example, us-west-2 - aws_s3_region = - - # For example, https://s3.us-west-2.amazonaws.com - aws_s3_endpoint = - - aws_s3_use_aws_sdk_default_behavior = true - ``` - - - If you use IAM user-based credential (Access Key and Secret Key) to access S3, add the following configuration items: - - ```Properties - run_mode = shared_data - cloud_native_meta_port = - cloud_native_storage_type = S3 - - # For example, testbucket/subpath - aws_s3_path = - - # For example, us-west-2 - aws_s3_region = - - # For example, https://s3.us-west-2.amazonaws.com - aws_s3_endpoint = - - aws_s3_access_key = - aws_s3_secret_key = - ``` - - - If you use Instance Profile to access S3, add the following configuration items: - - ```Properties - run_mode = shared_data - cloud_native_meta_port = - cloud_native_storage_type = S3 - - # For example, testbucket/subpath - aws_s3_path = - - # For example, us-west-2 - aws_s3_region = - - # For example, https://s3.us-west-2.amazonaws.com - aws_s3_endpoint = - - aws_s3_use_instance_profile = true - ``` - - - If you use Assumed Role to access S3, add the following configuration items: - - ```Properties - run_mode = shared_data - cloud_native_meta_port = - cloud_native_storage_type = S3 - - # For example, testbucket/subpath - aws_s3_path = - - # For example, us-west-2 - aws_s3_region = 
- - # For example, https://s3.us-west-2.amazonaws.com - aws_s3_endpoint = - - aws_s3_use_instance_profile = true - aws_s3_iam_role_arn = - ``` - - - If you use Assumed Role to access S3 from an external AWS account, add the following configuration items: - - ```Properties - run_mode = shared_data - cloud_native_meta_port = - cloud_native_storage_type = S3 - - # For example, testbucket/subpath - aws_s3_path = - - # For example, us-west-2 - aws_s3_region = - - # For example, https://s3.us-west-2.amazonaws.com - aws_s3_endpoint = - - aws_s3_use_instance_profile = true - aws_s3_iam_role_arn = - aws_s3_external_id = - ``` - -- If you use Azure Blob Storage (supported from v3.1.1 onwards): - - - If you use Shared Key to access Azure Blob Storage, add the following configuration items: - - ```Properties - run_mode = shared_data - cloud_native_meta_port = - cloud_native_storage_type = AZBLOB - - # For example, testcontainer/subpath - azure_blob_path = - - # For example, https://test.blob.core.windows.net - azure_blob_endpoint = - - azure_blob_shared_key = - ``` - - - If you use shared access signatures (SAS) to access Azure Blob Storage, add the following configuration items: - - ```Properties - run_mode = shared_data - cloud_native_meta_port = - cloud_native_storage_type = AZBLOB - - # For example, testcontainer/subpath - azure_blob_path = - - # For example, https://test.blob.core.windows.net - azure_blob_endpoint = - - azure_blob_sas_token = - ``` - - > **CAUTION** - > - > The hierarchical namespace must be disabled when you create the Azure Blob Storage Account. 
- -- If you use GCP Cloud Storage: - - ```Properties - run_mode = shared_data - cloud_native_meta_port = - cloud_native_storage_type = S3 - - # For example, testbucket/subpath - aws_s3_path = - - # For example: us-east-1 - aws_s3_region = - - # For example: https://storage.googleapis.com - aws_s3_endpoint = - - aws_s3_access_key = - aws_s3_secret_key = - ``` - -- If you use MinIO: - - ```Properties - run_mode = shared_data - cloud_native_meta_port = - cloud_native_storage_type = S3 - - # For example, testbucket/subpath - aws_s3_path = - - # For example: us-east-1 - aws_s3_region = - - # For example: http://172.26.xx.xxx:39000 - aws_s3_endpoint = - - aws_s3_access_key = - aws_s3_secret_key = - ``` - -### Configure CN nodes for shared-data StarRocks - -**Before starting CNs**, add the following configuration items in the CN configuration file **cn.conf**: - -```Properties -starlet_port = -storage_root_path = -``` - -| **Configuration item** | **Description** | -| ---------------------- | ------------------------------ | -| starlet_port | The CN heartbeat service port for the StarRocks shared-data cluster. Default value: `9070`.| -| storage_root_path | The storage volume directory that the local cached data depends on and the medium type of the storage. Multiple volumes are separated by semicolon (;). If the storage medium is SSD, add `,medium:ssd` at the end of the directory. If the storage medium is HDD, add `,medium:hdd` at the end of the directory. Example: `/data1,medium:hdd;/data2,medium:ssd`. Default value: `${STARROCKS_HOME}/storage`. | - -> **NOTE** -> -> The data is cached under the directory **`/starlet_cache`**. - -## Use your shared-data StarRocks cluster - -The usage of shared-data StarRocks clusters is also similar to that of a classic StarRocks cluster, except that the shared-data cluster uses storage volumes and cloud-native tables to store data in object storage. 
- -### Create default storage volume - -You can use the built-in storage volumes that StarRocks automatically creates, or you can manually create and set the default storage volume. This section describes how to manually create and set the default storage volume. - -> **NOTE** -> -> If your shared-data StarRocks cluster is upgraded from v3.0, you do not need to define a default storage volume because StarRocks created one with the object storage-related properties you specified in the FE configuration file **fe.conf**. You can still create new storage volumes with other object storage resources and set the default storage volume differently. - -To give your shared-data StarRocks cluster permission to store data in your object storage, you must reference a storage volume when you create databases or cloud-native tables. A storage volume consists of the properties and credential information of the remote data storage. If you have deployed a new shared-data StarRocks cluster and disallow StarRocks to create a built-in storage volume (by specifying `enable_load_volume_from_conf` as `false`), you must define a default storage volume before you can create databases and tables in the cluster. 
- -The following example creates a storage volume `def_volume` for an AWS S3 bucket `defaultbucket` with the IAM user-based credential (Access Key and Secret Key), enables the storage volume, and sets it as the default storage volume: - -```SQL -CREATE STORAGE VOLUME def_volume -TYPE = S3 -LOCATIONS = ("s3://defaultbucket/test/") -PROPERTIES -( - "enabled" = "true", - "aws.s3.region" = "us-west-2", - "aws.s3.endpoint" = "https://s3.us-west-2.amazonaws.com", - "aws.s3.use_aws_sdk_default_behavior" = "false", - "aws.s3.use_instance_profile" = "false", - "aws.s3.access_key" = "xxxxxxxxxx", - "aws.s3.secret_key" = "yyyyyyyyyy" -); - -SET def_volume AS DEFAULT STORAGE VOLUME; -``` - -For more information on how to create a storage volume for other object storages and set the default storage volume, see [CREATE STORAGE VOLUME](../sql-reference/sql-statements/Administration/CREATE%20STORAGE%20VOLUME.md) and [SET DEFAULT STORAGE VOLUME](../sql-reference/sql-statements/Administration/SET%20DEFAULT%20STORAGE%20VOLUME.md). - -### Create a database and a cloud-native table - -After you created a default storage volume, you can then create a database and a cloud-native table using this storage volume. - -Currently, shared-data StarRocks clusters support the following table types: - -- Duplicate Key table -- Aggregate table -- Unique Key table -- Primary Key table (Currently, the primary key persistent index is not supported.) 
- -The following example creates a database `cloud_db` and a table `detail_demo` based on Duplicate Key table type, enables the local disk cache, sets the hot data validity duration to one month, and disables asynchronous data ingestion into object storage: - -```SQL -CREATE DATABASE cloud_db; -USE cloud_db; -CREATE TABLE IF NOT EXISTS detail_demo ( - recruit_date DATE NOT NULL COMMENT "YYYY-MM-DD", - region_num TINYINT COMMENT "range [-128, 127]", - num_plate SMALLINT COMMENT "range [-32768, 32767] ", - tel INT COMMENT "range [-2147483648, 2147483647]", - id BIGINT COMMENT "range [-2^63 + 1 ~ 2^63 - 1]", - password LARGEINT COMMENT "range [-2^127 + 1 ~ 2^127 - 1]", - name CHAR(20) NOT NULL COMMENT "range char(m),m in (1-255) ", - profile VARCHAR(500) NOT NULL COMMENT "upper limit value 65533 bytes", - ispass BOOLEAN COMMENT "true/false") -DUPLICATE KEY(recruit_date, region_num) -DISTRIBUTED BY HASH(recruit_date, region_num) -PROPERTIES ( - "storage_volume" = "def_volume", - "datacache.enable" = "true", - "datacache.partition_duration" = "1 MONTH", - "enable_async_write_back" = "false" -); -``` - -> **NOTE** -> -> The default storage volume is used when you create a database or a cloud-native table in a shared-data StarRocks cluster if no storage volume is specified. - -In addition to the regular table PROPERTIES, you need to specify the following PROPERTIES when creating a table for shared-data StarRocks cluster: - -| **Property** | **Description** | -| ----------------------- | ------------------------------------------------------------ | -| datacache.enable | Whether to enable the local disk cache. Default: `true`.
  • When this property is set to `true`, the data to be loaded is simultaneously written into the object storage and the local disk (as the cache for query acceleration).
  • When this property is set to `false`, the data is loaded only into the object storage.
**NOTE**
To enable the local disk cache, you must specify the directory of the disk in the CN configuration item `storage_root_path`. | -| datacache.partition_duration | The validity duration of the hot data. When the local disk cache is enabled, all data is loaded into the cache. When the cache is full, StarRocks deletes the less recently used data from the cache. When a query needs to scan the deleted data, StarRocks checks if the data is within the duration of validity. If the data is within the duration, StarRocks loads the data into the cache again. If the data is not within the duration, StarRocks does not load it into the cache. This property is a string value that can be specified with the following units: `YEAR`, `MONTH`, `DAY`, and `HOUR`, for example, `7 DAY` and `12 HOUR`. If it is not specified, all data is cached as the hot data.
**NOTE**
This property is available only when `datacache.enable` is set to `true`. | -| enable_async_write_back | Whether to allow data to be written into object storage asynchronously. Default: `false`.
  • When this property is set to `true`, the load task returns success as soon as the data is written into the local disk cache, and the data is written into the object storage asynchronously. This allows better loading performance, but it also risks data reliability under potential system failures.
  • When this property is set to `false`, the load task returns success only after the data is written into both object storage and the local disk cache. This guarantees higher reliability but leads to lower loading performance.
| - -### View table information - -You can view the information of tables in a specific database using `SHOW PROC "/dbs/"`. See [SHOW PROC](../sql-reference/sql-statements/Administration/SHOW%20PROC.md) for more information. - -Example: - -```Plain -mysql> SHOW PROC "/dbs/xxxxx"; -+---------+-------------+----------+---------------------+--------------+--------+--------------+--------------------------+--------------+---------------+------------------------------+ -| TableId | TableName | IndexNum | PartitionColumnName | PartitionNum | State | Type | LastConsistencyCheckTime | ReplicaCount | PartitionType | StoragePath | -+---------+-------------+----------+---------------------+--------------+--------+--------------+--------------------------+--------------+---------------+------------------------------+ -| 12003 | detail_demo | 1 | NULL | 1 | NORMAL | CLOUD_NATIVE | NULL | 8 | UNPARTITIONED | s3://xxxxxxxxxxxxxx/1/12003/ | -+---------+-------------+----------+---------------------+--------------+--------+--------------+--------------------------+--------------+---------------+------------------------------+ -``` - -The `Type` of a table in shared-data StarRocks cluster is `CLOUD_NATIVE`. In the field `StoragePath`, StarRocks returns the object storage directory where the table is stored. - -### Load data into a shared-data StarRocks cluster - -Shared-data StarRocks clusters support all loading methods provided by StarRocks. See [Overview of data loading](../loading/Loading_intro.md) for more information. - -### Query in a shared-data StarRocks cluster - -Tables in a shared-data StarRocks cluster support all types of queries provided by StarRocks. See StarRocks [SELECT](../sql-reference/sql-statements/data-manipulation/SELECT.md) for more information. 
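As a rough illustration of the point above, an ordinary aggregate query runs unchanged against a cloud-native table. This sketch assumes the `cloud_db` database and `detail_demo` table from the earlier example exist and contain data; the date literal and the `recruits` alias are made up for illustration.

```SQL
-- Illustrative only: assumes cloud_db.detail_demo was created as in the earlier example.
USE cloud_db;

-- Count rows per region for a hypothetical date range.
SELECT region_num, COUNT(*) AS recruits
FROM detail_demo
WHERE recruit_date >= '2023-01-01'
GROUP BY region_num
ORDER BY recruits DESC
LIMIT 10;
```

Nothing in the statement refers to the storage volume or the local disk cache; those are table properties, so the query text is the same as it would be in a shared-nothing cluster.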
diff --git a/versioned_docs/version-3.1/deployment/deployment_overview.md b/versioned_docs/version-3.1/deployment/deployment_overview.md index 099b01090f..e4aec094e0 100644 --- a/versioned_docs/version-3.1/deployment/deployment_overview.md +++ b/versioned_docs/version-3.1/deployment/deployment_overview.md @@ -28,7 +28,7 @@ The deployment of StarRocks generally follows the steps outlined here: 5. Deploy StarRocks. - - If you want to deploy a shared-data StarRocks cluster, which features a disaggregated storage and compute architecture, see [Deploy and use shared-data StarRocks](../deployment/deploy_shared_data.md) for instructions. + - If you want to deploy a shared-data StarRocks cluster, which features a disaggregated storage and compute architecture, see [Deploy and use shared-data StarRocks](../deployment/shared_data/s3.md) for instructions. - If you want to deploy a shared-nothing StarRocks cluster, which uses local storage, you have the following options: - [Deploy StarRocks manually](../deployment/deploy_manually.md). diff --git a/versioned_docs/version-3.1/deployment/shared_data/azure.md b/versioned_docs/version-3.1/deployment/shared_data/azure.md new file mode 100644 index 0000000000..c74b8fa9f0 --- /dev/null +++ b/versioned_docs/version-3.1/deployment/shared_data/azure.md @@ -0,0 +1,165 @@ +# Use Azure Blob Storage for shared-data + +import SharedDataIntro from '../../assets/commonMarkdown/sharedDataIntro.md' +import SharedDataCNconf from '../../assets/commonMarkdown/sharedDataCNconf.md' +import SharedDataUseIntro from '../../assets/commonMarkdown/sharedDataUseIntro.md' +import SharedDataUse from '../../assets/commonMarkdown/sharedDataUse.md' + + + +## Architecture + +![Shared-data Architecture](../../assets/share_data_arch.png) + +## Deploy a shared-data StarRocks cluster + +The deployment of a shared-data StarRocks cluster is similar to that of a shared-nothing StarRocks cluster. 
The only difference is that you need to deploy CNs instead of BEs in a shared-data cluster. This section only lists the extra FE and CN configuration items you need to add in the configuration files of FE and CN **fe.conf** and **cn.conf** when you deploy a shared-data StarRocks cluster. For detailed instructions on deploying a StarRocks cluster, see [Deploy StarRocks](../../deployment/deploy_manually.md). + +> **Note** +> +> Do not start the cluster until after it is configured for shared-storage in the next section of this document. + +## Configure FE nodes for shared-data StarRocks + +Before starting the cluster configure the FEs and CNs. An example configuration is provided below, and then the details for each parameter are provided. + +### Example FE configuration for Azure Blob Storage + +The example shared-data additions for your `fe.conf` can be added to the `fe.conf` file on each +of your FE nodes. + + ```Properties + run_mode = shared_data + cloud_native_meta_port = + cloud_native_storage_type = AZBLOB + + # For example, testcontainer/subpath + azure_blob_path = + + # For example, https://test.blob.core.windows.net + azure_blob_endpoint = + + azure_blob_shared_key = + ``` + +- If you use shared access signatures (SAS) to access Azure Blob Storage, add the following configuration items: + + ```Properties + run_mode = shared_data + cloud_native_meta_port = + cloud_native_storage_type = AZBLOB + + # For example, testcontainer/subpath + azure_blob_path = + + # For example, https://test.blob.core.windows.net + azure_blob_endpoint = + + azure_blob_sas_token = + ``` + +> **CAUTION** +> +> The hierarchical namespace must be disabled when you create the Azure Blob Storage Account. + +### All FE parameters related to shared-storage with Azure Blob Storage + + +#### run_mode + +The running mode of the StarRocks cluster. Valid values: + +- `shared_data` +- `shared_nothing` (Default). 
+ +> **Note** +> +> You cannot adopt the `shared_data` and `shared_nothing` modes simultaneously for a StarRocks cluster. Mixed deployment is not supported. +> +> Do not change `run_mode` after the cluster is deployed. Otherwise, the cluster fails to restart. The transformation from a shared-nothing cluster to a shared-data cluster or vice versa is not supported. + +#### cloud_native_meta_port + +The cloud-native meta service RPC port. + +- Default: `6090` + +#### enable_load_volume_from_conf + +Whether to allow StarRocks to create the default storage volume by using the object storage-related properties specified in the FE configuration file. Valid values: + +- `true` (Default) If you specify this item as `true` when creating a new shared-data cluster, StarRocks creates the built-in storage volume `builtin_storage_volume` using the object storage-related properties in the FE configuration file, and sets it as the default storage volume. However, if you have not specified the object storage-related properties, StarRocks fails to start. +- `false` If you specify this item as `false` when creating a new shared-data cluster, StarRocks starts directly without creating the built-in storage volume. You must manually create a storage volume and set it as the default storage volume before creating any object in StarRocks. For more information, see [Create the default storage volume](#create-default-storage-volume). + +Supported from v3.1.0. + +> **CAUTION** +> +> We strongly recommend you leave this item as `true` while you are upgrading an existing shared-data cluster from v3.0. If you specify this item as `false`, the databases and tables you created before the upgrade become read-only, and you cannot load data into them. + +#### cloud_native_storage_type + +The type of object storage you use. 
In shared-data mode, StarRocks supports storing data in Azure Blob Storage (supported from v3.1.1 onwards), and object storages that are compatible with the S3 protocol (such as AWS S3, Google GCS, and MinIO). Valid values: + +- `S3` (Default) +- `AZBLOB`. + +> **Note** +> +> If you specify this parameter as `S3`, you must add the parameters prefixed by `aws_s3`. +> +> If you specify this parameter as `AZBLOB`, you must add the parameters prefixed by `azure_blob`. + +#### azure_blob_path + +The Azure Blob Storage path used to store data. It consists of the name of the container within your storage account and the sub-path (if any) under the container, for example, `testcontainer/subpath`. + +#### azure_blob_endpoint + +The endpoint of your Azure Blob Storage Account, for example, `https://test.blob.core.windows.net`. + +#### azure_blob_shared_key + +The Shared Key used to authorize requests for your Azure Blob Storage. + +#### azure_blob_sas_token + +The shared access signature (SAS) token used to authorize requests for your Azure Blob Storage. + +> **Note** +> +> Only credential-related configuration items can be modified after your shared-data StarRocks cluster is created. If you change the original storage path-related configuration items, the databases and tables you created before the change become read-only, and you cannot load data into them. 
+ +If you want to create the default storage volume manually after the cluster is created, you only need to add the following configuration items: + +```Properties +run_mode = shared_data +cloud_native_meta_port = +enable_load_volume_from_conf = false +``` + +## Configure CN nodes for shared-data StarRocks + + + +## Use your shared-data StarRocks cluster + + + +The following example creates a storage volume `def_volume` for an Azure Blob Storage bucket `defaultbucket` with shared key access, enables the storage volume, and sets it as the default storage volume: + +```SQL +CREATE STORAGE VOLUME def_volume +TYPE = AZBLOB +LOCATIONS = ("azblob://defaultbucket/test/") +PROPERTIES +( + "enabled" = "true", + "azure.blob.endpoint" = "", + "azure.blob.shared_key" = "" +); + +SET def_volume AS DEFAULT STORAGE VOLUME; +``` + + diff --git a/versioned_docs/version-3.1/deployment/shared_data/gcs.md b/versioned_docs/version-3.1/deployment/shared_data/gcs.md new file mode 100644 index 0000000000..4783420329 --- /dev/null +++ b/versioned_docs/version-3.1/deployment/shared_data/gcs.md @@ -0,0 +1,172 @@ +# Deploy StarRocks using GCS + +import SharedDataIntro from '../../assets/commonMarkdown/sharedDataIntro.md' +import SharedDataCNconf from '../../assets/commonMarkdown/sharedDataCNconf.md' +import SharedDataUseIntro from '../../assets/commonMarkdown/sharedDataUseIntro.md' +import SharedDataUse from '../../assets/commonMarkdown/sharedDataUse.md' + + + +## Architecture + +![Shared-data Architecture](../../assets/share_data_arch.png) + +## Deploy a shared-data StarRocks cluster + +The deployment of a shared-data StarRocks cluster is similar to that of a shared-nothing StarRocks cluster. The only difference is that you need to deploy CNs instead of BEs in a shared-data cluster. This section only lists the extra FE and CN configuration items you need to add in the configuration files of FE and CN **fe.conf** and **cn.conf** when you deploy a shared-data StarRocks cluster. 
For detailed instructions on deploying a StarRocks cluster, see [Deploy StarRocks](../../deployment/deploy_manually.md). + +> **Note** +> +> Do not start the cluster until it is configured for shared storage in the next section of this document. + +## Configure FE nodes for shared-data StarRocks + +Before starting the cluster, configure the FEs and CNs. An example configuration is provided below, followed by the details for each parameter. + +### Example FE configuration for GCS + +Add the following example shared-data settings to the `fe.conf` file on each +of your FE nodes. Because GCS storage is accessed using the +[Cloud Storage XML API](https://cloud.google.com/storage/docs/xml-api/overview), the parameters +use the prefix `aws_s3`. + + ```Properties + run_mode = shared_data + cloud_native_meta_port = + cloud_native_storage_type = S3 + + # For example, testbucket/subpath + aws_s3_path = + + # For example: us-east1 + aws_s3_region = + + # For example: https://storage.googleapis.com + aws_s3_endpoint = + + aws_s3_access_key = + aws_s3_secret_key = + ``` + +### All FE parameters related to shared-storage with GCS + +#### run_mode + +The running mode of the StarRocks cluster. Valid values: + +- `shared_data` +- `shared_nothing` (Default). + +> **Note** +> +> You cannot adopt the `shared_data` and `shared_nothing` modes simultaneously for a StarRocks cluster. Mixed deployment is not supported. +> +> Do not change `run_mode` after the cluster is deployed. Otherwise, the cluster fails to restart. The transformation from a shared-nothing cluster to a shared-data cluster or vice versa is not supported. + +#### cloud_native_meta_port + +The cloud-native meta service RPC port. + +- Default: `6090` + +#### enable_load_volume_from_conf + +Whether to allow StarRocks to create the default storage volume by using the object storage-related properties specified in the FE configuration file. 
Valid values: + +- `true` (Default) If you specify this item as `true` when creating a new shared-data cluster, StarRocks creates the built-in storage volume `builtin_storage_volume` using the object storage-related properties in the FE configuration file, and sets it as the default storage volume. However, if you have not specified the object storage-related properties, StarRocks fails to start. +- `false` If you specify this item as `false` when creating a new shared-data cluster, StarRocks starts directly without creating the built-in storage volume. You must manually create a storage volume and set it as the default storage volume before creating any object in StarRocks. For more information, see [Create the default storage volume](#create-default-storage-volume). + +Supported from v3.1.0. + +> **CAUTION** +> +> We strongly recommend you leave this item as `true` while you are upgrading an existing shared-data cluster from v3.0. If you specify this item as `false`, the databases and tables you created before the upgrade become read-only, and you cannot load data into them. + +#### cloud_native_storage_type + +The type of object storage you use. In shared-data mode, StarRocks supports storing data in Azure Blob Storage (supported from v3.1.1 onwards), and object storages that are compatible with the S3 protocol (such as AWS S3, Google GCS, and MinIO). Valid values: + +- `S3` (Default) +- `AZBLOB`. + +#### aws_s3_path + +The path used to store data. It consists of the name of your GCS bucket and the sub-path (if any) under it, for example, `testbucket/subpath`. + +#### aws_s3_endpoint + +The endpoint used to access your GCS bucket, for example, `https://storage.googleapis.com`. + +#### aws_s3_region + +The region in which your GCS bucket resides, for example, `us-east1`. + +#### aws_s3_use_instance_profile + +Whether to use Instance Profile and Assumed Role as credential methods for accessing GCS. Valid values: + +- `true` +- `false` (Default). 
+ +If you use IAM user-based credentials (Access Key and Secret Key) to access GCS, you must specify this item as `false`, and specify `aws_s3_access_key` and `aws_s3_secret_key`. + +If you use Instance Profile to access GCS, you must specify this item as `true`. + +If you use Assumed Role to access GCS, you must specify this item as `true`, and specify `aws_s3_iam_role_arn`. + +And if you use an external AWS account, you must also specify `aws_s3_external_id`. + +#### aws_s3_access_key + +The HMAC Access Key ID used to access your GCS bucket. + +#### aws_s3_secret_key + +The HMAC Secret Access Key used to access your GCS bucket. + +#### aws_s3_iam_role_arn + +The ARN of the IAM role that has privileges on the GCS bucket in which your data files are stored. + +#### aws_s3_external_id + +The external ID of the AWS account that is used for cross-account access to your GCS bucket. + +> **Note** +> +> Only credential-related configuration items can be modified after your shared-data StarRocks cluster is created. If you change the original storage path-related configuration items, the databases and tables you created before the change become read-only, and you cannot load data into them. 
+ +If you want to create the default storage volume manually after the cluster is created, you only need to add the following configuration items: + +```Properties +run_mode = shared_data +cloud_native_meta_port = +enable_load_volume_from_conf = false +``` + +## Configure CN nodes for shared-data StarRocks + + +## Use your shared-data StarRocks cluster + + + +The following example creates a storage volume `def_volume` for a GCS bucket `defaultbucket` with an HMAC Access Key and Secret Key, enables the storage volume, and sets it as the default storage volume: + +```SQL +CREATE STORAGE VOLUME def_volume +TYPE = S3 +LOCATIONS = ("s3://defaultbucket/test/") +PROPERTIES +( + "enabled" = "true", + "aws.s3.region" = "us-east1", + "aws.s3.endpoint" = "https://storage.googleapis.com", + "aws.s3.access_key" = "", + "aws.s3.secret_key" = "" +); + +SET def_volume AS DEFAULT STORAGE VOLUME; +``` + + diff --git a/versioned_docs/version-3.1/deployment/shared_data/minio.md b/versioned_docs/version-3.1/deployment/shared_data/minio.md new file mode 100644 index 0000000000..c71d5437f0 --- /dev/null +++ b/versioned_docs/version-3.1/deployment/shared_data/minio.md @@ -0,0 +1,171 @@ +# Use MinIO for shared-data + +import SharedDataIntro from '../../assets/commonMarkdown/sharedDataIntro.md' +import SharedDataCNconf from '../../assets/commonMarkdown/sharedDataCNconf.md' +import SharedDataUseIntro from '../../assets/commonMarkdown/sharedDataUseIntro.md' +import SharedDataUse from '../../assets/commonMarkdown/sharedDataUse.md' + + + +## Architecture + +![Shared-data Architecture](../../assets/share_data_arch.png) + +## Deploy a shared-data StarRocks cluster + +The deployment of a shared-data StarRocks cluster is similar to that of a shared-nothing StarRocks cluster. The only difference is that you need to deploy CNs instead of BEs in a shared-data cluster. 
This section only lists the extra FE and CN configuration items you need to add in the configuration files of FE and CN **fe.conf** and **cn.conf** when you deploy a shared-data StarRocks cluster. For detailed instructions on deploying a StarRocks cluster, see [Deploy StarRocks](../deployment/deploy_manually.md). + +### Configure FE nodes for shared-data StarRocks + +Before starting FEs, add the following configuration items in the FE configuration file **fe.conf**. + +#### run_mode + +The running mode of the StarRocks cluster. Valid values: + +- `shared_data` +- `shared_nothing` (Default). + +> **Note** +> +> You cannot adopt the `shared_data` and `shared_nothing` modes simultaneously for a StarRocks cluster. Mixed deployment is not supported. +> +> Do not change `run_mode` after the cluster is deployed. Otherwise, the cluster fails to restart. The transformation from a shared-nothing cluster to a shared-data cluster or vice versa is not supported. + +#### cloud_native_meta_port + +The cloud-native meta service RPC port. + +- Default: `6090` + +#### enable_load_volume_from_conf + +Whether to allow StarRocks to create the default storage volume by using the object storage-related properties specified in the FE configuration file. Valid values: + +- `true` (Default) If you specify this item as `true` when creating a new shared-data cluster, StarRocks creates the built-in storage volume `builtin_storage_volume` using the object storage-related properties in the FE configuration file, and sets it as the default storage volume. However, if you have not specified the object storage-related properties, StarRocks fails to start. +- `false` If you specify this item as `false` when creating a new shared-data cluster, StarRocks starts directly without creating the built-in storage volume. You must manually create a storage volume and set it as the default storage volume before creating any object in StarRocks. 
For more information, see [Create the default storage volume](#create-default-storage-volume). + +Supported from v3.1.0. + +> **CAUTION** +> +> We strongly recommend you leave this item as `true` while you are upgrading an existing shared-data cluster from v3.0. If you specify this item as `false`, the databases and tables you created before the upgrade become read-only, and you cannot load data into them. + +#### cloud_native_storage_type + +The type of object storage you use. In shared-data mode, StarRocks supports storing data in Azure Blob (supported from v3.1.1 onwards), and object storages that are compatible with the S3 protocol (such as AWS S3, Google GCP, and MinIO). Valid value: + +- `S3` (Default) +- `AZBLOB`. + +> Note +> +> If you specify this parameter as `S3`, you must add the parameters prefixed by `aws_s3`. +> +> If you specify this parameter as `AZBLOB`, you must add the parameters prefixed by `azure_blob`. + +#### aws_s3_path + +The S3 path used to store data. It consists of the name of your S3 bucket and the sub-path (if any) under it, for example, `testbucket/subpath`. + +#### aws_s3_endpoint + +The endpoint used to access your S3 bucket, for example, `https://s3.us-west-2.amazonaws.com`. + +#### aws_s3_region + +The region in which your S3 bucket resides, for example, `us-west-2`. + +#### aws_s3_use_aws_sdk_default_behavior + +Whether to use the [AWS SDK default credentials provider chain](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html). Valid values: + +- `true` +- `false` (Default). + +#### aws_s3_use_instance_profile + +Whether to use Instance Profile and Assumed Role as credential methods for accessing S3. Valid values: + +- `true` +- `false` (Default). + +If you use IAM user-based credential (Access Key and Secret Key) to access S3, you must specify this item as `false`, and specify `aws_s3_access_key` and `aws_s3_secret_key`. 
+ +If you use Instance Profile to access S3, you must specify this item as `true`. + +If you use Assumed Role to access S3, you must specify this item as `true`, and specify `aws_s3_iam_role_arn`. + +And if you use an external AWS account, you must also specify `aws_s3_external_id`. + +#### aws_s3_access_key + +The Access Key ID used to access your S3 bucket. + +#### aws_s3_secret_key + +The Secret Access Key used to access your S3 bucket. + +#### aws_s3_iam_role_arn + +The ARN of the IAM role that has privileges on your S3 bucket in which your data files are stored. + +#### aws_s3_external_id + +The external ID of the AWS account that is used for cross-account access to your S3 bucket. + +#### azure_blob_path + +The Azure Blob Storage path used to store data. It consists of the name of the container within your storage account and the sub-path (if any) under the container, for example, `testcontainer/subpath`. + +#### azure_blob_endpoint + +The endpoint of your Azure Blob Storage Account, for example, `https://test.blob.core.windows.net`. + +#### azure_blob_shared_key + +The Shared Key used to authorize requests for your Azure Blob Storage. + +#### azure_blob_sas_token + +The shared access signatures (SAS) used to authorize requests for your Azure Blob Storage. + +> **Note** +> +> Only credential-related configuration items can be modified after your shared-data StarRocks cluster is created. If you changed the original storage path-related configuration items, the databases and tables you created before the change become read-only, and you cannot load data into them. 
+ +If you want to create the default storage volume manually after the cluster is created, you only need to add the following configuration items: + +```Properties +run_mode = shared_data +cloud_native_meta_port = +enable_load_volume_from_conf = false +``` + +## Configure CN nodes for shared-data StarRocks + + + +## Use your shared-data StarRocks cluster + + + +The following example creates a storage volume `def_volume` for a MinIO bucket `defaultbucket` with Access Key and Secret Key credentials, enables the storage volume, and sets it as the default storage volume: + +```SQL +CREATE STORAGE VOLUME def_volume +TYPE = S3 +LOCATIONS = ("s3://defaultbucket/test/") +PROPERTIES +( + "enabled" = "true", + "aws.s3.region" = "us-west-2", + "aws.s3.endpoint" = "https://hostname.domainname.com:portnumber", + "aws.s3.access_key" = "xxxxxxxxxx", + "aws.s3.secret_key" = "yyyyyyyyyy" +); + +SET def_volume AS DEFAULT STORAGE VOLUME; +``` + + diff --git a/versioned_docs/version-3.1/deployment/shared_data/s3.md b/versioned_docs/version-3.1/deployment/shared_data/s3.md new file mode 100644 index 0000000000..dc1dd31ff6 --- /dev/null +++ b/versioned_docs/version-3.1/deployment/shared_data/s3.md @@ -0,0 +1,284 @@ +# Use S3 for shared-data + +import SharedDataIntro from '../../assets/commonMarkdown/sharedDataIntro.md' +import SharedDataCNconf from '../../assets/commonMarkdown/sharedDataCNconf.md' +import SharedDataUseIntro from '../../assets/commonMarkdown/sharedDataUseIntro.md' +import SharedDataUse from '../../assets/commonMarkdown/sharedDataUse.md' + + + +## Architecture + +![Shared-data Architecture](../../assets/share_data_arch.png) + +## Deploy a shared-data StarRocks cluster + +The deployment of a shared-data StarRocks cluster is similar to that of a shared-nothing StarRocks cluster. The only difference is that you need to deploy CNs instead of BEs in a shared-data cluster. 
This section only lists the extra FE and CN configuration items you need to add in the configuration files of FE and CN **fe.conf** and **cn.conf** when you deploy a shared-data StarRocks cluster. For detailed instructions on deploying a StarRocks cluster, see [Deploy StarRocks](../deployment/deploy_manually.md). + +> **Note** +> +> Do not start the cluster until after it is configured for shared-storage in the next section of this document. + +## Configure FE nodes for shared-data StarRocks + +Before starting the cluster configure the FEs and CNs. Example configurations are provided below, and then the details for each parameter are provided. + +### Example FE configurations for S3 + +These are example shared-data additions for your `fe.conf` file on each +of your FE nodes. The examples differ based on the AWS authentication +method being used. + +#### Default authentication credentials + +```Properties +run_mode = shared_data +cloud_native_meta_port = +cloud_native_storage_type = S3 + +# For example, testbucket/subpath +aws_s3_path = + +# For example, us-west-2 +aws_s3_region = + +# For example, https://s3.us-west-2.amazonaws.com +aws_s3_endpoint = + +aws_s3_use_aws_sdk_default_behavior = true + +# Set this to false if you do not want default +# storage created in the object storage using +# the details provided above +enable_load_volume_from_conf = true +``` + +#### IAM user-based credentials + +```Properties +run_mode = shared_data +cloud_native_meta_port = +cloud_native_storage_type = S3 + +# For example, testbucket/subpath +aws_s3_path = + +# For example, us-west-2 +aws_s3_region = + +# credentials for S3 object read/write +aws_s3_access_key = +aws_s3_secret_key = + +# Set this to false if you do not want default +# storage created in the object storage using +# the details provided above +enable_load_volume_from_conf = true +``` + +#### Instance profile + +```Properties +run_mode = shared_data +cloud_native_meta_port = +cloud_native_storage_type = S3 + +# 
For example, testbucket/subpath +aws_s3_path = + +# For example, us-west-2 +aws_s3_region = + +# For example, https://s3.us-west-2.amazonaws.com +aws_s3_endpoint = + +aws_s3_use_instance_profile = true + +# Set this to false if you do not want default +# storage created in the object storage using +# the details provided above +enable_load_volume_from_conf = true +``` + +#### Assumed role + +```Properties +run_mode = shared_data +cloud_native_meta_port = +cloud_native_storage_type = S3 + +# For example, testbucket/subpath +aws_s3_path = + +# For example, us-west-2 +aws_s3_region = + +# For example, https://s3.us-west-2.amazonaws.com +aws_s3_endpoint = + +aws_s3_use_instance_profile = true +aws_s3_iam_role_arn = + +# Set this to false if you do not want default +# storage created in the object storage using +# the details provided above +enable_load_volume_from_conf = true +``` + +#### Assumed role from an external account + +```Properties +run_mode = shared_data +cloud_native_meta_port = +cloud_native_storage_type = S3 + +# For example, testbucket/subpath +aws_s3_path = + +# For example, us-west-2 +aws_s3_region = + +# For example, https://s3.us-west-2.amazonaws.com +aws_s3_endpoint = + +aws_s3_use_instance_profile = true +aws_s3_iam_role_arn = +aws_s3_external_id = + +# Set this to false if you do not want default +# storage created in the object storage using +# the details provided above +enable_load_volume_from_conf = true +``` + +### All FE parameters related to shared-storage with S3 + +#### run_mode + +The running mode of the StarRocks cluster. Valid values: + +- `shared_data` +- `shared_nothing` (Default). + +> **Note** +> +> You cannot adopt the `shared_data` and `shared_nothing` modes simultaneously for a StarRocks cluster. Mixed deployment is not supported. +> +> Do not change `run_mode` after the cluster is deployed. Otherwise, the cluster fails to restart. 
The transformation from a shared-nothing cluster to a shared-data cluster or vice versa is not supported. + +#### cloud_native_meta_port + +The cloud-native meta service RPC port. + +- Default: `6090` + +#### enable_load_volume_from_conf + +Whether to allow StarRocks to create the default storage volume by using the object storage-related properties specified in the FE configuration file. Valid values: + +- `true` (Default) If you specify this item as `true` when creating a new shared-data cluster, StarRocks creates the built-in storage volume `builtin_storage_volume` using the object storage-related properties in the FE configuration file, and sets it as the default storage volume. However, if you have not specified the object storage-related properties, StarRocks fails to start. +- `false` If you specify this item as `false` when creating a new shared-data cluster, StarRocks starts directly without creating the built-in storage volume. You must manually create a storage volume and set it as the default storage volume before creating any object in StarRocks. For more information, see [Create the default storage volume](#create-default-storage-volume). + +Supported from v3.1.0. + +> **CAUTION** +> +> We strongly recommend you leave this item as `true` while you are upgrading an existing shared-data cluster from v3.0. If you specify this item as `false`, the databases and tables you created before the upgrade become read-only, and you cannot load data into them. + +#### cloud_native_storage_type + +The type of object storage you use. In shared-data mode, StarRocks supports storing data in Azure Blob (supported from v3.1.1 onwards), and object storages that are compatible with the S3 protocol (such as AWS S3, Google GCP, and MinIO). Valid value: + +- `S3` (Default) +- `AZBLOB`. + +#### aws_s3_path + +The S3 path used to store data. It consists of the name of your S3 bucket and the sub-path (if any) under it, for example, `testbucket/subpath`. 
+
+#### aws_s3_endpoint
+
+The endpoint used to access your S3 bucket, for example, `https://s3.us-west-2.amazonaws.com`.
+
+#### aws_s3_region
+
+The region in which your S3 bucket resides, for example, `us-west-2`.
+
+#### aws_s3_use_aws_sdk_default_behavior
+
+Whether to use the [AWS SDK default credentials provider chain](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html). Valid values:
+
+- `true`
+- `false` (Default).
+
+#### aws_s3_use_instance_profile
+
+Whether to use Instance Profile and Assumed Role as credential methods for accessing S3. Valid values:
+
+- `true`
+- `false` (Default).
+
+If you use the IAM user-based credential (Access Key and Secret Key) to access S3, you must specify this item as `false`, and specify `aws_s3_access_key` and `aws_s3_secret_key`.
+
+If you use Instance Profile to access S3, you must specify this item as `true`.
+
+If you use Assumed Role to access S3, you must specify this item as `true`, and specify `aws_s3_iam_role_arn`.
+
+If you also use an external AWS account, you must additionally specify `aws_s3_external_id`.
+
+#### aws_s3_access_key
+
+The Access Key ID used to access your S3 bucket.
+
+#### aws_s3_secret_key
+
+The Secret Access Key used to access your S3 bucket.
+
+#### aws_s3_iam_role_arn
+
+The ARN of the IAM role that has privileges on your S3 bucket in which your data files are stored.
+
+#### aws_s3_external_id
+
+The external ID of the AWS account that is used for cross-account access to your S3 bucket.
+
+> **Note**
+>
+> Only credential-related configuration items can be modified after your shared-data StarRocks cluster is created. If you change the original storage path-related configuration items, the databases and tables you created before the change become read-only, and you cannot load data into them.
+
+If you want to create the default storage volume manually after the cluster is created, you only need to add the following configuration items:
+
+```Properties
+run_mode = shared_data
+cloud_native_meta_port =
+enable_load_volume_from_conf = false
+```
+
+## Configure CN nodes for shared-data StarRocks
+
+
+## Use your shared-data StarRocks cluster
+
+
+
+The following example creates a storage volume `def_volume` for an AWS S3 bucket `defaultbucket` with the IAM user-based credential (Access Key and Secret Key), enables the storage volume, and sets it as the default storage volume:
+
+```SQL
+CREATE STORAGE VOLUME def_volume
+TYPE = S3
+LOCATIONS = ("s3://defaultbucket/test/")
+PROPERTIES
+(
+    "enabled" = "true",
+    "aws.s3.region" = "us-west-2",
+    "aws.s3.endpoint" = "https://s3.us-west-2.amazonaws.com",
+    "aws.s3.use_aws_sdk_default_behavior" = "false",
+    "aws.s3.use_instance_profile" = "false",
+    "aws.s3.access_key" = "xxxxxxxxxx",
+    "aws.s3.secret_key" = "yyyyyyyyyy"
+);
+
+SET def_volume AS DEFAULT STORAGE VOLUME;
+```
+
+
diff --git a/versioned_docs/version-3.1/sql-reference/sql-statements/Administration/CREATE STORAGE VOLUME.md b/versioned_docs/version-3.1/sql-reference/sql-statements/Administration/CREATE STORAGE VOLUME.md
index 0777738102..7fdfaee4bd 100644
--- a/versioned_docs/version-3.1/sql-reference/sql-statements/Administration/CREATE STORAGE VOLUME.md
+++ b/versioned_docs/version-3.1/sql-reference/sql-statements/Administration/CREATE STORAGE VOLUME.md
@@ -4,7 +4,7 @@
 
 Creates a storage volume for a remote storage system. This feature is supported from v3.1.
 
-A storage volume consists of the properties and credential information of the remote data storage. You can reference a storage volume when you create databases and cloud-native tables in a [shared-data StarRocks cluster](../../../deployment/deploy_shared_data.md).
+A storage volume consists of the properties and credential information of the remote data storage.
You can reference a storage volume when you create databases and cloud-native tables in a [shared-data StarRocks cluster](../../../deployment/shared_data/s3.md).
 
 > **CAUTION**
 >
diff --git a/versioned_docs/version-3.1/sql-reference/sql-statements/data-definition/CREATE TABLE.md b/versioned_docs/version-3.1/sql-reference/sql-statements/data-definition/CREATE TABLE.md
index 389b92b8c5..081a56d5b9 100644
--- a/versioned_docs/version-3.1/sql-reference/sql-statements/data-definition/CREATE TABLE.md
+++ b/versioned_docs/version-3.1/sql-reference/sql-statements/data-definition/CREATE TABLE.md
@@ -589,7 +589,7 @@ PROPERTIES (
 
 #### Create cloud-native tables for StarRocks Shared-data cluster
 
-To [use your StarRocks Shared-data cluster](../../../deployment/deploy_shared_data.md#use-your-shared-data-starrocks-cluster), you must create cloud-native tables with the following properties:
+To [use your StarRocks Shared-data cluster](../../../deployment/shared_data/s3.md#use-your-shared-data-starrocks-cluster), you must create cloud-native tables with the following properties:
 
 ```SQL
 PROPERTIES (
diff --git a/versioned_docs/version-3.1/table_design/Data_distribution.md b/versioned_docs/version-3.1/table_design/Data_distribution.md
index 59877a6e97..8169240763 100644
--- a/versioned_docs/version-3.1/table_design/Data_distribution.md
+++ b/versioned_docs/version-3.1/table_design/Data_distribution.md
@@ -183,7 +183,7 @@ The number of buckets: By default, StarRocks automatically sets the number of bu
 
 > **NOTICE**
 >
-> Since v3.1, StarRocks's [shared-data mode](../deployment/deploy_shared_data.md) supports the time function expression and does not support the column expression.
+> Since v3.1, StarRocks's [shared-data mode](../deployment/shared_data/s3.md) supports the time function expression and does not support the column expression.
 Since v3.0, StarRocks supports [expression partitioning](./expression_partitioning.md) (previously known as automatic partitioning) which is more flexible and easy-to-use. This partitioning method is suitable for most scenarios such as querying and managing data based on continuous date ranges or enum values.
diff --git a/versioned_docs/version-3.1/table_design/expression_partitioning.md b/versioned_docs/version-3.1/table_design/expression_partitioning.md
index 8b07bfc67a..82d001edbb 100644
--- a/versioned_docs/version-3.1/table_design/expression_partitioning.md
+++ b/versioned_docs/version-3.1/table_design/expression_partitioning.md
@@ -249,7 +249,7 @@ MySQL > SHOW PARTITIONS FROM t_recharge_detail1;
 
 ## Limits
 
-- Since v3.1, StarRocks's [shared-data mode](../deployment/deploy_shared_data.md) supports the time function expression and does not support the column expression.
+- Since v3.1, StarRocks's [shared-data mode](../deployment/shared_data/s3.md) supports the time function expression and does not support the column expression.
 - Currently, using CTAS to create tables configured expression partitioning is not supported.
 - Currently, using Spark Load to load data to tables that use expression partitioning is not supported.
 - When the `ALTER TABLE DROP PARTITION ` statement is used to delete a partition created by using the column expression, data in the partition is directly removed and cannot be recovered.
diff --git a/versioned_docs/version-3.1/table_design/list_partitioning.md b/versioned_docs/version-3.1/table_design/list_partitioning.md
index 609ff620a2..4ca63d3943 100644
--- a/versioned_docs/version-3.1/table_design/list_partitioning.md
+++ b/versioned_docs/version-3.1/table_design/list_partitioning.md
@@ -105,7 +105,7 @@ DISTRIBUTED BY HASH(`id`);
 ## Limits
 
 - List partitioning does support dynamic partitoning and creating multiple partitions at a time.
-- Currently, StarRocks's [shared-data mode](../deployment/deploy_shared_data.md) does not support this feature.
+- Currently, StarRocks's [shared-data mode](../deployment/shared_data/s3.md) does not support this feature.
 - When the `ALTER TABLE DROP PARTITION ;` statement is used to delete a partition created by using list partitioning, data in the partition is directly removed and cannot be recovered.
 - Currently you cannot [backup and restore](../administration/Backup_and_restore.md) partitions created by the list partitioning.
 - Currently, StarRocks does not support creating [asynchronous materialized views](../using_starrocks/Materialized_view.md) with base tables created with the list partitioning strategy.