[Refactor](data-partition) Refactor data partition docs (#1364)
1. Fix the incorrect description of NULL Range Partition
2. Fix the missing documentation of Multi-Column Auto List Partition
3. Explain the combined support of Auto and Dynamic Partition and describe the best practice
4. Fix the wrong version number and docs for the partitions TVF
5. Show examples directly in basic-concepts
6. Add a Partition Retrieval section in basic-concepts
7. Fix the wrong version number for insert overwrite + auto partition

testcases sync: apache/doris#44191

# Versions 

- [x] dev
- [x] 3.0
- [x] 2.1
- [ ] 2.0

# Languages

- [x] Chinese
- [x] English
zclllyybb authored Nov 19, 2024
1 parent a665b9d commit 59ffabb
Showing 31 changed files with 1,818 additions and 688 deletions.
Original file line number Diff line number Diff line change
@@ -24,8 +24,7 @@ specific language governing permissions and limitations
under the License.
-->

## auto_partition_name
### description
### Description
#### Syntax

`VARCHAR AUTO_PARTITION_NAME('RANGE', 'VARCHAR unit', DATETIME datetime)`
@@ -40,9 +39,9 @@ The datetime parameter is a legal date expression.

The unit parameter specifies the desired time interval. The available values are: [`second`, `minute`, `hour`, `day`, `month`, `year`].
If unit does not match one of these options, a syntax error is returned.
### example

```
### Example
```sql
mysql> select auto_partition_name('range', 'years', '123');
ERROR 1105 (HY000): errCode = 2, detailMessage = range auto_partition_name must accept year|month|day|hour|minute|second for 2nd argument

@@ -108,9 +107,8 @@ mysql> select auto_partition_name('list', "你好");
+------------------------------------+
| p4f60597d2 |
+------------------------------------+
```

### keywords
### Keywords

AUTO_PARTITION_NAME,AUTO,PARTITION,NAME
Original file line number Diff line number Diff line change
@@ -36,8 +36,6 @@ The table function generates a temporary partition TABLE, which allows you to vi

This function is used in the from clause.

This function is supported since 2.1.5

#### Syntax

`partitions("catalog"="","database"="","table"="")`
173 changes: 104 additions & 69 deletions docs/table-design/data-partitioning/auto-partitioning.md

Large diffs are not rendered by default.

206 changes: 202 additions & 4 deletions docs/table-design/data-partitioning/basic-concepts.md

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
@@ -24,8 +24,7 @@ specific language governing permissions and limitations
under the License.
-->

## auto_partition_name
### description
### Description
#### Syntax

`VARCHAR AUTO_PARTITION_NAME('RANGE', 'VARCHAR unit', DATETIME datetime)`
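@@ -40,9 +39,9 @@ The datetime parameter is a legal date expression.

The unit parameter specifies the desired time interval. The available values are: [`second`, `minute`, `hour`, `day`, `month`, `year`].
If unit does not match one of these options, a syntax error is returned.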
### example

```
### Example
```sql
mysql> select auto_partition_name('range', 'years', '123');
ERROR 1105 (HY000): errCode = 2, detailMessage = range auto_partition_name must accept year|month|day|hour|minute|second for 2nd argument

@@ -108,9 +107,8 @@ mysql> select auto_partition_name('list', "你好");
+------------------------------------+
| p4f60597d2 |
+------------------------------------+
```

### keywords
### Keywords

AUTO_PARTITION_NAME,AUTO,PARTITION,NAME
Original file line number Diff line number Diff line change
@@ -36,8 +36,6 @@ partitions

This function is used in the FROM clause.

This function is supported since version 2.1.5.

#### Syntax

`partitions("catalog"="","database"="","table"="")`
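
As a rough illustration of the syntax above, the query below lists the partitions of a single table. The database and table names (`example_db`, `example_tbl`) are placeholders, and the selected columns are taken from the `partitions` output shown elsewhere in this commit. Treat it as a sketch, not a reference.

```sql
-- Hypothetical names: replace example_db / example_tbl with your own.
SELECT PartitionId, PartitionName, State, Buckets
FROM partitions("catalog"="internal", "database"="example_db", "table"="example_tbl")
ORDER BY PartitionName;
```

Because the function is used in the FROM clause, its result can be filtered and sorted like any other table.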
Original file line number Diff line number Diff line change
@@ -72,49 +72,36 @@ PROPERTIES (
);
```



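This table stores a large amount of historical business data, partitioned by the date on which each transaction occurred. As you can see, partitions must be created manually in advance when the table is created. If the data range of the partition column changes, for example when data for 2022 is added to the table above, the table's partitions must be changed through [ALTER-TABLE-PARTITION](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION). If such partitions need to be changed, or subdivided at a finer granularity, the modification is very tedious. In this case, the table DDL can be rewritten using AUTO PARTITION.

## Syntax

When creating a table, use the following syntax to fill in the`partition_info`part of [CREATE-TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE):
When creating a table, use the following syntax to fill in the `partition_info` part of [CREATE-TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE):

1. AUTO RANGE PARTITION: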

```sql
```sql
AUTO PARTITION BY RANGE (FUNC_CALL_EXPR)
(
)
```



Where
()
```

```sql
Where
```sql
FUNC_CALL_EXPR ::= date_trunc ( <partition_column>, '<interval>' )
```



Note: in version 2.1.0, `FUNC_CALL_EXPR` does not need to be enclosed in parentheses.
```

2. AUTO LIST PARTITION:

```sql
AUTO PARTITION BY LIST(`partition_col`)
(
)
AUTO PARTITION BY LIST(`partition_col1`[, `partition_col2`, ...])
()
```



### Usage examples

1. AUTO RANGE PARTITION

```sql
```sql
CREATE TABLE `date_table` (
`TIME_STAMP` datev2 NOT NULL COMMENT 'collection date'
) ENGINE=OLAP
@@ -126,13 +113,11 @@ AUTO PARTITION BY LIST(`partition_col`)
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
```


```

2. AUTO LIST PARTITION

```sql
```sql
CREATE TABLE `str_table` (
`str` varchar not null
) ENGINE=OLAP
@@ -144,7 +129,9 @@ AUTO PARTITION BY LIST(`partition_col`)
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
```
```

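AUTO LIST PARTITION supports multiple partition columns, written the same way as for regular LIST partitioning: ```AUTO PARTITION BY LIST (`col1`, `col2`, ...)```

### Constraints

@@ -155,9 +142,9 @@ AUTO PARTITION BY LIST(`partition_col`)

### NULL-valued partition

When the session variable `allow_partition_column_nullable` is enabled, both LIST and RANGE partitioning support NULL columns as partition columns. When a NULL value is actually inserted into the partition column:
When the session variable `allow_partition_column_nullable` is enabled:

1. For AUTO LIST PARTITION, the corresponding NULL-valued partition is created automatically:
1. For AUTO LIST PARTITION, a NULLABLE column can be used as the partition column, and the corresponding NULL-valued partition is created normally: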

```sql
mysql> create table auto_null_list(
@@ -190,9 +177,7 @@ mysql> select * from auto_null_list partition(pX);
1 row in set (0.20 sec)
```



1. For AUTO RANGE PARTITION, **NULLABLE columns are not supported as partition columns**:
2. For AUTO RANGE PARTITION, **NULLABLE columns are not supported as partition columns**:

```sql
mysql> CREATE TABLE `range_table_nullable` (
@@ -211,8 +196,6 @@ mysql> CREATE TABLE `range_table_nullable` (
ERROR 1105 (HY000): errCode = 2, detailMessage = AUTO RANGE PARTITION doesn't support NULL column
```


## Example scenario
For the example in the usage scenario section, the table DDL can be rewritten with AUTO PARTITION as follows:
@@ -234,9 +217,7 @@ PROPERTIES (
);
```


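At this point, the new table has no default partitions:
Taking a version of this table with only two columns as an example, the new table has no default partitions at this point: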
```sql
mysql> show partitions from `DAILY_TRADE_VALUE`;
@@ -258,16 +239,59 @@ mysql> show partitions from `DAILY_TRADE_VALUE`;
| 180018 | p20140101000000 | 2 | 2023-09-18 21:49:29 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2014-01-01]; ..types: [DATEV2]; keys: [2015-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true |
+-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+
3 rows in set (0.12 sec)
```
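A PARTITION created by the auto partition feature has exactly the same functional properties as one created manually.
## Using with Dynamic Partition
To keep the partitioning logic clear, Doris prohibits Auto Partition and Dynamic Partition from acting on the same table at the same time. This usage is prone to misuse and should be replaced by Auto Partition alone.
Doris supports using Auto Partition and Dynamic Partition together. In that case, both features take effect:
1. Auto Partition automatically creates partitions on demand during data import;
2. Dynamic Partition automatically creates, recycles, and dumps partitions.
The two do not conflict in syntax or function; simply set the corresponding clause and properties together.
### Best practice
Note: in some early versions of Doris 2.1 this feature was not prohibited, but its use is not recommended.
In scenarios where the partition lifecycle must be limited, you can **disable partition creation by Dynamic Partition and leave partition creation entirely to Auto Partition**, then manage the partition lifecycle through Dynamic Partition's ability to recycle partitions dynamically: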
```sql
create table auto_dynamic(
k0 datetime(6) NOT NULL
)
auto partition by range (date_trunc(k0, 'year'))
(
)
DISTRIBUTED BY HASH(`k0`) BUCKETS 2
properties(
"dynamic_partition.enable" = "true",
"dynamic_partition.prefix" = "p",
"dynamic_partition.start" = "-50",
"dynamic_partition.end" = "0", -- Dynamic Partition does not create partitions
"dynamic_partition.time_unit" = "year",
"replication_num" = "1"
);
```
This way we get the flexibility of Auto Partition while keeping partition names consistent.
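
A quick way to see this combination at work might be to load a row and then list the partitions. This is a sketch that assumes the `auto_dynamic` table defined above already exists. The inserted timestamp is an arbitrary illustration value.

```sql
-- Auto Partition is expected to create the covering yearly partition on demand.
INSERT INTO auto_dynamic VALUES ('2024-05-06 12:00:00');

-- Inspect which partitions exist after the load.
SHOW PARTITIONS FROM auto_dynamic;
```

Partitions that fall outside the window set by `dynamic_partition.start` are the ones Dynamic Partition would be expected to recycle.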
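## Partition management
When auto partitioning is enabled, partition names can be mapped to partitions with the `auto_partition_name` function. The `partitions` table function produces detailed partition information from partition names. Taking the `DAILY_TRADE_VALUE` table again as an example, after inserting data we view its current partitions: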
```sql
mysql> select * from partitions("catalog"="internal","database"="optest","table"="DAILY_TRADE_VALUE") where PartitionName = auto_partition_name('range', 'year', '2008-02-03');
+-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+
| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | SyncWithBaseTables | UnsyncTables |
+-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+
| 127095 | p20080101000000 | 2 | 2024-11-14 17:29:02 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2008-01-01]; ..types: [DATEV2]; keys: [2009-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | \N | 985.000 B | 0 | tag.location.default: 1 | 1 | 1 | \N |
+-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+
1 row in set (0.18 sec)
```
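This way, the ID and values of each partition can be filtered out precisely for later partition-specific operations (for example, `insert overwrite partition`).
For detailed syntax, see the [auto_partition_name function documentation](../../sql-manual/sql-functions/string-functions/auto-partition-name) and the [partitions table function documentation](../../sql-manual/sql-functions/table-valued-functions/partitions).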
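
As a sketch of such a follow-up operation, the partition name can be resolved first and then used in an overwrite. The staging table `daily_trade_value_staging` is hypothetical, and the literal partition name is the one shown in the listing above. Adapt both to your own data.

```sql
-- Step 1: resolve the name of the yearly partition that covers a given date.
SELECT auto_partition_name('range', 'year', '2008-02-03');

-- Step 2: overwrite only that partition.
-- daily_trade_value_staging is a hypothetical table with the same schema.
INSERT OVERWRITE TABLE DAILY_TRADE_VALUE PARTITION(p20080101000000)
SELECT * FROM daily_trade_value_staging
WHERE TRADE_DATE >= '2008-01-01' AND TRADE_DATE < '2009-01-01';
```

Resolving the name through the function, instead of assembling it by hand, keeps the statement aligned with whatever naming rule the table's auto partitioning applies.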
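## Notes
@@ -276,5 +300,9 @@ mysql> show partitions from `DAILY_TRADE_VALUE`;
- A table that uses AUTO PARTITION differs only in that its partitions are created automatically instead of manually. The table and the partitions it creates are used in the same way as non-AUTO PARTITION tables and partitions.
- To prevent too many partitions from being created by accident, the `max_auto_partition_num` item in the [FE configuration](../../admin-manual/config/fe-config) limits the maximum number of partitions an AUTO PARTITION table can hold. Adjust this value if necessary.
- When importing data into a table with AUTO PARTITION enabled, the polling interval at which the Coordinator sends data differs from that of ordinary tables. See `olap_table_sink_send_interval_auto_partition_factor` in the [BE configuration](../../admin-manual/config/be-config) for details. This variable has no effect once memtable on sink node is enabled (`enable_memtable_on_sink_node = true`).
- When inserting data with [insert-overwrite](../../sql-manual/sql-statements/Data-Manipulation-Statements/Manipulation/INSERT-OVERWRITE), if a partition to overwrite is specified, the AUTO PARTITION table behaves like an ordinary table during the process and does not create new partitions.
- For the behavior of AUTO PARTITION tables when inserting data with [insert-overwrite](../../sql-manual/sql-statements/Data-Manipulation-Statements/Manipulation/INSERT-OVERWRITE), see the INSERT OVERWRITE documentation.
- If the table is involved in other metadata operations (such as Schema Change or Rebalance) while an import is creating partitions, the import may fail.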
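
If the partition limit needs to be inspected or raised, statements like the following could be run against the FE. This sketch assumes `max_auto_partition_num` can be changed at runtime on your version; if it cannot, set it in `fe.conf` instead. The value 4000 is only an illustration.

```sql
-- Check the current limit.
ADMIN SHOW FRONTEND CONFIG LIKE 'max_auto_partition_num';

-- Raise the limit (illustrative value; assumes the item is runtime-mutable).
ADMIN SET FRONTEND CONFIG ("max_auto_partition_num" = "4000");
```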
## Keywords
AUTO, PARTITION, AUTO_PARTITION