
feat(batch): distributed dml #14630

Merged: 14 commits, Jan 18, 2024
Conversation

@chenzl25 (Contributor) commented Jan 17, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

  • Resolves feat: Distributed Insertion #14574
  • Add a session variable batch_enable_distributed_dml to control whether DML is executed in a distributed way.
  • Currently, we use a hash shuffle between the batch insert/delete/update and its input. In the future, we could use round-robin instead to avoid the hash-calculation cost if necessary.
  • Use a sum aggregation to accumulate the affected rows when we plan a distributed DML.
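The plan shape described above can be sketched as a toy model: rows are hash-shuffled across DML executors, each executor reports a partial affected-row count, and a sum on top combines them. This is an illustrative sketch only; `hash_shuffle` and the use of `DefaultHasher` are stand-ins of my own, not RisingWave's actual executors or hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Partition rows by hashing all columns, mimicking the hash shuffle
// inserted between the DML node and its input (illustrative only).
fn hash_shuffle(rows: Vec<(i32, String)>, parallelism: usize) -> Vec<Vec<(i32, String)>> {
    let mut partitions = vec![Vec::new(); parallelism];
    for row in rows {
        let mut h = DefaultHasher::new();
        row.hash(&mut h);
        let idx = (h.finish() as usize) % parallelism;
        partitions[idx].push(row);
    }
    partitions
}

fn main() {
    let rows: Vec<(i32, String)> = (0..1000).map(|i| (i, format!("v{i}"))).collect();
    // Each partition plays the role of one compute node's DML executor.
    let partitions = hash_shuffle(rows, 4);
    // Each executor reports its local affected-row count...
    let partials: Vec<u64> = partitions.iter().map(|p| p.len() as u64).collect();
    // ...and a single sum aggregation on top accumulates the total,
    // like the sum agg placed above the exchange in the distributed plan.
    let affected: u64 = partials.iter().sum();
    assert_eq!(affected, 1000);
}
```

The point of the sum step is that each parallel DML executor only knows its own affected-row count, so the plan needs one final aggregation to report a single number to the client.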

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

  • Introduce a session variable batch_enable_distributed_dml to enable distributed DML, so that insert, delete, and update statements can be executed in a distributed way (i.e., running on multiple compute nodes). There is no atomicity guarantee in this mode; its goal is to achieve the best ingestion performance for initial batch ingestion, where users can always drop the table if a failure happens.

@chenzl25 (Contributor, Author) commented Jan 17, 2024

Experiment:
Table t contains 20 million rows, about 1.6 GB of data.

4 CNs, each with 1 CPU core and 4 GB of memory.

Test: insert-select

CREATE TABLE t2 (a INT, b CHARACTER VARYING) APPEND ONLY;
insert into t2 select * from t;

distributed_dml = true (about 66% higher throughput)

Time: 30 s
CPU utilization is even across the CNs.

distributed_dml = false

Time: 50 s
One CN's CPU utilization reached 100%.


@chenzl25 (Contributor, Author) commented Jan 17, 2024

However, when we use CREATE TABLE t2 (a INT, b CHARACTER VARYING) (without APPEND ONLY) as the destination table, there is no big difference between distributed DML and the non-distributed one, because the conflict check causes back-pressure.

It takes about 180 s, so we need to improve ingestion performance for tables with conflict checking.


@liurenjie1024 (Contributor) left a comment:

LGTM, thanks!

@@ -94,6 +94,10 @@ pub struct ConfigMap {
#[parameter(default = true, rename = "rw_batch_enable_sort_agg")]
batch_enable_sort_agg: bool,

/// Enable distributed DML, so that insert, delete, and update statements can be executed in a distributed way (e.g. running on multiple compute nodes).
#[parameter(default = false, rename = "batch_enable_distributed_dml")]
Member:

Let's do some performance tests and make the default true.

Member:

make default to true

It comes at the cost of no atomicity guarantee. 😕

Contributor (author):

If we support reading external sources (e.g. Iceberg) in the future, we can detect it from the plan and auto-enable it.

Comment on lines +72 to +75
// Add a hash shuffle between the delete and its input.
let new_input = RequiredDist::PhysicalDist(Distribution::HashShard(
(0..self.input().schema().len()).collect(),
))
Member:

If I remember correctly, the hash used in batch exchange is different from streaming, and we have added a so-called ConsistentHash for the streaming hash distribution.

HASH = 3;
CONSISTENT_HASH = 4;

Here which distribution does this generated BatchExchange follow?

Contributor (author):

This one: HASH = 3.

Contributor:

If I remember correctly, the hash used in batch exchange is different from streaming, and we have added a so-called ConsistentHash for the streaming hash distribution.

Using the batch hash distribution here is OK because we still have the stream hash exchange before the materialize executor.
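The two distribution schemes being contrasted can be sketched as follows. This is a simplified illustration under assumed names (`key_hash`, `batch_partition`, `streaming_partition`, and the `DefaultHasher` stand-in are not RisingWave's actual code or hash function): batch HASH maps a key's hash directly to a partition, while streaming CONSISTENT_HASH first maps the hash into a fixed virtual-node space and then uses a vnode-to-worker mapping, so workers can be rebalanced by moving vnodes instead of rehashing keys.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn key_hash<T: Hash>(key: &T) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish()
}

// Batch HASH distribution: partition index taken directly from the hash.
fn batch_partition<T: Hash>(key: &T, parallelism: u64) -> u64 {
    key_hash(key) % parallelism
}

// Streaming CONSISTENT_HASH distribution: the hash first maps into a
// fixed-size virtual-node space, and a vnode-to-worker mapping picks
// the owner; rebalancing moves vnodes without rehashing keys.
fn streaming_partition<T: Hash>(key: &T, vnode_to_worker: &[u64]) -> u64 {
    let vnode = key_hash(key) as usize % vnode_to_worker.len();
    vnode_to_worker[vnode]
}

fn main() {
    const VNODES: usize = 256; // illustrative fixed vnode count
    let workers = 4u64;
    // Round-robin vnode assignment, standing in for the real mapping.
    let mapping: Vec<u64> = (0..VNODES).map(|v| v as u64 % workers).collect();
    let key = (42i32, "row");
    let b = batch_partition(&key, workers);
    let s = streaming_partition(&key, &mapping);
    // The two schemes may route the same key to different workers, which
    // is fine here: the stream hash exchange before the materialize
    // executor re-shuffles rows into the streaming distribution anyway.
    assert!(b < workers && s < workers);
}
```

This is why the reply above holds: the batch exchange only needs to spread work evenly, and correctness of row placement is restored by the later stream hash exchange.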

@BugenZhao (Member):
May I ask how the atomicity is guaranteed under distributed mode?

@chenzl25 (Contributor, Author):

May I ask how the atomicity is guaranteed under distributed mode?

No atomicity guarantee in this mode. Its goal is to gain the best ingestion performance for initial batch ingestion where users always can drop their table when failure happens.

@BugenZhao (Member):

May I ask how the atomicity is guaranteed under distributed mode?

No atomicity guarantee in this mode. Its goal is to gain the best ingestion performance for initial batch ingestion where users always can drop their table when failure happens.

Got it. What about documenting it somewhere?

@BugenZhao BugenZhao added the user-facing-changes Contains changes that are visible to users label Jan 18, 2024
@BugenZhao (Member) left a comment:

Rest LGTM

Comment on lines +41 to +43
BatchSimpleAgg { aggs: [sum()] }
└─BatchExchange { order: [], dist: Single }
└─BatchDelete { table: t }
Member:
This reminds me of an issue in ancient times 🤣

#2678

Contributor (author):

🤣

src/frontend/src/optimizer/plan_node/batch_update.rs (outdated, resolved)
src/frontend/src/scheduler/distributed/stage.rs (outdated, resolved)
@chenzl25 chenzl25 requested review from fuyufjh and BugenZhao January 18, 2024 04:27
@st1page (Contributor) commented Jan 18, 2024

However, when we use CREATE TABLE t2 (a INT, b CHARACTER VARYING) as the destination table. There is no big difference between distributed DML and the non-distributed one, because conflict check causes a back-pressure.

It takes about 180s, so we need to improve the table (with conflict check) ingestion performance.

It is strange, and we need to do some investigation later. Because in the experiment:

Experiment:
Table t contains 20 million rows, about 1.6 GB of data.

4 CNs, each with 1 CPU core and 4 GB of memory.

All keys should be in the cache, and conflict handling should not be the bottleneck. Do all newly inserted keys have new primary keys? Maybe an in-memory Bloom filter can help the situation.

@chenzl25 (Contributor, Author):

All keys should be in the cache, and conflict handling should not be the bottleneck. Do all newly inserted keys have new primary keys? Maybe an in-memory Bloom filter can help the situation.

Table t could be in the cache, but t2 is a newly created table and the compactor is running.

@st1page (Contributor) commented Jan 18, 2024

All keys should be in the cache, and conflict handling should not be the bottleneck. Do all newly inserted keys have new primary keys? Maybe an in-memory Bloom filter can help the situation.

Table t could be in the cache, but t2 is a newly created table and the compactor is running.

OK, so for non-append-only tables without a primary key, we need a way to check whether the row_id in an inserted row was generated by row_id_gen (in which case there is no need to handle conflicts)?

@chenzl25 (Contributor, Author):

Table t could be in the cache, but t2 is a newly created table and the compactor is running.

Let me kill the compactor to test it again first as suggested by @hzxa21

@chenzl25 (Contributor, Author):

Let me kill the compactor to test it again first, as suggested by @hzxa21.

Without compactors, barriers pile up during batch ingestion.

@chenzl25 (Contributor, Author) commented Jan 18, 2024

OK, so for non-append-only tables without a primary key, we need a way to check whether the row_id in an inserted row was generated by row_id_gen (in which case there is no need to handle conflicts)?

#14635: I think we can just disable the conflict check for tables without any downstream materialized views. Our storage actually provides overwrite semantics.

@chenzl25 chenzl25 added this pull request to the merge queue Jan 18, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 18, 2024
@chenzl25 chenzl25 added this pull request to the merge queue Jan 18, 2024
Merged via the queue into main with commit 0f79291 Jan 18, 2024
27 of 28 checks passed
@chenzl25 chenzl25 deleted the dylan/distributed_dml branch January 18, 2024 15:40
Little-Wallace pushed a commit that referenced this pull request Jan 20, 2024
Labels: type/feature, user-facing-changes (Contains changes that are visible to users)

Successfully merging this pull request may close these issues: feat: Distributed Insertion (#14574)

5 participants