feat(storage): batch get_compact_task/apply_compact_task in once transaction #15523

Little-Wallace · 2024-03-07T12:49:24Z

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Batch several get_compact_task and apply_compact_task calls together to reduce the cost on the etcd backend.
Batching these operations also means we only need to clone the version once.
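
To make the intent concrete, here is a minimal sketch of the difference between applying task results one at a time and applying them as a batch. `Version`, `TaskResult`, `Cost`, and all function names are hypothetical stand-ins, not the types in `src/meta/src/hummock/manager`. The assignment `*current = next` stands in for a metadata-store transaction, and bumping `id` once per task is a simplification.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the Hummock version; not the real type.
#[derive(Clone, Debug, Default, PartialEq)]
struct Version {
    id: u64,
    levels: BTreeMap<u64, Vec<u64>>, // compaction group id -> SST ids
}

/// Hypothetical task result: replace `removed` SSTs with `added` ones in a group.
struct TaskResult {
    group: u64,
    removed: Vec<u64>,
    added: Vec<u64>,
}

/// Counts version clones and backend commits.
#[derive(Debug, Default, PartialEq)]
struct Cost {
    clones: usize,
    commits: usize,
}

fn apply_one(version: &mut Version, task: &TaskResult) {
    let ssts = version.levels.entry(task.group).or_default();
    ssts.retain(|id| !task.removed.contains(id));
    ssts.extend(task.added.iter().copied());
    version.id += 1;
}

/// One clone and one commit per task.
fn apply_individually(current: &mut Version, tasks: &[TaskResult], cost: &mut Cost) {
    for task in tasks {
        let mut next = current.clone();
        cost.clones += 1;
        apply_one(&mut next, task);
        *current = next; // stands in for one transaction
        cost.commits += 1;
    }
}

/// One clone and one commit for the whole batch.
fn apply_batched(current: &mut Version, tasks: &[TaskResult], cost: &mut Cost) {
    if tasks.is_empty() {
        return;
    }
    let mut next = current.clone();
    cost.clones += 1;
    for task in tasks {
        apply_one(&mut next, task);
    }
    *current = next; // stands in for one transaction
    cost.commits += 1;
}

fn sample() -> (Version, Vec<TaskResult>) {
    let mut base = Version::default();
    base.levels.insert(1, vec![10, 11, 12]);
    let tasks = vec![
        TaskResult { group: 1, removed: vec![10, 11], added: vec![20] },
        TaskResult { group: 1, removed: vec![12], added: vec![21] },
        TaskResult { group: 2, removed: vec![], added: vec![30] },
    ];
    (base, tasks)
}

fn main() {
    let (base, tasks) = sample();
    let (mut a, mut b) = (base.clone(), base);
    let (mut cost_a, mut cost_b) = (Cost::default(), Cost::default());
    apply_individually(&mut a, &tasks, &mut cost_a);
    apply_batched(&mut b, &tasks, &mut cost_b);
    println!("individual: {:?}, batched: {:?}, same result: {}", cost_a, cost_b, a == b);
}
```

Both paths end in the same version; only the number of clones and commits differs.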

Checklist

I have written necessary rustdoc comments
I have added necessary unit tests and integration tests
I have added test labels as necessary. See details.
I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
All checks passed in ./risedev check (or alias, ./risedev c)
My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)

My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Signed-off-by: Little-Wallace <[email protected]>

Little-Wallace · 2024-03-14T04:58:39Z

From a dev environment deployed by a user:

The processing latency of commit_epoch decreases substantially even though we did not change any of its code, because this operation must wait for the write lock on the version. When the time spent in report_compact_task and get_compact_task goes down, commit_epoch can acquire the lock sooner.

src/meta/src/hummock/manager/tests.rs

src/meta/src/hummock/manager/compaction.rs

src/meta/src/hummock/manager/mod.rs

Li0k · 2024-03-25T09:30:47Z

src/storage/src/hummock/compactor/compactor_runner.rs

@@ -460,7 +460,7 @@ pub async fn compact(
    ) * compact_task.splits.len() as u64;

    tracing::info!(


Why was this log removed? It is acceptable to me; we often need to check the input of a task in the prod environment (for tasks that did not execute successfully).

Because the log file grows too large and makes the test fail.

+1. The task input log here is very useful and I suggest we keep it. I have also found that the log size is larger when debugging other issues, and I guess it is related to logging table_ids and table_vnode_partition_info in compact_task_to_string. Should we optimize that instead of removing the log?

OK, I will revert this deletion.
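
One way to keep the log while bounding its size is to print only a prefix of a long id list plus a count of what was omitted. The sketch below is an illustration only; `summarize_ids` is a hypothetical helper and is not part of compact_task_to_string.

```rust
/// Renders at most `max_shown` ids, then a count of the ids left out.
/// Hypothetical helper for illustration only.
fn summarize_ids(ids: &[u32], max_shown: usize) -> String {
    let shown: Vec<String> = ids
        .iter()
        .take(max_shown)
        .map(|id| id.to_string())
        .collect();
    let hidden = ids.len().saturating_sub(max_shown);
    if hidden == 0 {
        format!("[{}]", shown.join(", "))
    } else if shown.is_empty() {
        format!("[... +{} more]", hidden)
    } else {
        format!("[{}, ... +{} more]", shown.join(", "), hidden)
    }
}

fn main() {
    let table_ids: Vec<u32> = (1..=1000).collect();
    // The log line stays short no matter how many tables the task covers.
    println!("existing_table_ids: {}", summarize_ids(&table_ids, 5));
}
```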

Li0k · 2024-03-25T10:33:15Z

src/meta/src/hummock/manager/mod.rs

+                Some(config) => config,
+                None => continue,
+            };
+            if !compaction_statuses.contains_key(&compaction_group_id) {


Please add a comment explaining the lazy initialization here.
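
For readers unfamiliar with the pattern, a minimal sketch of lazily creating a per-group status on first use follows. `GroupStatus` and `status_for` are hypothetical; the real `compaction_statuses` map and `CompactStatus` type carry more state, and the PR itself checks `contains_key` rather than using the entry API shown here.

```rust
use std::collections::HashMap;

/// Hypothetical per-group status; not the real `CompactStatus`.
#[derive(Debug, PartialEq)]
struct GroupStatus {
    group_id: u64,
    pending_tasks: usize,
}

/// Returns the status for `group_id`, creating it on first use.
fn status_for(statuses: &mut HashMap<u64, GroupStatus>, group_id: u64) -> &mut GroupStatus {
    // Lazy initialization: a group only gets a status entry the first
    // time it is considered for scheduling, not when the group is created.
    statuses.entry(group_id).or_insert_with(|| GroupStatus {
        group_id,
        pending_tasks: 0,
    })
}

fn main() {
    let mut statuses = HashMap::new();
    status_for(&mut statuses, 2).pending_tasks += 1;
    status_for(&mut statuses, 2).pending_tasks += 1; // reuses the existing entry
    status_for(&mut statuses, 3); // creates a second entry
    println!("{:?}", statuses);
}
```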

src/meta/src/hummock/manager/mod.rs

Li0k · 2024-03-25T10:45:02Z

src/meta/src/hummock/manager/mod.rs

+                let is_trivial_reclaim = CompactStatus::is_trivial_reclaim(&compact_task);
+                let is_trivial_move = CompactStatus::is_trivial_move_task(&compact_task);
+                if is_trivial_reclaim || (is_trivial_move && can_trivial_move) {
+                    let log_label = if is_trivial_reclaim {


How about using the label directly?
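
One possible reading of this suggestion, sketched under assumptions: derive a single value from the two flags and take the label from it, instead of re-testing the booleans when logging. `TrivialKind`, `classify`, and the label strings are hypothetical; only the condition mirrors the diff above.

```rust
/// Hypothetical classification of a trivially handled task.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TrivialKind {
    Reclaim,
    Move,
}

impl TrivialKind {
    /// Label used for logging or metrics; the strings are assumptions.
    fn label(self) -> &'static str {
        match self {
            TrivialKind::Reclaim => "TrivialReclaim",
            TrivialKind::Move => "TrivialMove",
        }
    }
}

/// Same condition as `is_trivial_reclaim || (is_trivial_move && can_trivial_move)`,
/// but the result already says which case matched.
fn classify(
    is_trivial_reclaim: bool,
    is_trivial_move: bool,
    can_trivial_move: bool,
) -> Option<TrivialKind> {
    if is_trivial_reclaim {
        Some(TrivialKind::Reclaim)
    } else if is_trivial_move && can_trivial_move {
        Some(TrivialKind::Move)
    } else {
        None
    }
}

fn main() {
    if let Some(kind) = classify(false, true, true) {
        println!("handled as {}", kind.label());
    }
}
```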

src/meta/src/hummock/manager/mod.rs

Li0k · 2024-03-26T06:07:23Z

src/meta/src/hummock/manager/mod.rs

+                table_stats_change: table_stats_change.unwrap_or_default(),
+            }])
+            .await?;
+        Ok(rets[0])


Should we return Vec<bool> instead of rets[0]? Assuming that we ensure that all report tasks are successful, then this return value is meaningless
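
A rough sketch of the two shapes under discussion, assuming each flag means "the report matched a known task". `ReportTask`, `report_batch`, and `report_one` are hypothetical names, not the functions in this PR.

```rust
/// Hypothetical report payload; the real one carries status, stats, and more.
struct ReportTask {
    task_id: u64,
}

/// Batch form: one flag per report, in input order.
fn report_batch(known_tasks: &[u64], reports: &[ReportTask]) -> Vec<bool> {
    reports
        .iter()
        .map(|report| known_tasks.contains(&report.task_id))
        .collect()
}

/// Single-task wrapper: a batch of one, so exactly one flag comes back.
/// Using `next()` instead of indexing avoids a panic if that ever changes.
fn report_one(known_tasks: &[u64], report: ReportTask) -> bool {
    report_batch(known_tasks, &[report])
        .into_iter()
        .next()
        .unwrap_or(false)
}

fn main() {
    let known = [1, 2];
    let flags = report_batch(&known, &[ReportTask { task_id: 1 }, ReportTask { task_id: 9 }]);
    println!("batch: {:?}, single: {}", flags, report_one(&known, ReportTask { task_id: 2 }));
}
```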

src/meta/src/hummock/manager/mod.rs

Li0k · 2024-03-26T06:30:17Z

@hzxa21 @zwang28 PTAL. This PR refactors the "get" and "report" logic. I think this is a critical path, and we need to confirm its correctness again and again.

src/meta/src/hummock/manager/mod.rs

hzxa21

Left some early comments first. Will continue the review later.

src/meta/src/hummock/manager/mod.rs

src/meta/src/hummock/manager/compaction.rs

hzxa21 · 2024-03-28T16:28:25Z

src/meta/src/hummock/manager/compaction.rs

+                                        compactor.context_id(),
+                                    );
+                                    self.compactor_manager.remove_compactor(context_id);
+                                    meet_error = true;


Do we need to cancel the generated but unsent tasks in compact_tasks if we exit the loop early? cc @Li0k. I noticed that we also did not cancel the tasks prior to this PR, but I think the effect can be more significant here because we may generate many tasks and lock many SSTs.

Good catch. It worked this way before this PR as well. I think proactively canceling the assignment of all tasks when a send fails is an optimization; however, it requires reacquiring the lock. The previous philosophy was to handle such failures through the heartbeat timeout, which may add a delay of 30s but is simpler and unified.

Fixed in f45011e. @hzxa21
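
The idea discussed in this thread can be sketched as follows: once a send fails, stop sending and collect the failed task plus every task that was generated but never sent, so they can be canceled in one batch. This is a simplified illustration, not the code in f45011e; `dispatch` and the use of bare `u64` task ids are assumptions.

```rust
/// Sends tasks in order. After the first failure nothing else is sent.
/// Returns (sent task ids, task ids to cancel in one batch).
fn dispatch(
    tasks: Vec<u64>,
    mut send: impl FnMut(u64) -> Result<(), String>,
) -> (Vec<u64>, Vec<u64>) {
    let mut sent = Vec::new();
    let mut to_cancel = Vec::new();
    let mut meet_error = false;
    for task_id in tasks {
        if meet_error {
            // Generated but never sent: it still holds its SSTs, so cancel it.
            to_cancel.push(task_id);
            continue;
        }
        match send(task_id) {
            Ok(()) => sent.push(task_id),
            Err(_) => {
                meet_error = true;
                to_cancel.push(task_id);
            }
        }
    }
    (sent, to_cancel)
}

fn main() {
    // Pretend the compactor connection breaks while sending task 3.
    let (sent, to_cancel) = dispatch(vec![1, 2, 3, 4], |id| {
        if id == 3 {
            Err("compactor gone".to_string())
        } else {
            Ok(())
        }
    });
    println!("sent: {:?}, cancel in one batch: {:?}", sent, to_cancel);
}
```

Collecting the ids first and canceling afterwards means the lock only needs to be reacquired once for the whole batch.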

src/meta/src/hummock/manager/compaction.rs

hzxa21 · 2024-03-28T16:37:04Z

src/meta/src/hummock/manager/mod.rs

+
+            {
+                // apply result
+                compact_task.task_status = task.task_status;


Should we add back this debug assert?

debug_assert!( compact_task.task_status() != TaskStatus::Pending, "report pending compaction task" );

I do not understand. Shouldn't the task be 'Pending' before we report it?

src/meta/src/hummock/manager/tests.rs

Li0k · 2024-04-02T05:43:13Z

src/meta/src/hummock/manager/mod.rs

+                                .insert(*table_id, vnode_partition_count);
+                        }
+                    } else {
+                        compact_task.table_vnode_partition = table_to_vnode_partition.clone();


Why is the clone needed on this line?

Because it is in a loop.
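
A small sketch of why the clone is needed: each task owns its own map, so a shared map used across loop iterations has to be copied per task. `CompactTask` here is a hypothetical two-field struct and `build_tasks` is an illustrative helper.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced task type; the real one has many more fields.
struct CompactTask {
    id: u64,
    table_vnode_partition: HashMap<u32, u32>,
}

/// Builds one task per id, each with its own copy of the shared map.
fn build_tasks(ids: &[u64], table_to_vnode_partition: &HashMap<u32, u32>) -> Vec<CompactTask> {
    let mut tasks = Vec::with_capacity(ids.len());
    for &id in ids {
        tasks.push(CompactTask {
            id,
            // Moving the map would only work for the first iteration;
            // every later iteration still needs it, hence the clone.
            table_vnode_partition: table_to_vnode_partition.clone(),
        });
    }
    tasks
}

fn main() {
    let mut shared = HashMap::new();
    shared.insert(7u32, 4u32);
    let tasks = build_tasks(&[1, 2], &shared);
    for task in &tasks {
        println!("task {} -> {:?}", task.id, task.table_vnode_partition);
    }
}
```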

Li0k

LGTM, thanks for the effort.

hzxa21

LGTM

hzxa21 · 2024-04-01T10:12:03Z

src/storage/src/hummock/compactor/compactor_runner.rs

@@ -460,7 +460,7 @@ pub async fn compact(
    ) * compact_task.splits.len() as u64;

    tracing::info!(


+1. The task input log here is very useful and I suggest we keep it. I have also found that the log size is larger when debugging other issues, and I guess it is related to logging table_ids and table_vnode_partition_info in compact_task_to_string. Should we optimize that instead of removing the log?

hzxa21 · 2024-04-02T10:19:22Z

src/meta/src/hummock/manager/mod.rs

+                        for table_id in &compact_task.existing_table_ids {
+                            compact_task
+                                .table_vnode_partition
+                                .insert(*table_id, vnode_partition_count);
+                        }


This is a question instead of a comment because this logic existed prior to this PR: is it possible to enter this branch for cg2 and cg3 (cg with more than one existing_table_ids)? If yes, using compact_task.input.vnode_partition_count for all tables looks strange to me.

No, because split_weight_by_vnode of cg2 and cg3 is always zero.

github-actions bot added the type/feature label Mar 7, 2024

Little-Wallace marked this pull request as ready for review March 7, 2024 13:12

Little-Wallace force-pushed the wallace/batch-get branch from 0e821b7 to 2b64892 Compare March 8, 2024 08:49

batch get

1331a66

Signed-off-by: Little-Wallace <[email protected]>

Little-Wallace force-pushed the wallace/batch-get branch from ff39dae to 1331a66 Compare March 13, 2024 07:49

Little-Wallace added 3 commits March 14, 2024 19:36

apply task in once loop

e2dc747

Signed-off-by: Little-Wallace <[email protected]>

fix type

69e1aa5

Signed-off-by: Little-Wallace <[email protected]>

Merge branch 'main' into wallace/batch-get

8b89287

Little-Wallace mentioned this pull request Mar 15, 2024

fix(meta): trivial move bug #15659

Merged

9 tasks

MrCroxx reviewed Mar 25, 2024

src/meta/src/hummock/manager/tests.rs (outdated, resolved)

src/meta/src/hummock/manager/compaction.rs (outdated, resolved)

Li0k reviewed Mar 25, 2024

src/meta/src/hummock/manager/mod.rs (resolved)

src/meta/src/hummock/manager/mod.rs (resolved)

Li0k reviewed Mar 26, 2024

Li0k requested review from hzxa21 and zwang28 March 26, 2024 06:30

support table schema

910339d

Signed-off-by: Little-Wallace <[email protected]>

Little-Wallace force-pushed the wallace/batch-get branch from 3be0916 to 910339d Compare March 26, 2024 07:47

Little-Wallace added 2 commits March 26, 2024 15:48

Merge branch 'main' into wallace/batch-get

1b1e924

fix metrics

645f758

Signed-off-by: Little-Wallace <[email protected]>

Li0k reviewed Mar 27, 2024

src/meta/src/hummock/manager/mod.rs (resolved)

limit loop count

7c2bf52

Signed-off-by: Little-Wallace <[email protected]>

Little-Wallace force-pushed the wallace/batch-get branch from e185075 to 7c2bf52 Compare March 27, 2024 09:15

Merge branch 'main' into wallace/batch-get

4c80639

hzxa21 reviewed Mar 28, 2024

hzxa21 reviewed Mar 31, 2024

src/meta/src/hummock/manager/tests.rs (outdated, resolved)

address comment

d4b28d7

Signed-off-by: Little-Wallace <[email protected]>

Little-Wallace force-pushed the wallace/batch-get branch from 013cb50 to d4b28d7 Compare April 1, 2024 05:47

fix vnode partition

264cb7f

Signed-off-by: Little-Wallace <[email protected]>

Li0k reviewed Apr 2, 2024

Little-Wallace added 2 commits April 2, 2024 14:32

batch cancel

f45011e

Signed-off-by: Little-Wallace <[email protected]>

fix test

a8fc6f4

Signed-off-by: Little-Wallace <[email protected]>

Li0k approved these changes Apr 2, 2024

hzxa21 approved these changes Apr 2, 2024

Little-Wallace added 2 commits April 2, 2024 19:52

revert log

73f3a79

Signed-off-by: Little-Wallace <[email protected]>

do not print input sst

840b512

Signed-off-by: Little-Wallace <[email protected]>

Little-Wallace added this pull request to the merge queue Apr 2, 2024

Merged via the queue into main with commit 7310910 Apr 2, 2024
26 of 27 checks passed

Little-Wallace deleted the wallace/batch-get branch April 2, 2024 13:36

Li0k mentioned this pull request Apr 8, 2024

perf(compactor): Record changes related to the compactor component #15973

Open

Li0k pushed a commit that referenced this pull request Apr 19, 2024

feat(storage): batch get_compact_task/apply_compact_task in once tran…

bc57c1f

…saction (#15523) Signed-off-by: Little-Wallace <[email protected]>

hzxa21 mentioned this pull request Apr 20, 2024

feat(cherry-pick): batch get_compact_task/apply_compact_task in once transaction (#15523) #16419

Merged

9 tasks

github-merge-queue bot pushed a commit that referenced this pull request Apr 22, 2024

feat(cherry-pick): batch get_compact_task/apply_compact_task in once …

c55d825

…transaction (#15523) (#16419) Signed-off-by: Little-Wallace <[email protected]> Co-authored-by: Wallace <[email protected]>
