Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(storage): refactor ReportTask event to reduce the network packet size #11711

Closed
wants to merge 306 commits into from

Conversation

Li0k
Copy link
Contributor

@Li0k Li0k commented Aug 16, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Due to the upgrade of tonic, in some scenarios we found that the size of the ReportTask Event exceeded 4m.

The reason is that the CompactTask carries KeyRange information, so the size of the CompactTask will change depending on the key size of the business.

Since the original compact task is already stored in the assignment, the ReportTask doesn't need to pass back the entire compact task, just the sorted_output_ssts are sufficient. This pr is modified from the above:

  1. Simplify the parameters of ReportEvent
  2. refactoring the parameters of the report_compact_task interface
  • Ensure that all tasks find assignment in memory.
  • Find task at report time based on assignment
  • Do not change information about the task externally

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@codecov
Copy link

codecov bot commented Aug 17, 2023

Codecov Report

Merging #11711 (86f6889) into main (0874a48) will increase coverage by 0.00%.
The diff coverage is 85.07%.

❗ Current head 86f6889 differs from pull request most recent head 8c8fc64. Consider uploading reports for the commit 8c8fc64 to get more accurate results

@@           Coverage Diff            @@
##             main   #11711    +/-   ##
========================================
  Coverage   70.28%   70.28%            
========================================
  Files        1366     1366            
  Lines      228417   228519   +102     
========================================
+ Hits       160541   160624    +83     
- Misses      67876    67895    +19     
Flag Coverage Δ
rust 70.28% <85.07%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
src/meta/src/manager/notification.rs 73.06% <ø> (ø)
src/storage/src/hummock/compactor/mod.rs 67.81% <0.00%> (-0.17%) ⬇️
src/meta/src/hummock/manager/worker.rs 74.68% <66.66%> (-0.63%) ⬇️
src/meta/src/hummock/manager/mod.rs 61.99% <75.16%> (+0.24%) ⬆️
src/meta/src/hummock/test_utils.rs 96.72% <88.88%> (-0.23%) ⬇️
src/storage/hummock_test/src/compactor_tests.rs 96.46% <90.16%> (-0.44%) ⬇️
src/storage/hummock_test/src/sync_point_tests.rs 97.61% <93.33%> (-0.22%) ⬇️
...ta/src/hummock/manager/compaction_group_manager.rs 79.05% <100.00%> (+0.14%) ⬆️
src/meta/src/hummock/manager/tests.rs 93.92% <100.00%> (-0.02%) ⬇️

... and 12 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Li0k Li0k changed the title feat(storage): trim compact task before report task refactor(storage): refactor ReportTask event to reduce the network packet size Aug 17, 2023
{
let is_trivial_reclaim = CompactStatus::is_trivial_reclaim(&compact_task);
let is_trivial_move = CompactStatus::is_trivial_move_task(&compact_task);
let can_trivial_move: bool =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_trivial_reclaim, is_trivial_move and can_trivial_move are already calculated inside get_compact_task_impl.

Copy link
Contributor

@Little-Wallace Little-Wallace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 1059 to 1064
let compaction = compaction_guard.deref_mut();
let mut compact_task_assignment =
BTreeMapTransaction::new(&mut compaction.compact_task_assignment);

// remove task_assignment
let _ = compact_task_assignment.remove(compact_task.task_id);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two issues here:

  1. Why do we need to explicitly remove the task assignment on failure? Shouldn't it be done in report_compact_task_impl?
  2. We only new a BTreeMapTransaction and don't commit it to etcd/memory. This makes the remove a no-op

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this task commit failed, we could still remove its assignment because we will never commit it again

Comment on lines 1088 to 1093
let compaction = compaction_guard.deref_mut();
let mut compact_task_assignment =
BTreeMapTransaction::new(&mut compaction.compact_task_assignment);

// remove task_assignment
let _ = compact_task_assignment.remove(compact_task.task_id);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

self.check_state_consistency().await;
} else {
// For trivial tasks, commit the memory state for subsequent reports.
compact_task_assignment.commit();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only commit task assignment to memory here assuming the caller will them report the task and clear the task assignment later. This logic is too subtle and is prone to error (easy to cause inconsistency between memory and etcd state). I suggest we switch to one of the following options:
a. commit the task assignment to both etcd and memory regardless of what task it is.
b. do not populate the task assignment at all for trivial task and report directly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't commit compact_status to both memory and etcd for trivial task. Will this be a problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because trivial task shall not change compact_status.

Copy link
Contributor Author

@Li0k Li0k Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hzxa21 This point also confuse me.

The trivial task is added to the task assignment in order to take it out when reporting the task. Since we are no longer passing the entire compact task structure. And, if we record the trivial-task's task assignment in etcd, this will take even more time.

tabVersion and others added 19 commits September 15, 2023 15:01
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: TennyZhuang <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: TennyZhuang <[email protected]>
kwannoel and others added 27 commits September 15, 2023 15:25
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Bugen Zhao <[email protected]>
Signed-off-by: Runji Wang <[email protected]>
Co-authored-by: Runji Wang <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…nto li0k/storage_reduce_report_event_size
@Li0k Li0k requested a review from a team as a code owner September 15, 2023 10:09
…nto li0k/storage_reduce_report_event_size
@Li0k Li0k closed this Sep 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.