-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(storage): batch get_compact_task/apply_compact_task in once transaction #15523
Conversation
0e821b7
to
2b64892
Compare
Signed-off-by: Little-Wallace <[email protected]>
ff39dae
to
1331a66
Compare
Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
@@ -460,7 +460,7 @@ pub async fn compact( | |||
) * compact_task.splits.len() as u64; | |||
|
|||
tracing::info!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does remove this log? It is acceptable to me, we often need to check the input of task in prod environment (without successful execution)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the size log file is too large and make test fail....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. The task input log here is very useful and I suggest we keep it. I also find that the log size is larger when debugging other issues and I guess it is related to logging table_ids and table_vnode_partition_info
compact_task_to_string
. Should we optimize that instead of removing the log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will revert this deletion.
Some(config) => config, | ||
None => continue, | ||
}; | ||
if !compaction_statuses.contains_key(&compaction_group_id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comments for lazy initialization
let is_trivial_reclaim = CompactStatus::is_trivial_reclaim(&compact_task); | ||
let is_trivial_move = CompactStatus::is_trivial_move_task(&compact_task); | ||
if is_trivial_reclaim || (is_trivial_move && can_trivial_move) { | ||
let log_label = if is_trivial_reclaim { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about using the labe directly ?
table_stats_change: table_stats_change.unwrap_or_default(), | ||
}]) | ||
.await?; | ||
Ok(rets[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return Vec<bool> instead of rets[0]? Assuming that we ensure that all report tasks are successful, then this return value is meaningless
Signed-off-by: Little-Wallace <[email protected]>
3be0916
to
910339d
Compare
Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
e185075
to
7c2bf52
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some earlir comments first. Will continue review later.
compactor.context_id(), | ||
); | ||
self.compactor_manager.remove_compactor(context_id); | ||
meet_error = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to cancel the unsend but generated tasks in compact_tasks
if we exit the loop early? cc @Li0k I noticed that we also don't cancel the task prior to this PR but I think the effect can be more significant here because we may generate many tasks and lock many SSTs here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, it worked before this PR. I think it is an optimization to proactively cancel the assignment of all tasks when send fails. However, it requires reacquiring the lock. The previous philosophy was to handle the failure problem through heartbeat timeout, which may delay the 30s, but it is simpler and unified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
{ | ||
// apply result | ||
compact_task.task_status = task.task_status; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add back this debug assert?
debug_assert!(
compact_task.task_status() != TaskStatus::Pending,
"report pending compaction task"
);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand, The task shall be 'Pending' before we report it ?
Signed-off-by: Little-Wallace <[email protected]>
013cb50
to
d4b28d7
Compare
Signed-off-by: Little-Wallace <[email protected]>
.insert(*table_id, vnode_partition_count); | ||
} | ||
} else { | ||
compact_task.table_vnode_partition = table_to_vnode_partition.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why need clone this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it is in a loop
Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -460,7 +460,7 @@ pub async fn compact( | |||
) * compact_task.splits.len() as u64; | |||
|
|||
tracing::info!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. The task input log here is very useful and I suggest we keep it. I also find that the log size is larger when debugging other issues and I guess it is related to logging table_ids and table_vnode_partition_info
compact_task_to_string
. Should we optimize that instead of removing the log?
for table_id in &compact_task.existing_table_ids { | ||
compact_task | ||
.table_vnode_partition | ||
.insert(*table_id, vnode_partition_count); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a question instead of a comment because this logic existed prior to this PR: is it possible to enter this branch for cg2 and cg3 (cg with more than one existing_table_ids
)? If yes, using compact_task.input.vnode_partition_count
for all tables looks strange to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because split_weight_by_vnode
of cg2 and cg2 will always equals to zero.
Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
…saction (#15523) Signed-off-by: Little-Wallace <[email protected]>
…transaction (#15523) (#16419) Signed-off-by: Little-Wallace <[email protected]> Co-authored-by: Wallace <[email protected]>
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.