Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(storage): fix compactor task parallelism race #16387
fix(storage): fix compactor task parallelism race #16387
Changes from 6 commits
de6db46
b8666cd
874ec03
a384c99
9931c74
0fd60f9
f136e6f
d8b1526
f000910
a42ec94
3b4b613
c83d81d
dc9b272
9f4b393
3e9bbd7
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this line causes a behavior change. Prior to this PR, when the condition is hit, we will only have one split. After this PR, we will have two splits. Is this expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code implementation was traced back:
@Little-Wallace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not a good practice to rely on the caller's behavior to make sure the implementation of a function is expected. It is too implicit. What's the problem of directing returning 1 split
if indexes.len() <= parallelism
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After refactoring the calculation, we can unify the parallelism calculation and eliminate the overuse of
generate_splits_fast
. We can then revert this code.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using
generate_splits_fast
here seems to be an overkill because we don't actually need to calculate the index from sstable info and generate the split in order to know the task parallelism. We just need the logic in L183-194 and the input sst len:Also, I think we can pass the parallelism as a parameter to
generate_split
andgenerate_split_fast
because we already pre-calculate it after this PR.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I noticed that the parallelism calculation is not exactly the same in
generate_splits
(L268-L279) andgenerate_splits_fast
(L183-L194). Is this expected? Which one should we follow here?