Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit pages size to a configurable limit #14994
Limit pages size to a configurable limit #14994
Changes from all commits
90564dc
1624db3
6cd5b36
1cf1aa0
be9b736
65cea5c
cb4101f
79a4eb0
6766b34
0def1aa
00e0829
2f0fba8
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing something in somewhere else in this PR, but doesn't GlobalSortTargetSizeShuffleSpec enforce the limit on the total partition size summed across all workers? Since we create a new page for each worker parition combination, would the limit be enforced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GlobalSortTargetSizeShuffleSpec enforces a limit on partition size globally yes.
So there can be 2 cases:
If the last stage is group by post shuffle, then we know that each partition will only be present on distinct worker only. Hence the page size will control the number of rows in that partition.
If the last stage is scanStage, then we add a new
QueryResultFrameProcessor
since data needs to be sorted on the boost column. The queryResultFrameProcessor will merge the result in the same partition and write out a single partition. Since the partition cuts on sizes are done globally, in the controller, we would have the final partition equal to the page size configured.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a new testcase
testExternSelectWithMultipleWorkers
. You can look at the counter checks to see whats happening with a scan query.