Limit pages size to a configurable limit #14994

Merged: 12 commits into apache:master on Oct 12, 2023

Conversation

@cryptoe (Contributor) commented Sep 15, 2023

Adding the ability to limit the page sizes of select queries.

  • We piggyback on the same machinery that is used to control the numRowsPerSegment.
  • This patch introduces a new context parameter, rowsPerPage, whose default value is 100,000 rows (a sketch of reading this parameter follows this list).
  • This patch also optimizes the plan by adding the last selectResults stage only when the previous stages have sorted outputs. Previously, this extra selectResults stage was added for every select query with selectDestination=durableStorage.
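
As a hedged illustration, roughly how the new parameter might be read from the query context. The parameter name and default come from the description above; the exact shape of MultiStageQueryContext.getRowsPerPage and of the QueryContext accessor are assumptions, not the actual code:

    public static final String CTX_ROWS_PER_PAGE = "rowsPerPage";
    private static final int DEFAULT_ROWS_PER_PAGE = 100_000;

    // Returns the configured page size, falling back to 100,000 rows when the
    // caller does not set rowsPerPage in the query context.
    public static int getRowsPerPage(final QueryContext queryContext)
    {
      return queryContext.getInt(CTX_ROWS_PER_PAGE, DEFAULT_ROWS_PER_PAGE);
    }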

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • (unchecked) a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions bot added labels: Area - Documentation, Area - Batch Ingestion, Area - Querying, Area - MSQ (for multi stage queries: https://github.com/apache/druid/issues/12262) on Sep 15, 2023
@LakshSingla (Contributor) left a comment

I was only partway through the review, but this question popped up: it seems like we are adding a new stage to partition by page size. That doesn't seem optimal, since adding a new stage means we'd be shuffling again.
When fault tolerance is enabled, this also means an extra round of upload and download. There would be a performance penalty associated with this, since this feature is intended for querying from deep storage, where the result set would be huge.
If the above is true, I think we should look for an alternative to creating a new stage.

Comment on lines 218 to 230
.setExpectedCountersForStageWorkerChannel(
CounterSnapshotMatcher
.with().rows(6).frames(1),
0, 0, "output"
)
.setExpectedCountersForStageWorkerChannel(
CounterSnapshotMatcher
.with().rows(6).frames(1),
0, 0, "shuffle"
)
Contributor

Why are we removing the assertion?

Contributor Author

It was more of an in-flux thing. Fixed it.

return channelCounters;
}

public static class TestFrame extends Frame
Contributor

Suggested change
public static class TestFrame extends Frame
private static class TestFrame extends Frame

Let's not create a separate class for a single use.

Contributor Author

I just removed this class and went with mocks.
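
For reference, a minimal sketch of that approach with Mockito, mocking Frame instead of subclassing it so the constructor can stay private. The stubbed accessor name is an assumption, not necessarily what the test uses:

    final Frame frame = Mockito.mock(Frame.class);
    // Stub only the behaviour the test needs, rather than extending Frame.
    Mockito.when(frame.numRows()).thenReturn(6);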

@@ -105,7 +105,7 @@ public class Frame
private final int numRegions;
private final boolean permuted;

private Frame(Memory memory, FrameType frameType, long numBytes, int numRows, int numRegions, boolean permuted)
protected Frame(Memory memory, FrameType frameType, long numBytes, int numRows, int numRegions, boolean permuted)
@LakshSingla (Contributor) commented Sep 21, 2023

Suggested change
protected Frame(Memory memory, FrameType frameType, long numBytes, int numRows, int numRegions, boolean permuted)
private Frame(Memory memory, FrameType frameType, long numBytes, int numRows, int numRegions, boolean permuted)

Please revert this change. We shouldn't change the access level in the main class to allow testing. At worst, we can mock the class in the tests, unless we want to assert something that would never be possible that way and are setting up a defensive check for it.

We are basically annotating the constructor here with @VisibleForTesting without saying so.
#11848 (comment)

);
if (finalShuffleStageDef.doesSortDuringShuffle()) {
final QueryDefinitionBuilder builder = QueryDefinition.builder();
for (final StageDefinition stageDef : queryDef.getStageDefinitions()) {
Contributor

nit: could use org.apache.druid.msq.kernel.QueryDefinitionBuilder#addAll(org.apache.druid.msq.kernel.QueryDefinition) here
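
For illustration, the nit amounts to replacing the loop with a single call (a sketch assuming addAll copies every stage definition from the given QueryDefinition into the builder):

    final QueryDefinitionBuilder builder = QueryDefinition.builder();
    // Equivalent to iterating queryDef.getStageDefinitions() and adding each stage.
    builder.addAll(queryDef);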

// we add a final stage which generates one partition per worker.
shuffleSpecFactory = ShuffleSpecFactories.globalSortWithMaxPartitionCount(tuningConfig.getMaxNumWorkers());
shuffleSpecFactory = ShuffleSpecFactories.getGlobalSortWithTargetSize(
MultiStageQueryContext.getRowsPerPage(querySpec.getQuery().context())
Contributor

I might be missing something somewhere else in this PR, but doesn't GlobalSortTargetSizeShuffleSpec enforce the limit on the total partition size summed across all workers? Since we create a new page for each worker-partition combination, would the limit be enforced?

Contributor Author

Yes, GlobalSortTargetSizeShuffleSpec enforces a limit on partition size globally.

So there can be 2 cases:

  1. If the last stage is group-by post-shuffle, then we know that each partition will be present on only one distinct worker. Hence the page size will control the number of rows in that partition.

  2. If the last stage is a scan stage, then we add a new QueryResultFrameProcessor, since the data needs to be sorted on the boost column. The QueryResultFrameProcessor merges the results in the same partition and writes out a single partition. Since the partition cuts on size are done globally, in the controller, the final partitions will match the configured page size (see the sketch below).
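
A condensed sketch of the planning decision behind the two cases, reusing the doesSortDuringShuffle() check visible in the diff earlier in this thread; addSelectResultsStage is a hypothetical helper standing in for the builder code in the patch:

    if (finalShuffleStageDef.doesSortDuringShuffle()) {
      // Sorted output (e.g. on the boost column) gets the extra selectResults
      // stage, which merges each sorted partition and re-cuts it by page size.
      queryDef = addSelectResultsStage(queryDef);  // hypothetical helper
    }
    // Otherwise the final stage's globally-cut partitions already serve as pages.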

Contributor Author

I have added a new test case, testExternSelectWithMultipleWorkers. You can look at the counter checks to see what's happening with a scan query.

@LakshSingla (Contributor) left a comment

This PR is also supposed to help with the case of missing results reported in the community Slack.
Let us add a test case that fails with the existing code and is fixed by the patch.

As discussed offline, you mentioned that it was due to a single processor producing multiple frames, so we should add a test for that to confirm the regression is fixed.

@cryptoe (Contributor Author) commented Oct 11, 2023

@LakshSingla that test case is already added; see SqlStatementResourceHelperTest#testDistinctPartitionsOnEachWorker().

@LakshSingla (Contributor)

Okay, I looked at SqlStatementResourceHelper#populatePageList, and we are using counters gathered during execution to determine the page list. Isn't that risky, considering counters are best-effort, user-facing statistics for the job and not something we can rely upon?
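
To make the concern concrete, a rough sketch of the counter-driven idea under discussion; this is not the actual SqlStatementResourceHelper#populatePageList, the counter types are hypothetical stand-ins, and only the id/numRows/sizeInBytes fields are taken from the status payload shown further below:

    long pageId = 0;
    final List<PageInformation> pages = new ArrayList<>();
    for (final WorkerCounters worker : finalStageCounters) {           // hypothetical type
      for (final PartitionCounters partition : worker.partitions()) {  // hypothetical type
        // One page per worker/partition combination, sized from the counters.
        pages.add(new PageInformation(pageId++, partition.rows(), partition.sizeInBytes()));
      }
    }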

@cryptoe (Contributor Author) commented Oct 12, 2023

@LakshSingla
In theory I agree with you that counters are a little risky in the long run and we should have something else in the task report, but there are no good alternatives yet. Once we go down the path of generating frame indexes, we can think about changing this approach.

@LakshSingla (Contributor) commented Oct 12, 2023

Once we go down the path of generating frame indexes, we can think about changing this approach.

Why do we need indexes on frames for this?

@LakshSingla (Contributor)

@LakshSingla that test case is already added; see SqlStatementResourceHelperTest#testDistinctPartitionsOnEachWorker().

Also, if possible please add an MSQ select test for the regression. Using counters is risky as is, and we'd want to make sure that we are tackling the issue we are seeing end to end.

@cryptoe (Contributor Author) commented Oct 12, 2023

Why do we need indexes on frames for this?

That will help us support use cases in the results API with parameters like startRowOffset and range. Then users need not fetch via pages.

@cryptoe (Contributor Author) commented Oct 12, 2023

@LakshSingla that test case is already added; see SqlStatementResourceHelperTest#testDistinctPartitionsOnEachWorker().

Also, if possible please add an MSQ select test for the regression. Using counters is risky as is, and we'd want to make sure that we are tackling the issue we are seeing end to end.

Added another test, SqlMSQStatementResourcePostTest#testMultipleWorkersWithPageSizeLimiting, which covers that case end to end.

@LakshSingla (Contributor) left a comment

LGTM post CI 🚀

@cryptoe (Contributor Author) commented Oct 12, 2023

Thanks @adarshsanjeev @LakshSingla for the review.

@cryptoe cryptoe merged commit 61ea9e0 into apache:master Oct 12, 2023
11 checks passed
cryptoe added a commit to cryptoe/druid that referenced this pull request Oct 12, 2023 (cherry picked from commit 61ea9e0)
@cryptoe cryptoe mentioned this pull request Oct 12, 2023
LakshSingla pushed a commit that referenced this pull request Oct 12, 2023 (cherry picked from commit 61ea9e0)
ektravel pushed a commit to ektravel/druid that referenced this pull request Oct 16, 2023
CaseyPan pushed a commit to CaseyPan/druid that referenced this pull request Nov 17, 2023
@LakshSingla LakshSingla added this to the 29.0.0 milestone Jan 29, 2024
@marzi312

Hey @cryptoe,
I've tested this feature with a curl command, POSTing to /druid/v2/sql/statements, but it completely ignores the rowsPerPage parameter.
Example:
curl -XPOST -H "Content-Type: application/json" -d '{"context": {"includeSegmentSource":"REALTIME","selectDestination":"DURABLESTORAGE", "executionMode":"ASYNC", "durableShuffleStorage": "true", "rowsPerPage":"1000"}, "query":"SELECT * FROM my_table limit 200000"}' https://<url>/druid/v2/sql/statements

The rowsPerPage default value is 100,000, but it seems even the default value is ignored for me.
I tested this with Druid v29.

And here are the results from the get-query-status endpoint:
...."pages":[{"id":0,"numRows":200000,"sizeInBytes":123026540}]}

@LakshSingla (Contributor)

@marzi312 It is happening because you have a LIMIT in your query. It is a bug in the MSQ engine. I'll raise a patch for it.
