-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove group-by v1 #14866
remove group-by v1 #14866
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally!! 🎉
I looked through and thought about whether there are other things that could be updated. What are your thoughts on these two?
longLast
,longFirst
,doubleFirst
,doubleLast
could stop lying about their intermediate types. IIRC they only had to lie about this to make groupBy v1 work.- OnheapIncrementalIndex no longer needs to support multithreaded writers, so
addToFacts
could be simplified.
I actually did this initially, but then noticed that #14462 was also making this change, so reverted for now. That said, it would only be a minor conflict with the other PR, so Im willing to add it back.
Yeah, totally, i actually imagine a lot of other parts of |
OK, fair enough. We can leave it for #14462.
Up to you how much you want to do in this PR. I'd at least adjust the comment in there that mentions groupBy v1. Other stuff could be done as future work. |
docs/configuration/index.md
Outdated
Note: Even if cache is enabled, for [groupBy v2](../querying/groupbyquery.md#strategies) queries, segment level cache do not work on Brokers. | ||
See [Differences between v1 and v2](../querying/groupbyquery.md#differences-between-v1-and-v2) and [Query caching](../querying/caching.md) for more information. | ||
Note: Even if cache is enabled, for [groupBy](../querying/groupbyquery.md) queries, segment level cache does not work on Brokers. | ||
See [query caching](../querying/caching.md) for more information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See [query caching](../querying/caching.md) for more information. | |
See [Query caching](../querying/caching.md) for more information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see lots of other links like this.... but why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i mean, its in the middle of a sentence, why would we uppercase it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, personally I find the suggested style to be better :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess like... I disagree, but we have a lot of other links doing that too.
Why should the A in "Advance groupBy v2 configurations" and the M in "Memory tuning and resource limits" be uppercase?
Is this a matter of perspective? Like it bothers me because I view these inline links as just some text that is part of a sentence that happen to be clickable. But i guess another perspective might be that this is some proper section title and it should match as closely as possible? I still don't really personally like it, but i changed it because a handful of other links are like this on this and other pages (not all of them though...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, it seems better because upper-casing highlights that part of the text. But yes, in the end, its a matter of perspective. I don't have an objective argument for either or against.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clintropolis In this specific example, Query caching is the title of the page and must be referenced using sentence case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO @ektravel's perspective makes sense. If the link text is a title of a page and is referring to the page itself then capitalizing the first letter seems good. It emphasizes that we're referring to a documentation page rather than a concept.
To me this is correct (the text refers to the page being linked to):
See [Query caching](../querying/caching.md) for more information.
And this is also correct (the text refers to a concept, not the page being linked to):
It is important to set up [query caching](../querying/caching.md) properly for best performance.
Btw, the following is also IMO correct, but awkward and I'd avoid it in favor of the first example. Here "query caching" is a common noun acting as an adjective for "documentation"; it's not referring to the page being linked to, so it shouldn't be capitalized:
See the [query caching](../querying/caching.md) documentation for more information.
docs/querying/groupbyquery.md
Outdated
@@ -440,6 +372,7 @@ Supported query contexts: | |||
|
|||
|Key|Description|Default| | |||
|---|-----------|-------| | |||
|`groupByIsSingleThreaded`|Overrides the value of `druid.query.groupBy.singleThreaded` for this query.| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|`groupByIsSingleThreaded`|Overrides the value of `druid.query.groupBy.singleThreaded` for this query.| | |
|`groupByIsSingleThreaded`|Overrides the value of `druid.query.groupBy.singleThreaded` for this query.|None| |
The entry on line 375 is missing the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed the docs portion.
Following apache#14866, there is no longer a reason for IncrementalIndex#add to be thread-safe. It turns out it already was not using its selectors in a thread-safe way, as exposed by apache#15615 making `testMultithreadAddFactsUsingExpressionAndJavaScript` in `IncrementalIndexIngestionTest` flaky. Note that this problem isn't new: Strings have been stored in the dimension selectors for some time, but we didn't have a test that checked for that case; we only have this test that checks for concurrent adds involving numeric selectors. At any rate, this patch changes OnheapIncrementalIndex to no longer try to offer a thread-safe "add" method. It also improves performance a bit by adding a row ID supplier to the selectors it uses to read InputRows, meaning that it can get the benefit of caching values inside the selectors. This patch also: 1) Adds synchronization to HyperUniquesAggregator and CardinalityAggregator, which the similar datasketches versions already have. This is done to help them adhere to the contract of Aggregator: concurrent calls to "aggregate" and "get" must be thread-safe. 2) Updates OnHeapIncrementalIndexBenchmark to use JMH and moves it to the druid-benchmarks module.
* IncrementalIndex#add is no longer thread-safe. Following #14866, there is no longer a reason for IncrementalIndex#add to be thread-safe. It turns out it already was not using its selectors in a thread-safe way, as exposed by #15615 making `testMultithreadAddFactsUsingExpressionAndJavaScript` in `IncrementalIndexIngestionTest` flaky. Note that this problem isn't new: Strings have been stored in the dimension selectors for some time, but we didn't have a test that checked for that case; we only have this test that checks for concurrent adds involving numeric selectors. At any rate, this patch changes OnheapIncrementalIndex to no longer try to offer a thread-safe "add" method. It also improves performance a bit by adding a row ID supplier to the selectors it uses to read InputRows, meaning that it can get the benefit of caching values inside the selectors. This patch also: 1) Adds synchronization to HyperUniquesAggregator and CardinalityAggregator, which the similar datasketches versions already have. This is done to help them adhere to the contract of Aggregator: concurrent calls to "aggregate" and "get" must be thread-safe. 2) Updates OnHeapIncrementalIndexBenchmark to use JMH and moves it to the druid-benchmarks module. * Spelling. * Changes from static analysis. * Fix javadoc.
* IncrementalIndex#add is no longer thread-safe. Following apache#14866, there is no longer a reason for IncrementalIndex#add to be thread-safe. It turns out it already was not using its selectors in a thread-safe way, as exposed by apache#15615 making `testMultithreadAddFactsUsingExpressionAndJavaScript` in `IncrementalIndexIngestionTest` flaky. Note that this problem isn't new: Strings have been stored in the dimension selectors for some time, but we didn't have a test that checked for that case; we only have this test that checks for concurrent adds involving numeric selectors. At any rate, this patch changes OnheapIncrementalIndex to no longer try to offer a thread-safe "add" method. It also improves performance a bit by adding a row ID supplier to the selectors it uses to read InputRows, meaning that it can get the benefit of caching values inside the selectors. This patch also: 1) Adds synchronization to HyperUniquesAggregator and CardinalityAggregator, which the similar datasketches versions already have. This is done to help them adhere to the contract of Aggregator: concurrent calls to "aggregate" and "get" must be thread-safe. 2) Updates OnHeapIncrementalIndexBenchmark to use JMH and moves it to the druid-benchmarks module. * Spelling. * Changes from static analysis. * Fix javadoc.
* IncrementalIndex#add is no longer thread-safe. Following #14866, there is no longer a reason for IncrementalIndex#add to be thread-safe. It turns out it already was not using its selectors in a thread-safe way, as exposed by #15615 making `testMultithreadAddFactsUsingExpressionAndJavaScript` in `IncrementalIndexIngestionTest` flaky. Note that this problem isn't new: Strings have been stored in the dimension selectors for some time, but we didn't have a test that checked for that case; we only have this test that checks for concurrent adds involving numeric selectors. At any rate, this patch changes OnheapIncrementalIndex to no longer try to offer a thread-safe "add" method. It also improves performance a bit by adding a row ID supplier to the selectors it uses to read InputRows, meaning that it can get the benefit of caching values inside the selectors. This patch also: 1) Adds synchronization to HyperUniquesAggregator and CardinalityAggregator, which the similar datasketches versions already have. This is done to help them adhere to the contract of Aggregator: concurrent calls to "aggregate" and "get" must be thread-safe. 2) Updates OnHeapIncrementalIndexBenchmark to use JMH and moves it to the druid-benchmarks module. * Spelling. * Changes from static analysis. * Fix javadoc. Co-authored-by: Gian Merlino <[email protected]>
Description
This PR removes the 'v1' grouping engine as well as the strategy selection abstraction. I found
GroupByStrategyV2
to still be a relatively handy thing to keep around since it is a nice container for grabbing the resources and stuff needed and plumbing to the underlying group by v2 engine, so I have transitioned it into a new classGroupingEngine
which is now used in place of prior uses ofGroupByStrategySelector
. Alternatively its functionality could have been distributed among the query runner and toolchest, but it seemed a bit more disruptive.Release note
The V1 group by query has been removed in favor of always using the V2 engine. Any query context parameters associated with it will now be ignored internally.
AggregatorFactory.getRequiredColumns
has been deprecated and will be removed in a future release. If you have an extension that implementsAggregatorFactory
then this method should be removed from your implementation (there is now a default definition that throws an exception about being deprecated).This PR has: