[Docs] Remove ambiguous advice regarding TopN correctness (apache#17522)
techdocsmith authored Nov 27, 2024
1 parent f3e1f1e commit 0325f62
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/querying/topnquery.md
@@ -32,7 +32,7 @@ sidebar_label: "TopN"

Apache Druid TopN queries return a sorted set of results for the values in a given dimension according to some criteria. Conceptually, they can be thought of as an approximate [GroupByQuery](../querying/groupbyquery.md) over a single dimension with an [Ordering](../querying/limitspec.md) spec. TopNs are much faster and resource efficient than GroupBys for this use case. These types of queries take a topN query object and return an array of JSON objects where each object represents a value asked for by the topN query.

- TopNs are approximate in that each data process will rank their top K results and only return those top K results to the Broker. K, by default in Druid, is `max(1000, threshold)`. In practice, this means that if you ask for the top 1000 items ordered, the correctness of the first ~900 items will be 100%, and the ordering of the results after that is not guaranteed. TopNs can be made more accurate by increasing the threshold.
+ TopNs are approximate in that each data process will rank their top K results and only return those top K results to the Broker. K, by default in Druid, is `max(1000, threshold)`.
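
To illustrate why per-process truncation makes the result approximate, here is a minimal Python sketch of the merge described above. It is not Druid's implementation: the function names and the `min_k` parameter (standing in for the default of 1000) are hypothetical, and each "data process" is modeled as a plain dictionary of per-value counts.

```python
import heapq
from collections import Counter


def local_top_k(counts, k):
    """Rank one data process's values and keep only its k largest (hypothetical helper)."""
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])


def broker_merge(partials, threshold):
    """Sum the partial results from all processes and keep the top `threshold` values."""
    merged = Counter()
    for partial in partials:
        for value, count in partial:
            merged[value] += count
    return heapq.nlargest(threshold, merged.items(), key=lambda kv: kv[1])


def approximate_top_n(per_process_counts, threshold, min_k=1000):
    """Sketch of a TopN: each process returns only K = max(min_k, threshold) rows."""
    k = max(min_k, threshold)
    partials = [local_top_k(counts, k) for counts in per_process_counts]
    return broker_merge(partials, threshold)


# Assumed toy data: "c" is never first on either process but is largest overall.
process_a = {"a": 11, "b": 9, "c": 8}
process_b = {"d": 10, "e": 9, "c": 8}

# With a small K (min_k=2), "c" is cut on both processes before the merge.
print(approximate_top_n([process_a, process_b], threshold=1, min_k=2))  # [('a', 11)]

# With K large enough to cover every value, the merged result is exact.
print(approximate_top_n([process_a, process_b], threshold=1, min_k=3))  # [('c', 16)]
```

In this toy case the value `c` is dropped by both processes when K is 2, so the Broker never sees it even though its combined count is the highest. A larger K shrinks this error at the cost of returning more rows per process.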

A topN query object looks like:
