doc fixes, add outputType to ExpressionPostAggregator to make docs true
clintropolis committed Nov 1, 2023
1 parent aaa1486 commit bcefe45
Showing 26 changed files with 355 additions and 274 deletions.
2 changes: 1 addition & 1 deletion docs/multi-stage-query/reference.md
@@ -237,7 +237,7 @@ The following table lists the context parameters for the MSQ task engine:
| `maxNumTasks` | SELECT, INSERT, REPLACE<br /><br />The maximum total number of tasks to launch, including the controller task. The lowest possible value for this setting is 2: one controller and one worker. All tasks must be able to launch simultaneously. If they cannot, the query returns a `TaskStartTimeout` error code after approximately 10 minutes.<br /><br />May also be provided as `numTasks`. If both are present, `maxNumTasks` takes priority. | 2 |
| `taskAssignment` | SELECT, INSERT, REPLACE<br /><br />Determines how many tasks to use. Possible values include: <ul><li>`max`: Uses as many tasks as possible, up to `maxNumTasks`.</li><li>`auto`: When file sizes can be determined through directory listing (for example: local files, S3, GCS, HDFS) uses as few tasks as possible without exceeding 512 MiB or 10,000 files per task, unless exceeding these limits is necessary to stay within `maxNumTasks`. When calculating the size of files, the weighted size is used, which considers the file format and compression format used if any. When file sizes cannot be determined through directory listing (for example: http), behaves the same as `max`.</li></ul> | `max` |
| `finalizeAggregations` | SELECT, INSERT, REPLACE<br /><br />Determines the type of aggregation to return. If true, Druid finalizes the results of complex aggregations that directly appear in query results. If false, Druid returns the aggregation's intermediate type rather than finalized type. This parameter is useful during ingestion, where it enables storing sketches directly in Druid tables. For more information about aggregations, see [SQL aggregation functions](../querying/sql-aggregations.md). | true |
-| `arrayIngestMode` | INSERT, REPLACE<br /><br /> Controls how ARRAY type values are stored in Druid segments. When set to `'array'` (recommended for SQL compliance), Druid will store all ARRAY typed values in [ARRAY typed columns](../querying/arrays.md), and supports storing both VARCHAR and numeric typed arrays. When set to `'mvd'` (the default, for backwards compatibility), Druid only supports VARCHAR typed arrays, and will store them as [multi-value string columns](../querying/multi-value-dimensions.md). When set to `none`, Druid will throw an exception when trying to store any type of arrays, used to help migrate operators from `'mvd'` mode to `'array'` mode and force query writers to make an explicit choice between ARRAY and multi-value VARCHAR typed columns. | `'mvd'` (for backwards compatibility, recommended to use `array` for SQL compliance)|
+| `arrayIngestMode` | INSERT, REPLACE<br /><br /> Controls how ARRAY type values are stored in Druid segments. When set to `array` (recommended for SQL compliance), Druid will store all ARRAY typed values in [ARRAY typed columns](../querying/arrays.md), and supports storing both VARCHAR and numeric typed arrays. When set to `mvd` (the default, for backwards compatibility), Druid only supports VARCHAR typed arrays, and will store them as [multi-value string columns](../querying/multi-value-dimensions.md). When set to `none`, Druid will throw an exception when trying to store any type of arrays. `none` is most useful when set in the system default query context with (`druid.query.default.context.arrayIngestMode=none`) to be used to help migrate operators from `mvd` mode to `array` mode and force query writers to make an explicit choice between ARRAY and multi-value VARCHAR typed columns. | `mvd` (for backwards compatibility, recommended to use `array` for SQL compliance)|
| `sqlJoinAlgorithm` | SELECT, INSERT, REPLACE<br /><br />Algorithm to use for JOIN. Use `broadcast` (the default) for broadcast hash join or `sortMerge` for sort-merge join. Affects all JOIN operations in the query. This is a hint to the MSQ engine and the actual joins in the query may proceed in a different way than specified. See [Joins](#joins) for more details. | `broadcast` |
| `rowsInMemory` | INSERT or REPLACE<br /><br />Maximum number of rows to store in memory at once before flushing to disk during the segment generation process. Ignored for non-INSERT queries. In most cases, use the default value. You may need to override the default if you run into one of the [known issues](./known-issues.md) around memory usage. | 100,000 |
| `segmentSortOrder` | INSERT or REPLACE<br /><br />Normally, Druid sorts rows in individual segments using `__time` first, followed by the [CLUSTERED BY](#clustered-by) clause. When you set `segmentSortOrder`, Druid sorts rows in segments using this column list first, followed by the CLUSTERED BY order.<br /><br />You provide the column list as comma-separated values or as a JSON array in string form. If your query includes `__time`, then this list must begin with `__time`. For example, consider an INSERT query that uses `CLUSTERED BY country` and has `segmentSortOrder` set to `__time,city`. Within each time chunk, Druid assigns rows to segments based on `country`, and then within each of those segments, Druid sorts those rows by `__time` first, then `city`, then `country`. | empty list |
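As a rough sketch of how the `arrayIngestMode` context parameter described above could be set for a single query, the request body below might be posted to the SQL task endpoint (`/druid/v2/sql/task`). The datasource names `example_arrays` and `example_source` and the `tags` column are hypothetical, and `tags` is assumed to be an ARRAY typed column.

```json
{
  "query": "REPLACE INTO \"example_arrays\" OVERWRITE ALL SELECT \"__time\", \"tags\" FROM \"example_source\" PARTITIONED BY DAY",
  "context": {
    "arrayIngestMode": "array"
  }
}
```

With `druid.query.default.context.arrayIngestMode=none` set as the system default, a per-query override like this one is what forces each query writer to pick `array` or `mvd` explicitly.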
2 changes: 1 addition & 1 deletion docs/querying/multi-value-dimensions.md
@@ -78,7 +78,7 @@ By default, Druid sorts values in multi-value dimensions. This behavior is contr
See [Dimension Objects](../ingestion/ingestion-spec.md#dimension-objects) for information on configuring multi-value handling.

### SQL-based ingestion
-Multi-value dimensions can also be inserted with [SQL-based ingestion](../multi-stage-query/index.md). The multi-stage query engine does not have direct handling of class Druid multi-value dimensions. A special pair of functions, `MV_TO_ARRAY` which converts multi-value dimensions into `VARCHAR ARRAY` and `ARRAY_TO_MV` to coerce them back into `VARCHAR` exist to enable handling these types. Multi-value handling is not available when using the multi-stage query engine to insert data.
+Multi-value dimensions can also be inserted with [SQL-based ingestion](../multi-stage-query/index.md). The functions `MV_TO_ARRAY` and `ARRAY_TO_MV` can assist in converting `VARCHAR` to `VARCHAR ARRAY` and `VARCHAR ARRAY` into `VARCHAR` respectively. Multi-value handling is not available when using the multi-stage query engine to insert data.

For example, to insert the data used in this document:
```sql
5 changes: 4 additions & 1 deletion docs/querying/post-aggregations.md
@@ -101,10 +101,13 @@ The expression post-aggregator is defined using a Druid [expression](math-expr.m
"type": "expression",
"name": <output_name>,
"expression": <post-aggregation expression>,
-"ordering" : <null (default), or "numericFirst">
+"ordering": <null (default), or "numericFirst">,
+"outputType": <output value type of expression>
}
```

+Output type is optional, and can be any native Druid type: `LONG`, `FLOAT`, `DOUBLE`, `STRING`, `ARRAY` types (e.g. `ARRAY<LONG>`), or `COMPLEX` types (e.g. `COMPLEX<json>`).
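
For illustration only, a spec that declares its output type might look like the sketch below. The names `avg_latency`, `total_latency`, and `request_count` are hypothetical and not part of this commit. `ordering` is omitted, so it keeps its default of null.

```json
{
  "type": "expression",
  "name": "avg_latency",
  "expression": "\"total_latency\" / \"request_count\"",
  "outputType": "DOUBLE"
}
```

Leaving `outputType` out remains valid, since the field is optional.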


### Greatest / Least post-aggregators

@@ -49,7 +49,6 @@
import org.apache.druid.query.aggregation.datasketches.hll.HllSketchToStringPostAggregator;
import org.apache.druid.query.aggregation.datasketches.hll.HllSketchUnionPostAggregator;
import org.apache.druid.query.aggregation.post.ArithmeticPostAggregator;
-import org.apache.druid.query.aggregation.post.ExpressionPostAggregator;
import org.apache.druid.query.aggregation.post.FieldAccessPostAggregator;
import org.apache.druid.query.aggregation.post.FinalizingFieldAccessPostAggregator;
import org.apache.druid.query.dimension.DefaultDimensionSpec;
@@ -187,7 +186,10 @@ public class HllSketchSqlAggregatorTest extends BaseCalciteQueryTest
private static final List<AggregatorFactory> EXPECTED_FILTERED_AGGREGATORS =
EXPECTED_PA_AGGREGATORS.stream()
.limit(5)
-.map(factory -> new FilteredAggregatorFactory(factory, equality("dim2", "a", ColumnType.STRING)))
+.map(factory -> new FilteredAggregatorFactory(
+factory,
+equality("dim2", "a", ColumnType.STRING)
+))
.collect(Collectors.toList());

/**
@@ -198,15 +200,15 @@ public class HllSketchSqlAggregatorTest extends BaseCalciteQueryTest
ImmutableList.of(
new HllSketchToEstimatePostAggregator("p1", new FieldAccessPostAggregator("p0", "a0"), false),
new HllSketchToEstimatePostAggregator("p3", new FieldAccessPostAggregator("p2", "a0"), false),
-new ExpressionPostAggregator("p4", "(\"p3\" + 1)", null, TestExprMacroTable.INSTANCE),
+expressionPostAgg("p4", "(\"p3\" + 1)", ColumnType.DOUBLE),
new HllSketchToEstimatePostAggregator("p6", new FieldAccessPostAggregator("p5", "a3"), false),
new HllSketchToEstimatePostAggregator("p8", new FieldAccessPostAggregator("p7", "a0"), false),
-new ExpressionPostAggregator("p9", "abs(\"p8\")", null, TestExprMacroTable.INSTANCE),
+expressionPostAgg("p9", "abs(\"p8\")", ColumnType.DOUBLE),
new HllSketchToEstimateWithBoundsPostAggregator("p11", new FieldAccessPostAggregator("p10", "a0"), 2),
new HllSketchToEstimateWithBoundsPostAggregator("p13", new FieldAccessPostAggregator("p12", "a0"), 1),
new HllSketchToStringPostAggregator("p15", new FieldAccessPostAggregator("p14", "a0")),
new HllSketchToStringPostAggregator("p17", new FieldAccessPostAggregator("p16", "a0")),
-new ExpressionPostAggregator("p18", "upper(\"p17\")", null, TestExprMacroTable.INSTANCE),
+expressionPostAgg("p18", "upper(\"p17\")", ColumnType.STRING),
new HllSketchToEstimatePostAggregator("p20", new FieldAccessPostAggregator("p19", "a0"), true)
);

@@ -726,41 +728,37 @@ public void testHllSketchPostAggsFinalizeOuterSketches()
)
)
.aggregators(
-ImmutableList.of(
-new HllSketchBuildAggregatorFactory("a0", "dim2", null, null, null, true, true),
-new HllSketchBuildAggregatorFactory("a1", "m1", null, null, null, true, true),
-new HllSketchBuildAggregatorFactory("a2", "v0", null, null, null, true, true),
-new HllSketchBuildAggregatorFactory("a3", "v1", null, null, null, true, true),
-new HllSketchBuildAggregatorFactory("a4", "dim2", null, null, null, true, true)
-)
+new HllSketchBuildAggregatorFactory("a0", "dim2", null, null, null, true, true),
+new HllSketchBuildAggregatorFactory("a1", "m1", null, null, null, true, true),
+new HllSketchBuildAggregatorFactory("a2", "v0", null, null, null, true, true),
+new HllSketchBuildAggregatorFactory("a3", "v1", null, null, null, true, true),
+new HllSketchBuildAggregatorFactory("a4", "dim2", null, null, null, true, true)
)
.postAggregators(
-ImmutableList.of(
-new HllSketchToEstimatePostAggregator("p1", new FieldAccessPostAggregator("p0", "a0"), false),
-new HllSketchToEstimatePostAggregator("p3", new FieldAccessPostAggregator("p2", "a0"), false),
-new ExpressionPostAggregator("p4", "(\"p3\" + 1)", null, TestExprMacroTable.INSTANCE),
-new HllSketchToEstimatePostAggregator("p6", new FieldAccessPostAggregator("p5", "a2"), false),
-new HllSketchToEstimatePostAggregator(
-"p8",
-new FieldAccessPostAggregator("p7", "a0"),
-false
-),
-new ExpressionPostAggregator("p9", "abs(\"p8\")", null, TestExprMacroTable.INSTANCE),
-new HllSketchToEstimateWithBoundsPostAggregator(
-"p11",
-new FieldAccessPostAggregator("p10", "a0"),
-2
-),
-new HllSketchToEstimateWithBoundsPostAggregator(
-"p13",
-new FieldAccessPostAggregator("p12", "a0"),
-1
-),
-new HllSketchToStringPostAggregator("p15", new FieldAccessPostAggregator("p14", "a0")),
-new HllSketchToStringPostAggregator("p17", new FieldAccessPostAggregator("p16", "a0")),
-new ExpressionPostAggregator("p18", "upper(\"p17\")", null, TestExprMacroTable.INSTANCE),
-new HllSketchToEstimatePostAggregator("p20", new FieldAccessPostAggregator("p19", "a0"), true)
-)
+new HllSketchToEstimatePostAggregator("p1", new FieldAccessPostAggregator("p0", "a0"), false),
+new HllSketchToEstimatePostAggregator("p3", new FieldAccessPostAggregator("p2", "a0"), false),
+expressionPostAgg("p4", "(\"p3\" + 1)", ColumnType.DOUBLE),
+new HllSketchToEstimatePostAggregator("p6", new FieldAccessPostAggregator("p5", "a2"), false),
+new HllSketchToEstimatePostAggregator(
+"p8",
+new FieldAccessPostAggregator("p7", "a0"),
+false
+),
+expressionPostAgg("p9", "abs(\"p8\")", ColumnType.DOUBLE),
+new HllSketchToEstimateWithBoundsPostAggregator(
+"p11",
+new FieldAccessPostAggregator("p10", "a0"),
+2
+),
+new HllSketchToEstimateWithBoundsPostAggregator(
+"p13",
+new FieldAccessPostAggregator("p12", "a0"),
+1
+),
+new HllSketchToStringPostAggregator("p15", new FieldAccessPostAggregator("p14", "a0")),
+new HllSketchToStringPostAggregator("p17", new FieldAccessPostAggregator("p16", "a0")),
+expressionPostAgg("p18", "upper(\"p17\")", ColumnType.STRING),
+new HllSketchToEstimatePostAggregator("p20", new FieldAccessPostAggregator("p19", "a0"), true)
)
.context(queryContext)
.build()
@@ -25,7 +25,6 @@
import org.apache.druid.common.config.NullHandling;
import org.apache.druid.guice.DruidInjectorBuilder;
import org.apache.druid.java.util.common.granularity.Granularities;
-import org.apache.druid.math.expr.ExprMacroTable;
import org.apache.druid.query.Druids;
import org.apache.druid.query.QueryDataSource;
import org.apache.druid.query.QueryRunnerFactoryConglomerate;
@@ -44,7 +43,6 @@
import org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchToRankPostAggregator;
import org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchToStringPostAggregator;
import org.apache.druid.query.aggregation.post.ArithmeticPostAggregator;
-import org.apache.druid.query.aggregation.post.ExpressionPostAggregator;
import org.apache.druid.query.aggregation.post.FieldAccessPostAggregator;
import org.apache.druid.query.dimension.DefaultDimensionSpec;
import org.apache.druid.query.expression.TestExprMacroTable;
@@ -537,11 +535,10 @@ public void testDoublesSketchPostAggs()
makeFieldAccessPostAgg("a1:agg"),
0.5f
),
-new ExpressionPostAggregator(
+expressionPostAgg(
"p0",
"(\"a1\" + 1)",
-null,
-TestExprMacroTable.INSTANCE
+ColumnType.DOUBLE
),
new DoublesSketchToQuantilePostAggregator(
"p2",
@@ -551,11 +548,10 @@ public void testDoublesSketchPostAggs()
),
0.5f
),
-new ExpressionPostAggregator(
+expressionPostAgg(
"p3",
"(\"p2\" + 1000)",
-null,
-TestExprMacroTable.INSTANCE
+ColumnType.DOUBLE
),
new DoublesSketchToQuantilePostAggregator(
"p5",
@@ -565,11 +561,10 @@ public void testDoublesSketchPostAggs()
),
0.5f
),
-new ExpressionPostAggregator(
+expressionPostAgg(
"p6",
"(\"p5\" + 1000)",
-null,
-TestExprMacroTable.INSTANCE
+ColumnType.DOUBLE
),
new DoublesSketchToQuantilePostAggregator(
"p8",
@@ -579,7 +574,7 @@ public void testDoublesSketchPostAggs()
),
0.5f
),
-new ExpressionPostAggregator("p9", "abs(\"p8\")", null, TestExprMacroTable.INSTANCE),
+expressionPostAgg("p9", "abs(\"p8\")", ColumnType.DOUBLE),
new DoublesSketchToQuantilesPostAggregator(
"p11",
new FieldAccessPostAggregator(
@@ -620,13 +615,12 @@ public void testDoublesSketchPostAggs()
"a2:agg"
)
),
-new ExpressionPostAggregator(
+expressionPostAgg(
"p20",
"replace(replace(\"p19\",'HeapCompactDoublesSketch','HeapUpdateDoublesSketch'),"
+ "'Combined Buffer Capacity : 6',"
+ "'Combined Buffer Capacity : 8')",
-null,
-ExprMacroTable.nil()
+ColumnType.STRING
)
)
.context(QUERY_CONTEXT_DEFAULT)