diff --git a/_field-types/supported-field-types/index.md b/_field-types/supported-field-types/index.md
index a43da396d52..20e13cec7a7 100644
--- a/_field-types/supported-field-types/index.md
+++ b/_field-types/supported-field-types/index.md
@@ -30,7 +30,7 @@ IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/):
 k-NN vector | [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/): Allows indexing a k-NN vector into OpenSearch and performing different kinds of k-NN search.
 Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query.
 Derived | [`derived`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/derived/): Creates new fields dynamically by executing scripts on existing fields.
-
+Star Tree | [`star_tree`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/star-tree/): Allows creating materialized views by pre-computing aggregations during indexing based on user-provided configuration to accelerate performance of aggregations.
 ## Arrays
 
 There is no dedicated array field type in OpenSearch. Instead, you can pass an array of values into any field. All values in the array must have the same field type.
diff --git a/_field-types/supported-field-types/star-tree.md b/_field-types/supported-field-types/star-tree.md
new file mode 100644
index 00000000000..e4c6b4c0204
--- /dev/null
+++ b/_field-types/supported-field-types/star-tree.md
@@ -0,0 +1,148 @@
+---
+layout: default
+title: Star Tree
+nav_order: 61
+has_children: false
+parent: Supported field types
+redirect_from:
+  - /opensearch/supported-field-types/star-tree/
+  - /field-types/star-tree/
+---
+# Star tree field type
+
+This is an experimental feature and is not recommended for use in a production environment.
For updates on the progress of the feature or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/).
+{: .warning}
+
+A star-tree index is a multi-field index that improves the performance of aggregations.
+Once you configure a star-tree index as part of the index mapping by specifying the dimensions and metrics, the star-tree index is created and maintained in real time within segments as data is ingested.
+
+OpenSearch automatically uses the star-tree index to optimize aggregations based on the input query and the star-tree configuration. No changes are required in the query syntax or requests.
+
+For more information, see [Star Tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/).
+
+## Prerequisites
+
+Before using the star-tree field type, be sure to satisfy the following prerequisites:
+
+- Set the feature flag `opensearch.experimental.feature.composite_index.star_tree.enabled` to `true`. For more information about enabling and disabling feature flags, see [Enabling experimental features]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
+- Set the `indices.composite_index.star_tree.enabled` setting to `true`. For instructions on how to configure OpenSearch, see [configuring settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings).
+- Set the `index.composite_index` index setting to `true` during index creation.
+- Enable `doc_values`: Ensure that `doc_values` is enabled for the dimension and metric fields used in your star-tree mapping.
+
+
+## Examples
+
+The following examples show how to use a star-tree index.
+
+### Star tree index mapping
+
+Define the star-tree mapping in the `composite` section of `mappings`.
+To compute metric aggregations for the `request_size` and `latency` fields with queries on the `port` and `status` fields, configure the following mappings:
+
+```json
+PUT logs
+{
+  "settings": {
+    "index.number_of_shards": 1,
+    "index.number_of_replicas": 0,
+    "index.composite_index": true
+  },
+  "mappings": {
+    "composite": {
+      "startree1": {
+        "type": "star_tree",
+        "config": {
+          "max_leaf_docs": 10000,
+          "skip_star_node_creation_for_dimensions": [
+            "port"
+          ],
+          "ordered_dimensions": [
+            {
+              "name": "status"
+            },
+            {
+              "name": "port"
+            }
+          ],
+          "metrics": [
+            {
+              "name": "request_size",
+              "stats": [
+                "sum",
+                "value_count",
+                "min",
+                "max"
+              ]
+            },
+            {
+              "name": "latency",
+              "stats": [
+                "sum",
+                "value_count",
+                "min",
+                "max"
+              ]
+            }
+          ]
+        }
+      }
+    },
+    "properties": {
+      "status": {
+        "type": "integer"
+      },
+      "port": {
+        "type": "integer"
+      },
+      "request_size": {
+        "type": "integer"
+      },
+      "latency": {
+        "type": "scaled_float",
+        "scaling_factor": 10
+      }
+    }
+  }
+}
+```
+In the preceding example, OpenSearch creates a star-tree index named `startree1`. Currently, only one star-tree index can be created per index. Support for multiple star-tree indexes will be added in the future.
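+
+As a hypothetical illustration of the data this mapping expects, the following request indexes one document into the `logs` index. The field values are made up for this sketch; any integer `status`, `port`, and `request_size` values and any numeric `latency` value would work the same way:
+
+```json
+POST logs/_doc
+{
+  "status": 200,
+  "port": 8443,
+  "request_size": 1024,
+  "latency": 12.5
+}
+```
+
+Each such document contributes to the `sum`, `value_count`, `min`, and `max` values that the star-tree index maintains for its combination of `status` and `port` values.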
+
+## Star tree mapping parameters
+Specify the star-tree configuration in the `config` section. All parameters are final and cannot be modified without reindexing documents.
+
+### Ordered dimensions
+The `ordered_dimensions` are the fields by which metrics are aggregated in the star-tree index. The star-tree index is used for query optimization only if all the fields in the query are part of `ordered_dimensions`. This property is required in the star-tree configuration.
+- The order of dimensions matters. Define the dimensions from the highest cardinality to the lowest cardinality for efficient storage and query pruning.
+- Avoid high-cardinality fields as dimensions because they adversely affect storage space, indexing throughput, and query performance.
+- Currently, supported fields for `ordered_dimensions` are [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`.
+  - Support for other field types, such as `keyword` and `ip`, is planned for upcoming releases.
+- A minimum of `2` and a maximum of `10` dimensions are supported per star-tree index.
+
+#### Properties
+
+| Parameter | Required/Optional | Description |
+| :--- | :--- | :--- |
+| `name` | Required | The name of the field. The field must also be present in `properties` in the index mapping, and `doc_values` must be enabled for it. |
+
+### Metrics
+Configure fields for which you need to perform aggregations.
This property is required in the star-tree configuration.
+- Currently, supported fields for `metrics` are [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`.
+- Supported metric aggregations include `min`, `max`, `sum`, `avg`, and `value_count`.
+  - `avg` is a derived metric based on `sum` and `value_count`. It is not indexed and is computed at query time. The remaining metrics are base metrics, which are indexed.
+- Up to `100` base metrics are supported per star-tree index.
+
+#### Properties
+
+| Parameter | Required/Optional | Description |
+| :--- | :--- | :--- |
+| `name` | Required | The name of the field. The field must also be present in `properties` in the index mapping, and `doc_values` must be enabled for it. |
+| `stats` | Optional | The list of metric aggregations computed for each field. You can choose between `min`, `max`, `sum`, `avg`, and `value_count`. Defaults are `sum` and `value_count`. `avg` is a derived stat that is automatically supported in queries if `sum` and `value_count` are present in `stats`. |
+
+### Star tree configuration parameters
+The following optional parameters can also be configured for a star-tree index.
+
+| Parameter | Required/Optional | Description |
+| :--- | :--- | :--- |
+| `max_leaf_docs` | Optional | The maximum number of star-tree documents that a leaf node can point to. When this number is exceeded, the node is split on the next dimension. Default is `10000`. Lowering the value increases storage size but improves query performance. Raising it has the opposite effect. |
+| `skip_star_node_creation_for_dimensions` | Optional | A list of dimensions for which the star-tree index skips creating star nodes. Skipping star nodes for a dimension can reduce storage size at the expense of query performance. Default is `false` (star nodes are created for every dimension). |
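+
+As a sketch of how the derived `avg` stat can be used, the following request computes the average `latency` for documents with a `status` of `200`. It assumes the `logs` index from the mapping example on this page, where both `sum` and `value_count` are configured for `latency`. The aggregation name `avg_latency` is a placeholder chosen for this illustration:
+
+```json
+POST logs/_search
+{
+  "query": {
+    "term": {
+      "status": 200
+    }
+  },
+  "aggs": {
+    "avg_latency": {
+      "avg": {
+        "field": "latency"
+      }
+    }
+  }
+}
+```
+
+Because `status` is one of the `ordered_dimensions` and `latency` is one of the `metrics`, this request should be eligible for star-tree optimization under the rules described on this page.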
+
+## Supported queries and aggregations
+For more information about supported queries and aggregations, see [Supported query and aggregations]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#supported-query-and-aggregations).
diff --git a/_search-plugins/improving-search-performance.md b/_search-plugins/improving-search-performance.md
index 4a0ffafe118..3172f0d925d 100644
--- a/_search-plugins/improving-search-performance.md
+++ b/_search-plugins/improving-search-performance.md
@@ -11,4 +11,6 @@ OpenSearch offers several ways to improve search performance:
 
 - Run resource-intensive queries asynchronously with [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/).
 
-- Search segments concurrently using [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/).
\ No newline at end of file
+- Search segments concurrently using [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/).
+
+- Improve the performance of aggregations using a [star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/).
\ No newline at end of file
diff --git a/_search-plugins/star-tree-index.md b/_search-plugins/star-tree-index.md
new file mode 100644
index 00000000000..45171e68132
--- /dev/null
+++ b/_search-plugins/star-tree-index.md
@@ -0,0 +1,175 @@
+---
+layout: default
+title: Star Tree index
+parent: Improving search performance
+nav_order: 54
+---
+
+# Star tree index
+
+This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/).
+{: .warning}
+
+A star-tree index is a multi-field index that improves the performance of aggregations.
+
+OpenSearch uses the star-tree index to optimize aggregations based on the input query and the star-tree configuration.
No changes are required in the query syntax or requests.
+
+## Star tree index structure
+
+Figure: A star-tree index containing two dimensions and two metrics.
+
+As shown in the preceding figure, the star-tree index structure consists of two main parts: the star tree and the sorted, aggregated star-tree documents backed by doc-values indexes.
+
+Each node in the star tree points to a range of star-tree documents.
+A node is further split into child nodes based on the `max_leaf_docs` configuration.
+The number of documents that a leaf node points to is less than or equal to `max_leaf_docs`. This ensures that at most `max_leaf_docs` documents are traversed to compute an aggregated value, which provides predictable latency.
+
+Special nodes called star nodes (`*`) help skip non-competitive nodes and fetch aggregated documents wherever applicable at query time.
+
+The figure contains three examples that explain star-tree traversal during a query:
+- Compute the average request size with a terms query where `port` equals 8443 and `status` equals 200. (Support for terms queries will be added in an upcoming release.)
+- Compute the count of requests with a term query where `status` equals 200. The query traverses the `*` node of the `port` dimension because `port` is not part of the query.
+- Compute the average request size with a term query where `port` equals 5600. The query traverses the `*` node of the `status` dimension because `status` is not part of the query.
+
The second and third examples use star nodes.
+
+
+## When to use star tree index
+You can use a star-tree index to perform faster aggregations with a constant upper bound on query latency.
+- A star-tree index natively supports multi-field aggregations.
+- A star-tree index is created in real time as part of regular indexing, so its data is always up to date with the live data.
+- A star-tree index consolidates data, which makes it storage efficient. This results in efficient paging and a fraction of the I/O utilization for search queries.
+
+## Considerations
+- A star-tree index should ideally be used with append-only indexes because updates and deletes are not accounted for in the star-tree index.
+- A star-tree index is used for aggregation queries only if the query input is a subset of the dimensions and metrics in the star-tree configuration.
+- Once a star-tree index is enabled for an index, you currently cannot disable it. To remove the star-tree index, you must reindex without the star-tree mapping.
+  - Changing the star-tree configuration also requires a reindex operation.
+- [Multi-values/array values]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/index/#arrays) are not supported.
+- Only [limited queries and aggregations](#supported-query-and-aggregations) are supported. Support for more is planned.
+- The cardinality of the dimensions should not be very high (as with `_id` fields). High cardinality leads to storage explosion and higher query latency.
+
+## Enabling star tree index
+- Set the feature flag `opensearch.experimental.feature.composite_index.star_tree.enabled` to `true`. For more information about enabling and disabling feature flags, see [Enabling experimental features]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
+- Set the `indices.composite_index.star_tree.enabled` setting to `true`.
For instructions on how to configure OpenSearch, see [configuring settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings).
+- Set the `index.composite_index` index setting to `true` during index creation.
+
+## Examples
+
+The following examples show how to use a star-tree index.
+
+### Defining star tree index in mappings
+
+Define the star-tree configuration in the index mappings when creating an index.
+To create a star-tree index that precomputes aggregations for the `request_size` and `latency` fields for all combinations of values in the `port` and `status` fields indexed in the `logs` index, configure the following mapping:
+
+```json
+PUT logs
+{
+  "settings": {
+    "index.number_of_shards": 1,
+    "index.number_of_replicas": 0,
+    "index.composite_index": true
+  },
+  "mappings": {
+    "composite": {
+      "startree1": {
+        "type": "star_tree",
+        "config": {
+          "ordered_dimensions": [
+            {
+              "name": "status"
+            },
+            {
+              "name": "port"
+            }
+          ],
+          "metrics": [
+            {
+              "name": "request_size",
+              "stats": [
+                "sum",
+                "value_count",
+                "min",
+                "max"
+              ]
+            },
+            {
+              "name": "latency",
+              "stats": [
+                "sum",
+                "value_count",
+                "min",
+                "max"
+              ]
+            }
+          ]
+        }
+      }
+    },
+    "properties": {
+      "status": {
+        "type": "integer"
+      },
+      "port": {
+        "type": "integer"
+      },
+      "request_size": {
+        "type": "integer"
+      },
+      "latency": {
+        "type": "scaled_float",
+        "scaling_factor": 10
+      }
+    }
+  }
+}
+```
+
+For detailed information about star-tree index mappings and parameters, see [Star Tree field type]({{site.url}}{{site.baseurl}}/field-types/star-tree/).
+
+## Supported query and aggregations
+
+A star-tree index can be used to optimize aggregations for a selected set of queries. Support for more queries is planned for upcoming releases.
+
+### Supported queries
+Ensure that the star-tree index mapping meets the following requirement:
+- The fields present in the query must be part of `ordered_dimensions` in the star-tree configuration.
+
+The following queries are supported when supported aggregations are specified:
+
+- [Term query](https://opensearch.org/docs/latest/query-dsl/term/term/)
+- [Match all docs query](https://opensearch.org/docs/latest/query-dsl/match-all/)
+
+### Supported aggregations
+Ensure that the star-tree index mapping meets the following requirements:
+- The fields present in the aggregation must be part of `metrics` in the star-tree configuration.
+- The metric aggregation type must be part of the `stats` parameter.
+
+The following metric aggregations are supported:
+- [Sum](https://opensearch.org/docs/latest/aggregations/metric/sum/)
+- [Minimum](https://opensearch.org/docs/latest/aggregations/metric/minimum/)
+- [Maximum](https://opensearch.org/docs/latest/aggregations/metric/maximum/)
+- [Value count](https://opensearch.org/docs/latest/aggregations/metric/value-count/)
+- [Average](https://opensearch.org/docs/latest/aggregations/metric/average/)
+
+### Examples
+To get the sum of `request_size` for all error logs with `status=500`, using the [example mapping](#defining-star-tree-index-in-mappings), run the following query:
+```json
+POST /logs/_search
+{
+  "query": {
+    "term": {
+      "status": "500"
+    }
+  },
+  "aggs": {
+    "sum_request_size": {
+      "sum": {
+        "field": "request_size"
+      }
+    }
+  }
+}
+```
+
+This query is optimized automatically because the star-tree index is used.
+
+You can set the `indices.composite_index.star_tree.enabled` setting to `false` to run queries without using the star-tree index.
\ No newline at end of file
diff --git a/images/star-tree-index.png b/images/star-tree-index.png
new file mode 100644
index 00000000000..f281ea84ac4
Binary files /dev/null and b/images/star-tree-index.png differ