From 4e5e03f209a8a8fdc7e9c5f483bd536591614289 Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Tue, 19 Nov 2024 06:40:57 -0600
Subject: [PATCH 1/8] Fix typo in getting started (#8772)

Fixes #8770

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 _getting-started/communicate.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_getting-started/communicate.md b/_getting-started/communicate.md
index 3472270c30..773558fb21 100644
--- a/_getting-started/communicate.md
+++ b/_getting-started/communicate.md
@@ -28,7 +28,7 @@ curl -X GET "http://localhost:9200/_cluster/health"
If you're using the Security plugin, provide the username and password in the request:

```bash
-curl -X GET "http://localhost:9200/_cluster/health" -ku admin:<custom-admin-password>
+curl -X GET "https://localhost:9200/_cluster/health" -ku admin:<custom-admin-password>
```
{% include copy.html %}

@@ -317,4 +317,4 @@ Once a field is created, you cannot change its type. Changing a field type requi

## Next steps

-- See [Ingest data into OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/ingest-data/) to learn about ingestion options.
\ No newline at end of file
+- See [Ingest data into OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/ingest-data/) to learn about ingestion options.

From 03c8b18678250befa073744fd9ce5af52be09c63 Mon Sep 17 00:00:00 2001
From: Liz Snyder <31932630+lizsnyder@users.noreply.github.com>
Date: Tue, 19 Nov 2024 05:01:15 -0800
Subject: [PATCH 2/8] Additions to data source acceleration (#8759)

* Additions to data source acceleration

Signed-off-by: Liz Snyder <31932630+lizsnyder@users.noreply.github.com>

* Update _dashboards/management/accelerate-external-data.md

Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Signed-off-by: Liz Snyder <31932630+lizsnyder@users.noreply.github.com>

* Update _dashboards/management/accelerate-external-data.md

Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Signed-off-by: Liz Snyder <31932630+lizsnyder@users.noreply.github.com>

* Update _dashboards/management/accelerate-external-data.md

Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Signed-off-by: Liz Snyder <31932630+lizsnyder@users.noreply.github.com>

* Implement feedback

Signed-off-by: Liz Snyder <31932630+lizsnyder@users.noreply.github.com>

* Update required ruby version

Signed-off-by: Liz Snyder <31932630+lizsnyder@users.noreply.github.com>

---------

Signed-off-by: Liz Snyder <31932630+lizsnyder@users.noreply.github.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 CONTRIBUTING.md                               |   2 +-
 .../management/accelerate-external-data.md    | 176 +++++++++++++++---
 2 files changed, 156 insertions(+), 22 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 99008aa7b4..f6d9b87ee8 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -84,7 +84,7 @@ Follow these steps to set up your local copy of the repository:

   ```
   curl -sSL https://get.rvm.io | bash -s stable
-  rvm install 3.2.4
+  rvm install 3.3.2
   ruby -v
   ```

diff --git a/_dashboards/management/accelerate-external-data.md b/_dashboards/management/accelerate-external-data.md
index 6d1fa030e4..00eb8671ec 100644
--- a/_dashboards/management/accelerate-external-data.md
+++ b/_dashboards/management/accelerate-external-data.md
@@ -1,10 +1,8 @@
---
layout: default
title: Optimize query performance using OpenSearch indexing
-parent: Connecting Amazon S3 to OpenSearch
-grand_parent: Data sources
-nav_order: 15
-has_children: false
+parent: Data sources
+nav_order: 17
---

# Optimize query performance using OpenSearch indexing
Introduced 2.11
{: .label .label-purple }

Query performance can be slow when using external data sources for reasons such as network latency, data transformation, and data volume. You can optimize your query performance by using OpenSearch indexes, such as a skipping index or a covering index.

-A _skipping index_ uses skip acceleration methods, such as partition, minimum and maximum values, and value sets, to ingest and create compact aggregate data structures. This makes them an economical option for direct querying scenarios.
+- A _skipping index_ uses skip acceleration methods, such as partition, minimum and maximum values, and value sets, to ingest and create compact aggregate data structures. This makes it an economical option for direct querying scenarios. For more information, see [Skipping indexes](https://opensearch.org/docs/latest/dashboards/management/accelerate-external-data/#skipping-indexes).
+- A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality. For more information, see [Covering indexes](https://opensearch.org/docs/latest/dashboards/management/accelerate-external-data/#covering-indexes).
+- A _materialized view_ enhances query performance by storing precomputed and aggregated data from the source. For more information, see [Materialized views](https://opensearch.org/docs/latest/dashboards/management/accelerate-external-data/#materialized-views).

-A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality. See the [Flint Index Reference Manual](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md) for comprehensive guidance on this feature's indexing process.
+For comprehensive guidance on each indexing process, see the [Flint Index Reference Manual](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md).

## Data sources use case: Accelerate performance

-To get started with the **Accelerate performance** use case available in **Data sources**, follow these steps:
+To get started with accelerating query performance, perform the following steps:

-1. Go to **OpenSearch Dashboards** > **Query Workbench** and select your Amazon S3 data source from the **Data sources** dropdown menu in the upper-left corner.
-2. From the left-side navigation menu, select a database.
-3. View the results in the table and confirm that you have the desired data.
+1. Go to **OpenSearch Dashboards** > **Query Workbench** and select your data source from the **Data sources** dropdown menu.
+2. From the navigation menu, select a database.
+3. View the results in the table and confirm that you have the correct data.
4. Create an OpenSearch index by following these steps:
-    1. Select the **Accelerate data** button. A pop-up window appears.
-    2. Enter your details in **Select data fields**. In the **Database** field, select the desired acceleration index: **Skipping index** or **Covering index**. A _skipping index_ uses skip acceleration methods, such as partition, min/max, and value sets, to ingest data using compact aggregate data structures. This makes them an economical option for direct querying scenarios. A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality.
-5. Under **Index settings**, enter the information for your acceleration index. For information about naming, select **Help**. Note that an Amazon S3 table can only have one skipping index at a time.
+    1. Select **Accelerate data**. A pop-up window appears.
+    2. Enter your database and table details under **Select data fields**.
+5. For **Acceleration type**, select the type of acceleration according to your use case. Then, enter the information for your acceleration type. For more information, see the following sections:
    - [Skipping indexes](https://opensearch.org/docs/latest/dashboards/management/accelerate-external-data/#skipping-indexes)
    - [Covering indexes](https://opensearch.org/docs/latest/dashboards/management/accelerate-external-data/#covering-indexes)
    - [Materialized views](https://opensearch.org/docs/latest/dashboards/management/accelerate-external-data/#materialized-views)

## Skipping indexes

A _skipping index_ uses skip acceleration methods, such as partition, min/max, and value sets, to ingest data using compact aggregate data structures. This makes it an economical option for direct querying scenarios.

With a skipping index, you can index only the metadata of the data stored in Amazon S3. When you query a table with a skipping index, the query planner references the index and rewrites the query to efficiently locate the data, instead of scanning all partitions and files. This allows the skipping index to quickly narrow down the specific location of the stored data.

### Define skipping index settings

-1. Under **Skipping index definition**, select the **Add fields** button to define the skipping index acceleration method and choose the fields you want to add.
-2. Select the **Copy Query to Editor** button to apply your skipping index settings.
-3. View the skipping index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases.
+1. Under **Skipping index definition**, select **Generate** to automatically generate a skipping index. Alternatively, to manually choose the fields you want to add, select **Add fields**. Choose from the following types:
+    - `Partition`: Uses data partition details to locate data. This type is best for partitioning-based columns, such as year, month, day, and hour.
+    - `MinMax`: Uses the lower and upper bounds of the indexed column to locate data. This type is best for numeric columns.
+    - `ValueSet`: Uses a unique value set to locate data. This type is best for columns with low to moderate cardinality that require exact matching.
+    - `BloomFilter`: Uses the Bloom filter algorithm to locate data. This type is best for columns with high cardinality that do not require exact matching.
+2. Select **Create acceleration** to apply your skipping index settings.
+3. View the skipping index query details and then select **Run**. OpenSearch adds your index to the left navigation pane.
+
+Alternatively, you can manually create a skipping index using Query Workbench. Select your data source from the dropdown menu and run a query like the following:

```sql
CREATE SKIPPING INDEX
ON datasourcename.gluedatabasename.vpclogstable(
    `srcaddr` BLOOM_FILTER,
    `dstaddr` BLOOM_FILTER,
    `day` PARTITION,
    `account_id` BLOOM_FILTER
    ) WITH (
index_settings = '{"number_of_shards":5,"number_of_replicas":1}',
auto_refresh = true,
checkpoint_location = 's3://accountnum-vpcflow/AWSLogs/checkpoint'
)
```

## Covering indexes

A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality.

With a covering index, you can ingest data from a specified column in a table. This is the most performant of the three indexing types. Because OpenSearch ingests all data from your desired column, you get better performance and can perform advanced analytics.

OpenSearch creates a new index from the covering index data. You can use this new index to create visualizations or for anomaly detection and geospatial capabilities. You can manage the covering index with Index State Management. For more information, see [Index State Management](https://opensearch.org/docs/latest/im-plugin/ism/index/).

### Define covering index settings

-1. Under **Index settings**, enter a valid index name. Note that each Amazon S3 table can have multiple covering indexes.
-2. Once you have added the index name, define the covering index fields by selecting `(add fields here)` under **Covering index definition**.
-3. Select the **Copy Query to Editor** button to apply your covering index settings.
-4. View the covering index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases.
+1. For **Index name**, enter a valid index name. Note that each table can have multiple covering indexes.
+2. Choose a **Refresh type**. By default, OpenSearch automatically refreshes the index. Otherwise, you must manually trigger a refresh using a `REFRESH` statement (see the sketch following the example query below).
+3. Enter a **Checkpoint location**, which is a path for refresh job checkpoints. The location must be a path in an HDFS-compatible file system.
+4. Define the covering index fields by selecting **(add fields here)** under **Covering index definition**.
+5. Select **Create acceleration** to apply your covering index settings.
+6. View the covering index query details and then select **Run**. OpenSearch adds your index to the left navigation pane.
+
+Alternatively, you can manually create a covering index on your table using Query Workbench. Select your data source from the dropdown menu and run a query like the following:

```sql
CREATE INDEX vpc_covering_index
ON datasourcename.gluedatabasename.vpclogstable (version, account_id, interface_id,
srcaddr, dstaddr, srcport, dstport, protocol, packets,
bytes, start, action, log_status STRING,
`aws-account-id`, `aws-service`, `aws-region`, year,
month, day, hour )
WITH (
  auto_refresh = true,
  refresh_interval = '15 minute',
  checkpoint_location = 's3://accountnum-vpcflow/AWSLogs/checkpoint'
)
```
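If `auto_refresh` is set to `false`, the index is not updated automatically and must be refreshed manually, as noted in the settings above. The following statement is a sketch of such a manual refresh. It assumes the `REFRESH` syntax described in the [Flint Index Reference Manual](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md) and reuses the index and table names from the preceding example:

```sql
-- Manually refresh the covering index created in the preceding example.
-- Applies when the index was created with auto_refresh = false.
REFRESH INDEX vpc_covering_index
ON datasourcename.gluedatabasename.vpclogstable
```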

## Materialized views

With _materialized views_, you can use complex queries, such as aggregations, to power Dashboards visualizations. Materialized views ingest a small amount of your data, depending on the query, into OpenSearch. OpenSearch then forms an index from the ingested data that you can use for visualizations. You can manage the materialized view index with Index State Management. For more information, see [Index State Management](https://opensearch.org/docs/latest/im-plugin/ism/index/).

### Define materialized view settings

1. For **Index name**, enter a valid index name. Note that each table can have multiple materialized views.
2. Choose a **Refresh type**. By default, OpenSearch automatically refreshes the index. Otherwise, you must manually trigger a refresh using a `REFRESH` statement.
3. Enter a **Checkpoint location**, which is a path for refresh job checkpoints. The location must be a path in an HDFS-compatible file system.
4. Enter a **Watermark delay**, which defines how late data can arrive and still be processed, such as 1 minute or 10 seconds.
5. Define the materialized view fields under **Materialized view definition**.
6. Select **Create acceleration** to apply your materialized view index settings.
7. View the materialized view query details and then select **Run**. OpenSearch adds your index to the left navigation pane.

Alternatively, you can manually create a materialized view index on your table using Query Workbench. Select your data source from the dropdown menu and run a query like the following:

```sql
CREATE MATERIALIZED VIEW {table_name}__week_live_mview AS
  SELECT
    cloud.account_uid AS `aws.vpc.cloud_account_uid`,
    cloud.region AS `aws.vpc.cloud_region`,
    cloud.zone AS `aws.vpc.cloud_zone`,
    cloud.provider AS `aws.vpc.cloud_provider`,

    CAST(IFNULL(src_endpoint.port, 0) AS LONG) AS `aws.vpc.srcport`,
    CAST(IFNULL(src_endpoint.svc_name, 'Unknown') AS STRING) AS `aws.vpc.pkt-src-aws-service`,
    CAST(IFNULL(src_endpoint.ip, '0.0.0.0') AS STRING) AS `aws.vpc.srcaddr`,
    CAST(IFNULL(src_endpoint.interface_uid, 'Unknown') AS STRING) AS `aws.vpc.src-interface_uid`,
    CAST(IFNULL(src_endpoint.vpc_uid, 'Unknown') AS STRING) AS `aws.vpc.src-vpc_uid`,
    CAST(IFNULL(src_endpoint.instance_uid, 'Unknown') AS STRING) AS `aws.vpc.src-instance_uid`,
    CAST(IFNULL(src_endpoint.subnet_uid, 'Unknown') AS STRING) AS `aws.vpc.src-subnet_uid`,

    CAST(IFNULL(dst_endpoint.port, 0) AS LONG) AS `aws.vpc.dstport`,
    CAST(IFNULL(dst_endpoint.svc_name, 'Unknown') AS STRING) AS `aws.vpc.pkt-dst-aws-service`,
    CAST(IFNULL(dst_endpoint.ip, '0.0.0.0') AS STRING) AS `aws.vpc.dstaddr`,
    CAST(IFNULL(dst_endpoint.interface_uid, 'Unknown') AS STRING) AS `aws.vpc.dst-interface_uid`,
    CAST(IFNULL(dst_endpoint.vpc_uid, 'Unknown') AS STRING) AS `aws.vpc.dst-vpc_uid`,
    CAST(IFNULL(dst_endpoint.instance_uid, 'Unknown') AS STRING) AS `aws.vpc.dst-instance_uid`,
    CAST(IFNULL(dst_endpoint.subnet_uid, 'Unknown') AS STRING) AS `aws.vpc.dst-subnet_uid`,
    CASE
      WHEN regexp(dst_endpoint.ip, '(10\\..*)|(192\\.168\\..*)|(172\\.1[6-9]\\..*)|(172\\.2[0-9]\\..*)|(172\\.3[0-1]\\..*)')
        THEN 'ingress'
      ELSE 'egress'
    END AS `aws.vpc.flow-direction`,

    CAST(IFNULL(connection_info['protocol_num'], 0) AS INT) AS `aws.vpc.connection.protocol_num`,
    CAST(IFNULL(connection_info['tcp_flags'], '0') AS STRING) AS `aws.vpc.connection.tcp_flags`,
    CAST(IFNULL(connection_info['protocol_ver'], '0') AS STRING) AS `aws.vpc.connection.protocol_ver`,
    CAST(IFNULL(connection_info['boundary'], 'Unknown') AS STRING) AS `aws.vpc.connection.boundary`,
    CAST(IFNULL(connection_info['direction'], 'Unknown') AS STRING) AS `aws.vpc.connection.direction`,

    CAST(IFNULL(traffic.packets, 0) AS LONG) AS `aws.vpc.packets`,
    CAST(IFNULL(traffic.bytes, 0) AS LONG) AS `aws.vpc.bytes`,

    CAST(FROM_UNIXTIME(time / 1000) AS TIMESTAMP) AS `@timestamp`,
    CAST(FROM_UNIXTIME(start_time / 1000) AS TIMESTAMP) AS `start_time`,
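    -- Source time fields are epoch milliseconds; dividing by 1000 yields the
    -- epoch seconds that FROM_UNIXTIME expects before casting to TIMESTAMP.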
    CAST(FROM_UNIXTIME(start_time / 1000) AS TIMESTAMP) AS `interval_start_time`,
    CAST(FROM_UNIXTIME(end_time / 1000) AS TIMESTAMP) AS `end_time`,
    status_code AS `aws.vpc.status_code`,

    severity AS `aws.vpc.severity`,
    class_name AS `aws.vpc.class_name`,
    category_name AS `aws.vpc.category_name`,
    activity_name AS `aws.vpc.activity_name`,
    disposition AS `aws.vpc.disposition`,
    type_name AS `aws.vpc.type_name`,

    region AS `aws.vpc.region`,
    accountid AS `aws.vpc.account-id`
  FROM
    datasourcename.gluedatabasename.vpclogstable
WITH (
  auto_refresh = true,
  refresh_interval = '15 Minute',
  checkpoint_location = 's3://accountnum-vpcflow/AWSLogs/checkpoint',
  watermark_delay = '1 Minute'
)
```

## Limitations

-This feature is still under development, so there are some limitations. For real-time updates, refer to the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations).
+This feature is still under development, so there are some limitations. For real-time updates, see the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations).

From 034f37abf514c196a6b2b3d025d6fd804a7295cc Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Tue, 19 Nov 2024 08:58:15 -0500
Subject: [PATCH 3/8] Change link in update settings (#8773)

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
 _api-reference/index-apis/update-settings.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_api-reference/index-apis/update-settings.md b/_api-reference/index-apis/update-settings.md
index 18caf2185d..3afaaa10d3 100644
--- a/_api-reference/index-apis/update-settings.md
+++ b/_api-reference/index-apis/update-settings.md
@@ -11,7 +11,7 @@ redirect_from:
**Introduced 1.0**
{: .label .label-purple }

-You can use the update settings API operation to update index-level settings. You can change dynamic index settings at any time, but static settings cannot be changed after index creation. For more information about static and dynamic index settings, see [Create index]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/).
+You can use the update settings API operation to update index-level settings. You can change dynamic index settings at any time, but static settings cannot be changed after index creation. For more information about static and dynamic index settings, see [Configuring OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/).

Aside from the static and dynamic index settings, you can also update individual plugins' settings. To get the full list of updatable settings, run `GET /_settings?include_defaults=true`.
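For example, a dynamic setting such as `index.refresh_interval` can be updated on a live index with a request like the following sketch, in which the index name and interval value are illustrative placeholders:

```json
PUT /sample-index/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}
```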
From 47e881be191fe26dbf7d8fc0a9c6e50fbc69c9ee Mon Sep 17 00:00:00 2001
From: Peter Alfonsi
Date: Tue, 19 Nov 2024 13:27:09 -0800
Subject: [PATCH 4/8] Document change to size > 0 setting for request cache (#8684)

* document change to size > 0 setting

Signed-off-by: Peter Alfonsi

* Update _search-plugins/caching/request-cache.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Peter Alfonsi

* Update _search-plugins/caching/request-cache.md

Co-authored-by: Nathan Bower
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: Peter Alfonsi
Signed-off-by: Peter Alfonsi
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Peter Alfonsi
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower
---
 _search-plugins/caching/request-cache.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/_search-plugins/caching/request-cache.md b/_search-plugins/caching/request-cache.md
index 124152300b..49d5e8cd82 100644
--- a/_search-plugins/caching/request-cache.md
+++ b/_search-plugins/caching/request-cache.md
@@ -28,6 +28,7 @@ Setting | Data type | Default | Level | Static/Dynamic | Description
`indices.cache.cleanup_interval` | Time unit | `1m` (1 minute) | Cluster | Static | Schedules a recurring background task that cleans up expired entries from the cache at the specified interval.
`indices.requests.cache.size` | Percentage | `1%` | Cluster | Static | The cache size as a percentage of the heap size (for example, to use 1% of the heap, specify `1%`).
`index.requests.cache.enable` | Boolean | `true` | Index | Dynamic | Enables or disables the request cache.
+`indices.requests.cache.maximum_cacheable_size` | Integer | `0` | Cluster | Dynamic | Sets the maximum `size` of queries to be added to the request cache.

### Example
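Because `indices.requests.cache.maximum_cacheable_size` is dynamic, it can be updated on a running cluster. The following request is an illustrative sketch in which the value `1000` is arbitrary; it allows queries with `size` values up to `1000` to be cached:

```json
PUT /_cluster/settings
{
  "persistent": {
    "indices.requests.cache.maximum_cacheable_size": 1000
  }
}
```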
From 6da9e56fd4b1f037e8fc60da0205fab031236671 Mon Sep 17 00:00:00 2001
From: Benjamin Wolak
Date: Wed, 20 Nov 2024 07:19:51 -0800
Subject: [PATCH 5/8] Clarify master-to-cluster-manager name change in
 index.md (#8784)

In https://github.com/opensearch-project/documentation-website/pull/7660 there was an effort to find-and-replace `master` with `cluster manager`. In this one particular case it was unwarranted, however, because the sentence explains that master nodes are now called cluster manager nodes.

I also simplified the sentence, as the "nomenclature" part is unnecessary.

Signed-off-by: Benjamin Wolak
---
 _tuning-your-cluster/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_tuning-your-cluster/index.md b/_tuning-your-cluster/index.md
index fa0973395f..f434c2b5ec 100644
--- a/_tuning-your-cluster/index.md
+++ b/_tuning-your-cluster/index.md
@@ -20,7 +20,7 @@ To create and deploy an OpenSearch cluster according to your requirements, it’

There are many ways to design a cluster. The following illustration shows a basic architecture that includes a four-node cluster that has one dedicated cluster manager node, one dedicated coordinating node, and two data nodes that are cluster manager eligible and also used for ingesting data.

-	The nomenclature for the cluster manager node is now referred to as the cluster manager node.
+	The master node is now referred to as the cluster manager node.
 {: .note }

![multi-node cluster architecture diagram]({{site.url}}{{site.baseurl}}/images/cluster.png)

From 1ae2f7dced1831c15a22bfe5ab54df4781d18736 Mon Sep 17 00:00:00 2001
From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Date: Wed, 20 Nov 2024 13:45:37 -0600
Subject: [PATCH 6/8] Order sections to reflect API template (#8782)

* Order sections to reflect API template

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update DEVELOPER_GUIDE.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update DEVELOPER_GUIDE.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update DEVELOPER_GUIDE.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
 DEVELOPER_GUIDE.md | 44 +++++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/DEVELOPER_GUIDE.md b/DEVELOPER_GUIDE.md
index 18de000473..9b0ec1c79d 100644
--- a/DEVELOPER_GUIDE.md
+++ b/DEVELOPER_GUIDE.md
@@ -72,6 +72,29 @@ All spec insert components accept the following arguments:
- `component` (String; required): The name of the component to render, such as `query_parameters`, `path_parameters`, or `paths_and_http_methods`.
- `omit_header` (Boolean; Default is `false`): If set to `true`, the markdown header of the component will not be rendered.

+### Paths and HTTP methods
+To insert paths and HTTP methods for the `search` API, use the following snippet:
+```markdown
+<!-- spec_insert_start
+api: search
+component: paths_and_http_methods
+-->
+<!-- spec_insert_end -->
+```
+
+### Path parameters
+
+To insert a path parameters table for the `indices.create` API, use the following snippet. Use the `x-operation-group` field from the OpenSearch OpenAPI spec for the `api` value:
+
+```markdown
+<!-- spec_insert_start
+api: indices.create
+component: path_parameters
+-->
+<!-- spec_insert_end -->
+```
+This table accepts the same arguments as the query parameters table except the `include_global` argument.
+
### Query parameters
To insert the API query parameters table of the `cat.indices` API, use the following snippet:
```markdown
@@ -110,24 +133,3 @@ pretty: true
-->

```
-
-### Path parameters
-To insert path parameters table of the `indices.create` API, use the following snippet:
-```markdown
-<!-- spec_insert_start
-api: indices.create
-component: path_parameters
--->
-<!-- spec_insert_end -->
-```
-This table behaves the same as the query parameters table except that it does not accept the `include_global` argument.
-
-### Paths and HTTP methods
-To insert paths and HTTP methods for the `search` API, use the following snippet:
-```markdown
-<!-- spec_insert_start
-api: search
-component: paths_and_http_methods
--->
-<!-- spec_insert_end -->
-```
\ No newline at end of file

From f39ba2fad163ecbfed3548db9f62bd9608926f95 Mon Sep 17 00:00:00 2001
From: Landon Lengyel
Date: Thu, 21 Nov 2024 11:08:20 -0700
Subject: [PATCH 7/8] Update performance.md - Remove old curl commands (#8761)

* Update performance.md

Updating old curl examples to the current standard used across the rest of the documentation.

Signed-off-by: Landon Lengyel

* Apply suggestions from code review

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: Landon Lengyel
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
 _tuning-your-cluster/performance.md | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/_tuning-your-cluster/performance.md b/_tuning-your-cluster/performance.md
index 28f47aeacb..b5066a890c 100644
--- a/_tuning-your-cluster/performance.md
+++ b/_tuning-your-cluster/performance.md
@@ -32,12 +32,9 @@ An increased `index.translog.flush_threshold_size` can also increase the time th
Before increasing `index.translog.flush_threshold_size`, call the following API operation to get current flush operation statistics:

```json
-curl -XPOST "os-endpoint/index-name/_stats/flush?pretty"
+GET /<index-name>/_stats/flush?pretty
```
-{% include copy.html %}
-
-
-Replace the `os-endpoint` and `index-name` with your endpoint and index name.
+{% include copy-curl.html %}

In the output, note the number of flushes and the total time. The following example output shows that there are 124 flushes, which took 17,690 milliseconds:

```json
"flush": {
"total": 124,
"periodic": 0,
"total_time_in_millis": 17690
}
```

To increase the flush threshold size, call the following API operation:

```json
-curl -XPUT "os-endpoint/index-name/_settings?pretty" -d "{"index":{"translog.flush_threshold_size" : "1024MB"}}"
+PUT /<index-name>/_settings
+{
+  "index":
+  {
+    "translog.flush_threshold_size" : "1024MB"
+  }
+}
```
-{% include copy.html %}
+{% include copy-curl.html %}

In this example, the flush threshold size is set to 1024 MB, which is ideal for instances that have more than 32 GB of memory.

Choose the appropriate threshold size for your cluster.

Run the stats API operation again to see whether the flush activity changed:

```json
-curl -XGET "os-endpoint/index-name/_stats/flush?pretty"
+GET /<index-name>/_stats/flush
```
-{% include copy.html %}
+{% include copy-curl.html %}

It's a best practice to increase the `index.translog.flush_threshold_size` only for the current index. After you confirm the outcome, apply the changes to the index template.
{: .note}

## Reduce the size of responses

To reduce the size of the OpenSearch response, use the `filter_path` parameter to exclude fields.

In the following example, the `index-name`, `type-name`, and `took` fields are excluded from the response:

```json
-curl -XPOST "es-endpoint/index-name/type-name/_bulk?pretty&filter_path=-took,-items.index._index,-items.index._type" -H 'Content-Type: application/json' -d'
+POST /_bulk?pretty&filter_path=-took,-items.index._index,-items.index._type
{ "index" : { "_index" : "test2", "_id" : "1" } }
{ "user" : "testuser" }
{ "update" : {"_id" : "1", "_index" : "test2"} }
{ "doc" : {"user" : "example"} }
```
-{% include copy.html %}
+{% include copy-curl.html %}

## Compression codecs

-In OpenSearch 2.9 and later, there are two new codecs for compression: `zstd` and `zstd_no_dict`. You can optionally specify a compression level for these in the `index.codec.compression_level` setting with values in the [1, 6] range. [Benchmark]({{site.url}}{{site.baseurl}}/im-plugin/index-codecs/#benchmarking) data shows that `zstd` provides a 7% better write throughput and `zstd_no_dict` provides a 14% better throughput, along with a 30% improvement in storage compared with the `default` codec. For more information about compression, see [Index codecs]({{site.url}}{{site.baseurl}}/im-plugin/index-codecs/).
\ No newline at end of file
+In OpenSearch 2.9 and later, there are two new codecs for compression: `zstd` and `zstd_no_dict`. You can optionally specify a compression level for these in the `index.codec.compression_level` setting with values in the [1, 6] range. [Benchmark]({{site.url}}{{site.baseurl}}/im-plugin/index-codecs/#benchmarking) data shows that `zstd` provides a 7% better write throughput and `zstd_no_dict` provides a 14% better throughput, along with a 30% improvement in storage compared with the `default` codec. For more information about compression, see [Index codecs]({{site.url}}{{site.baseurl}}/im-plugin/index-codecs/).
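Because codec settings are static, they are applied at index creation. The following request is an illustrative sketch in which the index name and compression level are placeholders:

```json
PUT /sample-index
{
  "settings": {
    "index.codec": "zstd",
    "index.codec.compression_level": 3
  }
}
```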
From b50a3ebc214eaedc2779a7a0c4346411aed2f73d Mon Sep 17 00:00:00 2001
From: Scott Stults
Date: Thu, 21 Nov 2024 15:10:16 -0500
Subject: [PATCH 8/8] Update logging-features.md (#8795)

* Update logging-features.md

Add a link to the hello-ltr demo for context about the index schema

Signed-off-by: Scott Stults

* Update _search-plugins/ltr/logging-features.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Scott Stults

---------

Signed-off-by: Scott Stults
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
 _search-plugins/ltr/logging-features.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_search-plugins/ltr/logging-features.md b/_search-plugins/ltr/logging-features.md
index fd73eda9a7..7922b8683d 100644
--- a/_search-plugins/ltr/logging-features.md
+++ b/_search-plugins/ltr/logging-features.md
@@ -12,7 +12,7 @@ Feature values need to be logged in order to train a model. This is a crucial co

## `sltr` query

-The `sltr` query is the primary method for running features and evaluating models. When logging, an `sltr` query is used to execute each feature query and retrieve the feature scores. A feature set structure is shown in the following example request:
+The `sltr` query is the primary method for running features and evaluating models. When logging, an `sltr` query is used to execute each feature query and retrieve the feature scores. A feature set structure that works with the [`hello-ltr`](https://github.com/o19s/hello-ltr) demo schema is shown in the following example request:

```json
PUT _ltr/_featureset/more_movie_features