From 54f39806070d551705a3c85d1381762f8b8238ee Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 9 Jun 2023 14:05:58 -0600 Subject: [PATCH 001/286] Add page Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-processors.md | 9 +++++++++ 1 file changed, 9 insertions(+) create mode 100644 _api-reference/ingest-apis/ingest-processors.md diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md new file mode 100644 index 0000000000..97d85c47d5 --- /dev/null +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -0,0 +1,9 @@ +--- +layout: default +title: Ingest processors +parent: Ingest APIs +nav_order: 50 +--- + +# Ingest processors + From f3d89bf4b7853e32fabdff90cf35906f0b8d4dae Mon Sep 17 00:00:00 2001 From: Chris Moore <107723039+cwillum@users.noreply.github.com> Date: Wed, 7 Jun 2023 15:26:34 -0700 Subject: [PATCH 002/286] Add documentation for score based password estimator settings (#4267) * fix#4088 score based pw estimator Signed-off-by: cwillum * fix#4088 score based pw estimator Signed-off-by: cwillum * fix#4088 score based pw estimator Signed-off-by: cwillum * fix#4088 score based pw estimator Signed-off-by: cwillum * fix#4088 score based pw estimator Signed-off-by: cwillum --------- Signed-off-by: cwillum Signed-off-by: Melissa Vagi --- _security/configuration/yaml.md | 46 +++++++++++++++++++++++++++++---- 1 file changed, 41 insertions(+), 5 deletions(-) diff --git a/_security/configuration/yaml.md b/_security/configuration/yaml.md index e7a34a07de..1d10e50268 100644 --- a/_security/configuration/yaml.md +++ b/_security/configuration/yaml.md @@ -120,6 +120,22 @@ plugins.security.system_indices.indices: [".opendistro-alerting-config", ".opend node.max_local_storage_nodes: 3 ``` +### Refining your configuration + +The `plugins.security.allow_default_init_securityindex` setting, when set to `true`, sets the Security plugin to its default security settings if an attempt to create the 
security index fails when OpenSearch launches. Default security settings are stored in YAML files contained in the `opensearch-project/security/config` directory. By default, this setting is `false`. + +```yml +plugins.security.allow_default_init_securityindex: true +``` + +An authentication cache for the Security plugin exists to help speed up authentication by temporarily storing user objects returned from the backend so that the Security plugin is not required to make repeated requests for them. To determine how long it takes for caching to time out, you can use the `plugins.security.cache.ttl_minutes` property to set a value in minutes. The default is `60`. You can disable caching by setting the value to `0`. + +```yml +plugins.security.cache.ttl_minutes: 60 +``` + +### Password settings + If you want to run your users' passwords against some validation, specify a regular expression (regex) in this file. You can also include an error message that loads when passwords don't pass validation. The following example demonstrates how to include a regex so OpenSearch requires new passwords to be a minimum of eight characters with at least one uppercase, one lowercase, one digit, and one special character. Note that OpenSearch validates only users and passwords created through OpenSearch Dashboards or the REST API. @@ -129,16 +145,36 @@ plugins.security.restapi.password_validation_regex: '(?=.*[A-Z])(?=.*[^a-zA-Z\d] plugins.security.restapi.password_validation_error_message: "Password must be minimum 8 characters long and must contain at least one uppercase letter, one lowercase letter, one digit, and one special character." ``` -The opensearch.yml file also contains the `plugins.security.allow_default_init_securityindex` property. When set to `true`, the Security plugin uses default security settings if an attempt to create the security index fails when OpenSearch launches. 
Default security settings are stored in YAML files contained in the `opensearch-project/security/config` directory. By default, this setting is `false`.
+In addition, a score-based password strength estimator allows you to set a threshold for password strength when creating a new internal user or updating a user's password. This feature makes use of the [zxcvbn library](https://github.com/dropbox/zxcvbn) to apply a policy that emphasizes a password's complexity rather than its capacity to meet traditional criteria such as uppercase letters, numerals, and special characters.
+
+For information about creating users, see [Create users]({{site.url}}{{site.baseurl}}/security/access-control/users-roles/#create-users).
+
+This feature is not compatible with users specified as reserved. For information about reserved resources, see [Reserved and hidden resources]({{site.url}}{{site.baseurl}}/security/access-control/api#reserved-and-hidden-resources).
+{: .important }
+
+Score-based password strength is configured using two settings, which are described in the following table.
+
+| Setting | Description |
+| :--- | :--- |
+| `plugins.security.restapi.password_min_length` | Sets the minimum number of characters for the password length. The default is `8`, which is also the lowest value allowed. |
+| `plugins.security.restapi.password_score_based_validation_strength` | Sets a threshold to determine whether the password is strong or weak. There are four values that represent a threshold's increasing complexity.
`fair`--A very guessable password: provides protection from throttled online attacks.
`good`--A somewhat guessable password: provides protection from unthrottled online attacks.
`strong`--A safely unguessable password: provides moderate protection from an offline, slow-hash scenario.
`very_strong`--A very unguessable password: provides strong protection from an offline, slow-hash scenario. |
+
+The following example shows the `opensearch.yml` settings that enable a minimum password length of 10 characters and a threshold requiring the highest strength:

```yml
-plugins.security.allow_default_init_securityindex: true
+plugins.security.restapi.password_min_length: 10
+plugins.security.restapi.password_score_based_validation_strength: very_strong
```

-Authentication cache for the Security plugin exists to help speed up authentication by temporarily storing user objects returned from the backend so that the Security plugin is not required to make repeated requests for them. To determine how long it takes for caching to time out, you can use the `plugins.security.cache.ttl_minutes` property to set a value in minutes. The default is `60`. You can disable caching by setting the value to `0`.

-```yml
-plugins.security.cache.ttl_minutes: 60
+When you try to create a user with a password that doesn't reach the specified threshold, the system generates a "weak password" warning, indicating that the password needs to be modified before you can save the user.
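+For illustration, a request such as the following to the Create user API would trigger that warning under the example settings above (the user name `test-user` and the password shown are hypothetical placeholders):
+
+```json
+PUT _plugins/_security/api/internalusers/test-user
+{
+  "password": "weakpass1"
+}
+```
+
+Because `weakpass1` is shorter than the configured 10-character minimum, it fails validation regardless of its strength score.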
-```yml -plugins.security.cache.ttl_minutes: 60 +The following example shows the response from the [Create user]({{site.url}}{{site.baseurl}}/security/access-control/api/#create-user) API when the password is weak: + +```json +{ + "status": "error", + "reason": "Weak password" +} ``` ## allowlist.yml From ff1f1d3f9c94fe96f2050804ef424f55bca76bdc Mon Sep 17 00:00:00 2001 From: Heather Halter Date: Thu, 8 Jun 2023 14:50:21 -0700 Subject: [PATCH 003/286] Updated the 404 message (#4288) * Updated the message Signed-off-by: Heather Halter * addtrailingslash Signed-off-by: Heather Halter * added editorial input Signed-off-by: Heather Halter * fixedlinktohomepage Signed-off-by: Heather Halter --------- Signed-off-by: Heather Halter Signed-off-by: Melissa Vagi --- 404.md | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/404.md b/404.md index 498a540ce8..5165c9e449 100644 --- a/404.md +++ b/404.md @@ -6,6 +6,15 @@ heading_anchors: false nav_exclude: true --- -# OpenSearch cannot find that page. +## Oops, this isn't the page you're looking for. + +Maybe our [homepage](https://opensearch.org/docs/latest) +or one of the popular pages listed below can help. + +- [Quickstart]({{site.url}}{{site.baseurl}}/quickstart/) +- [Installing OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/index/) +- [OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/index/) +- [Query DSL]({{site.url}}{{site.baseurl}}/query-dsl/) +- [API Reference]({{site.url}}{{site.baseurl}}/api-reference/index/) + -Perhaps we moved something around, or you mistyped the URL? Try using search or go to the [OpenSearch Documentation home page](https://opensearch.org/docs/latest/). If you need further help, see the [OpenSearch community forum](https://forum.opensearch.org/). 
From bf499679356966b83605d15e6842a97f7257342d Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Fri, 9 Jun 2023 11:13:02 -0500 Subject: [PATCH 004/286] Add redirect for ML Dashboard (#4294) Relates to issue https://github.com/opensearch-project/ml-commons-dashboards/issues/208 Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _ml-commons-plugin/ml-dashboard.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_ml-commons-plugin/ml-dashboard.md b/_ml-commons-plugin/ml-dashboard.md index 11d6e12c40..31f919a28c 100644 --- a/_ml-commons-plugin/ml-dashboard.md +++ b/_ml-commons-plugin/ml-dashboard.md @@ -2,6 +2,8 @@ layout: default title: Managing ML models in OpenSearch Dashboards nav_order: 120 +redirect_from: + - /ml-commons-plugin/ml-dashbaord/ --- Released in OpenSearch 2.6, the machine learning (ML) functionality in OpenSearch Dashboards is experimental and can't be used in a production environment. For updates or to leave feedback, see the [OpenSearch Forum discussion](https://forum.opensearch.org/t/feedback-ml-commons-ml-model-health-dashboard-for-admins-experimental-release/12494). 
From ba8c6bffbf794a04842343a3e3a168d3e2d659f0 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 9 Jun 2023 13:28:21 -0400 Subject: [PATCH 005/286] Add info to enable search pipelines (#4297) Signed-off-by: Fanit Kolchina Signed-off-by: Melissa Vagi --- _search-plugins/search-pipelines/index.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/_search-plugins/search-pipelines/index.md b/_search-plugins/search-pipelines/index.md index 5a557cff8b..1aec24b864 100644 --- a/_search-plugins/search-pipelines/index.md +++ b/_search-plugins/search-pipelines/index.md @@ -13,6 +13,15 @@ This is an experimental feature and is not recommended for use in a production e You can use _search pipelines_ to build new or reuse existing result rerankers, query rewriters, and other components that operate on queries or results. Search pipelines make it easier for you to process search queries and search results within OpenSearch. Moving some of your application functionality into an OpenSearch search pipeline reduces the overall complexity of your application. As part of a search pipeline, you specify a list of processors that perform modular tasks. You can then easily add or reorder these processors to customize search results for your application. +## Enabling search pipelines + +Search pipeline functionality is disabled by default. To enable it, edit the configuration in `opensearch.yml` and then restart your cluster: + +1. Navigate to the OpenSearch config directory. +1. Open the `opensearch.yml` configuration file. +1. Add `opensearch.experimental.feature.search_pipeline.enabled: true` and save the configuration file. +1. Restart your cluster. 
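+
+The steps above amount to adding a single line to `opensearch.yml`; shown here in isolation for reference:
+
+```yml
+opensearch.experimental.feature.search_pipeline.enabled: true
+```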
+ ## Terminology The following is a list of search pipeline terminology: From 9ec60522379853a764b3fd53d837fd9b9daf8c76 Mon Sep 17 00:00:00 2001 From: Chris Moore <107723039+cwillum@users.noreply.github.com> Date: Mon, 12 Jun 2023 18:23:07 -0700 Subject: [PATCH 006/286] Add documentation for API rate limiting (#4287) * fix#4171 api rate limit Signed-off-by: cwillum * fix#4171 api rate limit Signed-off-by: cwillum * fix#4171 api rate limit Signed-off-by: cwillum * fix#4171 api rate limit Signed-off-by: cwillum * fix#4171 api rate limit Signed-off-by: cwillum * fix#4171 api rate limit Signed-off-by: cwillum * fix#4171 api rate limit Signed-off-by: cwillum * fix#4171 api rate limit Signed-off-by: cwillum * fix#4171 api rate limit Signed-off-by: cwillum * fix#4171 api rate limit Signed-off-by: cwillum * fix#4171 api rate limit Signed-off-by: cwillum * fix#4171 api rate limit Signed-off-by: cwillum * fix#4171 api rate limit Signed-off-by: cwillum --------- Signed-off-by: cwillum Signed-off-by: Melissa Vagi --- _security/configuration/configuration.md | 79 +++++++++++++++++++++++- 1 file changed, 78 insertions(+), 1 deletion(-) diff --git a/_security/configuration/configuration.md b/_security/configuration/configuration.md index 3ca2e607fd..77018bccfe 100755 --- a/_security/configuration/configuration.md +++ b/_security/configuration/configuration.md @@ -136,7 +136,84 @@ In most cases, you set the `challenge` flag to `true`. The flag defines the beha If `challenge` is set to `true`, the Security plugin sends a response with status `UNAUTHORIZED` (401) back to the client. If the client is accessing the cluster with a browser, this triggers the authentication dialog box, and the user is prompted to enter a user name and password. -If `challenge` is set to `false` and no `Authorization` header field is set, the Security plugin does not send a `WWW-Authenticate` response back to the client, and authentication fails. 
You might want to use this setting if you have another challenge `http_authenticator` in your configured authentication domains. One such scenario is when you plan to use basic authentication and OpenID Connect together.
+If `challenge` is set to `false` and no `Authorization` header field is set, the Security plugin does not send a `WWW-Authenticate` response back to the client, and authentication fails. Consider using this setting if you have more than one challenge `http_authenticator` key in your configured authentication domains. This might be the case, for example, when you plan to use basic authentication and OpenID Connect together.
+
+
+## API rate limiting
+
+API rate limiting is typically used to restrict the number of API calls that users can make in a set span of time, thereby helping to manage the rate of API traffic. For security purposes, rate limiting can help defend against DoS attacks and brute-and-error (brute-force) login attempts by restricting the number of failed logins allowed.
+
+You have the option to configure the Security plugin for username rate limiting, IP address rate limiting, or both. These configurations are made in the `config.yml` file. See the following sections for information about each type of rate limiting configuration.
+
+
+### Username rate limiting
+
+This configuration limits login attempts by username. When a login fails, the username is blocked from any machine in the network. The following example shows `config.yml` settings configured for username rate limiting:
+
+```yml
+auth_failure_listeners:
+  internal_authentication_backend_limiting:
+    type: username
+    authentication_backend: internal
+    allowed_tries: 3
+    time_window_seconds: 60
+    block_expiry_seconds: 60
+    max_blocked_clients: 100000
+    max_tracked_clients: 100000
+```
+{% include copy.html %}
+
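+To make the window and expiry semantics concrete, the following is a minimal Python sketch (illustrative only, not plugin code) of the documented behavior: after `allowed_tries` failed logins within `time_window_seconds`, a client is blocked for `block_expiry_seconds`. The `max_blocked_clients` and `max_tracked_clients` caps are omitted for brevity.
+
+```python
+import time
+
+
+class FailedLoginLimiter:
+    """Hypothetical model of the `auth_failure_listeners` settings above."""
+
+    def __init__(self, allowed_tries=3, time_window_seconds=60, block_expiry_seconds=60):
+        self.allowed_tries = allowed_tries
+        self.time_window_seconds = time_window_seconds
+        self.block_expiry_seconds = block_expiry_seconds
+        self.failures = {}       # client -> timestamps of recent failed logins
+        self.blocked_until = {}  # client -> time at which the block expires
+
+    def is_blocked(self, client, now=None):
+        now = time.time() if now is None else now
+        until = self.blocked_until.get(client)
+        if until is not None and now < until:
+            return True
+        # Block has expired (or was never set), so login is reset.
+        self.blocked_until.pop(client, None)
+        return False
+
+    def record_failure(self, client, now=None):
+        now = time.time() if now is None else now
+        # Keep only failures inside the sliding time window.
+        recent = [t for t in self.failures.get(client, []) if now - t < self.time_window_seconds]
+        recent.append(now)
+        self.failures[client] = recent
+        if len(recent) >= self.allowed_tries:
+            self.blocked_until[client] = now + self.block_expiry_seconds
+```
+
+With the example configuration (`allowed_tries: 3`, `time_window_seconds: 60`), a third failed login within 60 seconds blocks the client for `block_expiry_seconds`, after which login is reset.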
+ +| Setting | Description | +| :--- | :--- | +| `type` | The type of rate limiting. In this case, `username`. | +| `authentication_backend` | The internal backend. Enter `internal`. | +| `allowed_tries` | The number of login attempts allowed before login is blocked. Be aware that increasing the number increases heap usage. | +| `time_window_seconds` | The window of time in which the value for `allowed_tries` is enforced. For example, if `allowed_tries` is `3` and `time_window_seconds` is `60`, a username has three attempts to log in successfully within a 60-second time span before login is blocked. | +| `block_expiry_seconds` | The duration of time that login remains blocked after a failed login. After this time elapses, login is reset and the username can attempt successful login again. | +| `max_blocked_clients` | The maximum number of blocked usernames. This limits heap usage to avoid a potential DoS. | +| `max_tracked_clients` | The maximum number of tracked usernames that have failed login. This limits heap usage to avoid a potential DoS. | + + +### IP address rate limiting + +This configuration limits login attempts by IP address. When a login fails, the IP address specific to the machine being used for login is blocked. + +There are two steps for configuring IP address rate limiting. First, set the `challenge` setting to `false` in the `http_authenticator` section of the `config.yml` file. + +```yml +http_authenticator: + type: basic + challenge: false +``` + +For more information about this setting, see [HTTP basic authentication](#http-basic-authentication). + +Second, configure the IP address rate limiting settings. 
The following example shows a completed configuration: + +```yml +auth_failure_listeners: + ip_rate_limiting: + type: ip + allowed_tries: 1 + time_window_seconds: 20 + block_expiry_seconds: 180 + max_blocked_clients: 100000 + max_tracked_clients: 100000 +``` +{% include copy.html %} + +The following table describes the individual settings for this type of configuration. + +| Setting | Description | +| :--- | :--- | +| `type` | The type of rate limiting. In this case, `ip`. | +| `allowed_tries` | The number of login attempts allowed before login is blocked. Be aware that increasing the number increases heap usage. | +| `time_window_seconds` | The window of time in which the value for `allowed_tries` is enforced. For example, if `allowed_tries` is `3` and `time_window_seconds` is `60`, an IP address has three attempts to log in successfully within a 60-second time span before login is blocked. | +| `block_expiry_seconds` | The duration of time that login remains blocked after a failed login. After this time elapses, login is reset and the IP address can attempt successful login again. | +| `max_blocked_clients` | The maximum number of blocked IP addresses. This limits heap usage to avoid a potential DoS. | +| `max_tracked_clients` | The maximum number of tracked IP addresses that have failed login. This limits heap usage to avoid a potential DoS. 
| ## Backend configuration examples From e7aa8419ff23bb864fb84aea22d543f8ceb4373f Mon Sep 17 00:00:00 2001 From: Matthew Wells Date: Mon, 12 Jun 2023 19:16:33 -0700 Subject: [PATCH 007/286] Updated functions documentation (#4232) * Updated documentation of arithmetic functions, correct some mistakes, added missing functions Signed-off-by: Matthew Wells * Added a few missing variable definitions Signed-off-by: Matthew Wells * Updated arithmetic operators documentation to include symbol Signed-off-by: Matthew Wells * removed unneeded SELECT and LIMIT Signed-off-by: Matthew Wells * reformated table Signed-off-by: Matthew Wells * updated data types of functions and made some corrections Signed-off-by: Matthew Wells * removed missing brackets, fixed incorrect ifnull documentation Signed-off-by: Matthew Wells * fixed more minor mistakes and confirmed all function examples work Signed-off-by: Matthew Wells * added ticks to all function names to get it to pass the linter Signed-off-by: Matthew Wells --------- Signed-off-by: Matthew Wells Signed-off-by: Melissa Vagi --- _search-plugins/sql/functions.md | 292 ++++++++++++++++--------------- 1 file changed, 148 insertions(+), 144 deletions(-) diff --git a/_search-plugins/sql/functions.md b/_search-plugins/sql/functions.md index e065c80db4..de3b578e1a 100644 --- a/_search-plugins/sql/functions.md +++ b/_search-plugins/sql/functions.md @@ -18,166 +18,170 @@ The SQL plugin supports the following common functions shared across the SQL and ## Mathematical -Function | Specification | Example -:--- | :--- | :--- -abs | `abs(number T) -> T` | `SELECT abs(0.5) FROM my-index LIMIT 1` -add | `add(number T, number) -> T` | `SELECT add(1, 5) FROM my-index LIMIT 1` -cbrt | `cbrt(number T) -> T` | `SELECT cbrt(0.5) FROM my-index LIMIT 1` -ceil | `ceil(number T) -> T` | `SELECT ceil(0.5) FROM my-index LIMIT 1` -conv | `conv(string T, int a, int b) -> T` | `SELECT CONV('12', 10, 16), CONV('2C', 16, 10), CONV(12, 10, 2), CONV(1111, 2, 10) FROM 
my-index LIMIT 1` -crc32 | `crc32(string T) -> T` | `SELECT crc32('MySQL') FROM my-index LIMIT 1` -divide | `divide(number T, number) -> T` | `SELECT divide(1, 0.5) FROM my-index LIMIT 1` -e | `e() -> double` | `SELECT e() FROM my-index LIMIT 1` -exp | `exp(number T) -> T` | `SELECT exp(0.5) FROM my-index LIMIT 1` -expm1 | `expm1(number T) -> T` | `SELECT expm1(0.5) FROM my-index LIMIT 1` -floor | `floor(number T) -> T` | `SELECT floor(0.5) AS Rounded_Down FROM my-index LIMIT 1` -ln | `ln(number T) -> double` | `SELECT ln(10) FROM my-index LIMIT 1` -log | `log(number T) -> double` or `log(number T, number) -> double` | `SELECT log(10) FROM my-index LIMIT 1` -log2 | `log2(number T) -> double` | `SELECT log2(10) FROM my-index LIMIT 1` -log10 | `log10(number T) -> double` | `SELECT log10(10) FROM my-index LIMIT 1` -mod | `mod(number T, number) -> T` | `SELECT modulus(2, 3) FROM my-index LIMIT 1` -multiply | `multiply(number T, number) -> number` | `SELECT multiply(2, 3) FROM my-index LIMIT 1` -pi | `pi() -> double` | `SELECT pi() FROM my-index LIMIT 1` -pow | `pow(number T) -> T` or `pow(number T, number) -> T` | `SELECT pow(2, 3) FROM my-index LIMIT 1` -power | `power(number T) -> T` or `power(number T, number) -> T` | `SELECT power(2, 3) FROM my-index LIMIT 1` -rand | `rand() -> number` or `rand(number T) -> T` | `SELECT rand(0.5) FROM my-index LIMIT 1` -rint | `rint(number T) -> T` | `SELECT rint(1.5) FROM my-index LIMIT 1` -round | `round(number T) -> T` | `SELECT round(1.5) FROM my-index LIMIT 1` -sign | `sign(number T) -> T` | `SELECT sign(1.5) FROM my-index LIMIT 1` -signum | `signum(number T) -> T` | `SELECT signum(0.5) FROM my-index LIMIT 1` -sqrt | `sqrt(number T) -> T` | `SELECT sqrt(0.5) FROM my-index LIMIT 1` -strcmp | `strcmp(string T, string T) -> T` | `SELECT strcmp('hello', 'hello') FROM my-index LIMIT 1` -subtract | `subtract(number T, number) -> T` | `SELECT subtract(3, 2) FROM my-index LIMIT 1` -truncate | `truncate(number T, number T) -> T` | 
`SELECT truncate(56.78, 1) FROM my-index LIMIT 1` -/ | `number [op] number -> number` | `SELECT 1 / 100 FROM my-index LIMIT 1` -% | `number [op] number -> number` | `SELECT 1 % 100 FROM my-index LIMIT 1` +| Function | Specification | Example | +|:-----------|:-----------------------------------------------------------------|:-----------------------------------------------| +| `abs` | `abs(number T) -> T` | `SELECT abs(0.5)` | +| `add` | `add(number T, number T) -> T` | `SELECT add(1, 5)` | +| `cbrt` | `cbrt(number T) -> double` | `SELECT cbrt(8)` | +| `ceil` | `ceil(number T) -> T` | `SELECT ceil(0.5)` | +| `conv` | `conv(string T, integer, integer) -> string` | `SELECT conv('2C', 16, 10), conv(1111, 2, 10)` | +| `crc32` | `crc32(string) -> string` | `SELECT crc32('MySQL')` | +| `divide` | `divide(number T, number T) -> T` | `SELECT divide(1, 0.5)` | +| `e` | `e() -> double` | `SELECT e()` | +| `exp` | `exp(number T) -> double` | `SELECT exp(0.5)` | +| `expm1` | `expm1(number T) -> double` | `SELECT expm1(0.5)` | +| `floor` | `floor(number T) -> long` | `SELECT floor(0.5)` | +| `ln` | `ln(number T) -> double` | `SELECT ln(10)` | +| `log` | `log(number T) -> double` or `log(number T, number T) -> double` | `SELECT log(10)`, `SELECT log(2, 16)` | +| `log2` | `log2(number T) -> double` | `SELECT log2(10)` | +| `log10` | `log10(number T) -> double` | `SELECT log10(10)` | +| `mod` | `mod(number T, number T) -> T` | `SELECT mod(2, 3)` | +| `modulus` | `modulus(number T, number T) -> T` | `SELECT modulus(2, 3)` | +| `multiply` | `multiply(number T, number T) -> T` | `SELECT multiply(2, 3)` | +| `pi` | `pi() -> double` | `SELECT pi()` | +| `pow` | `pow(number T, number T) -> double` | `SELECT pow(2, 3)` | +| `power` | `power(number T, number T) -> double` | `SELECT power(2, 3)` | +| `rand` | `rand() -> float` or `rand(number T) -> float` | `SELECT rand()`, `SELECT rand(0.5)` | +| `rint` | `rint(number T) -> double` | `SELECT rint(1.5)` | +| `round` | `round(number T) -> T` 
or `round(number T, integer) -> T` | `SELECT round(1.5)`, `SELECT round(1.175, 2)` | +| `sign` | `sign(number T) -> integer` | `SELECT sign(1.5)` | +| `signum` | `signum(number T) -> integer` | `SELECT signum(0.5)` | +| `sqrt` | `sqrt(number T) -> double` | `SELECT sqrt(0.5)` | +| `strcmp` | `strcmp(string T, string T) -> integer` | `SELECT strcmp('hello', 'hello world')` | +| `subtract` | `subtract(number T, number T) -> T` | `SELECT subtract(3, 2)` | +| `truncate` | `truncate(number T, number T) -> T` | `SELECT truncate(56.78, 1)` | +| `+` | `number T + number T -> T` | `SELECT 1 + 5` | +| `-` | `number T - number T -> T` | `SELECT 3 - 2` | +| `*` | `number T * number T -> T` | `SELECT 2 * 3` | +| `/` | `number T / number T -> T` | `SELECT 1 / 0.5` | +| `%` | `number T % number T -> T` | `SELECT 2 % 3` | ## Trigonometric -Function | Specification | Example -:--- | :--- | :--- -acos | `acos(number T) -> double` | `SELECT acos(0.5) FROM my-index LIMIT 1` -asin | `asin(number T) -> double` | `SELECT asin(0.5) FROM my-index LIMIT 1` -atan | `atan(number T) -> double` | `SELECT atan(0.5) FROM my-index LIMIT 1` -atan2 | `atan2(number T, number) -> double` | `SELECT atan2(1, 0.5) FROM my-index LIMIT 1` -cos | `cos(number T) -> double` | `SELECT cos(0.5) FROM my-index LIMIT 1` -cosh | `cosh(number T) -> double` | `SELECT cosh(0.5) FROM my-index LIMIT 1` -cot | `cot(number T) -> double` | `SELECT cot(0.5) FROM my-index LIMIT 1` -degrees | `degrees(number T) -> double` | `SELECT degrees(0.5) FROM my-index LIMIT 1` -radians | `radians(number T) -> double` | `SELECT radians(0.5) FROM my-index LIMIT 1` -sin | `sin(number T) -> double` | `SELECT sin(0.5) FROM my-index LIMIT 1` -sinh | `sinh(number T) -> double` | `SELECT sinh(0.5) FROM my-index LIMIT 1` -tan | `tan(number T) -> double` | `SELECT tan(0.5) FROM my-index LIMIT 1` +| Function | Specification | Example | +|:----------|:--------------------------------------|:-----------------------| +| `acos` | `acos(number T) -> 
double` | `SELECT acos(0.5)` | +| `asin` | `asin(number T) -> double` | `SELECT asin(0.5)` | +| `atan` | `atan(number T) -> double` | `SELECT atan(0.5)` | +| `atan2` | `atan2(number T, number T) -> double` | `SELECT atan2(1, 0.5)` | +| `cos` | `cos(number T) -> double` | `SELECT cos(0.5)` | +| `cosh` | `cosh(number T) -> double` | `SELECT cosh(0.5)` | +| `cot` | `cot(number T) -> double` | `SELECT cot(0.5)` | +| `degrees` | `degrees(number T) -> double` | `SELECT degrees(0.5)` | +| `radians` | `radians(number T) -> double` | `SELECT radians(0.5)` | +| `sin` | `sin(number T) -> double` | `SELECT sin(0.5)` | +| `sinh` | `sinh(number T) -> double` | `SELECT sinh(0.5)` | +| `tan` | `tan(number T) -> double` | `SELECT tan(0.5)` | ## Date and time Functions marked with * are only available in SQL. -Function | Specification | Example -:--- | :--- | :--- -adddate | `adddate(date, INTERVAL expr unit) -> date` | `SELECT adddate(date('2020-08-26'), INTERVAL 1 hour) FROM my-index LIMIT 1` -addtime | `addtime(date, date) -> date` | `SELECT addtime(date('2008-12-12'), date('2008-12-12'))` -convert_tz | `convert_tz(date, string, string) -> date` | `SELECT convert_tz('2008-12-25 05:30:00', '+00:00', 'America/Los_Angeles')` -curtime | `curtime() -> time` | `SELECT curtime()` -curdate | `curdate() -> date` | `SELECT curdate() FROM my-index LIMIT 1` -current_date | `current_date() -> date` | `SELECT current_date() FROM my-index LIMIT 1` -current_time | `current_time() -> time` | `SELECT current_time()` -current_timestamp | `current_timestamp() -> date` | `SELECT current_timestamp() FROM my-index LIMIT 1` -date | `date(date) -> date` | `SELECT date() FROM my-index LIMIT 1` -datediff | `datediff(date, date) -> integer` | `SELECT datediff(date('2000-01-02'), date('2000-01-01'))` -datetime | `datetime(string) -> datetime` | `SELECT datetime('2008-12-25 00:00:00')` -date_add | `date_add(date, INTERVAL integer UNIT)` | `SELECT date_add('2020-08-26'), INTERVAL 1 HOUR)` -date_format | 
`date_format(date, string) -> string` or `date_format(date, string, string) -> string` | `SELECT date_format(date, 'Y') FROM my-index LIMIT 1` -date_sub | `date_sub(date, INTERVAL expr unit) -> date` | `SELECT date_sub(date('2008-01-02'), INTERVAL 31 day) FROM my-index LIMIT 1` -dayofmonth | `dayofmonth(date) -> integer` | `SELECT dayofmonth(date) FROM my-index LIMIT 1` -day | `day(date) -> integer` | `SELECT day(date('2020-08-25'))` -dayname | `dayname(date) -> string` | `SELECT dayname(date('2020-08-26')) FROM my-index LIMIT 1` -dayofmonth | `dayofmonth(date) -> integer` | `SELECT dayofmonth(date) FROM my-index LIMIT 1` -dayofweek | `dayofweek(date) -> integer` | `SELECT dayofweek(date) FROM my-index LIMIT 1` -dayofyear | `dayofyear(date) -> integer` | `SELECT dayofyear(date('2020-08-26')) FROM my-index LIMIT 1` -dayofweek | `dayofweek(date) -> integer` | `SELECT dayofweek(date('2020-08-26')) FROM my-index LIMIT 1` -day_of_month\* | `day_of_month(date) -> integer` | `SELECT day_of_month(date) FROM my-index LIMIT 1` -day_of_week\* | `day_of_week(date) -> integer` | `SELECT day_of_week(date('2020-08-26')) FROM my-index LIMIT 1` -day_of_year\* | `day_of_year(date) -> integer` | `SELECT day_of_year(date('2020-08-26')) FROM my-index LIMIT 1` -extract\* | `extract(part FROM date) -> integer` | `SELECT extract(MONTH FROM datetime('2020-08-26 10:11:12'))` -from_days | `from_days(N) -> integer` | `SELECT from_days(733687) FROM my-index LIMIT 1` -from_unixtime | `from_unixtime(N) -> date` | `SELECT from_unixtime(1220249547)` -get_format | `get_format(PART, string) -> string` | `SELECT get_format(DATE, 'USA')` -hour | `hour(time) -> integer` | `SELECT hour((time '01:02:03')) FROM my-index LIMIT 1` -hour_of_day\* | `hour_of_day(time) -> integer` | `SELECT hour_of_day((time '01:02:03')) FROM my-index LIMIT 1` -last_day\* | `last_day(date) -> integer` | `SELECT last_day(date('2020-08-26'))` -localtime | `localtime() -> date` | `SELECT localtime() FROM my-index LIMIT 1` 
-localtimestamp | `localtimestamp() -> date` | `SELECT localtimestamp() FROM my-index LIMIT 1` -makedate | `makedate(double, double) -> date` | `SELECT makedate(1945, 5.9)` -maketime | `maketime(integer, integer, integer) -> date` | `SELECT maketime(1, 2, 3) FROM my-index LIMIT 1` -microsecond | `microsecond(expr) -> integer` | `SELECT microsecond((time '01:02:03.123456')) FROM my-index LIMIT 1` -minute | `minute(expr) -> integer` | `SELECT minute((time '01:02:03')) FROM my-index LIMIT 1` -minute_of_day\* | `minute_of_day(expr) -> integer` | `SELECT minute_of_day((time '01:02:03')) FROM my-index LIMIT 1` -minute_of_hour\* | `minute_of_hour(expr) -> integer` | `SELECT minute_of_hour((time '01:02:03')) FROM my-index LIMIT 1` -month | `month(date) -> integer` | `SELECT month(date) FROM my-index` -month_of_year\* | `month_of_year(date) -> integer` | `SELECT month_of_year(date) FROM my-index` -monthname | `monthname(date) -> string` | `SELECT monthname(date) FROM my-index` -now | `now() -> date` | `SELECT now() FROM my-index LIMIT 1` -period_add | `period_add(integer, integer)` | `SELECT period_add(200801, 2)` -period_diff | `period_diff(integer, integer)` | `SELECT period_diff(200802, 200703)` -quarter | `quarter(date) -> integer` | `SELECT quarter(date('2020-08-26')) FROM my-index LIMIT 1` -second | `second(time) -> integer` | `SELECT second((time '01:02:03')) FROM my-index LIMIT 1` -second_of_minute\* | `second_of_minute(time) -> integer` | `SELECT second_of_minute((time '01:02:03')) FROM my-index LIMIT 1` -sec_to_time\* | `sec_to_time(integer) -> date` | `SELECT sec_to_time(10000)` -subdate | `subdate(date, INTERVAL expr unit) -> date, datetime` | `SELECT subdate(date('2008-01-02'), INTERVAL 31 day) FROM my-index LIMIT 1` -subtime | `subtime(date, date) -> date` | `SELECT subtime(date('2008-12-12'), date('2008-11-15'))` -str_to_date\* | `str_to_date(string, format) -> date` | `SELECT str_to_date("March 10 2000", %M %d %Y")` -time | `time(expr) -> time` | `SELECT 
time('13:49:00') FROM my-index LIMIT 1` -timediff | `timediff(time, time) -> time` | `SELECT timediff(time('23:59:59'), time('13:00:00'))` -timestamp | `timestamp(date) -> date` | `SELECT timestamp(date) FROM my-index LIMIT 1` -timestampadd | `timestampadd(interval, integer, date) -> date)` | `SELECT timestampadd(DAY, 17, datetime('2000-01-01 00:00:00'))` -timestampdiff | `timestampdiff(interval, date, date) -> integer` | `SELECT timestampdiff(YEAR, '1997-01-01 00:00:00, '2001-03-06 00:00:00')` -time_format | `time_format(date, string) -> string` | `SELECT time_format('1998-01-31 13:14:15.012345', '%f %H %h %I %i %p %r %S %s %T')` -time_to_sec | `time_to_sec(time) -> long` | `SELECT time_to_sec(time '22:23:00') FROM my-index LIMIT 1` -to_days | `to_days(date) -> long` | `SELECT to_days(date '2008-10-07') FROM my-index LIMIT 1` -to_seconds | `to_seconds(date) -> integer` | `SELECT to_seconds(date('2008-10-07')` -unix_timestamp | `unix_timestamp(date) -> double` | `SELECT unix_timestamp(timestamp('1996-11-15 17:05:42'))` -utc_date | `utc_date() -> date` | `SELECT utc_date()` -utc_time | `utc_time() -> date` | `SELECT utc_time()` -utc_timestamp | `utc_timestamp() -> date` | `SELECT utc_timestamp()` -week | `week(date[mode]) -> integer` | `SELECT week(date('2008-02-20')) FROM my-index LIMIT 1` -weekofyear | `weekofyear(date[mode]) -> integer` | `SELECT weekofyear(date('2008-02-20')) FROM my-index LIMIT 1` -week_of_year\* | `week_of_year(date[mode]) -> integer` | `SELECT week_of_year(date('2008-02-20')) FROM my-index LIMIT 1` -year | `year(date) -> integer` | `SELECT year(date) FROM my-index LIMIT 1` -yearweek\* | `yearweek(date[mode]) -> integer` | `SELECT yearweek(date('2008-02-20')) FROM my-index LIMIT 1` +| Function | Specification | Example | +|:---------------------|:---------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------| +| `adddate` | 
`adddate(date, INTERVAL expr unit) -> date` | `SELECT adddate(date('2020-08-26'), INTERVAL 1 hour)` | +| `addtime` | `addtime(date, date) -> date` | `SELECT addtime(date('2008-12-12'), date('2008-12-12'))` | +| `convert_tz` | `convert_tz(date, string, string) -> date` | `SELECT convert_tz('2008-12-25 05:30:00', '+00:00', 'America/Los_Angeles')` | +| `curtime` | `curtime() -> time` | `SELECT curtime()` | +| `curdate` | `curdate() -> date` | `SELECT curdate()` | +| `current_date` | `current_date() -> date` | `SELECT current_date()` | +| `current_time` | `current_time() -> time` | `SELECT current_time()` | +| `current_timestamp` | `current_timestamp() -> date` | `SELECT current_timestamp()` | +| `date` | `date(date) -> date` | `SELECT date('2000-01-02')` | +| `datediff` | `datediff(date, date) -> integer` | `SELECT datediff(date('2000-01-02'), date('2000-01-01'))` | +| `datetime` | `datetime(string) -> datetime` | `SELECT datetime('2008-12-25 00:00:00')` | +| `date_add` | `date_add(date, INTERVAL integer UNIT)` | `SELECT date_add('2020-08-26', INTERVAL 1 HOUR)` | +| `date_format` | `date_format(date, string) -> string` or `date_format(date, string, string) -> string` | `SELECT date_format(date('2020-08-26'), 'Y')` | +| `date_sub` | `date_sub(date, INTERVAL expr unit) -> date` | `SELECT date_sub(date('2008-01-02'), INTERVAL 31 day)` | +| `dayofmonth` | `dayofmonth(date) -> integer` | `SELECT dayofmonth(date('2001-05-07'))` | +| `day` | `day(date) -> integer` | `SELECT day(date('2020-08-25'))` | +| `dayname` | `dayname(date) -> string` | `SELECT dayname(date('2020-08-26'))` | +| `dayofmonth` | `dayofmonth(date) -> integer` | `SELECT dayofmonth(date('2020-08-26'))` | +| `dayofweek` | `dayofweek(date) -> integer` | `SELECT dayofweek(date('2020-08-26'))` | +| `dayofyear` | `dayofyear(date) -> integer` | `SELECT dayofyear(date('2020-08-26'))` | +| `dayofweek` | `dayofweek(date) -> integer` | `SELECT dayofweek(date('2020-08-26'))` | +| `day_of_month`\* | `day_of_month(date) 
-> integer` | `SELECT day_of_month(date('2020-08-26'))` | +| `day_of_week`\* | `day_of_week(date) -> integer` | `SELECT day_of_week(date('2020-08-26'))` | +| `day_of_year`\* | `day_of_year(date) -> integer` | `SELECT day_of_year(date('2020-08-26'))` | +| `extract`\* | `extract(part FROM date) -> integer` | `SELECT extract(MONTH FROM datetime('2020-08-26 10:11:12'))` | +| `from_days` | `from_days(N) -> integer` | `SELECT from_days(733687)` | +| `from_unixtime` | `from_unixtime(N) -> date` | `SELECT from_unixtime(1220249547)` | +| `get_format` | `get_format(PART, string) -> string` | `SELECT get_format(DATE, 'USA')` | +| `hour` | `hour(time) -> integer` | `SELECT hour(time '01:02:03')` | +| `hour_of_day`\* | `hour_of_day(time) -> integer` | `SELECT hour_of_day(time '01:02:03')` | +| `last_day`\* | `last_day(date) -> integer` | `SELECT last_day(date('2020-08-26'))` | +| `localtime` | `localtime() -> date` | `SELECT localtime()` | +| `localtimestamp` | `localtimestamp() -> date` | `SELECT localtimestamp()` | +| `makedate` | `makedate(double, double) -> date` | `SELECT makedate(1945, 5.9)` | +| `maketime` | `maketime(integer, integer, integer) -> date` | `SELECT maketime(1, 2, 3)` | +| `microsecond` | `microsecond(expr) -> integer` | `SELECT microsecond(time '01:02:03.123456')` | +| `minute` | `minute(expr) -> integer` | `SELECT minute(time '01:02:03')` | +| `minute_of_day`\* | `minute_of_day(expr) -> integer` | `SELECT minute_of_day(time '01:02:03')` | +| `minute_of_hour`\* | `minute_of_hour(expr) -> integer` | `SELECT minute_of_hour(time '01:02:03')` | +| `month` | `month(date) -> integer` | `SELECT month(date('2020-08-26'))` | +| `month_of_year`\* | `month_of_year(date) -> integer` | `SELECT month_of_year(date('2020-08-26'))` | +| `monthname` | `monthname(date) -> string` | `SELECT monthname(date('2020-08-26'))` | +| `now` | `now() -> date` | `SELECT now()` | +| `period_add` | `period_add(integer, integer)` | `SELECT period_add(200801, 2)` | +| `period_diff` | 
`period_diff(integer, integer)`                                                         | `SELECT period_diff(200802, 200703)`                                                | +| `quarter`            | `quarter(date) -> integer`                                                              | `SELECT quarter(date('2020-08-26'))`                                                | +| `second`             | `second(time) -> integer`                                                               | `SELECT second(time '01:02:03')`                                                    | +| `second_of_minute`\* | `second_of_minute(time) -> integer`                                                     | `SELECT second_of_minute(time '01:02:03')`                                          | +| `sec_to_time`\*      | `sec_to_time(integer) -> date`                                                          | `SELECT sec_to_time(10000)`                                                         | +| `subdate`            | `subdate(date, INTERVAL expr unit) -> date, datetime`                                   | `SELECT subdate(date('2008-01-02'), INTERVAL 31 day)`                               | +| `subtime`            | `subtime(date, date) -> date`                                                           | `SELECT subtime(date('2008-12-12'), date('2008-11-15'))`                            | +| `str_to_date`\*      | `str_to_date(string, format) -> date`                                                   | `SELECT str_to_date("01,5,2013", "%d,%m,%Y")`                                       | +| `time`               | `time(expr) -> time`                                                                    | `SELECT time('13:49:00')`                                                           | +| `timediff`           | `timediff(time, time) -> time`                                                          | `SELECT timediff(time('23:59:59'), time('13:00:00'))`                               | +| `timestamp`          | `timestamp(date) -> date`                                                               | `SELECT timestamp('2001-05-07 00:00:00')`                                           | +| `timestampadd`       | `timestampadd(interval, integer, date) -> date`                                         | `SELECT timestampadd(DAY, 17, datetime('2000-01-01 00:00:00'))`                     | +| `timestampdiff`      | `timestampdiff(interval, date, date) -> integer`                                        | `SELECT timestampdiff(YEAR, '1997-01-01 00:00:00', '2001-03-06 00:00:00')`          | +| `time_format`        | `time_format(date, string) -> string`                                                   | `SELECT time_format('1998-01-31 13:14:15.012345', '%f %H %h %I %i %p %r %S %s %T')` | +| `time_to_sec`        | `time_to_sec(time) -> long`                                                             | `SELECT time_to_sec(time '22:23:00')`                                               | +| `to_days`            | `to_days(date) -> long`                                                                 | `SELECT to_days(date '2008-10-07')`                                                 | +| `to_seconds`         | `to_seconds(date) -> integer`                                                           | `SELECT to_seconds(date('2008-10-07'))`                                             | +| `unix_timestamp`     | `unix_timestamp(date) -> double`                                                        | `SELECT unix_timestamp(timestamp('1996-11-15 17:05:42'))`                           | +| `utc_date`           | `utc_date() -> date`                                                                    | `SELECT utc_date()`                                                                 | +| `utc_time`           | `utc_time() -> date`                                                                    | `SELECT utc_time()`                                                                 | +| `utc_timestamp`      | `utc_timestamp() -> 
date` | `SELECT utc_timestamp()` | +| `week` | `week(date[mode]) -> integer` | `SELECT week(date('2008-02-20'))` | +| `weekofyear` | `weekofyear(date[mode]) -> integer` | `SELECT weekofyear(date('2008-02-20'))` | +| `week_of_year`\* | `week_of_year(date[mode]) -> integer` | `SELECT week_of_year(date('2008-02-20'))` | +| `year` | `year(date) -> integer` | `SELECT year(date('2001-07-05'))` | +| `yearweek`\* | `yearweek(date[mode]) -> integer` | `SELECT yearweek(date('2008-02-20'))` | ## String -Function | Specification | Example -:--- | :--- | :--- -ascii | `ascii(string T) -> integer` | `SELECT ascii(name.keyword) FROM my-index LIMIT 1` -concat | `concat(str1, str2) -> string` | `SELECT concat('hello', 'world') FROM my-index LIMIT 1` -concat_ws | `concat_ws(separator, string, string…) -> string` | `SELECT concat_ws("-", "Tutorial", "is", "fun!") FROM my-index LIMIT 1` -left | `left(string T, integer) -> T` | `SELECT left('hello', 2) FROM my-index LIMIT 1` -length | `length(string) -> integer` | `SELECT length('hello') FROM my-index LIMIT 1` -locate | `locate(string, string, integer) -> integer` or `locate(string, string) -> INTEGER` | `SELECT locate('o', 'hello') FROM my-index LIMIT 1`, `SELECT locate('l', 'hello', 3) FROM my-index LIMIT 1` -replace | `replace(string T, string, string) -> T` | `SELECT replace('hello', 'l', 'x') FROM my-index LIMIT 1` -right | `right(string T, integer) -> T` | `SELECT right('hello', 1) FROM my-index LIMIT 1` -rtrim | `rtrim(string T) -> T` | `SELECT rtrim(name.keyword) FROM my-index LIMIT 1` -substring | `substring(string T, integer, integer) -> T` | `SELECT substring(name.keyword, 2,5) FROM my-index LIMIT 1` -trim | `trim(string T) -> T` | `SELECT trim(' hello') FROM my-index LIMIT 1` -upper | `upper(string T) -> T` | `SELECT upper('helloworld') FROM my-index LIMIT 1` +| Function | Specification | Example | 
+|:------------|:------------------------------------------------------------------------------------|:---------------------------------------------------------------| +| `ascii` | `ascii(string) -> integer` | `SELECT ascii('h')` | +| `concat` | `concat(string, string) -> string` | `SELECT concat('hello', 'world')` | +| `concat_ws` | `concat_ws(separator, string, string…) -> string` | `SELECT concat_ws(" ", "Hello", "World!")` | +| `left` | `left(string, integer) -> string` | `SELECT left('hello', 2)` | +| `length` | `length(string) -> integer` | `SELECT length('hello')` | +| `locate` | `locate(string, string, integer) -> integer` or `locate(string, string) -> integer` | `SELECT locate('o', 'hello')`, `locate('l', 'hello world', 5)` | +| `replace` | `replace(string, string, string) -> string` | `SELECT replace('hello', 'l', 'x')` | +| `right` | `right(string, integer) -> string` | `SELECT right('hello', 2)` | +| `rtrim` | `rtrim(string) -> string` | `SELECT rtrim('hello ')` | +| `substring` | `substring(string, integer, integer) -> string` | `SELECT substring('hello', 2, 4)` | +| `trim` | `trim(string) -> string` | `SELECT trim(' hello')` | +| `upper` | `upper(string) -> string` | `SELECT upper('hello world')` | ## Aggregate -Function | Specification | Example -:--- | :--- | :--- -avg | `avg(number T) -> T` | `SELECT avg(2, 3) FROM my-index LIMIT 1` -count | `count(number T) -> T` | `SELECT count(date) FROM my-index LIMIT 1` -min | `min(number T, number) -> T` | `SELECT min(2, 3) FROM my-index LIMIT 1` -show | `show(string T) -> T` | `SHOW TABLES LIKE my-index` +| Function | Specification | Example | +|:---------|:-------------------------|:-----------------------------------| +| `avg` | `avg(number T) -> T` | `SELECT avg(column) FROM my-index` | +| `count` | `count(number T) -> T` | `SELECT count(date) FROM my-index` | +| `min` | `min(number T) -> T` | `SELECT min(column) FROM my-index` | +| `show` | `show(string) -> string` | `SHOW TABLES LIKE my-index` | ## 
Advanced -Function | Specification | Example -:--- | :--- | :--- -if | `if(boolean, es_type, es_type) -> es_type` | `SELECT if(false, 0, 1) FROM my-index LIMIT 1`, `SELECT if(true, 0, 1) FROM my-index LIMIT 1` -ifnull | `ifnull(es_type, es_type) -> es_type` | `SELECT ifnull('hello', 1) FROM my-index LIMIT 1`, `SELECT ifnull(null, 1) FROM my-index LIMIT 1` -isnull | `isnull(es_type) -> integer` | `SELECT isnull(null) FROM my-index LIMIT 1`, `SELECT isnull(1) FROM my-index LIMIT 1` +| Function | Specification | Example | +|:---------|:-------------------------------------------|:----------------------------------------| +| `if` | `if(boolean, os_type, os_type) -> os_type` | `SELECT if(false, 0, 1),if(true, 0, 1)` | +| `ifnull` | `ifnull(os_type, os_type) -> os_type` | `SELECT ifnull(0, 1), ifnull(null, 1)` | +| `isnull` | `isnull(os_type) -> integer` | `SELECT isnull(null), isnull(1)` | ## Relevance-based search (full-text search) From c77e0088774e351942626643cd1c51b695ff0eff Mon Sep 17 00:00:00 2001 From: Heather Halter Date: Tue, 13 Jun 2023 07:44:22 -0700 Subject: [PATCH 008/286] Final text for 404 (#4302) * final text Signed-off-by: Heather Halter * text Signed-off-by: Heather Halter * removedfile Signed-off-by: Heather Halter * Update 404.md Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com> Signed-off-by: Heather Halter --------- Signed-off-by: Heather Halter Signed-off-by: Heather Halter Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com> Signed-off-by: Melissa Vagi --- 404.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/404.md b/404.md index 5165c9e449..60a1bc8847 100644 --- a/404.md +++ b/404.md @@ -8,8 +8,7 @@ nav_exclude: true ## Oops, this isn't the page you're looking for. -Maybe our [homepage](https://opensearch.org/docs/latest) -or one of the popular pages listed below can help. 
+Maybe our [home page](https://opensearch.org/docs/latest) or one of the commonly visited pages below will help. If you need further support, please use the feedback feature on the right side of the screen to get in touch. - [Quickstart]({{site.url}}{{site.baseurl}}/quickstart/) - [Installing OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/index/) From 2a2af26b8ccd2d934ca3f7791b38207336f01d33 Mon Sep 17 00:00:00 2001 From: Heather Halter Date: Tue, 13 Jun 2023 09:55:41 -0700 Subject: [PATCH 009/286] updateinfo (#4304) Signed-off-by: Heather Halter Signed-off-by: Melissa Vagi --- _tuning-your-cluster/index.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/_tuning-your-cluster/index.md b/_tuning-your-cluster/index.md index 5365b88c66..8172cfff18 100644 --- a/_tuning-your-cluster/index.md +++ b/_tuning-your-cluster/index.md @@ -120,16 +120,16 @@ node.roles: [] ## Step 3: Bind a cluster to specific IP addresses -`network_host` defines the IP address used to bind the node. By default, OpenSearch listens on a local host, which limits the cluster to a single node. You can also use `_local_` and `_site_` to bind to any loopback or site-local address, whether IPv4 or IPv6: +`network.bind_host` defines the IP address used to bind the node. By default, OpenSearch listens on a local host, which limits the cluster to a single node. You can also use `_local_` and `_site_` to bind to any loopback or site-local address, whether IPv4 or IPv6: ```yml -network.host: [_local_, _site_] +network.bind_host: [_local_, _site_] ``` To form a multi-node cluster, specify the IP address of the node: ```yml -network.host: +network.bind_host: ``` Make sure to configure these settings on all of your nodes. From 15a951746a3864d7ca4aebdcd8e9f6565dc633de Mon Sep 17 00:00:00 2001 From: David Venable Date: Tue, 13 Jun 2023 13:53:29 -0500 Subject: [PATCH 010/286] Documents the Data Prepper opensearch sink's template_type parameter. 
(#4290) * Documents the Data Prepper opensearch sink's template_type parameter. Signed-off-by: David Venable * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: David Venable Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _data-prepper/pipelines/configuration/sinks/opensearch.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/_data-prepper/pipelines/configuration/sinks/opensearch.md b/_data-prepper/pipelines/configuration/sinks/opensearch.md index 0990e5f7dc..81ebe0dbc4 100644 --- a/_data-prepper/pipelines/configuration/sinks/opensearch.md +++ b/_data-prepper/pipelines/configuration/sinks/opensearch.md @@ -66,7 +66,8 @@ insecure | No | Boolean | Whether or not to verify SSL certificates. If set to t proxy | No | String | The address of a [forward HTTP proxy server](https://en.wikipedia.org/wiki/Proxy_server). The format is "<host name or IP>:<port>". Examples: "example.com:8100", "http://example.com:8100", "112.112.112.112:8100". Port number cannot be omitted. index | Conditionally | String | Name of the export index. Applicable and required only when the `index_type` is `custom`. index_type | No | String | This index type tells the Sink plugin what type of data it is handling. Valid values: `custom`, `trace-analytics-raw`, `trace-analytics-service-map`, `management-disabled`. Default value is `custom`. -template_file | No | String | Path to a JSON [index template]({{site.url}}{{site.baseurl}}/opensearch/index-templates/) file (for example, `/your/local/template-file.json`) if `index_type` is `custom`. 
See [otel-v1-apm-span-index-template.json](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/src/main/resources/otel-v1-apm-span-index-template.json) for an example. +template_type | No | String | Defines what type of OpenSearch template to use. The available options are `v1` and `index-template`. The default value is `v1`, which uses the original OpenSearch templates available at the `_template` API endpoints. The `index-template` option uses composable [index templates]({{site.url}}{{site.baseurl}}/opensearch/index-templates/), which are available through OpenSearch's `_index_template` API. Composable index templates offer more flexibility than the default and are necessary when an OpenSearch cluster already has existing index templates. Composable templates are available for all versions of OpenSearch and some later versions of Elasticsearch. +template_file | No | String | The path to a JSON [index template]({{site.url}}{{site.baseurl}}/opensearch/index-templates/) file such as `/your/local/template-file.json` when `index_type` is set to `custom`. For an example template file, see [otel-v1-apm-span-index-template.json](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/src/main/resources/otel-v1-apm-span-index-template.json). If you supply a template file, it must match the template format specified by the `template_type` parameter. document_id_field | No | String | The field from the source data to use for the OpenSearch document ID (for example, `"my-field"`) if `index_type` is `custom`. dlq_file | No | String | The path to your preferred dead letter queue file (for example, `/your/local/dlq-file`). Data Prepper writes to this file when it fails to index a document on the OpenSearch cluster. dlq | No | N/A | DLQ configurations. See [Dead Letter Queues]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/dlq/) for details. 
If the `dlq_file` option is also available, the sink will fail. From 94579cbd38c7a5ee0673db9f1483343757806f28 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Tue, 13 Jun 2023 15:06:50 -0400 Subject: [PATCH 011/286] Fix links for link checker (#4309) Signed-off-by: Fanit Kolchina Signed-off-by: Melissa Vagi --- _clients/ruby.md | 2 +- _plugins/link-checker.rb | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/_clients/ruby.md b/_clients/ruby.md index 59fa413a6c..7d582927c6 100644 --- a/_clients/ruby.md +++ b/_clients/ruby.md @@ -634,7 +634,7 @@ puts MultiJson.dump(response, pretty: "true") # Ruby AWS Sigv4 Client -The [opensearch-aws-sigv4](https://github.com/opensearch-project/opensearch-ruby/tree/main/opensearch-aws-sigv4) gem provides the `OpenSearch::Aws::Sigv4Client` class, which has all features of `OpenSearch::Client`. The only difference between these two clients is that `OpenSearch::Aws::Sigv4Client` requires an instance of `Aws::Sigv4::Signer` during instantiation to authenticate with AWS: +The [opensearch-aws-sigv4](https://github.com/opensearch-project/opensearch-ruby-aws-sigv4) gem provides the `OpenSearch::Aws::Sigv4Client` class, which has all features of `OpenSearch::Client`. 
The only difference between these two clients is that `OpenSearch::Aws::Sigv4Client` requires an instance of `Aws::Sigv4::Signer` during instantiation to authenticate with AWS: ```ruby require 'opensearch-aws-sigv4' diff --git a/_plugins/link-checker.rb b/_plugins/link-checker.rb index 0c7df2b116..25f1c6e7af 100644 --- a/_plugins/link-checker.rb +++ b/_plugins/link-checker.rb @@ -55,6 +55,7 @@ module Jekyll::LinkChecker 'playground.opensearch.org', # inifite redirect, https://github.com/opensearch-project/dashboards-anywhere/issues/172 'crates.io', # 404s on bots 'www.cloudflare.com', # 403s on bots + 'example.issue.link', # a fake example link from the template ] ## From 156703567c2d91a9448994b7b48ce0c594650277 Mon Sep 17 00:00:00 2001 From: "Daniel (dB.) Doubrovkine" Date: Tue, 13 Jun 2023 16:25:08 -0400 Subject: [PATCH 012/286] Use a different user-agent. (#4313) Signed-off-by: dblock Signed-off-by: Melissa Vagi --- _plugins/link-checker.rb | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/_plugins/link-checker.rb b/_plugins/link-checker.rb index 25f1c6e7af..5dfd53c9f1 100644 --- a/_plugins/link-checker.rb +++ b/_plugins/link-checker.rb @@ -104,7 +104,8 @@ def self.init(site) @external_link_checker = LinkChecker::Typhoeus::Hydra::Checker.new( logger: Jekyll.logger, hydra: { max_concurrency: 2 }, - retries: 3 + retries: 3, + user_agent: 'OpenSearch Documentation Website Link Checker/1.0' ) @external_link_checker.on :failure, :error do |result| From afd2c503ee19973e17cd878bc170e601a4834c37 Mon Sep 17 00:00:00 2001 From: Chris Moore <107723039+cwillum@users.noreply.github.com> Date: Tue, 13 Jun 2023 14:06:46 -0700 Subject: [PATCH 013/286] fix#4315 fix sec config example (#4327) Signed-off-by: cwillum Signed-off-by: Melissa Vagi --- _security/configuration/configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_security/configuration/configuration.md b/_security/configuration/configuration.md index 77018bccfe..5226f6733d 
100755 --- a/_security/configuration/configuration.md +++ b/_security/configuration/configuration.md @@ -16,7 +16,7 @@ The main configuration file for authentication and authorization backends is `co `config.yml` has three main parts: ```yml -opensearch_security: +config: dynamic: http: ... From 14ee20bc73eca6cb7d9c2ad8ad082e05eac3f061 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Tue, 13 Jun 2023 17:08:02 -0400 Subject: [PATCH 014/286] Add hyperparameters and undeploy to vocabulary (#4328) Signed-off-by: Fanit Kolchina Signed-off-by: Melissa Vagi --- .github/vale/styles/Vocab/OpenSearch/Words/accept.txt | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt index ae2c16d12b..26b9d99616 100644 --- a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt +++ b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt @@ -35,6 +35,7 @@ GeoHex gibibyte [Hh]ashmap [Hh]ostname +[Hh]yperparameters [Ii]mpactful [Ii]ngress [Ii]nitializer @@ -103,6 +104,7 @@ tebibyte [Uu]nary [Uu]ncheck [Uu]ncomment +[Uu]ndeploy [Uu]nigram [Uu]nnesting [Uu]nrecovered From 1a9b22515e6f20d06b3392ffe5291b8d18488fe6 Mon Sep 17 00:00:00 2001 From: Heather Halter Date: Tue, 13 Jun 2023 14:45:17 -0700 Subject: [PATCH 015/286] fixsnapshot (#4330) Signed-off-by: Heather Halter Signed-off-by: Melissa Vagi --- .../availability-and-recovery/snapshots/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/index.md b/_tuning-your-cluster/availability-and-recovery/snapshots/index.md index 3fde2804b7..32b46a92ff 100644 --- a/_tuning-your-cluster/availability-and-recovery/snapshots/index.md +++ b/_tuning-your-cluster/availability-and-recovery/snapshots/index.md @@ -3,7 +3,7 @@ layout: default title: Snapshots nav_order: 5 has_children: true -parent: Availability and Recovery 
+parent: Availability and recovery redirect_from: - /opensearch/snapshots/ - /opensearch/snapshots/index/ From 2d8d3c282f7e39b89bdd9770ae1a5e758ffea6f3 Mon Sep 17 00:00:00 2001 From: Heather Halter Date: Tue, 13 Jun 2023 15:22:11 -0700 Subject: [PATCH 016/286] fixsnapshottopics (#4332) Signed-off-by: Heather Halter Signed-off-by: Melissa Vagi --- .../availability-and-recovery/snapshots/searchable_snapshot.md | 2 +- .../availability-and-recovery/snapshots/sm-api.md | 2 +- .../availability-and-recovery/snapshots/snapshot-management.md | 2 +- .../availability-and-recovery/snapshots/snapshot-restore.md | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md index a28b4d9c58..2de6a32c75 100644 --- a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md +++ b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md @@ -3,7 +3,7 @@ layout: default title: Searchable snapshots parent: Snapshots nav_order: 40 -grand_parent: Availability and Recovery +grand_parent: Availability and recovery redirect_from: - /opensearch/snapshots/searchable_snapshot/ --- diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/sm-api.md b/_tuning-your-cluster/availability-and-recovery/snapshots/sm-api.md index 1bb5b87cb3..3f059fa970 100644 --- a/_tuning-your-cluster/availability-and-recovery/snapshots/sm-api.md +++ b/_tuning-your-cluster/availability-and-recovery/snapshots/sm-api.md @@ -4,7 +4,7 @@ title: Snapshot management API parent: Snapshots nav_order: 30 has_children: false -grand_parent: Availability and Recovery +grand_parent: Availability and recovery redirect_from: - /opensearch/snapshots/sm-api/ --- diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-management.md 
b/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-management.md index 9a25b28683..b557111e49 100644 --- a/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-management.md +++ b/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-management.md @@ -4,7 +4,7 @@ title: Snapshot management parent: Snapshots nav_order: 20 has_children: false -grand_parent: Availability and Recovery +grand_parent: Availability and recovery redirect_from: - /opensearch/snapshots/snapshot-management/ --- diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md b/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md index 7889c3a018..2a79e7adbc 100644 --- a/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md +++ b/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md @@ -4,7 +4,7 @@ title: Take and restore snapshots parent: Snapshots nav_order: 10 has_children: false -grand_parent: Availability and Recovery +grand_parent: Availability and recovery redirect_from: - /opensearch/snapshots/snapshot-restore/ - /opensearch/snapshot-restore/ From 61ffe7bbb7c923586b7e861a844311341deebc2c Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 16:18:56 -0600 Subject: [PATCH 017/286] Add processor index page and content Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-processors.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index 97d85c47d5..e027bb75e3 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -7,3 +7,15 @@ nav_order: 50 # Ingest processors +Ingest processors have a crucial role in preparing and enriching data before it is stored and analyzed and improving data quality and usability. 
They are operations applied to incoming data during the ingestion process that allow for real-time data transformation, manipulation, and enrichment. + +Ingest processors are a core component of data processing [pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipelines/). They preprocess and shape data as it enters a system, making it more suitable for downstream operations such as indexing, analysis, or storage. They offer a range of capabilities, including data extraction, validation, filtering, enrichment, and normalization, that can be performed on different aspects of the data, such as extracting specific fields, converting data types, removing or modifying unwanted data, or enriching data with additional information. + +OpenSearch provides a standard set of ingest processors within your OpenSearch installation. For a list of processors available in OpenSearch, use the [nodes info]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-info/) API: + +```json +GET /_nodes/ingest +``` +{% include copy-curl.html %} + +Learn more about processor types within their respective documentation. From 56e1bdfccd766b6d2fb5f33ead08c5e5356035e1 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 16:28:31 -0600 Subject: [PATCH 018/286] Add processor index page and content Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-processors.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index e027bb75e3..95b0a6da58 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -18,4 +18,4 @@ GET /_nodes/ingest ``` {% include copy-curl.html %} -Learn more about processor types within their respective documentation. 
From 31d2294a037d60f72700cfa08ab1e5eb78e65dc8 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 16:44:15 -0600 Subject: [PATCH 019/286] Add each processor page Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-list/append.md | 0 _api-reference/ingest-apis/processors-list/bytes.md | 0 _api-reference/ingest-apis/processors-list/convert.md | 0 _api-reference/ingest-apis/processors-list/csv.md | 0 _api-reference/ingest-apis/processors-list/date-index-name.md | 0 _api-reference/ingest-apis/processors-list/date.md | 0 _api-reference/ingest-apis/processors-list/dissect.md | 0 _api-reference/ingest-apis/processors-list/dot-expander.md | 0 _api-reference/ingest-apis/processors-list/drop.md | 0 _api-reference/ingest-apis/processors-list/fail.md | 0 _api-reference/ingest-apis/processors-list/foreach.md | 0 _api-reference/ingest-apis/processors-list/geoip.md | 0 _api-reference/ingest-apis/processors-list/geojson-feature.md | 0 _api-reference/ingest-apis/processors-list/grok.md | 0 _api-reference/ingest-apis/processors-list/gsub.md | 0 _api-reference/ingest-apis/processors-list/html-strip.md | 0 _api-reference/ingest-apis/processors-list/join.md | 0 _api-reference/ingest-apis/processors-list/json.md | 0 _api-reference/ingest-apis/processors-list/kv.md | 0 _api-reference/ingest-apis/processors-list/lowercase.md | 0 _api-reference/ingest-apis/processors-list/pipeline.md | 0 _api-reference/ingest-apis/processors-list/remove.md | 0 _api-reference/ingest-apis/processors-list/rename.md | 0 _api-reference/ingest-apis/processors-list/script.md | 0 _api-reference/ingest-apis/processors-list/set.md | 0 _api-reference/ingest-apis/processors-list/sort.md | 0 _api-reference/ingest-apis/processors-list/split.md | 0 _api-reference/ingest-apis/processors-list/text-embedding.md | 0 _api-reference/ingest-apis/processors-list/trim.md | 0 _api-reference/ingest-apis/processors-list/uppercase.md | 0 _api-reference/ingest-apis/processors-list/urldecode.md | 0 
_api-reference/ingest-apis/processors-list/user-agent.md | 0 32 files changed, 0 insertions(+), 0 deletions(-) create mode 100644 _api-reference/ingest-apis/processors-list/append.md create mode 100644 _api-reference/ingest-apis/processors-list/bytes.md create mode 100644 _api-reference/ingest-apis/processors-list/convert.md create mode 100644 _api-reference/ingest-apis/processors-list/csv.md create mode 100644 _api-reference/ingest-apis/processors-list/date-index-name.md create mode 100644 _api-reference/ingest-apis/processors-list/date.md create mode 100644 _api-reference/ingest-apis/processors-list/dissect.md create mode 100644 _api-reference/ingest-apis/processors-list/dot-expander.md create mode 100644 _api-reference/ingest-apis/processors-list/drop.md create mode 100644 _api-reference/ingest-apis/processors-list/fail.md create mode 100644 _api-reference/ingest-apis/processors-list/foreach.md create mode 100644 _api-reference/ingest-apis/processors-list/geoip.md create mode 100644 _api-reference/ingest-apis/processors-list/geojson-feature.md create mode 100644 _api-reference/ingest-apis/processors-list/grok.md create mode 100644 _api-reference/ingest-apis/processors-list/gsub.md create mode 100644 _api-reference/ingest-apis/processors-list/html-strip.md create mode 100644 _api-reference/ingest-apis/processors-list/join.md create mode 100644 _api-reference/ingest-apis/processors-list/json.md create mode 100644 _api-reference/ingest-apis/processors-list/kv.md create mode 100644 _api-reference/ingest-apis/processors-list/lowercase.md create mode 100644 _api-reference/ingest-apis/processors-list/pipeline.md create mode 100644 _api-reference/ingest-apis/processors-list/remove.md create mode 100644 _api-reference/ingest-apis/processors-list/rename.md create mode 100644 _api-reference/ingest-apis/processors-list/script.md create mode 100644 _api-reference/ingest-apis/processors-list/set.md create mode 100644 _api-reference/ingest-apis/processors-list/sort.md create 
mode 100644 _api-reference/ingest-apis/processors-list/split.md create mode 100644 _api-reference/ingest-apis/processors-list/text-embedding.md create mode 100644 _api-reference/ingest-apis/processors-list/trim.md create mode 100644 _api-reference/ingest-apis/processors-list/uppercase.md create mode 100644 _api-reference/ingest-apis/processors-list/urldecode.md create mode 100644 _api-reference/ingest-apis/processors-list/user-agent.md diff --git a/_api-reference/ingest-apis/processors-list/append.md b/_api-reference/ingest-apis/processors-list/append.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/bytes.md b/_api-reference/ingest-apis/processors-list/bytes.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/convert.md b/_api-reference/ingest-apis/processors-list/convert.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/csv.md b/_api-reference/ingest-apis/processors-list/csv.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/date-index-name.md b/_api-reference/ingest-apis/processors-list/date-index-name.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/date.md b/_api-reference/ingest-apis/processors-list/date.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/dissect.md b/_api-reference/ingest-apis/processors-list/dissect.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/dot-expander.md b/_api-reference/ingest-apis/processors-list/dot-expander.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/drop.md b/_api-reference/ingest-apis/processors-list/drop.md new file mode 100644 index 0000000000..e69de29bb2 
diff --git a/_api-reference/ingest-apis/processors-list/fail.md b/_api-reference/ingest-apis/processors-list/fail.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/foreach.md b/_api-reference/ingest-apis/processors-list/foreach.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/geoip.md b/_api-reference/ingest-apis/processors-list/geoip.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/geojson-feature.md b/_api-reference/ingest-apis/processors-list/geojson-feature.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/grok.md b/_api-reference/ingest-apis/processors-list/grok.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/gsub.md b/_api-reference/ingest-apis/processors-list/gsub.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/html-strip.md b/_api-reference/ingest-apis/processors-list/html-strip.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/join.md b/_api-reference/ingest-apis/processors-list/join.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/json.md b/_api-reference/ingest-apis/processors-list/json.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/kv.md b/_api-reference/ingest-apis/processors-list/kv.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/lowercase.md b/_api-reference/ingest-apis/processors-list/lowercase.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/pipeline.md 
b/_api-reference/ingest-apis/processors-list/pipeline.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/remove.md b/_api-reference/ingest-apis/processors-list/remove.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/rename.md b/_api-reference/ingest-apis/processors-list/rename.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/script.md b/_api-reference/ingest-apis/processors-list/script.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/set.md b/_api-reference/ingest-apis/processors-list/set.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/sort.md b/_api-reference/ingest-apis/processors-list/sort.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/split.md b/_api-reference/ingest-apis/processors-list/split.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/text-embedding.md b/_api-reference/ingest-apis/processors-list/text-embedding.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/trim.md b/_api-reference/ingest-apis/processors-list/trim.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/uppercase.md b/_api-reference/ingest-apis/processors-list/uppercase.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/urldecode.md b/_api-reference/ingest-apis/processors-list/urldecode.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/_api-reference/ingest-apis/processors-list/user-agent.md b/_api-reference/ingest-apis/processors-list/user-agent.md new file mode 100644 index 
0000000000..e69de29bb2 From 3058dd5a30f270064c24abca0a692e5a7fd7a01c Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 18:03:54 -0600 Subject: [PATCH 020/286] Write append processor content Signed-off-by: Melissa Vagi --- .../ingest-apis/processors-list/append.md | 62 +++++++++++++++++++ 1 file changed, 62 insertions(+) diff --git a/_api-reference/ingest-apis/processors-list/append.md b/_api-reference/ingest-apis/processors-list/append.md index e69de29bb2..1e25afa68b 100644 --- a/_api-reference/ingest-apis/processors-list/append.md +++ b/_api-reference/ingest-apis/processors-list/append.md @@ -0,0 +1,62 @@ +--- +layout: default +title: Append +parent: Ingest processors +grand_parent: Ingest APIs +nav_order: 10 +--- + +# Append + +The append ingest processor enriches incoming data during the ingestion process by appending additional fields or values to each document. The append processor operates on a per-document basis, meaning it processes each incoming document individually. Learn how to use the append processor in your data processing workflows in the following documentation. + +## Getting started + +To use the append processor, make sure you have the necessary permissions and access rights to configure and deploy ingest processors. + +## Configuration + +The append processor requires the following configuration parameters to specify the field or value to append to incoming documents: + +- **Field name**: Specify the name of the field where the additional data should be appended. + +- **Value**: Specify the value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. + +Optional configuration parameters include: + +__ + +## Usage examples + +Some usage examples include the following. 
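Before the dynamic examples below, note that a static value can also be appended to every document. The following sketch uses the same draft configuration style as the examples that follow; the `tags` field name and the `"web-logs"` value are hypothetical placeholders:

```json
processors:
  - append:
      field: tags
      value: "web-logs"
```

In this sketch, the same static string is added to the `tags` field of each ingested document.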
+ +#### Example: Appending timestamps + +To add a timestamp field to each incoming document, use the following configuration: + +```json +processors: + - append: + field: timestamp + value: "{{_ingest.timestamp}}" +``` + +In the example, the `timestamp` field is appended with the current timestamp when the document is ingested. The `{{_ingest.timestamp}}` expression retrieves the current timestamp provided by the ingestion framework. + +#### Example: Enriching with geolocation data + +To enrich documents with geolocation information based on IP addresses, use the following configuration: + +```json +processors: + - append: + field: location + value: "{{geoip.ip}}" +``` + +In the example, the `location` field is appended with the IP address extracted from the `geoip` field. The `{{geoip.ip}}` expression retrieves the IP address information from the existing `geoip` field. + +## Best practices + +- Data validation: Make sure the values being appended are valid and compatible with the target field's data type and format. +- Efficiency: Consider the performance implications of appending large amounts of data to each document and optimize the processor configuration accordingly. +- Error handling: Implement proper error handling mechanisms to handle scenarios where appending fails, such as when external lookups or API requests encounter errors. \ No newline at end of file From 83a5fadc2fc16ae1d7f954aa4c8886e522a3ce30 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 18:24:53 -0600 Subject: [PATCH 021/286] Revert "fixsnapshottopics (#4332)" This reverts commit 0e31fb58e8d14057e09bd3030a788cf35d07c2e3. 
Signed-off-by: Melissa Vagi --- .../availability-and-recovery/snapshots/searchable_snapshot.md | 2 +- .../availability-and-recovery/snapshots/sm-api.md | 2 +- .../availability-and-recovery/snapshots/snapshot-management.md | 2 +- .../availability-and-recovery/snapshots/snapshot-restore.md | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md index 2de6a32c75..a28b4d9c58 100644 --- a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md +++ b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md @@ -3,7 +3,7 @@ layout: default title: Searchable snapshots parent: Snapshots nav_order: 40 -grand_parent: Availability and recovery +grand_parent: Availability and Recovery redirect_from: - /opensearch/snapshots/searchable_snapshot/ --- diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/sm-api.md b/_tuning-your-cluster/availability-and-recovery/snapshots/sm-api.md index 3f059fa970..1bb5b87cb3 100644 --- a/_tuning-your-cluster/availability-and-recovery/snapshots/sm-api.md +++ b/_tuning-your-cluster/availability-and-recovery/snapshots/sm-api.md @@ -4,7 +4,7 @@ title: Snapshot management API parent: Snapshots nav_order: 30 has_children: false -grand_parent: Availability and recovery +grand_parent: Availability and Recovery redirect_from: - /opensearch/snapshots/sm-api/ --- diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-management.md b/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-management.md index b557111e49..9a25b28683 100644 --- a/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-management.md +++ b/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-management.md @@ -4,7 +4,7 @@ title: Snapshot management parent: Snapshots nav_order: 20 
has_children: false -grand_parent: Availability and recovery +grand_parent: Availability and Recovery redirect_from: - /opensearch/snapshots/snapshot-management/ --- diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md b/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md index 2a79e7adbc..7889c3a018 100644 --- a/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md +++ b/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md @@ -4,7 +4,7 @@ title: Take and restore snapshots parent: Snapshots nav_order: 10 has_children: false -grand_parent: Availability and recovery +grand_parent: Availability and Recovery redirect_from: - /opensearch/snapshots/snapshot-restore/ - /opensearch/snapshot-restore/ From abedab44132b15d054ab4dacc59891ad7962117c Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 18:25:03 -0600 Subject: [PATCH 022/286] Revert "fixsnapshot (#4330)" This reverts commit c43161cfb26a05ff8c50f67fd1472dee6403eb78. 
Signed-off-by: Melissa Vagi --- .../availability-and-recovery/snapshots/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/index.md b/_tuning-your-cluster/availability-and-recovery/snapshots/index.md index 32b46a92ff..3fde2804b7 100644 --- a/_tuning-your-cluster/availability-and-recovery/snapshots/index.md +++ b/_tuning-your-cluster/availability-and-recovery/snapshots/index.md @@ -3,7 +3,7 @@ layout: default title: Snapshots nav_order: 5 has_children: true -parent: Availability and recovery +parent: Availability and Recovery redirect_from: - /opensearch/snapshots/ - /opensearch/snapshots/index/ From 5915dbf28a794a777e9e8654d603ccc05dc54403 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 18:25:09 -0600 Subject: [PATCH 023/286] Revert "Add hyperparameters and undeploy to vocabulary (#4328)" This reverts commit aff91cb361f04a54081ce193b96d9f2a2be12ec4. Signed-off-by: Melissa Vagi --- .github/vale/styles/Vocab/OpenSearch/Words/accept.txt | 2 -- 1 file changed, 2 deletions(-) diff --git a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt index 26b9d99616..ae2c16d12b 100644 --- a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt +++ b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt @@ -35,7 +35,6 @@ GeoHex gibibyte [Hh]ashmap [Hh]ostname -[Hh]yperparameters [Ii]mpactful [Ii]ngress [Ii]nitializer @@ -104,7 +103,6 @@ tebibyte [Uu]nary [Uu]ncheck [Uu]ncomment -[Uu]ndeploy [Uu]nigram [Uu]nnesting [Uu]nrecovered From 566b37a9e0dfb1d14efe7cf0f60bb12ad033f62b Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 18:25:13 -0600 Subject: [PATCH 024/286] Revert "fix#4315 fix sec config example (#4327)" This reverts commit f8363cf44c3057a3974cf45af4e9f989ce7fb2e0. 
Signed-off-by: Melissa Vagi --- _security/configuration/configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_security/configuration/configuration.md b/_security/configuration/configuration.md index 5226f6733d..77018bccfe 100755 --- a/_security/configuration/configuration.md +++ b/_security/configuration/configuration.md @@ -16,7 +16,7 @@ The main configuration file for authentication and authorization backends is `co `config.yml` has three main parts: ```yml -config: +opensearch_security: dynamic: http: ... From a99126d644fbb1d6577df5af462cc8efd5a84068 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 18:25:16 -0600 Subject: [PATCH 025/286] Revert "Use a different user-agent. (#4313)" This reverts commit 10058d116470b5a11a77134f432d439915d73153. Signed-off-by: Melissa Vagi --- _plugins/link-checker.rb | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/_plugins/link-checker.rb b/_plugins/link-checker.rb index 5dfd53c9f1..25f1c6e7af 100644 --- a/_plugins/link-checker.rb +++ b/_plugins/link-checker.rb @@ -104,8 +104,7 @@ def self.init(site) @external_link_checker = LinkChecker::Typhoeus::Hydra::Checker.new( logger: Jekyll.logger, hydra: { max_concurrency: 2 }, - retries: 3, - user_agent: 'OpenSearch Documentation Website Link Checker/1.0' + retries: 3 ) @external_link_checker.on :failure, :error do |result| From 8aeb88f00c26bcdab32d02ef617a40c96c2c5e11 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 18:25:21 -0600 Subject: [PATCH 026/286] Revert "Fix links for link checker (#4309)" This reverts commit 6aab0ca73cebb66d95953ecd418d1f48caaf04b4. 
Signed-off-by: Melissa Vagi --- _clients/ruby.md | 2 +- _plugins/link-checker.rb | 1 - 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/_clients/ruby.md b/_clients/ruby.md index 7d582927c6..59fa413a6c 100644 --- a/_clients/ruby.md +++ b/_clients/ruby.md @@ -634,7 +634,7 @@ puts MultiJson.dump(response, pretty: "true") # Ruby AWS Sigv4 Client -The [opensearch-aws-sigv4](https://github.com/opensearch-project/opensearch-ruby-aws-sigv4) gem provides the `OpenSearch::Aws::Sigv4Client` class, which has all features of `OpenSearch::Client`. The only difference between these two clients is that `OpenSearch::Aws::Sigv4Client` requires an instance of `Aws::Sigv4::Signer` during instantiation to authenticate with AWS: +The [opensearch-aws-sigv4](https://github.com/opensearch-project/opensearch-ruby/tree/main/opensearch-aws-sigv4) gem provides the `OpenSearch::Aws::Sigv4Client` class, which has all features of `OpenSearch::Client`. The only difference between these two clients is that `OpenSearch::Aws::Sigv4Client` requires an instance of `Aws::Sigv4::Signer` during instantiation to authenticate with AWS: ```ruby require 'opensearch-aws-sigv4' diff --git a/_plugins/link-checker.rb b/_plugins/link-checker.rb index 25f1c6e7af..0c7df2b116 100644 --- a/_plugins/link-checker.rb +++ b/_plugins/link-checker.rb @@ -55,7 +55,6 @@ module Jekyll::LinkChecker 'playground.opensearch.org', # inifite redirect, https://github.com/opensearch-project/dashboards-anywhere/issues/172 'crates.io', # 404s on bots 'www.cloudflare.com', # 403s on bots - 'example.issue.link', # a fake example link from the template ] ## From edd3aff6183e8afc5552ff71d4cf3eecdfc3192c Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 18:25:24 -0600 Subject: [PATCH 027/286] Revert "Documents the Data Prepper opensearch sink's template_type parameter. (#4290)" This reverts commit ef507578d5614e26efecbeff4e778e076cc5745b. 
Signed-off-by: Melissa Vagi --- _data-prepper/pipelines/configuration/sinks/opensearch.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/_data-prepper/pipelines/configuration/sinks/opensearch.md b/_data-prepper/pipelines/configuration/sinks/opensearch.md index 81ebe0dbc4..0990e5f7dc 100644 --- a/_data-prepper/pipelines/configuration/sinks/opensearch.md +++ b/_data-prepper/pipelines/configuration/sinks/opensearch.md @@ -66,8 +66,7 @@ insecure | No | Boolean | Whether or not to verify SSL certificates. If set to t proxy | No | String | The address of a [forward HTTP proxy server](https://en.wikipedia.org/wiki/Proxy_server). The format is "<host name or IP>:<port>". Examples: "example.com:8100", "http://example.com:8100", "112.112.112.112:8100". Port number cannot be omitted. index | Conditionally | String | Name of the export index. Applicable and required only when the `index_type` is `custom`. index_type | No | String | This index type tells the Sink plugin what type of data it is handling. Valid values: `custom`, `trace-analytics-raw`, `trace-analytics-service-map`, `management-disabled`. Default value is `custom`. -template_type | No | String | Defines what type of OpenSearch template to use. The available options are `v1` and `index-template`. The default value is `v1`, which uses the original OpenSearch templates available at the `_template` API endpoints. The `index-template` option uses composable [index template]({{site.url}}{{site.baseurl}}/opensearch/index-templates/) which are available through OpenSearch's `_index_template` API. Composable index types offer more flexibility than the default and are necessary when an OpenSearch cluster has already existing index templates. Composable templates are available for all versions of OpenSearch and some later versions of Elasticsearch. 
-template_file | No | String | The path to a JSON [index template]({{site.url}}{{site.baseurl}}/opensearch/index-templates/) file such as `/your/local/template-file.json` when `index_type` is set to `custom`. For an example template file, see [otel-v1-apm-span-index-template.json](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/src/main/resources/otel-v1-apm-span-index-template.json). If you supply a template file it must match the template format specified by the `template_type` parameter. +template_file | No | String | Path to a JSON [index template]({{site.url}}{{site.baseurl}}/opensearch/index-templates/) file (for example, `/your/local/template-file.json`) if `index_type` is `custom`. See [otel-v1-apm-span-index-template.json](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/src/main/resources/otel-v1-apm-span-index-template.json) for an example. document_id_field | No | String | The field from the source data to use for the OpenSearch document ID (for example, `"my-field"`) if `index_type` is `custom`. dlq_file | No | String | The path to your preferred dead letter queue file (for example, `/your/local/dlq-file`). Data Prepper writes to this file when it fails to index a document on the OpenSearch cluster. dlq | No | N/A | DLQ configurations. See [Dead Letter Queues]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/dlq/) for details. If the `dlq_file` option is also available, the sink will fail. From 52314338268564daf1f856c5d3b41d25ce415023 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 18:25:28 -0600 Subject: [PATCH 028/286] Revert "updateinfo (#4304)" This reverts commit c84983f2dbe9f2db8810365d31951fbe994ef2d2. 
Signed-off-by: Melissa Vagi --- _tuning-your-cluster/index.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/_tuning-your-cluster/index.md b/_tuning-your-cluster/index.md index 8172cfff18..5365b88c66 100644 --- a/_tuning-your-cluster/index.md +++ b/_tuning-your-cluster/index.md @@ -120,16 +120,16 @@ node.roles: [] ## Step 3: Bind a cluster to specific IP addresses -`network.bind_host` defines the IP address used to bind the node. By default, OpenSearch listens on a local host, which limits the cluster to a single node. You can also use `_local_` and `_site_` to bind to any loopback or site-local address, whether IPv4 or IPv6: +`network_host` defines the IP address used to bind the node. By default, OpenSearch listens on a local host, which limits the cluster to a single node. You can also use `_local_` and `_site_` to bind to any loopback or site-local address, whether IPv4 or IPv6: ```yml -network.bind_host: [_local_, _site_] +network.host: [_local_, _site_] ``` To form a multi-node cluster, specify the IP address of the node: ```yml -network.bind_host: +network.host: ``` Make sure to configure these settings on all of your nodes. From 595a3950c26da34d493423080cd678199342edc0 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 18:25:31 -0600 Subject: [PATCH 029/286] Revert "Final text for 404 (#4302)" This reverts commit 26b94c7b1354c87924091f9826ea33b111967d43. Signed-off-by: Melissa Vagi --- 404.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/404.md b/404.md index 60a1bc8847..5165c9e449 100644 --- a/404.md +++ b/404.md @@ -8,7 +8,8 @@ nav_exclude: true ## Oops, this isn't the page you're looking for. -Maybe our [home page](https://opensearch.org/docs/latest) or one of the commonly visited pages below will help. If you need further support, please use the feedback feature on the right side of the screen to get in touch. 
+Maybe our [homepage](https://opensearch.org/docs/latest) +or one of the popular pages listed below can help. - [Quickstart]({{site.url}}{{site.baseurl}}/quickstart/) - [Installing OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/index/) From 968f1cd72bc92d8391cca8d5569cc0c22d68cd8b Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 18:25:35 -0600 Subject: [PATCH 030/286] Revert "Updated functions documentation (#4232)" This reverts commit 4b2fd6c4270032861827b37aad3ebae3a35baadf. Signed-off-by: Melissa Vagi --- _search-plugins/sql/functions.md | 292 +++++++++++++++---------------- 1 file changed, 144 insertions(+), 148 deletions(-) diff --git a/_search-plugins/sql/functions.md b/_search-plugins/sql/functions.md index de3b578e1a..e065c80db4 100644 --- a/_search-plugins/sql/functions.md +++ b/_search-plugins/sql/functions.md @@ -18,170 +18,166 @@ The SQL plugin supports the following common functions shared across the SQL and ## Mathematical -| Function | Specification | Example | -|:-----------|:-----------------------------------------------------------------|:-----------------------------------------------| -| `abs` | `abs(number T) -> T` | `SELECT abs(0.5)` | -| `add` | `add(number T, number T) -> T` | `SELECT add(1, 5)` | -| `cbrt` | `cbrt(number T) -> double` | `SELECT cbrt(8)` | -| `ceil` | `ceil(number T) -> T` | `SELECT ceil(0.5)` | -| `conv` | `conv(string T, integer, integer) -> string` | `SELECT conv('2C', 16, 10), conv(1111, 2, 10)` | -| `crc32` | `crc32(string) -> string` | `SELECT crc32('MySQL')` | -| `divide` | `divide(number T, number T) -> T` | `SELECT divide(1, 0.5)` | -| `e` | `e() -> double` | `SELECT e()` | -| `exp` | `exp(number T) -> double` | `SELECT exp(0.5)` | -| `expm1` | `expm1(number T) -> double` | `SELECT expm1(0.5)` | -| `floor` | `floor(number T) -> long` | `SELECT floor(0.5)` | -| `ln` | `ln(number T) -> double` | `SELECT ln(10)` | -| `log` | `log(number T) -> double` or `log(number 
T, number T) -> double` | `SELECT log(10)`, `SELECT log(2, 16)` | -| `log2` | `log2(number T) -> double` | `SELECT log2(10)` | -| `log10` | `log10(number T) -> double` | `SELECT log10(10)` | -| `mod` | `mod(number T, number T) -> T` | `SELECT mod(2, 3)` | -| `modulus` | `modulus(number T, number T) -> T` | `SELECT modulus(2, 3)` | -| `multiply` | `multiply(number T, number T) -> T` | `SELECT multiply(2, 3)` | -| `pi` | `pi() -> double` | `SELECT pi()` | -| `pow` | `pow(number T, number T) -> double` | `SELECT pow(2, 3)` | -| `power` | `power(number T, number T) -> double` | `SELECT power(2, 3)` | -| `rand` | `rand() -> float` or `rand(number T) -> float` | `SELECT rand()`, `SELECT rand(0.5)` | -| `rint` | `rint(number T) -> double` | `SELECT rint(1.5)` | -| `round` | `round(number T) -> T` or `round(number T, integer) -> T` | `SELECT round(1.5)`, `SELECT round(1.175, 2)` | -| `sign` | `sign(number T) -> integer` | `SELECT sign(1.5)` | -| `signum` | `signum(number T) -> integer` | `SELECT signum(0.5)` | -| `sqrt` | `sqrt(number T) -> double` | `SELECT sqrt(0.5)` | -| `strcmp` | `strcmp(string T, string T) -> integer` | `SELECT strcmp('hello', 'hello world')` | -| `subtract` | `subtract(number T, number T) -> T` | `SELECT subtract(3, 2)` | -| `truncate` | `truncate(number T, number T) -> T` | `SELECT truncate(56.78, 1)` | -| `+` | `number T + number T -> T` | `SELECT 1 + 5` | -| `-` | `number T - number T -> T` | `SELECT 3 - 2` | -| `*` | `number T * number T -> T` | `SELECT 2 * 3` | -| `/` | `number T / number T -> T` | `SELECT 1 / 0.5` | -| `%` | `number T % number T -> T` | `SELECT 2 % 3` | +Function | Specification | Example +:--- | :--- | :--- +abs | `abs(number T) -> T` | `SELECT abs(0.5) FROM my-index LIMIT 1` +add | `add(number T, number) -> T` | `SELECT add(1, 5) FROM my-index LIMIT 1` +cbrt | `cbrt(number T) -> T` | `SELECT cbrt(0.5) FROM my-index LIMIT 1` +ceil | `ceil(number T) -> T` | `SELECT ceil(0.5) FROM my-index LIMIT 1` +conv | `conv(string T, int 
a, int b) -> T` | `SELECT CONV('12', 10, 16), CONV('2C', 16, 10), CONV(12, 10, 2), CONV(1111, 2, 10) FROM my-index LIMIT 1` +crc32 | `crc32(string T) -> T` | `SELECT crc32('MySQL') FROM my-index LIMIT 1` +divide | `divide(number T, number) -> T` | `SELECT divide(1, 0.5) FROM my-index LIMIT 1` +e | `e() -> double` | `SELECT e() FROM my-index LIMIT 1` +exp | `exp(number T) -> T` | `SELECT exp(0.5) FROM my-index LIMIT 1` +expm1 | `expm1(number T) -> T` | `SELECT expm1(0.5) FROM my-index LIMIT 1` +floor | `floor(number T) -> T` | `SELECT floor(0.5) AS Rounded_Down FROM my-index LIMIT 1` +ln | `ln(number T) -> double` | `SELECT ln(10) FROM my-index LIMIT 1` +log | `log(number T) -> double` or `log(number T, number) -> double` | `SELECT log(10) FROM my-index LIMIT 1` +log2 | `log2(number T) -> double` | `SELECT log2(10) FROM my-index LIMIT 1` +log10 | `log10(number T) -> double` | `SELECT log10(10) FROM my-index LIMIT 1` +mod | `mod(number T, number) -> T` | `SELECT modulus(2, 3) FROM my-index LIMIT 1` +multiply | `multiply(number T, number) -> number` | `SELECT multiply(2, 3) FROM my-index LIMIT 1` +pi | `pi() -> double` | `SELECT pi() FROM my-index LIMIT 1` +pow | `pow(number T) -> T` or `pow(number T, number) -> T` | `SELECT pow(2, 3) FROM my-index LIMIT 1` +power | `power(number T) -> T` or `power(number T, number) -> T` | `SELECT power(2, 3) FROM my-index LIMIT 1` +rand | `rand() -> number` or `rand(number T) -> T` | `SELECT rand(0.5) FROM my-index LIMIT 1` +rint | `rint(number T) -> T` | `SELECT rint(1.5) FROM my-index LIMIT 1` +round | `round(number T) -> T` | `SELECT round(1.5) FROM my-index LIMIT 1` +sign | `sign(number T) -> T` | `SELECT sign(1.5) FROM my-index LIMIT 1` +signum | `signum(number T) -> T` | `SELECT signum(0.5) FROM my-index LIMIT 1` +sqrt | `sqrt(number T) -> T` | `SELECT sqrt(0.5) FROM my-index LIMIT 1` +strcmp | `strcmp(string T, string T) -> T` | `SELECT strcmp('hello', 'hello') FROM my-index LIMIT 1` +subtract | `subtract(number T, number) -> 
T` | `SELECT subtract(3, 2) FROM my-index LIMIT 1` +truncate | `truncate(number T, number T) -> T` | `SELECT truncate(56.78, 1) FROM my-index LIMIT 1` +/ | `number [op] number -> number` | `SELECT 1 / 100 FROM my-index LIMIT 1` +% | `number [op] number -> number` | `SELECT 1 % 100 FROM my-index LIMIT 1` ## Trigonometric -| Function | Specification | Example | -|:----------|:--------------------------------------|:-----------------------| -| `acos` | `acos(number T) -> double` | `SELECT acos(0.5)` | -| `asin` | `asin(number T) -> double` | `SELECT asin(0.5)` | -| `atan` | `atan(number T) -> double` | `SELECT atan(0.5)` | -| `atan2` | `atan2(number T, number T) -> double` | `SELECT atan2(1, 0.5)` | -| `cos` | `cos(number T) -> double` | `SELECT cos(0.5)` | -| `cosh` | `cosh(number T) -> double` | `SELECT cosh(0.5)` | -| `cot` | `cot(number T) -> double` | `SELECT cot(0.5)` | -| `degrees` | `degrees(number T) -> double` | `SELECT degrees(0.5)` | -| `radians` | `radians(number T) -> double` | `SELECT radians(0.5)` | -| `sin` | `sin(number T) -> double` | `SELECT sin(0.5)` | -| `sinh` | `sinh(number T) -> double` | `SELECT sinh(0.5)` | -| `tan` | `tan(number T) -> double` | `SELECT tan(0.5)` | +Function | Specification | Example +:--- | :--- | :--- +acos | `acos(number T) -> double` | `SELECT acos(0.5) FROM my-index LIMIT 1` +asin | `asin(number T) -> double` | `SELECT asin(0.5) FROM my-index LIMIT 1` +atan | `atan(number T) -> double` | `SELECT atan(0.5) FROM my-index LIMIT 1` +atan2 | `atan2(number T, number) -> double` | `SELECT atan2(1, 0.5) FROM my-index LIMIT 1` +cos | `cos(number T) -> double` | `SELECT cos(0.5) FROM my-index LIMIT 1` +cosh | `cosh(number T) -> double` | `SELECT cosh(0.5) FROM my-index LIMIT 1` +cot | `cot(number T) -> double` | `SELECT cot(0.5) FROM my-index LIMIT 1` +degrees | `degrees(number T) -> double` | `SELECT degrees(0.5) FROM my-index LIMIT 1` +radians | `radians(number T) -> double` | `SELECT radians(0.5) FROM my-index LIMIT 1` +sin | 
`sin(number T) -> double` | `SELECT sin(0.5) FROM my-index LIMIT 1` +sinh | `sinh(number T) -> double` | `SELECT sinh(0.5) FROM my-index LIMIT 1` +tan | `tan(number T) -> double` | `SELECT tan(0.5) FROM my-index LIMIT 1` ## Date and time Functions marked with * are only available in SQL. -| Function | Specification | Example | -|:---------------------|:---------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------| -| `adddate` | `adddate(date, INTERVAL expr unit) -> date` | `SELECT adddate(date('2020-08-26'), INTERVAL 1 hour)` | -| `addtime` | `addtime(date, date) -> date` | `SELECT addtime(date('2008-12-12'), date('2008-12-12'))` | -| `convert_tz` | `convert_tz(date, string, string) -> date` | `SELECT convert_tz('2008-12-25 05:30:00', '+00:00', 'America/Los_Angeles')` | -| `curtime` | `curtime() -> time` | `SELECT curtime()` | -| `curdate` | `curdate() -> date` | `SELECT curdate()` | -| `current_date` | `current_date() -> date` | `SELECT current_date()` | -| `current_time` | `current_time() -> time` | `SELECT current_time()` | -| `current_timestamp` | `current_timestamp() -> date` | `SELECT current_timestamp()` | -| `date` | `date(date) -> date` | `SELECT date('2000-01-02')` | -| `datediff` | `datediff(date, date) -> integer` | `SELECT datediff(date('2000-01-02'), date('2000-01-01'))` | -| `datetime` | `datetime(string) -> datetime` | `SELECT datetime('2008-12-25 00:00:00')` | -| `date_add` | `date_add(date, INTERVAL integer UNIT)` | `SELECT date_add('2020-08-26', INTERVAL 1 HOUR)` | -| `date_format` | `date_format(date, string) -> string` or `date_format(date, string, string) -> string` | `SELECT date_format(date('2020-08-26'), 'Y')` | -| `date_sub` | `date_sub(date, INTERVAL expr unit) -> date` | `SELECT date_sub(date('2008-01-02'), INTERVAL 31 day)` | -| `dayofmonth` | `dayofmonth(date) -> integer` | `SELECT dayofmonth(date('2001-05-07'))` | -| `day` 
| `day(date) -> integer` | `SELECT day(date('2020-08-25'))` | -| `dayname` | `dayname(date) -> string` | `SELECT dayname(date('2020-08-26'))` | -| `dayofmonth` | `dayofmonth(date) -> integer` | `SELECT dayofmonth(date('2020-08-26'))` | -| `dayofweek` | `dayofweek(date) -> integer` | `SELECT dayofweek(date('2020-08-26'))` | -| `dayofyear` | `dayofyear(date) -> integer` | `SELECT dayofyear(date('2020-08-26'))` | -| `dayofweek` | `dayofweek(date) -> integer` | `SELECT dayofweek(date('2020-08-26'))` | -| `day_of_month`\* | `day_of_month(date) -> integer` | `SELECT day_of_month(date('2020-08-26'))` | -| `day_of_week`\* | `day_of_week(date) -> integer` | `SELECT day_of_week(date('2020-08-26'))` | -| `day_of_year`\* | `day_of_year(date) -> integer` | `SELECT day_of_year(date('2020-08-26'))` | -| `extract`\* | `extract(part FROM date) -> integer` | `SELECT extract(MONTH FROM datetime('2020-08-26 10:11:12'))` | -| `from_days` | `from_days(N) -> integer` | `SELECT from_days(733687)` | -| `from_unixtime` | `from_unixtime(N) -> date` | `SELECT from_unixtime(1220249547)` | -| `get_format` | `get_format(PART, string) -> string` | `SELECT get_format(DATE, 'USA')` | -| `hour` | `hour(time) -> integer` | `SELECT hour(time '01:02:03')` | -| `hour_of_day`\* | `hour_of_day(time) -> integer` | `SELECT hour_of_day(time '01:02:03')` | -| `last_day`\* | `last_day(date) -> integer` | `SELECT last_day(date('2020-08-26'))` | -| `localtime` | `localtime() -> date` | `SELECT localtime()` | -| `localtimestamp` | `localtimestamp() -> date` | `SELECT localtimestamp()` | -| `makedate` | `makedate(double, double) -> date` | `SELECT makedate(1945, 5.9)` | -| `maketime` | `maketime(integer, integer, integer) -> date` | `SELECT maketime(1, 2, 3)` | -| `microsecond` | `microsecond(expr) -> integer` | `SELECT microsecond(time '01:02:03.123456')` | -| `minute` | `minute(expr) -> integer` | `SELECT minute(time '01:02:03')` | -| `minute_of_day`\* | `minute_of_day(expr) -> integer` | `SELECT 
minute_of_day(time '01:02:03')` | -| `minute_of_hour`\* | `minute_of_hour(expr) -> integer` | `SELECT minute_of_hour(time '01:02:03')` | -| `month` | `month(date) -> integer` | `SELECT month(date('2020-08-26'))` | -| `month_of_year`\* | `month_of_year(date) -> integer` | `SELECT month_of_year(date('2020-08-26'))` | -| `monthname` | `monthname(date) -> string` | `SELECT monthname(date('2020-08-26'))` | -| `now` | `now() -> date` | `SELECT now()` | -| `period_add` | `period_add(integer, integer)` | `SELECT period_add(200801, 2)` | -| `period_diff` | `period_diff(integer, integer)` | `SELECT period_diff(200802, 200703)` | -| `quarter` | `quarter(date) -> integer` | `SELECT quarter(date('2020-08-26'))` | -| `second` | `second(time) -> integer` | `SELECT second(time '01:02:03')` | -| `second_of_minute`\* | `second_of_minute(time) -> integer` | `SELECT second_of_minute(time '01:02:03')` | -| `sec_to_time`\* | `sec_to_time(integer) -> date` | `SELECT sec_to_time(10000)` | -| `subdate` | `subdate(date, INTERVAL expr unit) -> date, datetime` | `SELECT subdate(date('2008-01-02'), INTERVAL 31 day)` | -| `subtime` | `subtime(date, date) -> date` | `SELECT subtime(date('2008-12-12'), date('2008-11-15'))` | -| `str_to_date`\* | `str_to_date(string, format) -> date` | `SELECT str_to_date("01,5,2013", "%d,%m,%Y")` | -| `time` | `time(expr) -> time` | `SELECT time('13:49:00')` | -| `timediff` | `timediff(time, time) -> time` | `SELECT timediff(time('23:59:59'), time('13:00:00'))` | -| `timestamp` | `timestamp(date) -> date` | `SELECT timestamp('2001-05-07 00:00:00')` | -| `timestampadd` | `timestampadd(interval, integer, date) -> date)` | `SELECT timestampadd(DAY, 17, datetime('2000-01-01 00:00:00'))` | -| `timestampdiff` | `timestampdiff(interval, date, date) -> integer` | `SELECT timestampdiff(YEAR, '1997-01-01 00:00:00', '2001-03-06 00:00:00')` | -| `time_format` | `time_format(date, string) -> string` | `SELECT time_format('1998-01-31 13:14:15.012345', '%f %H %h %I %i %p %r %S 
%s %T')` | -| `time_to_sec` | `time_to_sec(time) -> long` | `SELECT time_to_sec(time '22:23:00')` | -| `to_days` | `to_days(date) -> long` | `SELECT to_days(date '2008-10-07')` | -| `to_seconds` | `to_seconds(date) -> integer` | `SELECT to_seconds(date('2008-10-07'))` | -| `unix_timestamp` | `unix_timestamp(date) -> double` | `SELECT unix_timestamp(timestamp('1996-11-15 17:05:42'))` | -| `utc_date` | `utc_date() -> date` | `SELECT utc_date()` | -| `utc_time` | `utc_time() -> date` | `SELECT utc_time()` | -| `utc_timestamp` | `utc_timestamp() -> date` | `SELECT utc_timestamp()` | -| `week` | `week(date[mode]) -> integer` | `SELECT week(date('2008-02-20'))` | -| `weekofyear` | `weekofyear(date[mode]) -> integer` | `SELECT weekofyear(date('2008-02-20'))` | -| `week_of_year`\* | `week_of_year(date[mode]) -> integer` | `SELECT week_of_year(date('2008-02-20'))` | -| `year` | `year(date) -> integer` | `SELECT year(date('2001-07-05'))` | -| `yearweek`\* | `yearweek(date[mode]) -> integer` | `SELECT yearweek(date('2008-02-20'))` | +Function | Specification | Example +:--- | :--- | :--- +adddate | `adddate(date, INTERVAL expr unit) -> date` | `SELECT adddate(date('2020-08-26'), INTERVAL 1 hour) FROM my-index LIMIT 1` +addtime | `addtime(date, date) -> date` | `SELECT addtime(date('2008-12-12'), date('2008-12-12'))` +convert_tz | `convert_tz(date, string, string) -> date` | `SELECT convert_tz('2008-12-25 05:30:00', '+00:00', 'America/Los_Angeles')` +curtime | `curtime() -> time` | `SELECT curtime()` +curdate | `curdate() -> date` | `SELECT curdate() FROM my-index LIMIT 1` +current_date | `current_date() -> date` | `SELECT current_date() FROM my-index LIMIT 1` +current_time | `current_time() -> time` | `SELECT current_time()` +current_timestamp | `current_timestamp() -> date` | `SELECT current_timestamp() FROM my-index LIMIT 1` +date | `date(date) -> date` | `SELECT date('2000-01-02') FROM my-index LIMIT 1` +datediff | `datediff(date, date) -> integer` | `SELECT 
datediff(date('2000-01-02'), date('2000-01-01'))` +datetime | `datetime(string) -> datetime` | `SELECT datetime('2008-12-25 00:00:00')` +date_add | `date_add(date, INTERVAL integer UNIT)` | `SELECT date_add('2020-08-26', INTERVAL 1 HOUR)` +date_format | `date_format(date, string) -> string` or `date_format(date, string, string) -> string` | `SELECT date_format(date, 'Y') FROM my-index LIMIT 1` +date_sub | `date_sub(date, INTERVAL expr unit) -> date` | `SELECT date_sub(date('2008-01-02'), INTERVAL 31 day) FROM my-index LIMIT 1` +dayofmonth | `dayofmonth(date) -> integer` | `SELECT dayofmonth(date) FROM my-index LIMIT 1` +day | `day(date) -> integer` | `SELECT day(date('2020-08-25'))` +dayname | `dayname(date) -> string` | `SELECT dayname(date('2020-08-26')) FROM my-index LIMIT 1` +dayofmonth | `dayofmonth(date) -> integer` | `SELECT dayofmonth(date) FROM my-index LIMIT 1` +dayofweek | `dayofweek(date) -> integer` | `SELECT dayofweek(date) FROM my-index LIMIT 1` +dayofyear | `dayofyear(date) -> integer` | `SELECT dayofyear(date('2020-08-26')) FROM my-index LIMIT 1` +dayofweek | `dayofweek(date) -> integer` | `SELECT dayofweek(date('2020-08-26')) FROM my-index LIMIT 1` +day_of_month\* | `day_of_month(date) -> integer` | `SELECT day_of_month(date) FROM my-index LIMIT 1` +day_of_week\* | `day_of_week(date) -> integer` | `SELECT day_of_week(date('2020-08-26')) FROM my-index LIMIT 1` +day_of_year\* | `day_of_year(date) -> integer` | `SELECT day_of_year(date('2020-08-26')) FROM my-index LIMIT 1` +extract\* | `extract(part FROM date) -> integer` | `SELECT extract(MONTH FROM datetime('2020-08-26 10:11:12'))` +from_days | `from_days(N) -> integer` | `SELECT from_days(733687) FROM my-index LIMIT 1` +from_unixtime | `from_unixtime(N) -> date` | `SELECT from_unixtime(1220249547)` +get_format | `get_format(PART, string) -> string` | `SELECT get_format(DATE, 'USA')` +hour | `hour(time) -> integer` | `SELECT hour((time '01:02:03')) FROM my-index LIMIT 1` +hour_of_day\* | 
`hour_of_day(time) -> integer` | `SELECT hour_of_day((time '01:02:03')) FROM my-index LIMIT 1` +last_day\* | `last_day(date) -> integer` | `SELECT last_day(date('2020-08-26'))` +localtime | `localtime() -> date` | `SELECT localtime() FROM my-index LIMIT 1` +localtimestamp | `localtimestamp() -> date` | `SELECT localtimestamp() FROM my-index LIMIT 1` +makedate | `makedate(double, double) -> date` | `SELECT makedate(1945, 5.9)` +maketime | `maketime(integer, integer, integer) -> date` | `SELECT maketime(1, 2, 3) FROM my-index LIMIT 1` +microsecond | `microsecond(expr) -> integer` | `SELECT microsecond((time '01:02:03.123456')) FROM my-index LIMIT 1` +minute | `minute(expr) -> integer` | `SELECT minute((time '01:02:03')) FROM my-index LIMIT 1` +minute_of_day\* | `minute_of_day(expr) -> integer` | `SELECT minute_of_day((time '01:02:03')) FROM my-index LIMIT 1` +minute_of_hour\* | `minute_of_hour(expr) -> integer` | `SELECT minute_of_hour((time '01:02:03')) FROM my-index LIMIT 1` +month | `month(date) -> integer` | `SELECT month(date) FROM my-index` +month_of_year\* | `month_of_year(date) -> integer` | `SELECT month_of_year(date) FROM my-index` +monthname | `monthname(date) -> string` | `SELECT monthname(date) FROM my-index` +now | `now() -> date` | `SELECT now() FROM my-index LIMIT 1` +period_add | `period_add(integer, integer)` | `SELECT period_add(200801, 2)` +period_diff | `period_diff(integer, integer)` | `SELECT period_diff(200802, 200703)` +quarter | `quarter(date) -> integer` | `SELECT quarter(date('2020-08-26')) FROM my-index LIMIT 1` +second | `second(time) -> integer` | `SELECT second((time '01:02:03')) FROM my-index LIMIT 1` +second_of_minute\* | `second_of_minute(time) -> integer` | `SELECT second_of_minute((time '01:02:03')) FROM my-index LIMIT 1` +sec_to_time\* | `sec_to_time(integer) -> date` | `SELECT sec_to_time(10000)` +subdate | `subdate(date, INTERVAL expr unit) -> date, datetime` | `SELECT subdate(date('2008-01-02'), INTERVAL 31 day) FROM my-index 
LIMIT 1` +subtime | `subtime(date, date) -> date` | `SELECT subtime(date('2008-12-12'), date('2008-11-15'))` +str_to_date\* | `str_to_date(string, format) -> date` | `SELECT str_to_date("March 10 2000", "%M %d %Y")` +time | `time(expr) -> time` | `SELECT time('13:49:00') FROM my-index LIMIT 1` +timediff | `timediff(time, time) -> time` | `SELECT timediff(time('23:59:59'), time('13:00:00'))` +timestamp | `timestamp(date) -> date` | `SELECT timestamp(date) FROM my-index LIMIT 1` +timestampadd | `timestampadd(interval, integer, date) -> date` | `SELECT timestampadd(DAY, 17, datetime('2000-01-01 00:00:00'))` +timestampdiff | `timestampdiff(interval, date, date) -> integer` | `SELECT timestampdiff(YEAR, '1997-01-01 00:00:00', '2001-03-06 00:00:00')` +time_format | `time_format(date, string) -> string` | `SELECT time_format('1998-01-31 13:14:15.012345', '%f %H %h %I %i %p %r %S %s %T')` +time_to_sec | `time_to_sec(time) -> long` | `SELECT time_to_sec(time '22:23:00') FROM my-index LIMIT 1` +to_days | `to_days(date) -> long` | `SELECT to_days(date '2008-10-07') FROM my-index LIMIT 1` +to_seconds | `to_seconds(date) -> integer` | `SELECT to_seconds(date('2008-10-07'))` +unix_timestamp | `unix_timestamp(date) -> double` | `SELECT unix_timestamp(timestamp('1996-11-15 17:05:42'))` +utc_date | `utc_date() -> date` | `SELECT utc_date()` +utc_time | `utc_time() -> date` | `SELECT utc_time()` +utc_timestamp | `utc_timestamp() -> date` | `SELECT utc_timestamp()` +week | `week(date[mode]) -> integer` | `SELECT week(date('2008-02-20')) FROM my-index LIMIT 1` +weekofyear | `weekofyear(date[mode]) -> integer` | `SELECT weekofyear(date('2008-02-20')) FROM my-index LIMIT 1` +week_of_year\* | `week_of_year(date[mode]) -> integer` | `SELECT week_of_year(date('2008-02-20')) FROM my-index LIMIT 1` +year | `year(date) -> integer` | `SELECT year(date) FROM my-index LIMIT 1` +yearweek\* | `yearweek(date[mode]) -> integer` | `SELECT yearweek(date('2008-02-20')) FROM my-index LIMIT 1` ## String -| 
Function | Specification | Example | -|:------------|:------------------------------------------------------------------------------------|:---------------------------------------------------------------| -| `ascii` | `ascii(string) -> integer` | `SELECT ascii('h')` | -| `concat` | `concat(string, string) -> string` | `SELECT concat('hello', 'world')` | -| `concat_ws` | `concat_ws(separator, string, string…) -> string` | `SELECT concat_ws(" ", "Hello", "World!")` | -| `left` | `left(string, integer) -> string` | `SELECT left('hello', 2)` | -| `length` | `length(string) -> integer` | `SELECT length('hello')` | -| `locate` | `locate(string, string, integer) -> integer` or `locate(string, string) -> integer` | `SELECT locate('o', 'hello')`, `locate('l', 'hello world', 5)` | -| `replace` | `replace(string, string, string) -> string` | `SELECT replace('hello', 'l', 'x')` | -| `right` | `right(string, integer) -> string` | `SELECT right('hello', 2)` | -| `rtrim` | `rtrim(string) -> string` | `SELECT rtrim('hello ')` | -| `substring` | `substring(string, integer, integer) -> string` | `SELECT substring('hello', 2, 4)` | -| `trim` | `trim(string) -> string` | `SELECT trim(' hello')` | -| `upper` | `upper(string) -> string` | `SELECT upper('hello world')` | +Function | Specification | Example +:--- | :--- | :--- +ascii | `ascii(string T) -> integer` | `SELECT ascii(name.keyword) FROM my-index LIMIT 1` +concat | `concat(str1, str2) -> string` | `SELECT concat('hello', 'world') FROM my-index LIMIT 1` +concat_ws | `concat_ws(separator, string, string…) -> string` | `SELECT concat_ws("-", "Tutorial", "is", "fun!") FROM my-index LIMIT 1` +left | `left(string T, integer) -> T` | `SELECT left('hello', 2) FROM my-index LIMIT 1` +length | `length(string) -> integer` | `SELECT length('hello') FROM my-index LIMIT 1` +locate | `locate(string, string, integer) -> integer` or `locate(string, string) -> INTEGER` | `SELECT locate('o', 'hello') FROM my-index LIMIT 1`, `SELECT locate('l', 
'hello', 3) FROM my-index LIMIT 1` +replace | `replace(string T, string, string) -> T` | `SELECT replace('hello', 'l', 'x') FROM my-index LIMIT 1` +right | `right(string T, integer) -> T` | `SELECT right('hello', 1) FROM my-index LIMIT 1` +rtrim | `rtrim(string T) -> T` | `SELECT rtrim(name.keyword) FROM my-index LIMIT 1` +substring | `substring(string T, integer, integer) -> T` | `SELECT substring(name.keyword, 2,5) FROM my-index LIMIT 1` +trim | `trim(string T) -> T` | `SELECT trim(' hello') FROM my-index LIMIT 1` +upper | `upper(string T) -> T` | `SELECT upper('helloworld') FROM my-index LIMIT 1` ## Aggregate -| Function | Specification | Example | -|:---------|:-------------------------|:-----------------------------------| -| `avg` | `avg(number T) -> T` | `SELECT avg(column) FROM my-index` | -| `count` | `count(number T) -> T` | `SELECT count(date) FROM my-index` | -| `min` | `min(number T) -> T` | `SELECT min(column) FROM my-index` | -| `show` | `show(string) -> string` | `SHOW TABLES LIKE my-index` | +Function | Specification | Example +:--- | :--- | :--- +avg | `avg(number T) -> T` | `SELECT avg(2, 3) FROM my-index LIMIT 1` +count | `count(number T) -> T` | `SELECT count(date) FROM my-index LIMIT 1` +min | `min(number T, number) -> T` | `SELECT min(2, 3) FROM my-index LIMIT 1` +show | `show(string T) -> T` | `SHOW TABLES LIKE my-index` ## Advanced -| Function | Specification | Example | -|:---------|:-------------------------------------------|:----------------------------------------| -| `if` | `if(boolean, os_type, os_type) -> os_type` | `SELECT if(false, 0, 1),if(true, 0, 1)` | -| `ifnull` | `ifnull(os_type, os_type) -> os_type` | `SELECT ifnull(0, 1), ifnull(null, 1)` | -| `isnull` | `isnull(os_type) -> integer` | `SELECT isnull(null), isnull(1)` | +Function | Specification | Example +:--- | :--- | :--- +if | `if(boolean, es_type, es_type) -> es_type` | `SELECT if(false, 0, 1) FROM my-index LIMIT 1`, `SELECT if(true, 0, 1) FROM my-index LIMIT 1` 
+ifnull | `ifnull(es_type, es_type) -> es_type` | `SELECT ifnull('hello', 1) FROM my-index LIMIT 1`, `SELECT ifnull(null, 1) FROM my-index LIMIT 1` +isnull | `isnull(es_type) -> integer` | `SELECT isnull(null) FROM my-index LIMIT 1`, `SELECT isnull(1) FROM my-index LIMIT 1` ## Relevance-based search (full-text search) From 00672a6eecc3a1a0a400fc1087191bf4a49a7d03 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 18:25:38 -0600 Subject: [PATCH 031/286] Revert "Add documentation for API rate limiting (#4287)" This reverts commit aebbf6a0ca23c6e5974f8551ffcc2e5e12be6fa4. Signed-off-by: Melissa Vagi --- _security/configuration/configuration.md | 79 +----------------------- 1 file changed, 1 insertion(+), 78 deletions(-) diff --git a/_security/configuration/configuration.md b/_security/configuration/configuration.md index 77018bccfe..3ca2e607fd 100755 --- a/_security/configuration/configuration.md +++ b/_security/configuration/configuration.md @@ -136,84 +136,7 @@ In most cases, you set the `challenge` flag to `true`. The flag defines the beha If `challenge` is set to `true`, the Security plugin sends a response with status `UNAUTHORIZED` (401) back to the client. If the client is accessing the cluster with a browser, this triggers the authentication dialog box, and the user is prompted to enter a user name and password. -If `challenge` is set to `false` and no `Authorization` header field is set, the Security plugin does not send a `WWW-Authenticate` response back to the client, and authentication fails. Consider using this setting if you have more than one challenge `http_authenticator` keys in your configured authentication domains. This might be the case, for example, when you plan to use basic authentication and OpenID Connect together. - - -## API rate limiting - -API rate limiting is typically used to restrict the number of API calls that users can make in a set span of time, thereby helping to manage the rate of API traffic. 
For security purposes, rate limiting features have the potential to defend against DoS attacks, or repeated login attempts to gain access through trial and error, by restricting failed login attempts. - -You have the option to configure the Security plugin for username rate limiting, IP address rate limiting, or both. These configurations are made in the `config.yml` file. See the following sections for information about each type of rate limiting configuration. - - -### Username rate limiting - -This configuration limits login attempts by username. When a login fails, the username is blocked for any machine in the network. The following example shows `config.yml` file settings configured for username rate limiting: - -```yml -auth_failure_listeners: - internal_authentication_backend_limiting: - type: username - authentication_backend: internal - allowed_tries: 3 - time_window_seconds: 60 - block_expiry_seconds: 60 - max_blocked_clients: 100000 - max_tracked_clients: 100000 -``` -{% include copy.html %} - -The following table describes the individual settings for this type of configuration. - -| Setting | Description | -| :--- | :--- | -| `type` | The type of rate limiting. In this case, `username`. | -| `authentication_backend` | The internal backend. Enter `internal`. | -| `allowed_tries` | The number of login attempts allowed before login is blocked. Be aware that increasing the number increases heap usage. | -| `time_window_seconds` | The window of time in which the value for `allowed_tries` is enforced. For example, if `allowed_tries` is `3` and `time_window_seconds` is `60`, a username has three attempts to log in successfully within a 60-second time span before login is blocked. | -| `block_expiry_seconds` | The duration of time that login remains blocked after a failed login. After this time elapses, login is reset and the username can attempt successful login again. | -| `max_blocked_clients` | The maximum number of blocked usernames. 
This limits heap usage to avoid a potential DoS. | -| `max_tracked_clients` | The maximum number of tracked usernames that have failed login. This limits heap usage to avoid a potential DoS. | - - -### IP address rate limiting - -This configuration limits login attempts by IP address. When a login fails, the IP address specific to the machine being used for login is blocked. - -There are two steps for configuring IP address rate limiting. First, set the `challenge` setting to `false` in the `http_authenticator` section of the `config.yml` file. - -```yml -http_authenticator: - type: basic - challenge: false -``` - -For more information about this setting, see [HTTP basic authentication](#http-basic-authentication). - -Second, configure the IP address rate limiting settings. The following example shows a completed configuration: - -```yml -auth_failure_listeners: - ip_rate_limiting: - type: ip - allowed_tries: 1 - time_window_seconds: 20 - block_expiry_seconds: 180 - max_blocked_clients: 100000 - max_tracked_clients: 100000 -``` -{% include copy.html %} - -The following table describes the individual settings for this type of configuration. - -| Setting | Description | -| :--- | :--- | -| `type` | The type of rate limiting. In this case, `ip`. | -| `allowed_tries` | The number of login attempts allowed before login is blocked. Be aware that increasing the number increases heap usage. | -| `time_window_seconds` | The window of time in which the value for `allowed_tries` is enforced. For example, if `allowed_tries` is `3` and `time_window_seconds` is `60`, an IP address has three attempts to log in successfully within a 60-second time span before login is blocked. | -| `block_expiry_seconds` | The duration of time that login remains blocked after a failed login. After this time elapses, login is reset and the IP address can attempt successful login again. | -| `max_blocked_clients` | The maximum number of blocked IP addresses. 
This limits heap usage to avoid a potential DoS. | -| `max_tracked_clients` | The maximum number of tracked IP addresses that have failed login. This limits heap usage to avoid a potential DoS. | +If `challenge` is set to `false` and no `Authorization` header field is set, the Security plugin does not send a `WWW-Authenticate` response back to the client, and authentication fails. You might want to use this setting if you have another challenge `http_authenticator` in your configured authentication domains. One such scenario is when you plan to use basic authentication and OpenID Connect together. ## Backend configuration examples From 45ff2496e7a1fd587373f36b0c900ded6daac684 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 18:25:42 -0600 Subject: [PATCH 032/286] Revert "Add info to enable search pipelines (#4297)" This reverts commit 2fad24fccdcc209c04da7f81a806b8e553477bf0. Signed-off-by: Melissa Vagi --- _search-plugins/search-pipelines/index.md | 9 --------- 1 file changed, 9 deletions(-) diff --git a/_search-plugins/search-pipelines/index.md b/_search-plugins/search-pipelines/index.md index 1aec24b864..5a557cff8b 100644 --- a/_search-plugins/search-pipelines/index.md +++ b/_search-plugins/search-pipelines/index.md @@ -13,15 +13,6 @@ This is an experimental feature and is not recommended for use in a production e You can use _search pipelines_ to build new or reuse existing result rerankers, query rewriters, and other components that operate on queries or results. Search pipelines make it easier for you to process search queries and search results within OpenSearch. Moving some of your application functionality into an OpenSearch search pipeline reduces the overall complexity of your application. As part of a search pipeline, you specify a list of processors that perform modular tasks. You can then easily add or reorder these processors to customize search results for your application. 
-## Enabling search pipelines - -Search pipeline functionality is disabled by default. To enable it, edit the configuration in `opensearch.yml` and then restart your cluster: - -1. Navigate to the OpenSearch config directory. -1. Open the `opensearch.yml` configuration file. -1. Add `opensearch.experimental.feature.search_pipeline.enabled: true` and save the configuration file. -1. Restart your cluster. - ## Terminology The following is a list of search pipeline terminology: From 9aebc8219360d337ae71f97094c687e70c5a8e74 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 18:25:45 -0600 Subject: [PATCH 033/286] Revert "Add redirect for ML Dashboard (#4294)" This reverts commit 45e4d3d188a57084ed7eec13cba108d4eb5fd784. Signed-off-by: Melissa Vagi --- _ml-commons-plugin/ml-dashboard.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/_ml-commons-plugin/ml-dashboard.md b/_ml-commons-plugin/ml-dashboard.md index 31f919a28c..11d6e12c40 100644 --- a/_ml-commons-plugin/ml-dashboard.md +++ b/_ml-commons-plugin/ml-dashboard.md @@ -2,8 +2,6 @@ layout: default title: Managing ML models in OpenSearch Dashboards nav_order: 120 -redirect_from: - - /ml-commons-plugin/ml-dashbaord/ --- Released in OpenSearch 2.6, the machine learning (ML) functionality in OpenSearch Dashboards is experimental and can't be used in a production environment. For updates or to leave feedback, see the [OpenSearch Forum discussion](https://forum.opensearch.org/t/feedback-ml-commons-ml-model-health-dashboard-for-admins-experimental-release/12494). From 594da071cc123121150cb7b339a42b0fe73c61be Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 18:25:48 -0600 Subject: [PATCH 034/286] Revert "Updated the 404 message (#4288)" This reverts commit 0cfc0b10da8d961ccf4170b182a5a5c81b0148f4. 
Signed-off-by: Melissa Vagi --- 404.md | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-) diff --git a/404.md b/404.md index 5165c9e449..498a540ce8 100644 --- a/404.md +++ b/404.md @@ -6,15 +6,6 @@ heading_anchors: false nav_exclude: true --- -## Oops, this isn't the page you're looking for. - -Maybe our [homepage](https://opensearch.org/docs/latest) -or one of the popular pages listed below can help. - -- [Quickstart]({{site.url}}{{site.baseurl}}/quickstart/) -- [Installing OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/index/) -- [OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/index/) -- [Query DSL]({{site.url}}{{site.baseurl}}/query-dsl/) -- [API Reference]({{site.url}}{{site.baseurl}}/api-reference/index/) - +# OpenSearch cannot find that page. +Perhaps we moved something around, or you mistyped the URL? Try using search or go to the [OpenSearch Documentation home page](https://opensearch.org/docs/latest/). If you need further help, see the [OpenSearch community forum](https://forum.opensearch.org/). From e48e4d8dd46b40270f9b8d56396110196d943a85 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 18:25:51 -0600 Subject: [PATCH 035/286] Revert "Add documentation for score based password estimator settings (#4267)" This reverts commit cb975521c3aacf8a1241c38072dbbe67c5268284. 
Signed-off-by: Melissa Vagi --- _security/configuration/yaml.md | 46 ++++----------------------------- 1 file changed, 5 insertions(+), 41 deletions(-) diff --git a/_security/configuration/yaml.md b/_security/configuration/yaml.md index 1d10e50268..e7a34a07de 100644 --- a/_security/configuration/yaml.md +++ b/_security/configuration/yaml.md @@ -120,22 +120,6 @@ plugins.security.system_indices.indices: [".opendistro-alerting-config", ".opend node.max_local_storage_nodes: 3 ``` -### Refining your configuration - -The `plugins.security.allow_default_init_securityindex` setting, when set to `true`, sets the Security plugin to its default security settings if an attempt to create the security index fails when OpenSearch launches. Default security settings are stored in YAML files contained in the `opensearch-project/security/config` directory. By default, this setting is `false`. - -```yml -plugins.security.allow_default_init_securityindex: true -``` - -An authentication cache for the Security plugin exists to help speed up authentication by temporarily storing user objects returned from the backend so that the Security plugin is not required to make repeated requests for them. To determine how long it takes for caching to time out, you can use the `plugins.security.cache.ttl_minutes` property to set a value in minutes. The default is `60`. You can disable caching by setting the value to `0`. - -```yml -plugins.security.cache.ttl_minutes: 60 -``` - -### Password settings - If you want to run your users' passwords against some validation, specify a regular expression (regex) in this file. You can also include an error message that loads when passwords don't pass validation. The following example demonstrates how to include a regex so OpenSearch requires new passwords to be a minimum of eight characters with at least one uppercase, one lowercase, one digit, and one special character. 
Note that OpenSearch validates only users and passwords created through OpenSearch Dashboards or the REST API. @@ -145,36 +129,16 @@ plugins.security.restapi.password_validation_regex: '(?=.*[A-Z])(?=.*[^a-zA-Z\d] plugins.security.restapi.password_validation_error_message: "Password must be minimum 8 characters long and must contain at least one uppercase letter, one lowercase letter, one digit, and one special character." ``` -In addition, a score-based password strength estimator allows you to set a threshold for password strength when creating a new internal user or updating a user's password. This feature makes use of the [zxcvbn library](https://github.com/dropbox/zxcvbn) to apply a policy that emphasizes a password's complexity rather than its capacity to meet traditional criteria such as uppercase keys, numerals, and special characters. - -For information about creating users, see [Create users]({{site.url}}{{site.baseurl}}/security/access-control/users-roles/#create-users). - -This feature is not compatible with users specified as reserved. For information about reserved resources, see [Reserved and hidden resources]({{site.url}}{{site.baseurl}}/security/access-control/api#reserved-and-hidden-resources). -{: .important } - -Score-based password strength requires two settings to configure the feature. The following table describes the two settings. - -| Setting | Description | -| :--- | :--- | -| `plugins.security.restapi.password_min_length` | Sets the minimum number of characters for the password length. The default is `8`. This is also the minimum. | -| `plugins.security.restapi.password_score_based_validation_strength` | Sets a threshold to determine whether the password is strong or weak. There are four values that represent a threshold's increasing complexity.
`fair`--A very "guessable" password: provides protection from throttled online attacks.
`good`--A somewhat guessable password: provides protection from unthrottled online attacks.
`strong`--A safely "unguessable" password: provides moderate protection from an offline, slow-hash scenario.
`very_strong`--A very unguessable password: provides strong protection from an offline, slow-hash scenario. | - -The following example shows the settings configured for the `opensearch.yml` file and enabling a password with a minimum of 10 characters and a threshold requiring the highest strength: +The opensearch.yml file also contains the `plugins.security.allow_default_init_securityindex` property. When set to `true`, the Security plugin uses default security settings if an attempt to create the security index fails when OpenSearch launches. Default security settings are stored in YAML files contained in the `opensearch-project/security/config` directory. By default, this setting is `false`. ```yml -plugins.security.restapi.password_min_length: 10 -plugins.security.restapi.password_score_based_validation_strength: very_strong +plugins.security.allow_default_init_securityindex: true ``` -When you try to create a user with a password that doesn't reach the specified threshold, the system generates a "weak password" warning, indicating that the password needs to be modified before you can save the user. - -The following example shows the response from the [Create user]({{site.url}}{{site.baseurl}}/security/access-control/api/#create-user) API when the password is weak: +An authentication cache for the Security plugin exists to help speed up authentication by temporarily storing user objects returned from the backend so that the Security plugin is not required to make repeated requests for them. To determine how long it takes for caching to time out, you can use the `plugins.security.cache.ttl_minutes` property to set a value in minutes. The default is `60`. You can disable caching by setting the value to `0`.
-```json -{ - "status": "error", - "reason": "Weak password" -} +```yml +plugins.security.cache.ttl_minutes: 60 ``` ## allowlist.yml From ae385797873837f40d3f4ce2623ad3996c949126 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 14 Jun 2023 18:35:46 -0600 Subject: [PATCH 036/286] Writing Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-processors.md | 2 +- _api-reference/ingest-apis/processors-list/append.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index 95b0a6da58..255567e6cf 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -9,7 +9,7 @@ nav_order: 50 Ingest processors have a crucial role in preparing and enriching data before it is stored and analyzed and improving data quality and usability. They are a set of functionalities or operations applied to incoming data during the ingestion process and allow for real-time data transformation, manipulation, and enrichment. -Ingest processors are a core component of data processing [pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipelines/). They preprocess and shape data as it enters a system, making it more suitable for downstream operations such as indexing, analysis, or storage. They have a range of capabilities--data extraction, validation, filtering, enrichment, and normalization--that can be performed on different aspects of the data, such as extracting specfic fields, converting data types, removing or modifying unwanted data, or enriching data with additional information. +Ingest processors are a core component of data processing [pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipelines/). They preprocess and shape data as it enters a system, making it more suitable for downstream operations such as indexing, analysis, or storage. 
They have a range of capabilities--data extraction, validation, filtering, enrichment, and normalization--that can be performed on different aspects of the data, such as extracting specific fields, converting data types, removing or modifying unwanted data, or enriching data with additional information. OpenSearch provides a standard set of ingest processors within your OpenSearch installation. For a list of processors available in OpenSearch, use the [nodes info]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-info/) API: diff --git a/_api-reference/ingest-apis/processors-list/append.md b/_api-reference/ingest-apis/processors-list/append.md index 1e25afa68b..6711104ee2 100644 --- a/_api-reference/ingest-apis/processors-list/append.md +++ b/_api-reference/ingest-apis/processors-list/append.md @@ -8,7 +8,7 @@ nav_order: 10 # Append -The append ingest processor enriches incoming data during the ingestion process by appending additonal fields or values to each document. The append processor operates on a per-dcoument basis, meaning it processes each incoming doucment individually. Learn how to use the append processor in your data processing workflows in the following documentation. +The append ingest processor enriches incoming data during the ingestion process by appending additional fields or values to each document. The append processor operates on a per-document basis, meaning it processes each incoming document individually. Learn how to use the append processor in your data processing workflows in the following documentation.
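As a quick illustration of the nodes info call referenced above for listing available processors, the following sketch parses a response of that shape and collects the processor types each node reports. The response excerpt here is hypothetical; only the `nodes.*.ingest.processors` path mirrors the actual `GET _nodes/ingest` output:

```python
import json

# Hypothetical excerpt of a `GET _nodes/ingest` response; a real response
# lists every node in the cluster and the ingest processors installed on it.
sample_response = json.loads("""
{
  "nodes": {
    "node-1": {
      "ingest": {
        "processors": [
          {"type": "append"},
          {"type": "convert"},
          {"type": "remove"}
        ]
      }
    }
  }
}
""")

def list_processor_types(nodes_info):
    """Collect the unique processor types reported across all nodes."""
    types = set()
    for node in nodes_info.get("nodes", {}).values():
        for processor in node.get("ingest", {}).get("processors", []):
            types.add(processor["type"])
    return sorted(types)

print(list_processor_types(sample_response))  # ['append', 'convert', 'remove']
```

Deduplicating across nodes matters because every node reports its own processor list, and in a healthy cluster those lists are identical.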
## Getting started @@ -16,7 +16,7 @@ To use the append processor, make sure you have the necessary permissions and ac ## Configuration -The append processor requires the following configuration parameters to specify the field or value to append to incomming documents: +The append processor requires the following configuration parameters to specify the field or value to append to incoming documents: - **Field name**: Specify the name of the field where the additional data should be appended. - **Value**: Specify the value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. From 49722a857a2eef89ff5377c242483a14655d7443 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 22 Jun 2023 09:28:31 -0600 Subject: [PATCH 037/286] Writing processors backlog documentation Signed-off-by: Melissa Vagi --- .../ingest-apis/processors-list/append.md | 72 ++++++++++--------- 1 file changed, 39 insertions(+), 33 deletions(-) diff --git a/_api-reference/ingest-apis/processors-list/append.md b/_api-reference/ingest-apis/processors-list/append.md index 6711104ee2..1b8a2c5401 100644 --- a/_api-reference/ingest-apis/processors-list/append.md +++ b/_api-reference/ingest-apis/processors-list/append.md @@ -14,49 +14,55 @@ The append ingest processor enriches incoming data during the ingestion process To use the append processor, make sure you have the necessary permissions and access rights to configure and deploy ingest processors. -## Configuration +## Configuration parameters -The append processor requires the following configuration parameters to specify the field or value to append to incoming documents: +The append processor requires the following configuration parameters to specify the target field or value to append to incoming documents. -- **Field name**: Specify the name of the field where the additional data should be appended. -- **Value**: Specify the value to be appended. 
This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. +**Parameter** | **Required** | **Description** | +|-----------|-----------|-----------| +`field` | Required | Name of the field where the data should be appended. | +`value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. | +`fields` | Optional | A list of fields from which to copy values. | +`ignore_empty_fields` | Optional | If set to true, empty values will be ignored when appending them to the target field. | +`fail_on_error` | Optional | If set to true, the processor will fail if an error occurs. The default value is false. +`allow_duplicates` | Optional | If set to false, the processor will not append values that already exist in the target field. The default value is true. +`ignore_missing` | Optional | If set to true, the processor will ignore documents that lack the target field. The default value is false. -Optional configuration parameters include: - -__ - -## Usage examples - -Some usage examples include the following. - -#### Example: Appending timestamps - -To add a timestamp field to each incoming document, use the following configuration: +#### Example: Append configuration ```json -processors: - - append: - field: timestamp - value: "{{_ingest.timestamp}}" +{ + "description": "Appends the current timestamp to the document", + "processors": [ + { + "append": { + "field": "timestamp", + "value": "{{_timestamp}}" + } + } + ] +} ``` -In the example, the `timestamp` field is appended with the current timestamp when the document is ingested. The `{{_ingest.timestamp}}` expression retrieves the current timestamp provided by the ingestion framework.
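The parameter table above can be made concrete with a small sketch of how `allow_duplicates` and `ignore_empty_fields` interact when values are appended to a field. This is an illustrative approximation in Python, not the plugin's actual implementation:

```python
def append_to_field(doc, field, values, allow_duplicates=True, ignore_empty_fields=False):
    """Approximate the append processor's behavior on one document (a dict)."""
    existing = doc.get(field)
    if existing is None:
        target = []
    elif isinstance(existing, list):
        target = existing
    else:
        target = [existing]  # promote a scalar field to an array
    for value in values:
        if ignore_empty_fields and value in ("", None):
            continue  # drop empty values when ignore_empty_fields is on
        if not allow_duplicates and value in target:
            continue  # skip values already present when duplicates are disallowed
        target.append(value)
    doc[field] = target
    return doc

doc = {"tags": ["prod"]}
append_to_field(doc, "tags", ["prod", "web", ""],
                allow_duplicates=False, ignore_empty_fields=True)
print(doc)  # {'tags': ['prod', 'web']}
```

In the example, `"prod"` is skipped as a duplicate and the empty string is skipped as an empty value, so only `"web"` is appended.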
- -#### Example: Enriching with geolocation data - -To enrich documents with geolocation information based on IP addresses, use the following configuration: +#### Example: Adding the append configuration to an ingest pipeline using the REST API ```json -processors: - - append: - field: location - value: "{{geoip.ip}}" +PUT _ingest/pipeline/ +{ + "description": "A pipeline that appends the current timestamp to the document", + "processors": [ + { + "append": { + "field": "timestamp", + "value": "{{_timestamp}}" + } + } + ] +} ``` -In the example, the `location` field is appended with the IP address extracted from the `geoip` field. The `{{geoip.ip}}` expression retrieves the IP address information from the existing `geoip` field. - ## Best practices -- Data validation: Make sure the values being appended are valid and compatible with the target field's data type and format. -- Efficiency: Consider the performance implications of appending large amounts of data to each document and optimize the processor configuration accordingly. -- Error handling: Implement proper error handling mechanisms to handle scenarios where appending fails, such as when external lookups or API requests encounter errors. \ No newline at end of file +- **Data validation:** Make sure the values being appended are valid and compatible with the target field's data type and format. +- **Efficiency:** Consider the performance implications of appending large amounts of data to each document and optimize the processor configuration accordingly. +- **Error handling:** Implement proper error handling mechanisms to handle scenarios where appending fails, such as when external lookups or API requests encounter errors. 
\ No newline at end of file From e56cab129c61e84511e3ed0fe9d6bb638124ccf6 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 22 Jun 2023 09:57:49 -0600 Subject: [PATCH 038/286] Update append.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-list/append.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-list/append.md b/_api-reference/ingest-apis/processors-list/append.md index 1b8a2c5401..7fabb2ff8b 100644 --- a/_api-reference/ingest-apis/processors-list/append.md +++ b/_api-reference/ingest-apis/processors-list/append.md @@ -28,6 +28,8 @@ The append processor requires the following configuration parameters to specify `allow_duplicates` | Optional | If set to false, the processor will not append values that already exist in the target field. The default value is set to true. `ignore_missing` | Optional | If set to true, the processor will ignore events that lack the target field. The default value is false. +Following are examples of an append processor configuration and how to add it to an ingest pipeline. + #### Example: Append configuration ```json @@ -65,4 +67,4 @@ PUT _ingest/pipeline/ - **Data validation:** Make sure the values being appended are valid and compatible with the target field's data type and format. - **Efficiency:** Consider the performance implications of appending large amounts of data to each document and optimize the processor configuration accordingly. -- **Error handling:** Implement proper error handling mechanisms to handle scenarios where appending fails, such as when external lookups or API requests encounter errors. \ No newline at end of file +- **Error handling:** Implement proper error handling mechanisms to handle scenarios where appending fails, such as when external lookups or API requests encounter errors. 
From 0437b2be56a413df0d85bebedf59b0c9cbe91571 Mon Sep 17 00:00:00 2001 From: adaisley <90253063+adaisley@users.noreply.github.com> Date: Wed, 14 Jun 2023 20:54:35 +0100 Subject: [PATCH 039/286] Update searchable snapshot documentation to be more correct (#4203) * Update searchable_snapshot.md This is more in-line with what is defined in the example Opensearch docker-compose I tried the current approach and it straight-up did not work in it's current form. Signed-off-by: adaisley <90253063+adaisley@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: adaisley <90253063+adaisley@users.noreply.github.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Melissa Vagi --- .../snapshots/searchable_snapshot.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md index a28b4d9c58..4b7284daca 100644 --- a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md +++ b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md @@ -23,7 +23,7 @@ To configure the searchable snapshots feature, create a node in your opensearch. 
node.roles: [ search ] ``` -If you're running Docker, you can create a node with the `search` node role by adding the line `- node.roles: [ search ]` to your docker-compose.yml file: +If you're running Docker, you can create a node with the `search` node role by adding the line `- node.roles=search` to your `docker-compose.yml` file: ```bash version: '3' @@ -34,7 +34,7 @@ services: environment: - cluster.name=opensearch-cluster - node.name=opensearch-node1 - - node.roles: [ search ] + - node.roles=search ``` ## Create a searchable snapshot index From 7e410424e704fdede809c7e31ca868b85fd22caa Mon Sep 17 00:00:00 2001 From: William Beckler Date: Fri, 16 Jun 2023 16:46:37 -0400 Subject: [PATCH 040/286] Update plugins.md (#4353) Fixed link to the dashboards plugin instead of the core plugin. Signed-off-by: William Beckler Signed-off-by: Melissa Vagi --- _install-and-configure/install-dashboards/plugins.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_install-and-configure/install-dashboards/plugins.md b/_install-and-configure/install-dashboards/plugins.md index f937940cb2..73a2f54783 100644 --- a/_install-and-configure/install-dashboards/plugins.md +++ b/_install-and-configure/install-dashboards/plugins.md @@ -39,7 +39,7 @@ The following table lists available OpenSearch Dashboards plugins. 
| Gantt Chart Dashboards | [gantt-chart](https://github.com/opensearch-project/dashboards-visualizations/tree/main/gantt-chart) | 1.0.0 | | Index Management Dashboards | [index-management-dashboards-plugin](https://github.com/opensearch-project/index-management-dashboards-plugin) | 1.0.0 | | Notebooks Dashboards | [dashboards-notebooks](https://github.com/opensearch-project/dashboards-notebooks) | 1.0.0 | -| Notifications Dashboards | [notifications](https://github.com/opensearch-project/notifications) | 2.0.0 | +| Notifications Dashboards | [dashboards-notifications](https://github.com/opensearch-project/dashboards-notifications) | 2.0.0 | | Observability Dashboards | [dashboards-observability](https://github.com/opensearch-project/dashboards-observability) | 2.0.0 | | Query Workbench Dashboards | [query-workbench](https://github.com/opensearch-project/dashboards-query-workbench) | 1.0.0 | | Reports Dashboards | [dashboards-reporting](https://github.com/opensearch-project/dashboards-reporting) | 1.0.0 | From 698fff61b8374f85c92d0021bc39f3399e7d308f Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Mon, 19 Jun 2023 14:24:06 -0400 Subject: [PATCH 041/286] Add date nanoseconds field type (#4348) * Add date nanoseconds field type Signed-off-by: Fanit Kolchina * Fix links Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Melissa Vagi Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Melissa Vagi Signed-off-by: Melissa Vagi --- .../supported-field-types/date-nanos.md | 290 ++++++++++++++++++ _field-types/supported-field-types/date.md | 3 +- _field-types/supported-field-types/dates.md | 17 + .../supported-field-types/geographic.md | 2 +- _field-types/supported-field-types/index.md | 2 +- .../supported-field-types/object-fields.md | 2 +- 
_field-types/supported-field-types/rank.md | 2 +- _field-types/supported-field-types/string.md | 2 +- 8 files changed, 314 insertions(+), 6 deletions(-) create mode 100644 _field-types/supported-field-types/date-nanos.md create mode 100644 _field-types/supported-field-types/dates.md diff --git a/_field-types/supported-field-types/date-nanos.md b/_field-types/supported-field-types/date-nanos.md new file mode 100644 index 0000000000..12399a69d4 --- /dev/null +++ b/_field-types/supported-field-types/date-nanos.md @@ -0,0 +1,290 @@ +--- +layout: default +title: Date nanoseconds +nav_order: 35 +has_children: false +parent: Date field types +grand_parent: Supported field types +--- + +# Date nanoseconds field type + +The `date_nanos` field type is similar to the [`date`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/) field type in that it holds a date. However, `date` stores the date in millisecond resolution, while `date_nanos` stores the date in nanosecond resolution. Dates are stored as `long` values that correspond to nanoseconds since the epoch. Therefore, the range of supported dates is approximately 1970--2262. + +Queries on `date_nanos` fields are converted to range queries on the field value's `long` representation. Then the stored fields and aggregation results are converted to a string using the format set on the field. + +The `date_nanos` field supports all [formats]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date#formats) and [parameters]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date#parameters) that `date` supports. You can use multiple formats separated by `||`. +{: .note} + +For `date_nanos` fields, you can use the `strict_date_optional_time_nanos` format to preserve nanosecond resolution. 
If you don't specify the format when mapping a field as `date_nanos`, the default format is `strict_date_optional_time||epoch_millis` that lets you pass values in either `strict_date_optional_time` or `epoch_millis` format. The `strict_date_optional_time` format supports dates in nanosecond resolution, but the `epoch_millis` format supports dates in millisecond resolution only. + +## Example + +Create a mapping with the `date` field of type `date_nanos` that has the `strict_date_optional_time_nanos` format: + +```json +PUT testindex/_mapping +{ + "properties": { + "date": { + "type": "date_nanos", + "format" : "strict_date_optional_time_nanos" + } + } +} +``` +{% include copy-curl.html %} + +Index two documents into the index: + +```json +PUT testindex/_doc/1 +{ "date": "2022-06-15T10:12:52.382719622Z" } +``` +{% include copy-curl.html %} + +```json +PUT testindex/_doc/2 +{ "date": "2022-06-15T10:12:52.382719624Z" } +``` +{% include copy-curl.html %} + +You can use a range query to search for a date range: + +```json +GET testindex/_search +{ + "query": { + "range": { + "date": { + "gte": "2022-06-15T10:12:52.382719621Z", + "lte": "2022-06-15T10:12:52.382719623Z" + } + } + } +} +``` +{% include copy-curl.html %} + +The response contains the document whose date is in the specified range: + +```json +{ + "took": 43, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": "testindex", + "_id": "1", + "_score": 1, + "_source": { + "date": "2022-06-15T10:12:52.382719622Z" + } + } + ] + } +} +``` + +When querying documents with `date_nanos` fields, you can use `fields` or `docvalue_fields`: + +```json +GET testindex/_search +{ + "fields": ["date"] +} +``` +{% include copy-curl.html %} + +```json +GET testindex/_search +{ + "docvalue_fields" : [ + { + "field" : "date" + } + ] +} +``` +{% include copy-curl.html %} + 
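The approximately 1970--2262 range noted earlier follows directly from the storage representation: nanoseconds since the epoch held in a signed 64-bit `long`, whose largest value lands in the year 2262. A quick check:

```python
from datetime import datetime, timedelta, timezone

# `date_nanos` values are stored as signed 64-bit longs counting nanoseconds
# since the epoch, so the latest representable instant is 2**63 - 1 ns
# after 1970-01-01T00:00:00Z.
max_nanos = 2**63 - 1
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
latest = epoch + timedelta(microseconds=max_nanos // 1000)
print(latest.year)  # 2262
```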
+The response to either of the preceding queries contains both indexed documents: + +```json +{ + "took": 4, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 2, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": "testindex", + "_id": "1", + "_score": 1, + "_source": { + "date": "2022-06-15T10:12:52.382719622Z" + }, + "fields": { + "date": [ + "2022-06-15T10:12:52.382719622Z" + ] + } + }, + { + "_index": "testindex", + "_id": "2", + "_score": 1, + "_source": { + "date": "2022-06-15T10:12:52.382719624Z" + }, + "fields": { + "date": [ + "2022-06-15T10:12:52.382719624Z" + ] + } + } + ] + } +} +``` + +You can sort on a `date_nanos` field as follows: + +```json +GET testindex/_search +{ + "sort": { + "date": "asc" + } +} +``` +{% include copy-curl.html %} + +The response contains the sorted documents: + +```json +{ + "took": 5, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 2, + "relation": "eq" + }, + "max_score": null, + "hits": [ + { + "_index": "testindex", + "_id": "1", + "_score": null, + "_source": { + "date": "2022-06-15T10:12:52.382719622Z" + }, + "sort": [ + 1655287972382719700 + ] + }, + { + "_index": "testindex", + "_id": "2", + "_score": null, + "_source": { + "date": "2022-06-15T10:12:52.382719624Z" + }, + "sort": [ + 1655287972382719700 + ] + } + ] + } +} +``` + +You can also use a Painless script to access the nanoseconds part of the field: + +```json +GET testindex/_search +{ + "script_fields" : { + "my_field" : { + "script" : { + "lang" : "painless", + "source" : "doc['date'].value.nano" + } + } + } +} +``` +{% include copy-curl.html %} + +The response contains only the nanosecond parts of the fields: + +```json +{ + "took": 4, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + 
"total": { + "value": 2, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": "testindex", + "_id": "1", + "_score": 1, + "fields": { + "my_field": [ + 382719622 + ] + } + }, + { + "_index": "testindex", + "_id": "2", + "_score": 1, + "fields": { + "my_field": [ + 382719624 + ] + } + } + ] + } +} +``` \ No newline at end of file diff --git a/_field-types/supported-field-types/date.md b/_field-types/supported-field-types/date.md index ea09311718..da551a1dd1 100644 --- a/_field-types/supported-field-types/date.md +++ b/_field-types/supported-field-types/date.md @@ -3,7 +3,8 @@ layout: default title: Date nav_order: 25 has_children: false -parent: Supported field types +parent: Date field types +grand_parent: Supported field types redirect_from: - /opensearch/supported-field-types/date/ - /field-types/date/ diff --git a/_field-types/supported-field-types/dates.md b/_field-types/supported-field-types/dates.md new file mode 100644 index 0000000000..7c6e47cb60 --- /dev/null +++ b/_field-types/supported-field-types/dates.md @@ -0,0 +1,17 @@ +--- +layout: default +title: Date field types +nav_order: 25 +has_children: true +has_toc: false +parent: Supported field types +--- + +# Date field types + +Date field types contain a date value that can be formatted using different date formats. The following table lists all date field types that OpenSearch supports. + +Field data type | Description +:--- | :--- +[`date`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/) | A date stored in millisecond resolution. +[`date_nanos`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date-nanos/) | A date stored in nanosecond resolution. 
diff --git a/_field-types/supported-field-types/geographic.md b/_field-types/supported-field-types/geographic.md index 07d0382082..cbe3982a4d 100644 --- a/_field-types/supported-field-types/geographic.md +++ b/_field-types/supported-field-types/geographic.md @@ -12,7 +12,7 @@ redirect_from: # Geographic field types -The following table lists all geographic field types that OpenSearch supports. +Geographic fields contain values that represent points or shapes on a map. The following table lists all geographic field types that OpenSearch supports. Field data type | Description :--- | :--- diff --git a/_field-types/supported-field-types/index.md b/_field-types/supported-field-types/index.md index 38b45860ba..3cb8bff8cd 100644 --- a/_field-types/supported-field-types/index.md +++ b/_field-types/supported-field-types/index.md @@ -19,7 +19,7 @@ Field data type | Description [`binary`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary/) | A binary value in Base64 encoding. [Numeric]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/numeric/) | `byte`, `double`, `float`, `half_float`, `integer`, `long`, `unsigned_long`, `scaled_float`, `short`. [`boolean`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/boolean/) | A Boolean value. -[`date`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/) | A date value as a formatted string, a long value, or an integer. +[Date]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/dates/) | `date`, `date_nanos`. [`ip`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/ip/) | An IP address in IPv4 or IPv6 format. [Range]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/range/) | `integer_range`, `long_range`,`double_range`, `float_range`, `date_range`,`ip_range`. [Object]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object/) | `object`, `nested`, `join`. 
diff --git a/_field-types/supported-field-types/object-fields.md b/_field-types/supported-field-types/object-fields.md index 64869fc34d..429c5b94c7 100644 --- a/_field-types/supported-field-types/object-fields.md +++ b/_field-types/supported-field-types/object-fields.md @@ -12,7 +12,7 @@ redirect_from: # Object field types -The following table lists all object field types that OpenSearch supports. +Object field types contain values that are objects or relations. The following table lists all object field types that OpenSearch supports. Field data type | Description :--- | :--- diff --git a/_field-types/supported-field-types/rank.md b/_field-types/supported-field-types/rank.md index c46467f8a5..a4ec0fac4c 100644 --- a/_field-types/supported-field-types/rank.md +++ b/_field-types/supported-field-types/rank.md @@ -23,7 +23,7 @@ Rank feature and rank features fields can be queried with [rank feature queries] ## Rank feature -A rank feature field type uses a positive [float]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/numeric/) value to boost or decrease the relevance score of a document in a `rank_feature` query. By default, this value boosts the relevance score. To decrease the relevance score, set the optional `positive_score_impact` parameter to false. +A rank feature field type uses a positive [float]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/numeric/) value to boost or decrease the relevance score of a document in a `rank_feature` query. By default, this value boosts the relevance score. To decrease the relevance score, set the optional `positive_score_impact` parameter to false. 
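The boosting behavior described above is commonly modeled with a saturation function, in which the score contribution grows with the feature value but levels off past a pivot, so extreme values cannot dominate. The sketch below illustrates that shape; the pivot value is an arbitrary choice for demonstration:

```python
def saturation(value, pivot):
    """Saturation shape: rises with the feature value but levels off,
    so very large values cannot dominate the relevance score."""
    return value / (value + pivot)

print(round(saturation(8.0, 8.0), 2))   # 0.5
print(round(saturation(80.0, 8.0), 2))  # 0.91
```

A feature value equal to the pivot scores 0.5, while a value ten times the pivot scores well under ten times as much.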
### Example diff --git a/_field-types/supported-field-types/string.md b/_field-types/supported-field-types/string.md index 21cee52dad..f24dea2325 100644 --- a/_field-types/supported-field-types/string.md +++ b/_field-types/supported-field-types/string.md @@ -12,7 +12,7 @@ redirect_from: # String field types -The following table lists all string field types that OpenSearch supports. +String field types contain text values or values derived from text. The following table lists all string field types that OpenSearch supports. Field data type | Description :--- | :--- From 3bb298b2257fb424051eafbbba7394cf727eaeaf Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Mon, 19 Jun 2023 15:25:13 -0400 Subject: [PATCH 042/286] Add model access control documentation for ML Commons (#4223) * Add model access control documentation for ML Commons Signed-off-by: Fanit Kolchina * Remove permissions for delete API Signed-off-by: Fanit Kolchina * Add copy buttons Signed-off-by: Fanit Kolchina * Updated model-level APIs Signed-off-by: Fanit Kolchina * Add delete model Signed-off-by: Fanit Kolchina * Reworded role-related text Signed-off-by: Fanit Kolchina * Implemented tech review comments Signed-off-by: Fanit Kolchina * Rewording Signed-off-by: Fanit Kolchina * Remove experimental warning Signed-off-by: Fanit Kolchina * Register a model group in note format Signed-off-by: Fanit Kolchina * Implement tech review comments Signed-off-by: Fanit Kolchina * Resolved Vale comments Signed-off-by: Fanit Kolchina * Remove space Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com> Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Implemented doc review feedback Signed-off-by: Fanit Kolchina * Implemented editorial comments Signed-off-by: Fanit Kolchina * Apply suggestions from code review Co-authored-by: Melissa Vagi Signed-off-by: 
kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Add more editorial comments Signed-off-by: Fanit Kolchina * Add more editorial comments Signed-off-by: Fanit Kolchina * Fix links Signed-off-by: Fanit Kolchina * Fix more links Signed-off-by: Fanit Kolchina --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Chris Moore <107723039+cwillum@users.noreply.github.com> Co-authored-by: Melissa Vagi Signed-off-by: Melissa Vagi --- _ml-commons-plugin/algorithms.md | 10 +- _ml-commons-plugin/api.md | 231 +++++--- _ml-commons-plugin/index.md | 11 +- _ml-commons-plugin/model-access-control.md | 593 +++++++++++++++++++++ 4 files changed, 771 insertions(+), 74 deletions(-) create mode 100644 _ml-commons-plugin/model-access-control.md diff --git a/_ml-commons-plugin/algorithms.md b/_ml-commons-plugin/algorithms.md index 7fccd92d8b..1db8b432a9 100644 --- a/_ml-commons-plugin/algorithms.md +++ b/_ml-commons-plugin/algorithms.md @@ -27,7 +27,7 @@ distance_type | enum, such as `EUCLIDEAN`, `COSINE`, or `L1` | The type of measu ### APIs -* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-model) +* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#training-the-model) * [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#predict) * [Train and predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-and-predict) @@ -77,7 +77,7 @@ optimizerType | OptimizerType | The optimizer used in the model. | SIMPLE_SGD ### APIs -* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-model) +* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#training-the-model) * [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#predict) ### Example @@ -189,7 +189,7 @@ time_zone | string | The time zone for the `time_field` field. 
| "UTC" ### APIs -* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-model) +* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#training-the-model) * [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#predict) * [Train and predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-and-predict) @@ -211,7 +211,7 @@ RCF Summarize is a clustering algorithm based on the Clustering Using Representa ### APIs -* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-model) +* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#training-the-model) * [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#predict) * [Train and predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-and-predict) @@ -429,7 +429,7 @@ A classification algorithm, logistic regression models the probability of a disc ### APIs -* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-model) +* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#training-the-model) * [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#predict) ### Example: Train/Predict with Iris data diff --git a/_ml-commons-plugin/api.md b/_ml-commons-plugin/api.md index 3d5fe2358e..8e4535eb54 100644 --- a/_ml-commons-plugin/api.md +++ b/_ml-commons-plugin/api.md @@ -9,31 +9,49 @@ nav_order: 99 --- -#### Table of contents +
+<details open markdown="block">
+  <summary>
+    Table of contents
+  </summary>
+  {: .text-delta }
- TOC
{:toc}
-
+</details>
--- -The Machine Learning (ML) commons API lets you train ML algorithms synchronously and asynchronously, make predictions with that trained model, and train and predict with the same data set. +The ML Commons API lets you train machine learning (ML) algorithms synchronously and asynchronously, make predictions with that trained model, and train and predict with the same dataset. -In order to train tasks through the API, three inputs are required. +To train tasks through the API, three inputs are required: - Algorithm name: Must be one of a [FunctionName](https://github.com/opensearch-project/ml-commons/blob/1.3/common/src/main/java/org/opensearch/ml/common/parameter/FunctionName.java). This determines what algorithm the ML Engine runs. To add a new function, see [How To Add a New Function](https://github.com/opensearch-project/ml-commons/blob/main/docs/how-to-add-new-function.md). -- Model hyper parameters: Adjust these parameters to make the model train better. -- Input data: The data input that trains the ML model, or applies the ML models to predictions. You can input data in two ways, query against your index or use data frame. +- Model hyperparameters: Adjust these parameters to improve model accuracy. +- Input data: The data that trains the ML model, or applies the ML models to predictions. You can input data in two ways, query against your index or use a data frame. + +## Model access control considerations + +For clusters with model access control enabled, users can perform API operations on models in model groups with specified access levels as follows: + +- `public` model group: Any user. +- `restricted` model group: Only the model owner or users who share at least one backend role with the model group. +- `private` model group: Only the model owner. + +For clusters with model access control disabled, any user can perform API operations on models in any model group. +Admin users can perform API operations for models in any model group. 
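Taken together, these rules form a small decision procedure. The following Python sketch (hypothetical names and object shapes; not ML Commons source code) illustrates the access checks described above:

```python
def can_access(user, model_group, access_control_enabled=True):
    """Sketch of the model group access rules; illustrative only."""
    # With model access control disabled, any user can operate on any model group.
    if not access_control_enabled:
        return True
    # Admin users can perform API operations on models in any model group.
    if user.get("is_admin"):
        return True
    mode = model_group["access_mode"]
    if mode == "public":
        # Any user who has access to the cluster.
        return True
    if mode == "private":
        # Only the model owner.
        return user["name"] == model_group["owner"]
    if mode == "restricted":
        # The owner, or any user sharing at least one backend role with the group.
        shared = set(user.get("backend_roles", [])) & set(model_group.get("backend_roles", []))
        return user["name"] == model_group["owner"] or bool(shared)
    return False
```
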
-## Train model +For more information, see [Model access control]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control/). -The train operation trains a model based on a selected algorithm. Training can occur both synchronously and asynchronously. + +## Training the model + +The train API operation trains a model based on a selected algorithm. Training can occur both synchronously and asynchronously. ### Request -The following examples use the kmeans algorithm to train index data. +The following examples use the k-means algorithm to train index data. -**Train with kmeans synchronously** +**Train with k-means synchronously** ```json POST /_plugins/_ml/_train/kmeans @@ -52,8 +70,9 @@ POST /_plugins/_ml/_train/kmeans ] } ``` +{% include copy-curl.html %} -**Train with kmeans asynchronously** +**Train with k-means asynchronously** ```json POST /_plugins/_ml/_train/kmeans?async=true @@ -72,12 +91,13 @@ POST /_plugins/_ml/_train/kmeans?async=true ] } ``` +{% include copy-curl.html %} ### Response -**Synchronously** +**Synchronous** -For synchronous responses, the API returns the model_id, which can be used to get or delete a model. +For synchronous responses, the API returns the `model_id`, which can be used to get or delete a model. ```json { @@ -86,9 +106,9 @@ For synchronous responses, the API returns the model_id, which can be used to ge } ``` -**Asynchronously** +**Asynchronous** -For asynchronous responses, the API returns the task_id, which can be used to get or delete a task. +For asynchronous responses, the API returns the `task_id`, which can be used to get or delete a task. ```json { @@ -99,30 +119,56 @@ For asynchronous responses, the API returns the task_id, which can be used to ge ## Getting model information -You can retrieve information on your model using the model_id. +You can retrieve model information using the `model_id`. 
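Which identifier a train response carries determines the follow-up call: a `model_id` can be passed to the model APIs directly, while a `task_id` must be resolved through the Tasks API first. A minimal Python sketch of that branching (`status_endpoint` is a hypothetical helper, not part of any client library):

```python
def status_endpoint(response: dict) -> str:
    """Choose the follow-up GET endpoint for a train API response.

    Synchronous training returns a `model_id`; asynchronous training
    (`?async=true`) returns a `task_id` instead. Illustrative only.
    """
    if "model_id" in response:
        return f"/_plugins/_ml/models/{response['model_id']}"
    if "task_id" in response:
        return f"/_plugins/_ml/tasks/{response['task_id']}"
    raise ValueError("expected a model_id or task_id in the train response")
```
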
+ +For information about user access for this API, see [Model access control considerations](#model-access-control-considerations). + +### Path and HTTP methods ```json GET /_plugins/_ml/models/ ``` +{% include copy-curl.html %} -The API returns information on the model, the algorithm used, and the content found within the model. +The response contains the following model information: ```json { - "name" : "KMEANS", - "algorithm" : "KMEANS", - "version" : 1, - "content" : "" +"name" : "all-MiniLM-L6-v2_onnx", +"algorithm" : "TEXT_EMBEDDING", +"version" : "1", +"model_format" : "TORCH_SCRIPT", +"model_state" : "LOADED", +"model_content_size_in_bytes" : 83408741, +"model_content_hash_value" : "9376c2ebd7c83f99ec2526323786c348d2382e6d86576f750c89ea544d6bbb14", +"model_config" : { + "model_type" : "bert", + "embedding_dimension" : 384, + "framework_type" : "SENTENCE_TRANSFORMERS", + "all_config" : """{"_name_or_path":"nreimers/MiniLM-L6-H384-uncased","architectures":["BertModel"],"attention_probs_dropout_prob":0.1,"gradient_checkpointing":false,"hidden_act":"gelu","hidden_dropout_prob":0.1,"hidden_size":384,"initializer_range":0.02,"intermediate_size":1536,"layer_norm_eps":1e-12,"max_position_embeddings":512,"model_type":"bert","num_attention_heads":12,"num_hidden_layers":6,"pad_token_id":0,"position_embedding_type":"absolute","transformers_version":"4.8.2","type_vocab_size":2,"use_cache":true,"vocab_size":30522}""" +}, +"created_time" : 1665961344044, +"last_uploaded_time" : 1665961373000, +"last_loaded_time" : 1665961815959, +"total_chunks" : 9 } ``` ## Registering a model -Use the register operation to register a custom model to a model index. ML Commons splits the model into smaller chunks and saves those chunks in the model's index. +Before you register a model, you must [register a model group]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control#registering-a-model-group) for the model. 
+{: .important} + +All versions of a particular model are held in a model group. After you register a model group, you can register a model to the model group. ML Commons splits the model into smaller chunks and saves those chunks in the model's index. + +For information about user access for this API, see [Model access control considerations](#model-access-control-considerations). + +### Path and HTTP methods ```json POST /_plugins/_ml/models/_register ``` +{% include copy-curl.html %} ### Request fields @@ -130,11 +176,13 @@ All request fields are required. Field | Data type | Description :--- | :--- | :--- -`name`| string | The name of the model. | -`version` | integer | The version number of the model. | -`model_format` | string | The portable format of the model file. Currently only supports `TORCH_SCRIPT`. | -`model_config` | json object | The model's configuration, including the `model_type`, `embedding_dimension`, and `framework_type`. `all_config` is an optional JSON string which contains all model configurations. | -`url` | string | The URL which contains the model. | +`name`| String | The model's name. | +`version` | Integer | The model's version number. | +`model_format` | String | The portable format of the model file. Currently only supports `TORCH_SCRIPT`. | +`model_group_id` | String | The model group ID for the model. +`model_content_hash_value` | String | The model content hash generated using the SHA-256 hashing algorithm. +`model_config` | JSON object | The model's configuration, including the `model_type`, `embedding_dimension`, and `framework_type`. `all_config` is an optional JSON string that contains all model configurations. | +`url` | String | The URL that contains the model. 
| ### Example @@ -143,18 +191,22 @@ The following example request registers a version `1.0.0` of an NLP sentence tra ```json POST /_plugins/_ml/models/_register { - "name": "all-MiniLM-L6-v2", - "version": "1.0.0", - "description": "test model", - "model_format": "TORCH_SCRIPT", - "model_config": { - "model_type": "bert", - "embedding_dimension": 384, - "framework_type": "sentence_transformers", - }, - "url": "https://github.com/opensearch-project/ml-commons/raw/2.x/ml-algorithms/src/test/resources/org/opensearch/ml/engine/algorithms/text_embedding/all-MiniLM-L6-v2_torchscript_sentence-transformer.zip?raw=true" + "name": "all-MiniLM-L6-v2", + "version": "1.0.0", + "description": "test model", + "model_format": "TORCH_SCRIPT", + "model_group_id": "FTNlQ4gBYW0Qyy5ZoxfR", + "model_content_hash_value": "c15f0d2e62d872be5b5bc6c84d2e0f4921541e29fefbef51d59cc10a8ae30e0f", + "model_config": { + "model_type": "bert", + "embedding_dimension": 384, + "framework_type": "sentence_transformers", + "all_config": "{\"_name_or_path\":\"nreimers/MiniLM-L6-H384-uncased\",\"architectures\":[\"BertModel\"],\"attention_probs_dropout_prob\":0.1,\"gradient_checkpointing\":false,\"hidden_act\":\"gelu\",\"hidden_dropout_prob\":0.1,\"hidden_size\":384,\"initializer_range\":0.02,\"intermediate_size\":1536,\"layer_norm_eps\":1e-12,\"max_position_embeddings\":512,\"model_type\":\"bert\",\"num_attention_heads\":12,\"num_hidden_layers\":6,\"pad_token_id\":0,\"position_embedding_type\":\"absolute\",\"transformers_version\":\"4.8.2\",\"type_vocab_size\":2,\"use_cache\":true,\"vocab_size\":30522}" + }, + "url": "https://artifacts.opensearch.org/models/ml-models/huggingface/sentence-transformers/all-MiniLM-L6-v2/1.0.1/torch_script/sentence-transformers_all-MiniLM-L6-v2-1.0.1-torch_script.zip" } ``` +{% include copy-curl.html %} ### Response @@ -167,18 +219,27 @@ OpenSearch responds with the `task_id` and task `status`. 
} ``` -To see the status of your model registration, enter the `task_id` in the [task API] ... +To see the status of your model registration and retrieve the model ID created for the new model version, pass the `task_id` as a path parameter to the Tasks API: + +```json +GET /_plugins/_ml/tasks/ +``` +{% include copy-curl.html %} + +The response contains the model ID of the model version: ```json { - "model_id" : "WWQI44MBbzI2oUKAvNUt", - "task_type" : "UPLOAD_MODEL", - "function_name" : "TEXT_EMBEDDING", - "state" : "REGISTERED", - "worker_node" : "KzONM8c8T4Od-NoUANQNGg", - "create_time" : 1665961344003, - "last_update_time" : 1665961373047, - "is_async" : true + "model_id": "Qr1YbogBYOqeeqR7sI9L", + "task_type": "DEPLOY_MODEL", + "function_name": "TEXT_EMBEDDING", + "state": "COMPLETED", + "worker_node": [ + "N77RInqjTSq_UaLh1k0BUg" + ], + "create_time": 1685478486057, + "last_update_time": 1685478491090, + "is_async": true } ``` @@ -186,6 +247,10 @@ To see the status of your model registration, enter the `task_id` in the [task A The deploy model operation reads the model's chunks from the model index and then creates an instance of the model to cache into memory. This operation requires the `model_id`. +For information about user access for this API, see [Model access control considerations](#model-access-control-considerations). 
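Because registration is asynchronous, a client typically polls the Tasks API until the task reaches a terminal state and only then deploys the resulting `model_id`. A self-contained Python sketch of that loop — `fetch_task` stands in for `GET /_plugins/_ml/tasks/<task_id>` and would issue a real HTTP request in practice:

```python
import time

TERMINAL_STATES = {"COMPLETED", "FAILED"}

def wait_for_task(fetch_task, task_id, poll_seconds=1.0, max_polls=30):
    """Poll a task until it reaches a terminal state; illustrative only.

    fetch_task: callable taking a task ID and returning the parsed
    JSON body of the Tasks API response for that task.
    """
    for _ in range(max_polls):
        task = fetch_task(task_id)
        if task["state"] in TERMINAL_STATES:
            return task
        time.sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```
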
+ +### Path and HTTP methods + ```json POST /_plugins/_ml/models//_deploy ``` @@ -197,6 +262,7 @@ In this example request, OpenSearch deploys the model to any available OpenSearc ```json POST /_plugins/_ml/models/WWQI44MBbzI2oUKAvNUt/_deploy ``` +{% include copy-curl.html %} ### Example: Deploying to a specific node @@ -208,6 +274,7 @@ POST /_plugins/_ml/models/WWQI44MBbzI2oUKAvNUt/_deploy "node_ids": ["4PLK7KJWReyX0oWKnBA8nA"] } ``` +{% include copy-curl.html %} ### Response @@ -220,7 +287,11 @@ POST /_plugins/_ml/models/WWQI44MBbzI2oUKAvNUt/_deploy ## Undeploying a model -To undeploy a model from memory, use the undeploy operation: +To undeploy a model from memory, use the undeploy operation. + +For information about user access for this API, see [Model access control considerations](#model-access-control-considerations). + +### Path and HTTP methods ```json POST /_plugins/_ml/models//_undeploy @@ -231,6 +302,7 @@ POST /_plugins/_ml/models//_undeploy ```json POST /_plugins/_ml/models/MGqJhYMBbbh0ushjm8p_/_undeploy ``` +{% include copy-curl.html %} ### Response: Undeploying a model from all ML nodes @@ -253,7 +325,7 @@ POST /_plugins/_ml/models/_undeploy "model_ids": ["KDo2ZYQB-v9VEDwdjkZ4"] } ``` - +{% include copy-curl.html %} ### Response: Undeploying specific models from specific nodes @@ -287,6 +359,7 @@ POST /_plugins/_ml/models/_undeploy "model_ids": ["KDo2ZYQB-v9VEDwdjkZ4"] } ``` +{% include copy-curl.html %} ### Response: Undeploying specific models from all nodes @@ -302,15 +375,24 @@ POST /_plugins/_ml/models/_undeploy ## Searching for a model -Use this command to search models you've already created. +Use this command to search for models you've already created. +The response will contain only those model versions to which you have access. For example, if you send a match all query, model versions for the following model group types will be returned: + +- All public model groups in the index. +- Private model groups for which you are the model owner. 
+- Model groups that share at least one backend role with your backend roles. + +For more information, see [Model access control]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control/). + +### Path and HTTP methods ```json POST /_plugins/_ml/models/_search {query} ``` -### Example: Querying all models +### Example: Searching for all models ```json POST /_plugins/_ml/models/_search @@ -321,8 +403,9 @@ POST /_plugins/_ml/models/_search "size": 1000 } ``` +{% include copy-curl.html %} -### Example: Querying models with algorithm "FIT_RCF" +### Example: Searching for models with algorithm "FIT_RCF" ```json POST /_plugins/_ml/models/_search @@ -336,6 +419,7 @@ POST /_plugins/_ml/models/_search } } ``` +{% include copy-curl.html %} ### Response @@ -393,9 +477,14 @@ POST /_plugins/_ml/models/_search Deletes a model based on the `model_id`. +For information about user access for this API, see [Model access control considerations](#model-access-control-considerations). + +### Path and HTTP methods + ```json DELETE /_plugins/_ml/models/ ``` +{% include copy-curl.html %} The API returns the following: @@ -430,8 +519,8 @@ GET /_plugins/_ml/profile/tasks Parameter | Data type | Description :--- | :--- | :--- -model_id | string | Returns runtime data for a specific model. You can string together multiple `model_id`s to return multiple model profiles. -tasks | string | Returns runtime data for a specific task. You can string together multiple `task_id`s to return multiple task profiles. +`model_id` | String | Returns runtime data for a specific model. You can string together multiple `model_id`s to return multiple model profiles. +`tasks`| String | Returns runtime data for a specific task. You can string together multiple `task_id`s to return multiple task profiles. ### Request fields @@ -439,11 +528,11 @@ All profile body request fields are optional. 
Field | Data type | Description :--- | :--- | :--- -node_ids | string | Returns all tasks and profiles from a specific node. -model_ids | string | Returns runtime data for a specific model. You can string together multiple `model_id`s to return multiple model profiles. -task_ids | string | Returns runtime data for a specific task. You can string together multiple `task_id`s to return multiple task profiles. -return_all_tasks | boolean | Determines whether or not a request returns all tasks. When set to `false` task profiles are left out of the response. -return_all_models | boolean | Determines whether or not a profile request returns all models. When set to `false` model profiles are left out of the response. +`node_ids` | String | Returns all tasks and profiles from a specific node. +`model_ids` | String | Returns runtime data for a specific model. You can string together multiple model IDs to return multiple model profiles. +`task_ids` | String | Returns runtime data for a specific task. You can string together multiple task IDs to return multiple task profiles. +`return_all_tasks` | Boolean | Determines whether or not a request returns all tasks. When set to `false`, task profiles are left out of the response. +`return_all_models` | Boolean | Determines whether or not a profile request returns all models. When set to `false`, model profiles are left out of the response. ### Example: Returning all tasks and models on a specific node @@ -455,6 +544,7 @@ GET /_plugins/_ml/profile "return_all_models": true } ``` +{% include copy-curl.html %} ### Response: Returning all tasks and models on a specific node @@ -500,6 +590,10 @@ GET /_plugins/_ml/profile ML Commons can predict new data with your trained model either from indexed data or a data frame. To use the Predict API, the `model_id` is required. +For information about user access for this API, see [Model access control considerations](#model-access-control-considerations). 
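For indexed data, a predict call is just a path containing the algorithm and `model_id` plus a body holding the `input_query` and `input_index`. A small Python sketch that assembles such a request for k-means (`predict_request` is an illustrative helper, not part of any client library):

```python
import json

def predict_request(model_id, index, source_fields, size=10000):
    """Build the path and JSON body for a k-means predict call
    over indexed data; illustrative only."""
    path = f"/_plugins/_ml/_predict/kmeans/{model_id}"
    body = {
        "input_query": {
            "_source": list(source_fields),  # fields fed to the model
            "size": size,
        },
        "input_index": [index],
    }
    return path, json.dumps(body)
```
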
+ +### Path and HTTP methods + ```json POST /_plugins/_ml/_predict// ``` @@ -518,6 +612,7 @@ POST /_plugins/_ml/_predict/kmeans/ ] } ``` +{% include copy-curl.html %} ### Response @@ -587,15 +682,14 @@ POST /_plugins/_ml/_predict/kmeans/ ## Train and predict -Use to train and then immediately predict against the same training data set. Can only be used with unsupervised learning models and the following algorithms: +Use to train and then immediately predict against the same training dataset. Can only be used with unsupervised learning models and the following algorithms: - BATCH_RCF - FIT_RCF -- kmeans +- k-means ### Example: Train and predict with indexed data - ```json POST /_plugins/_ml/_train_predict/kmeans { @@ -625,6 +719,7 @@ POST /_plugins/_ml/_train_predict/kmeans ] } ``` +{% include copy-curl.html %} ### Example: Train and predict with data directly @@ -724,6 +819,7 @@ POST /_plugins/_ml/_train_predict/kmeans } } ``` +{% include copy-curl.html %} ### Response @@ -798,6 +894,7 @@ You can retrieve information about a task using the task_id. ```json GET /_plugins/_ml/tasks/ ``` +{% include copy-curl.html %} The response includes information about the task. @@ -825,7 +922,7 @@ GET /_plugins/_ml/tasks/_search ``` -### Example: Search task which "function_name" is "KMEANS" +### Example: Search task which `function_name` is `KMEANS` ```json GET /_plugins/_ml/tasks/_search @@ -843,6 +940,7 @@ GET /_plugins/_ml/tasks/_search } } ``` +{% include copy-curl.html %} ### Response @@ -916,6 +1014,7 @@ ML Commons does not check the task status when running the `Delete` request. 
The ```json DELETE /_plugins/_ml/tasks/{task_id} ``` +{% include copy-curl.html %} The API returns the following: @@ -944,24 +1043,28 @@ To receive all stats, use: ```json GET /_plugins/_ml/stats ``` +{% include copy-curl.html %} To receive stats for a specific node, use: ```json GET /_plugins/_ml//stats/ ``` +{% include copy-curl.html %} -To receive stats for a specific node and return a specified stat, use: +To receive stats for a specific node and return a specified stat, use: ```json GET /_plugins/_ml//stats/ ``` +{% include copy-curl.html %} To receive information on a specific stat from all nodes, use: ```json GET /_plugins/_ml/stats/ ``` +{% include copy-curl.html %} ### Example: Get all stats @@ -969,6 +1072,7 @@ GET /_plugins/_ml/stats/ ```json GET /_plugins/_ml/stats ``` +{% include copy-curl.html %} ### Response @@ -1033,6 +1137,7 @@ POST /_plugins/_ml/_execute/anomaly_localization "num_outputs": 10 } ``` +{% include copy-curl.html %} Upon execution, the API returns the following: diff --git a/_ml-commons-plugin/index.md b/_ml-commons-plugin/index.md index 2358b982cd..35ebb8d1e8 100644 --- a/_ml-commons-plugin/index.md +++ b/_ml-commons-plugin/index.md @@ -13,17 +13,16 @@ ML Commons for OpenSearch eases the development of machine learning features by Interaction with the ML Commons plugin occurs through either the [REST API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api) or [`ad`]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/functions#ad) and [`kmeans`]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/functions#kmeans) Piped Processing Language (PPL) commands. -Models [trained]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#train-model) through the ML Commons plugin support model-based algorithms such as kmeans. After you've trained a model enough so that it meets your precision requirements, you can apply the model to [predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#predict) new data safely. 
+Models [trained]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#training-the-model) through the ML Commons plugin support model-based algorithms such as k-means. After you've trained a model enough so that it meets your precision requirements, you can apply the model to [predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#predict) new data safely. Should you not want to use a model, you can use the [Train and Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#train-and-predict) API to test your model without having to evaluate the model's performance. +# Permissions -## Permissions +The ML Commons plugin has two reserved roles: -There are two reserved user roles that can use of the ML Commons plugin. - -- `ml_full_access`: Full access to all ML features, including starting new ML tasks and reading or deleting models. -- `ml_readonly_access`: Can only read ML tasks, trained models and statistics relevant to the model's cluster. Cannot start nor delete ML tasks or models. +- `ml_full_access`: Grants full access to all ML features, including starting new ML tasks and reading or deleting models. +- `ml_readonly_access`: Grants read-only access to ML tasks, trained models, and statistics relevant to the model's cluster. Does not grant permissions to start or delete ML tasks or models. ## ML node diff --git a/_ml-commons-plugin/model-access-control.md b/_ml-commons-plugin/model-access-control.md new file mode 100644 index 0000000000..26c6a76d20 --- /dev/null +++ b/_ml-commons-plugin/model-access-control.md @@ -0,0 +1,593 @@ +--- +layout: default +title: Model access control +has_children: false +nav_order: 180 +--- + +# Model access control + +You can use the Security plugin with ML Commons to manage access to specific models for non-admin users. For example, one department in an organization might want to restrict users in other departments from accessing their models. 
+ +To accomplish this, users are assigned one or more [_backend roles_]({{site.url}}{{site.baseurl}}/security/access-control/index/). Rather than assign individual roles to individual users during user configuration, backend roles provide a way to map a set of users to a role by assigning the backend role to users when they log in. For example, users may be assigned an `IT` backend role that includes the `ml_full_access` role and have full access to all ML Commons features. Alternatively, other users may be assigned an `HR` backend role that includes the `ml_readonly_access` role and be limited to read-only access to machine learning (ML) features. Given this flexibility, backend roles can provide finer-grained access to models and make it easier to assign multiple users to a role rather than mapping a user and role individually. + +## Model groups + +For access control, models are organized into _model groups_---collections of versions of a particular model. Like users, model groups can be assigned one or more backend roles. All versions of the same model share the same model name and have the same backend role or roles. + +You are considered a model _owner_ when you create a new model group. You remain the owner of the model and all its versions even if another user registers a model to this model group. When a model owner creates a model group, the owner can specify one of the following _access modes_ for this model group: + +- `public`: All users who have access to the cluster can access this model group. +- `private`: Only the model owner or an admin user can access this model group. +- `restricted`: The owner, an admin user, or any user who shares one of the model group's backend roles can access any model in this model group. When creating a `restricted` model group, the owner must attach one or more of the owner's backend roles to the model. + +An admin can access all model groups in the cluster regardless of their access mode. 
+{: .note} + +## Model access control prerequisites + +Before using model access control, you must satisfy the following prerequisites: + +1. Enable the Security plugin on your cluster. For more information, see [Security in OpenSearch]({{site.url}}{{site.baseurl}}/security/). +2. For `restricted` model groups, ensure that an admin has [assigned backend roles to users](#assigning-backend-roles-to-users). +3. [Enable model access control](#enabling-model-access-control) on your cluster. You can enable model access control dynamically by setting `plugins.ml_commons.model_access_control_enabled` to `true`. + +If any of the prerequisites are not met, all models in the cluster are `public` and can be accessed by any user who has access to the cluster. +{: .note} + +## Assigning backend roles to users + +Create the appropriate backend roles and assign those roles to users. Backend roles usually come from an [LDAP server]({{site.url}}{{site.baseurl}}/security/configuration/ldap/) or [SAML provider]({{site.url}}{{site.baseurl}}/security/configuration/saml/), but if you use the internal user database, you can use the REST API to [add them manually]({{site.url}}{{site.baseurl}}/security/access-control/api#create-user). + +Only admin users can assign backend roles to users. +{: .note} + +When assigning backend roles, consider the following example of two users: `alice` and `bob`. 
+ +The following request assigns the user `alice` the `analyst` backend role: + +```json +PUT _plugins/_security/api/internalusers/alice +{ + "password": "alice", + "backend_roles": [ + "analyst" + ], + "attributes": {} +} +``` + +The next request assigns the user `bob` the `human-resources` backend role: + +```json +PUT _plugins/_security/api/internalusers/bob +{ + "password": "bob", + "backend_roles": [ + "human-resources" + ], + "attributes": {} +} +``` + +Finally, the last request assigns both `alice` and `bob` the role that gives them full access to ML Commons: + +```json +PUT _plugins/_security/api/rolesmapping/ml_full_access +{ + "backend_roles": [], + "hosts": [], + "users": [ + "alice", + "bob" + ] +} +``` + +If `alice` creates a model group and assigns it the `analyst` backend role, `bob` cannot access this model. + +## Enabling model access control + +You can enable model access control dynamically as follows: + +```json +PUT _cluster/settings +{ + "transient": { + "plugins.ml_commons.model_access_control_enabled": "true" + } +} +``` +{% include copy-curl.html %} + +## Registering a model group + +Use the `_register` endpoint to register a model group. You can register a model group with a `public`, `private`, or `restricted` access mode. + +### Path and HTTP method + +```json +POST /_plugins/_ml/model_groups/_register +``` + +### Request fields + +The following table lists the available request fields. + +Field |Data type | Description +:--- | :--- | :--- +`name` | String | The model group name. Required. +`description` | String | The model group description. Optional. +`model_access_mode` | String | The access mode for this model. Valid values are `public`, `private`, and `restricted`. When this parameter is set to `restricted`, you must specify either `backend_roles` or `add_all_backend_roles`, but not both. Optional. Default is `restricted`. +`backend_roles` | Array | A list of the model owner's backend roles to add to the model. 
Can be specified only if the `model_access_mode` is `restricted`. Cannot be specified at the same time as `add_all_backend_roles`. Optional. +`add_all_backend_roles` | Boolean | If `true`, all backend roles of the model owner are added to the model group. Default is `false`. Cannot be specified at the same time as `backend_roles`. Admin users cannot set this parameter to `true`. Optional. + +#### Example request + +```json +POST /_plugins/_ml/model_groups/_register +{ + "name": "test_model_group_public", + "description": "This is a public model group", + "model_access_mode": "public" +} +``` +{% include copy-curl.html %} + +#### Example response + +```json +{ + "model_group_id": "GDNmQ4gBYW0Qyy5ZcBcg", + "status": "CREATED" +} +``` + +### Response fields + +The following table lists the available response fields. + +Field |Data type | Description +:--- | :--- | :--- +`model_group_id` | String | The model group ID that you can use to access this model group. +`status` | String | The operation status. + +### Registering a public model group + +If you register a model group with a `public` access mode, any model in this model group will be accessible to any user with access to the cluster. The following request registers a public model group: + +```json +POST /_plugins/_ml/model_groups/_register +{ + "name": "test_model_group_public", + "description": "This is a public model group", + "model_access_mode": "public" +} +``` +{% include copy-curl.html %} + +### Registering a restricted model group + +To limit access by backend role, you must register a model group with the `restricted` access mode. + +When registering a model group, you must attach one or more of your backend roles to the model using one but not both of the following methods: + - Provide a list of backend roles in the `backend_roles` parameter. + - Set the `add_all_backend_roles` parameter to `true` to add all your backend roles to the model group. This option is not available to admin users. 
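The constraints above can be checked before sending the request. The following Python sketch validates a register body against the rules stated in this section (`validate_register_group` is a hypothetical client-side helper; the actual enforcement happens server side):

```python
def validate_register_group(body: dict) -> None:
    """Client-side sketch of the model group registration rules; illustrative only."""
    mode = body.get("model_access_mode", "restricted")  # default per the request fields table
    has_roles = "backend_roles" in body
    add_all = bool(body.get("add_all_backend_roles", False))
    if mode == "restricted":
        # Exactly one of the two ways to attach backend roles must be used.
        if has_roles == add_all:
            raise ValueError(
                "a restricted model group needs either backend_roles or "
                "add_all_backend_roles, but not both")
    elif has_roles or add_all:
        # Backend role parameters are valid only for restricted model groups.
        raise ValueError("backend role parameters require model_access_mode=restricted")
```
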
+ +Any user who shares a backend role with the model group can access any model in this model group. This grants the user the permissions included with the user role that is mapped to the backend role. + +An admin user can access all model groups regardless of their access mode. +{: .note} + +#### Example request: A list of backend roles + +The following request registers a restricted model group, which can be accessed only by users with the `IT` backend role: + +```json +POST /_plugins/_ml/model_groups/_register +{ + "name": "model_group_test", + "description": "This is an example description", + "model_access_mode": "restricted", + "backend_roles" : ["IT"] +} +``` +{% include copy-curl.html %} + +#### Example request: All backend roles + +The following request registers a restricted model group, adding all backend roles of the user to the model group: + +```json +POST /_plugins/_ml/model_groups/_register +{ + "name": "model_group_test", + "description": "This is an example description", + "model_access_mode": "restricted", + "add_all_backend_roles": "true" +} +``` +{% include copy-curl.html %} + +### Registering a private model group + +If you register a model group with a `private` access mode, any model in this model group will be accessible only to you and the admin users. The following request registers a private model group: + +```json +POST /_plugins/_ml/model_groups/_register +{ + "name": "model_group_test", + "description": "This is an example description", + "model_access_mode": "private" +} +``` +{% include copy-curl.html %} + +### Registering a model group in a cluster where model access control is disabled + +If model access control is disabled on your cluster (one of the [prerequisites](#model-access-control-prerequisites) is not met), you can register a model group with a `name` and `description` but cannot specify any of the access parameters (`model_access_name`, `backend_roles`, or `add_backend_roles`). 
By default, in such a cluster, all model groups are public.
+
+## Updating a model group
+
+To update a model group, send a request to the `_update` endpoint.
+
+When updating a model group, the following restrictions apply:
+
+- The model owner or an admin user can update all fields. Any user who shares one or more backend roles with the model group can update the `name` and `description` fields only.
+- When updating the `model_access_mode` to `restricted`, you must specify either `backend_roles` or `add_all_backend_roles`, but not both.
+
+### Path and HTTP method
+
+```json
+PUT /_plugins/_ml/model_groups/_update
+```
+
+### Request fields
+
+Refer to [Request fields](#request-fields-1) for request field descriptions.
+
+#### Example request
+
+```json
+PUT /_plugins/_ml/model_groups/_update
+{
+  "name": "model_group_test",
+  "description": "This is an example description",
+  "add_all_backend_roles": true
+}
+```
+{% include copy-curl.html %}
+
+### Updating a model group in a cluster where model access control is disabled
+
+If model access control is disabled on your cluster (one of the [prerequisites](#model-access-control-prerequisites) is not met), you can update only the `name` and `description` of a model group but cannot update any of the access parameters (`model_access_mode`, `backend_roles`, or `add_all_backend_roles`).
+
+## Searching for a model group
+
+When you search for a model group, only those model groups to which you have access will be returned.
For example, for a match all query, model groups that will be returned are: + +- All public model groups in the index +- Private model groups for which you are the owner +- Model groups that share at least one of the `backend_roles` with you + +### Path and HTTP method + +```json +POST /_plugins/_ml/model_groups/_search +GET /_plugins/_ml/model_groups/_search +``` + +#### Example request: Match all + +The following request is sent by `user1` who has the `IT` and `HR` roles: + +```json +POST /_plugins/_ml/model_groups/_search +{ + "query": { + "match_all": {} + }, + "size": 1000 +} +``` +{% include copy-curl.html %} + +#### Example response + +```json +{ + "took": 31, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 7, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": ".plugins-ml-model-group", + "_id": "TRqZfYgBD7s2oEFdvrQj", + "_version": 1, + "_seq_no": 2, + "_primary_term": 1, + "_score": 1, + "_source": { + "backend_roles": [ + "HR", + "IT" + ], + "owner": { + "backend_roles": [ + "HR", + "IT" + ], + "custom_attribute_names": [], + "roles": [ + "ml_full_access", + "own_index", + "test_ml" + ], + "name": "user1", + "user_requested_tenant": "__user__" + }, + "created_time": 1685734407714, + "access": "restricted", + "latest_version": 0, + "last_updated_time": 1685734407714, + "name": "model_group_test", + "description": "This is an example description" + } + }, + { + "_index": ".plugins-ml-model-group", + "_id": "URqZfYgBD7s2oEFdyLTm", + "_version": 1, + "_seq_no": 3, + "_primary_term": 1, + "_score": 1, + "_source": { + "backend_roles": [ + "IT" + ], + "owner": { + "backend_roles": [ + "HR", + "IT" + ], + "custom_attribute_names": [], + "roles": [ + "ml_full_access", + "own_index", + "test_ml" + ], + "name": "user1", + "user_requested_tenant": "__user__" + }, + "created_time": 1685734410470, + "access": "restricted", + "latest_version": 0, + 
"last_updated_time": 1685734410470,
+          "name": "model_group_test",
+          "description": "This is an example description"
+        }
+      },
+      ...
+    ]
+  }
+}
+```
+
+#### Example request: Search for model groups with an owner name
+
+The following request, which searches for model groups owned by `user1`, is sent by `user2`, who has the `IT` backend role:
+
+```json
+GET /_plugins/_ml/model_groups/_search
+{
+  "query": {
+    "bool": {
+      "must": [
+        {
+          "nested": {
+            "query": {
+              "term": {
+                "owner.name.keyword": {
+                  "value": "user1",
+                  "boost": 1
+                }
+              }
+            },
+            "path": "owner",
+            "ignore_unmapped": false,
+            "score_mode": "none",
+            "boost": 1
+          }
+        }
+      ]
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+#### Example response
+
+```json
+{
+  "took": 6,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 4,
+      "relation": "eq"
+    },
+    "max_score": 0,
+    "hits": [
+      {
+        "_index": ".plugins-ml-model-group",
+        "_id": "TRqZfYgBD7s2oEFdvrQj",
+        "_version": 1,
+        "_seq_no": 2,
+        "_primary_term": 1,
+        "_score": 0,
+        "_source": {
+          "backend_roles": [
+            "HR",
+            "IT"
+          ],
+          "owner": {
+            "backend_roles": [
+              "HR",
+              "IT"
+            ],
+            "custom_attribute_names": [],
+            "roles": [
+              "ml_full_access",
+              "own_index",
+              "test_ml"
+            ],
+            "name": "user1",
+            "user_requested_tenant": "__user__"
+          },
+          "created_time": 1685734407714,
+          "access": "restricted",
+          "latest_version": 0,
+          "last_updated_time": 1685734407714,
+          "name": "model_group_test",
+          "description": "This is an example description"
+        }
+      },
+      ...
+    ]
+  }
+}
+```
+
+#### Example request: Search for model groups with a model group ID
+
+```json
+GET /_plugins/_ml/model_groups/_search
+{
+  "query": {
+    "bool": {
+      "must": [
+        {
+          "terms": {
+            "_id": [
+              "HyPNK4gBwNxGowI0AtDk"
+            ]
+          }
+        }
+      ]
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+#### Example response
+
+```json
+{
+  "took": 2,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": ".plugins-ml-model-group",
+        "_id": "HyPNK4gBwNxGowI0AtDk",
+        "_version": 3,
+        "_seq_no": 16,
+        "_primary_term": 5,
+        "_score": 1,
+        "_source": {
+          "backend_roles": [
+            "IT"
+          ],
+          "owner": {
+            "backend_roles": [
+              "",
+              "HR",
+              "IT"
+            ],
+            "custom_attribute_names": [],
+            "roles": [
+              "ml_full_access",
+              "own_index",
+              "test-ml"
+            ],
+            "name": "user1",
+            "user_requested_tenant": null
+          },
+          "created_time": 1684362035938,
+          "latest_version": 2,
+          "last_updated_time": 1684362571300,
+          "name": "model_group_test",
+          "description": "This is an example description"
+        }
+      }
+    ]
+  }
+}
+```
+
+## Deleting a model group
+
+You can delete a model group only if it does not contain any model versions.
+{: .important}
+
+If model access control is enabled on your cluster, only the owner or users with matching backend roles can delete the model group. Any user can delete any public model group.
+
+If model access control is disabled on your cluster, users with the `delete model group API` permission can delete any model group.
+
+Admin users can delete any model group.
+{: .note}
+
+#### Example request
+
+```json
+DELETE _plugins/_ml/model_groups/<model_group_id>
+```
+{% include copy-curl.html %}
+
+#### Example response
+
+```json
+{
+  "_index": ".plugins-ml-model-group",
+  "_id": "l8nnQogByXnLJ-QNpEk2",
+  "_version": 5,
+  "result": "deleted",
+  "_shards": {
+    "total": 2,
+    "successful": 1,
+    "failed": 0
+  },
+  "_seq_no": 70,
+  "_primary_term": 23
+}
+```
From bc02ea2353ee56c5d80a6e735a8963f74f32e0d6 Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Tue, 20 Jun 2023 18:23:42 -0400
Subject: [PATCH 043/286] Update nested.md (#4363)

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Melissa Vagi
---
 _field-types/supported-field-types/nested.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_field-types/supported-field-types/nested.md b/_field-types/supported-field-types/nested.md
index d09caf0ea8..e6f2eec6c3 100644
--- a/_field-types/supported-field-types/nested.md
+++ b/_field-types/supported-field-types/nested.md
@@ -203,5 +203,5 @@ Parameter | Description
 :--- | :---
 [`dynamic`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object#the-dynamic-parameter) | Specifies whether new fields can be dynamically added to this object. Valid values are `true`, `false`, and `strict`. Default is `true`.
 `include_in_parent` | A Boolean value that specifies whether all fields in the child nested object should also be added to the parent document in flattened form. Default is `false`.
-`incude_in_root` | A Boolean value that specifies whether all fields in the child nested object should also be added to the root document in flattened form. Default is `false`.
+`include_in_root` | A Boolean value that specifies whether all fields in the child nested object should also be added to the root document in flattened form. Default is `false`.
 `properties` | Fields of this object, which can be of any supported type.
New properties can be dynamically added to this object if `dynamic` is set to `true`. From c1ddaa7a683d8e5fcd5cebf86bec34109715595e Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Tue, 20 Jun 2023 22:13:54 -0400 Subject: [PATCH 044/286] Reformat supported field types index page (#4349) * Reformat supported field types index page Signed-off-by: Fanit Kolchina * Reorder search TOC topics Signed-off-by: Fanit Kolchina --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _config.yml | 20 +++++++------- _field-types/supported-field-types/index.md | 30 ++++++++++----------- 2 files changed, 24 insertions(+), 26 deletions(-) diff --git a/_config.yml b/_config.yml index 0d6ac85575..da62259ade 100644 --- a/_config.yml +++ b/_config.yml @@ -110,7 +110,7 @@ just_the_docs: name: OpenSearch Dashboards nav_fold: true tuning-your-cluster: - name: Tuning your cluster + name: Creating and tuning your cluster nav_fold: true security: name: Security in OpenSearch @@ -118,27 +118,24 @@ just_the_docs: security-analytics: name: Security analytics nav_fold: true + field-types: + name: Mappings and field types + nav_fold: true + query-dsl: + name: Query DSL, Aggregations, and Analyzers + nav_fold: true search-plugins: name: Search nav_fold: true ml-commons-plugin: name: Machine learning nav_fold: true - tuning-your-cluster: - name: Creating and tuning your cluster - nav_fold: true monitoring-your-cluster: name: Monitoring your cluster nav_fold: true observing-your-data: name: Observability nav_fold: true - query-dsl: - name: Query DSL, Aggregations, and Analyzers - nav_fold: true - field-types: - name: Mappings and field types - nav_fold: true clients: name: Clients nav_fold: true @@ -229,4 +226,5 @@ exclude: - vendor/gems/ - vendor/ruby/ - README.md - - .idea \ No newline at end of file + - .idea + - templates \ No newline at end of file diff 
--git a/_field-types/supported-field-types/index.md b/_field-types/supported-field-types/index.md index 3cb8bff8cd..88fe8b038b 100644 --- a/_field-types/supported-field-types/index.md +++ b/_field-types/supported-field-types/index.md @@ -13,21 +13,21 @@ redirect_from: You can specify data types for your fields when creating a mapping. The following table lists all data field types that OpenSearch supports. -Field data type | Description -:--- | :--- -[`alias`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/alias/) | An additional name for an existing field. -[`binary`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary/) | A binary value in Base64 encoding. -[Numeric]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/numeric/) | `byte`, `double`, `float`, `half_float`, `integer`, `long`, `unsigned_long`, `scaled_float`, `short`. -[`boolean`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/boolean/) | A Boolean value. -[Date]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/dates/) | `date`, `date_nanos`. -[`ip`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/ip/) | An IP address in IPv4 or IPv6 format. -[Range]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/range/) | `integer_range`, `long_range`,`double_range`, `float_range`, `date_range`,`ip_range`. -[Object]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object/) | `object`, `nested`, `join`. -String | [`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/), [`text`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/), [`token_count`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/token-count/). -[Autocomplete]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/autocomplete/) | `completion`, `search_as_you_type`. 
-[Geographic]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geographic/) | `geo_point`, `geo_shape`. -[Rank]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/rank/) | `rank_feature`, `rank_features`. -[`percolator`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/percolator/) | Specifies to treat this field as a query. +Category | Field types and descriptions +:--- | :--- +Alias | [`alias`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/alias/): An additional name for an existing field. +Binary | [`binary`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary/): A binary value in Base64 encoding. +[Numeric]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/numeric/) | A numeric value (`byte`, `double`, `float`, `half_float`, `integer`, `long`, [`unsigned_long`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/unsigned-long/), `scaled_float`, `short`). +Boolean | [`boolean`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/boolean/): A Boolean value. +[Date]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/dates/)| [`date`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/): A date stored in milliseconds.
[`date_nanos`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date-nanos/): A date stored in nanoseconds. +IP | [`ip`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/ip/): An IP address in IPv4 or IPv6 format. +[Range]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/range/) | A range of values (`integer_range`, `long_range`, `double_range`, `float_range`, `date_range`, `ip_range`). +[Object]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object/)| [`object`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object/): A JSON object.
[`nested`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/nested/): Used when objects in an array need to be indexed independently as separate documents.
[`flat_object`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/flat-object/): A JSON object treated as a string.
[`join`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/join/): Establishes a parent-child relationship between documents in the same index. +[String]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/string/)|[`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/): Contains a string that is not analyzed.
[`text`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/): Contains a string that is analyzed.
[`token_count`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/token-count/): Stores the number of analyzed tokens in a string. +[Autocomplete]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/autocomplete/) |[`completion`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/completion/): Provides autocomplete functionality through a completion suggester.
[`search_as_you_type`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/search-as-you-type/): Provides search-as-you-type functionality using both prefix and infix completion. +[Geographic]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geographic/)| [`geo_point`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point/): A geographic point.
[`geo_shape`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-shape/): A geographic shape. +[Rank]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/rank/) | Boosts or decreases the relevance score of documents (`rank_feature`, `rank_features`). +Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/percolator/): Specifies to treat this field as a query. ## Arrays From c30af785f827e99a0724aa1201ff9805306ba914 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Wed, 21 Jun 2023 15:59:04 -0400 Subject: [PATCH 045/286] Update links in field types index (#4379) Signed-off-by: Fanit Kolchina Signed-off-by: Melissa Vagi --- _field-types/supported-field-types/index.md | 26 ++++++++++----------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/_field-types/supported-field-types/index.md b/_field-types/supported-field-types/index.md index 88fe8b038b..5dd2b8d694 100644 --- a/_field-types/supported-field-types/index.md +++ b/_field-types/supported-field-types/index.md @@ -15,19 +15,19 @@ You can specify data types for your fields when creating a mapping. The followin Category | Field types and descriptions :--- | :--- -Alias | [`alias`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/alias/): An additional name for an existing field. -Binary | [`binary`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary/): A binary value in Base64 encoding. -[Numeric]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/numeric/) | A numeric value (`byte`, `double`, `float`, `half_float`, `integer`, `long`, [`unsigned_long`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/unsigned-long/), `scaled_float`, `short`). -Boolean | [`boolean`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/boolean/): A Boolean value. 
-[Date]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/dates/)| [`date`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/): A date stored in milliseconds.
[`date_nanos`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date-nanos/): A date stored in nanoseconds. -IP | [`ip`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/ip/): An IP address in IPv4 or IPv6 format. -[Range]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/range/) | A range of values (`integer_range`, `long_range`, `double_range`, `float_range`, `date_range`, `ip_range`). -[Object]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object/)| [`object`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object/): A JSON object.
[`nested`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/nested/): Used when objects in an array need to be indexed independently as separate documents.
[`flat_object`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/flat-object/): A JSON object treated as a string.
[`join`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/join/): Establishes a parent-child relationship between documents in the same index. -[String]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/string/)|[`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/): Contains a string that is not analyzed.
[`text`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/): Contains a string that is analyzed.
[`token_count`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/token-count/): Stores the number of analyzed tokens in a string. -[Autocomplete]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/autocomplete/) |[`completion`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/completion/): Provides autocomplete functionality through a completion suggester.
[`search_as_you_type`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/search-as-you-type/): Provides search-as-you-type functionality using both prefix and infix completion. -[Geographic]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geographic/)| [`geo_point`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point/): A geographic point.
[`geo_shape`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-shape/): A geographic shape. -[Rank]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/rank/) | Boosts or decreases the relevance score of documents (`rank_feature`, `rank_features`). -Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/percolator/): Specifies to treat this field as a query. +Alias | [`alias`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/alias/): An additional name for an existing field. +Binary | [`binary`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/binary/): A binary value in Base64 encoding. +[Numeric]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/numeric/) | A numeric value (`byte`, `double`, `float`, `half_float`, `integer`, `long`, [`unsigned_long`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/unsigned-long/), `scaled_float`, `short`). +Boolean | [`boolean`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/boolean/): A Boolean value. +[Date]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/dates/)| [`date`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/): A date stored in milliseconds.
[`date_nanos`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date-nanos/): A date stored in nanoseconds. +IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/): An IP address in IPv4 or IPv6 format. +[Range]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/range/) | A range of values (`integer_range`, `long_range`, `double_range`, `float_range`, `date_range`, `ip_range`). +[Object]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/object/)| [`object`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/object/): A JSON object.
[`nested`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/nested/): Used when objects in an array need to be indexed independently as separate documents.
[`flat_object`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/flat-object/): A JSON object treated as a string.
[`join`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/): Establishes a parent-child relationship between documents in the same index. +[String]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/string/)|[`keyword`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/keyword/): Contains a string that is not analyzed.
[`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/): Contains a string that is analyzed.
[`token_count`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/token-count/): Stores the number of analyzed tokens in a string. +[Autocomplete]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/autocomplete/) |[`completion`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/completion/): Provides autocomplete functionality through a completion suggester.
[`search_as_you_type`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/search-as-you-type/): Provides search-as-you-type functionality using both prefix and infix completion. +[Geographic]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geographic/)| [`geo_point`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-point/): A geographic point.
[`geo_shape`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-shape/): A geographic shape. +[Rank]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/rank/) | Boosts or decreases the relevance score of documents (`rank_feature`, `rank_features`). +Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query. ## Arrays From a2cf71ee1c868e714fc0ad62661e0bc88b5027ab Mon Sep 17 00:00:00 2001 From: Heather Halter Date: Thu, 22 Jun 2023 07:30:49 -0700 Subject: [PATCH 046/286] updatestomigrationdoc (#4343) Signed-off-by: Heather Halter Signed-off-by: Melissa Vagi --- _upgrade-to/upgrade-to.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/_upgrade-to/upgrade-to.md b/_upgrade-to/upgrade-to.md index fc82528c53..b4b9cb6126 100644 --- a/_upgrade-to/upgrade-to.md +++ b/_upgrade-to/upgrade-to.md @@ -18,6 +18,8 @@ If your existing cluster runs an older version of Elasticsearch OSS, the first s Cluster restart upgrades work between minor versions (for example, 6.5 to 6.8) and the next major version (for example, 6.x to 7.10.2). Cluster restart upgrades are faster to perform and require fewer intermediate upgrades, but require downtime. +To migrate a post-fork version of Elasticsearch (7.11+) to OpenSearch, you can use Logstash. You'll need to employ the Elasticsearch input plugin within Logstash to extract data from the Elasticsearch cluster, and the [Logstash Output OpenSearch plugin](https://github.com/opensearch-project/logstash-output-opensearch#configuration-for-logstash-output-opensearch-plugin) to write the data to the OpenSearch 2.x cluster. We suggest using Logstash version 7.13.4 or earlier, as newer versions may encounter compatibility issues when establishing a connection with OpenSearch due to changes introduced by Elasticsearch subsequent to the fork. 
We strongly recommend that users test this solution with their own data to ensure effectiveness. +{: .note} ## Migration paths @@ -62,7 +64,7 @@ If you are migrating an Open Distro for Elasticsearch cluster, we recommend firs sudo yum install elasticsearch-oss-7.10.2 --enablerepo=elasticsearch ``` - For tarball installations, extract to a new directory to ensure you **do not overwrite** your `config`, `data`, and `logs` directories. Ideally, these directories should have their own, independent paths and *not* be colocated with the Elasticsearch application directory. Then set the `ES_PATH_CONF` environment variable to the directory that contains `elasticsearch.yml` (for example, `/etc/elasticesarch/`). In `elasticsearch.yml`, set `path.data` and `path.logs` to your `data` and `logs` directories (for example, `/var/lib/elasticsearch` and `/var/log/opensearch`). + For tarball installations, extract to a new directory to ensure you **do not overwrite** your `config`, `data`, and `logs` directories. Ideally, these directories should have their own, independent paths and *not* be colocated with the Elasticsearch application directory. Then set the `ES_PATH_CONF` environment variable to the directory that contains `elasticsearch.yml` (for example, `/etc/elasticsearch/`). In `elasticsearch.yml`, set `path.data` and `path.logs` to your `data` and `logs` directories (for example, `/var/lib/elasticsearch` and `/var/log/opensearch`). 1. Restart Elasticsearch OSS on the node (rolling) or all nodes (cluster restart). 
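The Logstash-based migration described in the patch above can be sketched as a minimal pipeline configuration. This is an illustrative outline only: the hostnames, credentials, and index names are placeholders, and the exact option names should be checked against the `elasticsearch` input and `opensearch` output plugin documentation for your Logstash version.

```conf
input {
  elasticsearch {
    hosts    => ["https://source-es-cluster:9200"]  # placeholder source cluster
    index    => "source-index"                      # placeholder index or pattern
    user     => "es_user"                           # placeholder credentials
    password => "es_password"
    docinfo  => true  # keep the original _index and _id in [@metadata]
  }
}

output {
  opensearch {
    hosts       => ["https://target-opensearch-cluster:9200"]  # placeholder target cluster
    index       => "%{[@metadata][_index]}"  # reuse the original index name
    document_id => "%{[@metadata][_id]}"     # preserve document IDs
    user        => "admin"                   # placeholder credentials
    password    => "admin_password"
    ssl         => true
  }
}
```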
From fe7943f60c5a8b740309e22eeb038ea850e92f33 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 22 Jun 2023 09:53:34 -0600 Subject: [PATCH 047/286] Writing processors backlog documentation Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-processors.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index 255567e6cf..3488c0cf0a 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -3,6 +3,7 @@ layout: default title: Ingest processors parent: Ingest APIs nav_order: 50 +has_children: true --- # Ingest processors @@ -18,4 +19,4 @@ GET /_nodes/ingest ``` {% include copy-curl.html %} -Learn more about the processor types within their respective documentation. +To configure and deploy ingest processors, make sure you have the necessary permissions and access rights. You can learn more about the processor types within their respective documentation. From 47bbb6911017ec1b1cf054716e311dd9111740e6 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 22 Jun 2023 09:54:36 -0600 Subject: [PATCH 048/286] Writing processors backlog documentation Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-list/append.md | 4 ---- _api-reference/ingest-apis/processors-list/bytes.md | 11 +++++++++++ 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/_api-reference/ingest-apis/processors-list/append.md b/_api-reference/ingest-apis/processors-list/append.md index 7fabb2ff8b..1bee5b5b3a 100644 --- a/_api-reference/ingest-apis/processors-list/append.md +++ b/_api-reference/ingest-apis/processors-list/append.md @@ -10,10 +10,6 @@ nav_order: 10 The append ingest processor enriches incoming data during the ingestion process by appending additional fields or values to each document. 
The append processor operates on a per-document basis, meaning it processes each incoming document individually. Learn how to use the append processor in your data processing workflows in the following documentation.
-## Getting started
-
-To use the append processor, make sure you have the necessary permissions and access rights to configure and deploy ingest processors.
-
 ## Configuration parameters
 
-The append processor requires the following configuration parameters to specify the target field or value to append to incoming documents.
+The append processor supports the following parameters for appending values to incoming documents.
 **Parameter** | **Required** | **Description** |
 |-----------|-----------|-----------|
diff --git a/_api-reference/ingest-apis/processors-list/bytes.md b/_api-reference/ingest-apis/processors-list/bytes.md
index e69de29bb2..8430b3e15b 100644
--- a/_api-reference/ingest-apis/processors-list/bytes.md
+++ b/_api-reference/ingest-apis/processors-list/bytes.md
@@ -0,0 +1,11 @@
+---
+layout: default
+title: Bytes
+parent: Ingest processors
+grand_parent: Ingest APIs
+nav_order: 20
+---
+
+# Bytes
+
+The bytes ingest processor converts a human-readable byte value to its equivalent value in bytes. The field can be a scalar or an array. If the field is a scalar, the value will be converted and stored in the field. If the field is an array, all members of the array will be converted.
\ No newline at end of file From fabc15e8032fd09c52806c81fabbbea1012dcd16 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 22 Jun 2023 10:22:51 -0600 Subject: [PATCH 049/286] Writing processors backlog documentation Signed-off-by: Melissa Vagi --- .../ingest-apis/processors-list/append.md | 4 +-- .../ingest-apis/processors-list/bytes.md | 33 ++++++++++++++++++- 2 files changed, 34 insertions(+), 3 deletions(-) diff --git a/_api-reference/ingest-apis/processors-list/append.md b/_api-reference/ingest-apis/processors-list/append.md index 1bee5b5b3a..92fd49154d 100644 --- a/_api-reference/ingest-apis/processors-list/append.md +++ b/_api-reference/ingest-apis/processors-list/append.md @@ -12,7 +12,7 @@ The append ingest processor enriches incoming data during the ingestion process ## Configuration parameters -The append processor requires the following configuration parameters to specify the target field or value to append to incoming documents. +The append processor supports the following parameters to append to incoming documents. **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| @@ -26,7 +26,7 @@ The append processor requires the following configuration parameters to specify Following are examples of an append processor configuration and how to add it to an ingest pipeline. -#### Example: Append configuration +#### Example: Append processor configuration ```json { diff --git a/_api-reference/ingest-apis/processors-list/bytes.md b/_api-reference/ingest-apis/processors-list/bytes.md index 8430b3e15b..ad9f5d8998 100644 --- a/_api-reference/ingest-apis/processors-list/bytes.md +++ b/_api-reference/ingest-apis/processors-list/bytes.md @@ -8,4 +8,35 @@ nav_order: 20 # Bytes -The bytes ingest processor converts a human-readable byte value to its equivalent value in bytes. The field can be a scalar or an array. If the field is a scalar, the value will be converted and stored in the field. 
If the field is an array, all members of the array will be converted. \ No newline at end of file +The bytes ingest processor converts a human-readable byte value to its equivalent value in bytes. The field can be a scalar or an array. If the field is a scalar, the value will be converted and stored in the field. If the field is an array, all members of the array will be converted. + +## Configuration parameters + +The byte processor supports the following parameter options. The parameter `field` is required. All others are optional. + +**Parameter** | **Required** | **Description** | +|-----------|-----------|-----------| +`field` | Required | Name of the field where the data should be converted. | +`target_field` | Required| Name of the field to store the converted vlaue. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | +`ignore_missing` | Optional | If set to true, the processor will not fail if the field does not exist. Default is `false`. | +`if` | Optional | Conditional expression that determines whether the processor should be deployed. | +`ignore_failure` | Optional | If set to true, the processor will not fail if an error occurs. | +`on_failure` | Optional | Action to take if an error occurs. | +`tag` | Optional | Tag that can be used to identify the processor. | +`description` | Optional | Brief description of the processor. | + +Following is an example of a byte ingest processor configuration. 
+ +#### Example: Byte processor configuration + +```json +{ + "bytes": { + "field": "file.size", + "target_field": "file.size_bytes", + "ignore_missing": true, + "description": "Converts the file size field to bytes" + } +} +``` + From 2d7205ec973bd536f99d5f9431a785896ed157c9 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 22 Jun 2023 10:24:19 -0600 Subject: [PATCH 050/286] Writing processors backlog documentation Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-list/bytes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-list/bytes.md b/_api-reference/ingest-apis/processors-list/bytes.md index ad9f5d8998..e982ea130b 100644 --- a/_api-reference/ingest-apis/processors-list/bytes.md +++ b/_api-reference/ingest-apis/processors-list/bytes.md @@ -17,7 +17,7 @@ The byte processor supports the following parameter options. The parameter `fiel **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| `field` | Required | Name of the field where the data should be converted. | -`target_field` | Required| Name of the field to store the converted vlaue. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | +`target_field` | Required | Name of the field to store the converted vlaue. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | `ignore_missing` | Optional | If set to true, the processor will not fail if the field does not exist. Default is `false`. | `if` | Optional | Conditional expression that determines whether the processor should be deployed. | `ignore_failure` | Optional | If set to true, the processor will not fail if an error occurs. 
| From bc206027f7a5e62065914d1dfab2cc2601b42ef1 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 22 Jun 2023 11:17:08 -0600 Subject: [PATCH 051/286] Writing processors backlog documentation Signed-off-by: Melissa Vagi --- .../ingest-apis/processors-list/bytes.md | 16 ++++++------ .../ingest-apis/processors-list/convert.md | 25 +++++++++++++++++++ 2 files changed, 34 insertions(+), 7 deletions(-) diff --git a/_api-reference/ingest-apis/processors-list/bytes.md b/_api-reference/ingest-apis/processors-list/bytes.md index e982ea130b..2ba0338ac8 100644 --- a/_api-reference/ingest-apis/processors-list/bytes.md +++ b/_api-reference/ingest-apis/processors-list/bytes.md @@ -31,12 +31,14 @@ Following is an example of a byte ingest processor configuration. ```json { - "bytes": { - "field": "file.size", - "target_field": "file.size_bytes", - "ignore_missing": true, - "description": "Converts the file size field to bytes" - } + "description": "Converts the file size field to bytes", + "processors": [ + { + "bytes": { + "field": "file.size", + "target_field": "file.size_bytes" + } + } + ] } ``` - diff --git a/_api-reference/ingest-apis/processors-list/convert.md b/_api-reference/ingest-apis/processors-list/convert.md index e69de29bb2..f9282990c6 100644 --- a/_api-reference/ingest-apis/processors-list/convert.md +++ b/_api-reference/ingest-apis/processors-list/convert.md @@ -0,0 +1,25 @@ +--- +layout: default +title: Convert +parent: Ingest processors +grand_parent: Ingest APIs +nav_order: 10 +--- + +# Convert + +The convert ingest processor converts a field in a document to a different type. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. + +Specifying `boolean` will set the field to `true` if its string value is equal to `true` (ignore case), to false if its string value is equal to `false` (ignore case), or it will throw an exception otherwise. 
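The `boolean` coercion rule described above can be sketched in plain Python. This is only an illustration of the documented behavior, not the processor's actual implementation, and the function name `convert_boolean` is hypothetical:

```python
# Hypothetical sketch of the convert processor's `boolean` rule described above,
# not the processor's actual implementation.
def convert_boolean(value):
    # "true"/"false" (ignoring case) map to booleans; anything else is an error.
    text = str(value).lower()
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"{value!r} cannot be converted to a boolean")


print(convert_boolean("TRUE"))   # True
print(convert_boolean("false"))  # False
```

Within a pipeline, this same rule is applied to the configured `field` of each ingested document.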
+ +## Configuration parameters + +The byte processor supports the following parameter options. The parameters `field` and `type` are required. All others are optional. + +Following is an example of a convert ingest processor configuration. + +#### Example: Convert processor configuration + +```json + +``` \ No newline at end of file From 1827fae0e0225f3ca4472920ff3c416415463121 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 22 Jun 2023 12:54:37 -0600 Subject: [PATCH 052/286] Writing processors backlog documentation Signed-off-by: Melissa Vagi --- .../ingest-apis/processors-list/bytes.md | 2 +- .../ingest-apis/processors-list/convert.md | 24 ++++++++++++++++++- 2 files changed, 24 insertions(+), 2 deletions(-) diff --git a/_api-reference/ingest-apis/processors-list/bytes.md b/_api-reference/ingest-apis/processors-list/bytes.md index 2ba0338ac8..d53b1c24ab 100644 --- a/_api-reference/ingest-apis/processors-list/bytes.md +++ b/_api-reference/ingest-apis/processors-list/bytes.md @@ -17,7 +17,7 @@ The byte processor supports the following parameter options. The parameter `fiel **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| `field` | Required | Name of the field where the data should be converted. | -`target_field` | Required | Name of the field to store the converted vlaue. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | +`target_field` | Required | Name of the field to store the converted value. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | `ignore_missing` | Optional | If set to true, the processor will not fail if the field does not exist. Default is `false`. | `if` | Optional | Conditional expression that determines whether the processor should be deployed. | `ignore_failure` | Optional | If set to true, the processor will not fail if an error occurs. 
|
diff --git a/_api-reference/ingest-apis/processors-list/convert.md b/_api-reference/ingest-apis/processors-list/convert.md
index f9282990c6..f9d21a6181 100644
--- a/_api-reference/ingest-apis/processors-list/convert.md
+++ b/_api-reference/ingest-apis/processors-list/convert.md
@@ -16,10 +16,32 @@
 
 The byte processor supports the following parameter options. The parameters `field` and `type` are required. All others are optional.
 
+**Parameter** | **Required** | **Description** |
+|-----------|-----------|-----------|
+`field` | Required | Name of the field where the data should be converted. |
+`target_field` | Optional | Name of the field to store the converted value. If not specified, the value will be stored in-place in the `field` field. Default is `field`. |
+`type` | Required | The type to convert the field value to: `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, or `auto`. |
+`ignore_missing` | Optional | If set to true, the processor will not fail if the field does not exist. Default is `false`. |
+`if` | Optional | Conditional expression that determines whether the processor should be deployed. |
+`ignore_failure` | Optional | If set to true, the processor will not fail if an error occurs. |
+`on_failure` | Optional | Action to take if an error occurs. |
+`tag` | Optional | Tag that can be used to identify the processor. |
+`description` | Optional | Brief description of the processor. |
+
 Following is an example of a convert ingest processor configuration.
#### Example: Convert processor configuration ```json - +{ + "description": "Converts the file size field to an integer", + "processors": [ + { + "convert": { + "field": "file.size", + "type": "integer" + } + } + ] +} ``` \ No newline at end of file From 765c287bad95c4adf54f10d1b8252cbc410071c4 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 22 Jun 2023 12:55:14 -0600 Subject: [PATCH 053/286] Writing processors backlog documentation Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-list/convert.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-list/convert.md b/_api-reference/ingest-apis/processors-list/convert.md index f9d21a6181..3230a35fbb 100644 --- a/_api-reference/ingest-apis/processors-list/convert.md +++ b/_api-reference/ingest-apis/processors-list/convert.md @@ -44,4 +44,4 @@ Following is an example of a convert ingest processor configuration. } ] } -``` \ No newline at end of file +``` From 417373a31b80db8feefb0bc0efbf109a6d9ae6bd Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 22 Jun 2023 12:55:47 -0600 Subject: [PATCH 054/286] Writing processors backlog documentation Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-list/convert.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-list/convert.md b/_api-reference/ingest-apis/processors-list/convert.md index 3230a35fbb..b15729b561 100644 --- a/_api-reference/ingest-apis/processors-list/convert.md +++ b/_api-reference/ingest-apis/processors-list/convert.md @@ -3,7 +3,7 @@ layout: default title: Convert parent: Ingest processors grand_parent: Ingest APIs -nav_order: 10 +nav_order: 30 --- # Convert From 3e73879ab129f32c0a5e56457e17b04befc97361 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 22 Jun 2023 13:49:20 -0600 Subject: [PATCH 055/286] Writing processors backlog documentation Signed-off-by: Melissa Vagi --- 
.../ingest-apis/processors-list/append.md | 2 +- .../ingest-apis/processors-list/bytes.md | 2 +- .../ingest-apis/processors-list/convert.md | 2 +- .../ingest-apis/processors-list/csv.md | 31 +++++++++++++++++++ 4 files changed, 34 insertions(+), 3 deletions(-) diff --git a/_api-reference/ingest-apis/processors-list/append.md b/_api-reference/ingest-apis/processors-list/append.md index 92fd49154d..97e0aebc28 100644 --- a/_api-reference/ingest-apis/processors-list/append.md +++ b/_api-reference/ingest-apis/processors-list/append.md @@ -12,7 +12,7 @@ The append ingest processor enriches incoming data during the ingestion process ## Configuration parameters -The append processor supports the following parameters to append to incoming documents. +The append processor supports the following parameters. **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| diff --git a/_api-reference/ingest-apis/processors-list/bytes.md b/_api-reference/ingest-apis/processors-list/bytes.md index d53b1c24ab..620f154c75 100644 --- a/_api-reference/ingest-apis/processors-list/bytes.md +++ b/_api-reference/ingest-apis/processors-list/bytes.md @@ -12,7 +12,7 @@ The bytes ingest processor converts a human-readable byte value to its equivalen ## Configuration parameters -The byte processor supports the following parameter options. The parameter `field` is required. All others are optional. +The byte processor supports the following parameters. 
**Parameter** | **Required** | **Description** |
 |-----------|-----------|-----------|
diff --git a/_api-reference/ingest-apis/processors-list/convert.md b/_api-reference/ingest-apis/processors-list/convert.md
index b15729b561..9454359531 100644
--- a/_api-reference/ingest-apis/processors-list/convert.md
+++ b/_api-reference/ingest-apis/processors-list/convert.md
@@ -14,7 +14,7 @@ Specifying `boolean` will set the field to `true` if its string value is equal t
 
 ## Configuration parameters
 
-The byte processor supports the following parameter options. The parameters `field` and `type` are required. All others are optional.
+The convert processor supports the following parameters.
 
 **Parameter** | **Required** | **Description** |
 |-----------|-----------|-----------|
diff --git a/_api-reference/ingest-apis/processors-list/csv.md b/_api-reference/ingest-apis/processors-list/csv.md
index e69de29bb2..7e76b9dafc 100644
--- a/_api-reference/ingest-apis/processors-list/csv.md
+++ b/_api-reference/ingest-apis/processors-list/csv.md
@@ -0,0 +1,31 @@
+---
+layout: default
+title: CSV
+parent: Ingest processors
+grand_parent: Ingest APIs
+nav_order: 30
+---
+
+# CSV
+
+The CSV ingest processor is used to parse CSV data and store it as individual fields in a document.
+
+## Configuration parameters
+
+The CSV processor supports the following parameters.
+
+**Parameter** | **Required** | **Description** |
+|-----------|-----------|-----------|
+`field` | Required | Name of the field where the data should be converted. |
+`target_field` | Optional | Name of the field to store the converted value. If not specified, the value will be stored in-place in the `field` field. Default is `field`. |
+`type` | Required |
+`ignore_missing` | Optional | If set to true, the processor will not fail if the field does not exist. Default is `false`. |
+`if` | Optional | Conditional expression that determines whether the processor should be deployed.
| +`ignore_failure` | Optional | If set to true, the processor will not fail if an error occurs. | +`on_failure` | Optional | Action to take if an error occurs. | +`tag` | Optional | Tag that can be used to identify the processor. | +`description` | Optional | Brief description of the processor. | + +Following is an example of a convert ingest processor configuration. + +#### Example: Convert processor configuration \ No newline at end of file From f499ddb9d292f6249f595ca9e4d93bf9987a5aba Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 22 Jun 2023 14:21:31 -0600 Subject: [PATCH 056/286] Writing processors backlog documentation Signed-off-by: Melissa Vagi --- .../ingest-apis/processors-list/csv.md | 35 ++++++++++++++----- 1 file changed, 26 insertions(+), 9 deletions(-) diff --git a/_api-reference/ingest-apis/processors-list/csv.md b/_api-reference/ingest-apis/processors-list/csv.md index 7e76b9dafc..6846f42129 100644 --- a/_api-reference/ingest-apis/processors-list/csv.md +++ b/_api-reference/ingest-apis/processors-list/csv.md @@ -3,7 +3,7 @@ layout: default title: CSV parent: Ingest processors grand_parent: Ingest APIs -nav_order: 30 +nav_order: 40 --- # CSV @@ -16,16 +16,33 @@ The CSV processor supports the following parameters. **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| -`field` | Required | Name of the field where the data should be converted. | -`target_field` | Optional | Name of the field to store the converted value. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | -`type` | Required | -`ignore_missing` | Optional | If set to true, the processor will not fail if the field does not exist. Default is `false`. | -`if` | Optional | Conditional expression that determines whether the processor should be deployed. | -`ignore_failure` | Optional | If set to true, the processor will not fail if an error occurs. 
| +`field` | Required | Name of the field to extract data from. | +`target_fields` | Required | Name of the field to store the parsed data in. | +`delimiter` | Optional | The delimiter used to separate the fields in the CSV data. | +`quote` | Optional | The character used to quote fields in the CSV data. | +`ignore_missing` | Optional | If set to `true`, the processor will not fail if the field does not exist. Default is `false`. | +`trim` | Optional | If set to `true`, the processor trims whitespace from the beginning and end of each field. Default is `false`. | +`if` | Optional | Conditional expression that determines whether the processor should be deployed. | `on_failure` | Optional | Action to take if an error occurs. | +`ignore_failure` | Optional | If set to `true`, the processor does not fail if an error occurs. | `tag` | Optional | Tag that can be used to identify the processor. | `description` | Optional | Brief description of the processor. | -Following is an example of a convert ingest processor configuration. +Following is an example of a CSV ingest processor configuration. 
-#### Example: Convert processor configuration
\ No newline at end of file
+#### Example: CSV processor configuration
+
+```json
+{
+  "description": "Parses the CSV data in the `data` field",
+  "processors": [
+    {
+      "csv": {
+        "field": "data",
+        "target_fields": ["field1", "field2", "field3"],
+        "ignore_missing": true
+      }
+    }
+  ]
+}
+```
From 1451142059d826c7cf6602a0302fa4983ce0c366 Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Thu, 22 Jun 2023 15:11:18 -0600
Subject: [PATCH 057/286] Writing processors backlog documentation

Signed-off-by: Melissa Vagi
---
 .../ingest-apis/ingest-processors.md | 89 +++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md
index 3488c0cf0a..9840c36290 100644
--- a/_api-reference/ingest-apis/ingest-processors.md
+++ b/_api-reference/ingest-apis/ingest-processors.md
@@ -20,3 +20,92 @@ GET /_nodes/ingest
 {% include copy-curl.html %}
 
 To configure and deploy ingest processors, make sure you have the necessary permissions and access rights. You can learn more about the processor types within their respective documentation.
+{: .note}
+
+## Set up a processor
+
+Following is an example of how to set up a processor in OpenSearch. Replace `my_index` with the actual name you want to ingest the document into, and adjust the field names and values to match your specific use case.
+
+```python
+# Define the processor configuration
+processor_config = {
+  "description": "Custom single-field processor",
+  "processors": [
+    {
+      "set": {
+        "field": "my_field",
+        "value": "default_value"
+      }
+    }
+  ]
+}
+
+# Create the processor using the OpenSearch Python client
+# (`opensearch` is an already-configured client instance)
+processor_name = "my_single_field_processor"
+opensearch.ingest.put_pipeline(id=processor_name, body=processor_config)
+
+# Test the processor on a single document
+document = {
+  "my_field": "original_value"
+}
+
+# Ingest the document with the processor applied
+result = opensearch.index(index="my_index", body=document, pipeline=processor_name)
+
+# Check the output
+print(result)
+```
+
+## Create a data source for an ingest processor
+
+To create a data source for an ingest processor in OpenSearch, you can use the index template API to define an index template and mapping. Following is an example of how you can create a data source with an ingest processor. Make sure you have OpenSearch running and accessible at the appropriate host and port before sending the request.
+
+```json
+PUT /_index_template/my-index-template
+{
+  "index_patterns": ["my-index-*"],
+  "template": {
+    "settings": {
+      "index": {
+        "number_of_shards": 1,
+        "number_of_replicas": 0
+      }
+    },
+    "mappings": {
+      "properties": {
+        "my_field": {
+          "type": "text"
+        }
+      }
+    }
+  },
+  "priority": 100,
+  "composed_of": ["my-pipeline"]
+}
```

+| Name | Description |
+|------|-------------|
+| `PUT` | Request used to create a new index template. In the example, replace "my-index-template" with your desired template name. |
+| `index_patterns` | Field that specifies the pattern of index names to which the template should be applied. In the example, the pattern "my-index-*" is used, which matches all indexes whose names start with "my-index-". Replace it with your desired index pattern. |
+|`settings` | Defines index-level settings. 
Adjust these values according to your requirements. |
+| `mappings` | Defines the field mappings for the index. Modify the field name and data type according to your needs. |
+|`priority` | Optional field that can be used to control the order in which the templates are evaluated. A higher value indicates a higher priority. |
+| `composed_of` | Field that specifies the component templates to merge into this template. Note that `composed_of` takes component template names, not ingest pipeline names; to apply an ingest pipeline (such as "my-pipeline" in the example) to documents ingested into the index, set the `index.default_pipeline` index setting instead. |
+
+## Deleting a processor or data source
+
+To delete a processor, send a `DELETE` request to the ingest pipeline endpoint, for example, `DELETE /_ingest/pipeline/my_single_field_processor`.
+
+To delete a data source, delete the index template that defines it, for example, `DELETE /_index_template/my-index-template`.
+
From 86057864c3bd330a0466c95306803065d1ae3369 Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Thu, 22 Jun 2023 15:24:30 -0600
Subject: [PATCH 058/286] Writing processors backlog documentation

Signed-off-by: Melissa Vagi
---
 .../ingest-apis/processors-list/date-index-name.md | 7 +++++++
 _api-reference/ingest-apis/processors-list/date.md | 7 +++++++
 _api-reference/ingest-apis/processors-list/dissect.md | 7 +++++++
 3 files changed, 21 insertions(+)

diff --git a/_api-reference/ingest-apis/processors-list/date-index-name.md b/_api-reference/ingest-apis/processors-list/date-index-name.md
index e69de29bb2..13e56d99d6 100644
--- a/_api-reference/ingest-apis/processors-list/date-index-name.md
+++ b/_api-reference/ingest-apis/processors-list/date-index-name.md
@@ -0,0 +1,7 @@
+---
+layout: default
+title: Date index name
+parent: Ingest processors
+grand_parent: Ingest APIs
+nav_order: 60
+---
\ No newline at end of file
diff --git a/_api-reference/ingest-apis/processors-list/date.md b/_api-reference/ingest-apis/processors-list/date.md
index e69de29bb2..1326b4e425 100644
--- a/_api-reference/ingest-apis/processors-list/date.md
+++ b/_api-reference/ingest-apis/processors-list/date.md
@@ -0,0 +1,7 @@
+---
+layout: default
+title: Date
+parent: Ingest processors
+grand_parent: Ingest APIs
+nav_order: 50
+---
\ No newline at end of file
diff --git 
a/_api-reference/ingest-apis/processors-list/dissect.md b/_api-reference/ingest-apis/processors-list/dissect.md index e69de29bb2..bc3bcefd79 100644 --- a/_api-reference/ingest-apis/processors-list/dissect.md +++ b/_api-reference/ingest-apis/processors-list/dissect.md @@ -0,0 +1,7 @@ +--- +layout: default +title: Dissect +parent: Ingest processors +grand_parent: Ingest APIs +nav_order: 70 +--- \ No newline at end of file From f33877856a4346fa5e359f4c9302ff5134512e48 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 22 Jun 2023 16:37:37 -0600 Subject: [PATCH 059/286] Writing processors backlog documentation Signed-off-by: Melissa Vagi --- .../ingest-apis/processors-list/csv.md | 4 +- .../ingest-apis/processors-list/date.md | 45 ++++++++++++++++++- 2 files changed, 46 insertions(+), 3 deletions(-) diff --git a/_api-reference/ingest-apis/processors-list/csv.md b/_api-reference/ingest-apis/processors-list/csv.md index 6846f42129..31c35b9236 100644 --- a/_api-reference/ingest-apis/processors-list/csv.md +++ b/_api-reference/ingest-apis/processors-list/csv.md @@ -17,7 +17,7 @@ The CSV processor supports the following parameters. **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| `field` | Required | Name of the field to extract data from. | -`target_fields` | Required | Name of the field to store the parsed data in. | +`target_field` | Required | Name of the field to store the parsed data in. | `delimiter` | Optional | The delimiter used to separate the fields in the CSV data. | `quote` | Optional | The character used to quote fields in the CSV data. | `ignore_missing` | Optional | If set to `true`, the processor will not fail if the field does not exist. Default is `false`. | @@ -39,7 +39,7 @@ Following is an example of a CSV ingest processor configuration. 
{
   "csv": {
     "field": "data",
-    "target_fields": ["field1", "field2", "field3"],
+    "target_field": ["field1", "field2", "field3"],
     "ignore_missing": true
   }
 }
diff --git a/_api-reference/ingest-apis/processors-list/date.md b/_api-reference/ingest-apis/processors-list/date.md
index 1326b4e425..45d42ea072 100644
--- a/_api-reference/ingest-apis/processors-list/date.md
+++ b/_api-reference/ingest-apis/processors-list/date.md
@@ -4,4 +4,47 @@ title: Date
 parent: Ingest processors
 grand_parent: Ingest APIs
 nav_order: 50
----
\ No newline at end of file
+---
+
+# Date
+
+The date ingest processor is used to parse dates from fields in a document and store them as a timestamp.
+
+## Configuration parameters
+
+The date processor supports the following parameters.
+
+**Parameter** | **Required** | **Description** |
+|-----------|-----------|-----------|
+`field` | Required | Name of the field to extract data from. |
+`target_field` | Optional | Name of the field to store the parsed data in. |
+`format` | Required | The format of the date in the `field` field. The default format is `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`. |
+`locale` | Optional | The locale to use when parsing the date. Default is `ENGLISH`. |
+`timezone` | Optional | The timezone to use when parsing the date. |
+`ignore_missing` | Optional | If set to `true`, the processor will not fail if the field does not exist. Default is `false`. |
+`if` | Optional | Conditional expression that determines whether the processor should be deployed. |
+`on_failure` | Optional | Action to take if an error occurs. |
+`ignore_failure` | Optional | If set to `true`, the processor does not fail if an error occurs. |
+`tag` | Optional | Tag that can be used to identify the processor. |
+`description` | Optional | Brief description of the processor. |
+
+Following is an example of a date ingest processor configuration.
+
+#### Example: Date processor configuration
+
+```json
+{
+  "description": "Parses the date string in the `date_string` field and stores the parsed date in the `date_timestamp` field",
+  "processors": [
+    {
+      "date": {
+        "field": "date_string",
+        "target_field": "date_timestamp",
+        "format": "yyyy-MM-dd'T'HH:mm:ss.SSSZZ",
+        "locale": "en-US",
+        "ignore_missing": true
+      }
+    }
+  ]
+}
+```
From 0fa8fd6d94e13f0806d75c2f2e6746a199f28dfe Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Fri, 23 Jun 2023 08:44:32 -0600
Subject: [PATCH 060/286] Revert "updatestomigrationdoc (#4343)"

This reverts commit 812b38d59c344def28612c8548a867a9e1b5e0e8.

Signed-off-by: Melissa Vagi
---
 _upgrade-to/upgrade-to.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/_upgrade-to/upgrade-to.md b/_upgrade-to/upgrade-to.md
index b4b9cb6126..fc82528c53 100644
--- a/_upgrade-to/upgrade-to.md
+++ b/_upgrade-to/upgrade-to.md
@@ -18,8 +18,6 @@ If your existing cluster runs an older version of Elasticsearch OSS, the first s
 
 Cluster restart upgrades work between minor versions (for example, 6.5 to 6.8) and the next major version (for example, 6.x to 7.10.2). Cluster restart upgrades are faster to perform and require fewer intermediate upgrades, but require downtime.
 
-To migrate a post-fork version of Elasticsearch (7.11+) to OpenSearch, you can use Logstash. You'll need to employ the Elasticsearch input plugin within Logstash to extract data from the Elasticsearch cluster, and the [Logstash Output OpenSearch plugin](https://github.com/opensearch-project/logstash-output-opensearch#configuration-for-logstash-output-opensearch-plugin) to write the data to the OpenSearch 2.x cluster. We suggest using Logstash version 7.13.4 or earlier, as newer versions may encounter compatibility issues when establishing a connection with OpenSearch due to changes introduced by Elasticsearch subsequent to the fork. 
We strongly recommend that users test this solution with their own data to ensure effectiveness. -{: .note} ## Migration paths @@ -64,7 +62,7 @@ If you are migrating an Open Distro for Elasticsearch cluster, we recommend firs sudo yum install elasticsearch-oss-7.10.2 --enablerepo=elasticsearch ``` - For tarball installations, extract to a new directory to ensure you **do not overwrite** your `config`, `data`, and `logs` directories. Ideally, these directories should have their own, independent paths and *not* be colocated with the Elasticsearch application directory. Then set the `ES_PATH_CONF` environment variable to the directory that contains `elasticsearch.yml` (for example, `/etc/elasticsearch/`). In `elasticsearch.yml`, set `path.data` and `path.logs` to your `data` and `logs` directories (for example, `/var/lib/elasticsearch` and `/var/log/opensearch`). + For tarball installations, extract to a new directory to ensure you **do not overwrite** your `config`, `data`, and `logs` directories. Ideally, these directories should have their own, independent paths and *not* be colocated with the Elasticsearch application directory. Then set the `ES_PATH_CONF` environment variable to the directory that contains `elasticsearch.yml` (for example, `/etc/elasticesarch/`). In `elasticsearch.yml`, set `path.data` and `path.logs` to your `data` and `logs` directories (for example, `/var/lib/elasticsearch` and `/var/log/opensearch`). 1. Restart Elasticsearch OSS on the node (rolling) or all nodes (cluster restart). From 9f6913c36631e8bc822e2af43150815e5b175dbd Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 23 Jun 2023 08:44:42 -0600 Subject: [PATCH 061/286] Revert "Update links in field types index (#4379)" This reverts commit 5cbef464b7c6ad8dd99378769d10027581fbe2ff. 
Signed-off-by: Melissa Vagi --- _field-types/supported-field-types/index.md | 26 ++++++++++----------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/_field-types/supported-field-types/index.md b/_field-types/supported-field-types/index.md index 5dd2b8d694..88fe8b038b 100644 --- a/_field-types/supported-field-types/index.md +++ b/_field-types/supported-field-types/index.md @@ -15,19 +15,19 @@ You can specify data types for your fields when creating a mapping. The followin Category | Field types and descriptions :--- | :--- -Alias | [`alias`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/alias/): An additional name for an existing field. -Binary | [`binary`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/binary/): A binary value in Base64 encoding. -[Numeric]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/numeric/) | A numeric value (`byte`, `double`, `float`, `half_float`, `integer`, `long`, [`unsigned_long`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/unsigned-long/), `scaled_float`, `short`). -Boolean | [`boolean`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/boolean/): A Boolean value. -[Date]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/dates/)| [`date`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/): A date stored in milliseconds.
[`date_nanos`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date-nanos/): A date stored in nanoseconds. -IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/): An IP address in IPv4 or IPv6 format. -[Range]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/range/) | A range of values (`integer_range`, `long_range`, `double_range`, `float_range`, `date_range`, `ip_range`). -[Object]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/object/)| [`object`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/object/): A JSON object.
[`nested`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/nested/): Used when objects in an array need to be indexed independently as separate documents.
[`flat_object`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/flat-object/): A JSON object treated as a string.
[`join`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/): Establishes a parent-child relationship between documents in the same index. -[String]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/string/)|[`keyword`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/keyword/): Contains a string that is not analyzed.
[`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/): Contains a string that is analyzed.
[`token_count`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/token-count/): Stores the number of analyzed tokens in a string. -[Autocomplete]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/autocomplete/) |[`completion`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/completion/): Provides autocomplete functionality through a completion suggester.
[`search_as_you_type`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/search-as-you-type/): Provides search-as-you-type functionality using both prefix and infix completion. -[Geographic]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geographic/)| [`geo_point`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-point/): A geographic point.
[`geo_shape`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-shape/): A geographic shape. -[Rank]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/rank/) | Boosts or decreases the relevance score of documents (`rank_feature`, `rank_features`). -Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query. +Alias | [`alias`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/alias/): An additional name for an existing field. +Binary | [`binary`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary/): A binary value in Base64 encoding. +[Numeric]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/numeric/) | A numeric value (`byte`, `double`, `float`, `half_float`, `integer`, `long`, [`unsigned_long`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/unsigned-long/), `scaled_float`, `short`). +Boolean | [`boolean`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/boolean/): A Boolean value. +[Date]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/dates/)| [`date`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/): A date stored in milliseconds.
[`date_nanos`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date-nanos/): A date stored in nanoseconds. +IP | [`ip`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/ip/): An IP address in IPv4 or IPv6 format. +[Range]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/range/) | A range of values (`integer_range`, `long_range`, `double_range`, `float_range`, `date_range`, `ip_range`). +[Object]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object/)| [`object`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object/): A JSON object.
[`nested`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/nested/): Used when objects in an array need to be indexed independently as separate documents.
[`flat_object`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/flat-object/): A JSON object treated as a string.
[`join`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/join/): Establishes a parent-child relationship between documents in the same index. +[String]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/string/)|[`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/): Contains a string that is not analyzed.
[`text`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/): Contains a string that is analyzed.
[`token_count`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/token-count/): Stores the number of analyzed tokens in a string. +[Autocomplete]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/autocomplete/) |[`completion`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/completion/): Provides autocomplete functionality through a completion suggester.
[`search_as_you_type`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/search-as-you-type/): Provides search-as-you-type functionality using both prefix and infix completion. +[Geographic]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geographic/)| [`geo_point`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point/): A geographic point.
[`geo_shape`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-shape/): A geographic shape. +[Rank]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/rank/) | Boosts or decreases the relevance score of documents (`rank_feature`, `rank_features`). +Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/percolator/): Specifies to treat this field as a query. ## Arrays From 44533ad8dd72f05a53054c4f50e9627455ec7315 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 23 Jun 2023 08:44:51 -0600 Subject: [PATCH 062/286] Revert "Reformat supported field types index page (#4349)" This reverts commit e4b81e68a5847b347c8494c775084e13cf88be55. Signed-off-by: Melissa Vagi --- _config.yml | 20 +++++++------- _field-types/supported-field-types/index.md | 30 ++++++++++----------- 2 files changed, 26 insertions(+), 24 deletions(-) diff --git a/_config.yml b/_config.yml index da62259ade..0d6ac85575 100644 --- a/_config.yml +++ b/_config.yml @@ -110,7 +110,7 @@ just_the_docs: name: OpenSearch Dashboards nav_fold: true tuning-your-cluster: - name: Creating and tuning your cluster + name: Tuning your cluster nav_fold: true security: name: Security in OpenSearch @@ -118,24 +118,27 @@ just_the_docs: security-analytics: name: Security analytics nav_fold: true - field-types: - name: Mappings and field types - nav_fold: true - query-dsl: - name: Query DSL, Aggregations, and Analyzers - nav_fold: true search-plugins: name: Search nav_fold: true ml-commons-plugin: name: Machine learning nav_fold: true + tuning-your-cluster: + name: Creating and tuning your cluster + nav_fold: true monitoring-your-cluster: name: Monitoring your cluster nav_fold: true observing-your-data: name: Observability nav_fold: true + query-dsl: + name: Query DSL, Aggregations, and Analyzers + nav_fold: true + field-types: + name: Mappings and field types + nav_fold: true clients: name: Clients nav_fold: true @@ -226,5 +229,4 @@ exclude: - vendor/gems/ - 
vendor/ruby/ - README.md - - .idea - - templates \ No newline at end of file + - .idea \ No newline at end of file diff --git a/_field-types/supported-field-types/index.md b/_field-types/supported-field-types/index.md index 88fe8b038b..3cb8bff8cd 100644 --- a/_field-types/supported-field-types/index.md +++ b/_field-types/supported-field-types/index.md @@ -13,21 +13,21 @@ redirect_from: You can specify data types for your fields when creating a mapping. The following table lists all data field types that OpenSearch supports. -Category | Field types and descriptions -:--- | :--- -Alias | [`alias`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/alias/): An additional name for an existing field. -Binary | [`binary`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary/): A binary value in Base64 encoding. -[Numeric]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/numeric/) | A numeric value (`byte`, `double`, `float`, `half_float`, `integer`, `long`, [`unsigned_long`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/unsigned-long/), `scaled_float`, `short`). -Boolean | [`boolean`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/boolean/): A Boolean value. -[Date]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/dates/)| [`date`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/): A date stored in milliseconds.
[`date_nanos`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date-nanos/): A date stored in nanoseconds. -IP | [`ip`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/ip/): An IP address in IPv4 or IPv6 format. -[Range]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/range/) | A range of values (`integer_range`, `long_range`, `double_range`, `float_range`, `date_range`, `ip_range`). -[Object]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object/)| [`object`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object/): A JSON object.
[`nested`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/nested/): Used when objects in an array need to be indexed independently as separate documents.
[`flat_object`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/flat-object/): A JSON object treated as a string.
[`join`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/join/): Establishes a parent-child relationship between documents in the same index. -[String]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/string/)|[`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/): Contains a string that is not analyzed.
[`text`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/): Contains a string that is analyzed.
[`token_count`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/token-count/): Stores the number of analyzed tokens in a string. -[Autocomplete]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/autocomplete/) |[`completion`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/completion/): Provides autocomplete functionality through a completion suggester.
[`search_as_you_type`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/search-as-you-type/): Provides search-as-you-type functionality using both prefix and infix completion. -[Geographic]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geographic/)| [`geo_point`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point/): A geographic point.
[`geo_shape`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-shape/): A geographic shape. -[Rank]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/rank/) | Boosts or decreases the relevance score of documents (`rank_feature`, `rank_features`). -Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/percolator/): Specifies to treat this field as a query. +Field data type | Description +:--- | :--- +[`alias`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/alias/) | An additional name for an existing field. +[`binary`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary/) | A binary value in Base64 encoding. +[Numeric]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/numeric/) | `byte`, `double`, `float`, `half_float`, `integer`, `long`, `unsigned_long`, `scaled_float`, `short`. +[`boolean`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/boolean/) | A Boolean value. +[Date]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/dates/) | `date`, `date_nanos`. +[`ip`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/ip/) | An IP address in IPv4 or IPv6 format. +[Range]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/range/) | `integer_range`, `long_range`,`double_range`, `float_range`, `date_range`,`ip_range`. +[Object]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object/) | `object`, `nested`, `join`. +String | [`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/), [`text`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/), [`token_count`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/token-count/). +[Autocomplete]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/autocomplete/) | `completion`, `search_as_you_type`. 
+[Geographic]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geographic/) | `geo_point`, `geo_shape`. +[Rank]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/rank/) | `rank_feature`, `rank_features`. +[`percolator`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/percolator/) | Specifies to treat this field as a query. ## Arrays From 5756737b0b4730fd249edf5fec679bea8e3dc306 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 23 Jun 2023 08:44:56 -0600 Subject: [PATCH 063/286] Revert "Update nested.md (#4363)" This reverts commit aead297cd08f84ce1948df363f431bfbe9664a03. Signed-off-by: Melissa Vagi --- _field-types/supported-field-types/nested.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_field-types/supported-field-types/nested.md b/_field-types/supported-field-types/nested.md index e6f2eec6c3..d09caf0ea8 100644 --- a/_field-types/supported-field-types/nested.md +++ b/_field-types/supported-field-types/nested.md @@ -203,5 +203,5 @@ Parameter | Description :--- | :--- [`dynamic`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object#the-dynamic-parameter) | Specifies whether new fields can be dynamically added to this object. Valid values are `true`, `false`, and `strict`. Default is `true`. `include_in_parent` | A Boolean value that specifies whether all fields in the child nested object should also be added to the parent document in flattened form. Default is `false`. -`include_in_root` | A Boolean value that specifies whether all fields in the child nested object should also be added to the root document in flattened form. Default is `false`. +`incude_in_root` | A Boolean value that specifies whether all fields in the child nested object should also be added to the root document in flattened form. Default is `false`. `properties` | Fields of this object, which can be of any supported type. New properties can be dynamically added to this object if `dynamic` is set to `true`. 
From e50c25b5f141a397d40cd6ce8d4803a4fb193ee3 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 23 Jun 2023 08:45:01 -0600 Subject: [PATCH 064/286] Revert "Add model access control documentation for ML Commons (#4223)" This reverts commit e69009b5db7f3850d3c88e57ac575d1d701be74c. Signed-off-by: Melissa Vagi --- _ml-commons-plugin/algorithms.md | 10 +- _ml-commons-plugin/api.md | 231 +++----- _ml-commons-plugin/index.md | 11 +- _ml-commons-plugin/model-access-control.md | 593 --------------------- 4 files changed, 74 insertions(+), 771 deletions(-) delete mode 100644 _ml-commons-plugin/model-access-control.md diff --git a/_ml-commons-plugin/algorithms.md b/_ml-commons-plugin/algorithms.md index 1db8b432a9..7fccd92d8b 100644 --- a/_ml-commons-plugin/algorithms.md +++ b/_ml-commons-plugin/algorithms.md @@ -27,7 +27,7 @@ distance_type | enum, such as `EUCLIDEAN`, `COSINE`, or `L1` | The type of measu ### APIs -* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#training-the-model) +* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-model) * [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#predict) * [Train and predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-and-predict) @@ -77,7 +77,7 @@ optimizerType | OptimizerType | The optimizer used in the model. | SIMPLE_SGD ### APIs -* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#training-the-model) +* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-model) * [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#predict) ### Example @@ -189,7 +189,7 @@ time_zone | string | The time zone for the `time_field` field. 
| "UTC" ### APIs -* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#training-the-model) +* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-model) * [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#predict) * [Train and predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-and-predict) @@ -211,7 +211,7 @@ RCF Summarize is a clustering algorithm based on the Clustering Using Representa ### APIs -* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#training-the-model) +* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-model) * [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#predict) * [Train and predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-and-predict) @@ -429,7 +429,7 @@ A classification algorithm, logistic regression models the probability of a disc ### APIs -* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#training-the-model) +* [Train]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#train-model) * [Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/#predict) ### Example: Train/Predict with Iris data diff --git a/_ml-commons-plugin/api.md b/_ml-commons-plugin/api.md index 8e4535eb54..3d5fe2358e 100644 --- a/_ml-commons-plugin/api.md +++ b/_ml-commons-plugin/api.md @@ -9,49 +9,31 @@ nav_order: 99 --- -
- - Table of contents - - {: .text-delta } +#### Table of contents - TOC {:toc} -
+ --- -The ML Commons API lets you train machine learning (ML) algorithms synchronously and asynchronously, make predictions with that trained model, and train and predict with the same dataset. +The Machine Learning (ML) commons API lets you train ML algorithms synchronously and asynchronously, make predictions with that trained model, and train and predict with the same data set. -To train tasks through the API, three inputs are required: +In order to train tasks through the API, three inputs are required. - Algorithm name: Must be one of a [FunctionName](https://github.com/opensearch-project/ml-commons/blob/1.3/common/src/main/java/org/opensearch/ml/common/parameter/FunctionName.java). This determines what algorithm the ML Engine runs. To add a new function, see [How To Add a New Function](https://github.com/opensearch-project/ml-commons/blob/main/docs/how-to-add-new-function.md). -- Model hyperparameters: Adjust these parameters to improve model accuracy. -- Input data: The data that trains the ML model, or applies the ML models to predictions. You can input data in two ways, query against your index or use a data frame. - -## Model access control considerations - -For clusters with model access control enabled, users can perform API operations on models in model groups with specified access levels as follows: - -- `public` model group: Any user. -- `restricted` model group: Only the model owner or users who share at least one backend role with the model group. -- `private` model group: Only the model owner. - -For clusters with model access control disabled, any user can perform API operations on models in any model group. +- Model hyper parameters: Adjust these parameters to make the model train better. +- Input data: The data input that trains the ML model, or applies the ML models to predictions. You can input data in two ways, query against your index or use data frame. -Admin users can perform API operations for models in any model group. 
-For more information, see [Model access control]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control/). +## Train model - -## Training the model - -The train API operation trains a model based on a selected algorithm. Training can occur both synchronously and asynchronously. +The train operation trains a model based on a selected algorithm. Training can occur both synchronously and asynchronously. ### Request -The following examples use the k-means algorithm to train index data. +The following examples use the kmeans algorithm to train index data. -**Train with k-means synchronously** +**Train with kmeans synchronously** ```json POST /_plugins/_ml/_train/kmeans @@ -70,9 +52,8 @@ POST /_plugins/_ml/_train/kmeans ] } ``` -{% include copy-curl.html %} -**Train with k-means asynchronously** +**Train with kmeans asynchronously** ```json POST /_plugins/_ml/_train/kmeans?async=true @@ -91,13 +72,12 @@ POST /_plugins/_ml/_train/kmeans?async=true ] } ``` -{% include copy-curl.html %} ### Response -**Synchronous** +**Synchronously** -For synchronous responses, the API returns the `model_id`, which can be used to get or delete a model. +For synchronous responses, the API returns the model_id, which can be used to get or delete a model. ```json { @@ -106,9 +86,9 @@ For synchronous responses, the API returns the `model_id`, which can be used to } ``` -**Asynchronous** +**Asynchronously** -For asynchronous responses, the API returns the `task_id`, which can be used to get or delete a task. +For asynchronous responses, the API returns the task_id, which can be used to get or delete a task. ```json { @@ -119,56 +99,30 @@ For asynchronous responses, the API returns the `task_id`, which can be used to ## Getting model information -You can retrieve model information using the `model_id`. - -For information about user access for this API, see [Model access control considerations](#model-access-control-considerations). 
- -### Path and HTTP methods +You can retrieve information on your model using the model_id. ```json GET /_plugins/_ml/models/ ``` -{% include copy-curl.html %} -The response contains the following model information: +The API returns information on the model, the algorithm used, and the content found within the model. ```json { -"name" : "all-MiniLM-L6-v2_onnx", -"algorithm" : "TEXT_EMBEDDING", -"version" : "1", -"model_format" : "TORCH_SCRIPT", -"model_state" : "LOADED", -"model_content_size_in_bytes" : 83408741, -"model_content_hash_value" : "9376c2ebd7c83f99ec2526323786c348d2382e6d86576f750c89ea544d6bbb14", -"model_config" : { - "model_type" : "bert", - "embedding_dimension" : 384, - "framework_type" : "SENTENCE_TRANSFORMERS", - "all_config" : """{"_name_or_path":"nreimers/MiniLM-L6-H384-uncased","architectures":["BertModel"],"attention_probs_dropout_prob":0.1,"gradient_checkpointing":false,"hidden_act":"gelu","hidden_dropout_prob":0.1,"hidden_size":384,"initializer_range":0.02,"intermediate_size":1536,"layer_norm_eps":1e-12,"max_position_embeddings":512,"model_type":"bert","num_attention_heads":12,"num_hidden_layers":6,"pad_token_id":0,"position_embedding_type":"absolute","transformers_version":"4.8.2","type_vocab_size":2,"use_cache":true,"vocab_size":30522}""" -}, -"created_time" : 1665961344044, -"last_uploaded_time" : 1665961373000, -"last_loaded_time" : 1665961815959, -"total_chunks" : 9 + "name" : "KMEANS", + "algorithm" : "KMEANS", + "version" : 1, + "content" : "" } ``` ## Registering a model -Before you register a model, you must [register a model group]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control#registering-a-model-group) for the model. -{: .important} - -All versions of a particular model are held in a model group. After you register a model group, you can register a model to the model group. ML Commons splits the model into smaller chunks and saves those chunks in the model's index. 
- -For information about user access for this API, see [Model access control considerations](#model-access-control-considerations). - -### Path and HTTP methods +Use the register operation to register a custom model to a model index. ML Commons splits the model into smaller chunks and saves those chunks in the model's index. ```json POST /_plugins/_ml/models/_register ``` -{% include copy-curl.html %} ### Request fields @@ -176,13 +130,11 @@ All request fields are required. Field | Data type | Description :--- | :--- | :--- -`name`| String | The model's name. | -`version` | Integer | The model's version number. | -`model_format` | String | The portable format of the model file. Currently only supports `TORCH_SCRIPT`. | -`model_group_id` | String | The model group ID for the model. -`model_content_hash_value` | String | The model content hash generated using the SHA-256 hashing algorithm. -`model_config` | JSON object | The model's configuration, including the `model_type`, `embedding_dimension`, and `framework_type`. `all_config` is an optional JSON string that contains all model configurations. | -`url` | String | The URL that contains the model. | +`name`| string | The name of the model. | +`version` | integer | The version number of the model. | +`model_format` | string | The portable format of the model file. Currently only supports `TORCH_SCRIPT`. | +`model_config` | json object | The model's configuration, including the `model_type`, `embedding_dimension`, and `framework_type`. `all_config` is an optional JSON string which contains all model configurations. | +`url` | string | The URL which contains the model. 
| ### Example @@ -191,22 +143,18 @@ The following example request registers a version `1.0.0` of an NLP sentence tra ```json POST /_plugins/_ml/models/_register { - "name": "all-MiniLM-L6-v2", - "version": "1.0.0", - "description": "test model", - "model_format": "TORCH_SCRIPT", - "model_group_id": "FTNlQ4gBYW0Qyy5ZoxfR", - "model_content_hash_value": "c15f0d2e62d872be5b5bc6c84d2e0f4921541e29fefbef51d59cc10a8ae30e0f", - "model_config": { - "model_type": "bert", - "embedding_dimension": 384, - "framework_type": "sentence_transformers", - "all_config": "{\"_name_or_path\":\"nreimers/MiniLM-L6-H384-uncased\",\"architectures\":[\"BertModel\"],\"attention_probs_dropout_prob\":0.1,\"gradient_checkpointing\":false,\"hidden_act\":\"gelu\",\"hidden_dropout_prob\":0.1,\"hidden_size\":384,\"initializer_range\":0.02,\"intermediate_size\":1536,\"layer_norm_eps\":1e-12,\"max_position_embeddings\":512,\"model_type\":\"bert\",\"num_attention_heads\":12,\"num_hidden_layers\":6,\"pad_token_id\":0,\"position_embedding_type\":\"absolute\",\"transformers_version\":\"4.8.2\",\"type_vocab_size\":2,\"use_cache\":true,\"vocab_size\":30522}" - }, - "url": "https://artifacts.opensearch.org/models/ml-models/huggingface/sentence-transformers/all-MiniLM-L6-v2/1.0.1/torch_script/sentence-transformers_all-MiniLM-L6-v2-1.0.1-torch_script.zip" + "name": "all-MiniLM-L6-v2", + "version": "1.0.0", + "description": "test model", + "model_format": "TORCH_SCRIPT", + "model_config": { + "model_type": "bert", + "embedding_dimension": 384, + "framework_type": "sentence_transformers", + }, + "url": "https://github.com/opensearch-project/ml-commons/raw/2.x/ml-algorithms/src/test/resources/org/opensearch/ml/engine/algorithms/text_embedding/all-MiniLM-L6-v2_torchscript_sentence-transformer.zip?raw=true" } ``` -{% include copy-curl.html %} ### Response @@ -219,27 +167,18 @@ OpenSearch responds with the `task_id` and task `status`. 
} ``` -To see the status of your model registration and retrieve the model ID created for the new model version, pass the `task_id` as a path parameter to the Tasks API: - -```json -GET /_plugins/_ml/tasks/ -``` -{% include copy-curl.html %} - -The response contains the model ID of the model version: +To see the status of your model registration, enter the `task_id` in the [task API] ... ```json { - "model_id": "Qr1YbogBYOqeeqR7sI9L", - "task_type": "DEPLOY_MODEL", - "function_name": "TEXT_EMBEDDING", - "state": "COMPLETED", - "worker_node": [ - "N77RInqjTSq_UaLh1k0BUg" - ], - "create_time": 1685478486057, - "last_update_time": 1685478491090, - "is_async": true + "model_id" : "WWQI44MBbzI2oUKAvNUt", + "task_type" : "UPLOAD_MODEL", + "function_name" : "TEXT_EMBEDDING", + "state" : "REGISTERED", + "worker_node" : "KzONM8c8T4Od-NoUANQNGg", + "create_time" : 1665961344003, + "last_update_time" : 1665961373047, + "is_async" : true } ``` @@ -247,10 +186,6 @@ The response contains the model ID of the model version: The deploy model operation reads the model's chunks from the model index and then creates an instance of the model to cache into memory. This operation requires the `model_id`. -For information about user access for this API, see [Model access control considerations](#model-access-control-considerations). - -### Path and HTTP methods - ```json POST /_plugins/_ml/models//_deploy ``` @@ -262,7 +197,6 @@ In this example request, OpenSearch deploys the model to any available OpenSearc ```json POST /_plugins/_ml/models/WWQI44MBbzI2oUKAvNUt/_deploy ``` -{% include copy-curl.html %} ### Example: Deploying to a specific node @@ -274,7 +208,6 @@ POST /_plugins/_ml/models/WWQI44MBbzI2oUKAvNUt/_deploy "node_ids": ["4PLK7KJWReyX0oWKnBA8nA"] } ``` -{% include copy-curl.html %} ### Response @@ -287,11 +220,7 @@ POST /_plugins/_ml/models/WWQI44MBbzI2oUKAvNUt/_deploy ## Undeploying a model -To undeploy a model from memory, use the undeploy operation. 
- -For information about user access for this API, see [Model access control considerations](#model-access-control-considerations). - -### Path and HTTP methods +To undeploy a model from memory, use the undeploy operation: ```json POST /_plugins/_ml/models//_undeploy @@ -302,7 +231,6 @@ POST /_plugins/_ml/models//_undeploy ```json POST /_plugins/_ml/models/MGqJhYMBbbh0ushjm8p_/_undeploy ``` -{% include copy-curl.html %} ### Response: Undeploying a model from all ML nodes @@ -325,7 +253,7 @@ POST /_plugins/_ml/models/_undeploy "model_ids": ["KDo2ZYQB-v9VEDwdjkZ4"] } ``` -{% include copy-curl.html %} + ### Response: Undeploying specific models from specific nodes @@ -359,7 +287,6 @@ POST /_plugins/_ml/models/_undeploy "model_ids": ["KDo2ZYQB-v9VEDwdjkZ4"] } ``` -{% include copy-curl.html %} ### Response: Undeploying specific models from all nodes @@ -375,24 +302,15 @@ POST /_plugins/_ml/models/_undeploy ## Searching for a model -Use this command to search for models you've already created. +Use this command to search models you've already created. -The response will contain only those model versions to which you have access. For example, if you send a match all query, model versions for the following model group types will be returned: - -- All public model groups in the index. -- Private model groups for which you are the model owner. -- Model groups that share at least one backend role with your backend roles. - -For more information, see [Model access control]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control/). 
- -### Path and HTTP methods ```json POST /_plugins/_ml/models/_search {query} ``` -### Example: Searching for all models +### Example: Querying all models ```json POST /_plugins/_ml/models/_search @@ -403,9 +321,8 @@ POST /_plugins/_ml/models/_search "size": 1000 } ``` -{% include copy-curl.html %} -### Example: Searching for models with algorithm "FIT_RCF" +### Example: Querying models with algorithm "FIT_RCF" ```json POST /_plugins/_ml/models/_search @@ -419,7 +336,6 @@ POST /_plugins/_ml/models/_search } } ``` -{% include copy-curl.html %} ### Response @@ -477,14 +393,9 @@ POST /_plugins/_ml/models/_search Deletes a model based on the `model_id`. -For information about user access for this API, see [Model access control considerations](#model-access-control-considerations). - -### Path and HTTP methods - ```json DELETE /_plugins/_ml/models/ ``` -{% include copy-curl.html %} The API returns the following: @@ -519,8 +430,8 @@ GET /_plugins/_ml/profile/tasks Parameter | Data type | Description :--- | :--- | :--- -`model_id` | String | Returns runtime data for a specific model. You can string together multiple `model_id`s to return multiple model profiles. -`tasks`| String | Returns runtime data for a specific task. You can string together multiple `task_id`s to return multiple task profiles. +model_id | string | Returns runtime data for a specific model. You can string together multiple `model_id`s to return multiple model profiles. +tasks | string | Returns runtime data for a specific task. You can string together multiple `task_id`s to return multiple task profiles. ### Request fields @@ -528,11 +439,11 @@ All profile body request fields are optional. Field | Data type | Description :--- | :--- | :--- -`node_ids` | String | Returns all tasks and profiles from a specific node. -`model_ids` | String | Returns runtime data for a specific model. You can string together multiple model IDs to return multiple model profiles. 
-`task_ids` | String | Returns runtime data for a specific task. You can string together multiple task IDs to return multiple task profiles. -`return_all_tasks` | Boolean | Determines whether or not a request returns all tasks. When set to `false`, task profiles are left out of the response. -`return_all_models` | Boolean | Determines whether or not a profile request returns all models. When set to `false`, model profiles are left out of the response. +node_ids | string | Returns all tasks and profiles from a specific node. +model_ids | string | Returns runtime data for a specific model. You can string together multiple `model_id`s to return multiple model profiles. +task_ids | string | Returns runtime data for a specific task. You can string together multiple `task_id`s to return multiple task profiles. +return_all_tasks | boolean | Determines whether or not a request returns all tasks. When set to `false` task profiles are left out of the response. +return_all_models | boolean | Determines whether or not a profile request returns all models. When set to `false` model profiles are left out of the response. ### Example: Returning all tasks and models on a specific node @@ -544,7 +455,6 @@ GET /_plugins/_ml/profile "return_all_models": true } ``` -{% include copy-curl.html %} ### Response: Returning all tasks and models on a specific node @@ -590,10 +500,6 @@ GET /_plugins/_ml/profile ML Commons can predict new data with your trained model either from indexed data or a data frame. To use the Predict API, the `model_id` is required. -For information about user access for this API, see [Model access control considerations](#model-access-control-considerations). 
- -### Path and HTTP methods - ```json POST /_plugins/_ml/_predict// ``` @@ -612,7 +518,6 @@ POST /_plugins/_ml/_predict/kmeans/ ] } ``` -{% include copy-curl.html %} ### Response @@ -682,14 +587,15 @@ POST /_plugins/_ml/_predict/kmeans/ ## Train and predict -Use to train and then immediately predict against the same training dataset. Can only be used with unsupervised learning models and the following algorithms: +Use to train and then immediately predict against the same training data set. Can only be used with unsupervised learning models and the following algorithms: - BATCH_RCF - FIT_RCF -- k-means +- kmeans ### Example: Train and predict with indexed data + ```json POST /_plugins/_ml/_train_predict/kmeans { @@ -719,7 +625,6 @@ POST /_plugins/_ml/_train_predict/kmeans ] } ``` -{% include copy-curl.html %} ### Example: Train and predict with data directly @@ -819,7 +724,6 @@ POST /_plugins/_ml/_train_predict/kmeans } } ``` -{% include copy-curl.html %} ### Response @@ -894,7 +798,6 @@ You can retrieve information about a task using the task_id. ```json GET /_plugins/_ml/tasks/ ``` -{% include copy-curl.html %} The response includes information about the task. @@ -922,7 +825,7 @@ GET /_plugins/_ml/tasks/_search ``` -### Example: Search task which `function_name` is `KMEANS` +### Example: Search task which "function_name" is "KMEANS" ```json GET /_plugins/_ml/tasks/_search @@ -940,7 +843,6 @@ GET /_plugins/_ml/tasks/_search } } ``` -{% include copy-curl.html %} ### Response @@ -1014,7 +916,6 @@ ML Commons does not check the task status when running the `Delete` request. 
The ```json DELETE /_plugins/_ml/tasks/{task_id} ``` -{% include copy-curl.html %} The API returns the following: @@ -1043,28 +944,24 @@ To receive all stats, use: ```json GET /_plugins/_ml/stats ``` -{% include copy-curl.html %} To receive stats for a specific node, use: ```json GET /_plugins/_ml//stats/ ``` -{% include copy-curl.html %} -To receive stats for a specific node and return a specified stat, use: +To receive stats for a specific node and return a specified stat, use: ```json GET /_plugins/_ml//stats/ ``` -{% include copy-curl.html %} To receive information on a specific stat from all nodes, use: ```json GET /_plugins/_ml/stats/ ``` -{% include copy-curl.html %} ### Example: Get all stats @@ -1072,7 +969,6 @@ GET /_plugins/_ml/stats/ ```json GET /_plugins/_ml/stats ``` -{% include copy-curl.html %} ### Response @@ -1137,7 +1033,6 @@ POST /_plugins/_ml/_execute/anomaly_localization "num_outputs": 10 } ``` -{% include copy-curl.html %} Upon execution, the API returns the following: diff --git a/_ml-commons-plugin/index.md b/_ml-commons-plugin/index.md index 35ebb8d1e8..2358b982cd 100644 --- a/_ml-commons-plugin/index.md +++ b/_ml-commons-plugin/index.md @@ -13,16 +13,17 @@ ML Commons for OpenSearch eases the development of machine learning features by Interaction with the ML Commons plugin occurs through either the [REST API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api) or [`ad`]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/functions#ad) and [`kmeans`]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/functions#kmeans) Piped Processing Language (PPL) commands. -Models [trained]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#training-the-model) through the ML Commons plugin support model-based algorithms such as k-means. After you've trained a model enough so that it meets your precision requirements, you can apply the model to [predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#predict) new data safely. 
+Models [trained]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#train-model) through the ML Commons plugin support model-based algorithms such as kmeans. After you've trained a model enough so that it meets your precision requirements, you can apply the model to [predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#predict) new data safely. Should you not want to use a model, you can use the [Train and Predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#train-and-predict) API to test your model without having to evaluate the model's performance. -# Permissions -The ML Commons plugin has two reserved roles: +## Permissions -- `ml_full_access`: Grants full access to all ML features, including starting new ML tasks and reading or deleting models. -- `ml_readonly_access`: Grants read-only access to ML tasks, trained models, and statistics relevant to the model's cluster. Does not grant permissions to start or delete ML tasks or models. +There are two reserved user roles that can use of the ML Commons plugin. + +- `ml_full_access`: Full access to all ML features, including starting new ML tasks and reading or deleting models. +- `ml_readonly_access`: Can only read ML tasks, trained models and statistics relevant to the model's cluster. Cannot start nor delete ML tasks or models. ## ML node diff --git a/_ml-commons-plugin/model-access-control.md b/_ml-commons-plugin/model-access-control.md deleted file mode 100644 index 26c6a76d20..0000000000 --- a/_ml-commons-plugin/model-access-control.md +++ /dev/null @@ -1,593 +0,0 @@ ---- -layout: default -title: Model access control -has_children: false -nav_order: 180 ---- - -# Model access control - -You can use the Security plugin with ML Commons to manage access to specific models for non-admin users. For example, one department in an organization might want to restrict users in other departments from accessing their models. 
- -To accomplish this, users are assigned one or more [_backend roles_]({{site.url}}{{site.baseurl}}/security/access-control/index/). Rather than assign individual roles to individual users during user configuration, backend roles provide a way to map a set of users to a role by assigning the backend role to users when they log in. For example, users may be assigned an `IT` backend role that includes the `ml_full_access` role and have full access to all ML Commons features. Alternatively, other users may be assigned an `HR` backend role that includes the `ml_readonly_access` role and be limited to read-only access to machine learning (ML) features. Given this flexibility, backend roles can provide finer-grained access to models and make it easier to assign multiple users to a role rather than mapping a user and role individually. - -## Model groups - -For access control, models are organized into _model groups_---collections of versions of a particular model. Like users, model groups can be assigned one or more backend roles. All versions of the same model share the same model name and have the same backend role or roles. - -You are considered a model _owner_ when you create a new model group. You remain the owner of the model and all its versions even if another user registers a model to this model group. When a model owner creates a model group, the owner can specify one of the following _access modes_ for this model group: - -- `public`: All users who have access to the cluster can access this model group. -- `private`: Only the model owner or an admin user can access this model group. -- `restricted`: The owner, an admin user, or any user who shares one of the model group's backend roles can access any model in this model group. When creating a `restricted` model group, the owner must attach one or more of the owner's backend roles to the model. - -An admin can access all model groups in the cluster regardless of their access mode. 
-{: .note} - -## Model access control prerequisites - -Before using model access control, you must satisfy the following prerequisites: - -1. Enable the Security plugin on your cluster. For more information, see [Security in OpenSearch]({{site.url}}{{site.baseurl}}/security/). -2. For `restricted` model groups, ensure that an admin has [assigned backend roles to users](#assigning-backend-roles-to-users). -3. [Enable model access control](#enabling-model-access-control) on your cluster. You can enable model access control dynamically by setting `plugins.ml_commons.model_access_control_enabled` to `true`. - -If any of the prerequisites are not met, all models in the cluster are `public` and can be accessed by any user who has access to the cluster. -{: .note} - -## Assigning backend roles to users - -Create the appropriate backend roles and assign those roles to users. Backend roles usually come from an [LDAP server]({{site.url}}{{site.baseurl}}/security/configuration/ldap/) or [SAML provider]({{site.url}}{{site.baseurl}}/security/configuration/saml/), but if you use the internal user database, you can use the REST API to [add them manually]({{site.url}}{{site.baseurl}}/security/access-control/api#create-user). - -Only admin users can assign backend roles to users. -{: .note} - -When assigning backend roles, consider the following example of two users: `alice` and `bob`. 
- -The following request assigns the user `alice` the `analyst` backend role: - -```json -PUT _plugins/_security/api/internalusers/alice -{ - "password": "alice", - "backend_roles": [ - "analyst" - ], - "attributes": {} -} -``` - -The next request assigns the user `bob` the `human-resources` backend role: - -```json -PUT _plugins/_security/api/internalusers/bob -{ - "password": "bob", - "backend_roles": [ - "human-resources" - ], - "attributes": {} -} -``` - -Finally, the last request assigns both `alice` and `bob` the role that gives them full access to ML Commons: - -```json -PUT _plugins/_security/api/rolesmapping/ml_full_access -{ - "backend_roles": [], - "hosts": [], - "users": [ - "alice", - "bob" - ] -} -``` - -If `alice` creates a model group and assigns it the `analyst` backend role, `bob` cannot access this model. - -## Enabling model access control - -You can enable model access control dynamically as follows: - -```json -PUT _cluster/settings -{ - "transient": { - "plugins.ml_commons.model_access_control_enabled": "true" - } -} -``` -{% include copy-curl.html %} - -## Registering a model group - -Use the `_register` endpoint to register a model group. You can register a model group with a `public`, `private`, or `restricted` access mode. - -### Path and HTTP method - -```json -POST /_plugins/_ml/model_groups/_register -``` - -### Request fields - -The following table lists the available request fields. - -Field |Data type | Description -:--- | :--- | :--- -`name` | String | The model group name. Required. -`description` | String | The model group description. Optional. -`model_access_mode` | String | The access mode for this model. Valid values are `public`, `private`, and `restricted`. When this parameter is set to `restricted`, you must specify either `backend_roles` or `add_all_backend_roles`, but not both. Optional. Default is `restricted`. -`backend_roles` | Array | A list of the model owner's backend roles to add to the model. 
Can be specified only if the `model_access_mode` is `restricted`. Cannot be specified at the same time as `add_all_backend_roles`. Optional. -`add_all_backend_roles` | Boolean | If `true`, all backend roles of the model owner are added to the model group. Default is `false`. Cannot be specified at the same time as `backend_roles`. Admin users cannot set this parameter to `true`. Optional. - -#### Example request - -```json -POST /_plugins/_ml/model_groups/_register -{ - "name": "test_model_group_public", - "description": "This is a public model group", - "model_access_mode": "public" -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -{ - "model_group_id": "GDNmQ4gBYW0Qyy5ZcBcg", - "status": "CREATED" -} -``` - -### Response fields - -The following table lists the available response fields. - -Field |Data type | Description -:--- | :--- | :--- -`model_group_id` | String | The model group ID that you can use to access this model group. -`status` | String | The operation status. - -### Registering a public model group - -If you register a model group with a `public` access mode, any model in this model group will be accessible to any user with access to the cluster. The following request registers a public model group: - -```json -POST /_plugins/_ml/model_groups/_register -{ - "name": "test_model_group_public", - "description": "This is a public model group", - "model_access_mode": "public" -} -``` -{% include copy-curl.html %} - -### Registering a restricted model group - -To limit access by backend role, you must register a model group with the `restricted` access mode. - -When registering a model group, you must attach one or more of your backend roles to the model using one but not both of the following methods: - - Provide a list of backend roles in the `backend_roles` parameter. - - Set the `add_all_backend_roles` parameter to `true` to add all your backend roles to the model group. This option is not available to admin users. 
- -Any user who shares a backend role with the model group can access any model in this model group. This grants the user the permissions included with the user role that is mapped to the backend role. - -An admin user can access all model groups regardless of their access mode. -{: .note} - -#### Example request: A list of backend roles - -The following request registers a restricted model group, which can be accessed only by users with the `IT` backend role: - -```json -POST /_plugins/_ml/model_groups/_register -{ - "name": "model_group_test", - "description": "This is an example description", - "model_access_mode": "restricted", - "backend_roles" : ["IT"] -} -``` -{% include copy-curl.html %} - -#### Example request: All backend roles - -The following request registers a restricted model group, adding all backend roles of the user to the model group: - -```json -POST /_plugins/_ml/model_groups/_register -{ - "name": "model_group_test", - "description": "This is an example description", - "model_access_mode": "restricted", - "add_all_backend_roles": "true" -} -``` -{% include copy-curl.html %} - -### Registering a private model group - -If you register a model group with a `private` access mode, any model in this model group will be accessible only to you and the admin users. The following request registers a private model group: - -```json -POST /_plugins/_ml/model_groups/_register -{ - "name": "model_group_test", - "description": "This is an example description", - "model_access_mode": "private" -} -``` -{% include copy-curl.html %} - -### Registering a model group in a cluster where model access control is disabled - -If model access control is disabled on your cluster (one of the [prerequisites](#model-access-control-prerequisites) is not met), you can register a model group with a `name` and `description` but cannot specify any of the access parameters (`model_access_name`, `backend_roles`, or `add_backend_roles`). 
By default, in such a cluster, all model groups are public. - -## Updating a model group - -To update a model group, send a request to the `_update` endpoint. - -When updating a model group, the following restrictions apply: - -- The model owner or an admin user can update all fields. Any user who shares one or more backend roles with the model group can update the `name` and `description` fields only. -- When updating the `model_access_mode` to `restricted`, you must specify one but not both `backend_roles` or `add_all_backend_roles`. - -### Path and HTTP method - -```json -PUT /_plugins/_ml/model_groups/_update -``` - -### Request fields - -Refer to [Request fields](#request-fields-1) for request field descriptions. - -#### Example request - -```json -PUT /_plugins/_ml/model_groups/_update -{ - "name": "model_group_test", - "description": "This is an example description", - "add_all_backend_roles": true -} -``` -{% include copy-curl.html %} - -### Updating a model group in a cluster where model access control is disabled - -If model access control is disabled on your cluster (one of the [prerequisites](#model-access-control-prerequisites) is not met), you can update only the `name` and `description` of a model group but cannot update any of the access parameters (`model_access_name`, `backend_roles`, or `add_backend_roles`). - -## Searching for a model group - -When you search for a model group, only those model groups to which you have access will be returned. 
For example, for a match all query, model groups that will be returned are: - -- All public model groups in the index -- Private model groups for which you are the owner -- Model groups that share at least one of the `backend_roles` with you - -### Path and HTTP method - -```json -POST /_plugins/_ml/model_groups/_search -GET /_plugins/_ml/model_groups/_search -``` - -#### Example request: Match all - -The following request is sent by `user1` who has the `IT` and `HR` roles: - -```json -POST /_plugins/_ml/model_groups/_search -{ - "query": { - "match_all": {} - }, - "size": 1000 -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -{ - "took": 31, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 7, - "relation": "eq" - }, - "max_score": 1, - "hits": [ - { - "_index": ".plugins-ml-model-group", - "_id": "TRqZfYgBD7s2oEFdvrQj", - "_version": 1, - "_seq_no": 2, - "_primary_term": 1, - "_score": 1, - "_source": { - "backend_roles": [ - "HR", - "IT" - ], - "owner": { - "backend_roles": [ - "HR", - "IT" - ], - "custom_attribute_names": [], - "roles": [ - "ml_full_access", - "own_index", - "test_ml" - ], - "name": "user1", - "user_requested_tenant": "__user__" - }, - "created_time": 1685734407714, - "access": "restricted", - "latest_version": 0, - "last_updated_time": 1685734407714, - "name": "model_group_test", - "description": "This is an example description" - } - }, - { - "_index": ".plugins-ml-model-group", - "_id": "URqZfYgBD7s2oEFdyLTm", - "_version": 1, - "_seq_no": 3, - "_primary_term": 1, - "_score": 1, - "_source": { - "backend_roles": [ - "IT" - ], - "owner": { - "backend_roles": [ - "HR", - "IT" - ], - "custom_attribute_names": [], - "roles": [ - "ml_full_access", - "own_index", - "test_ml" - ], - "name": "user1", - "user_requested_tenant": "__user__" - }, - "created_time": 1685734410470, - "access": "restricted", - "latest_version": 0, - 
"last_updated_time": 1685734410470, - "name": "model_group_test", - "description": "This is an example description" - } - }, - ... - ] - } -} -``` - -#### Example request: Search for model groups with an owner name - -The following request to search for model groups of `user` is sent by `user2` who has the `IT` backend role: - -```json -GET /_plugins/_ml/model_groups/_search -{ - "query": { - "bool": { - "must": [ - { - "nested": { - "query": { - "term": { - "owner.name.keyword": { - "value": "user1", - "boost": 1 - } - } - }, - "path": "owner", - "ignore_unmapped": false, - "score_mode": "none", - "boost": 1 - } - } - ] - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -{ - "took": 6, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 4, - "relation": "eq" - }, - "max_score": 0, - "hits": [ - { - "_index": ".plugins-ml-model-group", - "_id": "TRqZfYgBD7s2oEFdvrQj", - "_version": 1, - "_seq_no": 2, - "_primary_term": 1, - "_score": 0, - "_source": { - "backend_roles": [ - "HR", - "IT" - ], - "owner": { - "backend_roles": [ - "HR", - "IT" - ], - "custom_attribute_names": [], - "roles": [ - "ml_full_access", - "own_index", - "test_ml" - ], - "name": "user1", - "user_requested_tenant": "__user__" - }, - "created_time": 1685734407714, - "access": "restricted", - "latest_version": 0, - "last_updated_time": 1685734407714, - "name": "model_group_test", - "description": "This is an example description" - } - }, - ... 
- ] - } -} -``` - -#### Example request: Search for model groups with a model group ID - -```json -GET /_plugins/_ml/model_groups/_search -{ - "query": { - "bool": { - "must": [ - { - "terms": { - "_id": [ - "HyPNK4gBwNxGowI0AtDk" - ] - } - } - ] - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -{ - "took": 2, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 1, - "relation": "eq" - }, - "max_score": 1, - "hits": [ - { - "_index": ".plugins-ml-model-group", - "_id": "HyPNK4gBwNxGowI0AtDk", - "_version": 3, - "_seq_no": 16, - "_primary_term": 5, - "_score": 1, - "_source": { - "backend_roles": [ - "IT" - ], - "owner": { - "backend_roles": [ - "", - "HR", - "IT" - ], - "custom_attribute_names": [], - "roles": [ - "ml_full_access", - "own_index", - "test-ml" - ], - "name": "user1", - "user_requested_tenant": null - }, - "created_time": 1684362035938, - "latest_version": 2, - "last_updated_time": 1684362571300, - "name": "model_group_test", - "description": "This is an example description" - } - } - ] - } -} -``` - -## Deleting a model group - -You can only delete a model group if it does not contain any model versions. -{: .important} - -If model access control is enabled on your cluster, only the owner or users with matching backend roles can delete the model group. Any users can delete any public model group. - -If model access control is disabled on your cluster, users with the `delete model group API` permission can delete any model group. - -Admin users can delete any model group. 
-{: .note} - -#### Example request - -```json -DELETE _plugins/_ml/model_groups/ -``` -{% include copy-curl.html %} - -#### Example response - -```json -{ - "_index": ".plugins-ml-model-group", - "_id": "l8nnQogByXnLJ-QNpEk2", - "_version": 5, - "result": "deleted", - "_shards": { - "total": 2, - "successful": 1, - "failed": 0 - }, - "_seq_no": 70, - "_primary_term": 23 -} -``` From 2a2821f5ce1c607f38a07d506ebcbde4fea6dfc6 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 23 Jun 2023 08:45:06 -0600 Subject: [PATCH 065/286] Revert "Add date nanoseconds field type (#4348)" This reverts commit 79825979a1e40e8cf553a9fde8cadc7bcec1647b. Signed-off-by: Melissa Vagi --- .../supported-field-types/date-nanos.md | 290 ------------------ _field-types/supported-field-types/date.md | 3 +- _field-types/supported-field-types/dates.md | 17 - .../supported-field-types/geographic.md | 2 +- _field-types/supported-field-types/index.md | 2 +- .../supported-field-types/object-fields.md | 2 +- _field-types/supported-field-types/rank.md | 2 +- _field-types/supported-field-types/string.md | 2 +- 8 files changed, 6 insertions(+), 314 deletions(-) delete mode 100644 _field-types/supported-field-types/date-nanos.md delete mode 100644 _field-types/supported-field-types/dates.md diff --git a/_field-types/supported-field-types/date-nanos.md b/_field-types/supported-field-types/date-nanos.md deleted file mode 100644 index 12399a69d4..0000000000 --- a/_field-types/supported-field-types/date-nanos.md +++ /dev/null @@ -1,290 +0,0 @@ ---- -layout: default -title: Date nanoseconds -nav_order: 35 -has_children: false -parent: Date field types -grand_parent: Supported field types ---- - -# Date nanoseconds field type - -The `date_nanos` field type is similar to the [`date`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/) field type in that it holds a date. However, `date` stores the date in millisecond resolution, while `date_nanos` stores the date in nanosecond resolution. 
Dates are stored as `long` values that correspond to nanoseconds since the epoch. Therefore, the range of supported dates is approximately 1970--2262. - -Queries on `date_nanos` fields are converted to range queries on the field value's `long` representation. Then the stored fields and aggregation results are converted to a string using the format set on the field. - -The `date_nanos` field supports all [formats]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date#formats) and [parameters]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date#parameters) that `date` supports. You can use multiple formats separated by `||`. -{: .note} - -For `date_nanos` fields, you can use the `strict_date_optional_time_nanos` format to preserve nanosecond resolution. If you don't specify the format when mapping a field as `date_nanos`, the default format is `strict_date_optional_time||epoch_millis` that lets you pass values in either `strict_date_optional_time` or `epoch_millis` format. The `strict_date_optional_time` format supports dates in nanosecond resolution, but the `epoch_millis` format supports dates in millisecond resolution only. 
- -## Example - -Create a mapping with the `date` field of type `date_nanos` that has the `strict_date_optional_time_nanos` format: - -```json -PUT testindex/_mapping -{ - "properties": { - "date": { - "type": "date_nanos", - "format" : "strict_date_optional_time_nanos" - } - } -} -``` -{% include copy-curl.html %} - -Index two documents into the index: - -```json -PUT testindex/_doc/1 -{ "date": "2022-06-15T10:12:52.382719622Z" } -``` -{% include copy-curl.html %} - -```json -PUT testindex/_doc/2 -{ "date": "2022-06-15T10:12:52.382719624Z" } -``` -{% include copy-curl.html %} - -You can use a range query to search for a date range: - -```json -GET testindex/_search -{ - "query": { - "range": { - "date": { - "gte": "2022-06-15T10:12:52.382719621Z", - "lte": "2022-06-15T10:12:52.382719623Z" - } - } - } -} -``` -{% include copy-curl.html %} - -The response contains the document whose date is in the specified range: - -```json -{ - "took": 43, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 1, - "relation": "eq" - }, - "max_score": 1, - "hits": [ - { - "_index": "testindex", - "_id": "1", - "_score": 1, - "_source": { - "date": "2022-06-15T10:12:52.382719622Z" - } - } - ] - } -} -``` - -When querying documents with `date_nanos` fields, you can use `fields` or `docvalue_fields`: - -```json -GET testindex/_search -{ - "fields": ["date"] -} -``` -{% include copy-curl.html %} - -```json -GET testindex/_search -{ - "docvalue_fields" : [ - { - "field" : "date" - } - ] -} -``` -{% include copy-curl.html %} - -The response to either of the preceding queries contains both indexed documents: - -```json -{ - "took": 4, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 2, - "relation": "eq" - }, - "max_score": 1, - "hits": [ - { - "_index": "testindex", - "_id": "1", - "_score": 1, - "_source": { - 
"date": "2022-06-15T10:12:52.382719622Z" - }, - "fields": { - "date": [ - "2022-06-15T10:12:52.382719622Z" - ] - } - }, - { - "_index": "testindex", - "_id": "2", - "_score": 1, - "_source": { - "date": "2022-06-15T10:12:52.382719624Z" - }, - "fields": { - "date": [ - "2022-06-15T10:12:52.382719624Z" - ] - } - } - ] - } -} -``` - -You can sort on a `date_nanos` field as follows: - -```json -GET testindex/_search -{ - "sort": { - "date": "asc" - } -} -``` -{% include copy-curl.html %} - -The response contains the sorted documents: - -```json -{ - "took": 5, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 2, - "relation": "eq" - }, - "max_score": null, - "hits": [ - { - "_index": "testindex", - "_id": "1", - "_score": null, - "_source": { - "date": "2022-06-15T10:12:52.382719622Z" - }, - "sort": [ - 1655287972382719700 - ] - }, - { - "_index": "testindex", - "_id": "2", - "_score": null, - "_source": { - "date": "2022-06-15T10:12:52.382719624Z" - }, - "sort": [ - 1655287972382719700 - ] - } - ] - } -} -``` - -You can also use a Painless script to access the nanoseconds part of the field: - -```json -GET testindex/_search -{ - "script_fields" : { - "my_field" : { - "script" : { - "lang" : "painless", - "source" : "doc['date'].value.nano" - } - } - } -} -``` -{% include copy-curl.html %} - -The response contains only the nanosecond parts of the fields: - -```json -{ - "took": 4, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 2, - "relation": "eq" - }, - "max_score": 1, - "hits": [ - { - "_index": "testindex", - "_id": "1", - "_score": 1, - "fields": { - "my_field": [ - 382719622 - ] - } - }, - { - "_index": "testindex", - "_id": "2", - "_score": 1, - "fields": { - "my_field": [ - 382719624 - ] - } - } - ] - } -} -``` \ No newline at end of file diff --git 
a/_field-types/supported-field-types/date.md b/_field-types/supported-field-types/date.md index da551a1dd1..ea09311718 100644 --- a/_field-types/supported-field-types/date.md +++ b/_field-types/supported-field-types/date.md @@ -3,8 +3,7 @@ layout: default title: Date nav_order: 25 has_children: false -parent: Date field types -grand_parent: Supported field types +parent: Supported field types redirect_from: - /opensearch/supported-field-types/date/ - /field-types/date/ diff --git a/_field-types/supported-field-types/dates.md b/_field-types/supported-field-types/dates.md deleted file mode 100644 index 7c6e47cb60..0000000000 --- a/_field-types/supported-field-types/dates.md +++ /dev/null @@ -1,17 +0,0 @@ ---- -layout: default -title: Date field types -nav_order: 25 -has_children: true -has_toc: false -parent: Supported field types ---- - -# Date field types - -Date field types contain a date value that can be formatted using different date formats. The following table lists all date field types that OpenSearch supports. - -Field data type | Description -:--- | :--- -[`date`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/) | A date stored in millisecond resolution. -[`date_nanos`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date-nanos/) | A date stored in nanosecond resolution. diff --git a/_field-types/supported-field-types/geographic.md b/_field-types/supported-field-types/geographic.md index cbe3982a4d..07d0382082 100644 --- a/_field-types/supported-field-types/geographic.md +++ b/_field-types/supported-field-types/geographic.md @@ -12,7 +12,7 @@ redirect_from: # Geographic field types -Geographic fields contain values that represent points or shapes on a map. The following table lists all geographic field types that OpenSearch supports. +The following table lists all geographic field types that OpenSearch supports. 
Field data type | Description :--- | :--- diff --git a/_field-types/supported-field-types/index.md b/_field-types/supported-field-types/index.md index 3cb8bff8cd..38b45860ba 100644 --- a/_field-types/supported-field-types/index.md +++ b/_field-types/supported-field-types/index.md @@ -19,7 +19,7 @@ Field data type | Description [`binary`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary/) | A binary value in Base64 encoding. [Numeric]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/numeric/) | `byte`, `double`, `float`, `half_float`, `integer`, `long`, `unsigned_long`, `scaled_float`, `short`. [`boolean`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/boolean/) | A Boolean value. -[Date]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/dates/) | `date`, `date_nanos`. +[`date`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/) | A date value as a formatted string, a long value, or an integer. [`ip`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/ip/) | An IP address in IPv4 or IPv6 format. [Range]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/range/) | `integer_range`, `long_range`,`double_range`, `float_range`, `date_range`,`ip_range`. [Object]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object/) | `object`, `nested`, `join`. diff --git a/_field-types/supported-field-types/object-fields.md b/_field-types/supported-field-types/object-fields.md index 429c5b94c7..64869fc34d 100644 --- a/_field-types/supported-field-types/object-fields.md +++ b/_field-types/supported-field-types/object-fields.md @@ -12,7 +12,7 @@ redirect_from: # Object field types -Object field types contain values that are objects or relations. The following table lists all object field types that OpenSearch supports. +The following table lists all object field types that OpenSearch supports. 
Field data type | Description :--- | :--- diff --git a/_field-types/supported-field-types/rank.md b/_field-types/supported-field-types/rank.md index a4ec0fac4c..c46467f8a5 100644 --- a/_field-types/supported-field-types/rank.md +++ b/_field-types/supported-field-types/rank.md @@ -23,7 +23,7 @@ Rank feature and rank features fields can be queried with [rank feature queries] ## Rank feature -A rank feature field type uses a positive [float]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/numeric/) value to boost or decrease the relevance score of a document in a `rank_feature` query. By default, this value boosts the relevance score. To decrease the relevance score, set the optional `positive_score_impact` parameter to false. +A rank feature field type uses a positive [float]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/numeric/) value to boost or decrease the relevance score of a document in a `rank_feature` query. By default, this value boosts the relevance score. To decrease the relevance score, set the optional `positive_score_impact` parameter to false. ### Example diff --git a/_field-types/supported-field-types/string.md b/_field-types/supported-field-types/string.md index f24dea2325..21cee52dad 100644 --- a/_field-types/supported-field-types/string.md +++ b/_field-types/supported-field-types/string.md @@ -12,7 +12,7 @@ redirect_from: # String field types -String field types contain text values or values derived from text. The following table lists all string field types that OpenSearch supports. +The following table lists all string field types that OpenSearch supports. Field data type | Description :--- | :--- From 05ae4aa5ae60582d6a9eb99aace9cac0f48da382 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 23 Jun 2023 08:45:10 -0600 Subject: [PATCH 066/286] Revert "Update plugins.md (#4353)" This reverts commit 219add75254ef4be9c29c13614c20938daa27760. 
Signed-off-by: Melissa Vagi --- _install-and-configure/install-dashboards/plugins.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_install-and-configure/install-dashboards/plugins.md b/_install-and-configure/install-dashboards/plugins.md index 73a2f54783..f937940cb2 100644 --- a/_install-and-configure/install-dashboards/plugins.md +++ b/_install-and-configure/install-dashboards/plugins.md @@ -39,7 +39,7 @@ The following table lists available OpenSearch Dashboards plugins. | Gantt Chart Dashboards | [gantt-chart](https://github.com/opensearch-project/dashboards-visualizations/tree/main/gantt-chart) | 1.0.0 | | Index Management Dashboards | [index-management-dashboards-plugin](https://github.com/opensearch-project/index-management-dashboards-plugin) | 1.0.0 | | Notebooks Dashboards | [dashboards-notebooks](https://github.com/opensearch-project/dashboards-notebooks) | 1.0.0 | -| Notifications Dashboards | [dashboards-notifications](https://github.com/opensearch-project/dashboards-notifications) | 2.0.0 | +| Notifications Dashboards | [notifications](https://github.com/opensearch-project/notifications) | 2.0.0 | | Observability Dashboards | [dashboards-observability](https://github.com/opensearch-project/dashboards-observability) | 2.0.0 | | Query Workbench Dashboards | [query-workbench](https://github.com/opensearch-project/dashboards-query-workbench) | 1.0.0 | | Reports Dashboards | [dashboards-reporting](https://github.com/opensearch-project/dashboards-reporting) | 1.0.0 | From d282c22d86d16e69cec17bbf7c1664d9c44edc3f Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 23 Jun 2023 08:45:15 -0600 Subject: [PATCH 067/286] Revert "Update searchable snapshot documentation to be more correct (#4203)" This reverts commit 979e67f918b4475421c04be70b612056dd1d162c. 
Signed-off-by: Melissa Vagi --- .../snapshots/searchable_snapshot.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md index 4b7284daca..a28b4d9c58 100644 --- a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md +++ b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md @@ -23,7 +23,7 @@ To configure the searchable snapshots feature, create a node in your opensearch. node.roles: [ search ] ``` -If you're running Docker, you can create a node with the `search` node role by adding the line `- node.roles=search` to your `docker-compose.yml` file: +If you're running Docker, you can create a node with the `search` node role by adding the line `- node.roles: [ search ]` to your docker-compose.yml file: ```bash version: '3' @@ -34,7 +34,7 @@ services: environment: - cluster.name=opensearch-cluster - node.name=opensearch-node1 - - node.roles=search + - node.roles: [ search ] ``` ## Create a searchable snapshot index From 8e88c6ec5901d6f20f1930a5a53682cac6f79782 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 23 Jun 2023 09:21:37 -0600 Subject: [PATCH 068/286] Update ingest-processors.md Signed-off-by: Melissa Vagi --- .../ingest-apis/ingest-processors.md | 90 +------------------ 1 file changed, 1 insertion(+), 89 deletions(-) diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index 9840c36290..8ba287fcb0 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -19,93 +19,5 @@ GET /_nodes/ingest ``` {% include copy-curl.html %} -To configure and deploy ingest processors, make sure you have the necessary permissions and access rights. 
You can learn more about the processor types within their respective documentation. +To set up and deploy ingest processors, make sure you have the necessary permissions and access rights. You can learn more about the processor types within their respective documentation. {: .note} - -## Set up a processor - -Following is an example of how to set up a processor in OpenSearch. Replace `my_index` with the actual name you want to ingest the document into, and adjust the field names and values to match your specific use case. - -```python -# Define the processor configuration -processor_config = { - "description": "Custom single-field processor", - "processors": [ - { - "set": { - "field": "my_field", - "value": "default_value" - } - } - ] -} - -# Create the processor using the OpenSearch ingest APIs or REST API -processor_name = "my_single_field_processor" -opensearch.ingest.put_pipeline(id=processor_name, body=processor_config) - -# Test the processor on a single document -document = { - "my_field": "original_value" -} - -# Ingest the document with the processor applied -result = opensearch.index(index="my_index", body=document, pipeline=processor_name) - -# Check the output -print(result) -``` - -## Create data source for ingest processor - -To create a data source for an ingest processor in OpenSearch, you can use the OpenSearch REST API to define an index template and mapping. Following is an example of how you can create a data source with an ingest processor. Make sure you have OpenSearch running and accessible at the appropriate host and port before deploying the request.
- -```json - -PUT /_index_template/my-index-template -{ - "index_patterns": ["my-index-*"], - "template": { - "settings": { - "index": { - "number_of_shards": 1, - "number_of_replicas": 0 - } - }, - "mappings": { - "properties": { - "my_field": { - "type": "text" - } - } - } - }, - "priority": 100, - "composed_of": ["my-pipeline"] -} -``` - -| Name | Description | -|------|-------------| -| `PUT` | Request used to create a new index template. In the example, replace "my-index-template" with your desired template name. | -| `index_patterns` | Field that specifies the pattern of index names to which the template should be applied. In the example, the pattern "my-index-*" is used, which matches all index names starting with "my-index-". Replace it with your desired index pattern. | -|`settings` | Defines index-level settings. Adjust these values according to your requirements. | -| `mappings` | Defines the field mappings for the index. Modify the field name and data type according to your needs. | -|`priority` | Optional field that can be used to control the order in which the templates are evaluated. A higher value indicates a higher priority. | -| `composed_of` | Field that specifies the pipeline(s) that should be applied to the document ingested into the index. In the example, replace "my-pipeline" with the actual name of your pipeline.
| - -## Deleting a processor or data source - -To delete a processor, - - - -To delete a data source, - - - - From 478bc16bdaae98da14aa211540af0efa5c865c06 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 23 Jun 2023 10:07:27 -0600 Subject: [PATCH 069/286] Move files to separate PRs Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-list/dissect.md | 7 ------- _api-reference/ingest-apis/processors-list/dot-expander.md | 0 _api-reference/ingest-apis/processors-list/drop.md | 0 _api-reference/ingest-apis/processors-list/fail.md | 0 _api-reference/ingest-apis/processors-list/foreach.md | 0 _api-reference/ingest-apis/processors-list/geoip.md | 0 .../ingest-apis/processors-list/geojson-feature.md | 0 _api-reference/ingest-apis/processors-list/grok.md | 0 _api-reference/ingest-apis/processors-list/gsub.md | 0 _api-reference/ingest-apis/processors-list/html-strip.md | 0 _api-reference/ingest-apis/processors-list/join.md | 0 _api-reference/ingest-apis/processors-list/json.md | 0 _api-reference/ingest-apis/processors-list/kv.md | 0 _api-reference/ingest-apis/processors-list/lowercase.md | 0 _api-reference/ingest-apis/processors-list/pipeline.md | 0 _api-reference/ingest-apis/processors-list/remove.md | 0 _api-reference/ingest-apis/processors-list/rename.md | 0 _api-reference/ingest-apis/processors-list/script.md | 0 _api-reference/ingest-apis/processors-list/set.md | 0 _api-reference/ingest-apis/processors-list/sort.md | 0 _api-reference/ingest-apis/processors-list/split.md | 0 .../ingest-apis/processors-list/text-embedding.md | 0 _api-reference/ingest-apis/processors-list/trim.md | 0 _api-reference/ingest-apis/processors-list/uppercase.md | 0 _api-reference/ingest-apis/processors-list/urldecode.md | 0 _api-reference/ingest-apis/processors-list/user-agent.md | 0 26 files changed, 7 deletions(-) delete mode 100644 _api-reference/ingest-apis/processors-list/dissect.md delete mode 100644 _api-reference/ingest-apis/processors-list/dot-expander.md delete 
mode 100644 _api-reference/ingest-apis/processors-list/drop.md delete mode 100644 _api-reference/ingest-apis/processors-list/fail.md delete mode 100644 _api-reference/ingest-apis/processors-list/foreach.md delete mode 100644 _api-reference/ingest-apis/processors-list/geoip.md delete mode 100644 _api-reference/ingest-apis/processors-list/geojson-feature.md delete mode 100644 _api-reference/ingest-apis/processors-list/grok.md delete mode 100644 _api-reference/ingest-apis/processors-list/gsub.md delete mode 100644 _api-reference/ingest-apis/processors-list/html-strip.md delete mode 100644 _api-reference/ingest-apis/processors-list/join.md delete mode 100644 _api-reference/ingest-apis/processors-list/json.md delete mode 100644 _api-reference/ingest-apis/processors-list/kv.md delete mode 100644 _api-reference/ingest-apis/processors-list/lowercase.md delete mode 100644 _api-reference/ingest-apis/processors-list/pipeline.md delete mode 100644 _api-reference/ingest-apis/processors-list/remove.md delete mode 100644 _api-reference/ingest-apis/processors-list/rename.md delete mode 100644 _api-reference/ingest-apis/processors-list/script.md delete mode 100644 _api-reference/ingest-apis/processors-list/set.md delete mode 100644 _api-reference/ingest-apis/processors-list/sort.md delete mode 100644 _api-reference/ingest-apis/processors-list/split.md delete mode 100644 _api-reference/ingest-apis/processors-list/text-embedding.md delete mode 100644 _api-reference/ingest-apis/processors-list/trim.md delete mode 100644 _api-reference/ingest-apis/processors-list/uppercase.md delete mode 100644 _api-reference/ingest-apis/processors-list/urldecode.md delete mode 100644 _api-reference/ingest-apis/processors-list/user-agent.md diff --git a/_api-reference/ingest-apis/processors-list/dissect.md b/_api-reference/ingest-apis/processors-list/dissect.md deleted file mode 100644 index bc3bcefd79..0000000000 --- a/_api-reference/ingest-apis/processors-list/dissect.md +++ /dev/null @@ -1,7 +0,0 @@ 
---- -layout: default -title: Dissect -parent: Ingest processors -grand_parent: Ingest APIs -nav_order: 70 ---- \ No newline at end of file diff --git a/_api-reference/ingest-apis/processors-list/dot-expander.md b/_api-reference/ingest-apis/processors-list/dot-expander.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/drop.md b/_api-reference/ingest-apis/processors-list/drop.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/fail.md b/_api-reference/ingest-apis/processors-list/fail.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/foreach.md b/_api-reference/ingest-apis/processors-list/foreach.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/geoip.md b/_api-reference/ingest-apis/processors-list/geoip.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/geojson-feature.md b/_api-reference/ingest-apis/processors-list/geojson-feature.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/grok.md b/_api-reference/ingest-apis/processors-list/grok.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/gsub.md b/_api-reference/ingest-apis/processors-list/gsub.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/html-strip.md b/_api-reference/ingest-apis/processors-list/html-strip.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/join.md b/_api-reference/ingest-apis/processors-list/join.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/json.md 
b/_api-reference/ingest-apis/processors-list/json.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/kv.md b/_api-reference/ingest-apis/processors-list/kv.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/lowercase.md b/_api-reference/ingest-apis/processors-list/lowercase.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/pipeline.md b/_api-reference/ingest-apis/processors-list/pipeline.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/remove.md b/_api-reference/ingest-apis/processors-list/remove.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/rename.md b/_api-reference/ingest-apis/processors-list/rename.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/script.md b/_api-reference/ingest-apis/processors-list/script.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/set.md b/_api-reference/ingest-apis/processors-list/set.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/sort.md b/_api-reference/ingest-apis/processors-list/sort.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/split.md b/_api-reference/ingest-apis/processors-list/split.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/text-embedding.md b/_api-reference/ingest-apis/processors-list/text-embedding.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/trim.md b/_api-reference/ingest-apis/processors-list/trim.md deleted file mode 
100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/uppercase.md b/_api-reference/ingest-apis/processors-list/uppercase.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/urldecode.md b/_api-reference/ingest-apis/processors-list/urldecode.md deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/_api-reference/ingest-apis/processors-list/user-agent.md b/_api-reference/ingest-apis/processors-list/user-agent.md deleted file mode 100644 index e69de29bb2..0000000000 From 714e2ba6308fd6b22af9aa1d4fa8496fd5b149d6 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 23 Jun 2023 10:08:37 -0600 Subject: [PATCH 070/286] Move files to separate PRs Signed-off-by: Melissa Vagi --- .../ingest-apis/processors-list/date-index-name.md | 7 ------- 1 file changed, 7 deletions(-) delete mode 100644 _api-reference/ingest-apis/processors-list/date-index-name.md diff --git a/_api-reference/ingest-apis/processors-list/date-index-name.md b/_api-reference/ingest-apis/processors-list/date-index-name.md deleted file mode 100644 index 13e56d99d6..0000000000 --- a/_api-reference/ingest-apis/processors-list/date-index-name.md +++ /dev/null @@ -1,7 +0,0 @@ ---- -layout: default -title: Date index name -parent: Ingest processors -grand_parent: Ingest APIs -nav_order: 60 ---- \ No newline at end of file From a23449e149945c95b3fea1376c2113918768e868 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 23 Jun 2023 10:10:22 -0600 Subject: [PATCH 071/286] Update ingest-processors.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-processors.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index 8ba287fcb0..66319762b8 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -21,3 +21,5 @@ GET /_nodes/ingest 
To set up and deploy ingest processors, make sure you have the necessary permissions and access rights. You can learn more about the processor types within their respective documentation. {: .note} + +See the [Processors Reference]() section for more information about each ingest processor. From 3495404751f346e2aeb187a8524151154e4b5ffb Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 23 Jun 2023 10:11:45 -0600 Subject: [PATCH 072/286] Update information architecture Signed-off-by: Melissa Vagi --- .../{processors-list => processors-reference}/append.md | 0 .../{processors-list => processors-reference}/bytes.md | 0 .../{processors-list => processors-reference}/convert.md | 0 .../ingest-apis/{processors-list => processors-reference}/csv.md | 0 .../ingest-apis/{processors-list => processors-reference}/date.md | 0 5 files changed, 0 insertions(+), 0 deletions(-) rename _api-reference/ingest-apis/{processors-list => processors-reference}/append.md (100%) rename _api-reference/ingest-apis/{processors-list => processors-reference}/bytes.md (100%) rename _api-reference/ingest-apis/{processors-list => processors-reference}/convert.md (100%) rename _api-reference/ingest-apis/{processors-list => processors-reference}/csv.md (100%) rename _api-reference/ingest-apis/{processors-list => processors-reference}/date.md (100%) diff --git a/_api-reference/ingest-apis/processors-list/append.md b/_api-reference/ingest-apis/processors-reference/append.md similarity index 100% rename from _api-reference/ingest-apis/processors-list/append.md rename to _api-reference/ingest-apis/processors-reference/append.md diff --git a/_api-reference/ingest-apis/processors-list/bytes.md b/_api-reference/ingest-apis/processors-reference/bytes.md similarity index 100% rename from _api-reference/ingest-apis/processors-list/bytes.md rename to _api-reference/ingest-apis/processors-reference/bytes.md diff --git a/_api-reference/ingest-apis/processors-list/convert.md 
b/_api-reference/ingest-apis/processors-reference/convert.md similarity index 100% rename from _api-reference/ingest-apis/processors-list/convert.md rename to _api-reference/ingest-apis/processors-reference/convert.md diff --git a/_api-reference/ingest-apis/processors-list/csv.md b/_api-reference/ingest-apis/processors-reference/csv.md similarity index 100% rename from _api-reference/ingest-apis/processors-list/csv.md rename to _api-reference/ingest-apis/processors-reference/csv.md diff --git a/_api-reference/ingest-apis/processors-list/date.md b/_api-reference/ingest-apis/processors-reference/date.md similarity index 100% rename from _api-reference/ingest-apis/processors-list/date.md rename to _api-reference/ingest-apis/processors-reference/date.md From 55cfa3b8e59e4c6d81e0a1a326dd4000c69117bd Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 27 Jun 2023 08:42:33 -0600 Subject: [PATCH 073/286] Add first draft remove processor Signed-off-by: Melissa Vagi --- .../processors-reference/remove.md | 76 +++++++++++++++++++ 1 file changed, 76 insertions(+) create mode 100644 _api-reference/ingest-apis/processors-reference/remove.md diff --git a/_api-reference/ingest-apis/processors-reference/remove.md b/_api-reference/ingest-apis/processors-reference/remove.md new file mode 100644 index 0000000000..b0b414354d --- /dev/null +++ b/_api-reference/ingest-apis/processors-reference/remove.md @@ -0,0 +1,76 @@ +--- +layout: default +title: Remove +parent: Ingest processors +grand_parent: Ingest APIs +nav_order: 230 +--- + +# Remove + +The remove processor is used to remove a field from a document. The syntax for the `remove` processor is: + +```json +{ + "remove": { + "field": "field_name" + } +} +``` + +The `field` parameter specifies the name of the field you want to remove. 
For example, the following pipeline removes the `message` field from a document: + +```json +PUT /_ingest/pipeline/my_pipeline +{ + "description": "A simple ingest pipeline that removes the `message` field.", + "processors": [ + { + "remove": { + "field": "message" + } + } + ] +} +``` + +#### Remove parameters + +The following table lists the required and optional remove parameters. + +| Name | Required | Description | +|---|---|---| +| `field` | Yes | Specifies the name of the field that you want to remove. | +| `ignore_missing` | No | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`. | +| `ignore_failure` | No | Specifies whether the processor should continue processing documents even if it fails to remove the specified field. Default is `false`. | +| `if` | No | Conditionally runs the processor based on the value of the field. The `value` parameter specifies the value that you want to compare the field to. | +| `tag` | No | Allows you to identify the processor for debugging and metrics. | + +This example uses all of the options: + +```json +{ + "remove": { + "field": "message", + "ignore_missing": true, + "ignore_failure": true, + "tag": "my_tag" + } +} +``` + +In this case, the `message` field is removed from each document that is indexed. If a document does not contain the `message` field, the processor ignores it rather than failing. If the processor fails to remove the `message` field, it continues processing documents. The processor is also tagged with the `my_tag` tag.
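The remove semantics described above can also be illustrated outside of OpenSearch. The following standalone Python sketch mimics how a `remove` processor treats a batch of documents when `ignore_missing` is set; the function and sample documents are illustrative only and are not part of any OpenSearch client:

```python
# Illustrative sketch only: mimics remove-processor behavior on plain dicts.
def remove_field(doc, field, ignore_missing=False):
    """Return a copy of doc without `field`, like the remove processor."""
    if field not in doc:
        if ignore_missing:
            return dict(doc)  # document passes through unchanged
        raise KeyError("missing field: " + field)  # processor would fail here
    out = dict(doc)
    del out[field]
    return out

docs = [{"message": "hello", "id": 1}, {"id": 2}]
processed = [remove_field(d, "message", ignore_missing=True) for d in docs]
print(processed)  # [{'id': 1}, {'id': 2}]
```

With `ignore_missing=False`, the second document would raise an error instead of passing through, which mirrors the default `false` behavior described in the parameters table.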
+ +The following example only runs the `remove` processor if the value of the `message` field is equal to "This is a message": + +```json +{ + "remove": { + "field": "message" + }, + "if": { + "field": "message", + "value": "This is a message" + } +} +``` \ No newline at end of file From a907c5f348e3629b35f97011d95f0b34c662e3ec Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 27 Jun 2023 08:43:26 -0600 Subject: [PATCH 074/286] Add first draft remove processor Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/remove.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/remove.md b/_api-reference/ingest-apis/processors-reference/remove.md index b0b414354d..36e6898e9d 100644 --- a/_api-reference/ingest-apis/processors-reference/remove.md +++ b/_api-reference/ingest-apis/processors-reference/remove.md @@ -36,7 +36,7 @@ #### Remove parameters -The following table lists the required and optional remove parameters. +The following table lists the required and optional parameters for the `remove` processor.
| Name | Required | Description | |---|---|---| From 6db682716605cf9f7665b18c92fd8eaa5d5feab7 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 27 Jun 2023 10:36:31 -0600 Subject: [PATCH 075/286] Address tech review feedback Signed-off-by: Melissa Vagi --- .../processors-reference/append.md | 36 +++++++------------ .../processors-reference/remove.md | 14 ++++---- 2 files changed, 20 insertions(+), 30 deletions(-) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index 97e0aebc28..6195eeff17 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -8,41 +8,31 @@ # Append -The append ingest processor enriches incoming data during the ingestion process by appending additional fields or values to each document. The append processor operates on a per-document basis, meaning it processes each incoming document individually. Learn how to use the append processor in your data processing workflows in the following documentation. +The `append` processor is used to add additional fields or values to a document. The syntax for the `append` processor is: + +```json +{ + "append": { + "field": "field_name", + "value": ["value1"] + } +} +``` ## Configuration parameters -The append processor supports the following parameters. +The following table lists the required and optional parameters for the `append` processor. **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| `field` | Required | Name of the field where the data should be appended. | `value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. | -`fields` | A list of fields from which to copy values.
| `ignore_empty_fields` | Optional | If set to true, empty values will be ignored when appending them to the target field. | `fail_on_error` | Optional | If set to true, the processor will fail if an error occurs. The default value is false. `allow_duplicates` | Optional | If set to false, the processor will not append values that already exist in the target field. The default value is set to true. `ignore_missing` | Optional | If set to true, the processor will ignore events that lack the target field. The default value is false. -Following are examples of an append processor configuration and how to add it to an ingest pipeline. - -#### Example: Append processor configuration - -```json -{ - "description": "Appends the current timestamp to the document", - "processors": [ - { - "append": { - "field": "timestamp", - "value": "{{_timestamp}}" - } - } - ] -} -``` - -#### Example: Adding the append configuration to an ingest pipeline using the REST API +Following is an example of adding the `append` processor to an ingest pipeline using the REST API. ```json PUT _ingest/pipeline/ @@ -52,7 +52,7 @@ PUT _ingest/pipeline/ { "append": { "field": "timestamp", - "value": "{{_timestamp}}" + "value": ["_timestamp"] } } ] diff --git a/_api-reference/ingest-apis/processors-reference/remove.md b/_api-reference/ingest-apis/processors-reference/remove.md index 36e6898e9d..ce3842cf63 100644 --- a/_api-reference/ingest-apis/processors-reference/remove.md +++ b/_api-reference/ingest-apis/processors-reference/remove.md @@ -34,19 +34,19 @@ PUT /_ingest/pipeline/my_pipeline } ``` -#### Remove parameters +#### Configuration parameters The following table lists the required and optional parameters for the `remove` processor. | Name | Required | Description | |---|---|---| -| `field` | Yes | Specifies the name of the field that you want to remove. | -| `ignore_missing` | No | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`.
| -| `ignore_failure` | No | Specifies whether the processor should continue processing documents even if it fails to remove the specified field. Default is `false`. | -| `if` | No | Conditionally runs the processor based on the value of the field. The `value` parameter specifies the value that you want to compare the field to. | -| `tag` | No | Allows you to identify the processor for debugging and metrics. | +| `field` | Required | Specifies the name of the field that you want to remove. | +| `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`. | +| `ignore_failure` | Optional | Specifies whether the processor should continue processing documents even if it fails to remove the specified field. Default is `false`. | +| `if` | Optional | Conditionally runs the processor based on the value of the field. The `value` parameter specifies the value that you want to compare the field to. | +| `tag` | Optional | Allows you to identify the processor for debugging and metrics.
| -This example uses all of the options: +The following is an example using the options: ```json { "remove": { "field": "message", "ignore_missing": true, "ignore_failure": true, "tag": "my_tag" } } ``` From a89ca6a34dd2fb405ad6689578c5e762f9857dc7 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 27 Jun 2023 10:42:40 -0600 Subject: [PATCH 076/286] Address tech review feedback Signed-off-by: Melissa Vagi --- .../processors-reference/append.md | 2 +- .../ingest-apis/processors-reference/bytes.md | 20 ++++++++++++++----- 2 files changed, 16 insertions(+), 6 deletions(-) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index 6195eeff17..bd22c27e54 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -32,7 +32,7 @@ The following table lists the required and optional parameters for the `append` `allow_duplicates` | Optional | If set to false, the processor will not append values that already exist in the target field. The default value is set to true. `ignore_missing` | Optional | If set to true, the processor will ignore events that lack the target field. The default value is false. -Following is an example of adding the `append` processor to an ingest pipeline using the REST API. +Following is an example of adding the `append` processor to an ingest pipeline. ```json PUT _ingest/pipeline/ diff --git a/_api-reference/ingest-apis/processors-reference/bytes.md b/_api-reference/ingest-apis/processors-reference/bytes.md index 620f154c75..2ca766d422 100644 --- a/_api-reference/ingest-apis/processors-reference/bytes.md +++ b/_api-reference/ingest-apis/processors-reference/bytes.md @@ -8,11 +8,22 @@ nav_order: 20 # Bytes -The bytes ingest processor converts a human-readable byte value to its equivalent value in bytes. The field can be a scalar or an array. If the field is a scalar, the value will be converted and stored in the field. If the field is an array, all members of the array will be converted.
+The `bytes` processor converts a human-readable byte value to its equivalent value in bytes. The field can be a scalar or an array. If the field is a scalar, the value will be converted and stored in the field. If the field is an array, all members of the array will be converted. + +The syntax for the `bytes` processor is: + +```json +{ + "bytes": { + "field": "file.size", + "target_field": "file.size_bytes" + } +} +``` ## Configuration parameters -The byte processor supports the following parameters. +The following table lists the required and optional parameters for the `bytes` processor. **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| @@ -25,11 +36,10 @@ The byte processor supports the following parameters. `tag` | Optional | Tag that can be used to identify the processor. | `description` | Optional | Brief description of the processor. | -Following is an example of a byte ingest processor configuration. - -#### Example: Byte processor configuration +Following is an examples of adding the `bytes` processor to an ingest pipeline. 
```json +PUT _ingest/pipeline/ { "description": "Converts the file size field to bytes", "processors": [ From 7f7c8070b227466a40b1d58a77f5ee4d02c37d85 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 27 Jun 2023 10:54:36 -0600 Subject: [PATCH 077/286] Address tech review feedback Signed-off-by: Melissa Vagi --- .../processors-reference/append.md | 17 +++++---- .../ingest-apis/processors-reference/bytes.md | 20 +++++----- .../processors-reference/convert.md | 38 +++++++++++-------- 3 files changed, 41 insertions(+), 34 deletions(-) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index bd22c27e54..0f50816a84 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -25,14 +25,15 @@ The following table lists the required and optional parameters for the `append` **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| -`field` | Required | Name of the field where the data should be appended. | -`value` | Required| Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. | -`ignore_empty_fields` | Optional | If set to true, empty values will be ignored when appending then to the target field. | -`fail_on_error` | Optional | If set to true, the processor will fail it an error occurs. The default value is false. -`allow_duplicates` | Optional | If set to false, the processor will not append values that already exist in the target field. The default value is set to true. -`ignore_missing` | Optional | If set to true, the processor will ignore events that lack the target field. The default value is false. - -Following is an examples of adding the `append` processor to an ingest pipeline. +`field` | Required | Name of the field where the data should be appended. 
|
+`value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. |
+`ignore_empty_fields` | Optional | If set to true, empty values will be ignored when appending them to the target field. |
+`fail_on_error` | Optional | If set to true, the processor will fail if an error occurs. |
+`allow_duplicates` | Optional | If set to false, the processor will not append values that already exist in the target field. The default value is set to true. |
+`ignore_missing` | Optional | If set to true, the processor will ignore events that lack the target field. The default value is false. |
+`description` | Optional | Brief description of the processor. |
+
+Following is an example of adding the `append` processor to an ingest pipeline.

```json
PUT _ingest/pipeline/
diff --git a/_api-reference/ingest-apis/processors-reference/bytes.md b/_api-reference/ingest-apis/processors-reference/bytes.md
index 2ca766d422..9dc3b4b041 100644
--- a/_api-reference/ingest-apis/processors-reference/bytes.md
+++ b/_api-reference/ingest-apis/processors-reference/bytes.md
@@ -27,16 +27,16 @@ The following table lists the required and optional parameters for the `bytes` p

**Parameter** | **Required** | **Description** |
|-----------|-----------|-----------|
-`field` | Required | Name of the field where the data should be converted. |
-`target_field` | Required | Name of the field to store the converted value. If not specified, the value will be stored in-place in the `field` field. Default is `field`. |
-`ignore_missing` | Optional | If set to true, the processor will not fail if the field does not exist. Default is `false`. |
-`if` | Optional | Conditional expression that determines whether the processor should be deployed. |
-`ignore_failure` | Optional | If set to true, the processor will not fail if an error occurs.
| -`on_failure` | Optional | Action to take if an error occurs. | -`tag` | Optional | Tag that can be used to identify the processor. | -`description` | Optional | Brief description of the processor. | - -Following is an examples of adding the `bytes` processor to an ingest pipeline. +`field` | Required | Name of the field where the data should be converted. | +`target_field` | Required | Name of the field to store the converted value. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | +`ignore_missing` | Optional | If set to true, the processor will not fail if the field does not exist. Default is `false`. | +`if` | Optional | Conditional expression that determines whether the processor should be deployed. | +`ignore_failure` | Optional | If set to true, the processor will not fail if an error occurs. | +`on_failure` | Optional | Action to take if an error occurs. | +`tag` | Optional | Tag that can be used to identify the processor. | +`description` | Optional | Brief description of the processor. | + +Following is an example of adding the `bytes` processor to an ingest pipeline. ```json PUT _ingest/pipeline/ diff --git a/_api-reference/ingest-apis/processors-reference/convert.md b/_api-reference/ingest-apis/processors-reference/convert.md index 9454359531..c8de186f0e 100644 --- a/_api-reference/ingest-apis/processors-reference/convert.md +++ b/_api-reference/ingest-apis/processors-reference/convert.md @@ -8,31 +8,37 @@ nav_order: 30 # Convert -The convert ingest processor converts a field in a document to a different type. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. +The `convert` processor converts a field in a document to a different type. 
The syntax for the `convert` processor is:

-Specifying `boolean` will set the field to `true` if its string value is equal to `true` (ignore case), to false if its string value is equal to `false` (ignore case), or it will throw an exception otherwise.
+```json
+{
+  "convert": {
+    "field": "field_name",
+    "type": "target_type"
+  }
+}
+```

## Configuration parameters

-The byte processor supports the following parameters.
+The following table lists the required and optional parameters for the `convert` processor.

**Parameter** | **Required** | **Description** |
|-----------|-----------|-----------|
-`field` | Required | Name of the field where the data should be converted. |
-`target_field` | Optional | Name of the field to store the converted value. If not specified, the value will be stored in-place in the `field` field. Default is `field`. |
-`type` | Required |
-`ignore_missing` | Optional | If set to true, the processor will not fail if the field does not exist. Default is `false`. |
-`if` | Optional | Conditional expression that determines whether the processor should be deployed. |
-`ignore_failure` | Optional | If set to true, the processor will not fail if an error occurs. |
-`on_failure` | Optional | Action to take if an error occurs. |
-`tag` | Optional | Tag that can be used to identify the processor. |
-`description` | Optional | Brief description of the processor. |
-
-Following is an example of a convert ingest processor configuration.
-
-#### Example: Convert processor configuration
+`field` | Required | Name of the field where the data should be converted. |
+`type` | Required | The field's target type. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. Specifying `boolean` sets the field to `true` if its string value equals `true` (ignoring case) and to `false` if its string value equals `false` (ignoring case). Any other value results in an exception.
| +`target_field` | Optional | Name of the field to store the converted value. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | +`ignore_missing` | Optional | If set to true, the processor will not fail if the field does not exist. Default is `false`. | +`if` | Optional | Conditional expression that determines whether the processor should be deployed. | +`ignore_failure` | Optional | If set to true, the processor will not fail if an error occurs. | +`on_failure` | Optional | Action to take if an error occurs. | +`tag` | Optional | Tag that can be used to identify the processor. | +`description` | Optional | Brief description of the processor. | + +Following is an example of adding the `convert` processor to an ingest pipeline. ```json +PUT _ingest/pipeline/ { "description": "Converts the file size field to an integer", "processors": [ From 2ae1cc4d028502a3cdfbc20c7edc1449cbdcaa2a Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 27 Jun 2023 11:39:50 -0600 Subject: [PATCH 078/286] Address tech review feedback Signed-off-by: Melissa Vagi --- .../ingest-apis/processors-reference/csv.md | 45 +++++++++++-------- 1 file changed, 27 insertions(+), 18 deletions(-) diff --git a/_api-reference/ingest-apis/processors-reference/csv.md b/_api-reference/ingest-apis/processors-reference/csv.md index 31c35b9236..0ca94897e3 100644 --- a/_api-reference/ingest-apis/processors-reference/csv.md +++ b/_api-reference/ingest-apis/processors-reference/csv.md @@ -8,38 +8,47 @@ nav_order: 40 # CSV -The CSV ingest processor is used to parse CSV data and store it as individual fields in a document. +The `csv` processor is used to parse CSV data and store it as individual fields in a document. The syntax for the `csv` processor is: + +```json +{ + "csv": { + "field": "field_name", + "target_fields": ["field1, field2"] + } +} +``` ## Configuration parameters -The CSV processor supports the following parameters. 
+The following table lists the required and optional parameters for the `csv` processor.

**Parameter** | **Required** | **Description** |
|-----------|-----------|-----------|
-`field` | Required | Name of the field to extract data from. |
-`target_field` | Required | Name of the field to store the parsed data in. |
-`delimiter` | Optional | The delimiter used to separate the fields in the CSV data. |
-`quote` | Optional | The character used to quote fields in the CSV data. |
-`ignore_missing` | Optional | If set to `true`, the processor will not fail if the field does not exist. Default is `false`. |
-`trim` | Optional | If set to `true`, the processor trims whitespace from the beginning and end of each field. Default is `false`. |
-`if` | Optional | Conditional expression that determines whether the processor should be deployed. |
-`on_failure` | Optional | Action to take if an error occurs. |
-`ignore_failure` | Optional | If set to `true`, the processor does not fail if an error occurs. |
-`tag` | Optional | Tag that can be used to identify the processor. |
-`description` | Optional | Brief description of the processor. |
-
-Following is an example of a CSV ingest processor configuration.
-
-#### Example: CSV processor configuration
+`field` | Required | Name of the field to extract data from. |
+`target_fields` | Required | Name of the field to store the parsed data in. |
+`delimiter` | Optional | The delimiter used to separate the fields in the CSV data. |
+`quote` | Optional | The character used to quote fields in the CSV data. |
+`ignore_missing` | Optional | If set to `true`, the processor will not fail if the field does not exist. Default is `true`. |
+`trim` | Optional | If set to `true`, the processor trims whitespace from the beginning and end of each field. Default is `false`. |
+`empty_value` | Optional | The value used to fill fields that are empty in the CSV data. If not specified, empty fields are skipped.
|
+`if` | Optional | Conditional expression that determines whether the processor should be deployed. |
+`on_failure` | Optional | Action to take if an error occurs. |
+`ignore_failure` | Optional | If set to `true`, the processor does not fail if an error occurs. Default is `false`. |
+`tag` | Optional | Tag that can be used to identify the processor. |
+`description` | Optional | Brief description of the processor. |
+
+Following is an example of adding the `csv` processor to an ingest pipeline.

```json
+PUT _ingest/pipeline/
{
  "description": "Parses the CSV data in the `data` field",
  "processors": [
    {
      "csv": {
        "field": "data",
-        "target_field": ["field1", "field2", "field3"],
+        "target_fields": ["field1", "field2", "field3"],
        "ignore_missing": true
      }
    }

From ad8133bede3174b7564a8e9c307ed8c5296795b6 Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Tue, 27 Jun 2023 12:19:56 -0600
Subject: [PATCH 079/286] Address tech review feedback

Signed-off-by: Melissa Vagi
---
 .../ingest-apis/processors-reference/date.md | 54 +++++++++++--------
 1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/_api-reference/ingest-apis/processors-reference/date.md b/_api-reference/ingest-apis/processors-reference/date.md
index 45d42ea072..e1bf0b3232 100644
--- a/_api-reference/ingest-apis/processors-reference/date.md
+++ b/_api-reference/ingest-apis/processors-reference/date.md
@@ -8,41 +8,49 @@ nav_order: 50

# Date

-The date ingest processor is used to parse dates from fields in a document annd store them as a timestamp.
+The `date` processor is used to parse dates from fields in a document and store them as a timestamp. The syntax for the `date` processor is:
+
+```json
+{
+  "date": {
+    "field": "date_field",
+    "target_field": ["parsed_date"],
+    "formats": ["yyyy/MM/dd HH:mm:ss", "ISO8601"]
+  }
+}
+```

## Configuration parameters

-The date processor supports the following parameters.
+The following table lists the required and optional parameters for the `date` processor.
**Parameter** | **Required** | **Description** | |-----------|-----------|-----------| -`field` | Required | Name of the field to extract data from. | -`target_field` | Optional | Name of the field to store the parsed data in. | -`format` | Required | The format of the date in the `field` field. The default format is `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`. | -`locale` | Optional | The locale to use when parsing the date. The default locale is | -`timezone ` | Optional | The timezone to use when parsing the date. | -`ignore_missing` | Optional | If set to `true`, the processor will not fail if the field does not exist. Default is `false`. | -`if` | Optional | Conditional expression that determines whether the processor should be deployed. | -`on_failure` | Optional | Action to take if an error occurs. | -`ignore_failure` | Optional | If set to `true`, the processor does not fail if an error occurs. | -`tag` | Optional | Tag that can be used to identify the processor. | -`description` | Optional | Brief description of the processor. | - -Following is an example of a date ingest processor configuration. - -#### Example: Date processor configuration +`field` | Required | Name of the field to extract data from. | +`target_field` | Optional | Name of the field to store the parsed data in. | +`formats` | Required | The format of the date in the `field` field. The default format is `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`. | +`locale` | Optional | The locale to use when parsing the date. Default is English. | +`timezone ` | Optional | The timezone to use when parsing the date. Default is UTC. | +`ignore_missing` | Optional | If set to `true`, the processor will not fail if the field does not exist. Default is `false`. | +`if` | Optional | Conditional expression that determines whether the processor should be deployed. | +`on_failure` | Optional | Action to take if an error occurs. | +`ignore_failure` | Optional | If set to `true`, the processor does not fail if an error occurs. 
| +`tag` | Optional | Tag that can be used to identify the processor. | +`description` | Optional | Brief description of the processor. | + +Following is an example of adding the `date` processor to an ingest pipeline. ```json +PUT /_ingest/pipeline/date_processor { - "description": "Parses the date string in the `date_string` field and stores parsed date in the `date_timestamp` field", + "description": "A pipeline that parses timestamps to dates", "processors": [ { "date": { - "field": "date_string", - "target_field": ["date_timestamp"], - "format": "yyyy-MM-dd'T'HH:mm:ss.SSSZZ", - "locale": "en-US", - "ignore_missing": true + "field" : "date_field", + "target_field" : "timestamp", + "formats" : ["dd/MM/yyyy HH:mm:ss"], + "timezone" : "UTC" } } ] From ae1c7e8fb2d69cef0f4224de7546e4454abdabb5 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 27 Jun 2023 12:22:49 -0600 Subject: [PATCH 080/286] Update _api-reference/ingest-apis/ingest-processors.md Co-authored-by: Heemin Kim Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-processors.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index 66319762b8..5c2975dcdf 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -15,7 +15,7 @@ Ingest processors are a core component of data processing [pipelines]({{site.url OpenSearch provides a standard set of ingest processors within your OpenSearch installation. 
For a list of processors available in OpenSearch, use the [nodes info]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-info/) API: ```json -GET /_nodes/ingest +GET /_nodes/ingest?filter_path=nodes.*.ingest.processors ``` {% include copy-curl.html %} From 18c5d5e666ef3257790be9a1331e6cb4e5ac2d64 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 27 Jun 2023 12:23:21 -0600 Subject: [PATCH 081/286] Update _api-reference/ingest-apis/ingest-processors.md Co-authored-by: Heemin Kim Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-processors.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index 5c2975dcdf..5dd7f1a329 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -10,7 +10,7 @@ has_children: true Ingest processors have a crucial role in preparing and enriching data before it is stored and analyzed and improving data quality and usability. They are a set of functionalities or operations applied to incoming data during the ingestion process and allow for real-time data transformation, manipulation, and enrichment. -Ingest processors are a core component of data processing [pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipelines/). They preprocess and shape data as it enters a system, making it more suitable for downstream operations such as indexing, analysis, or storage. They have a range of capabilities--data extraction, validation, filtering, enrichment, and normalization--that can be performed on different aspects of the data, such as extracting specific fields, converting data types, removing or modifying unwanted data, or enriching data with additional information. +Ingest processors are a core component of data processing [pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipelines/). 
They preprocess and shape documents before indexing. For example, you can remove fields, extract values from text, convert data format, or enrich additional information. OpenSearch provides a standard set of ingest processors within your OpenSearch installation. For a list of processors available in OpenSearch, use the [nodes info]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-info/) API: From 428bfd9db8e803eca1f009aa26ce9edc3c7cd059 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 27 Jun 2023 12:24:02 -0600 Subject: [PATCH 082/286] Update ingest-processors.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-processors.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index 5dd7f1a329..3a6b9bfaaf 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -8,8 +8,6 @@ has_children: true # Ingest processors -Ingest processors have a crucial role in preparing and enriching data before it is stored and analyzed and improving data quality and usability. They are a set of functionalities or operations applied to incoming data during the ingestion process and allow for real-time data transformation, manipulation, and enrichment. - Ingest processors are a core component of data processing [pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipelines/). They preprocess and shape documents before indexing. For example, you can remove fields, extract values from text, convert data format, or enrich additional information. OpenSearch provides a standard set of ingest processors within your OpenSearch installation. 
For a list of processors available in OpenSearch, use the [nodes info]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-info/) API:

```json
GET /_nodes/ingest?filter_path=nodes.*.ingest.processors
```
{% include copy-curl.html %}

From 7c2a1d9d413828145b602f58cfa97c2574010c3a Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Wed, 28 Jun 2023 16:06:53 -0600
Subject: [PATCH 083/286] Update ingest-processors.md

Signed-off-by: Melissa Vagi
---
 _api-reference/ingest-apis/ingest-processors.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md
index 3a6b9bfaaf..681a51e59f 100644
--- a/_api-reference/ingest-apis/ingest-processors.md
+++ b/_api-reference/ingest-apis/ingest-processors.md
@@ -8,7 +8,7 @@ has_children: true

# Ingest processors

-Ingest processors are a core component of data processing [pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipelines/). They preprocess and shape documents before indexing. For example, you can remove fields, extract values from text, convert data format, or enrich additional information.
+Ingest processors are a core component of data processing [pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipelines/), as they preprocess and shape documents before indexing. For example, you can remove fields, extract values from text, convert data formats, or enrich documents with additional information.

OpenSearch provides a standard set of ingest processors within your OpenSearch installation.
For a list of processors available in OpenSearch, use the [nodes info]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-info/) API: From b0060da17adab996b99bd7f93794083f0acc35b7 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 27 Jun 2023 15:44:08 -0600 Subject: [PATCH 084/286] Address tech review feedback Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/date.md | 1 - 1 file changed, 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/date.md b/_api-reference/ingest-apis/processors-reference/date.md index e1bf0b3232..15586daa3c 100644 --- a/_api-reference/ingest-apis/processors-reference/date.md +++ b/_api-reference/ingest-apis/processors-reference/date.md @@ -14,7 +14,6 @@ The `date` processor is used to parse dates from fields in a document annd store { "date": { "field": "date_field", - "target_field": ["parsed_date"], "formats": ["yyyy/MM/dd HH:mm:ss", "ISO8601"] } } From 85b57a7d1b52b60ac051ff484673dd787500012f Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 29 Jun 2023 16:59:39 -0600 Subject: [PATCH 085/286] Update pipeline example Signed-off-by: Melissa Vagi --- .../processors-reference/append.md | 36 ++++++++++++++++--- 1 file changed, 31 insertions(+), 5 deletions(-) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index 0f50816a84..47f641e825 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -33,21 +33,47 @@ The following table lists the required and optional parameters for the `append` `ignore_missing` | Optional | If set to true, the processor will ignore events that lack the target field. The default value is false. | `description` | Optional | Brief description of the processor. | -Following is an example of adding the `append` processor to an ingest pipeline. 
+Following is an example of an ingest pipeline using the `append` processor. ```json -PUT _ingest/pipeline/ +PUT _ingest/pipeline/user-behavior { - "description": "A pipeline that appends the current timestamp to the document", + "description": "Pipeline that appends event type", "processors": [ { "append": { - "field": "timestamp", - "value": ["_timestamp"] + "field": "event_types", + "value": "{{event_type}}" } } ] } + +PUT testindex1/_doc/1?pipeline=user-behavior +{ + "event_type": "page_view" +} + +GET testindex1/_doc/1 +``` + +Following is the response: + +```json +{ + "_index": "testindex1", + "_id": "1", + "_version": 2, + "_seq_no": 1, + "_primary_term": 1, + "found": true, + "_source": { + "event_type": "page_view", + "event_types": [ + "page_view" + ] + } +} ``` ## Best practices From 308ba02913f8bdbf3fd8c0204e90785ae40dca54 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 29 Jun 2023 17:10:08 -0600 Subject: [PATCH 086/286] Update pipeline example Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index 47f641e825..d0d9d293aa 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -57,7 +57,7 @@ PUT testindex1/_doc/1?pipeline=user-behavior GET testindex1/_doc/1 ``` -Following is the response: +Following is the response. 
```json { From 057a1465f005133468d402b776571c9a8579d121 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 29 Jun 2023 17:47:29 -0600 Subject: [PATCH 087/286] Update pipeline example Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/append.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index d0d9d293aa..ea2935a86d 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -53,13 +53,12 @@ PUT testindex1/_doc/1?pipeline=user-behavior { "event_type": "page_view" } - -GET testindex1/_doc/1 ``` -Following is the response. +This pipeline, named `user-behavior`, has one append processor. It appends the `event_type` of each new documenet ingested into OpenSearch to an array field `event_types`.Following is the GET request and response. ```json +GET testindex1/_doc/1 { "_index": "testindex1", "_id": "1", From 1a60a13ce6fcf5bd1d055c37da965d5e90e82e18 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 29 Jun 2023 18:01:36 -0600 Subject: [PATCH 088/286] Update pipeline example Signed-off-by: Melissa Vagi --- .../processors-reference/append.md | 2 +- .../ingest-apis/processors-reference/bytes.md | 39 +++++++++++++++---- 2 files changed, 32 insertions(+), 9 deletions(-) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index ea2935a86d..5a9a63b774 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -55,7 +55,7 @@ PUT testindex1/_doc/1?pipeline=user-behavior } ``` -This pipeline, named `user-behavior`, has one append processor. 
It appends the `event_type` of each new documenet ingested into OpenSearch to an array field `event_types`.Following is the GET request and response.
+This pipeline, named `user-behavior`, has one append processor. It appends the `event_type` of each new document ingested into OpenSearch to an array field `event_types`. Following is the GET request and response.

```json
GET testindex1/_doc/1
diff --git a/_api-reference/ingest-apis/processors-reference/bytes.md b/_api-reference/ingest-apis/processors-reference/bytes.md
index 9dc3b4b041..21685e503a 100644
--- a/_api-reference/ingest-apis/processors-reference/bytes.md
+++ b/_api-reference/ingest-apis/processors-reference/bytes.md
@@ -8,15 +8,15 @@ nav_order: 20

# Bytes

-The `bytes` processor converts a human-readable byte value to its equivalent value in bytes. The field can be a scalar or an array. If the field is a scalar, the value will be converted and stored in the field. If the field is an array, all members of the array will be converted.
+The `bytes` processor converts a human-readable byte value to its equivalent value in bytes. The field can be a scalar or an array. If the field is a scalar, the value is converted and stored in the field. If the field is an array, all members of the array are converted.

The syntax for the `bytes` processor is:

```json
{
  "bytes": {
-    "field": "file.size",
-    "target_field": "file.size_bytes"
+    "field": "source_field",
+    "target_field": "destination_field"
  }
}
```
@@ -36,19 +36,42 @@ The following table lists the required and optional parameters for the `bytes` p
`tag` | Optional | Tag that can be used to identify the processor. |
`description` | Optional | Brief description of the processor. |

-Following is an example of adding the `bytes` processor to an ingest pipeline.
+Following is an example of a pipeline using a `bytes` processor.
```json -PUT _ingest/pipeline/ +PUT _ingest/pipeline/file_upload { - "description": "Converts the file size field to bytes", + "description": "Pipeline that converts file size to bytes", "processors": [ { "bytes": { - "field": "file.size", - "target_field": "file.size_bytes" + "field": "file_size", + "target_field": "file_size_bytes" } } ] } + +PUT testindex1/_doc/1?pipeline=file_upload +{ + "file_size": "10MB" +} ``` + +This pipeline, named `file_upload`, has one bytes processor. It converts the `file_size` to its byte equivalent and stores it in a new field `fiel_size_bytes`. Following is the GET request and response. + +```json +GET testindex1/_doc/1 +{ + "_index": "testindex1", + "_id": "1", + "_version": 3, + "_seq_no": 2, + "_primary_term": 1, + "found": true, + "_source": { + "file_size_bytes": 10485760, + "file_size": "10MB" + } +} +``` \ No newline at end of file From 2d3688717f0e80f0ca2044d9084f1dffc519938c Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 30 Jun 2023 11:15:15 -0600 Subject: [PATCH 089/286] Update pipeline example Signed-off-by: Melissa Vagi --- .../ingest-apis/processors-reference/bytes.md | 2 +- .../processors-reference/convert.md | 30 ++++++++++++++++--- 2 files changed, 27 insertions(+), 5 deletions(-) diff --git a/_api-reference/ingest-apis/processors-reference/bytes.md b/_api-reference/ingest-apis/processors-reference/bytes.md index 21685e503a..99a56d459f 100644 --- a/_api-reference/ingest-apis/processors-reference/bytes.md +++ b/_api-reference/ingest-apis/processors-reference/bytes.md @@ -58,7 +58,7 @@ PUT testindex1/_doc/1?pipeline=file_upload } ``` -This pipeline, named `file_upload`, has one bytes processor. It converts the `file_size` to its byte equivalent and stores it in a new field `fiel_size_bytes`. Following is the GET request and response. +This pipeline, named `file_upload`, has one bytes processor. It converts the `file_size` to its byte equivalent and stores it in a new field `file_size_bytes`. 
Following is the GET request and response. ```json GET testindex1/_doc/1 diff --git a/_api-reference/ingest-apis/processors-reference/convert.md b/_api-reference/ingest-apis/processors-reference/convert.md index c8de186f0e..2f52c83e22 100644 --- a/_api-reference/ingest-apis/processors-reference/convert.md +++ b/_api-reference/ingest-apis/processors-reference/convert.md @@ -8,7 +8,7 @@ nav_order: 30 # Convert -The `convert` processor converts a field in a document to a different type. The syntax for the `convert` processor is: +The `convert` processor converts a field in a document to a different type, for example, a string field to an integer field or vice versa. The syntax for the `convert` processor is: ```json { @@ -38,16 +38,38 @@ The following table lists the required and optional parameters for the `convert` Following is an example of adding the `convert` processor to an ingest pipeline. ```json -PUT _ingest/pipeline/ +PUT _ingest/pipeline/convert-file-size { - "description": "Converts the file size field to an integer", + "description": "Pipeline that converts the file size to an integer", "processors": [ { "convert": { - "field": "file.size", + "field": "file_size", "type": "integer" } } ] } + +PUT testindex1/_doc/1?pipeline=convert-file-size +{ + "file.size": "1024" +} +``` + +This pipeline converts the `file_size` field from a string to an integer, making it possible to perform numerical operations and aggregations on the `file_size` field. Following is the GET request and response. 
+ +```json +GET testindex1/_doc/1 +{ + "_index": "testindex1", + "_id": "1", + "_version": 4, + "_seq_no": 3, + "_primary_term": 1, + "found": true, + "_source": { + "file_size": 1024 + } +} ``` From 5fd0a6489958e381a38f9207d41201a3dcced841 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 30 Jun 2023 11:30:14 -0600 Subject: [PATCH 090/286] Update pipeline example Signed-off-by: Melissa Vagi --- .../ingest-apis/processors-reference/csv.md | 41 +++++++++++++++---- 1 file changed, 33 insertions(+), 8 deletions(-) diff --git a/_api-reference/ingest-apis/processors-reference/csv.md b/_api-reference/ingest-apis/processors-reference/csv.md index 0ca94897e3..f99df13338 100644 --- a/_api-reference/ingest-apis/processors-reference/csv.md +++ b/_api-reference/ingest-apis/processors-reference/csv.md @@ -14,7 +14,7 @@ The `csv` processor is used to parse CSV data and store it as individual fields { "csv": { "field": "field_name", - "target_fields": ["field1, field2"] + "target_fields": ["field1, field2, ..."] } } ``` @@ -27,7 +27,7 @@ The following table lists the required and optional parameters for the `csv` pro |-----------|-----------|-----------| `field` | Required | Name of the field to extract data from. | `target_fields` | Required | Name of the field to store the parsed data in. | -`delimiter` | Optional | The delimiter used to separate the fields in the CSV data. | +`separator` | Optional | The delimiter used to separate the fields in the CSV data. | `quote` | Optional | The character used to quote fields in the CSV data. | `ignore_missing` | Optional | If set to `true`, the processor will not fail if the field does not exist. Default is `true`. | `trim` | Optional | If set to `true`, the processor trims whitespace from the beginning and end of each field. Default is `false`. | @@ -38,20 +38,45 @@ The following table lists the required and optional parameters for the `csv` pro `tag` | Optional | Tag that can be used to identify the processor. 
| `description`  | Optional  | Brief description of the processor.  |

-Following is an example of adding the `csv` processor to an ingest pipeline.
+Following is an example of a pipeline using a `csv` processor.

 ```json
-PUT _ingest/pipeline/
+PUT _ingest/pipeline/csv-processor
 {
-  "description": "Parses the CSV data in the `data` field",
+  "description": "Split resource usage into individual fields",
   "processors": [
     {
       "csv": {
-        "field": "data",
-        "target_fields": ["field1", "field2", "field3"],
-        "ignore_missing": true
+        "field": "resource_usage",
+        "target_fields": ["cpu_usage", "memory_usage", "disk_usage"],
+        "separator": ","
       }
     }
   ]
 }
+
+PUT testindex1/_doc/1?pipeline=csv-processor
+{
+  "resource_usage": "25,4096,10"
+}
+```
+
+This pipeline transforms the `resource_usage` field into three separate fields: `cpu_usage` with a value of 25, `memory_usage` with a value of 4096, and `disk_usage` with a value of 10. Following is the GET request and response.
+
+```json
+GET testindex1/_doc/1
+{
+  "_index": "testindex1",
+  "_id": "1",
+  "_version": 5,
+  "_seq_no": 4,
+  "_primary_term": 1,
+  "found": true,
+  "_source": {
+    "resource_usage": "25,4096,10",
+    "memory_usage": "4096",
+    "disk_usage": "10",
+    "cpu_usage": "25"
+  }
+}
```

From 37fa2c09729f04bf8b959faf424b5c0a4912a3be Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Fri, 30 Jun 2023 14:23:14 -0600
Subject: [PATCH 091/286] Update pipeline example

Signed-off-by: Melissa Vagi

---
 .../ingest-apis/processors-reference/date.md  | 39 +++++++++++++++----
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/_api-reference/ingest-apis/processors-reference/date.md b/_api-reference/ingest-apis/processors-reference/date.md
index 15586daa3c..c5b9a1f609 100644
--- a/_api-reference/ingest-apis/processors-reference/date.md
+++ b/_api-reference/ingest-apis/processors-reference/date.md
@@ -14,7 +14,7 @@ The `date` processor is used to parse dates from fields in a document annd store
 {
   "date": {
     "field": "date_field",
-    "formats": 
["yyyy/MM/dd HH:mm:ss", "ISO8601"] + "formats": ["yyyy-MM-dd'T'HH:mm:ss.SSSZZ"] } } ``` @@ -26,8 +26,8 @@ The following table lists the required and optional parameters for the `date` pr **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| `field` | Required | Name of the field to extract data from. | +`formats` | Required | An array of the expected date formats. Can be a java time pattern or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N. The default format is `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`. | `target_field` | Optional | Name of the field to store the parsed data in. | -`formats` | Required | The format of the date in the `field` field. The default format is `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`. | `locale` | Optional | The locale to use when parsing the date. Default is English. | `timezone ` | Optional | The timezone to use when parsing the date. Default is UTC. | `ignore_missing` | Optional | If set to `true`, the processor will not fail if the field does not exist. Default is `false`. | @@ -37,21 +37,44 @@ The following table lists the required and optional parameters for the `date` pr `tag` | Optional | Tag that can be used to identify the processor. | `description` | Optional | Brief description of the processor. | -Following is an example of adding the `date` processor to an ingest pipeline. +Following is an example of a pipeline using a `date` processor. 
```json
-PUT /_ingest/pipeline/date_processor
+PUT /_ingest/pipeline/date-output-format
 {
-  "description": "A pipeline that parses timestamps to dates",
+  "description": "Pipeline that converts European date format to US date format",
   "processors": [
     {
       "date": {
-        "field" : "date_field",
-        "target_field" : "timestamp",
-        "formats" : ["dd/MM/yyyy HH:mm:ss"],
+        "field" : "date_european",
+        "formats" : ["dd/MM/yyyy", "UNIX"],
+        "target_field": "date_us",
+        "output_format": "MM/dd/yyyy",
         "timezone" : "UTC"
       }
     }
   ]
 }
+
+PUT testindex1/_doc/1?pipeline=date-output-format
+{
+  "date_european": "30/06/2023"
+}
 ```
+
+This pipeline adds the new field `date_us` with the desired output format. Following is the GET request and response.
+
+```json
+GET testindex1/_doc/1
+{
+  "_index": "testindex1",
+  "_id": "1",
+  "_version": 9,
+  "_seq_no": 8,
+  "_primary_term": 1,
+  "found": true,
+  "_source": {
+    "date_us": "06/30/2023",
+    "date_european": "30/06/2023"
+  }
+}

From 1b1535c444385b4b5ddca469cb9ce150f3b63726 Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Fri, 30 Jun 2023 14:55:21 -0600
Subject: [PATCH 092/286] Update pipeline example

Signed-off-by: Melissa Vagi

---
 .../ingest-apis/processors-reference/date.md  |  2 +-
 .../processors-reference/remove.md            | 56 ++++++++-----------
 2 files changed, 25 insertions(+), 33 deletions(-)

diff --git a/_api-reference/ingest-apis/processors-reference/date.md b/_api-reference/ingest-apis/processors-reference/date.md
index c5b9a1f609..d53af5a49e 100644
--- a/_api-reference/ingest-apis/processors-reference/date.md
+++ b/_api-reference/ingest-apis/processors-reference/date.md
@@ -37,7 +37,7 @@ The following table lists the required and optional parameters for the `date` pr
 `tag` | Optional | Tag that can be used to identify the processor. |
 `description` | Optional | Brief description of the processor. |
 
-Following is an example of a pipeline using a `date` processor.
+Following is an example of a pipeline using the `date` processor.
```json PUT /_ingest/pipeline/date-output-format diff --git a/_api-reference/ingest-apis/processors-reference/remove.md b/_api-reference/ingest-apis/processors-reference/remove.md index ce3842cf63..e261f147b4 100644 --- a/_api-reference/ingest-apis/processors-reference/remove.md +++ b/_api-reference/ingest-apis/processors-reference/remove.md @@ -18,22 +18,6 @@ The remove processor is used to remove a field from a document. The syntax for t } ``` -The `field` parameter specifies the name of the field you want to remove. For example, the following example removes the `message` field from a document: - -```json -PUT /_ingest/pipeline/my_pipeline -{ - "description": "A simple ingest pipeline that removes the `message` field.", - "processors": [ - { - "remove": { - "field": "message" - } - } - ] -} -``` - #### Configuration parameters The following table lists the required and optional parameters for the `remove` processor. @@ -46,31 +30,39 @@ The following table lists the required and optional parameters for the `remove` | `if` | Optional | Conditionally deploys the processor based on the value of the field. The `value` parameter specifies the value that you want to compare the field to. | | `tag` | Optional | Allows you to identify the processor for debugging and metrics. | -The following is an example using the options: + +Following is an example of an ingest pipeline using the remove processor. ```json +PUT /_ingest/pipeline/remove_ip { - "remove": { - "field": "message", - "ignore_missing": true, - "ignore_failure": true, - "tag": "my_tag" + "description": "Pipeline that excludes the ip_address field.", + "processors": [ + { + "remove": { + "field": "ip_address" + } } + ] } -``` -In this case, the `message` field is removed from any document that is indexed, if the document does not have the `message` field. If the processor fails to remove the `message` field, it continues processing documents. The processor is also tagged with the `my_tag` tag. 
+PUT testindex1/_doc/1?pipeline=remove_ip +{ + "ip_address": "203.0.113.1" +} +``` -The following example only deploys the `remove` processor if the value of the `message` field is equal to "This is a message:" +This pipeline removes the ip_address field from any document that passes through the pipeline. Following is the GET request and response. ```json +GET testindex1/_doc/1 { - "remove": { - "field": "message" - }, - "if": { - "field": "message", - "value": "This is a message" - } + "_index": "testindex1", + "_id": "1", + "_version": 10, + "_seq_no": 9, + "_primary_term": 1, + "found": true, + "_source": {} } ``` \ No newline at end of file From 807431c4c09656c1a4171a65425602b938de01d4 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 30 Jun 2023 16:24:00 -0600 Subject: [PATCH 093/286] Update pipeline example Signed-off-by: Melissa Vagi --- .../processors-reference/lowercase.md | 74 +++++++++++++++++++ .../processors-reference/remove.md | 3 +- .../processors-reference/uppercase.md | 46 ++++++++++++ 3 files changed, 122 insertions(+), 1 deletion(-) create mode 100644 _api-reference/ingest-apis/processors-reference/lowercase.md create mode 100644 _api-reference/ingest-apis/processors-reference/uppercase.md diff --git a/_api-reference/ingest-apis/processors-reference/lowercase.md b/_api-reference/ingest-apis/processors-reference/lowercase.md new file mode 100644 index 0000000000..fdd04354dd --- /dev/null +++ b/_api-reference/ingest-apis/processors-reference/lowercase.md @@ -0,0 +1,74 @@ +--- +layout: default +title: Lowercase +parent: Ingest processors +grand_parent: Ingest APIs +nav_order: 210 +--- + +# Lowercase + +This processor converts all the text in a specific field to lowercase letters. The syntax for the `lowercase` processor is: + +```json +{ + "lowercase": { + "field": "field_name" + } +} +``` + +#### Configuration parameters + +The following table lists the required and optional parameters for the `lowercase` processor. 
+
+| Name | Required | Description |
+|---|---|---|
+| `field` | Required | Specifies the name of the field that you want to convert to lowercase. |
+| `target_field` | Optional | Specifies the name of the field to store the converted value in. Default is `field`. By default, `field` is updated in-place. |
+| `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`. |
+| `ignore_failure` | Optional | Specifies whether the processor should continue processing documents even if it fails to convert the specified field. Default is `false`. |
+| `on_failure` | Optional | Defines the processors to be deployed immediately following the failed processor. |
+| `if` | Optional | Conditionally deploys the processor based on the value of the field. The `value` parameter specifies the value that you want to compare the field to. |
+| `tag` | Optional | Provides an identifier for the processor. Useful for debugging and metrics. |
+`description` | Optional | Brief description of the processor. |
+
+
+Following is an example of an ingest pipeline using the `lowercase` processor.
+
+```json
+PUT _ingest/pipeline/lowercase-title
+{
+  "description" : "Pipeline that lowercases the title field",
+  "processors" : [
+    {
+      "lowercase" : {
+        "field" : "title"
+      }
+    }
+  ]
+}
+
+
+PUT testindex1/_doc/1?pipeline=lowercase-title
+{
+  "title": "WAR AND PEACE"
+}
+```
+
+Following is the GET request and response.
+ +```json +GET testindex1/_doc/1 +{ + "_index": "testindex1", + "_id": "1", + "_version": 12, + "_seq_no": 11, + "_primary_term": 1, + "found": true, + "_source": { + "title": "war and peace" + } +} +``` \ No newline at end of file diff --git a/_api-reference/ingest-apis/processors-reference/remove.md b/_api-reference/ingest-apis/processors-reference/remove.md index e261f147b4..4b1634156a 100644 --- a/_api-reference/ingest-apis/processors-reference/remove.md +++ b/_api-reference/ingest-apis/processors-reference/remove.md @@ -29,9 +29,10 @@ The following table lists the required and optional parameters for the `remove` | `ignore_failure` | Optional | Specifies whether the processor should continue processing documents even if it fails to remove the specified field. Default is `false`. | | `if` | Optional | Conditionally deploys the processor based on the value of the field. The `value` parameter specifies the value that you want to compare the field to. | | `tag` | Optional | Allows you to identify the processor for debugging and metrics. | +`description` | Optional | Brief description of the processor. | -Following is an example of an ingest pipeline using the remove processor. +Following is an example of an ingest pipeline using the `remove` processor. ```json PUT /_ingest/pipeline/remove_ip diff --git a/_api-reference/ingest-apis/processors-reference/uppercase.md b/_api-reference/ingest-apis/processors-reference/uppercase.md new file mode 100644 index 0000000000..2cbda2927d --- /dev/null +++ b/_api-reference/ingest-apis/processors-reference/uppercase.md @@ -0,0 +1,46 @@ +--- +layout: default +title: Uppercase +parent: Ingest processors +grand_parent: Ingest APIs +nav_order: 310 +--- + +# Uppercase + + + +```json +PUT _ingest/pipeline/uppercase +{ + "processors": [ + { + "uppercase": { + "field": "name" + } + } + ] +} + +PUT testindex1/_doc/1?pipeline=uppercase +{ + "name": "John" +} +``` + +Following is the GET request and response. 
+ +```json +GET testindex1/_doc/1 +{ + "_index": "testindex1", + "_id": "1", + "_version": 11, + "_seq_no": 10, + "_primary_term": 1, + "found": true, + "_source": { + "name": "JOHN" + } +} +``` \ No newline at end of file From 444123dfe941ef6adbed6c4b5f19f70ef0dbbad1 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 3 Jul 2023 10:50:25 -0600 Subject: [PATCH 094/286] Update pipeline example Signed-off-by: Melissa Vagi --- .../processors-reference/convert.md | 20 ++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/_api-reference/ingest-apis/processors-reference/convert.md b/_api-reference/ingest-apis/processors-reference/convert.md index 2f52c83e22..5c134dce8e 100644 --- a/_api-reference/ingest-apis/processors-reference/convert.md +++ b/_api-reference/ingest-apis/processors-reference/convert.md @@ -38,22 +38,23 @@ The following table lists the required and optional parameters for the `convert` Following is an example of adding the `convert` processor to an ingest pipeline. 
```json
-PUT _ingest/pipeline/convert-file-size
+PUT _ingest/pipeline/convert-age
 {
-  "description": "Pipeline that converts the file size to an integer",
+  "description": "Pipeline that converts age to an integer",
   "processors": [
     {
       "convert": {
-        "field": "file_size",
+        "field": "age",
+        "target_field": "age_int",
         "type": "integer"
       }
     }
   ]
 }
 
-PUT testindex1/_doc/1?pipeline=convert-file-size
+PUT testindex1/_doc/1?pipeline=convert-age
 {
-  "file.size": "1024"
+  "age": "20"
 }
 ```
 
@@ -64,12 +65,13 @@ GET testindex1/_doc/1
 {
   "_index": "testindex1",
   "_id": "1",
-  "_version": 4,
-  "_seq_no": 3,
-  "_primary_term": 1,
+  "_version": 17,
+  "_seq_no": 16,
+  "_primary_term": 2,
   "found": true,
   "_source": {
-    "file_size": 1024
+    "age_int": 20,
+    "age": "20"
   }
 }
 ```

From 9612019bb71e01d41cc0eacb513736525311c837 Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Mon, 3 Jul 2023 11:12:42 -0600
Subject: [PATCH 095/286] Update pipeline example

Signed-off-by: Melissa Vagi

---
 .../processors-reference/uppercase.md         | 28 ++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/_api-reference/ingest-apis/processors-reference/uppercase.md b/_api-reference/ingest-apis/processors-reference/uppercase.md
index 2cbda2927d..b9427661d7 100644
--- a/_api-reference/ingest-apis/processors-reference/uppercase.md
+++ b/_api-reference/ingest-apis/processors-reference/uppercase.md
@@ -8,7 +8,33 @@ nav_order: 310
 
 # Uppercase
 
+This processor converts all the text in a specific field to uppercase letters. The syntax for the `uppercase` processor is:
 
+```json
+{
+  "uppercase": {
+    "field": "field_name"
+  }
+}
+```
+
+#### Configuration parameters
+
+The following table lists the required and optional parameters for the `uppercase` processor.
+
+| Name | Required | Description |
+|---|---|---|
+| `field` | Required | Specifies the name of the field that you want to convert to uppercase. |
+| `target_field` | Optional | Specifies the name of the field to store the converted value in.
Default is `field`. By default, `field` is updated in-place. |
+| `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`. |
+| `ignore_failure` | Optional | Specifies whether the processor should continue processing documents even if it fails to convert the specified field. Default is `false`. |
+| `on_failure` | Optional | Defines the processors to be deployed immediately following the failed processor. |
+| `if` | Optional | Conditionally deploys the processor based on the value of the field. The `value` parameter specifies the value that you want to compare the field to. |
+| `tag` | Optional | Provides an identifier for the processor. Useful for debugging and metrics. |
+`description` | Optional | Brief description of the processor. |
+
+
+Following is an example of an ingest pipeline using the `uppercase` processor.
 
 ```json
 PUT _ingest/pipeline/uppercase
@@ -43,4 +69,4 @@ GET testindex1/_doc/1
     "name": "JOHN"
   }
 }
-``` \ No newline at end of file
+```

From 8c976585d9f2deda51825b4c5f2b36fb296aef4c Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Mon, 3 Jul 2023 11:13:00 -0600
Subject: [PATCH 096/286] Update pipeline example

Signed-off-by: Melissa Vagi

---
 _api-reference/ingest-apis/processors-reference/lowercase.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_api-reference/ingest-apis/processors-reference/lowercase.md b/_api-reference/ingest-apis/processors-reference/lowercase.md
index fdd04354dd..1c4558b61a 100644
--- a/_api-reference/ingest-apis/processors-reference/lowercase.md
+++ b/_api-reference/ingest-apis/processors-reference/lowercase.md
@@ -71,4 +71,4 @@ GET testindex1/_doc/1
     "title": "war and peace"
   }
 }
-``` \ No newline at end of file
+```

From 3f66760ecfe7180dd1679d4b3d3df4f774001622 Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Mon, 3 Jul 2023 11:16:55 -0600
Subject: [PATCH 097/286] Update append.md

Signed-off-by: Melissa Vagi

---
_api-reference/ingest-apis/processors-reference/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index 5a9a63b774..a31aaa3221 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -55,7 +55,7 @@ PUT testindex1/_doc/1?pipeline=user-behavior } ``` -This pipeline, named `user-behavior`, has one append processor. It appends the `event_type` of each new documenet ingested into OpenSearch to an array field `event_types`. Following is the GET request and response. +This pipeline, named `user-behavior`, has one append processor. It appends the `event_type` of each new document ingested into OpenSearch to an array field `event_types`. Following is the GET request and response. ```json GET testindex1/_doc/1 From 5dd31547bb4ad22bf27334ca73078f6f9a1aaccb Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 5 Jul 2023 13:23:26 -0600 Subject: [PATCH 098/286] Copy edits Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/date.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_api-reference/ingest-apis/processors-reference/date.md b/_api-reference/ingest-apis/processors-reference/date.md index d53af5a49e..ccfcf27fe3 100644 --- a/_api-reference/ingest-apis/processors-reference/date.md +++ b/_api-reference/ingest-apis/processors-reference/date.md @@ -8,7 +8,7 @@ nav_order: 50 # Date -The `date` processor is used to parse dates from fields in a document annd store them as a timestamp. The syntax for the `date` processor is: +The `date` processor is used to parse dates from fields in a document and store them as a timestamp. 
The syntax for the `date` processor is: ```json { @@ -29,7 +29,7 @@ The following table lists the required and optional parameters for the `date` pr `formats` | Required | An array of the expected date formats. Can be a java time pattern or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N. The default format is `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`. | `target_field` | Optional | Name of the field to store the parsed data in. | `locale` | Optional | The locale to use when parsing the date. Default is English. | -`timezone ` | Optional | The timezone to use when parsing the date. Default is UTC. | +`timezone ` | Optional | The time zone to use when parsing the date. Default is UTC. | `ignore_missing` | Optional | If set to `true`, the processor will not fail if the field does not exist. Default is `false`. | `if` | Optional | Conditional expression that determines whether the processor should be deployed. | `on_failure` | Optional | Action to take if an error occurs. | From 56be338e581c464cc8ad9f8f0dbcea85c4045a0c Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:26:57 -0600 Subject: [PATCH 099/286] Update _api-reference/ingest-apis/ingest-processors.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-processors.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index 681a51e59f..39a942d03b 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -8,7 +8,7 @@ has_children: true # Ingest processors -Ingest processors are a core component of data processing [pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipelines/), as they preprocess and shape documents before indexing. 
For example, you can remove fields, extract values from text, convert data format, or enrich additional information. +Ingest processors are a core component of [ingest pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipelines/), as they preprocess documents before indexing. For example, you can remove fields, extract values from text, convert data format, or append additional information. OpenSearch provides a standard set of ingest processors within your OpenSearch installation. For a list of processors available in OpenSearch, use the [nodes info]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-info/) API: From b48f12451bca3076c1de854cdb7836d677abbd81 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:27:11 -0600 Subject: [PATCH 100/286] Update _api-reference/ingest-apis/ingest-processors.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-processors.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index 39a942d03b..481719db75 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -10,7 +10,7 @@ has_children: true Ingest processors are a core component of [ingest pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipelines/), as they preprocess documents before indexing. For example, you can remove fields, extract values from text, convert data format, or append additional information. -OpenSearch provides a standard set of ingest processors within your OpenSearch installation. For a list of processors available in OpenSearch, use the [nodes info]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-info/) API: +OpenSearch provides a standard set of ingest processors within your OpenSearch installation. 
For a list of processors available in OpenSearch, use the [nodes info]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-info/) API operation: ```json GET /_nodes/ingest?filter_path=nodes.*.ingest.processors From 3825a13dd813b9813e6c63e805e4249fb7dfd856 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:27:25 -0600 Subject: [PATCH 101/286] Update _api-reference/ingest-apis/ingest-processors.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-processors.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index 481719db75..a247c42d32 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -20,4 +20,4 @@ GET /_nodes/ingest?filter_path=nodes.*.ingest.processors To set up and deploy ingest processors, make sure you have the necessary permissions and access rights. You can learn more about the processor types within their respective documentation. {: .note} -See the [Processors Reference]() section for more information about each ingest processor. +See the [Processor Reference]() section for more information about each ingest processor. 
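Editor's note: the patches above describe how an ingest pipeline preprocesses each document (removing fields, converting values, appending data) before indexing. As a rough mental model only, the behavior can be sketched as a chain of document-transforming functions. All names below are hypothetical and the real processors run inside OpenSearch, not in client code:

```python
# Illustrative model of an ingest pipeline: a chain of processors applied
# to a document before indexing. Hypothetical names; not the plugin's code.

def append_processor(field, value):
    def run(doc):
        if field not in doc:
            doc[field] = []            # missing field: create an array
        elif not isinstance(doc[field], list):
            doc[field] = [doc[field]]  # scalar field: convert to an array
        doc[field].append(value)
        return doc
    return run

def lowercase_processor(field):
    def run(doc):
        doc[field] = doc[field].lower()
        return doc
    return run

def remove_processor(field):
    def run(doc):
        doc.pop(field, None)  # tolerate a missing field, like ignore_missing
        return doc
    return run

def run_pipeline(processors, doc):
    for processor in processors:  # processors run in declaration order
        doc = processor(doc)
    return doc

pipeline = [
    append_processor("event_types", "page_view"),
    lowercase_processor("title"),
    remove_processor("ip_address"),
]
doc = {"title": "WAR AND PEACE", "ip_address": "203.0.113.1"}
print(run_pipeline(pipeline, doc))
# {'title': 'war and peace', 'event_types': ['page_view']}
```

The sketch mirrors the examples in these patches: the `user-behavior`, `lowercase-title`, and `remove_ip` pipelines each apply one such transformation to a document on its way into an index.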
From a3940bf4e11f25d1041f0300f3d85adc1e26b440 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:27:45 -0600 Subject: [PATCH 102/286] Update _api-reference/ingest-apis/processors-reference/append.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/append.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index a31aaa3221..767c5fbed5 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -8,7 +8,12 @@ nav_order: 10 # Append -The `append` proccessor is used to add additional fields or values to a document. The syntax for the `append` processor is: +The `append` processor is used to add values to a field: +- If the field is an array, the `append` processor appends the specified values to that array. +- If the field is a scalar field, the `append` processor converts it to an array and appends the specified values to that array. +- If the field does not exist, the `append` processor creates an array with the specified values. 
+ +The syntax for the `append` processor is: ```json { From f907ec926215d91eddf9fc8cc2d0063890d6a2f9 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:28:00 -0600 Subject: [PATCH 103/286] Update _api-reference/ingest-apis/processors-reference/append.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index 767c5fbed5..80b2bcf7c7 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -19,7 +19,7 @@ The syntax for the `append` processor is: { "append": { "field": "field_name", - "value": ["value1"] + "value": ["value1", "value2", "{{value3}}"] } } ``` From 5bf01461cb5751662f20f06337dceb5462b86e75 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:28:11 -0600 Subject: [PATCH 104/286] Update _api-reference/ingest-apis/processors-reference/append.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index 80b2bcf7c7..e23273f9c1 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -24,7 +24,7 @@ The syntax for the `append` processor is: } ``` -## Configuration parameters +## Parameters The following table lists the required and optional parameters for the `append` processor. 
From 07c07765cd086de31418d5e58ab01700625fb410 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:28:28 -0600 Subject: [PATCH 105/286] Update _api-reference/ingest-apis/processors-reference/append.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index e23273f9c1..628ccd2fd0 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -30,7 +30,7 @@ The following table lists the required and optional parameters for the `append` **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| -`field` | Required | Name of the field where the data should be appended. | +`field` | Required | Name of the field where the data should be appended. Supports template snippets.| `value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. | `ignore_empty_fields` | Optional | If set to true, empty values will be ignored when appending then to the target field. | `fail_on_error` | Optional | If set to true, the processor will fail it an error occurs. The default value is false. 
| From d2e0aa87a453923c957450339137c7ab5ddf135a Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:28:41 -0600 Subject: [PATCH 106/286] Update _api-reference/ingest-apis/processors-reference/append.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index 628ccd2fd0..6044ff216f 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -31,7 +31,7 @@ The following table lists the required and optional parameters for the `append` **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| `field` | Required | Name of the field where the data should be appended. Supports template snippets.| -`value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. | +`value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | `ignore_empty_fields` | Optional | If set to true, empty values will be ignored when appending then to the target field. | `fail_on_error` | Optional | If set to true, the processor will fail it an error occurs. The default value is false. | `allow_duplicates` | Optional | If set to false, the processor will not append values that already exist in the target field. The default value is set to true. 
| From 0ca87462887a48d2e9c92a13473f9d065d8b5346 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:28:52 -0600 Subject: [PATCH 107/286] Update _api-reference/ingest-apis/processors-reference/append.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/append.md | 1 - 1 file changed, 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index 6044ff216f..29d050578c 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -33,7 +33,6 @@ The following table lists the required and optional parameters for the `append` `field` | Required | Name of the field where the data should be appended. Supports template snippets.| `value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | `ignore_empty_fields` | Optional | If set to true, empty values will be ignored when appending then to the target field. | -`fail_on_error` | Optional | If set to true, the processor will fail it an error occurs. The default value is false. | `allow_duplicates` | Optional | If set to false, the processor will not append values that already exist in the target field. The default value is set to true. | `ignore_missing` | Optional | If set to true, the processor will ignore events that lack the target field. The default value is false. | `description` | Optional | Brief description of the processor. 
| From 5c14e8cb8294f914a0903aa97b7d6bfa4900dbab Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:29:02 -0600 Subject: [PATCH 108/286] Update _api-reference/ingest-apis/processors-reference/append.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/append.md | 1 - 1 file changed, 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index 29d050578c..420b73526e 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -32,7 +32,6 @@ The following table lists the required and optional parameters for the `append` |-----------|-----------|-----------| `field` | Required | Name of the field where the data should be appended. Supports template snippets.| `value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | -`ignore_empty_fields` | Optional | If set to true, empty values will be ignored when appending then to the target field. | `allow_duplicates` | Optional | If set to false, the processor will not append values that already exist in the target field. The default value is set to true. | `ignore_missing` | Optional | If set to true, the processor will ignore events that lack the target field. The default value is false. | `description` | Optional | Brief description of the processor. 
| From dc86e10a4fc23a9a21064f5fd418bef3ed3d2254 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:29:12 -0600 Subject: [PATCH 109/286] Update _api-reference/ingest-apis/processors-reference/append.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/append.md | 1 - 1 file changed, 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index 420b73526e..25a9d14ef6 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -33,7 +33,6 @@ The following table lists the required and optional parameters for the `append` `field` | Required | Name of the field where the data should be appended. Supports template snippets.| `value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | `allow_duplicates` | Optional | If set to false, the processor will not append values that already exist in the target field. The default value is set to true. | -`ignore_missing` | Optional | If set to true, the processor will ignore events that lack the target field. The default value is false. | `description` | Optional | Brief description of the processor. | Following is an example of an ingest pipeline using the `append` processor. 
From 34bbd60938bf8c03b788093805c1cb6f1dc86a61 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:29:24 -0600 Subject: [PATCH 110/286] Update _api-reference/ingest-apis/processors-reference/append.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index 25a9d14ef6..9b4a9604e1 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -32,7 +32,7 @@ The following table lists the required and optional parameters for the `append` |-----------|-----------|-----------| `field` | Required | Name of the field where the data should be appended. Supports template snippets.| `value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | -`allow_duplicates` | Optional | If set to false, the processor will not append values that already exist in the target field. The default value is set to true. | +`allow_duplicates` | Optional | If set to `false`, the processor will not append values that already exist in the field. Default is `true`. | `description` | Optional | Brief description of the processor. | Following is an example of an ingest pipeline using the `append` processor. 
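As a brief, hedged illustration of the `allow_duplicates` parameter documented above, a processor configured as follows (the `roles` field name and `admin` value are hypothetical, not part of the documented example) would skip appending any value already present in the field:

```json
{
  "append": {
    "field": "roles",
    "value": ["admin"],
    "allow_duplicates": false
  }
}
```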
From 3ed7c02547770487bb33864d91c00ba676cd8c18 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:29:34 -0600 Subject: [PATCH 111/286] Update _api-reference/ingest-apis/processors-reference/append.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/append.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index 9b4a9604e1..6aad7a98e0 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -34,6 +34,10 @@ The following table lists the required and optional parameters for the `append` `value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | `allow_duplicates` | Optional | If set to `false`, the processor will not append values that already exist in the field. Default is `true`. | `description` | Optional | Brief description of the processor. | +`if` | Optional | Condition to execute this processor. | +`on_failure` | Optional | A list of processors to execute if the processor fails. | +`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | +`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | Following is an example of an ingest pipeline using the `append` processor. 
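Because both `field` and `value` support template snippets, an `append` processor can derive the appended value from another field at ingest time. The following sketch is illustrative only; the `tags` and `user_type` field names are assumptions:

```json
{
  "append": {
    "field": "tags",
    "value": ["{{user_type}}"]
  }
}
```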
From ee1dbb5e499c611f20f85cfb244ed347b2f83a6c Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:29:55 -0600 Subject: [PATCH 112/286] Update _api-reference/ingest-apis/processors-reference/append.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/append.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index 6aad7a98e0..bef600b561 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -41,6 +41,8 @@ The following table lists the required and optional parameters for the `append` Following is an example of an ingest pipeline using the `append` processor. +The following query creates a pipeline, named `user-behavior`, that has one append processor. It appends the `event_type` of each new document ingested into OpenSearch to an array field `event_types`: + ```json PUT _ingest/pipeline/user-behavior { From badbb425133fc41429c50129e769cb5821212531 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:30:11 -0600 Subject: [PATCH 113/286] Update _api-reference/ingest-apis/processors-reference/append.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/append.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index bef600b561..ea5f3e8fcf 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -56,6 +56,8 @@ PUT _ingest/pipeline/user-behavior } ] } +``` +{% include copy-curl.html %} PUT testindex1/_doc/1?pipeline=user-behavior 
{ From 11814e140dbdd67833e80e72f7fad7d5635bcf04 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:30:22 -0600 Subject: [PATCH 114/286] Update _api-reference/ingest-apis/processors-reference/append.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/append.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index ea5f3e8fcf..be5e4e72db 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -59,6 +59,9 @@ PUT _ingest/pipeline/user-behavior ``` {% include copy-curl.html %} +Ingest a document into the index: + +```json PUT testindex1/_doc/1?pipeline=user-behavior { "event_type": "page_view" From 71f249ca4a7af8862edbf05d3409b56bc8b7fd05 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:30:33 -0600 Subject: [PATCH 115/286] Update _api-reference/ingest-apis/processors-reference/append.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/append.md | 1 + 1 file changed, 1 insertion(+) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index be5e4e72db..52a367423b 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -67,6 +67,7 @@ PUT testindex1/_doc/1?pipeline=user-behavior "event_type": "page_view" } ``` +{% include copy-curl.html %} This pipeline, named `user-behavior`, has one append processor. It appends the `event_type` of each new document ingested into OpenSearch to an array field `event_types`. Following is the GET request and response. 
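A pipeline such as `user-behavior` can also be tested without indexing any documents by using the Simulate Pipeline API. The following request is a sketch that assumes the pipeline defined above already exists:

```json
POST _ingest/pipeline/user-behavior/_simulate
{
  "docs": [
    {
      "_source": {
        "event_type": "page_view"
      }
    }
  ]
}
```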
From dedc3f7ecaabdfa8420c998ca9fe3487e75f3a5e Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:31:00 -0600 Subject: [PATCH 116/286] Update _api-reference/ingest-apis/processors-reference/append.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index 52a367423b..71efec61d6 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -69,7 +69,7 @@ PUT testindex1/_doc/1?pipeline=user-behavior ``` {% include copy-curl.html %} -This pipeline, named `user-behavior`, has one append processor. It appends the `event_type` of each new document ingested into OpenSearch to an array field `event_types`. Following is the GET request and response. 
+To view the ingested document, run the following query: ```json GET testindex1/_doc/1 From a4b23aff8a3703fb8e42f7315e0b321b49c981a1 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:31:20 -0600 Subject: [PATCH 117/286] Update _api-reference/ingest-apis/processors-reference/append.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/append.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors-reference/append.md index 71efec61d6..303cdd4321 100644 --- a/_api-reference/ingest-apis/processors-reference/append.md +++ b/_api-reference/ingest-apis/processors-reference/append.md @@ -73,6 +73,10 @@ To view the ingested document, run the following query: ```json GET testindex1/_doc/1 +``` +{% include copy-curl.html %} + +Because there was no `event_types` field in the document, an array field is created and the event is appended to the array: { "_index": "testindex1", "_id": "1", From 86af726453fe2fec3153e176cc7c1040f1fe0508 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:31:40 -0600 Subject: [PATCH 118/286] Update _api-reference/ingest-apis/processors-reference/bytes.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/bytes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/bytes.md b/_api-reference/ingest-apis/processors-reference/bytes.md index 99a56d459f..83d3c75d2c 100644 --- a/_api-reference/ingest-apis/processors-reference/bytes.md +++ b/_api-reference/ingest-apis/processors-reference/bytes.md @@ -8,7 +8,7 @@ nav_order: 20 # Bytes -The `bytes` processor converts a human-readable byte value to its equivalent value in bytes. 
The field can be a scalar or an array. If the field is a scalar, the value is converted and stored in the field. If the field is an array, all members of the array are converted. +The `bytes` processor converts a human-readable byte value to its equivalent value in bytes. The field can be a scalar or an array. If the field is a scalar, the value is converted and stored in the field. If the field is an array, all values of the array are converted. The syntax for the `bytes` processor is: From 3ea45e8bde3740bd95a83c8c845c4835d88d5a8a Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:32:02 -0600 Subject: [PATCH 119/286] Update _api-reference/ingest-apis/processors-reference/bytes.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/bytes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/bytes.md b/_api-reference/ingest-apis/processors-reference/bytes.md index 83d3c75d2c..8e1a0dca5c 100644 --- a/_api-reference/ingest-apis/processors-reference/bytes.md +++ b/_api-reference/ingest-apis/processors-reference/bytes.md @@ -28,7 +28,7 @@ The following table lists the required and optional parameters for the `bytes` p **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| `field` | Required | Name of the field where the data should be converted. | -`target_field` | Required | Name of the field to store the converted value. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | +`target_field` | Optional | Name of the field to store the converted value. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | `ignore_missing` | Optional | If set to true, the processor will not fail if the field does not exist. Default is `false`. 
| `if` | Optional | Conditional expression that determines whether the processor should be deployed. | `ignore_failure` | Optional | If set to true, the processor will not fail if an error occurs. | From 5bcf36e038790d0779fa226a71ae4abffb1f3fc1 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:32:22 -0600 Subject: [PATCH 120/286] Update _api-reference/ingest-apis/processors-reference/bytes.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/bytes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/bytes.md b/_api-reference/ingest-apis/processors-reference/bytes.md index 8e1a0dca5c..75fbf6d610 100644 --- a/_api-reference/ingest-apis/processors-reference/bytes.md +++ b/_api-reference/ingest-apis/processors-reference/bytes.md @@ -29,7 +29,7 @@ The following table lists the required and optional parameters for the `bytes` p |-----------|-----------|-----------| `field` | Required | Name of the field where the data should be converted. | `target_field` | Optional | Name of the field to store the converted value. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | -`ignore_missing` | Optional | If set to true, the processor will not fail if the field does not exist. Default is `false`. | +`ignore_missing` | Optional | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. | `if` | Optional | Conditional expression that determines whether the processor should be deployed. | `ignore_failure` | Optional | If set to true, the processor will not fail if an error occurs. | `on_failure` | Optional | Action to take if an error occurs. 
| From 41cce5690f64595fc056fcc4e6700822782a33bb Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:32:36 -0600 Subject: [PATCH 121/286] Update _api-reference/ingest-apis/processors-reference/bytes.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/bytes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/bytes.md b/_api-reference/ingest-apis/processors-reference/bytes.md index 75fbf6d610..54baa600ae 100644 --- a/_api-reference/ingest-apis/processors-reference/bytes.md +++ b/_api-reference/ingest-apis/processors-reference/bytes.md @@ -32,7 +32,7 @@ The following table lists the required and optional parameters for the `bytes` p `ignore_missing` | Optional | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. | `if` | Optional | Conditional expression that determines whether the processor should be deployed. | `ignore_failure` | Optional | If set to true, the processor will not fail if an error occurs. | -`on_failure` | Optional | Action to take if an error occurs. | +`on_failure` | Optional | A list of processors to execute if the processor fails. | `tag` | Optional | Tag that can be used to identify the processor. | `description` | Optional | Brief description of the processor. 
| From 67a2c29bf2eafd0cd0857f89d1589e7bc80d7af1 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:32:50 -0600 Subject: [PATCH 122/286] Update _api-reference/ingest-apis/processors-reference/bytes.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/bytes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/bytes.md b/_api-reference/ingest-apis/processors-reference/bytes.md index 54baa600ae..53df28ceb0 100644 --- a/_api-reference/ingest-apis/processors-reference/bytes.md +++ b/_api-reference/ingest-apis/processors-reference/bytes.md @@ -31,7 +31,7 @@ The following table lists the required and optional parameters for the `bytes` p `target_field` | Optional | Name of the field to store the converted value. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | `ignore_missing` | Optional | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. | `if` | Optional | Conditional expression that determines whether the processor should be deployed. | -`ignore_failure` | Optional | If set to true, the processor will not fail if an error occurs. | +`ignore_failure` | Optional | If set to `true`, the processor will not fail if an error occurs. | `on_failure` | Optional | A list of processors to execute if the processor fails. | `tag` | Optional | Tag that can be used to identify the processor. | `description` | Optional | Brief description of the processor. 
| From ef609b114790ed539558b4c2de3659703a5d42a5 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:33:16 -0600 Subject: [PATCH 123/286] Update _api-reference/ingest-apis/processors-reference/convert.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/convert.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/convert.md b/_api-reference/ingest-apis/processors-reference/convert.md index 5c134dce8e..c7c72a3107 100644 --- a/_api-reference/ingest-apis/processors-reference/convert.md +++ b/_api-reference/ingest-apis/processors-reference/convert.md @@ -8,7 +8,7 @@ nav_order: 30 # Convert -The `convert` processor converts a field in a document to a different type, for example, a string field to an integer field or vice versa. The syntax for the `convert` processor is: +The `convert` processor converts a field in a document to a different type, for example, a string to an integer or an integer to a string. For an array field, all values in the array are converted. 
The syntax for the `convert` processor is: ```json { From 6d4eae32cc07a13c7af0027141fbbce93c5d0e41 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:33:28 -0600 Subject: [PATCH 124/286] Update _api-reference/ingest-apis/processors-reference/convert.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/convert.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/convert.md b/_api-reference/ingest-apis/processors-reference/convert.md index c7c72a3107..3ecf6d3d45 100644 --- a/_api-reference/ingest-apis/processors-reference/convert.md +++ b/_api-reference/ingest-apis/processors-reference/convert.md @@ -25,7 +25,7 @@ The following table lists the required and optional parameters for the `convert` **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| -`field` | Required | Name of the field where the data should be converted. | +`field` | Required | Name of the field whose value to convert. | `type` | Required | The field's target type. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. Specifying `boolean` will set the field to `true` if its string value is equal to `true` (ignore case), to false if its string value is equal to `false` (ignore case), or it will throw an exception otherwise. | `target_field` | Optional | Name of the field to store the converted value. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | `ignore_missing` | Optional | If set to true, the processor will not fail if the field does not exist. Default is `false`. 
| From 392cef18b0d6d3b6fc27b9ced0c2a04f65c13ad2 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:33:54 -0600 Subject: [PATCH 125/286] Update _api-reference/ingest-apis/processors-reference/convert.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/convert.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/convert.md b/_api-reference/ingest-apis/processors-reference/convert.md index 3ecf6d3d45..697d74cca9 100644 --- a/_api-reference/ingest-apis/processors-reference/convert.md +++ b/_api-reference/ingest-apis/processors-reference/convert.md @@ -26,7 +26,7 @@ The following table lists the required and optional parameters for the `convert` **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| `field` | Required | Name of the field whose value to convert. | -`type` | Required | The field's target type. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. Specifying `boolean` will set the field to `true` if its string value is equal to `true` (ignore case), to false if its string value is equal to `false` (ignore case), or it will throw an exception otherwise. | +`type` | Required | The type to convert the field value to. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. If the `type` is `boolean`, the value is set to `true` if the field value is a string `"true"` (ignoring case), and to `false` if the field value is a string `"false"` (ignoring case). For all other values, an exception is thrown. | `target_field` | Optional | Name of the field to store the converted value. If not specified, the value will be stored in-place in the `field` field. Default is `field`. 
| `ignore_missing` | Optional | If set to true, the processor will not fail if the field does not exist. Default is `false`. | `if` | Optional | Conditional expression that determines whether the processor should be deployed. | From 97e0df503bc909f5375b65dceb2a52a2c384f4a6 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:34:15 -0600 Subject: [PATCH 126/286] Update _api-reference/ingest-apis/processors-reference/csv.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/csv.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/csv.md b/_api-reference/ingest-apis/processors-reference/csv.md index f99df13338..b64f294486 100644 --- a/_api-reference/ingest-apis/processors-reference/csv.md +++ b/_api-reference/ingest-apis/processors-reference/csv.md @@ -8,7 +8,7 @@ nav_order: 40 # CSV -The `csv` processor is used to parse CSV data and store it as individual fields in a document. The syntax for the `csv` processor is: +The `csv` processor is used to parse comma-separated values (CSV) and store them as individual fields in a document. The processor ignores empty fields. 
The syntax for the `csv` processor is: ```json { From 9b3fd4da711497df6a072abae99cdc084d81bda1 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 11:34:25 -0600 Subject: [PATCH 127/286] Update _api-reference/ingest-apis/processors-reference/csv.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/csv.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/csv.md b/_api-reference/ingest-apis/processors-reference/csv.md index b64f294486..238ff71670 100644 --- a/_api-reference/ingest-apis/processors-reference/csv.md +++ b/_api-reference/ingest-apis/processors-reference/csv.md @@ -30,7 +30,7 @@ The following table lists the required and optional parameters for the `csv` pro `separator` | Optional | The delimiter used to separate the fields in the CSV data. | `quote` | Optional | The character used to quote fields in the CSV data. | `ignore_missing` | Optional | If set to `true`, the processor will not fail if the field does not exist. Default is `true`. | -`trim` | Optional | If set to `true`, the processor trims whitespace from the beginning and end of each field. Default is `false`. | +`trim` | Optional | If set to `true`, the processor trims whitespace from the beginning and end of the text. Default is `false`. | `empty_value` | Optional | The value used to fill empty fields. If not provided, empty fields are skipped. | `if` | Optional | Conditional expression that determines whether the processor should be deployed. | `on_failure` | Optional | Action to take if an error occurs.
| From 07c78de154c943600ce49bbd4719dc846e91a993 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 12:11:15 -0600 Subject: [PATCH 128/286] Update _api-reference/ingest-apis/processors-reference/date.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/date.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/date.md b/_api-reference/ingest-apis/processors-reference/date.md index ccfcf27fe3..12654ca964 100644 --- a/_api-reference/ingest-apis/processors-reference/date.md +++ b/_api-reference/ingest-apis/processors-reference/date.md @@ -26,7 +26,8 @@ The following table lists the required and optional parameters for the `date` pr **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| `field` | Required | Name of the field to extract data from. | -`formats` | Required | An array of the expected date formats. Can be a java time pattern or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N. The default format is `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`. | +`formats` | Required | An array of the expected date formats. Can be a [date format]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats) or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N. | +`output_format` | Optional | The [date format]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats) to use for the target field. Default is `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`. `target_field` | Optional | Name of the field to store the parsed data in. | `locale` | Optional | The locale to use when parsing the date. Default is English. | `timezone ` | Optional | The time zone to use when parsing the date. Default is UTC. 
| From 04b7f33a0dc580b12db7216ffb672a7fbfe9b405 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 12:11:25 -0600 Subject: [PATCH 129/286] Update _api-reference/ingest-apis/processors-reference/date.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/date.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/date.md b/_api-reference/ingest-apis/processors-reference/date.md index 12654ca964..cd865a529b 100644 --- a/_api-reference/ingest-apis/processors-reference/date.md +++ b/_api-reference/ingest-apis/processors-reference/date.md @@ -28,7 +28,7 @@ The following table lists the required and optional parameters for the `date` pr `field` | Required | Name of the field to extract data from. | `formats` | Required | An array of the expected date formats. Can be a [date format]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats) or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N. | `output_format` | Optional | The [date format]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats) to use for the target field. Default is `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`. -`target_field` | Optional | Name of the field to store the parsed data in. | +`target_field` | Optional | Name of the field to store the parsed data in. Default target field is `@timestamp`. | `locale` | Optional | The locale to use when parsing the date. Default is English. | `timezone ` | Optional | The time zone to use when parsing the date. Default is UTC. | `ignore_missing` | Optional | If set to `true`, the processor will not fail if the field does not exist. Default is `false`. 
| From 26de00dc5a315866e5f6ec1c89c24f9d60d4b909 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 12:11:46 -0600 Subject: [PATCH 130/286] Update _api-reference/ingest-apis/processors-reference/date.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/date.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/date.md b/_api-reference/ingest-apis/processors-reference/date.md index cd865a529b..170aa19ce0 100644 --- a/_api-reference/ingest-apis/processors-reference/date.md +++ b/_api-reference/ingest-apis/processors-reference/date.md @@ -29,7 +29,7 @@ The following table lists the required and optional parameters for the `date` pr `formats` | Required | An array of the expected date formats. Can be a [date format]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats) or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N. | `output_format` | Optional | The [date format]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats) to use for the target field. Default is `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`. `target_field` | Optional | Name of the field to store the parsed data in. Default target field is `@timestamp`. | -`locale` | Optional | The locale to use when parsing the date. Default is English. | +`locale` | Optional | The locale to use when parsing the date. Default is `ENGLISH`. Supports template snippets. | `timezone ` | Optional | The time zone to use when parsing the date. Default is UTC. | `ignore_missing` | Optional | If set to `true`, the processor will not fail if the field does not exist. Default is `false`. | `if` | Optional | Conditional expression that determines whether the processor should be deployed. 
| From 523c44145d887c473a1190d71a1c284bbfec8beb Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 15:45:12 -0600 Subject: [PATCH 131/286] Continue writing Signed-off-by: Melissa Vagi --- .../ingest-apis/create-update-ingest.md | 5 +++-- _api-reference/ingest-apis/delete-ingest.md | 5 +++-- _api-reference/ingest-apis/get-ingest.md | 5 +++-- _api-reference/ingest-apis/index.md | 11 ++++++++-- .../ingest-apis/ingest-pipelines.md | 21 +++++++++++++++++++ .../ingest-apis/ingest-processors.md | 2 +- _api-reference/ingest-apis/simulate-ingest.md | 3 ++- 7 files changed, 42 insertions(+), 10 deletions(-) create mode 100644 _api-reference/ingest-apis/ingest-pipelines.md diff --git a/_api-reference/ingest-apis/create-update-ingest.md b/_api-reference/ingest-apis/create-update-ingest.md index de2ea4ac77..5d500c89e4 100644 --- a/_api-reference/ingest-apis/create-update-ingest.md +++ b/_api-reference/ingest-apis/create-update-ingest.md @@ -1,8 +1,9 @@ --- layout: default title: Create or update ingest pipeline -parent: Ingest APIs -nav_order: 11 +parent: Ingest pipelines +grand_parent: Ingest APIs +nav_order: 10 redirect_from: - /opensearch/rest-api/ingest-apis/create-update-ingest/ --- diff --git a/_api-reference/ingest-apis/delete-ingest.md b/_api-reference/ingest-apis/delete-ingest.md index c5065d1e28..072676a559 100644 --- a/_api-reference/ingest-apis/delete-ingest.md +++ b/_api-reference/ingest-apis/delete-ingest.md @@ -1,8 +1,9 @@ --- layout: default title: Delete a pipeline -parent: Ingest APIs -nav_order: 14 +parent: Ingest pipelines +grand_parent: Ingest APIs +nav_order: 12 redirect_from: - /opensearch/rest-api/ingest-apis/delete-ingest/ --- diff --git a/_api-reference/ingest-apis/get-ingest.md b/_api-reference/ingest-apis/get-ingest.md index f8e18f8a56..466209b851 100644 --- a/_api-reference/ingest-apis/get-ingest.md +++ b/_api-reference/ingest-apis/get-ingest.md @@ -1,8 +1,9 @@ --- layout: default title: Get ingest pipeline -parent: Ingest APIs 
-nav_order: 10 +parent: Ingest pipelines +grand_parent: Ingest APIs +nav_order: 11 redirect_from: - /opensearch/rest-api/ingest-apis/get-ingest/ --- diff --git a/_api-reference/ingest-apis/index.md b/_api-reference/ingest-apis/index.md index 1df68b70cc..2ab8e2842f 100644 --- a/_api-reference/ingest-apis/index.md +++ b/_api-reference/ingest-apis/index.md @@ -9,6 +9,13 @@ redirect_from: # Ingest APIs -Before you index your data, OpenSearch's ingest APIs help transform your data by creating and managing ingest pipelines. Pipelines consist of **processors**, customizable tasks that run in the order they appear in the request body. The transformed data appears in your index after each of the processor completes. +Ingest APIs can be used to manage the tasks and resources associated with [ingest pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/) and [processors]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/). -Ingest pipelines in OpenSearch can only be managed using ingest API operations. When using ingest in production environments, your cluster should contain at least one node with the node roles permission set to `ingest`. For more information on setting up node roles within a cluster, see [Cluster Formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/). +## Ingest pipeline APIs + +Simplify, secure, and scale your OpenSearch data ingestion with the following APIs: + +- [Create or update ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/create-update-ingest/): Use this API to create or update a pipeline configuration. +- [Get pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/get-ingest/): Use this API to retrieve a pipeline configuration. +- [Delete pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/delete-ingest/): Use this API to delete a pipeline configuration. 
+- [Simulate pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/simulate-ingest/): Use this API to test a pipeline configuration. \ No newline at end of file diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md new file mode 100644 index 0000000000..b954489779 --- /dev/null +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -0,0 +1,21 @@ +--- +layout: default +title: Ingest pipelines +parent: Ingest APIs +has_children: true +nav_order: 5 +--- + +# Ingest pipelines + +Before you index your data, OpenSearch's ingest APIs help transform your data by creating and managing _ingest pipelines_. An ingest pipeline is a sequence of steps that are applied to data as it is being ingested into a system. Benefits of using ingest pipelines include: + +- Improving the quality of data by filtering out irrelevant data and transforming data into a format that is easy to understand and analyze. +- Improving the performance of data analysis by reducing the amount of data that needs to be analyzed. +- Improving the security of data by filtering out sensitive data. + +Ingest pipelines consist of _processors_. Processors are customizable tasks that run in a sequential order as they appear in the request body. This order is important, as each processor depends on the output of the previous processor. The transformed data appears in your index after each processor completes. + +Ingest pipelines in OpenSearch can only be managed using ingest API operations. When using ingest in production environments, your cluster should contain at least one node with the node roles permission set to `ingest`. For information about setting up node roles within a cluster, see [Cluster Formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/).
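The sequential model described above, in which each processor consumes the document produced by the previous one, can be sketched outside OpenSearch. The following Python sketch is illustrative only; the helper functions are invented for this example and are not part of any OpenSearch API:

```python
# Illustrative model of sequential processor execution; not OpenSearch code.
def set_processor(field, value):
    """Mimics the `set` processor: assigns a static value to a field."""
    def apply(doc):
        doc = dict(doc)      # copy so earlier stages are not mutated
        doc[field] = value
        return doc
    return apply

def lowercase_processor(field):
    """Mimics a lowercase-style processor: lowercases an existing field."""
    def apply(doc):
        doc = dict(doc)
        doc[field] = doc[field].lower()
        return doc
    return apply

def run_pipeline(processors, doc):
    # Order matters: each processor receives the previous processor's output.
    for process in processors:
        doc = process(doc)
    return doc

pipeline = [set_processor("name", "JDoe"), lowercase_processor("name")]
print(run_pipeline(pipeline, {"user_id": 42}))  # {'user_id': 42, 'name': 'jdoe'}
```

Reversing the two processors in this sketch would fail, because the lowercase step depends on the `name` field that the set step creates; that is the sense in which processor order is significant.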
+ + diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index a247c42d32..a1a7ab4c31 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -2,7 +2,7 @@ layout: default title: Ingest processors parent: Ingest APIs -nav_order: 50 +nav_order: 10 has_children: true --- diff --git a/_api-reference/ingest-apis/simulate-ingest.md b/_api-reference/ingest-apis/simulate-ingest.md index e8d858134f..548f8b0356 100644 --- a/_api-reference/ingest-apis/simulate-ingest.md +++ b/_api-reference/ingest-apis/simulate-ingest.md @@ -1,7 +1,8 @@ --- layout: default title: Simulate an ingest pipeline -parent: Ingest APIs +parent: Ingest pipelines +grand_parent: Ingest APIs nav_order: 13 redirect_from: - /opensearch/rest-api/ingest-apis/simulate-ingest/ From 5166d1f80e169db3ae44d20db1e5445ff56d54a5 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 24 Jul 2023 16:05:54 -0600 Subject: [PATCH 132/286] Continue writing Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/index.md | 6 +++--- _api-reference/ingest-apis/ingest-pipelines.md | 11 +++++------ 2 files changed, 8 insertions(+), 9 deletions(-) diff --git a/_api-reference/ingest-apis/index.md b/_api-reference/ingest-apis/index.md index 2ab8e2842f..4ad9469db8 100644 --- a/_api-reference/ingest-apis/index.md +++ b/_api-reference/ingest-apis/index.md @@ -9,13 +9,13 @@ redirect_from: # Ingest APIs -Ingest APIs can be used to manage the tasks and resources associated with [ingest pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/) and [processors]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/). +Ingest APIs are a valuable tool for ingesting data into a system. 
Ingest APIs work together with [pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/) and [processors]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) to ingest data from a variety of sources and in a variety of formats. ## Ingest pipeline APIs -Simplify, secure, and scale your OpenSearch data ingestion with the following APIs: +Simplify, secure, and scale your data ingestion in OpenSearch with the following APIs: - [Create or update ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/create-update-ingest/): Use this API to create or update a pipeline configuration. - [Get pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/get-ingest/): Use this API to retrieve a pipeline configuration. - [Delete pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/delete-ingest/): Use this API to delete a pipeline configuration. -- [Simulate pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/simulate-ingest/): Use this pipeline to test a pipeline configuration. \ No newline at end of file +- [Simulate pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/simulate-ingest/): Use this API to test a pipeline configuration. diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index b954489779..d9104ede90 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -8,14 +8,13 @@ # Ingest pipelines -Before you index your data, OpenSearch's ingest APIs help transform your data by creating and managing _ingest pipelines_. An ingest pipeline is a sequence of steps that are applied to data as it is being ingested into a system.
Benefits of using ingest pipelines include: - -- Improving the quality of data by filtering out irrelevant data and transforming data into a format that is easy to understand and analyze. -- Improving the performance of data analysis by reducing the amount of data that needs to be analyzed. -- Improving the security of data by filtering out sensitive data. +An _ingest pipeline_ is a sequence of steps that are applied to data as it is being ingested into a system. Each step in the pipeline performs a specific task, such as filtering, transforming, or enriching the data. The order in which the steps are applied is important, as each step depends on the output of the previous step. Ingest pipelines consist of _processors_. Processors are customizable tasks that run in a sequential order as they appear in the request body. This order is important, as each processor depends on the output of the previous processor. The transformed data appears in your index after each processor completes. -Ingest pipelines in OpenSearch can only be managed using ingest API operations. When using ingest in production environments, your cluster should contain at least one node with the node roles permission set to `ingest`. For information about setting up node roles within a cluster, see [Cluster Formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/). +Ingest pipelines in OpenSearch can only be managed using [ingest API operations]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/). When using ingest in production environments, your cluster should contain at least one node with the node roles permission set to `ingest`. For information about setting up node roles within a cluster, see [Cluster Formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/).
+{: .note} + + From 26b0dc20e5deb4d767233d272190f2f3df3aa8fc Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 25 Jul 2023 13:21:52 -0600 Subject: [PATCH 133/286] Continue writing Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-pipelines.md | 15 +++++++++++++++ _api-reference/ingest-apis/ingest-processors.md | 2 -- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index d9104ede90..acb2867144 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -15,6 +15,21 @@ Ingest pipelines consist of _processors_. Processors are customizable tasks that Ingest pipelines in OpenSearch can only be managed using [ingest API operations]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/). When using ingest in production environments, your cluster should contain at least one node with the node roles permission set to `ingest`. For information about setting up node roles within a cluster, see [Cluster Formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/). {: .note} +## Define a pipeline +A pipeline definition describes the steps involved in an ingest pipeline and can be written in JSON format. An ingest pipeline consists of the following: +```json +{ + "description" : "...", + "processors" : [...] +} +``` + +## Request body fields + +Field | Required | Type | Description +:--- | :--- | :--- | :--- +`description` | Optional | String | Description of the ingest pipeline. +`processors` | Required | Array of processor objects | A component that performs a specific task to process data as it's being ingested into OpenSearch.
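As a quick sanity check before sending a definition to the API, the required and optional fields in the table above can be validated client side. The following Python helper is hypothetical (it is not part of any OpenSearch client library) and only mirrors the table above:

```python
# Hypothetical client-side check; not part of any OpenSearch client library.
import json

def validate_pipeline(definition):
    """Checks a pipeline definition against the request body fields above."""
    if "processors" not in definition:
        raise ValueError("'processors' is required")
    if not isinstance(definition["processors"], list) or not definition["processors"]:
        raise ValueError("'processors' must be a non-empty array of processor objects")
    if "description" in definition and not isinstance(definition["description"], str):
        raise ValueError("'description' must be a string")
    return json.dumps(definition)  # serialized body, ready to send

body = validate_pipeline({
    "description": "Sets a static field before indexing",
    "processors": [{"set": {"field": "name", "value": "user_id"}}],
})
print(body)
```

Catching a missing `processors` array locally produces a clearer error than a failed API call, which is the only reason such a helper might be worth the few extra lines.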
diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index a1a7ab4c31..cabba60302 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -19,5 +19,3 @@ GET /_nodes/ingest?filter_path=nodes.*.ingest.processors To set up and deploy ingest processors, make sure you have the necessary permissions and access rights. You can learn more about the processor types within their respective documentation. {: .note} - -See the [Processor Reference]() section for more information about each ingest processor. From f3dcef161a2e2b97c4b0046254bfeae3fc9e5799 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 25 Jul 2023 19:38:03 -0600 Subject: [PATCH 134/286] Continue writing Signed-off-by: Melissa Vagi --- .../ingest-apis/create-update-ingest.md | 54 +++++------------- .../ingest-apis/ingest-pipelines.md | 52 ++++++++++++++++- .../ingest-apis/pipeline-failures.md | 57 +++++++++++++++++++ 3 files changed, 123 insertions(+), 40 deletions(-) create mode 100644 _api-reference/ingest-apis/pipeline-failures.md diff --git a/_api-reference/ingest-apis/create-update-ingest.md b/_api-reference/ingest-apis/create-update-ingest.md index 5d500c89e4..e225258981 100644 --- a/_api-reference/ingest-apis/create-update-ingest.md +++ b/_api-reference/ingest-apis/create-update-ingest.md @@ -8,27 +8,9 @@ redirect_from: - /opensearch/rest-api/ingest-apis/create-update-ingest/ --- -# Create and update a pipeline +# Create or update a pipeline -The create ingest pipeline API operation creates or updates an ingest pipeline. Each pipeline requires an ingest definition defining how each processor transforms your documents. 
- -## Example - -``` -PUT _ingest/pipeline/12345 -{ - "description" : "A description for your pipeline", - "processors" : [ - { - "set" : { - "field": "field-name", - "value": "value" - } - } - ] -} -``` -{% include copy-curl.html %} +To create or update an ingest pipeline, use the `PUT` method with the `_ingest/pipeline/{id}` endpoint. ## Path and HTTP methods ``` PUT _ingest/pipeline/{id} ``` ## Request body fields +The body of the request must contain the field `processors`. The field `description` is optional. + Field | Required | Type | Description :--- | :--- | :--- | :--- -description | Optional | string | Description of your ingest pipeline. -processors | Required | Array of processor objects | A processor that transforms documents. Runs in the order specified. Appears in index once ran. +`description` | Optional | String | Description of your ingest pipeline. +`processors` | Required | Array of processor objects | A processor that transforms documents. Runs in the order specified. Appears in the index once run. + +The following is a simple example to create an ingest pipeline with one processor, a `set` processor that sets the `name` field to the static value `user_id`: ```json +PUT _ingest/pipeline/set-pipeline { - "description" : "A description for your pipeline", + "description" : "A simple ingest pipeline", "processors" : [ { "set" : { - "field": "field-name", - "value": "value" + "field": "name", + "value": "user_id" } } ] } ``` +{% include copy-curl.html %} -## URL parameters - -All URL parameters are optional. - -Parameter | Type | Description :--- | :--- | :--- -master_timeout | time | How long to wait for a connection to the master node. -timeout | time | How long to wait for the request to return. - -## Response +The following response confirms the pipeline was successfully created. ```json { "acknowledged": true
} ``` - - - - +See [Handling ingest pipeline failures]() to learn how to handle pipeline failures. diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index acb2867144..9d258644a3 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -17,7 +17,7 @@ Ingest pipelines in OpenSearch can only be managed using [ingest API operations] ## Define a pipeline -A pipeline definition describes the steps involved in an ingest pipeline and can be written in JSON format. An ingest pipeline consists of the following: +A _pipeline definition_ describes the steps involved in an ingest pipeline and can be written in JSON format. An ingest pipeline consists of the following: ```json { @@ -33,3 +33,53 @@ Field | Required | Type | Description `description` | Optional | String | Description of the ingest pipeline. `processors` | Required | Array of processor objects | A component that performs a specific task to process data as it's being ingested into OpenSearch. +Here is a simple example in JSON format. This creates an ingest pipeline with one processor, a `set` processor, that sets the value of the `name` field to the value of the `user_id` field. + +```json +{ + "description": "This is a simple ingest pipeline.", + "processors": [ + { + "set": { + "field": "name", + "value": "{{user_id}}" + } + } + ] +} +``` +{% include copy-curl.html %} + +## Template snippets + +Use template snippets to create an ingest pipeline that loads data from a file, indexes it into OpenSearch, performs some processing on the data, and outputs the data to a file. You can use template snippets as a starting point for sections of your custom templates or [Mustache](https://mustache.github.io/) template snippets to create dynamic content. Mustache templates use a simple syntax (double curly brackets `{{` and `}}`) to replace placeholders in a template with values from a data source. 
+ +The following template snippet sets the value of a field to a specific value. The value can be a string, a number, or a Boolean. + +#### Example: `set` ingest processor Mustache template snippet + +```json +{ + "set" : { + "field": "{{field_name}}", + "value": "{{value}}" + } +} +``` +{% include copy-curl.html %} + +The `field_name` and `value` variables are Mustache templates. You can use them to specify the field name and value that you want. For example, the following `set` ingest processor sets the `name` field to the value of the `user_id` field: + +```json +{ + "set": { + "field": "name", + "value": "{{user_id}} + } +} +``` +{% include copy-curl.html %} + +## Next steps + +Learn more about creating, getting, deleting, and testing ingest pipelines in the documentation linked under the section titled Related articles. diff --git a/_api-reference/ingest-apis/pipeline-failures.md b/_api-reference/ingest-apis/pipeline-failures.md new file mode 100644 index 0000000000..acfb994d30 --- /dev/null +++ b/_api-reference/ingest-apis/pipeline-failures.md @@ -0,0 +1,57 @@ +--- +layout: default +title: Handling pipeline failures +parent: Ingest pipelines +grand_parent: Ingest APIs +nav_order: 15 +--- + +## Handling pipeline failures + +Each ingest pipeline consists of a series of processors that are applied to the data in sequence. If a processor fails, the entire pipeline will fail. The are two ways to handle failures: + +- **Fail the entire pipeline:** This is the default behavior. If a processor fails, the entire pipeline will fail and the document will not be indexed. +- **Fail the current processor and continue with the next processor:** This can be useful if you want to continue processing the document even if one of the processors fails. + +To configure the failure handling behavior, you need to use the `` parameter. 
For example, the following JSON object configures the `set-pipeline` to fail the entire pipeline if a processor fails: + +```json +{ + "description" : "A simple ingest pipeline", + "processors" : [ + { + "set" : { + "field": "name", + "value": "user_id" + } + } + ], + "" : "fail" +} +``` + +The following JSON object configures `set-pipeline` to fail the current processor and continue with the next processor: + +```json +{ + "description" : "A simple ingest pipeline", + "processors" : [ + { + "set" : { + "field": "name", + "value": "user_id" + } + } + ], + "" : "continue" +} +``` + +## Troubleshooting failures + +The following are tips on troubleshooting ingest pipeline failures: + +1. Check the logs: OpenSearch logs contain information about the ingest pipeline that failed, including the processor that failed and the reason for the failure.
From 67a4d50eca15dcf0f1e0a56a2bde63f2ba3a78f8 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 31 Jul 2023 17:00:52 -0600 Subject: [PATCH 135/286] Revised to include SME feedback Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-pipelines.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index 9d258644a3..063c0d3b71 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -12,9 +12,16 @@ An _ingest pipeline_ is a sequence of steps that are applied to data as it is be Ingest pipelines consist of _processors_. Processors are customizable tasks that run in a sequential order as they appear in the request body. This order is important, as each processor depends on the output of the previous processor. The transformed data appears in your index after each of the processor completes. -Ingest pipelines in OpenSearch can only be managed using [ingest API operations]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/). When using ingest in production environments, your cluster should contain at least one node with the node roles permission set to `ingest`. For information about setting up node roles within a cluster, see [Cluster Formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/). +Ingest pipelines in OpenSearch can only be managed using [ingest API operations]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/). {: .note} +## Prerequisites + +The following are prerequisites for using OpenSearch ingest pipelines: + +- When using ingest in production environments, your cluster should contain at least one node with the `ingest` role. For information about setting up node roles within a cluster, see [Cluster Formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/). 
+- If the OpenSearch security features are enabled, you must have the `cluster_manage_pipelines` permission to manage ingest pipelines. + ## Define a pipeline A _pipeline definition_ describes the steps involved in an ingest pipeline and can be written in JSON format. An ingest pipeline consists of the following: From ee0d3943d4f0c3e412599ad33867c917821bb03e Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 1 Aug 2023 09:04:01 -0600 Subject: [PATCH 136/286] Update _api-reference/ingest-apis/ingest-pipelines.md Co-authored-by: Heemin Kim Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-pipelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index 063c0d3b71..ac2b24f06e 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -8,7 +8,7 @@ nav_order: 5 # Ingest pipelines -An _ingest pipeline_ is a sequence of steps that are applied to data as it is being ingested into a system. Each step in the pipeline performs a specific task, such as filtering, transforming, or enriching the data. The order in which the steps are applied are important, as each step depends on the output of the previous step. +An _ingest pipeline_ is a sequence of steps that are applied to data as it is being ingested into a system. Each step in the pipeline performs a specific task, such as filtering, transforming, or enriching the data. Ingest pipelines consist of _processors_. Processors are customizable tasks that run in a sequential order as they appear in the request body. This order is important, as each processor depends on the output of the previous processor. The transformed data appears in your index after each of the processor completes. 
From dd17c5d1d5696dec1456a100ec65aabd3fadd4dd Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 1 Aug 2023 09:04:27 -0600 Subject: [PATCH 137/286] Update _api-reference/ingest-apis/ingest-pipelines.md Co-authored-by: Heemin Kim Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-pipelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index ac2b24f06e..b50472c0ff 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -59,7 +59,7 @@ Here is a simple example in JSON format. This creates an ingest pipeline with on ## Template snippets -Use template snippets to create an ingest pipeline that loads data from a file, indexes it into OpenSearch, performs some processing on the data, and outputs the data to a file. You can use template snippets as a starting point for sections of your custom templates or [Mustache](https://mustache.github.io/) template snippets to create dynamic content. Mustache templates use a simple syntax (double curly brackets `{{` and `}}`) to replace placeholders in a template with values from a data source. +Few processor parameters support [Mustache](https://mustache.github.io/) template snippets. To get a field value, enclose the field name in triple curly brackets:{{{field-name}}}. The following template snippet sets the value of a field to a specific value. The value can be a string, a number, or a Boolean. 
From 5276b90aa232a551ffdfcde2e814f2b854b504e3 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 1 Aug 2023 09:04:38 -0600 Subject: [PATCH 138/286] Update _api-reference/ingest-apis/ingest-pipelines.md Co-authored-by: Heemin Kim Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-pipelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index b50472c0ff..9558aa9577 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -61,7 +61,7 @@ Here is a simple example in JSON format. This creates an ingest pipeline with on A few processor parameters support [Mustache](https://mustache.github.io/) template snippets. To get a field value, enclose the field name in triple curly brackets: `{{{field-name}}}`. -The following template snippet sets the value of a field to a specific value. The value can be a string, a number, or a Boolean. +The following template snippet sets the value of the field `{{field_name}}` to the value `{{value}}`.
#### Example: `set` ingest processor Mustache template snippet From 91ef17bd346341de65e48731b049af3ccdac72c9 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 1 Aug 2023 09:04:46 -0600 Subject: [PATCH 139/286] Update _api-reference/ingest-apis/ingest-pipelines.md Co-authored-by: Heemin Kim Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-pipelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index 9558aa9577..548ab302fd 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -75,7 +75,7 @@ The following template snippet sets the value of a field "{{field_name}}" to a v ``` {% include copy-curl.html %} -The `field_name` and `value` variables are Mustache templates. You can use them to specify the field name and value that you want. For example, the following `set` ingest processor sets the `name` field to the value of the `user_id` field: +The following `set` ingest processor sets the `name` field to the value of the `user_id` field: ```json { From ca51a56f43d3936adc09f955a366a8ea56439f67 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 1 Aug 2023 09:06:26 -0600 Subject: [PATCH 140/286] Update _api-reference/ingest-apis/processors-reference/date.md Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors-reference/date.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors-reference/date.md b/_api-reference/ingest-apis/processors-reference/date.md index 170aa19ce0..dce9e22048 100644 --- a/_api-reference/ingest-apis/processors-reference/date.md +++ b/_api-reference/ingest-apis/processors-reference/date.md @@ -30,7 +30,7 @@ The following table lists the required and optional parameters for the `date` pr `output_format` | 
Optional | The [date format]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats) to use for the target field. Default is `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`. `target_field` | Optional | Name of the field to store the parsed data in. Default target field is `@timestamp`. | `locale` | Optional | The locale to use when parsing the date. Default is `ENGLISH`. Supports template snippets. | -`timezone ` | Optional | The time zone to use when parsing the date. Default is UTC. | +`timezone` | Optional | The time zone to use when parsing the date. Default is `UTC`. Supports template snippets.| `ignore_missing` | Optional | If set to `true`, the processor will not fail if the field does not exist. Default is `false`. | `if` | Optional | Conditional expression that determines whether the processor should be deployed. | `on_failure` | Optional | Action to take if an error occurs. | From 1f1a0d44b109b6781630afbc27752bbead52c30b Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 1 Aug 2023 14:52:48 -0600 Subject: [PATCH 141/286] Address SME and doc reviewer input Signed-off-by: Melissa Vagi --- .../ingest-apis/create-update-ingest.md | 91 ++++-- _api-reference/ingest-apis/delete-ingest.md | 2 +- _api-reference/ingest-apis/get-ingest.md | 70 +++-- _api-reference/ingest-apis/index.md | 5 +- .../ingest-apis/ingest-pipelines.md | 64 +--- .../ingest-apis/ingest-processors.md | 2 +- _api-reference/ingest-apis/simulate-ingest.md | 274 ++++++++++-------- 7 files changed, 275 insertions(+), 233 deletions(-) diff --git a/_api-reference/ingest-apis/create-update-ingest.md b/_api-reference/ingest-apis/create-update-ingest.md index e225258981..4ebecb3b94 100644 --- a/_api-reference/ingest-apis/create-update-ingest.md +++ b/_api-reference/ingest-apis/create-update-ingest.md @@ -1,6 +1,6 @@ --- layout: default -title: Create or update ingest pipeline +title: Create pipeline parent: Ingest pipelines grand_parent: Ingest APIs nav_order: 10 @@ -8,35 +8,44 @@ redirect_from: 
- /opensearch/rest-api/ingest-apis/create-update-ingest/
 ---
 
-# Create or update a pipeline
+# Create pipeline
 
-To create or update an ingest pipeline, you need to use the `PUT` method to the `/_ingest/pipelines` endpoint.
+Use the create pipeline API operation to create or update pipelines in OpenSearch. Note that the pipeline requires an ingest definition that defines how the processors change the document.
 
-## Path and HTTP methods
-```
-PUT _ingest/pipeline/{id}
-```
+## Path and HTTP method
 
-## Request body fields
+To create, or update, an ingest pipeline, you need to use the `PUT` method to the `/_ingest/pipelines` endpoint. Replace `<pipeline-id>` with your pipeline identifier.
 
-The body of the request must contain the field `processors`. The field `description` is optional.
+```json
+PUT _ingest/pipeline/<pipeline-id>
+```
 
-Field | Required | Type | Description
-:--- | :--- | :--- | :---
-`description` | Optional | String | Description of your ingest pipeline.
-`processors` | Required | Array of processor objects | A processor that transforms documents. Runs in the order specified. Appears in index once ran.
+Here is a example in JSON format that creates an ingest pipeline with using a `set` processor and an `uppercase` processor. The `set` processor sets the value of the `grad_year` field to the value of `2023` and the `graduated` field to the value of `true`. The `uppercase` processor converts the `name` field to capital letters.
-The following is a simple example to create an ingest pipeline with one processor, a `set` processor that sets the `name` field to the value of the `user_id` field:
+#### Example request
 
 ```json
-PUT _ingest/pipeline/set-pipeline
+PUT _ingest/pipeline/my-pipeline
 {
-  "description" : "A simple ingest pipeline",
-  "processors" : [
+  "description": "This pipeline processes student data",
+  "processors": [
     {
-      "set" : {
-        "field": "name",
-        "value": "user_id"
+      "set": {
+        "description": "Sets the graduation year to 2023",
+        "field": "grad_year",
+        "value": 2023
+      }
+    },
+    {
+      "set": {
+        "description": "Sets graduated to true",
+        "field": "graduated",
+        "value": true
+      }
+    },
+    {
+      "uppercase": {
+        "field": "name"
       }
     }
   ]
@@ -44,13 +53,49 @@ PUT _ingest/pipeline/set-pipeline
 ```
 {% include copy-curl.html %}
 
-The following response confirms the pipeline was successfully created.
+If a pipeline fails or results in an error, see [Handling pipeline failures]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipeline-failures/) to learn more.
+{: .note}
+
+## Request body fields
+
+The following table lists the request body fields used to create or update a pipeline. The body of the request must contain the field `processors`. The field `description` is optional.
+
+Field | Required | Type | Description
+:--- | :--- | :--- | :---
+`processors` | Required | Array of processor objects | The processors that transform documents. They run in the order specified, and the transformed documents appear in the index after processing.
+`description` | Optional | String | Description of your ingest pipeline.
+
+## Path parameters
+
+Parameter | Required | Type | Description
+:--- | :--- | :--- | :---
+`pipeline-id` | Required | String | The unique identifier, or pipeline ID, assigned to the ingest pipeline. A pipeline ID is used in API requests to specify which pipeline should be created or modified.
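As a rough mental model, the two `set` processors and the `uppercase` processor in the example above can be sketched as plain functions applied in order. This is hypothetical Python, not OpenSearch code, and only illustrates how each processor receives the document produced by the previous one:

```python
# Hypothetical sketch, not OpenSearch code: each processor takes the
# document produced by the previous one, mirroring the my-pipeline example.

def set_field(doc, field, value):
    # The set processor creates the field or overwrites its current value.
    doc = dict(doc)
    doc[field] = value
    return doc

def uppercase_field(doc, field):
    # The uppercase processor converts an existing string field to capitals.
    doc = dict(doc)
    doc[field] = doc[field].upper()
    return doc

doc = {"name": "John Doe"}
doc = set_field(doc, "grad_year", 2023)
doc = set_field(doc, "graduated", True)
doc = uppercase_field(doc, "name")
print(doc)  # {'name': 'JOHN DOE', 'grad_year': 2023, 'graduated': True}
```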
+ +## Query parameters + +Parameter | Required | Type | Description +:--- | :--- | :--- | :--- +`cluster_manager_timeout` | Optional | Time | Period to wait for a connection to the cluster manager node. Defaults to 30 seconds. +`timeout` | Optional | Time | Period to wait for a response. Defaults to 30 seconds. + +## Template snippets + +Some processor parameters support [Mustache](https://mustache.github.io/) template snippets. To get a field value, enclose the field name in triple curly brackets, for example, {{{field-name}}}. + +The following template snippet sets the value of a field "{{field_name}}" to a value of a field "{{value}}". + +#### Example: `set` ingest processor Mustache template snippet ```json { - "acknowledged" : true + "set" : { + "field_name": "grad_year", + "value": "{{value}}" + } } ``` +{% include copy-curl.html %} -See [Handling ingest pipeline failures]() to learn how to handle pipeline failures. +## Next steps +- [Retrieve information about a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/get-ingest/) diff --git a/_api-reference/ingest-apis/delete-ingest.md b/_api-reference/ingest-apis/delete-ingest.md index 072676a559..2cdbce2d48 100644 --- a/_api-reference/ingest-apis/delete-ingest.md +++ b/_api-reference/ingest-apis/delete-ingest.md @@ -3,7 +3,7 @@ layout: default title: Delete a pipeline parent: Ingest pipelines grand_parent: Ingest APIs -nav_order: 12 +nav_order: 13 redirect_from: - /opensearch/rest-api/ingest-apis/delete-ingest/ --- diff --git a/_api-reference/ingest-apis/get-ingest.md b/_api-reference/ingest-apis/get-ingest.md index 466209b851..b64211deed 100644 --- a/_api-reference/ingest-apis/get-ingest.md +++ b/_api-reference/ingest-apis/get-ingest.md @@ -1,6 +1,6 @@ --- layout: default -title: Get ingest pipeline +title: Get pipeline parent: Ingest pipelines grand_parent: Ingest APIs nav_order: 11 @@ -8,53 +8,59 @@ redirect_from: - /opensearch/rest-api/ingest-apis/get-ingest/ --- -## Get ingest pipeline +# Get 
pipeline -After you create a pipeline, use the get ingest pipeline API operation to return all the information about a specific ingest pipeline. +After creating a pipeline, use the get ingest pipeline API operation to retrieve all the information about the pipeline. -## Example +## Retrieving information about all pipelines -``` -GET _ingest/pipeline/12345 +The following example request returns information about all ingest pipelines: + +```json +GET _ingest/pipeline/ ``` {% include copy-curl.html %} -## Path and HTTP methods - -Return all ingest pipelines. - -``` -GET _ingest/pipeline -``` +## Retrieving information about a specific pipeline -Returns a single ingest pipeline based on the pipeline's ID. +The following example request returns information about a specific pipeline, which for this example is `my-pipeline`: +```json +GET _ingest/pipeline/my-pipeline ``` -GET _ingest/pipeline/{id} -``` - -## URL parameters - -All parameters are optional. - -Parameter | Type | Description -:--- | :--- | :--- -master_timeout | time | How long to wait for a connection to the master node. 
+{% include copy-curl.html %} -## Response +The response contains the pipeline information: ```json { - "pipeline-id" : { - "description" : "A description for your pipeline", - "processors" : [ + "my-pipeline": { + "description": "This pipeline processes student data", + "processors": [ + { + "set": { + "description": "Sets the graduation year to 2023", + "field": "grad_year", + "value": 2023 + } + }, { - "set" : { - "field" : "field-name", - "value" : "value" + "set": { + "description": "Sets graduated to true", + "field": "graduated", + "value": true + } + }, + { + "uppercase": { + "field": "name" } } ] } } -``` \ No newline at end of file +``` + +## Next steps + +- [Test your pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/simulate-ingest/) \ No newline at end of file diff --git a/_api-reference/ingest-apis/index.md b/_api-reference/ingest-apis/index.md index 4ad9469db8..ab05de7398 100644 --- a/_api-reference/ingest-apis/index.md +++ b/_api-reference/ingest-apis/index.md @@ -15,7 +15,8 @@ Ingest APIs are a valuable tool for ingesting data into a system. Ingest APIs wo Simplify, secure, and scale your data ingestion in OpenSearch with the following APIs: -- [Create or update ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/create-update-ingest/): Use this API to create or update a pipeline configuration. +- [Create pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/create-update-ingest/): Use this API to create or update a pipeline configuration. - [Get pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/get-ingest/): Use this API to retrieve a pipeline configuration. -- [Delete pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/delete-ingest/): Use this API to delete a pipeline configuration. 
- [Simulate pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/simulate-ingest/): Use this pipeline to test a pipeline configuration. +- [Delete pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/delete-ingest/): Use this API to delete a pipeline configuration. + diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index 548ab302fd..fed8611690 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -8,9 +8,9 @@ nav_order: 5 # Ingest pipelines -An _ingest pipeline_ is a sequence of steps that are applied to data as it is being ingested into a system. Each step in the pipeline performs a specific task, such as filtering, transforming, or enriching the data. +An _ingest pipeline_ is a sequence of steps that are applied to data as it is being ingested into a system. Each step in the pipeline performs a specific task, such as filtering, transforming, or enriching data. Ingest pipelines are a valuable tool to help you tailor data to your needs. -Ingest pipelines consist of _processors_. Processors are customizable tasks that run in a sequential order as they appear in the request body. This order is important, as each processor depends on the output of the previous processor. The transformed data appears in your index after each of the processor completes. +Ingest pipelines consist of _processors_. Processors are customizable tasks that run in a sequential order as they appear in the request body. This order is important, as each processor depends on the output of the previous processor. OpenSearch [ingest processors]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) perform common transformations to your data, and the modified data appears in your index after each processor completes. 
Ingest pipelines in OpenSearch can only be managed using [ingest API operations]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/). {: .note} @@ -19,7 +19,7 @@ Ingest pipelines in OpenSearch can only be managed using [ingest API operations] The following are prerequisites for using OpenSearch ingest pipelines: -- When using ingest in production environments, your cluster should contain at least one node with the `ingest` role. For information about setting up node roles within a cluster, see [Cluster Formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/). +- When using ingest in production environments, your cluster should contain at least one node with the node roles permission set to `ingest`. For information about setting up node roles within a cluster, see [Cluster Formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/). - If the OpenSearch security features are enabled, you must have the `cluster_manage_pipelines` permission to manage ingest pipelines. ## Define a pipeline @@ -33,60 +33,18 @@ A _pipeline definition_ describes the steps involved in an ingest pipeline and c } ``` -## Request body fields +### Request body fields Field | Required | Type | Description :--- | :--- | :--- | :--- +`processors` | Required | Array of processor objects | A component that performs a specific task to process data as it's being ingested into OpenSearch. `description` | Optional | String | Description of the ingest pipeline. -`processors` | Required | Array of processor objects | A component that performs a specific task to process data as it's being ingested into OpenSearch. - -Here is a simple example in JSON format. This creates an ingest pipeline with one processor, a `set` processor, that sets the value of the `name` field to the value of the `user_id` field. 
- -```json -{ - "description": "This is a simple ingest pipeline.", - "processors": [ - { - "set": { - "field": "name", - "value": "{{user_id}}" - } - } - ] -} -``` -{% include copy-curl.html %} - -## Template snippets - -Few processor parameters support [Mustache](https://mustache.github.io/) template snippets. To get a field value, enclose the field name in triple curly brackets:{{{field-name}}}. - -The following template snippet sets the value of a field "{{field_name}}" to a value of a field "{{value}}". - -#### Example: `set` ingest processor Mustache template snippet - -```json -{ - "set" : { - "field": "{{field_name}}", - "value": "{{value}}" - } -} -``` -{% include copy-curl.html %} - -The following `set` ingest processor sets the `name` field to the value of the `user_id` field: - -```json -{ - "set": { - "field": "name", - "value": "{{user_id}} - } -} -``` -{% include copy-curl.html %} ## Next steps -Learn more about creating, getting, deleting, and testing ingest pipelines in the documentation linked under the section titled Related articles. +Learn more about how to: + +- [Create a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/create-update-ingest/) +- [Retrieve information about a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/get-ingest/) +- [Test a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/simulate-ingest/) +- [Delete a pipeline]({{site.url}}{{site.baseurl}}/ingest-apis/delete-ingest/) in their respective documentation. diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index cabba60302..3a65b4f1f9 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -13,7 +13,7 @@ Ingest processors are a core component of [ingest pipelines]({{site.url}}{{site. OpenSearch provides a standard set of ingest processors within your OpenSearch installation. 
For a list of processors available in OpenSearch, use the [nodes info]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-info/) API operation:
 
 ```json
-GET /_nodes/ingest?filter_path=nodes.*.ingest.processors
+GET /_nodes/ingest
 ```
 {% include copy-curl.html %}
 
diff --git a/_api-reference/ingest-apis/simulate-ingest.md b/_api-reference/ingest-apis/simulate-ingest.md
index 548f8b0356..d712065263 100644
--- a/_api-reference/ingest-apis/simulate-ingest.md
+++ b/_api-reference/ingest-apis/simulate-ingest.md
@@ -1,148 +1,118 @@
 ---
 layout: default
-title: Simulate an ingest pipeline
+title: Simulate pipeline
 parent: Ingest pipelines
 grand_parent: Ingest APIs
-nav_order: 13
+nav_order: 12
 redirect_from:
 - /opensearch/rest-api/ingest-apis/simulate-ingest/
 ---
 
-# Simulate a pipeline
+# Simulate pipeline
 
-Simulates an ingest pipeline with any example documents you specify.
-
-## Example
-
-```
-POST /_ingest/pipeline/35678/_simulate
-{
-  "docs": [
-    {
-      "_index": "index",
-      "_id": "id",
-      "_source": {
-        "location": "document-name"
-      }
-    },
-    {
-      "_index": "index",
-      "_id": "id",
-      "_source": {
-        "location": "document-name"
-      }
-    }
-  ]
-}
-```
-{% include copy-curl.html %}
+Use the simulate ingest pipeline API operation to run or test the pipeline.
 
 ## Path and HTTP methods
 
-Simulate the last ingest pipeline created.
+The following requests simulate the latest ingest pipeline created.
 
 ```
 GET _ingest/pipeline/_simulate
 POST _ingest/pipeline/_simulate
 ```
 
-Simulate a single pipeline based on the pipeline's ID.
+The following requests simulate a single pipeline based on the pipeline ID.
 
 ```
-GET _ingest/pipeline/{id}/_simulate
-POST _ingest/pipeline/{id}/_simulate
+GET _ingest/pipeline/<pipeline-id>/_simulate
+POST _ingest/pipeline/<pipeline-id>/_simulate
 ```
 
-## URL parameters
-
-All URL parameters are optional.
-
-Parameter | Type | Description
-:--- | :--- | :---
-verbose | boolean | Verbose mode. Display data output for each processor in executed pipeline.
- ## Request body fields +The following table lists the request body fields used to run a pipeline. + Field | Required | Type | Description :--- | :--- | :--- | :--- -`pipeline` | Optional | object | The pipeline you want to simulate. When included without the pipeline `{id}` inside the request path, the response simulates the last pipeline created. -`docs` | Required | array of objects | The documents you want to use to test the pipeline. +`docs` | Required | Array | The documents to be used to test the pipeline. +`pipeline` | Optional | Object | The pipeline to be simulated. If the pipeline identifier is not included, then the response simulates the latest pipeline created. -The `docs` field can include the following subfields: +The `docs` field can include subfields listed in the following table. Field | Required | Type | Description -:--- | :--- | :--- -`id` | Optional |string | An optional identifier for the document. The identifier cannot be used elsewhere in the index. -`index` | Optional | string | The index where the document's transformed data appears. -`source` | Required | object | The document's JSON body. +:--- | :--- | :--- | :--- +`source` | Required | Object | The document's JSON body. +`id` | Optional | String | A unique document identifier. The identifier cannot be used elsewhere in the index. +`index` | Optional | String | The index where the document's transformed data appears. -## Response +## Query parameters -Responses vary based on which path and HTTP method you choose. +The following table lists the query parameters for running a pipeline. -### Specify pipeline in request body +Parameter | Type | Description +:--- | :--- | :--- +`verbose` | Boolean | Verbose mode. Display data output for each processor in the executed pipeline. 
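Conceptually, verbose simulation applies each processor in turn and records the document state after every step, while the default mode reports only the final document. The following is a hypothetical Python sketch of that behavior, not OpenSearch code:

```python
# Hypothetical sketch of simulate semantics: run every doc through the
# processors in order; verbose mode keeps one result entry per processor.

def simulate(docs, processors, verbose=False):
    results = []
    for doc in docs:
        current = dict(doc)
        steps = []
        for name, fn in processors:
            current = fn(current)
            steps.append({"processor_type": name, "status": "success", "doc": dict(current)})
        results.append(steps if verbose else steps[-1:])
    return results

processors = [
    ("set", lambda d: {**d, "graduated": True}),
    ("uppercase", lambda d: {**d, "name": d["name"].upper()}),
]
out = simulate([{"name": "Jane Doe"}], processors, verbose=True)
print([step["processor_type"] for step in out[0]])  # ['set', 'uppercase']
```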
+
+### Example: Specify a pipeline in the path
 
 ```json
+POST /_ingest/pipeline/my-pipeline/_simulate
 {
-  "docs" : [
+  "docs": [
     {
-      "doc" : {
-        "_index" : "index",
-        "_id" : "id",
-        "_source" : {
-          "location" : "new-new",
-          "field2" : "_value"
-        },
-        "_ingest" : {
-          "timestamp" : "2022-02-07T18:47:57.479230835Z"
-        }
+      "_index": "my-index",
+      "_id": "1",
+      "_source": {
+        "grad_year": 2024,
+        "graduated": false,
+        "name": "John Doe"
       }
     },
     {
-      "doc" : {
-        "_index" : "index",
-        "_id" : "id",
-        "_source" : {
-          "location" : "new-new",
-          "field2" : "_value"
-        },
-        "_ingest" : {
-          "timestamp" : "2022-02-07T18:47:57.47933496Z"
-        }
+      "_index": "my-index",
+      "_id": "2",
+      "_source": {
+        "grad_year": 2025,
+        "graduated": false,
+        "name": "Jane Doe"
      }
     }
   ]
 }
 ```
 {% include copy-curl.html %}
 
-### Specify pipeline ID inside HTTP path
+The request returns the following response:
 
 ```json
 {
-  "docs" : [
+  "docs": [
     {
-      "doc" : {
-        "_index" : "index",
-        "_id" : "id",
-        "_source" : {
-          "field-name" : "value",
-          "location" : "document-name"
+      "doc": {
+        "_index": "my-index",
+        "_id": "1",
+        "_source": {
+          "name": "JOHN DOE",
+          "grad_year": 2023,
+          "graduated": true
         },
-        "_ingest" : {
-          "timestamp" : "2022-02-03T21:47:05.382744877Z"
+        "_ingest": {
+          "timestamp": "2023-06-20T23:19:54.635306588Z"
         }
       }
     },
     {
-      "doc" : {
-        "_index" : "index",
-        "_id" : "id",
-        "_source" : {
-          "field-name" : "value",
-          "location" : "document-name"
+      "doc": {
+        "_index": "my-index",
+        "_id": "2",
+        "_source": {
+          "name": "JANE DOE",
+          "grad_year": 2023,
+          "graduated": true
         },
-        "_ingest" : {
-          "timestamp" : "2022-02-03T21:47:05.382803544Z"
+        "_ingest": {
+          "timestamp": "2023-06-20T23:19:54.635746046Z"
         }
       }
     }
   ]
 }
 ```
 
-### Receive verbose response
+### Example: Verbose mode
 
-With the `verbose` parameter set to `true`, the response shows how each processor transforms the specified document.
+When the previous request is run with the `verbose` parameter set to `true`, the response shows the sequence of transformations made on each document. For example, for the document with the ID `1`, the response contains the results of applying each processor in the pipeline in turn: ```json { - "docs" : [ + "docs": [ { - "processor_results" : [ + "processor_results": [ { - "processor_type" : "set", - "status" : "success", - "doc" : { - "_index" : "index", - "_id" : "id", - "_source" : { - "field-name" : "value", - "location" : "document-name" + "processor_type": "set", + "status": "success", + "description": "Sets the graduation year to 2023", + "doc": { + "_index": "my-index", + "_id": "1", + "_source": { + "name": "John Doe", + "grad_year": 2023, + "graduated": false }, - "_ingest" : { - "pipeline" : "35678", - "timestamp" : "2022-02-03T21:45:09.414049004Z" + "_ingest": { + "pipeline": "my-pipeline", + "timestamp": "2023-06-20T23:23:26.656564631Z" } } - } - ] - }, - { - "processor_results" : [ + }, { - "processor_type" : "set", - "status" : "success", - "doc" : { - "_index" : "index", - "_id" : "id", - "_source" : { - "field-name" : "value", - "location" : "document-name" + "processor_type": "set", + "status": "success", + "description": "Sets 'graduated' to true", + "doc": { + "_index": "my-index", + "_id": "1", + "_source": { + "name": "John Doe", + "grad_year": 2023, + "graduated": true }, - "_ingest" : { - "pipeline" : "35678", - "timestamp" : "2022-02-03T21:45:09.414093212Z" + "_ingest": { + "pipeline": "my-pipeline", + "timestamp": "2023-06-20T23:23:26.656564631Z" + } + } + }, + { + "processor_type": "uppercase", + "status": "success", + "doc": { + "_index": "my-index", + "_id": "1", + "_source": { + "name": "JOHN DOE", + "grad_year": 2023, + "graduated": true + }, + "_ingest": { + "pipeline": "my-pipeline", + "timestamp": "2023-06-20T23:23:26.656564631Z" } } } @@ -199,4 +186,49 @@ With the `verbose` parameter set to `true`, the response shows how each 
processo } ] } +``` + +### Example: Specify a pipeline in the request body + +Alternatively, you can specify a pipeline directly in the request body without creating a pipeline first: + +```json +POST /_ingest/pipeline/_simulate +{ + "pipeline" : + { + "description": "Splits text on whitespace characters", + "processors": [ + { + "csv" : { + "field" : "name", + "separator": ",", + "target_fields": ["last_name", "first_name"], + "trim": true + } + }, + { + "uppercase": { + "field": "last_name" + } + } + ] + }, + "docs": [ + { + "_index": "second-index", + "_id": "1", + "_source": { + "name": "Doe,John" + } + }, + { + "_index": "second-index", + "_id": "2", + "_source": { + "name": "Doe, Jane" + } + } + ] +} ``` \ No newline at end of file From 6b2893263eefaae1021183be493ddf7302fac9a6 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 1 Aug 2023 15:37:21 -0600 Subject: [PATCH 142/286] Add simulate content Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/simulate-ingest.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/simulate-ingest.md b/_api-reference/ingest-apis/simulate-ingest.md index d712065263..0f46ca2a24 100644 --- a/_api-reference/ingest-apis/simulate-ingest.md +++ b/_api-reference/ingest-apis/simulate-ingest.md @@ -231,4 +231,4 @@ POST /_ingest/pipeline/_simulate } ] } -``` \ No newline at end of file +``` From 0385a7ea46deb25b5e3821d46e0fec2802f0287e Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 1 Aug 2023 16:05:03 -0600 Subject: [PATCH 143/286] Writing Signed-off-by: Melissa Vagi --- ...eate-update-ingest.md => create-ingest.md} | 2 +- _api-reference/ingest-apis/delete-ingest.md | 38 +++++-------------- _api-reference/ingest-apis/get-ingest.md | 2 +- _api-reference/ingest-apis/index.md | 2 +- .../ingest-apis/ingest-pipelines.md | 4 +- 5 files changed, 15 insertions(+), 33 deletions(-) rename _api-reference/ingest-apis/{create-update-ingest.md => create-ingest.md} (97%) diff --git 
a/_api-reference/ingest-apis/create-update-ingest.md b/_api-reference/ingest-apis/create-ingest.md
similarity index 97%
rename from _api-reference/ingest-apis/create-update-ingest.md
rename to _api-reference/ingest-apis/create-ingest.md
index 4ebecb3b94..3d31dbc64a 100644
--- a/_api-reference/ingest-apis/create-update-ingest.md
+++ b/_api-reference/ingest-apis/create-ingest.md
@@ -14,7 +14,7 @@ Use the create pipeline API operation to create or update pipelines in OpenSearc
 
 ## Path and HTTP method
 
-To create, or update, an ingest pipeline, you need to use the `PUT` method to the `/_ingest/pipelines` endpoint. Replace `<pipeline-id>` with your pipeline identifier.
+To create or update an ingest pipeline, use the `PUT` method on the `/_ingest/pipeline/<pipeline-id>` endpoint. Replace `<pipeline-id>` with your pipeline ID.
 
 ```json
 PUT _ingest/pipeline/<pipeline-id>
diff --git a/_api-reference/ingest-apis/delete-ingest.md b/_api-reference/ingest-apis/delete-ingest.md
index 2cdbce2d48..0de94e3f30 100644
--- a/_api-reference/ingest-apis/delete-ingest.md
+++ b/_api-reference/ingest-apis/delete-ingest.md
@@ -1,6 +1,6 @@
 ---
 layout: default
-title: Delete a pipeline
+title: Delete pipeline
 parent: Ingest pipelines
 grand_parent: Ingest APIs
 nav_order: 13
@@ -8,38 +8,20 @@ redirect_from:
 - /opensearch/rest-api/ingest-apis/delete-ingest/
 ---
 
-# Delete a pipeline
+# Delete pipeline
 
-If you no longer want to use an ingest pipeline, use the delete ingest pipeline API operation.
+Use the following requests to delete pipelines.
 
-## Example
+To delete a specific pipeline, pass the pipeline ID as a parameter:
 
-```
-DELETE _ingest/pipeline/12345
+```json
+DELETE /_ingest/pipeline/<pipeline-id>
 ```
 {% include copy-curl.html %}
 
-## Path and HTTP methods
-
-Delete an ingest pipeline based on that pipeline's ID.
-
-```
-DELETE _ingest/pipeline/<pipeline-id>
-```
-
-## URL parameters
-
-All URL parameters are optional.
-
-Parameter | Type | Description
-:--- | :--- | :---
-master_timeout | time | How long to wait for a connection to the master node.
-timeout | time | How long to wait for the request to return. - -## Response +To delete all pipelines in a cluster, use the wildcard character (`*`): ```json -{ - "acknowledged" : true -} -``` \ No newline at end of file +DELETE /_ingest/pipeline/* +``` +{% include copy-curl.html %} diff --git a/_api-reference/ingest-apis/get-ingest.md b/_api-reference/ingest-apis/get-ingest.md index b64211deed..32e448d4f7 100644 --- a/_api-reference/ingest-apis/get-ingest.md +++ b/_api-reference/ingest-apis/get-ingest.md @@ -10,7 +10,7 @@ redirect_from: # Get pipeline -After creating a pipeline, use the get ingest pipeline API operation to retrieve all the information about the pipeline. +Use the get ingest pipeline API operation to retrieve all the information about the pipeline. ## Retrieving information about all pipelines diff --git a/_api-reference/ingest-apis/index.md b/_api-reference/ingest-apis/index.md index ab05de7398..cae4ac37d8 100644 --- a/_api-reference/ingest-apis/index.md +++ b/_api-reference/ingest-apis/index.md @@ -15,7 +15,7 @@ Ingest APIs are a valuable tool for ingesting data into a system. Ingest APIs wo Simplify, secure, and scale your data ingestion in OpenSearch with the following APIs: -- [Create pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/create-update-ingest/): Use this API to create or update a pipeline configuration. +- [Create pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/create-ingest/): Use this API to create or update a pipeline configuration. - [Get pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/get-ingest/): Use this API to retrieve a pipeline configuration. - [Simulate pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/simulate-ingest/): Use this pipeline to test a pipeline configuration. 
- [Delete pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/delete-ingest/): Use this API to delete a pipeline configuration. diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index fed8611690..eb847051c1 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -42,9 +42,9 @@ Field | Required | Type | Description ## Next steps -Learn more about how to: +Learn how to: -- [Create a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/create-update-ingest/) +- [Create a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/create-ingest/) - [Retrieve information about a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/get-ingest/) - [Test a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/simulate-ingest/) - [Delete a pipeline]({{site.url}}{{site.baseurl}}/ingest-apis/delete-ingest/) in their respective documentation. 
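The Mustache template snippets used throughout these pages resolve a triple-brace token such as `{{{field-name}}}` against the document's own fields. The following is a minimal sketch of that substitution in hypothetical Python; OpenSearch uses a real Mustache engine, and this only illustrates the lookup idea:

```python
import re

# Hypothetical sketch: resolve {{{field-name}}} tokens against a document.
# Not the Mustache implementation OpenSearch actually uses.

def resolve(template, doc):
    # Replace each {{{name}}} occurrence with the document's value for that field.
    return re.sub(r"\{\{\{(.+?)\}\}\}", lambda m: str(doc[m.group(1)]), template)

doc = {"user_id": "user-123"}
print(resolve("{{{user_id}}}", doc))  # user-123
```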
From 3207dad073430634ad4bbc3b9e0249ceeb2ffcb3 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 1 Aug 2023 16:20:16 -0600 Subject: [PATCH 144/286] Writing Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/create-ingest.md | 24 ++++++++++++++++++--- _api-reference/ingest-apis/index.md | 1 - 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/_api-reference/ingest-apis/create-ingest.md b/_api-reference/ingest-apis/create-ingest.md index 3d31dbc64a..1b856344d1 100644 --- a/_api-reference/ingest-apis/create-ingest.md +++ b/_api-reference/ingest-apis/create-ingest.md @@ -87,11 +87,29 @@ The following template snippet sets the value of a field "{{field_name}}" to a v #### Example: `set` ingest processor Mustache template snippet ```json +PUT _ingest/pipeline/my-pipeline { - "set" : { - "field_name": "grad_year", - "value": "{{value}}" + "processors": [ + { + "set": { + "description": "Sets the grad_year field to 2023 value", + "field": "{{{grad_year}}}", + "value": "{{{2023}}}" + } + }, + { + "set": { + "description": "Sets graduated to true", + "field": "{{{graduated}}}", + "value": "{{{true}}}" + } + }, + { + "uppercase": { + "field": "name" + } } + ] } ``` {% include copy-curl.html %} diff --git a/_api-reference/ingest-apis/index.md b/_api-reference/ingest-apis/index.md index cae4ac37d8..f6b7587c62 100644 --- a/_api-reference/ingest-apis/index.md +++ b/_api-reference/ingest-apis/index.md @@ -19,4 +19,3 @@ Simplify, secure, and scale your data ingestion in OpenSearch with the following - [Get pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/get-ingest/): Use this API to retrieve a pipeline configuration. - [Simulate pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/simulate-ingest/): Use this pipeline to test a pipeline configuration. 
- [Delete pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/delete-ingest/): Use this API to delete a pipeline configuration. - From ce5c9725e998c38817d882254806ca9386941064 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 2 Aug 2023 14:42:47 -0600 Subject: [PATCH 145/286] Copy edits Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/index.md b/_api-reference/ingest-apis/index.md index f6b7587c62..66856efaba 100644 --- a/_api-reference/ingest-apis/index.md +++ b/_api-reference/ingest-apis/index.md @@ -9,7 +9,7 @@ redirect_from: # Ingest APIs -Ingest APIs are a valuable tool for ingesting data into a system. Ingest APIs work together with [pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/) and [processors]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) to ingest data from a variety of sources and in a variety of formats. +Ingest APIs are a valuable tool for loading data into a system. Ingest APIs work together with [ingest pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/) and [ingest processors]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) to process or transform data from a variety of sources and in a variety of formats. 
## Ingest pipeline APIs From ea6110f0cd04b07ad9c455b1444f8e093abe1db6 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 2 Aug 2023 15:18:21 -0600 Subject: [PATCH 146/286] Writing Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/create-ingest.md | 2 +- .../ingest-apis/pipeline-failures.md | 20 ++++++++++++++----- 2 files changed, 16 insertions(+), 6 deletions(-) diff --git a/_api-reference/ingest-apis/create-ingest.md b/_api-reference/ingest-apis/create-ingest.md index 1b856344d1..5906a79307 100644 --- a/_api-reference/ingest-apis/create-ingest.md +++ b/_api-reference/ingest-apis/create-ingest.md @@ -20,7 +20,7 @@ To create, or update, an ingest pipeline, you need to use the `PUT` method to th PUT _ingest/pipeline/ ``` -Here is a example in JSON format that creates an ingest pipeline with using a `set` processor and an `uppercase` processor. The `set` processor sets the value of the `grad_year` field to the value of `2023` and the `graduated` field to the value of `true`. The `uppercase` processor converts the `name` field to capital letters. +Here is an example in JSON format that creates an ingest pipeline using a `set` processor and an `uppercase` processor. The `set` processor sets the value of the `grad_year` field to the value of `2023` and the `graduated` field to the value of `true`. The `uppercase` processor converts the `name` field to capital letters. #### Example request diff --git a/_api-reference/ingest-apis/pipeline-failures.md b/_api-reference/ingest-apis/pipeline-failures.md index acfb994d30..1d609a0b4d 100644 --- a/_api-reference/ingest-apis/pipeline-failures.md +++ b/_api-reference/ingest-apis/pipeline-failures.md @@ -8,12 +8,12 @@ nav_order: 15 ## Handling pipeline failures -Each ingest pipeline consists of a series of processors that are applied to the data in sequence. If a processor fails, the entire pipeline will fail.
The are two ways to handle failures: +Each ingest pipeline consists of a series of processors that are applied to the data in sequence. If a processor fails, the entire pipeline will fail. You have two options for handling failures: -- **Fail the entire pipeline:** This is the default behavior. If a processor fails, the entire pipeline will fail and the document will not be indexed. +- **Fail the entire pipeline:** If a processor fails, the entire pipeline will fail and the document will not be indexed. - **Fail the current processor and continue with the next processor:** This can be useful if you want to continue processing the document even if one of the processors fails. -To configure the failure handling behavior, you need to use the `` parameter. For example, the following JSON object configures the `set-pipeline` to fail the entire pipeline if a processor fails: +By default, an ingest pipeline stops if one of its processors fails. If you want the pipeline to continue running when a processor fails, you can set the `ignore_failure` parameter for that processor to `true` when creating the pipeline: ```json { @@ -26,7 +26,7 @@ To configure the failure handling behavior, you need to use the `" : "fail" + "ignore_failure" : true } ``` @@ -43,10 +43,20 @@ The following JSON object configures `set-pipeline` to fail the current processo } } ], - "" : "continue" + "ignore_failure" : "continue" } ``` +If the processor fails, OpenSearch logs the failure and continues to run all remaining processors in the search pipeline. To check whether there were any failures, you can use [ingest pipeline metrics]. 
+ +## Search pipeline metrics + +To view ingest pipeline metrics, use the [Nodes Stats API]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/): + +``` +GET /_nodes/stats/ingest +``` + ## Troubleshooting failures The following are tips on troubleshooting ingest pipeline failures: From cb908a9da95fc703fdd9f934341cb7d16e74ec74 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 2 Aug 2023 15:19:06 -0600 Subject: [PATCH 147/286] Writing Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/pipeline-failures.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/pipeline-failures.md b/_api-reference/ingest-apis/pipeline-failures.md index 1d609a0b4d..6bd9a1afba 100644 --- a/_api-reference/ingest-apis/pipeline-failures.md +++ b/_api-reference/ingest-apis/pipeline-failures.md @@ -49,7 +49,7 @@ The following JSON object configures `set-pipeline` to fail the current processo If the processor fails, OpenSearch logs the failure and continues to run all remaining processors in the search pipeline. To check whether there were any failures, you can use [ingest pipeline metrics]. 
-## Search pipeline metrics +## Ingest pipeline metrics To view ingest pipeline metrics, use the [Nodes Stats API]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/): From 4d078ae55b7082e6f8a8d70f1499d4432f8a87e5 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 2 Aug 2023 16:07:30 -0600 Subject: [PATCH 148/286] Revised content Signed-off-by: Melissa Vagi --- .../ingest-apis/pipeline-failures.md | 193 ++++++++++++++++-- 1 file changed, 176 insertions(+), 17 deletions(-) diff --git a/_api-reference/ingest-apis/pipeline-failures.md b/_api-reference/ingest-apis/pipeline-failures.md index 6bd9a1afba..dbf6ce55a6 100644 --- a/_api-reference/ingest-apis/pipeline-failures.md +++ b/_api-reference/ingest-apis/pipeline-failures.md @@ -13,41 +13,50 @@ Each ingest pipeline consists of a series of processors that are applied to the - **Fail the entire pipeline:** If a processor fails, the entire pipeline will fail and the document will not be indexed. - **Fail the current processor and continue with the next processor:** This can be useful if you want to continue processing the document even if one of the processors fails. -By default, an ingest pipeline stops if one of its processors fails. If you want the pipeline to continue running when a processor fails, you can set the `ignore_failure` parameter for that processor to `true` when creating the pipeline: +By default, an ingest pipeline stops if one of its processors fails. 
If you want the pipeline to continue running when a processor fails, you can set the `on_failure` parameter for that processor to `true` when creating the pipeline: ```json +PUT _ingest/pipeline/my-pipeline/ { - "description" : "A simple ingest pipeline", - "processors" : [ + "description": "Rename 'provider' field to 'cloud.provider'", + "processors": [ { - "set" : { - "field": "name", - "value": "user_id" + "rename": { + "field": "provider", + "target_field": "cloud.provider", + "ignore_failure": true } } - ], - "ignore_failure" : true + ] } ``` -The following JSON object configures `set-pipeline` to fail the current processor and continue with the next processor: +You can specify the `on_failure` parameter to run immediately after a processor fails. If you have specified `on_failure`, OpenSearch will run the other processors in the pipeline, even if the `on_failure` configuration is empty: ```json { - "description" : "A simple ingest pipeline", - "processors" : [ + "description": "Add timestamp to the document", + "processors": [ { - "set" : { - "field": "name", - "value": "user_id" + "date": { + "field": "timestamp_field", + "target_field": "timestamp", + "formats": ["yyyy-MM-dd HH:mm:ss"] } } ], - "ignore_failure" : "continue" + "on_failure": [ + { + "set": { + "field": "ingest_error", + "value": "failed" + } + } + ] } ``` -If the processor fails, OpenSearch logs the failure and continues to run all remaining processors in the search pipeline. To check whether there were any failures, you can use [ingest pipeline metrics]. +If the processor fails, OpenSearch logs the failure and continues to run all remaining processors in the search pipeline. To check whether there were any failures, you can use [ingest pipeline metrics]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipeline-failures/#ingest-pipeline-metrics). 
## Ingest pipeline metrics @@ -56,12 +65,162 @@ To view ingest pipeline metrics, use the [Nodes Stats API]({{site.url}}{{site.ba ``` GET /_nodes/stats/ingest ``` +{% include copy-curl.html %} + +The response contains statistics for all ingest pipelines: + +```json +{ + "_nodes": { + "total": 2, + "successful": 2, + "failed": 0 + }, + "cluster_name": "opensearch-cluster", + "nodes": { + "iFPgpdjPQ-uzTdyPLwQVnQ": { + "timestamp": 1691011228995, + "name": "opensearch-node1", + "transport_address": "172.19.0.4:9300", + "host": "172.19.0.4", + "ip": "172.19.0.4:9300", + "roles": [ + "cluster_manager", + "data", + "ingest", + "remote_cluster_client" + ], + "attributes": { + "shard_indexing_pressure_enabled": "true" + }, + "ingest": { + "total": { + "count": 1, + "time_in_millis": 2, + "current": 0, + "failed": 0 + }, + "pipelines": { + "my-pipeline": { + "count": 16, + "time_in_millis": 23, + "current": 0, + "failed": 4, + "processors": [ + { + "set": { + "type": "set", + "stats": { + "count": 6, + "time_in_millis": 0, + "current": 0, + "failed": 0 + } + } + }, + { + "set": { + "type": "set", + "stats": { + "count": 6, + "time_in_millis": 3, + "current": 0, + "failed": 0 + } + } + }, + { + "uppercase": { + "type": "uppercase", + "stats": { + "count": 6, + "time_in_millis": 0, + "current": 0, + "failed": 4 + } + } + } + ] + } + } + } + }, + "dDOB3vS3TVmB5t6PHdCj4Q": { + "timestamp": 1691011228997, + "name": "opensearch-node2", + "transport_address": "172.19.0.2:9300", + "host": "172.19.0.2", + "ip": "172.19.0.2:9300", + "roles": [ + "cluster_manager", + "data", + "ingest", + "remote_cluster_client" + ], + "attributes": { + "shard_indexing_pressure_enabled": "true" + }, + "ingest": { + "total": { + "count": 0, + "time_in_millis": 0, + "current": 0, + "failed": 0 + }, + "pipelines": { + "my-pipeline": { + "count": 0, + "time_in_millis": 0, + "current": 0, + "failed": 0, + "processors": [ + { + "set": { + "type": "set", + "stats": { + "count": 0, + "time_in_millis": 0, + 
"current": 0, "failed": 0 } } }, { "set": { "type": "set", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } }, { "uppercase": { "type": "uppercase", "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 } } } ] } } } } } } ``` ## Troubleshooting failures The following are tips on troubleshooting ingest pipeline failures: 1. Check the logs: OpenSearch logs contain information about the ingest pipeline that failed, including the processor that failed and the reason for the failure. -2. Inspect the document: If the ingest pipeline failed, then the document that was being processed will be in the index. +2. Inspect the document: If the ingest pipeline failed, then the document that was being processed will be in its respective index. 3. Check the processor configuration: It is possible the processor configuration is incorrect. To check this, you can look at the processor configuration in the JSON object. 4. Try a different processor: You can try using a different processor. Some processors are better at handling certain types of data than others. From 042204ebfe73606853a7b7373b83047e6fc0cc7e Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 3 Aug 2023 09:05:44 -0600 Subject: [PATCH 149/286] copy edit Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/pipeline-failures.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/pipeline-failures.md b/_api-reference/ingest-apis/pipeline-failures.md index dbf6ce55a6..afba68104a 100644 --- a/_api-reference/ingest-apis/pipeline-failures.md +++ b/_api-reference/ingest-apis/pipeline-failures.md @@ -13,7 +13,7 @@ Each ingest pipeline consists of a series of processors that are applied to the - **Fail the entire pipeline:** If a processor fails, the entire pipeline will fail and the document will not be indexed.
- **Fail the current processor and continue with the next processor:** This can be useful if you want to continue processing the document even if one of the processors fails. -By default, an ingest pipeline stops if one of its processors fails. If you want the pipeline to continue running when a processor fails, you can set the `on_failure` parameter for that processor to `true` when creating the pipeline: +By default, an ingest pipeline stops if one of its processors fails. If you want the pipeline to continue running when a processor fails, you can set the `ignore_failure` parameter for that processor to `true` when creating the pipeline: ```json PUT _ingest/pipeline/my-pipeline/ From 3a38b2e7b182c6014d6053f7bb990c5aae7a25ec Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 3 Aug 2023 09:25:14 -0600 Subject: [PATCH 150/286] Update file name Signed-off-by: Melissa Vagi --- .../ingest-apis/{processors-reference => processors}/append.md | 0 .../ingest-apis/{processors-reference => processors}/bytes.md | 0 .../ingest-apis/{processors-reference => processors}/convert.md | 0 .../ingest-apis/{processors-reference => processors}/csv.md | 0 .../ingest-apis/{processors-reference => processors}/date.md | 0 .../ingest-apis/{processors-reference => processors}/lowercase.md | 0 .../ingest-apis/{processors-reference => processors}/remove.md | 0 .../ingest-apis/{processors-reference => processors}/uppercase.md | 0 8 files changed, 0 insertions(+), 0 deletions(-) rename _api-reference/ingest-apis/{processors-reference => processors}/append.md (100%) rename _api-reference/ingest-apis/{processors-reference => processors}/bytes.md (100%) rename _api-reference/ingest-apis/{processors-reference => processors}/convert.md (100%) rename _api-reference/ingest-apis/{processors-reference => processors}/csv.md (100%) rename _api-reference/ingest-apis/{processors-reference => processors}/date.md (100%) rename _api-reference/ingest-apis/{processors-reference => processors}/lowercase.md 
(100%) rename _api-reference/ingest-apis/{processors-reference => processors}/remove.md (100%) rename _api-reference/ingest-apis/{processors-reference => processors}/uppercase.md (100%) diff --git a/_api-reference/ingest-apis/processors-reference/append.md b/_api-reference/ingest-apis/processors/append.md similarity index 100% rename from _api-reference/ingest-apis/processors-reference/append.md rename to _api-reference/ingest-apis/processors/append.md diff --git a/_api-reference/ingest-apis/processors-reference/bytes.md b/_api-reference/ingest-apis/processors/bytes.md similarity index 100% rename from _api-reference/ingest-apis/processors-reference/bytes.md rename to _api-reference/ingest-apis/processors/bytes.md diff --git a/_api-reference/ingest-apis/processors-reference/convert.md b/_api-reference/ingest-apis/processors/convert.md similarity index 100% rename from _api-reference/ingest-apis/processors-reference/convert.md rename to _api-reference/ingest-apis/processors/convert.md diff --git a/_api-reference/ingest-apis/processors-reference/csv.md b/_api-reference/ingest-apis/processors/csv.md similarity index 100% rename from _api-reference/ingest-apis/processors-reference/csv.md rename to _api-reference/ingest-apis/processors/csv.md diff --git a/_api-reference/ingest-apis/processors-reference/date.md b/_api-reference/ingest-apis/processors/date.md similarity index 100% rename from _api-reference/ingest-apis/processors-reference/date.md rename to _api-reference/ingest-apis/processors/date.md diff --git a/_api-reference/ingest-apis/processors-reference/lowercase.md b/_api-reference/ingest-apis/processors/lowercase.md similarity index 100% rename from _api-reference/ingest-apis/processors-reference/lowercase.md rename to _api-reference/ingest-apis/processors/lowercase.md diff --git a/_api-reference/ingest-apis/processors-reference/remove.md b/_api-reference/ingest-apis/processors/remove.md similarity index 100% rename from 
_api-reference/ingest-apis/processors-reference/remove.md rename to _api-reference/ingest-apis/processors/remove.md diff --git a/_api-reference/ingest-apis/processors-reference/uppercase.md b/_api-reference/ingest-apis/processors/uppercase.md similarity index 100% rename from _api-reference/ingest-apis/processors-reference/uppercase.md rename to _api-reference/ingest-apis/processors/uppercase.md From 4e4a932824bdb555398d1c2eadcd739c9d06c85f Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 3 Aug 2023 10:40:56 -0600 Subject: [PATCH 151/286] Update file name Signed-off-by: Melissa Vagi --- .../ingest-apis/ingest-processors.md | 27 ++++++++++++++++--- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index 3a65b4f1f9..ee2475cdff 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -10,12 +10,33 @@ has_children: true Ingest processors are a core component of [ingest pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipelines/), as they preprocess documents before indexing. For example, you can remove fields, extract values from text, convert data format, or append additional information. -OpenSearch provides a standard set of ingest processors within your OpenSearch installation. For a list of processors available in OpenSearch, use the [nodes info]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-info/) API operation: +OpenSearch provides a standard set of ingest processors within your OpenSearch installation. For a list of processors available in OpenSearch, use the [Nodes Info]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-info/) API operation: ```json GET /_nodes/ingest ``` {% include copy-curl.html %} -To set up and deploy ingest processors, make sure you have the necessary permissions and access rights. 
You can learn more about the processor types within their respective documentation. -{: .note} +To set up and deploy ingest processors, make sure you have the necessary permissions and access rights. See [Security plugin REST API]({{site.url}}{{site.baseurl}}/security/access-control/api/) to learn more. +{:.note} + +The following is a generic example of an ingest processor definition within a pipeline. Processor types and their required or optional parameters vary depending on their specific use case.You can learn more about the processor types and defining and configuring them within their respective documentation. + +#### Example + +```json +{ + "your_processor_type": { + "your_required_parameter": "your_value", + "your_optional_parameter": "your_optional_value" + } +} +``` + +**Parameter** | **Required** | **Description** | +|-----------|-----------|-----------| +`your_processor_type` | Required | Type of processor you want to use, such as `rename`, `set`, `append`, and so forth. Different processor types perform different actions. | +`your_required_parameter` | Required | Required parameter specific to the processor type you've chosen. It defines the main setting or action for the processor to take. | +`your_value` | Required | Replace this with the appropriate value for the chosen processor type and parameter. For example, if the processor is `rename`, then this value is the new field name you want to rename to. | +`your_optional_parameter` | Optional | Some processors have optional parameters that modify their behavior. Replace this with the optional parameter. | +`your_optional_value` | Optional | Replace this with the appropriate value for the optional parameter used. 
| From f2622a811f77429cf4efeab6622d6d85f75dd16a Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 3 Aug 2023 10:43:55 -0600 Subject: [PATCH 152/286] Update file name Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-processors.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index ee2475cdff..b04b88edf3 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -20,7 +20,7 @@ GET /_nodes/ingest To set up and deploy ingest processors, make sure you have the necessary permissions and access rights. See [Security plugin REST API]({{site.url}}{{site.baseurl}}/security/access-control/api/) to learn more. {:.note} -The following is a generic example of an ingest processor definition within a pipeline. Processor types and their required or optional parameters vary depending on their specific use case.You can learn more about the processor types and defining and configuring them within their respective documentation. +The following is a generic example of an ingest processor definition within a pipeline. Processor types and their required or optional parameters vary depending on their specific use case. See the [Related articles]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/#related-articles) section to learn more about the processor types and defining and configuring them within a pipeline. 
#### Example From 41562a5229161d8c51b7005bf61c0caea795d0fe Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 3 Aug 2023 12:53:43 -0600 Subject: [PATCH 153/286] Update file name Signed-off-by: Melissa Vagi --- .../ingest-apis/processors/append.md | 35 +++++++++++++++---- _api-reference/ingest-apis/simulate-ingest.md | 1 - 2 files changed, 28 insertions(+), 8 deletions(-) diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index 303cdd4321..6fcd5f504e 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -18,11 +18,13 @@ The syntax for the `append` processor is: ```json { "append": { - "field": "field_name", - "value": ["value1", "value2", "{{value3}}"] + "field": "your_target_field", + "value": ["your_appended_value"] } } ``` +{% include copy-curl.html %} +```` ## Parameters @@ -32,10 +34,9 @@ The following table lists the required and optional parameters for the `append` |-----------|-----------|-----------| `field` | Required | Name of the field where the data should be appended. Supports template snippets.| `value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | -`allow_duplicates` | Optional | If set to `false`, the processor will not append values that already exist in the field. Default is `true`. | `description` | Optional | Brief description of the processor. | -`if` | Optional | Condition to execute this processor. | -`on_failure` | Optional | A list of processors to execute if the processor fails. | +`if` | Optional | Condition to run this processor. | +`on_failure` | Optional | A list of processors to run if the processor fails. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `tag` | Optional | An identifier tag for the processor. 
Useful for debugging to distinguish between processors of the same type. | @@ -51,7 +52,7 @@ PUT _ingest/pipeline/user-behavior { "append": { "field": "event_types", - "value": "{{event_type}}" + "value": ["event_type"] } } ] @@ -93,8 +94,28 @@ Because there was no `event_types` field in the document, an array field is crea } ``` +To test the processor, you can use the following example: + +```json +POST _ingest/pipeline/user-behavior/_simulate +{ + "docs": [ + { + "_index": "testindex1", + "_id": "1", + "_source": { + "event_type": "page_view", + "event_types": + "event_type" + } + } + ] +} +``` +{% include copy-curl.html %} + ## Best practices - **Data validation:** Make sure the values being appended are valid and compatible with the target field's data type and format. - **Efficiency:** Consider the performance implications of appending large amounts of data to each document and optimize the processor configuration accordingly. -- **Error handling:** Implement proper error handling mechanisms to handle scenarios where appending fails, such as when external lookups or API requests encounter errors. +- **Error handling:** Implement proper error handling mechanisms to handle scenarios where appending fails, such as when API requests encounter errors. 
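The `append` behavior documented in this patch (creating an array field when the target is missing, as in the `user-behavior` example) can be approximated in a few lines of Python. This is an illustrative sketch of the semantics only, not the plugin's actual implementation, and the function name is invented.

```python
def append_values(doc: dict, field: str, values: list) -> dict:
    """Approximate the append processor: create the target field as an
    array if it is missing, promote an existing scalar to an array, then
    append the new values."""
    existing = doc.get(field)
    if existing is None:
        doc[field] = list(values)
    elif isinstance(existing, list):
        doc[field] = existing + list(values)
    else:
        doc[field] = [existing] + list(values)
    return doc

# Missing field: an array field is created, as described above.
doc = append_values({"event_type": "page_view"}, "event_types", ["page_view"])
# doc is now {"event_type": "page_view", "event_types": ["page_view"]}
```

The scalar-promotion branch mirrors how appending to a non-array field yields an array containing the old value followed by the appended ones.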
diff --git a/_api-reference/ingest-apis/simulate-ingest.md b/_api-reference/ingest-apis/simulate-ingest.md index 0f46ca2a24..8666abb500 100644 --- a/_api-reference/ingest-apis/simulate-ingest.md +++ b/_api-reference/ingest-apis/simulate-ingest.md @@ -78,7 +78,6 @@ POST /_ingest/pipeline/my-pipeline/_simulate } } ] -}] } ``` {% include copy-curl.html %} From ec4a3541e52fdaf3fa95c6bdc9e35a2339b562e0 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 3 Aug 2023 13:32:55 -0600 Subject: [PATCH 154/286] Update file name Signed-off-by: Melissa Vagi --- .../ingest-apis/processors/append.md | 6 --- .../ingest-apis/processors/bytes.md | 45 +++++++++++++------ 2 files changed, 31 insertions(+), 20 deletions(-) diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index 6fcd5f504e..69e4391e92 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -113,9 +113,3 @@ POST _ingest/pipeline/user-behavior/_simulate } ``` {% include copy-curl.html %} - -## Best practices - -- **Data validation:** Make sure the values being appended are valid and compatible with the target field's data type and format. -- **Efficiency:** Consider the performance implications of appending large amounts of data to each document and optimize the processor configuration accordingly. -- **Error handling:** Implement proper error handling mechanisms to handle scenarios where appending fails, such as when API requests encounter errors. 
diff --git a/_api-reference/ingest-apis/processors/bytes.md b/_api-reference/ingest-apis/processors/bytes.md index 53df28ceb0..4d7ef19138 100644 --- a/_api-reference/ingest-apis/processors/bytes.md +++ b/_api-reference/ingest-apis/processors/bytes.md @@ -32,11 +32,11 @@ The following table lists the required and optional parameters for the `bytes` p `ignore_missing` | Optional | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. | `if` | Optional | Conditional expression that determines whether the processor should be deployed. | `ignore_failure` | Optional | If set to `true`, the processor will not fail if an error occurs. | -`on_failure` | Optional | A list of processors to execute if the processor fails. | +`on_failure` | Optional | A list of processors to run if the processor fails. | `tag` | Optional | Tag that can be used to identify the processor. | `description` | Optional | Brief description of the processor. | -Following is an example of a pipeline using a `bytes` processor. +The following query creates a pipeline, named `file_upload`, that has one bytes processor. It converts the `file_size` to its byte equivalent and stores it in a new field `file_size_bytes`: ```json PUT _ingest/pipeline/file_upload @@ -51,27 +51,44 @@ PUT _ingest/pipeline/file_upload } ] } +``` +{% include copy-curl.html %} +``` +Ingest a document into the index: + +```json PUT testindex1/_doc/1?pipeline=file_upload { "file_size": "10MB" } ``` +{% include copy-curl.html %} +``` -This pipeline, named `file_upload`, has one bytes processor. It converts the `file_size` to its byte equivalent and stores it in a new field `file_size_bytes`. Following is the GET request and response. 
+To view the ingested document, run the following query: ```json GET testindex1/_doc/1 +``` +{% include copy-curl.html %} +``` + +To test the processor, you can use the following example: + +```json +POST _ingest/pipeline/user-behavior/_simulate { - "_index": "testindex1", - "_id": "1", - "_version": 3, - "_seq_no": 2, - "_primary_term": 1, - "found": true, - "_source": { - "file_size_bytes": 10485760, - "file_size": "10MB" - } + "docs": [ + { + "_index": "testindex1", + "_id": "1", + "_source": { + "file_size_bytes": "10485760", + "file_size": + "10MB" + } + } + ] } -``` \ No newline at end of file +``` From fdc6a1bf02ccefc5f89af1ecb71e7f7ca22a0d1c Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 3 Aug 2023 13:33:22 -0600 Subject: [PATCH 155/286] Update file name Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-processors.md | 1 + 1 file changed, 1 insertion(+) diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index b04b88edf3..28d1c3a4d8 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -40,3 +40,4 @@ The following is a generic example of an ingest processor definition within a pi `your_value` | Required | Replace this with the appropriate value for the chosen processor type and parameter. For example, if the processor is `rename`, then this value is the new field name you want to rename to. | `your_optional_parameter` | Optional | Some processors have optional parameters that modify their behavior. Replace this with the optional parameter. | `your_optional_value` | Optional | Replace this with the appropriate value for the optional parameter used. 
| + From 55861a89220f356b949013a33f7a50099a7105c3 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 3 Aug 2023 15:56:02 -0600 Subject: [PATCH 156/286] Add bytes content and examples Signed-off-by: Melissa Vagi --- .../ingest-apis/processors/bytes.md | 35 ++++++++++++++++--- 1 file changed, 31 insertions(+), 4 deletions(-) diff --git a/_api-reference/ingest-apis/processors/bytes.md b/_api-reference/ingest-apis/processors/bytes.md index 4d7ef19138..16d922562f 100644 --- a/_api-reference/ingest-apis/processors/bytes.md +++ b/_api-reference/ingest-apis/processors/bytes.md @@ -15,8 +15,7 @@ The syntax for the `bytes` processor is: ```json { "bytes": { - "field": "source_field", - "target_field": "destination_field" + "field": "your_field_name", } } ``` @@ -29,12 +28,13 @@ The following table lists the required and optional parameters for the `bytes` p |-----------|-----------|-----------| `field` | Required | Name of the field where the data should be converted. | `target_field` | Optional | Name of the field to store the converted value. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | +`description` | Optional | Brief description of the processor. | `ignore_missing` | Optional | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. | `if` | Optional | Conditional expression that determines whether the processor should be deployed. | `ignore_failure` | Optional | If set to `true`, the processor will not fail if an error occurs. | `on_failure` | Optional | A list of processors to run if the processor fails. | `tag` | Optional | Tag that can be used to identify the processor. | -`description` | Optional | Brief description of the processor. | + The following query creates a pipeline, named `file_upload`, that has one bytes processor. 
It converts the `file_size` to its byte equivalent and stores it in a new field `file_size_bytes`: @@ -74,7 +74,7 @@ GET testindex1/_doc/1 {% include copy-curl.html %} ``` -To test the processor, you can use the following example: +To test the processor, run the following query: ```json POST _ingest/pipeline/user-behavior/_simulate @@ -92,3 +92,30 @@ POST _ingest/pipeline/user-behavior/_simulate ] } ``` + +The following query creates a pipeline with the bytes processor and one optional parameter, `on_failure`, which uses the `set` processor to set the `error` field with a specific error message: + +``json +PUT _ingest/pipeline/file_upload +{ + "description": "Pipeline that converts file size to bytes", + "processors": [ + { + "bytes": { + "field": "file_size", + "target_field": "file_size_bytes", + "on_failure": [ + { + "set": { + "field": "error", + "value": "Failed to convert" + } + } + ] + } + } + ] +} +``` +{% include copy-curl.html %} +``` \ No newline at end of file From 9591dccaf3c253fe747fa28f82d546d94e057f2e Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 3 Aug 2023 15:58:41 -0600 Subject: [PATCH 157/286] Add bytes content and examples Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/bytes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/bytes.md b/_api-reference/ingest-apis/processors/bytes.md index 16d922562f..80c269d712 100644 --- a/_api-reference/ingest-apis/processors/bytes.md +++ b/_api-reference/ingest-apis/processors/bytes.md @@ -118,4 +118,4 @@ PUT _ingest/pipeline/file_upload } ``` {% include copy-curl.html %} -``` \ No newline at end of file +``` From ef4cb53fd97695646442653e9b73dd62b5dca96e Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 4 Aug 2023 16:22:10 -0600 Subject: [PATCH 158/286] Testing code snippets and optional configuration parameters Signed-off-by: Melissa Vagi --- .../ingest-apis/processors/append.md | 2 +- 
.../ingest-apis/processors/bytes.md | 2 +- .../ingest-apis/processors/convert.md | 67 ++++++++++++------- 3 files changed, 44 insertions(+), 27 deletions(-) diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index 69e4391e92..7a4c59f811 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -94,7 +94,7 @@ Because there was no `event_types` field in the document, an array field is crea } ``` -To test the processor, you can use the following example: +To test the pipeline, run the following query: ```json POST _ingest/pipeline/user-behavior/_simulate diff --git a/_api-reference/ingest-apis/processors/bytes.md b/_api-reference/ingest-apis/processors/bytes.md index 80c269d712..0afa32638b 100644 --- a/_api-reference/ingest-apis/processors/bytes.md +++ b/_api-reference/ingest-apis/processors/bytes.md @@ -74,7 +74,7 @@ GET testindex1/_doc/1 {% include copy-curl.html %} ``` -To test the processor, run the following query: +To test the pipeline, run the following query: ```json POST _ingest/pipeline/user-behavior/_simulate diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md index 697d74cca9..8e30431792 100644 --- a/_api-reference/ingest-apis/processors/convert.md +++ b/_api-reference/ingest-apis/processors/convert.md @@ -8,13 +8,13 @@ nav_order: 30 # Convert -The `convert` processor converts a field in a document to a different type, for example, a string to an integer or an integer to a string. For an array field, all values in the array are converted. The syntax for the `convert` processor is: +The `convert` processor converts a field in a document to a different type, for example, a string to an integer or an integer to a string. For an array field, all values in the array are converted. 
The syntax for the `convert` processor is: ```json { "convert": { "field": "field_name", - "type": "target_type" + "type": "type-value" } } ``` @@ -26,52 +26,69 @@ The following table lists the required and optional parameters for the `convert` **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| `field` | Required | Name of the field whose value to convert. | -`type` | Required | The type to convert the field value to. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. If the `type` is `boolean`, the value is set to `true` if the field value is a string `"true"` (ignoring case), and to `false` if the field value is a string `"false"` (ignoring case). For all other values, an exception is thrown. | +`type` | Required | The type to convert the field value to. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. If the `type` is `boolean`, the value is set to `true` if the field value is a string `"true"` (ignoring case), and to `false` if the field value is a string `"false"` (ignoring case). If the value is not one of the allowed values, an error will occur. | +`description` | Optional | Brief description of the processor. | `target_field` | Optional | Name of the field to store the converted value. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | -`ignore_missing` | Optional | If set to true, the processor will not fail if the field does not exist. Default is `false`. | `if` | Optional | Conditional expression that determines whether the processor should be deployed. | -`ignore_failure` | Optional | If set to true, the processor will not fail if an error occurs. | +`ignore_missing` | If set to `true`, the processor will ignore documents that do not have a value for the specified field. Default is `false`. 
+`ignore_failure` | Optional | If set to true, the processor will not fail if an error occurs. Default is `false`. | `on_failure` | Optional | Action to take if an error occurs. | `tag` | Optional | Tag that can be used to identify the processor. | -`description` | Optional | Brief description of the processor. | -Following is an example of adding the `convert` processor to an ingest pipeline. +The following query creates a pipeline, named `convert-price`, that converts `price` to a floating-point number and stores the converted value in the `price_float` field: ```json -PUT _ingest/pipeline/convert-age +PUT _ingest/pipeline/convert-price { - "description": "Pipeline that converts age to an integer", + "description": "Pipeline that converts price to floating-point number", "processors": [ { "convert": { - "field": "age", - "target_field": "age_int", - "type": "integer" + "field": "price", + "type": "string", + "target_field": "price_float" } } ] } +``` + +Ingest a document into the index: -PUT testindex1/_doc/1?pipeline=convert-age +```json +PUT testindex1/_doc/1?pipeline=convert-price { - "age": "20" + "price": "100" } ``` +{% include copy-curl.html %} +``` -This pipeline converts the `file_size` field from a string to an integer, making it possible to perform numerical operations and aggregations on the `file_size` field. Following is the GET request and response. 
+To view the ingested document, run the following query: ```json GET testindex1/_doc/1 +``` +{% include copy-curl.html %} +``` + +To test the pipeline, run the following query:: + +```json +POST _ingest/pipeline/user-behavior/_simulate { - "_index": "testindex1", - "_id": "1", - "_version": 17, - "_seq_no": 16, - "_primary_term": 2, - "found": true, - "_source": { - "age_int": 20, - "age": "20" - } + "docs": [ + { + "_index": "testindex1", + "_id": "1", + "_source": { + "price_float": "100.00", + "price": + "price_float" + } + } + ] } ``` +{% include copy-curl.html %} +``` From 5aae19f4d1e14eaa749a274f970b8a2a3cb043a4 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 4 Aug 2023 17:49:40 -0600 Subject: [PATCH 159/286] Testing code snippets and optional configuration parameters Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/convert.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md index 8e30431792..76609c8bcc 100644 --- a/_api-reference/ingest-apis/processors/convert.md +++ b/_api-reference/ingest-apis/processors/convert.md @@ -52,6 +52,8 @@ PUT _ingest/pipeline/convert-price ] } ``` +{% include copy-curl.html %} +``` Ingest a document into the index: From 5f6dd0e0e517dff2bc12f759dbba5f7a42d888ca Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 18 Aug 2023 11:37:36 -0600 Subject: [PATCH 160/286] Update _api-reference/ingest-apis/create-ingest.md Co-authored-by: Heemin Kim Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/create-ingest.md | 1 - 1 file changed, 1 deletion(-) diff --git a/_api-reference/ingest-apis/create-ingest.md b/_api-reference/ingest-apis/create-ingest.md index 5906a79307..5aadd2d401 100644 --- a/_api-reference/ingest-apis/create-ingest.md +++ b/_api-reference/ingest-apis/create-ingest.md @@ -82,7 +82,6 @@ Parameter | Required | Type | Description Some processor parameters support 
[Mustache](https://mustache.github.io/) template snippets. To get a field value, enclose the field name in triple curly brackets, for example, {{{field-name}}}. -The following template snippet sets the value of a field "{{field_name}}" to a value of a field "{{value}}". #### Example: `set` ingest processor Mustache template snippet From f91ccf59ce0c49a7474817b41444a48e57e9ff0d Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 18 Aug 2023 11:37:50 -0600 Subject: [PATCH 161/286] Update _api-reference/ingest-apis/create-ingest.md Co-authored-by: Heemin Kim Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/create-ingest.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/create-ingest.md b/_api-reference/ingest-apis/create-ingest.md index 5aadd2d401..941da9d9bd 100644 --- a/_api-reference/ingest-apis/create-ingest.md +++ b/_api-reference/ingest-apis/create-ingest.md @@ -83,7 +83,7 @@ Parameter | Required | Type | Description Some processor parameters support [Mustache](https://mustache.github.io/) template snippets. To get a field value, enclose the field name in triple curly brackets, for example, {{{field-name}}}. 
-#### Example: `set` ingest processor Mustache template snippet +#### Example: `set` ingest processor using Mustache template snippet ```json PUT _ingest/pipeline/my-pipeline From 94a86c3225811b88ed43bae53fcee919e638dfed Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 22 Aug 2023 12:48:35 -0600 Subject: [PATCH 162/286] QA pipeline testing Signed-off-by: Melissa Vagi --- .../ingest-apis/processors/append.md | 48 +++++++-- .../ingest-apis/processors/bytes.md | 66 ++++++++++--- .../ingest-apis/processors/convert.md | 82 +++++++++++----- _api-reference/ingest-apis/processors/csv.md | 98 ++++++++++++++----- _api-reference/ingest-apis/processors/date.md | 91 +++++++++++++---- .../ingest-apis/processors/lowercase.md | 85 ++++++++++++---- .../ingest-apis/processors/remove.md | 79 ++++++++++++--- .../ingest-apis/processors/uppercase.md | 85 ++++++++++++---- 8 files changed, 493 insertions(+), 141 deletions(-) diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index 7a4c59f811..dba5b27930 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -24,7 +24,6 @@ The syntax for the `append` processor is: } ``` {% include copy-curl.html %} -```` ## Parameters @@ -33,14 +32,18 @@ The following table lists the required and optional parameters for the `append` **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| `field` | Required | Name of the field where the data should be appended. Supports template snippets.| -`value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | -`description` | Optional | Brief description of the processor. | +`value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. 
Supports template snippets. | +`description` | Optional | Brief description of the processor. | `if` | Optional | Condition to run this processor. | -`on_failure` | Optional | A list of processors to run if the processor fails. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | +`on_failure` | Optional | A list of processors to run if the processor fails. | `tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | -Following is an example of an ingest pipeline using the `append` processor. +## Using the processor + +Follow these steps to use the processor in a pipeline. + +**Step 1: Create pipeline.** The following query creates a pipeline, named `user-behavior`, that has one append processor. It appends the `event_type` of each new document ingested into OpenSearch to an array field `event_types`: @@ -60,7 +63,9 @@ PUT _ingest/pipeline/user-behavior ``` {% include copy-curl.html %} -Ingest a document into the index: +**Step 2: Ingest a document into the index.** + +The following query ingests a document into the index named `testindex1`: ```json PUT testindex1/_doc/1?pipeline=user-behavior @@ -70,6 +75,8 @@ PUT testindex1/_doc/1?pipeline=user-behavior ``` {% include copy-curl.html %} +**Step 3: View the ingested document.** + To view the ingested document, run the following query: ```json @@ -78,6 +85,8 @@ GET testindex1/_doc/1 {% include copy-curl.html %} Because there was no `event_types` field in the document, an array field is created and the event is appended to the array: + +```json { "_index": "testindex1", "_id": "1", @@ -94,6 +103,8 @@ Because there was no `event_types` field in the document, an array field is crea } ``` +**Step 4: Test the pipeline.** + To test the pipeline, run the following query: ```json @@ -113,3 +124,28 @@ POST _ingest/pipeline/user-behavior/_simulate } ``` {% include copy-curl.html %} + +You'll get the following 
response, which confirms the pipeline is working correctly and producing the expected output: + +```json +{ + "docs": [ + { + "doc": { + "_index": "testindex1", + "_id": "1", + "_source": { + "event_type": "page_view", + "event_types": [ + "event_type", + "event_type" + ] + }, + "_ingest": { + "timestamp": "2023-08-22T16:02:37.893458209Z" + } + } + } + ] +} +``` diff --git a/_api-reference/ingest-apis/processors/bytes.md b/_api-reference/ingest-apis/processors/bytes.md index 0afa32638b..44261a625f 100644 --- a/_api-reference/ingest-apis/processors/bytes.md +++ b/_api-reference/ingest-apis/processors/bytes.md @@ -15,10 +15,11 @@ The syntax for the `bytes` processor is: ```json { "bytes": { - "field": "your_field_name", + "field": "your_field_name" } } ``` +{% include copy-curl.html %} ## Configuration parameters @@ -26,15 +27,20 @@ The following table lists the required and optional parameters for the `bytes` p **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| -`field` | Required | Name of the field where the data should be converted. | -`target_field` | Optional | Name of the field to store the converted value. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | +`field` | Required | Name of the field where the data should be converted. Supports template snippets.| `description` | Optional | Brief description of the processor. | +`if` | Optional | Condition to run this processor. | +`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `ignore_missing` | Optional | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. | -`if` | Optional | Conditional expression that determines whether the processor should be deployed. | -`ignore_failure` | Optional | If set to `true`, the processor will not fail if an error occurs. 
| -`on_failure` | Optional | A list of processors to run if the processor fails. | -`tag` | Optional | Tag that can be used to identify the processor. | - +`on_failure` | Optional | A list of processors to run if the processor fails. | +`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | +`target_field` | Optional | Name of the field to store the parsed data in. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | + +## Using the processor + +Follow these steps to use the processor in a pipeline. + +**Step 1: Create pipeline.** The following query creates a pipeline, named `file_upload`, that has one bytes processor. It converts the `file_size` to its byte equivalent and stores it in a new field `file_size_bytes`: @@ -53,9 +59,10 @@ PUT _ingest/pipeline/file_upload } ``` {% include copy-curl.html %} -``` -Ingest a document into the index: +**Step 2: Ingest a document into the index.** + +The following query ingests a document into the index named `testindex1`: ```json PUT testindex1/_doc/1?pipeline=file_upload @@ -64,7 +71,8 @@ PUT testindex1/_doc/1?pipeline=file_upload } ``` {% include copy-curl.html %} -``` + +**Step 3: View the ingested document.** To view the ingested document, run the following query: @@ -72,7 +80,8 @@ To view the ingested document, run the following query: GET testindex1/_doc/1 ``` {% include copy-curl.html %} -``` + +**Step 4: Test the pipeline.** To test the pipeline, run the following query: @@ -92,10 +101,38 @@ POST _ingest/pipeline/user-behavior/_simulate ] } ``` +{% include copy-curl.html %} + +You'll get the following response, which confirms the pipeline is working correctly and producing the expected output: + +```json +{ + "docs": [ + { + "doc": { + "_index": "testindex1", + "_id": "1", + "_source": { + "event_types": [ + "event_type" + ], + "file_size_bytes": "10485760", + "file_size": "10MB" + }, + "_ingest": { 
+ "timestamp": "2023-08-22T16:09:42.771569211Z" + } + } + } + ] +} +``` + +## Using optional parameters The following query creates a pipeline with the bytes processor and one optional parameter, `on_failure`, which uses the `set` processor to set the `error` field with a specific error message: -``json +```json PUT _ingest/pipeline/file_upload { "description": "Pipeline that converts file size to bytes", @@ -118,4 +155,5 @@ PUT _ingest/pipeline/file_upload } ``` {% include copy-curl.html %} -``` + +Repeat steps 2--4 to confirm the pipeline is working as expected. diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md index 76609c8bcc..51e9a01f10 100644 --- a/_api-reference/ingest-apis/processors/convert.md +++ b/_api-reference/ingest-apis/processors/convert.md @@ -18,6 +18,7 @@ The `convert` processor converts a field in a document to a different type, for } } ``` +{% include copy-curl.html %} ## Configuration parameters @@ -25,46 +26,61 @@ The following table lists the required and optional parameters for the `convert` **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| -`field` | Required | Name of the field whose value to convert. | +`field` | Required | Name of the field where the data should be converted. Supports template snippets.| `type` | Required | The type to convert the field value to. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. If the `type` is `boolean`, the value is set to `true` if the field value is a string `"true"` (ignoring case), and to `false` if the field value is a string `"false"` (ignoring case). If the value is not one of the allowed values, an error will occur. | -`description` | Optional | Brief description of the processor. | -`target_field` | Optional | Name of the field to store the converted value. If not specified, the value will be stored in-place in the `field` field. 
Default is `field`. | -`if` | Optional | Conditional expression that determines whether the processor should be deployed. | -`ignore_missing` | If set to `true`, the processor will ignore documents that do not have a value for the specified field. Default is `false`. -`ignore_failure` | Optional | If set to true, the processor will not fail if an error occurs. Default is `false`. | -`on_failure` | Optional | Action to take if an error occurs. | -`tag` | Optional | Tag that can be used to identify the processor. | +`description` | Optional | Brief description of the processor. | +`if` | Optional | Condition to run this processor. | +`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | +`ignore_missing` | Optional | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. | +`on_failure` | Optional | A list of processors to run if the processor fails. | +`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | +`target_field` | Optional | Name of the field to store the parsed data in. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | + +## Using the processor + +Follow these steps to use the processor in a pipeline. 
-The following query creates a pipeline, named `convert-price`, that converts `price` to a floating-point number and stores the converted value in the `price_float` field: +**Step 1: Create pipeline.** + +The following query creates a pipeline, named `convert-price`, that converts `price` to a floating-point number and stores the converted value in the `price_float` field and sets the value to `0` if it is less than `0`: ```json PUT _ingest/pipeline/convert-price { - "description": "Pipeline that converts price to floating-point number", + "description": "Pipeline that converts price to floating-point number and sets value to zero if price less than zero", "processors": [ { "convert": { "field": "price", - "type": "string", + "type": "float", "target_field": "price_float" } + }, + { + "set": { + "field": "price", + "value": "0", + "if": "ctx.price_float < 0" + } } ] } ``` {% include copy-curl.html %} -``` -Ingest a document into the index: +**Step 2: Ingest a document into the index.** + +The following query ingests a document into the index named `testindex1`: ```json PUT testindex1/_doc/1?pipeline=convert-price { - "price": "100" + "price": "10.5" } ``` {% include copy-curl.html %} -``` + +**Step 3: View the ingested document.** To view the ingested document, run the following query: @@ -72,25 +88,45 @@ To view the ingested document, run the following query: GET testindex1/_doc/1 ``` {% include copy-curl.html %} -``` -To test the pipeline, run the following query:: +**Step 4: Test the pipeline.** + +To test the pipeline, run the following query: ```json -POST _ingest/pipeline/user-behavior/_simulate +POST _ingest/pipeline/convert-price/_simulate { "docs": [ { "_index": "testindex1", "_id": "1", - "_source": { - "price_float": "100.00", - "price": - "price_float" + "_source": { + "price": "-10.5" } } ] } ``` {% include copy-curl.html %} -``` + +You'll get the following response, which confirms the pipeline is working correctly and producing the expected output: + 
+```json
+{
+  "docs": [
+    {
+      "doc": {
+        "_index": "testindex1",
+        "_id": "1",
+        "_source": {
+          "price_float": -10.5,
+          "price": "0"
+        },
+        "_ingest": {
+          "timestamp": "2023-08-22T15:38:21.180688799Z"
+        }
+      }
+    }
+  ]
+}
+```
\ No newline at end of file
diff --git a/_api-reference/ingest-apis/processors/csv.md b/_api-reference/ingest-apis/processors/csv.md
index 238ff71670..03efbfe0eb 100644
--- a/_api-reference/ingest-apis/processors/csv.md
+++ b/_api-reference/ingest-apis/processors/csv.md
@@ -18,6 +18,7 @@ The `csv` processor is used to parse comma-separated values (CSV) and store them
   }
 }
 ```
+{% include copy-curl.html %}
 
 ## Configuration parameters
 
@@ -25,20 +26,27 @@ The following table lists the required and optional parameters for the `csv` pro
 **Parameter** | **Required**  | **Description**  |
|-----------|-----------|-----------|
-`field`  | Required  | Name of the field to extract data from.  |
+`field`  | Required  | Name of the field containing the CSV data to parse. Supports template snippets. |
 `target_fields`  | Required  | Name of the field to store the parsed data in. |
-`separator`  | Optional  | The delimiter used to separate the fields in the CSV data. |
-`quote`  | Optional  | The character used to quote fields in the CSV data. |
+`description`  | Optional  | Brief description of the processor. |
+`empty_value`  | Optional  | The value to assign to empty fields in the CSV data. If not specified, empty fields are skipped. |
+`if` | Optional | Condition to run this processor. |
+`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. |
 `ignore_missing`  | Optional  | If set to `true`, the processor will not fail if the field does not exist. Default is `true`. |
+`on_failure` | Optional | A list of processors to run if the processor fails. 
| +`quote` | Optional | The character used to quote fields in the CSV data. | +`separator` | Optional | The delimiter used to separate the fields in the CSV data. | +`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | `trim` | Optional | If set to `true`, the processor trims whitespace from the beginning and end of the text. Default is `false`. | -`empty_value` | Optional | Represents optional parameters that are not required to be present or are not applicable. | -`if` | Optional | Conditional expression that determines whether the processor should be deployed. | -`on_failure` | Optional | Action to take if an error occurs. | -`ignore_failure` | Optional | If set to `true`, the processor does not fail if an error occurs. Default is `false`. | -`tag` | Optional | Tag that can be used to identify the processor. | -`description` | Optional | Brief description of the processor. | -Following is an example a pipeline using a `csv` processor. +## Using the processor + +Follow these steps to use the processor in a pipeline. + +**Step 1: Create pipeline.** + +The following query creates a pipeline, named `csv-processor`, that splits `resource_usage` into three new fields named `cpu_usage`, `memory_usage`, and `disk_usage`: ```json PUT _ingest/pipeline/csv-processor @@ -54,29 +62,73 @@ PUT _ingest/pipeline/csv-processor } ] } +``` +{% include copy-curl.html %} + +**Step 2: Ingest a document into the index.** +The following query ingests a document into the index named `testindex1`: + +```json PUT testindex1/_doc/1?pipeline=csv-processor { "resource_usage": "25,4096,10" } ``` +{% include copy-curl.html %} -This pipeline transforms `resource usage` field into three separate fields: `cpu_usage` with a value of 25, `memory_usage` with a value of 4096, and `disk_usage` with a value of 10. Following is the GET request and response. 
+**Step 3: View the ingested document.**
+
+To view the ingested document, run the following query:
 
 ```json
 GET testindex1/_doc/1
+```
+{% include copy-curl.html %}
+
+**Step 4: Test the pipeline.**
+
+To test the pipeline, run the following query:
+
+```json
+POST _ingest/pipeline/csv-processor/_simulate
 {
-  "_index": "testindex1",
-  "_id": "1",
-  "_version": 5,
-  "_seq_no": 4,
-  "_primary_term": 1,
-  "found": true,
-  "_source": {
-    "resource_usage": "25,4096,10",
-    "memory_usage": "4096",
-    "disk_usage": "10",
-    "cpu_usage": "25"
-  }
+  "docs": [
+    {
+      "_index": "testindex1",
+      "_id": "1",
+      "_source": {
+        "resource_usage": "25,4096,10"
+      }
+    }
+  ]
 }
 ```
+{% include copy-curl.html %}
+
+You'll get the following response, which confirms the pipeline is working correctly and producing the expected output:
+
+```json
+{
+  "docs": [
+    {
+      "doc": {
+        "_index": "testindex1",
+        "_id": "1",
+        "_source": {
+          "memory_usage": "4096",
+          "disk_usage": "10",
+          "resource_usage": "25,4096,10",
+          "cpu_usage": "25"
+        },
+        "_ingest": {
+          "timestamp": "2023-08-22T16:40:45.024796379Z"
+        }
+      }
+    }
+  ]
+}
+```
\ No newline at end of file
diff --git a/_api-reference/ingest-apis/processors/date.md b/_api-reference/ingest-apis/processors/date.md
index dce9e22048..ba789dfc5c 100644
--- a/_api-reference/ingest-apis/processors/date.md
+++ b/_api-reference/ingest-apis/processors/date.md
@@ -18,6 +18,7 @@ The `date` processor is used to parse dates from fields in a document and store
   }
 }
 ```
+{% include copy-curl.html %}
 
 ## Configuration parameters
 
@@ -25,20 +26,25 @@ The following table lists the required and optional parameters for the `date` pr
 **Parameter** | **Required**  | **Description**  |
|-----------|-----------|-----------|
-`field`  | Required  | Name of the field to extract data from. |
+`field`  | Required  | Name of the field to extract the date from. 
Supports template snippets.| `formats` | Required | An array of the expected date formats. Can be a [date format]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats) or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N. | -`output_format` | Optional | The [date format]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats) to use for the target field. Default is `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`. -`target_field` | Optional | Name of the field to store the parsed data in. Default target field is `@timestamp`. | +`description` | Optional | Brief description of the processor. | +`if` | Optional | Condition to run this processor. | +`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `locale` | Optional | The locale to use when parsing the date. Default is `ENGLISH`. Supports template snippets. | +`on_failure` | Optional | A list of processors to run if the processor fails. | +`output_format` | Optional | The [date format]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats) to use for the target field. Default is `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`. | +`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | +`target_field` | Optional | Name of the field to store the parsed data in. Default target field is `@timestamp`. | `timezone` | Optional | The time zone to use when parsing the date. Default is `UTC`. Supports template snippets.| -`ignore_missing` | Optional | If set to `true`, the processor will not fail if the field does not exist. Default is `false`. | -`if` | Optional | Conditional expression that determines whether the processor should be deployed. | -`on_failure` | Optional | Action to take if an error occurs. | -`ignore_failure` | Optional | If set to `true`, the processor does not fail if an error occurs. 
| -`tag` | Optional | Tag that can be used to identify the processor. | -`description` | Optional | Brief description of the processor. | -Following is an example of a pipeline using the `date` processor. +## Using the processor + +Follow these steps to use the processor in a pipeline. + +**Step 1: Create pipeline.** + +The following query creates a pipeline, named `date-output-format`, that uses the `date` processor to convert from European date format to US date format, adding the new field `date_us` with the desired `output_format`: ```json PUT /_ingest/pipeline/date-output-format @@ -56,26 +62,69 @@ PUT /_ingest/pipeline/date-output-format } ] } +``` +{% include copy-curl.html %} +**Step 2: Ingest a document into the index.** + +The following query ingests a document into the index named `testindex1`: + +```json PUT testindex1/_doc/1?pipeline=date-output-format { "date_european": "30/06/2023" } ``` +{% include copy-curl.html %} + +**Step 3: View the ingested document.** -This pipeline adds the new field `date_us` with the desired output format. Following is the GET request and response. 
+To view the ingested document, run the following query: ```json GET testindex1/_doc/1 +``` +{% include copy-curl.html %} + +**Step 4: Test the pipeline.** + +To test the pipeline, run the following query: + +```json +POST _ingest/pipeline/date-output-format/_simulate { - "_index": "testindex1", - "_id": "1", - "_version": 9, - "_seq_no": 8, - "_primary_term": 1, - "found": true, - "_source": { - "date_us": "06/30/2023", - "date_european": "30/06/2023" - } + "docs": [ + { + "_index": "testindex1", + "_id": "1", + "_source": { + "date_us": "06/30/2023", + "date_european": "30/06/2023" + } + } + ] +} +``` +{% include copy-curl.html %} + +You'll get the following response, which confirms the pipeline is working correctly and producing the expected output: + +```json +{ + "docs": [ + { + "doc": { + "_index": "testindex1", + "_id": "1", + "_source": { + "date_us": "06/30/2023", + "date_european": "30/06/2023" + }, + "_ingest": { + "timestamp": "2023-08-22T17:08:46.275195504Z" + } + } + } + ] } +``` diff --git a/_api-reference/ingest-apis/processors/lowercase.md b/_api-reference/ingest-apis/processors/lowercase.md index 1c4558b61a..239fc9dc22 100644 --- a/_api-reference/ingest-apis/processors/lowercase.md +++ b/_api-reference/ingest-apis/processors/lowercase.md @@ -17,6 +17,7 @@ This processor converts all the text in a specific field to lowercase letters. T } } ``` +{% include copy-curl.html %} #### Configuration parameters @@ -24,17 +25,23 @@ The following table lists the required and optional parameters for the `lowercas | Name | Required | Description | |---|---|---| -| `field` | Required | Specifies the name of the field that you want to remove. | -| `target_field` | Optional | Specifies the name of the field to store the converted value in. Default is `field`. By default, `field` is updated in-place. | -| `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`. 
|
-| `ignore_failure` | Optional | Specifies whether the processor should continue processing documents even if it fails to remove the specified field. Default is `false`. |
-| `on_failure` | Optional | Defines the processors to be deployed immediately following the failed processor. |
-| `if` | Optional | Conditionally deploys the processor based on the value of the field. The `value` parameter specifies the value that you want to compare the field to. |
-| `tag` | Optional | Provides an identifier for the processor. Useful for debugging and metrics. |
-`description` | Optional | Brief description of the processor. |
+`field` | Required | Name of the field where the data should be converted. Supports template snippets.|
+`description` | Optional | Brief description of the processor. |
+`if` | Optional | Condition to run this processor. |
+`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. |
+`on_failure` | Optional | A list of processors to run if the processor fails. |
+`ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`. |
+`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. |
+`target_field` | Optional | Name of the field to store the converted data in. Default is `field`. By default, `field` is updated in-place. |
 
-Following is an example of an ingest pipeline using the `lowercase` processor.
+## Using the processor
+
+Follow these steps to use the processor in a pipeline.
+ +**Step 1: Create pipeline.** + +The following query creates a pipeline, named `lowercase-title`, that uses the `lowercase` processor to lowercase the `title` field of a document: ```json PUT _ingest/pipeline/lowercase-title @@ -48,27 +55,67 @@ PUT _ingest/pipeline/lowercase-title } ] } +``` +{% include copy-curl.html %} +**Step 2: Ingest a document into the index.** +The following query ingests a document into the index named `testindex1`: + +```json PUT testindex1/_doc/1?pipeline=lowercase-title { "title": "WAR AND PEACE" } ``` +{% include copy-curl.html %} + +**Step 3: View the ingested document.** -Following is the GET request and response. +To view the ingested document, run the following query: ```json GET testindex1/_doc/1 +``` +{% include copy-curl.html %} + +**Step 4: Test the pipeline.** + +To test the pipeline, run the following query: + +```json +POST _ingest/pipeline/lowercase-title/_simulate { - "_index": "testindex1", - "_id": "1", - "_version": 12, - "_seq_no": 11, - "_primary_term": 1, - "found": true, - "_source": { - "title": "war and peace" - } + "docs": [ + { + "_index": "testindex1", + "_id": "1", + "_source": { + "title": "war and peace" + } + } + ] +} +``` +{% include copy-curl.html %} + +You'll get the following response, which confirms the pipeline is working correctly and producing the expected output: + +```json +{ + "docs": [ + { + "doc": { + "_index": "testindex1", + "_id": "1", + "_source": { + "title": "war and peace" + }, + "_ingest": { + "timestamp": "2023-08-22T17:39:39.872671834Z" + } + } + } + ] } ``` diff --git a/_api-reference/ingest-apis/processors/remove.md b/_api-reference/ingest-apis/processors/remove.md index 4b1634156a..578ba5fb8a 100644 --- a/_api-reference/ingest-apis/processors/remove.md +++ b/_api-reference/ingest-apis/processors/remove.md @@ -17,6 +17,7 @@ The remove processor is used to remove a field from a document. 
The syntax for t
   }
 }
 ```
+{% include copy-curl.html %}
 
 #### Configuration parameters
 
@@ -24,15 +25,21 @@ The following table lists the required and optional parameters for the `remove` 
 
 | Name | Required | Description |
 |---|---|---|
-| `field` | Required | Specifies the name of the field that you want to remove. |
-| `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`. |
-| `ignore_failure` | Optional | Specifies whether the processor should continue processing documents even if it fails to remove the specified field. Default is `false`. |
-| `if` | Optional | Conditionally deploys the processor based on the value of the field. The `value` parameter specifies the value that you want to compare the field to. |
-| `tag` | Optional | Allows you to identify the processor for debugging and metrics. |
-`description` | Optional | Brief description of the processor. |
+`field` | Required | Name of the field to remove. Supports template snippets.|
+`description` | Optional | Brief description of the processor. |
+`if` | Optional | Condition to run this processor. |
+`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. |
+`on_failure` | Optional | A list of processors to run if the processor fails. |
+`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. |
+
+## Using the processor
 
-Following is an example of an ingest pipeline using the `remove` processor.
+Follow these steps to use the processor in a pipeline.
+ +**Step 1: Create pipeline.** + +The following query creates a pipeline, named `remove_ip`, that removes the `ip_address` field from a document: ```json PUT /_ingest/pipeline/remove_ip @@ -46,24 +52,65 @@ PUT /_ingest/pipeline/remove_ip } ] } +``` +{% include copy-curl.html %} + +**Step 2: Ingest a document into the index.** +The following query ingests a document into the index named `testindex1`: + +```json PUT testindex1/_doc/1?pipeline=remove_ip { "ip_address": "203.0.113.1" } ``` +{% include copy-curl.html %} + +**Step 3: View the ingested document.** -This pipeline removes the ip_address field from any document that passes through the pipeline. Following is the GET request and response. +To view the ingested document, run the following query: ```json GET testindex1/_doc/1 +``` +{% include copy-curl.html %} + +**Step 4: Test the pipeline.** + +To test the pipeline, run the following query: + +```json +POST _ingest/pipeline/remove_ip/_simulate { - "_index": "testindex1", - "_id": "1", - "_version": 10, - "_seq_no": 9, - "_primary_term": 1, - "found": true, - "_source": {} + "docs": [ + { + "_index": "testindex1", + "_id": "1", + "_source":{ + "ip_address": "203.0.113.1" + } + } + ] } -``` \ No newline at end of file +``` +{% include copy-curl.html %} + +You'll get the following response, which confirms the pipeline is working correctly and producing the expected output: + +```json +{ + "docs": [ + { + "doc": { + "_index": "testindex1", + "_id": "1", + "_source": {}, + "_ingest": { + "timestamp": "2023-08-22T17:58:33.970510012Z" + } + } + } + ] +} +``` diff --git a/_api-reference/ingest-apis/processors/uppercase.md b/_api-reference/ingest-apis/processors/uppercase.md index b9427661d7..4a19034f7b 100644 --- a/_api-reference/ingest-apis/processors/uppercase.md +++ b/_api-reference/ingest-apis/processors/uppercase.md @@ -17,6 +17,7 @@ This processor converts all the text in a specific field to uppercase letters. 
T
   }
 }
 ```
+{% include copy-curl.html %}
 
 #### Configuration parameters
 
@@ -24,17 +25,23 @@ The following table lists the required and optional parameters for the `uppercas
 
 | Name | Required | Description |
 |---|---|---|
-| `field` | Required | Specifies the name of the field that you want to remove. |
-| `target_field` | Optional | Specifies the name of the field to store the converted value in. Default is `field`. By default, `field` is updated in-place. |
-| `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`. |
-| `ignore_failure` | Optional | Specifies whether the processor should continue processing documents even if it fails to remove the specified field. Default is `false`. |
-| `on_failure` | Optional | Defines the processors to be deployed immediately following the failed processor. |
-| `if` | Optional | Conditionally deploys the processor based on the value of the field. The `value` parameter specifies the value that you want to compare the field to. |
-| `tag` | Optional | Provides an identifier for the processor. Useful for debugging and metrics. |
-`description` | Optional | Brief description of the processor. |
+`field` | Required | Name of the field where the data should be converted. Supports template snippets.|
+`description` | Optional | Brief description of the processor. |
+`if` | Optional | Condition to run this processor. |
+`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. |
+`ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`. |
+`on_failure` | Optional | A list of processors to run if the processor fails. |
+`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. |
+`target_field` | Optional | Name of the field to store the converted data in. Default is `field`. By default, `field` is updated in-place. |
+
+## Using the processor
 
-Following is an example of an ingest pipeline using the `uppercase` processor.
+Follow these steps to use the processor in a pipeline.
+
+**Step 1: Create pipeline.**
+
+The following query creates a pipeline, named `uppercase`, that converts the text in the `name` field to uppercase:
 
 ```json
 PUT _ingest/pipeline/uppercase
@@ -47,26 +53,67 @@ PUT _ingest/pipeline/uppercase
     }
   ]
 }
+```
+{% include copy-curl.html %}
+
+**Step 2: Ingest a document into the index.**
 
+The following query ingests a document into the index named `testindex1`:
+
+```json
 PUT testindex1/_doc/1?pipeline=uppercase
 {
   "name": "John"
 }
 ```
+{% include copy-curl.html %}
+
+**Step 3: View the ingested document.**
 
-Following is the GET request and response.
+To view the ingested document, run the following query:
 
 ```json
 GET testindex1/_doc/1
+```
+{% include copy-curl.html %}
+
+**Step 4: Test the pipeline.**
+
+To test the pipeline, run the following query:
+
+```json
+POST _ingest/pipeline/uppercase/_simulate
 {
-  "_index": "testindex1",
-  "_id": "1",
-  "_version": 11,
-  "_seq_no": 10,
-  "_primary_term": 1,
-  "found": true,
-  "_source": {
-    "name": "JOHN"
-  }
+  "docs": [
+    {
+      "_index": "testindex1",
+      "_id": "1",
+      "_source": {
+        "name": "JOHN"
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+You'll get the following response, which confirms the pipeline is working correctly and producing the expected output:
+
+```json
+{
+  "docs": [
+    {
+      "doc": {
+        "_index": "testindex1",
+        "_id": "1",
+        "_source": {
+          "name": "JOHN"
+        },
+        "_ingest": {
+          "timestamp": "2023-08-22T18:40:40.870808043Z"
+        }
+      }
+    }
+  ]
 }
 ```

From 05e3ed6b2c09d6986ce38182d4d793856d71c74c Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Tue, 22 Aug 2023 13:07:31 -0600
Subject: [PATCH 163/286] Update _api-reference/ingest-apis/pipeline-failures.md

Co-authored-by: Heemin Kim
Signed-off-by: Melissa Vagi

---
 _api-reference/ingest-apis/pipeline-failures.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_api-reference/ingest-apis/pipeline-failures.md b/_api-reference/ingest-apis/pipeline-failures.md
index afba68104a..7e9a7b4e1f 100644
--- a/_api-reference/ingest-apis/pipeline-failures.md
+++ b/_api-reference/ingest-apis/pipeline-failures.md
@@ -63,7 +63,7 @@ If the processor fails, OpenSearch logs the failure and continues to run all rem
 To view ingest pipeline metrics, use the [Nodes Stats API]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/):
 
 ```
-GET /_nodes/stats/ingest
+GET /_nodes/stats/ingest?filter_path=nodes.*.ingest
 ```
 {% include copy-curl.html %}
 
From 74fb7f1703c065cba9a67568896bc889bceafac2 Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Tue, 22 Aug 2023 13:35:39 -0600
Subject: [PATCH 164/286] Address SME comments

Signed-off-by: Melissa Vagi

---
 _api-reference/ingest-apis/create-ingest.md   |   3 +-
 .../ingest-apis/ingest-processors.md          |   3 +-
 .../ingest-apis/pipeline-failures.md          | 152 ++++--------------
 3 files changed, 35 insertions(+), 123 deletions(-)

diff --git a/_api-reference/ingest-apis/create-ingest.md b/_api-reference/ingest-apis/create-ingest.md
index 941da9d9bd..32413c71eb 100644
--- a/_api-reference/ingest-apis/create-ingest.md
+++ b/_api-reference/ingest-apis/create-ingest.md
@@ -20,7 +20,7 @@ To create, or update, an ingest pipeline, you need to use the `PUT` method to th
 PUT _ingest/pipeline/
 ```
 
-Here is an example in JSON format that creates an ingest pipeline with using a `set` processor and an `uppercase` processor. The `set` processor sets the value of the `grad_year` field to the value of `2023` and the `graduated` field to the value of `true`. The `uppercase` processor converts the `name` field to capital letters.
+Here is an example in JSON format that creates an ingest pipeline using `set` and `uppercase` processors. 
The `set` processor sets the value of the `grad_year` field to the value of `2023` and the `graduated` field to the value of `true`. The `uppercase` processor converts the `name` field to capital letters. #### Example request @@ -82,7 +82,6 @@ Parameter | Required | Type | Description Some processor parameters support [Mustache](https://mustache.github.io/) template snippets. To get a field value, enclose the field name in triple curly brackets, for example, {{{field-name}}}. - #### Example: `set` ingest processor using Mustache template snippet ```json diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index 28d1c3a4d8..a77ca86f80 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -22,7 +22,7 @@ To set up and deploy ingest processors, make sure you have the necessary permiss The following is a generic example of an ingest processor definition within a pipeline. Processor types and their required or optional parameters vary depending on their specific use case. See the [Related articles]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/#related-articles) section to learn more about the processor types and defining and configuring them within a pipeline. -#### Example +#### Example query and description of parameters ```json { @@ -40,4 +40,3 @@ The following is a generic example of an ingest processor definition within a pi `your_value` | Required | Replace this with the appropriate value for the chosen processor type and parameter. For example, if the processor is `rename`, then this value is the new field name you want to rename to. | `your_optional_parameter` | Optional | Some processors have optional parameters that modify their behavior. Replace this with the optional parameter. | `your_optional_value` | Optional | Replace this with the appropriate value for the optional parameter used. 
| - diff --git a/_api-reference/ingest-apis/pipeline-failures.md b/_api-reference/ingest-apis/pipeline-failures.md index 7e9a7b4e1f..5a246e1139 100644 --- a/_api-reference/ingest-apis/pipeline-failures.md +++ b/_api-reference/ingest-apis/pipeline-failures.md @@ -6,7 +6,7 @@ grand_parent: Ingest APIs nav_order: 15 --- -## Handling pipeline failures +# Handling pipeline failures Each ingest pipeline consists of a series of processors that are applied to the data in sequence. If a processor fails, the entire pipeline will fail. You have two options for handling failures: @@ -30,10 +30,12 @@ PUT _ingest/pipeline/my-pipeline/ ] } ``` +{% include copy-curl.html %} You can specify the `on_failure` parameter to run immediately after a processor fails. If you have specified `on_failure`, OpenSearch will run the other processors in the pipeline, even if the `on_failure` configuration is empty: ```json +PUT _ingest/pipeline/my-pipeline/ { "description": "Add timestamp to the document", "processors": [ @@ -41,169 +43,81 @@ You can specify the `on_failure` parameter to run immediately after a processor "date": { "field": "timestamp_field", "target_field": "timestamp", - "formats": ["yyyy-MM-dd HH:mm:ss"] - } - } - ], - "on_failure": [ + "formats": ["yyyy-MM-dd HH:mm:ss"], + "on_failure": [ { "set": { "field": "ingest_error", "value": "failed" } } + ] + } + } ] } ``` +{% include copy-curl.html %} If the processor fails, OpenSearch logs the failure and continues to run all remaining processors in the search pipeline. To check whether there were any failures, you can use [ingest pipeline metrics]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipeline-failures/#ingest-pipeline-metrics). 
+{: tip} ## Ingest pipeline metrics To view ingest pipeline metrics, use the [Nodes Stats API]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/): -``` +```json GET /_nodes/stats/ingest?filter_path=nodes.*.ingest ``` {% include copy-curl.html %} -The response contains statistics for all ingest pipelines: +The response contains statistics for all ingest pipelines, for example: ```json -{ - "_nodes": { - "total": 2, - "successful": 2, - "failed": 0 - }, - "cluster_name": "opensearch-cluster", + { "nodes": { "iFPgpdjPQ-uzTdyPLwQVnQ": { - "timestamp": 1691011228995, - "name": "opensearch-node1", - "transport_address": "172.19.0.4:9300", - "host": "172.19.0.4", - "ip": "172.19.0.4:9300", - "roles": [ - "cluster_manager", - "data", - "ingest", - "remote_cluster_client" - ], - "attributes": { - "shard_indexing_pressure_enabled": "true" - }, "ingest": { "total": { - "count": 1, - "time_in_millis": 2, + "count": 28, + "time_in_millis": 82, "current": 0, - "failed": 0 + "failed": 9 }, "pipelines": { - "my-pipeline": { - "count": 16, - "time_in_millis": 23, + "user-behavior": { + "count": 5, + "time_in_millis": 0, "current": 0, - "failed": 4, + "failed": 0, "processors": [ { - "set": { - "type": "set", + "append": { + "type": "append", "stats": { - "count": 6, + "count": 5, "time_in_millis": 0, "current": 0, "failed": 0 } } - }, - { - "set": { - "type": "set", - "stats": { - "count": 6, - "time_in_millis": 3, - "current": 0, - "failed": 0 - } - } - }, - { - "uppercase": { - "type": "uppercase", - "stats": { - "count": 6, - "time_in_millis": 0, - "current": 0, - "failed": 4 - } - } } ] - } - } - } - }, - "dDOB3vS3TVmB5t6PHdCj4Q": { - "timestamp": 1691011228997, - "name": "opensearch-node2", - "transport_address": "172.19.0.2:9300", - "host": "172.19.0.2", - "ip": "172.19.0.2:9300", - "roles": [ - "cluster_manager", - "data", - "ingest", - "remote_cluster_client" - ], - "attributes": { - "shard_indexing_pressure_enabled": "true" - }, - "ingest": { - "total": { 
- "count": 0, - "time_in_millis": 0, - "current": 0, - "failed": 0 - }, - "pipelines": { - "my-pipeline": { - "count": 0, - "time_in_millis": 0, + }, + "remove_ip": { + "count": 5, + "time_in_millis": 9, "current": 0, - "failed": 0, + "failed": 2, "processors": [ { - "set": { - "type": "set", + "remove": { + "type": "remove", "stats": { - "count": 0, - "time_in_millis": 0, + "count": 5, + "time_in_millis": 8, "current": 0, - "failed": 0 - } - } - }, - { - "set": { - "type": "set", - "stats": { - "count": 0, - "time_in_millis": 0, - "current": 0, - "failed": 0 - } - } - }, - { - "uppercase": { - "type": "uppercase", - "stats": { - "count": 0, - "time_in_millis": 0, - "current": 0, - "failed": 0 + "failed": 2 } } } From 96616d83baea1894790774d7c98408737029b80a Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 22 Aug 2023 13:47:19 -0600 Subject: [PATCH 165/286] Add copy label Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/simulate-ingest.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/_api-reference/ingest-apis/simulate-ingest.md b/_api-reference/ingest-apis/simulate-ingest.md index 8666abb500..9eac5eceb8 100644 --- a/_api-reference/ingest-apis/simulate-ingest.md +++ b/_api-reference/ingest-apis/simulate-ingest.md @@ -20,6 +20,7 @@ The following requests simulate the latest ingest pipeline created. GET _ingest/pipeline/_simulate POST _ingest/pipeline/_simulate ``` +{% include copy-curl.html %} The following requests simulate a single pipeline based on the pipeline ID. @@ -27,6 +28,7 @@ The following requests simulate a single pipeline based on the pipeline ID. 
GET _ingest/pipeline/<pipeline-id>/_simulate
 POST _ingest/pipeline/<pipeline-id>/_simulate
 ```
+{% include copy-curl.html %}
 
 ## Request body fields
 
@@ -231,3 +233,4 @@ POST /_ingest/pipeline/_simulate
   ]
 }
 ```
+{% include copy-curl.html %}

From 88d3805ecb9933bec91e14d558dcbb42b7fa4c4e Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Tue, 22 Aug 2023 15:17:02 -0600
Subject: [PATCH 166/286] Update append.md

Signed-off-by: Melissa Vagi

---
 _api-reference/ingest-apis/processors/append.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md
index dba5b27930..1e3adf9903 100644
--- a/_api-reference/ingest-apis/processors/append.md
+++ b/_api-reference/ingest-apis/processors/append.md
@@ -25,7 +25,7 @@ The syntax for the `append` processor is:
 ```
 {% include copy-curl.html %}
 
-## Parameters
+## Configuration parameters
 
 The following table lists the required and optional parameters for the `append` processor.
 
From 4412bfc58d6ce05ae6eed6aa4e7604be21e6dac3 Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Wed, 23 Aug 2023 12:01:35 -0600
Subject: [PATCH 167/286] Fix broken links

Signed-off-by: Melissa Vagi

---
 _api-reference/ingest-apis/get-ingest.md        | 2 +-
 _api-reference/ingest-apis/index.md             | 6 +++---
 _api-reference/ingest-apis/ingest-pipelines.md  | 2 +-
 _api-reference/ingest-apis/ingest-processors.md | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/_api-reference/ingest-apis/get-ingest.md b/_api-reference/ingest-apis/get-ingest.md
index 32e448d4f7..dfc1c0fd51 100644
--- a/_api-reference/ingest-apis/get-ingest.md
+++ b/_api-reference/ingest-apis/get-ingest.md
@@ -63,4 +63,4 @@ The response contains the pipeline information:
 
 ## Next steps
 
-- [Test your pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/simulate-ingest/)
\ No newline at end of file diff --git a/_api-reference/ingest-apis/index.md b/_api-reference/ingest-apis/index.md index 66856efaba..2fb35ba2f3 100644 --- a/_api-reference/ingest-apis/index.md +++ b/_api-reference/ingest-apis/index.md @@ -15,7 +15,7 @@ Ingest APIs are a valuable tool for loading data into a system. Ingest APIs work Simplify, secure, and scale your data ingestion in OpenSearch with the following APIs: -- [Create pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/create-ingest/): Use this API to create or update a pipeline configuration. -- [Get pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/get-ingest/): Use this API to retrieve a pipeline configuration. +- [Create pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/create-ingest/): Use this API to create or update a pipeline configuration. +- [Get pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/get-ingest/): Use this API to retrieve a pipeline configuration. - [Simulate pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/simulate-ingest/): Use this pipeline to test a pipeline configuration. -- [Delete pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/delete-ingest/): Use this API to delete a pipeline configuration. +- [Delete pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/delete-ingest/): Use this API to delete a pipeline configuration. 
diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index eb847051c1..4ef44e2b0d 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -47,4 +47,4 @@ Learn how to: - [Create a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/create-ingest/) - [Retrieve information about a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/get-ingest/) - [Test a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/simulate-ingest/) -- [Delete a pipeline]({{site.url}}{{site.baseurl}}/ingest-apis/delete-ingest/) in their respective documentation. +- [Delete a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/delete-ingest/) in their respective documentation. diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index a77ca86f80..e525a5f9a8 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -8,7 +8,7 @@ has_children: true # Ingest processors -Ingest processors are a core component of [ingest pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipelines/), as they preprocess documents before indexing. For example, you can remove fields, extract values from text, convert data format, or append additional information. +Ingest processors are a core component of [ingest pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/), as they preprocess documents before indexing. For example, you can remove fields, extract values from text, convert data format, or append additional information. OpenSearch provides a standard set of ingest processors within your OpenSearch installation. 
For a list of processors available in OpenSearch, use the [Nodes Info]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-info/) API operation:

From 3766c7bd25a95805f50414aa2fcd6c7598c00b86 Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Wed, 23 Aug 2023 12:11:37 -0600
Subject: [PATCH 168/286] Fix broken links

Signed-off-by: Melissa Vagi

---
 _api-reference/ingest-apis/index.md             | 2 +-
 _api-reference/ingest-apis/ingest-processors.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/_api-reference/ingest-apis/index.md b/_api-reference/ingest-apis/index.md
index 2fb35ba2f3..00512dd22f 100644
--- a/_api-reference/ingest-apis/index.md
+++ b/_api-reference/ingest-apis/index.md
@@ -17,5 +17,5 @@ Simplify, secure, and scale your data ingestion in OpenSearch with the following
 
 - [Create pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/create-ingest/): Use this API to create or update a pipeline configuration.
 - [Get pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/get-ingest/): Use this API to retrieve a pipeline configuration.
-- [Simulate pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/simulate-ingest/): Use this pipeline to test a pipeline configuration.
+- [Simulate pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/simulate-ingest/): Use this API to test a pipeline configuration.
 - [Delete pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/delete-ingest/): Use this API to delete a pipeline configuration.
diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md
index e525a5f9a8..40d1d75a5a 100644
--- a/_api-reference/ingest-apis/ingest-processors.md
+++ b/_api-reference/ingest-apis/ingest-processors.md
@@ -20,7 +20,7 @@ GET /_nodes/ingest
 
 To set up and deploy ingest processors, make sure you have the necessary permissions and access rights. 
See [Security plugin REST API]({{site.url}}{{site.baseurl}}/security/access-control/api/) to learn more. {:.note} -The following is a generic example of an ingest processor definition within a pipeline. Processor types and their required or optional parameters vary depending on their specific use case. See the [Related articles]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/#related-articles) section to learn more about the processor types and defining and configuring them within a pipeline. +The following is a generic example of an ingest processor definition within a pipeline. Processor types and their required or optional parameters vary depending on their specific use case. See the [Ingest processors]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) section to learn more about the processor types and defining and configuring them within a pipeline. #### Example query and description of parameters From dac1701efeaf1e33abe0fe3c8d0c15f0662bb9ae Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 23 Aug 2023 12:46:45 -0600 Subject: [PATCH 169/286] Update create-ingest.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/create-ingest.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/create-ingest.md b/_api-reference/ingest-apis/create-ingest.md index 32413c71eb..1d3c3692ff 100644 --- a/_api-reference/ingest-apis/create-ingest.md +++ b/_api-reference/ingest-apis/create-ingest.md @@ -80,7 +80,7 @@ Parameter | Required | Type | Description ## Template snippets -Some processor parameters support [Mustache](https://mustache.github.io/) template snippets. To get a field value, enclose the field name in triple curly brackets, for example, {{{field-name}}}. +Some processor parameters support [Mustache](https://mustache.github.io/) template snippets. To get the value of a field, surround the field name in three curly braces, for example, {{{field-name}}}. 
#### Example: `set` ingest processor using Mustache template snippet From 17d42110fbab8784cdee73b68b91af9e16cd72a6 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 23 Aug 2023 12:49:37 -0600 Subject: [PATCH 170/286] Update append.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index 1e3adf9903..a48df3a815 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -125,7 +125,7 @@ POST _ingest/pipeline/user-behavior/_simulate ``` {% include copy-curl.html %} -You'll get the following response, which confirms the pipeline is working correctly and producing the expected output: +You'll receive the following response, which confirms the pipeline is working correctly and producing the expected output: ```json { From 417242c8013a5a3124af43fd44489c00b1d03f63 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 23 Aug 2023 12:50:23 -0600 Subject: [PATCH 171/286] Update bytes.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/bytes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/bytes.md b/_api-reference/ingest-apis/processors/bytes.md index 44261a625f..7146c5631c 100644 --- a/_api-reference/ingest-apis/processors/bytes.md +++ b/_api-reference/ingest-apis/processors/bytes.md @@ -103,7 +103,7 @@ POST _ingest/pipeline/user-behavior/_simulate ``` {% include copy-curl.html %} -You'll get the following response, which confirms the pipeline is working correctly and producing the expected output: +You'll receive the following response, which confirms the pipeline is working correctly and producing the expected output: ```json { From d95391c4f431597b8ebdc605b6b8fc6ccb2dc0cd Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 23 Aug 2023 
12:51:08 -0600 Subject: [PATCH 172/286] Update convert.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/convert.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md index 51e9a01f10..b698b36735 100644 --- a/_api-reference/ingest-apis/processors/convert.md +++ b/_api-reference/ingest-apis/processors/convert.md @@ -109,7 +109,7 @@ POST _ingest/pipeline/convert-price/_simulate ``` {% include copy-curl.html %} -You'll get the following response, which confirms the pipeline is working correctly and producing the expected output: +You'll receive the following response, which confirms the pipeline is working correctly and producing the expected output: ```json { @@ -129,4 +129,4 @@ You'll get the following response, which confirms the pipeline is working correc } ] } -``` \ No newline at end of file +``` From b253f3a413bf13371e79b43fd399d200ad468356 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 23 Aug 2023 12:53:02 -0600 Subject: [PATCH 173/286] Update convert.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/convert.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md index b698b36735..96e6b54e4f 100644 --- a/_api-reference/ingest-apis/processors/convert.md +++ b/_api-reference/ingest-apis/processors/convert.md @@ -27,7 +27,7 @@ The following table lists the required and optional parameters for the `convert` **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| `field` | Required | Name of the field where the data should be converted. Supports template snippets.| -`type` | Required | The type to convert the field value to. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. 
If the `type` is `boolean`, the value is set to `true` if the field value is a string `"true"` (ignoring case), and to `false` if the field value is a string `"false"` (ignoring case). If the value is not one of the allowed values, an error will occur. | +`type` | Required | The type to convert the field value to. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. If the `type` is `boolean`, the value is set to `true` if the field value is a string `true` (ignoring case), and to `false` if the field value is a string `false` (ignoring case). If the value is not one of the allowed values, an error will occur. | `description` | Optional | Brief description of the processor. | `if` | Optional | Condition to run this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | From 28482a9bab6bb4db2d9f9a02afc07083c3758dac Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 23 Aug 2023 12:53:51 -0600 Subject: [PATCH 174/286] Update csv.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/csv.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_api-reference/ingest-apis/processors/csv.md b/_api-reference/ingest-apis/processors/csv.md index 03efbfe0eb..066b64089d 100644 --- a/_api-reference/ingest-apis/processors/csv.md +++ b/_api-reference/ingest-apis/processors/csv.md @@ -109,7 +109,7 @@ POST _ingest/pipeline/csv-processor/_simulate ``` {% include copy-curl.html %} -You'll get the following response, which confirms the pipeline is working correctly and producing the expected output: +You'll receive the following response, which confirms the pipeline is working correctly and producing the expected output: ```json { @@ -131,4 +131,4 @@ You'll get the following response, which confirms the pipeline is working correc } ] } -``` \ No newline at end of file +``` From 7b485211cae3280bebbc0cd5b5c67301d3015bb9 Mon Sep 17 00:00:00 2001 From: 
Melissa Vagi Date: Wed, 23 Aug 2023 12:54:48 -0600 Subject: [PATCH 175/286] Update date.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/date.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/date.md b/_api-reference/ingest-apis/processors/date.md index ba789dfc5c..e80ec0dab8 100644 --- a/_api-reference/ingest-apis/processors/date.md +++ b/_api-reference/ingest-apis/processors/date.md @@ -107,7 +107,7 @@ POST _ingest/pipeline/date-output-format/_simulate ``` {% include copy-curl.html %} -You'll get the following response, which confirms the pipeline is working correctly and producing the expected output: +You'll receive the following response, which confirms the pipeline is working correctly and producing the expected output: ```json { From e51b1f75cc05b268ce100025c75781b1df215e1c Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 23 Aug 2023 12:55:16 -0600 Subject: [PATCH 176/286] Update lowercase.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/lowercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/lowercase.md b/_api-reference/ingest-apis/processors/lowercase.md index 239fc9dc22..cc5ee0d2b3 100644 --- a/_api-reference/ingest-apis/processors/lowercase.md +++ b/_api-reference/ingest-apis/processors/lowercase.md @@ -99,7 +99,7 @@ POST _ingest/pipeline/lowercase-title/_simulate ``` {% include copy-curl.html %} -You'll get the following response, which confirms the pipeline is working correctly and producing the expected output: +You'll receive the following response, which confirms the pipeline is working correctly and producing the expected output: ```json { From 26e7db0edeb3fdefa30dab195cebd6cb9a593b5c Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 23 Aug 2023 12:56:21 -0600 Subject: [PATCH 177/286] Update remove.md Signed-off-by: Melissa Vagi --- 
_api-reference/ingest-apis/processors/remove.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/remove.md b/_api-reference/ingest-apis/processors/remove.md index 578ba5fb8a..bece38c6ee 100644 --- a/_api-reference/ingest-apis/processors/remove.md +++ b/_api-reference/ingest-apis/processors/remove.md @@ -96,7 +96,7 @@ POST _ingest/pipeline/remove_ip/_simulate ``` {% include copy-curl.html %} -You'll get the following response, which confirms the pipeline is working correctly and producing the expected output: +You'll receive the following response, which confirms the pipeline is working correctly and producing the expected output: ```json { From 06ad0e1d6d212e035444ab75e9774335b22a05b0 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 23 Aug 2023 12:57:02 -0600 Subject: [PATCH 178/286] Update uppercase.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/uppercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/uppercase.md b/_api-reference/ingest-apis/processors/uppercase.md index 4a19034f7b..20045eebd7 100644 --- a/_api-reference/ingest-apis/processors/uppercase.md +++ b/_api-reference/ingest-apis/processors/uppercase.md @@ -97,7 +97,7 @@ POST _ingest/pipeline/uppercase/_simulate ``` {% include copy-curl.html %} -You'll get the following response, which confirms the pipeline is working correctly and producing the expected output: +You'll receive the following response, which confirms the pipeline is working correctly and producing the expected output: ```json { From 753ea67ef644872da38dea7dbc3fd1ff92c76597 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 24 Aug 2023 08:16:33 -0600 Subject: [PATCH 179/286] Update append.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/append.md 
b/_api-reference/ingest-apis/processors/append.md index a48df3a815..b808cc7b20 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -125,7 +125,7 @@ POST _ingest/pipeline/user-behavior/_simulate ``` {% include copy-curl.html %} -You'll receive the following response, which confirms the pipeline is working correctly and producing the expected output: +You'll receive the following response, which confirms that the pipeline is working correctly and producing the expected output: ```json { From 95481918f4be0b44f7b3fe3a6f78ced9f98e39ff Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 24 Aug 2023 08:16:59 -0600 Subject: [PATCH 180/286] Update bytes.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/bytes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/bytes.md b/_api-reference/ingest-apis/processors/bytes.md index 7146c5631c..c845dededf 100644 --- a/_api-reference/ingest-apis/processors/bytes.md +++ b/_api-reference/ingest-apis/processors/bytes.md @@ -103,7 +103,7 @@ POST _ingest/pipeline/user-behavior/_simulate ``` {% include copy-curl.html %} -You'll receive the following response, which confirms the pipeline is working correctly and producing the expected output: +You'll receive the following response, which confirms that the pipeline is working correctly and producing the expected output: ```json { From 2280597fd62d8d4a8c6bcc249e73b09e6148ce3b Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 24 Aug 2023 08:17:49 -0600 Subject: [PATCH 181/286] Update convert.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/convert.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md index 96e6b54e4f..e3cef3de42 100644 --- a/_api-reference/ingest-apis/processors/convert.md +++ 
b/_api-reference/ingest-apis/processors/convert.md @@ -109,7 +109,7 @@ POST _ingest/pipeline/convert-price/_simulate ``` {% include copy-curl.html %} -You'll receive the following response, which confirms the pipeline is working correctly and producing the expected output: +You'll receive the following response, which confirms that the pipeline is working correctly and producing the expected output: ```json { From 1cda6ba83b39bd54f2972cf97468a02075ef3f81 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 24 Aug 2023 08:18:14 -0600 Subject: [PATCH 182/286] Update csv.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/csv.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/csv.md b/_api-reference/ingest-apis/processors/csv.md index 066b64089d..76e741980c 100644 --- a/_api-reference/ingest-apis/processors/csv.md +++ b/_api-reference/ingest-apis/processors/csv.md @@ -109,7 +109,7 @@ POST _ingest/pipeline/csv-processor/_simulate ``` {% include copy-curl.html %} -You'll receive the following response, which confirms the pipeline is working correctly and producing the expected output: +You'll receive the following response, which confirms that the pipeline is working correctly and producing the expected output: ```json { From d7d3e68c5f5b7976c70638f8a6a33fe1378d15c9 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 24 Aug 2023 08:18:42 -0600 Subject: [PATCH 183/286] Update date.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/date.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/date.md b/_api-reference/ingest-apis/processors/date.md index e80ec0dab8..cca5cca253 100644 --- a/_api-reference/ingest-apis/processors/date.md +++ b/_api-reference/ingest-apis/processors/date.md @@ -107,7 +107,7 @@ POST _ingest/pipeline/date-output-format/_simulate ``` {% include copy-curl.html %} -You'll receive the following 
response, which confirms the pipeline is working correctly and producing the expected output: +You'll receive the following response, which confirms that the pipeline is working correctly and producing the expected output: ```json { From 64d4d30127f1c27b8f64a657275604f23b114fa5 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 24 Aug 2023 08:19:13 -0600 Subject: [PATCH 184/286] Update lowercase.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/lowercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/lowercase.md b/_api-reference/ingest-apis/processors/lowercase.md index cc5ee0d2b3..3548abfef9 100644 --- a/_api-reference/ingest-apis/processors/lowercase.md +++ b/_api-reference/ingest-apis/processors/lowercase.md @@ -99,7 +99,7 @@ POST _ingest/pipeline/lowercase-title/_simulate ``` {% include copy-curl.html %} -You'll receive the following response, which confirms the pipeline is working correctly and producing the expected output: +You'll receive the following response, which confirms that the pipeline is working correctly and producing the expected output: ```json { From de5c1d0524ef41fcbc9fdee6641175b256f270b4 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 24 Aug 2023 08:19:37 -0600 Subject: [PATCH 185/286] Update remove.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/remove.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/remove.md b/_api-reference/ingest-apis/processors/remove.md index bece38c6ee..ecb156ee80 100644 --- a/_api-reference/ingest-apis/processors/remove.md +++ b/_api-reference/ingest-apis/processors/remove.md @@ -96,7 +96,7 @@ POST _ingest/pipeline/remove_ip/_simulate ``` {% include copy-curl.html %} -You'll receive the following response, which confirms the pipeline is working correctly and producing the expected output: +You'll receive the following response, which 
confirms that the pipeline is working correctly and producing the expected output: ```json { From 9c76cc16b0f5be825c7e164a6a2fa9b58610dc5a Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 24 Aug 2023 08:20:04 -0600 Subject: [PATCH 186/286] Update uppercase.md Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/uppercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/uppercase.md b/_api-reference/ingest-apis/processors/uppercase.md index 20045eebd7..ddf9bc167d 100644 --- a/_api-reference/ingest-apis/processors/uppercase.md +++ b/_api-reference/ingest-apis/processors/uppercase.md @@ -97,7 +97,7 @@ POST _ingest/pipeline/uppercase/_simulate ``` {% include copy-curl.html %} -You'll receive the following response, which confirms the pipeline is working correctly and producing the expected output: +You'll receive the following response, which confirms that the pipeline is working correctly and producing the expected output: ```json { From 056ea7bf973d9388cb9b46f2e949dc372edfb935 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 24 Aug 2023 13:12:58 -0600 Subject: [PATCH 187/286] Address doc review feedback Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/create-ingest.md | 32 ++++++++----------- _api-reference/ingest-apis/delete-ingest.md | 2 +- .../ingest-apis/ingest-pipelines.md | 20 +++++++----- .../ingest-apis/ingest-processors.md | 21 +----------- .../ingest-apis/pipeline-failures.md | 32 ++++++++----------- .../ingest-apis/processors/append.md | 15 +++++---- .../ingest-apis/processors/bytes.md | 15 +++++---- .../ingest-apis/processors/convert.md | 15 +++++---- _api-reference/ingest-apis/processors/csv.md | 15 +++++---- _api-reference/ingest-apis/processors/date.md | 15 +++++---- .../ingest-apis/processors/lowercase.md | 15 +++++---- .../ingest-apis/processors/remove.md | 29 ++++++++++------- .../ingest-apis/processors/uppercase.md | 15 +++++---- 13 files changed, 122 
insertions(+), 119 deletions(-) diff --git a/_api-reference/ingest-apis/create-ingest.md b/_api-reference/ingest-apis/create-ingest.md index 1d3c3692ff..cbd48a6511 100644 --- a/_api-reference/ingest-apis/create-ingest.md +++ b/_api-reference/ingest-apis/create-ingest.md @@ -10,20 +10,19 @@ redirect_from: # Create pipeline -Use the create pipeline API operation to create or update pipelines in OpenSearch. Note that the pipeline requires an ingest definition that defines how the processors change the document. +Use the create pipeline API operation to create or update pipelines in OpenSearch. Note that the pipeline requires you to define at least one processor that specifies how to change the documents. ## Path and HTTP method -To create, or update, an ingest pipeline, you need to use the `PUT` method to the `/_ingest/pipelines` endpoint. Replace `` with your pipeline ID. +Replace `` with your pipeline ID. ```json PUT _ingest/pipeline/ ``` - -Here is an example in JSON format that creates an ingest pipeline with using `set` and `uppercase` processors. The `set` processor sets the value of the `grad_year` field to the value of `2023` and the `graduated` field to the value of `true`. The `uppercase` processor converts the `name` field to capital letters. - #### Example request +Here is an example in JSON format that creates an ingest pipeline with two `set` processors and an `uppercase` processor. The first `set` processor sets `grad_year` to `2023`, and the second `set` processor sets `graduated` to `true`. The `uppercase` processor converts the `name` field to uppercase. + ```json PUT _ingest/pipeline/my-pipeline { @@ -53,25 +52,22 @@ PUT _ingest/pipeline/my-pipeline ``` {% include copy-curl.html %} -If a pipeline fails or results in an error, see [Handling pipelines failures]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipeline-failures/) to learn more.
-{: .note} +To learn more about error handling, see [Handling pipelines failures]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipeline-failures/). ## Request body fields -The following table lists the request body fields used to create, or update, a pipeline. The body of the request must contain the field `processors`. The field `description` is optional. +The following table lists the request body fields used to create, or update, a pipeline. -Field | Required | Type | Description +Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`processors` | Required | Array of processor objects | A processor that transforms documents. Runs in the order specified. Appears in index once ran. +`processors` | Required | Array of processor objects | An array of processors, each of which transforms documents. Processors are run sequentially in the order specified. `description` | Optional | String | Description of your ingest pipeline. ## Path parameters Parameter | Required | Type | Description :--- | :--- | :--- | :--- -`pipeline-id` | Required | String | The unique identifier, or pipeline ID, assigned to the ingest pipeline. A pipeline id is used in API requests to specify which pipeline should be created or modified. - -## Query parameters +`pipeline-id` | Required | String | The unique identifier, or pipeline ID, assigned to the ingest pipeline. Parameter | Required | Type | Description :--- | :--- | :--- | :--- @@ -80,7 +76,7 @@ Parameter | Required | Type | Description ## Template snippets -Some processor parameters support [Mustache](https://mustache.github.io/) template snippets. To get the value of a field, surround the field name in three curly braces, for example, {{{field-name}}}. +Some processor parameters support [Mustache](https://mustache.github.io/) template snippets. To get the value of a field, surround the field name in three curly braces, for example, `{% raw %}{{{field-name}}}{% endraw %}`. 
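As an aside to the patch above, the triple-brace lookup it describes can be sketched in Python. This is a hypothetical illustration of the substitution rule only, not OpenSearch's actual Mustache template engine (which also supports escaping and sections):

```python
import re

def render_snippet(template: str, doc: dict) -> str:
    """Replace Mustache-style {{{field-name}}} snippets with values from a document.

    Simplified sketch: a missing field renders as an empty string.
    """
    return re.sub(
        r"\{\{\{(.+?)\}\}\}",
        lambda m: str(doc.get(m.group(1), "")),
        template,
    )

# The field value is looked up by the name inside the triple braces.
print(render_snippet("{{{user}}}-logs", {"user": "alice"}))  # alice-logs
```

Under this rule, `{% raw %}{{{grad_year}}}{% endraw %}` in a processor parameter would resolve to the document's `grad_year` value at ingest time.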
#### Example: `set` ingest processor using Mustache template snippet @@ -91,15 +87,15 @@ PUT _ingest/pipeline/my-pipeline { "set": { "description": "Sets the grad_year field to 2023 value", - "field": "{{{grad_year}}}", - "value": "{{{2023}}}" + "field": "{% raw %}{{{grad_year}}}{% endraw %}", + "value": "{% raw %}{{{2023}}}{% endraw %}" } }, { "set": { "description": "Sets graduated to true", - "field": "{{{graduated}}}", - "value": "{{{true}}}" + "field": "{% raw %}{{{graduated}}}{% endraw %}", + "value": "{% raw %}{{{true}}}{% endraw %}" } }, { diff --git a/_api-reference/ingest-apis/delete-ingest.md b/_api-reference/ingest-apis/delete-ingest.md index 0de94e3f30..1049dd9cdb 100644 --- a/_api-reference/ingest-apis/delete-ingest.md +++ b/_api-reference/ingest-apis/delete-ingest.md @@ -10,7 +10,7 @@ redirect_from: # Delete pipeline -Use the following requests to delete pipelines. +Use the following request to delete pipelines. To delete a specific pipeline, pass the pipeline ID as a parameter: diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index 4ef44e2b0d..fbb6527a76 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -8,9 +8,9 @@ nav_order: 5 # Ingest pipelines -An _ingest pipeline_ is a sequence of steps that are applied to data as it is being ingested into a system. Each step in the pipeline performs a specific task, such as filtering, transforming, or enriching data. Ingest pipelines are a valuable tool to help you tailor data to your needs. +An _ingest pipeline_ is a sequence of _processors_ that are applied to documents as they are ingested into an index. Each [processor]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) in the pipeline performs a specific task, such as filtering, transforming, or enriching data. Ingest pipelines are a valuable tool to help you tailor data to your needs. 
-Ingest pipelines consist of _processors_. Processors are customizable tasks that run in a sequential order as they appear in the request body. This order is important, as each processor depends on the output of the previous processor. OpenSearch [ingest processors]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) perform common transformations to your data, and the modified data appears in your index after each processor completes. +Ingest pipelines consist of _processors_. Processors are customizable tasks that run in a sequential order as they appear in the request body. This order is important, as each processor depends on the output of the previous processor. The modified documents appear in your index after the processors are applied. Ingest pipelines in OpenSearch can only be managed using [ingest API operations]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/). {: .note} @@ -20,7 +20,7 @@ Ingest pipelines in OpenSearch can only be managed using [ingest API operations] The following are prerequisites for using OpenSearch ingest pipelines: - When using ingest in production environments, your cluster should contain at least one node with the node roles permission set to `ingest`. For information about setting up node roles within a cluster, see [Cluster Formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/). -- If the OpenSearch security features are enabled, you must have the `cluster_manage_pipelines` permission to manage ingest pipelines. +- If the OpenSearch Security plugin is enabled, you must have the `cluster_manage_pipelines` permission to manage ingest pipelines. ## Define a pipeline @@ -33,18 +33,22 @@ A _pipeline definition_ describes the steps involved in an ingest pipeline and c } ``` -### Request body fields +Alternatively, you can specify a pipeline directly in the request body without creating a pipeline first. 
+ +### Request body fields Field | Required | Type | Description :--- | :--- | :--- | :--- `processors` | Required | Array of processor objects | A component that performs a specific task to process data as it's being ingested into OpenSearch. `description` | Optional | String | Description of the ingest pipeline. +### Example: Specify a pipeline in the request body + ## Next steps Learn how to: -- [Create a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/create-ingest/) -- [Retrieve information about a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/get-ingest/) -- [Test a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/simulate-ingest/) -- [Delete a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/delete-ingest/) in their respective documentation. +- [Create a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/create-ingest/). +- [Retrieve information about a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/get-ingest/). +- [Test a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/simulate-ingest/). +- [Delete a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/delete-ingest/). diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index 40d1d75a5a..028158daa0 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -20,23 +20,4 @@ GET /_nodes/ingest To set up and deploy ingest processors, make sure you have the necessary permissions and access rights. See [Security plugin REST API]({{site.url}}{{site.baseurl}}/security/access-control/api/) to learn more. {:.note} -The following is a generic example of an ingest processor definition within a pipeline. Processor types and their required or optional parameters vary depending on their specific use case.
See the [Ingest processors]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) section to learn more about the processor types and defining and configuring them within a pipeline. - -#### Example query and description of parameters - -```json -{ - "your_processor_type": { - "your_required_parameter": "your_value", - "your_optional_parameter": "your_optional_value" - } -} -``` - -**Parameter** | **Required** | **Description** | -|-----------|-----------|-----------| -`your_processor_type` | Required | Type of processor you want to use, such as `rename`, `set`, `append`, and so forth. Different processor types perform different actions. | -`your_required_parameter` | Required | Required parameter specific to the processor type you've chosen. It defines the main setting or action for the processor to take. | -`your_value` | Required | Replace this with the appropriate value for the chosen processor type and parameter. For example, if the processor is `rename`, then this value is the new field name you want to rename to. | -`your_optional_parameter` | Optional | Some processors have optional parameters that modify their behavior. Replace this with the optional parameter. | -`your_optional_value` | Optional | Replace this with the appropriate value for the optional parameter used. | +The following is a generic example of an ingest processor definition within a pipeline. Processor types and their required or optional parameters vary depending on your specific use case. See the [Ingest processors]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) section to learn more about the processor types and defining and configuring them within a pipeline. 
diff --git a/_api-reference/ingest-apis/pipeline-failures.md b/_api-reference/ingest-apis/pipeline-failures.md index 5a246e1139..9c0349fa11 100644 --- a/_api-reference/ingest-apis/pipeline-failures.md +++ b/_api-reference/ingest-apis/pipeline-failures.md @@ -8,7 +8,7 @@ nav_order: 15 # Handling pipeline failures -Each ingest pipeline consists of a series of processors that are applied to the data in sequence. If a processor fails, the entire pipeline will fail. You have two options for handling failures: +Each ingest pipeline consists of a series of processors that are applied to the documents in sequence. If a processor fails, the entire pipeline will fail. You have two options for handling failures: - **Fail the entire pipeline:** If a processor fails, the entire pipeline will fail and the document will not be indexed. - **Fail the current processor and continue with the next processor:** This can be useful if you want to continue processing the document even if one of the processors fails. @@ -37,21 +37,21 @@ You can specify the `on_failure` parameter to run immediately after a processor ```json PUT _ingest/pipeline/my-pipeline/ { - "description": "Add timestamp to the document", + "description": "Add timestampto the document", "processors": [ { "date": { "field": "timestamp_field", - "target_field": "timestamp", "formats": ["yyyy-MM-dd HH:mm:ss"], - "on_failure": [ - { - "set": { - "field": "ingest_error", - "value": "failed" - } - } - ] + "target_field": "@timestamp", + "on_failure": [ + { + "set": { + "field": "ingest_error", + "value": "failed" + } + } + ] } } ] @@ -130,11 +130,5 @@ The response contains statistics for all ingest pipelines, for example: } ``` -## Troubleshooting failures - -The following are tips on troubleshooting ingest pipeline failures: - -1. Check the logs: OpenSeach logs contain information about the ingest pipeline that failed, including the processor that failed and the reason for the failure. -2. 
Inspect the document: If the ingest pipeline failed, then the document that was being processed will be in its respective index. -3. Check the processor configuration: It is possible the processor configuration is incorrect. To check this you can look at the processor configuration in the JSON object. -4. Try a different processor: You can try using a different processor. Some processors are better at handling certain types of data than others. +**Troubleshooting ingest pipeline failures:** The first thing you should do is check the logs to see if there are any errors or warnings that can help you identify the cause of the failure. OpenSearch logs contain information about the ingest pipeline that failed, including the processor that failed and the reason for the failure. +{: .tip} diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index b808cc7b20..c7c2d2d0b3 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -13,7 +13,7 @@ The `append` processor is used to add values to a field: - If the field is a scalar field, the `append` processor converts it to an array and appends the specified values to that array. - If the field does not exist, the `append` processor creates an array with the specified values. -The syntax for the `append` processor is: +The following is the syntax for the `append` processor: ```json { @@ -43,7 +43,7 @@ The following table lists the required and optional parameters for the `append` Follow these steps to use the processor in a pipeline. -**Step 1: Create pipeline.** +**Step 1: Create a pipeline.** The following query creates a pipeline, named `user-behavior`, that has one append processor.
It appends the `event_type` of each new document ingested into OpenSearch to an array field `event_types`: @@ -63,9 +63,9 @@ PUT _ingest/pipeline/user-behavior ``` {% include copy-curl.html %} -**Step 2: Ingest a document into the index.** +**Step 2: Ingest a document into an index.** -The following query ingests a document into the index named `testindex1`: +The following query ingests a document into an index named `testindex1`: ```json PUT testindex1/_doc/1?pipeline=user-behavior @@ -75,9 +75,9 @@ PUT testindex1/_doc/1?pipeline=user-behavior ``` {% include copy-curl.html %} -**Step 3: View the ingested document.** +**Step 3: View an ingested document.** -To view the ingested document, run the following query: +To view an ingested document, run the following query: ```json GET testindex1/_doc/1 @@ -105,6 +105,9 @@ Because there was no `event_types` field in the document, an array field is crea **Step 4: Test the pipeline.** +It is recommended that you test a pipeline before you ingest documents. +{: .tip} + To test the pipeline, run the following query: ```json diff --git a/_api-reference/ingest-apis/processors/bytes.md b/_api-reference/ingest-apis/processors/bytes.md index c845dededf..6381409910 100644 --- a/_api-reference/ingest-apis/processors/bytes.md +++ b/_api-reference/ingest-apis/processors/bytes.md @@ -10,7 +10,7 @@ nav_order: 20 The `bytes` processor converts a human-readable byte value to its equivalent value in bytes. The field can be a scalar or an array. If the field is a scalar, the value is converted and stored in the field. If the field is an array, all values of the array are converted. -The syntax for the `bytes` processor is: +The following is the syntax for the `bytes` processor: ```json { @@ -40,7 +40,7 @@ The following table lists the required and optional parameters for the `bytes` p Follow these steps to use the processor in a pipeline. 
-**Step 1: Create pipeline.** +**Step 1: Create a pipeline.** The following query creates a pipeline, named `file_upload`, that has one bytes processor. It converts the `file_size` to its byte equivalent and stores it in a new field `file_size_bytes`: @@ -60,9 +60,9 @@ PUT _ingest/pipeline/file_upload ``` {% include copy-curl.html %} -**Step 2: Ingest a document into the index.** +**Step 2: Ingest a document into an index.** -The following query ingests a document into the index named `testindex1`: +The following query ingests a document into an index named `testindex1`: ```json PUT testindex1/_doc/1?pipeline=file_upload @@ -72,9 +72,9 @@ PUT testindex1/_doc/1?pipeline=file_upload ``` {% include copy-curl.html %} -**Step 3: View the ingested document.** +**Step 3: View an ingested document.** -To view the ingested document, run the following query: +To view an ingested document, run the following query: ```json GET testindex1/_doc/1 @@ -83,6 +83,9 @@ GET testindex1/_doc/1 **Step 4: Test the pipeline.** +It is recommended that you test a pipeline before you ingest documents. +{: .tip} + To test the pipeline, run the following query: ```json diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md index e3cef3de42..71b7387774 100644 --- a/_api-reference/ingest-apis/processors/convert.md +++ b/_api-reference/ingest-apis/processors/convert.md @@ -8,7 +8,7 @@ nav_order: 30 # Convert -The `convert` processor converts a field in a document to a different type, for example, a string to an integer or an integer to a string. For an array field, all values in the array are converted. The syntax for the `convert` processor is: +The `convert` processor converts a field in a document to a different type, for example, a string to an integer or an integer to a string. For an array field, all values in the array are converted. 
The following is the syntax for the `convert` processor: ```json { @@ -40,7 +40,7 @@ The following table lists the required and optional parameters for the `convert` Follow these steps to use the processor in a pipeline. -**Step 1: Create pipeline.** +**Step 1: Create a pipeline.** The following query creates a pipeline, named `convert-price`, that converts `price` to a floating-point number and stores the converted value in the `price_float` field and sets the value to `0` if it is less than `0`: @@ -68,9 +68,9 @@ PUT _ingest/pipeline/convert-price ``` {% include copy-curl.html %} -**Step 2: Ingest a document into the index.** +**Step 2: Ingest a document into an index.** -The following query ingests a document into the index named `testindex1`: +The following query ingests a document into an index named `testindex1`: ```json PUT testindex1/_doc/1?pipeline=convert-price @@ -80,9 +80,9 @@ PUT testindex1/_doc/1?pipeline=convert-price ``` {% include copy-curl.html %} -**Step 3: View the ingested document.** +**Step 3: View an ingested document.** -To view the ingested document, run the following query: +To view an ingested document, run the following query: ```json GET testindex1/_doc/1 @@ -91,6 +91,9 @@ GET testindex1/_doc/1 **Step 4: Test the pipeline.** +It is recommended that you test a pipeline before you ingest documents. +{: .tip} + To test the pipeline, run the following query: ```json diff --git a/_api-reference/ingest-apis/processors/csv.md b/_api-reference/ingest-apis/processors/csv.md index 76e741980c..c3f58e3cf2 100644 --- a/_api-reference/ingest-apis/processors/csv.md +++ b/_api-reference/ingest-apis/processors/csv.md @@ -8,7 +8,7 @@ nav_order: 40 # CSV -The `csv` processor is used to parse comma-separated values (CSV) and store them as individual fields in a document. The processor ignores empty fields. 
The syntax for the `csv` processor is: +The `csv` processor is used to parse comma-separated values (CSV) and store them as individual fields in a document. The processor ignores empty fields. The following is the syntax for the `csv` processor: ```json { @@ -44,7 +44,7 @@ The following table lists the required and optional parameters for the `csv` pro Follow these steps to use the processor in a pipeline. -**Step 1: Create pipeline.** +**Step 1: Create a pipeline.** The following query creates a pipeline, named `csv-processor`, that splits `resource_usage` into three new fields named `cpu_usage`, `memory_usage`, and `disk_usage`: @@ -65,9 +65,9 @@ PUT _ingest/pipeline/csv-processor ``` {% include copy-curl.html %} -**Step 2: Ingest a document into the index.** +**Step 2: Ingest a document into an index.** -The following query ingests a document into the index named `testindex1`: +The following query ingests a document into an index named `testindex1`: ```json PUT testindex1/_doc/1?pipeline=csv-processor @@ -77,9 +77,9 @@ PUT testindex1/_doc/1?pipeline=csv-processor ``` {% include copy-curl.html %} -**Step 3: View the ingested document.** +**Step 3: View an ingested document.** -To view the ingested document, run the following query: +To view an ingested document, run the following query: ```json GET testindex1/_doc/1 @@ -88,6 +88,9 @@ GET testindex1/_doc/1 **Step 4: Test the pipeline.** +It is recommended that you test a pipeline before you ingest documents. +{: .tip} + To test the pipeline, run the following query: ```json diff --git a/_api-reference/ingest-apis/processors/date.md b/_api-reference/ingest-apis/processors/date.md index cca5cca253..2ab7047c8c 100644 --- a/_api-reference/ingest-apis/processors/date.md +++ b/_api-reference/ingest-apis/processors/date.md @@ -8,7 +8,7 @@ nav_order: 50 # Date -The `date` processor is used to parse dates from fields in a document and store them as a timestamp. 
The syntax for the `date` processor is: +The `date` processor is used to parse dates from fields in a document. By default, the date processor adds the parsed date as a new field called `@timestamp`. You can specify a different field by setting the `target_field` configuration parameter. Multiple date formats are supported as part of the same date processor definition. They will be used sequentially to attempt parsing the date field, in the same order they were defined as part of the processor definition. The following is the syntax for the `date` processor: ```json { @@ -42,7 +42,7 @@ The following table lists the required and optional parameters for the `date` pr Follow these steps to use the processor in a pipeline. -**Step 1: Create pipeline.** +**Step 1: Create a pipeline.** The following query creates a pipeline, named `date-output-format`, that uses the `date` processor to convert from European date format to US date format, adding the new field `date_us` with the desired `output_format`: @@ -65,9 +65,9 @@ PUT /_ingest/pipeline/date-output-format ``` {% include copy-curl.html %} -**Step 2: Ingest a document into the index.** +**Step 2: Ingest a document into an index.** -The following query ingests a document into the index named `testindex1`: +The following query ingests a document into an index named `testindex1`: ```json PUT testindex1/_doc/1?pipeline=date-output-format @@ -77,9 +77,9 @@ PUT testindex1/_doc/1?pipeline=date-output-format ``` {% include copy-curl.html %} -**Step 3: View the ingested document.** +**Step 3: View an ingested document.** -To view the ingested document, run the following query: +To view an ingested document, run the following query: ```json GET testindex1/_doc/1 @@ -88,6 +88,9 @@ GET testindex1/_doc/1 **Step 4: Test the pipeline.** +It is recommended that you test a pipeline before you ingest documents. 
+{: .tip} + To test the pipeline, run the following query: ```json diff --git a/_api-reference/ingest-apis/processors/lowercase.md b/_api-reference/ingest-apis/processors/lowercase.md index 3548abfef9..9bfb331f2c 100644 --- a/_api-reference/ingest-apis/processors/lowercase.md +++ b/_api-reference/ingest-apis/processors/lowercase.md @@ -8,7 +8,7 @@ nav_order: 210 # Lowercase -This processor converts all the text in a specific field to lowercase letters. The syntax for the `lowercase` processor is: +This processor converts all the text in a specific field to lowercase letters. The following is the syntax for the `lowercase` processor: ```json { @@ -39,7 +39,7 @@ The following table lists the required and optional parameters for the `lowercas Follow these steps to use the processor in a pipeline. -**Step 1: Create pipeline.** +**Step 1: Create a pipeline.** The following query creates a pipeline, named `lowercase-title`, that uses the `lowercase` processor to lowercase the `title` field of a document: @@ -58,9 +58,9 @@ PUT _ingest/pipeline/lowercase-title ``` {% include copy-curl.html %} -**Step 2: Ingest a document into the index.** +**Step 2: Ingest a document into an index.** -The following query ingests a document into the index named `testindex1`: +The following query ingests a document into an index named `testindex1`: ```json PUT testindex1/_doc/1?pipeline=lowercase-title @@ -70,9 +70,9 @@ PUT testindex1/_doc/1?pipeline=lowercase-title ``` {% include copy-curl.html %} -**Step 3: View the ingested document.** +**Step 3: View an ingested document.** -To view the ingested document, run the following query: +To view an ingested document, run the following query: ```json GET testindex1/_doc/1 @@ -81,6 +81,9 @@ GET testindex1/_doc/1 **Step 4: Test the pipeline.** +It is recommended that you test a pipeline before you ingest documents. 
+{: .tip} + To test the pipeline, run the following query: ```json diff --git a/_api-reference/ingest-apis/processors/remove.md b/_api-reference/ingest-apis/processors/remove.md index ecb156ee80..9802b2f0c4 100644 --- a/_api-reference/ingest-apis/processors/remove.md +++ b/_api-reference/ingest-apis/processors/remove.md @@ -8,7 +8,7 @@ nav_order: 230 # Remove -The remove processor is used to remove a field from a document. The syntax for the `remove` processor is: +The remove processor is used to remove a field from a document. The following is the syntax for the `remove` processor: ```json { @@ -36,7 +36,7 @@ The following table lists the required and optional parameters for the `remove` Follow these steps to use the processor in a pipeline. -**Step 1: Create pipeline.** +**Step 1: Create a pipeline.** The following query creates a pipeline, named `remove_ip`, that removes the `ip_address` field from a document: @@ -55,21 +55,22 @@ PUT /_ingest/pipeline/remove_ip ``` {% include copy-curl.html %} -**Step 2: Ingest a document into the index.** +**Step 2: Ingest a document into an index.** -The following query ingests a document into the index named `testindex1`: +The following query ingests a document into an index named `testindex1`: ```json -PUT testindex1/_doc/1?pipeline=remove_ip +PPUT testindex1/_doc/1?pipeline=remove_ip { - "ip_address": "203.0.113.1" + "ip_address": "203.0.113.1", + "name": "John Doe" } ``` {% include copy-curl.html %} -**Step 3: View the ingested document.** +**Step 3: View an ingested document.** -To view the ingested document, run the following query: +To view an ingested document, run the following query: ```json GET testindex1/_doc/1 @@ -78,6 +79,9 @@ GET testindex1/_doc/1 **Step 4: Test the pipeline.** +It is recommended that you test a pipeline before you ingest documents. 
+{: .tip} + To test the pipeline, run the following query: ```json @@ -88,7 +92,8 @@ POST _ingest/pipeline/remove_ip/_simulate "_index": "testindex1", "_id": "1", "_source":{ - "ip_address": "203.0.113.1" + "ip_address": "203.0.113.1", + "name": "John Doe" } } ] @@ -105,9 +110,11 @@ You'll receive the following response, which confirms that the pipeline is worki "doc": { "_index": "testindex1", "_id": "1", - "_source": {}, + "_source": { + "name": "John Doe" + }, "_ingest": { - "timestamp": "2023-08-22T17:58:33.970510012Z" + "timestamp": "2023-08-24T18:02:13.218986756Z" } } } diff --git a/_api-reference/ingest-apis/processors/uppercase.md b/_api-reference/ingest-apis/processors/uppercase.md index ddf9bc167d..37eb15cdab 100644 --- a/_api-reference/ingest-apis/processors/uppercase.md +++ b/_api-reference/ingest-apis/processors/uppercase.md @@ -8,7 +8,7 @@ nav_order: 310 # Uppercase -This processor converts all the text in a specific field to uppercase letters. The syntax for the `uppercase` processor is: +This processor converts all the text in a specific field to uppercase letters. The following is the syntax for the `uppercase` processor: ```json { @@ -38,7 +38,7 @@ The following table lists the required and optional parameters for the `uppercas Follow these steps to use the processor in a pipeline. 
-**Step 1: Create pipeline.** +**Step 1: Create a pipeline.** The following query creates a pipeline, named `uppercase`, that converts the text in the `field` field to uppercase: @@ -56,9 +56,9 @@ PUT _ingest/pipeline/uppercase ``` {% include copy-curl.html %} -**Step 2: Ingest a document into the index.** +**Step 2: Ingest a document into an index.** -The following query ingests a document into the index named `testindex1`: +The following query ingests a document into an index named `testindex1`: ```json PUT testindex1/_doc/1?pipeline=uppercase @@ -68,9 +68,9 @@ PUT testindex1/_doc/1?pipeline=uppercase ``` {% include copy-curl.html %} -**Step 3: View the ingested document.** +**Step 3: View an ingested document.** -To view the ingested document, run the following query: +To view an ingested document, run the following query: ```json GET testindex1/_doc/1 @@ -79,6 +79,9 @@ GET testindex1/_doc/1 **Step 4: Test the pipeline.** +It is recommended that you test a pipeline before you ingest documents. +{: .tip} + To test the pipeline, run the following query: ```json From 7e096db553045870c7d1a4401acead9f78129337 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 24 Aug 2023 13:23:42 -0600 Subject: [PATCH 188/286] Copy edits Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/simulate-ingest.md | 41 ++++++++++++++++++- 1 file changed, 39 insertions(+), 2 deletions(-) diff --git a/_api-reference/ingest-apis/simulate-ingest.md b/_api-reference/ingest-apis/simulate-ingest.md index 9eac5eceb8..fe13480bb0 100644 --- a/_api-reference/ingest-apis/simulate-ingest.md +++ b/_api-reference/ingest-apis/simulate-ingest.md @@ -14,7 +14,7 @@ Use the simulate ingest pipeline API operation to run or test the pipeline. ## Path and HTTP methods -The following requests simulate the latest ingest pipeline created. 
+The following requests **simulate the latest ingest pipeline created**: ``` GET _ingest/pipeline/_simulate @@ -22,7 +22,7 @@ POST _ingest/pipeline/_simulate ``` {% include copy-curl.html %} -The following requests simulate a single pipeline based on the pipeline ID. +The following requests **simulate a single pipeline based on the pipeline ID**: ``` GET _ingest/pipeline//_simulate @@ -234,3 +234,40 @@ POST /_ingest/pipeline/_simulate } ``` {% include copy-curl.html %} + +The request returns the following response: + +```json +{ + "docs": [ + { + "doc": { + "_index": "second-index", + "_id": "1", + "_source": { + "name": "Doe,John", + "last_name": "DOE", + "first_name": "John" + }, + "_ingest": { + "timestamp": "2023-08-24T19:20:44.816219673Z" + } + } + }, + { + "doc": { + "_index": "second-index", + "_id": "2", + "_source": { + "name": "Doe, Jane", + "last_name": "DOE", + "first_name": "Jane" + }, + "_ingest": { + "timestamp": "2023-08-24T19:20:44.816492381Z" + } + } + } + ] +} +``` From 55357a8a99b7ffae0ec35ef28872f50cba4e882c Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 24 Aug 2023 13:26:51 -0600 Subject: [PATCH 189/286] Copy edits Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-pipelines.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index fbb6527a76..6982bbb156 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -42,8 +42,6 @@ Field | Required | Type | Description `processors` | Required | Array of processor objects | A component that performs a specific task to process data as it's being ingested into OpenSearch. `description` | Optional | String | Description of the ingest pipeline. 
-### Example: Specify a pipeline in the request body - ## Next steps Learn how to: From 14294a371cf6984477f477f8946c282f7434f976 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 24 Aug 2023 13:46:35 -0600 Subject: [PATCH 190/286] Copy edits Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/date.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/date.md b/_api-reference/ingest-apis/processors/date.md index 2ab7047c8c..f0334a5ef7 100644 --- a/_api-reference/ingest-apis/processors/date.md +++ b/_api-reference/ingest-apis/processors/date.md @@ -8,7 +8,7 @@ nav_order: 50 # Date -The `date` processor is used to parse dates from fields in a document. By default, the date processor adds the parsed date as a new field called `@timestamp`. You can specify a different field by setting the `target_field` configuration parameter. Multiple date formats are supported as part of the same date processor definition. They will be used sequentially to attempt parsing the date field, in the same order they were defined as part of the processor definition. The following is the syntax for the `date` processor: +The `date` processor is used to parse dates from fields in a document and add the parsed data to a new field. By default, the parsed data is stored in the `@timestamp` field. 
The following is the syntax for the `date` processor: ```json { From c879c9d3faf7052f05f1b7c13ed76c976cc3d54c Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 24 Aug 2023 16:56:32 -0600 Subject: [PATCH 191/286] Copy edits Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/create-ingest.md | 38 ++++++++------ _api-reference/ingest-apis/get-ingest.md | 2 +- .../ingest-apis/processors/uppercase.md | 49 ++++++++++--------- _api-reference/ingest-apis/simulate-ingest.md | 2 +- 4 files changed, 49 insertions(+), 42 deletions(-) diff --git a/_api-reference/ingest-apis/create-ingest.md b/_api-reference/ingest-apis/create-ingest.md index cbd48a6511..2e1b40ca1e 100644 --- a/_api-reference/ingest-apis/create-ingest.md +++ b/_api-reference/ingest-apis/create-ingest.md @@ -80,34 +80,40 @@ Some processor parameters support [Mustache](https://mustache.github.io/) templa #### Example: `set` ingest processor using Mustache template snippet +The following example sets the field `{% raw %}{{{role}}}{% endraw %}` to the value `{% raw %}{{{tenure}}}{% endraw %}`.
+ ```json PUT _ingest/pipeline/my-pipeline { - "processors": [ + "processors": [ { "set": { - "description": "Sets the grad_year field to 2023 value", - "field": "{% raw %}{{{grad_year}}}{% endraw %}", - "value": "{% raw %}{{{2023}}}{% endraw %}" + "field": "{% raw %}{{{role}}}{% endraw %}", + "value": "{% raw %}{{{tenure}}}{% endraw %}" } - }, - { - "set": { - "description": "Sets graduated to true", - "field": "{% raw %}{{{graduated}}}{% endraw %}", - "value": "{% raw %}{{{true}}}{% endraw %}" - } - }, - { - "uppercase": { - "field": "name" - } } ] } ``` {% include copy-curl.html %} +Ingest a document by running the following query: + +```json +PUT testindex1/_doc/1?pipeline=my-pipeline +{ + "role" : "teacher", + "tenure": 10 +} +``` + +View the ingested document by running the following query: + +```json +GET testindex1/_doc/1 +``` +{% include copy-curl.html %} + ## Next steps - [Retrieve information about a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/get-ingest/) diff --git a/_api-reference/ingest-apis/get-ingest.md b/_api-reference/ingest-apis/get-ingest.md index dfc1c0fd51..cd88e8be15 100644 --- a/_api-reference/ingest-apis/get-ingest.md +++ b/_api-reference/ingest-apis/get-ingest.md @@ -3,7 +3,7 @@ layout: default title: Get pipeline parent: Ingest pipelines grand_parent: Ingest APIs -nav_order: 11 +nav_order: 12 redirect_from: - /opensearch/rest-api/ingest-apis/get-ingest/ --- diff --git a/_api-reference/ingest-apis/processors/uppercase.md b/_api-reference/ingest-apis/processors/uppercase.md index 37eb15cdab..7848cbe24f 100644 --- a/_api-reference/ingest-apis/processors/uppercase.md +++ b/_api-reference/ingest-apis/processors/uppercase.md @@ -56,28 +56,8 @@ PUT _ingest/pipeline/uppercase ``` {% include copy-curl.html %} -**Step 2: Ingest a document into an index.** -The following query ingests a document into an index named `testindex1`: - -```json -PUT testindex1/_doc/1?pipeline=uppercase -{ - "name": "John" -} -``` -{% include 
copy-curl.html %} - -**Step 3: View an ingested document.** - -To view an ingested document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} - -**Step 4: Test the pipeline.** +**Step 2: Test the pipeline.** It is recommended that you test a pipeline before you ingest documents. {: .tip} @@ -92,7 +72,7 @@ POST _ingest/pipeline/uppercase/_simulate "_index": "testindex1", "_id": "1", "_source": { - "name": "JOHN" + "name": "{}" } } ] @@ -110,13 +90,34 @@ You'll receive the following response, which confirms that the pipeline is worki "_index": "testindex1", "_id": "1", "_source": { - "name": "JOHN" + "name": "{}" }, "_ingest": { - "timestamp": "2023-08-22T18:40:40.870808043Z" + "timestamp": "2023-08-24T21:24:48.598293591Z" } } } ] } ``` + +**Step 3: Ingest a document into an index.** + +The following query ingests a document into an index named `testindex1`: + +```json +PUT testindex1/_doc/1?pipeline=uppercase +{ + "name": "John" +} +``` +{% include copy-curl.html %} + +**Step 4: View an ingested document.** + +To view an ingested document, run the following query: + +```json +GET testindex1/_doc/1 +``` +{% include copy-curl.html %} diff --git a/_api-reference/ingest-apis/simulate-ingest.md b/_api-reference/ingest-apis/simulate-ingest.md index fe13480bb0..9c2936c153 100644 --- a/_api-reference/ingest-apis/simulate-ingest.md +++ b/_api-reference/ingest-apis/simulate-ingest.md @@ -3,7 +3,7 @@ layout: default title: Simulate pipeline parent: Ingest pipelines grand_parent: Ingest APIs -nav_order: 12 +nav_order: 11 redirect_from: - /opensearch/rest-api/ingest-apis/simulate-ingest/ --- From 9b467ccd5cb1b9b0a6de848aa8f2049b437d2aa4 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 25 Aug 2023 08:13:49 -0600 Subject: [PATCH 192/286] Revise order of steps Signed-off-by: Melissa Vagi --- .../ingest-apis/processors/append.md | 56 ++++++++++--------- 1 file changed, 29 insertions(+), 27 deletions(-) diff --git 
a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index c7c2d2d0b3..d771e2b085 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -63,7 +63,32 @@ PUT _ingest/pipeline/user-behavior ``` {% include copy-curl.html %} -**Step 2: Ingest a document into an index.** +**Step 2: Test pipeline.** + +It is recommended that you test a pipeline before you ingest documents. +{: .tip} + +To test the pipeline, run the following query: + +```json +POST _ingest/pipeline/user-behavior/_simulate +{ + "docs": [ + { + "_index": "testindex1", + "_id": "1", + "_source": { + "event_type": "page_view", + "event_types": + "event_type" + } + } + ] +} +``` +{% include copy-curl.html %} + +**Step 3: Ingest a document into an index.** The following query ingests a document into an index named `testindex1`: @@ -75,7 +100,7 @@ PUT testindex1/_doc/1?pipeline=user-behavior ``` {% include copy-curl.html %} -**Step 3: View an ingested document.** +**Step 4: View an ingested document.** To view an ingested document, run the following query: @@ -103,32 +128,9 @@ Because there was no `event_types` field in the document, an array field is crea } ``` -**Step 4: Test the pipeline.** - -It is recommended that you test a pipeline before you ingest documents. 
-{: .tip} - -To test the pipeline, run the following query: - -```json -POST _ingest/pipeline/user-behavior/_simulate -{ - "docs": [ - { - "_index": "testindex1", - "_id": "1", - "_source": { - "event_type": "page_view", - "event_types": - "event_type" - } - } - ] -} -``` -{% include copy-curl.html %} +### Response -You'll receive the following response, which confirms that the pipeline is working correctly and producing the expected output: +The following response confirms that the pipeline is working correctly and producing the expected output: ```json { From 12347ec96cf1a2761fbcf305326d4b2ddd5e8607 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 25 Aug 2023 09:23:32 -0600 Subject: [PATCH 193/286] Update processor steps Signed-off-by: Melissa Vagi --- .../ingest-apis/processors/append.md | 14 +++--- .../ingest-apis/processors/bytes.md | 48 +++++++++--------- .../ingest-apis/processors/convert.md | 48 +++++++++--------- _api-reference/ingest-apis/processors/csv.md | 48 +++++++++--------- _api-reference/ingest-apis/processors/date.md | 48 +++++++++--------- .../ingest-apis/processors/lowercase.md | 49 +++++++++--------- .../ingest-apis/processors/remove.md | 50 ++++++++++--------- .../ingest-apis/processors/uppercase.md | 49 +++++++++--------- 8 files changed, 181 insertions(+), 173 deletions(-) diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index d771e2b085..a3662d2639 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -63,7 +63,7 @@ PUT _ingest/pipeline/user-behavior ``` {% include copy-curl.html %} -**Step 2: Test pipeline.** +**Step 2 (Optional): Test the pipeline.** It is recommended that you test a pipeline before you ingest documents. 
{: .tip} @@ -88,7 +88,7 @@ POST _ingest/pipeline/user-behavior/_simulate ``` {% include copy-curl.html %} -**Step 3: Ingest a document into an index.** +**Step 3: Ingest a document.** The following query ingests a document into an index named `testindex1`: @@ -100,16 +100,16 @@ PUT testindex1/_doc/1?pipeline=user-behavior ``` {% include copy-curl.html %} -**Step 4: View an ingested document.** +**Step 4 (Optional): Retrieve the document.** -To view an ingested document, run the following query: +To retrieve the document, run the following query: ```json GET testindex1/_doc/1 ``` {% include copy-curl.html %} -Because there was no `event_types` field in the document, an array field is created and the event is appended to the array: +Because no `event_types` field is in the document, an array field is created and the event is appended to the array: ```json { @@ -128,9 +128,9 @@ Because there was no `event_types` field in the document, an array field is crea } ``` -### Response +#### Response -The following response confirms that the pipeline is working correctly and producing the expected output: +The following example response confirms the pipeline is working correctly and producing the expected output: ```json { diff --git a/_api-reference/ingest-apis/processors/bytes.md b/_api-reference/ingest-apis/processors/bytes.md index 6381409910..480aa6e50d 100644 --- a/_api-reference/ingest-apis/processors/bytes.md +++ b/_api-reference/ingest-apis/processors/bytes.md @@ -60,28 +60,7 @@ PUT _ingest/pipeline/file_upload ``` {% include copy-curl.html %} -**Step 2: Ingest a document into an index.** - -The following query ingests a document into an index named `testindex1`: - -```json -PUT testindex1/_doc/1?pipeline=file_upload -{ - "file_size": "10MB" -} -``` -{% include copy-curl.html %} - -**Step 3: View an ingested document.** - -To view an ingested document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} - -**Step 4: Test the 
pipeline.** +**Step 2 (Optional): Test the pipeline.** It is recommended that you test a pipeline before you ingest documents. {: .tip} @@ -106,7 +85,30 @@ POST _ingest/pipeline/user-behavior/_simulate ``` {% include copy-curl.html %} -You'll receive the following response, which confirms that the pipeline is working correctly and producing the expected output: +**Step 3: Ingest a document.** + +The following query ingests a document into an index named `testindex1`: + +```json +PUT testindex1/_doc/1?pipeline=file_upload +{ + "file_size": "10MB" +} +``` +{% include copy-curl.html %} + +**Step 4 (Optional): Retrieve the document.** + +To retrieve the document, run the following query: + +```json +GET testindex1/_doc/1 +``` +{% include copy-curl.html %} + +#### Response + +The following example response confirms the pipeline is working correctly and producing the expected output: ```json { diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md index 71b7387774..676555f8b1 100644 --- a/_api-reference/ingest-apis/processors/convert.md +++ b/_api-reference/ingest-apis/processors/convert.md @@ -68,28 +68,7 @@ PUT _ingest/pipeline/convert-price ``` {% include copy-curl.html %} -**Step 2: Ingest a document into an index.** - -The following query ingests a document into an index named `testindex1`: - -```json -PUT testindex1/_doc/1?pipeline=convert-price -{ - "price": "10.5" -} -``` -{% include copy-curl.html %} - -**Step 3: View an ingested document.** - -To view an ingested document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} - -**Step 4: Test the pipeline.** +**Step 2 (Optional): Test the pipeline.** It is recommended that you test a pipeline before you ingest documents. 
{: .tip} @@ -112,7 +91,30 @@ POST _ingest/pipeline/convert-price/_simulate ``` {% include copy-curl.html %} -You'll receive the following response, which confirms that the pipeline is working correctly and producing the expected output: +**Step 3: Ingest a document.** + +The following query ingests a document into an index named `testindex1`: + +```json +PUT testindex1/_doc/1?pipeline=convert-price +{ + "price": "10.5" +} +``` +{% include copy-curl.html %} + +**Step 4 (Optional): Retrieve the document.** + +To retrieve the document, run the following query: + +```json +GET testindex1/_doc/1 +``` +{% include copy-curl.html %} + +#### Response + +The following example response confirms the pipeline is working correctly and producing the expected output: ```json { diff --git a/_api-reference/ingest-apis/processors/csv.md b/_api-reference/ingest-apis/processors/csv.md index c3f58e3cf2..838fc46d56 100644 --- a/_api-reference/ingest-apis/processors/csv.md +++ b/_api-reference/ingest-apis/processors/csv.md @@ -65,28 +65,7 @@ PUT _ingest/pipeline/csv-processor ``` {% include copy-curl.html %} -**Step 2: Ingest a document into an index.** - -The following query ingests a document into an index named `testindex1`: - -```json -PUT testindex1/_doc/1?pipeline=csv-processor -{ - "resource_usage": "25,4096,10" -} -``` -{% include copy-curl.html %} - -**Step 3: View an ingested document.** - -To view an ingested document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} - -**Step 4: Test the pipeline.** +**Step 2 (Optional): Test the pipeline.** It is recommended that you test a pipeline before you ingest documents. 
{: .tip} @@ -112,7 +91,30 @@ POST _ingest/pipeline/csv-processor/_simulate ``` {% include copy-curl.html %} -You'll receive the following response, which confirms that the pipeline is working correctly and producing the expected output: +**Step 3: Ingest a document.** + +The following query ingests a document into an index named `testindex1`: + +```json +PUT testindex1/_doc/1?pipeline=csv-processor +{ + "resource_usage": "25,4096,10" +} +``` +{% include copy-curl.html %} + +**Step 4 (Optional): Retrieve the document.** + +To retrieve the document, run the following query: + +```json +GET testindex1/_doc/1 +``` +{% include copy-curl.html %} + +#### Response + +The following example response confirms the pipeline is working correctly and producing the expected output: ```json { diff --git a/_api-reference/ingest-apis/processors/date.md b/_api-reference/ingest-apis/processors/date.md index f0334a5ef7..89168d8e58 100644 --- a/_api-reference/ingest-apis/processors/date.md +++ b/_api-reference/ingest-apis/processors/date.md @@ -65,28 +65,7 @@ PUT /_ingest/pipeline/date-output-format ``` {% include copy-curl.html %} -**Step 2: Ingest a document into an index.** - -The following query ingests a document into an index named `testindex1`: - -```json -PUT testindex1/_doc/1?pipeline=date-output-format -{ - "date_european": "30/06/2023" -} -``` -{% include copy-curl.html %} - -**Step 3: View an ingested document.** - -To view an ingested document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} - -**Step 4: Test the pipeline.** +**Step 2 (Optional): Test the pipeline.** It is recommended that you test a pipeline before you ingest documents. 
{: .tip} @@ -110,7 +89,30 @@ POST _ingest/pipeline/date-output-format/_simulate ``` {% include copy-curl.html %} -You'll receive the following response, which confirms that the pipeline is working correctly and producing the expected output: +**Step 3: Ingest a document.** + +The following query ingests a document into an index named `testindex1`: + +```json +PUT testindex1/_doc/1?pipeline=date-output-format +{ + "date_european": "30/06/2023" +} +``` +{% include copy-curl.html %} + +**Step 4 (Optional): Retrieve the document.** + +To retrieve the document, run the following query: + +```json +GET testindex1/_doc/1 +``` +{% include copy-curl.html %} + +#### Response + +The following example response confirms the pipeline is working correctly and producing the expected output: ```json { diff --git a/_api-reference/ingest-apis/processors/lowercase.md b/_api-reference/ingest-apis/processors/lowercase.md index 9bfb331f2c..1f88ae17a0 100644 --- a/_api-reference/ingest-apis/processors/lowercase.md +++ b/_api-reference/ingest-apis/processors/lowercase.md @@ -34,7 +34,6 @@ The following table lists the required and optional parameters for the `lowercas `tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | `target_field` | Optional | Name of the field to store the parsed data in. Default is `field`. By default, `field` is updated in-place. | - ## Using the processor Follow these steps to use the processor in a pipeline. 
@@ -58,28 +57,7 @@ PUT _ingest/pipeline/lowercase-title ``` {% include copy-curl.html %} -**Step 2: Ingest a document into an index.** - -The following query ingests a document into an index named `testindex1`: - -```json -PUT testindex1/_doc/1?pipeline=lowercase-title -{ - "title": "WAR AND PEACE" -} -``` -{% include copy-curl.html %} - -**Step 3: View an ingested document.** - -To view an ingested document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} - -**Step 4: Test the pipeline.** +**Step 2 (Optional): Test the pipeline.** It is recommended that you test a pipeline before you ingest documents. {: .tip} @@ -102,7 +80,30 @@ POST _ingest/pipeline/lowercase-title/_simulate ``` {% include copy-curl.html %} -You'll receive the following response, which confirms that the pipeline is working correctly and producing the expected output: +**Step 3: Ingest a document.** + +The following query ingests a document into an index named `testindex1`: + +```json +PUT testindex1/_doc/1?pipeline=lowercase-title +{ + "title": "WAR AND PEACE" +} +``` +{% include copy-curl.html %} + +**Step 4 (Optional): Retrieve the document.** + +To retrieve the document, run the following query: + +```json +GET testindex1/_doc/1 +``` +{% include copy-curl.html %} + +#### Response + +The following example response confirms the pipeline is working correctly and producing the expected output: ```json { diff --git a/_api-reference/ingest-apis/processors/remove.md b/_api-reference/ingest-apis/processors/remove.md index 9802b2f0c4..e07bf9cbfa 100644 --- a/_api-reference/ingest-apis/processors/remove.md +++ b/_api-reference/ingest-apis/processors/remove.md @@ -55,29 +55,7 @@ PUT /_ingest/pipeline/remove_ip ``` {% include copy-curl.html %} -**Step 2: Ingest a document into an index.** - -The following query ingests a document into an index named `testindex1`: - -```json -PPUT testindex1/_doc/1?pipeline=remove_ip -{ - "ip_address": "203.0.113.1", - 
"name": "John Doe" -} -``` -{% include copy-curl.html %} - -**Step 3: View an ingested document.** - -To view an ingested document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} - -**Step 4: Test the pipeline.** +**Step 2 (Optional): Test the pipeline.** It is recommended that you test a pipeline before you ingest documents. {: .tip} @@ -101,7 +79,31 @@ POST _ingest/pipeline/remove_ip/_simulate ``` {% include copy-curl.html %} -You'll receive the following response, which confirms that the pipeline is working correctly and producing the expected output: +**Step 3: Ingest a document.** + +The following query ingests a document into an index named `testindex1`: + +```json +PUT testindex1/_doc/1?pipeline=remove_ip +{ + "ip_address": "203.0.113.1", + "name": "John Doe" +} +``` +{% include copy-curl.html %} + +**Step 4 (Optional): Retrieve the document.** + +To retrieve the document, run the following query: + +```json +GET testindex1/_doc/1 +``` +{% include copy-curl.html %} + +#### Response + +The following example response confirms the pipeline is working correctly and producing the expected output: ```json { diff --git a/_api-reference/ingest-apis/processors/uppercase.md b/_api-reference/ingest-apis/processors/uppercase.md index 7848cbe24f..9c82faa462 100644 --- a/_api-reference/ingest-apis/processors/uppercase.md +++ b/_api-reference/ingest-apis/processors/uppercase.md @@ -57,7 +57,7 @@ PUT _ingest/pipeline/uppercase {% include copy-curl.html %} -**Step 2: Test the pipeline.** +**Step 2 (Optional): Test the pipeline.** It is recommended that you test a pipeline before you ingest documents. 
{: .tip} @@ -72,7 +72,7 @@ POST _ingest/pipeline/uppercase/_simulate "_index": "testindex1", "_id": "1", "_source": { - "name": "{}" + "name": "John" } } ] @@ -80,28 +80,7 @@ POST _ingest/pipeline/uppercase/_simulate ``` {% include copy-curl.html %} -You'll receive the following response, which confirms that the pipeline is working correctly and producing the expected output: - -```json -{ - "docs": [ - { - "doc": { - "_index": "testindex1", - "_id": "1", - "_source": { - "name": "{}" - }, - "_ingest": { - "timestamp": "2023-08-24T21:24:48.598293591Z" - } - } - } - ] -} -``` - -**Step 3: Ingest a document into an index.** +**Step 3: Ingest a document.** The following query ingests a document into an index named `testindex1`: @@ -113,11 +92,29 @@ PUT testindex1/_doc/1?pipeline=uppercase ``` {% include copy-curl.html %} -**Step 4: View an ingested document.** +**Step 4 (Optional): Retrieve the document.** -To view an ingested document, run the following query: +To retrieve the document, run the following query: ```json GET testindex1/_doc/1 ``` {% include copy-curl.html %} + +#### Response + +The following example response confirms the pipeline is working correctly and producing the expected output: + +```json +{ + "_index": "testindex1", + "_id": "1", + "_version": 44, + "_seq_no": 43, + "_primary_term": 3, + "found": true, + "_source": { + "name": "JOHN" + } +} +``` \ No newline at end of file From a8f4cebbacde54eaad99fd148820f517f89eca5c Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 25 Aug 2023 09:34:58 -0600 Subject: [PATCH 194/286] Copy edits Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/create-ingest.md | 21 --------------------- _api-reference/ingest-apis/get-ingest.md | 4 ---- 2 files changed, 25 deletions(-) diff --git a/_api-reference/ingest-apis/create-ingest.md b/_api-reference/ingest-apis/create-ingest.md index 2e1b40ca1e..3142a8d222 100644 --- a/_api-reference/ingest-apis/create-ingest.md +++ 
b/_api-reference/ingest-apis/create-ingest.md @@ -96,24 +96,3 @@ PUT _ingest/pipeline/my-pipeline } ``` {% include copy-curl.html %} - -Ingest a document by running the following query: - -```json -PUT testindex1/_doc/1?pipeline=my-pipeline -{ - "role" : "teacher", - "tenure": 10 -} -``` - -View the ingested document by running the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} - -## Next steps - -- [Retrieve information about a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/get-ingest/) diff --git a/_api-reference/ingest-apis/get-ingest.md b/_api-reference/ingest-apis/get-ingest.md index cd88e8be15..a56d7da584 100644 --- a/_api-reference/ingest-apis/get-ingest.md +++ b/_api-reference/ingest-apis/get-ingest.md @@ -60,7 +60,3 @@ The response contains the pipeline information: } } ``` - -## Next steps - -- [Test your pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/simulate-ingest/) \ No newline at end of file From 9d29c8662c67a250493e0c4602847e905ea5dfeb Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 09:10:46 -0600 Subject: [PATCH 195/286] Update _api-reference/ingest-apis/create-ingest.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/create-ingest.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/create-ingest.md b/_api-reference/ingest-apis/create-ingest.md index 3142a8d222..f3f7747564 100644 --- a/_api-reference/ingest-apis/create-ingest.md +++ b/_api-reference/ingest-apis/create-ingest.md @@ -14,7 +14,7 @@ Use the create pipeline API operation to create or update pipelines in OpenSearc ## Path and HTTP method -Replace `<pipeline-id>` with your pipeline ID. +Replace `<pipeline-id>` with your pipeline ID: ```json PUT _ingest/pipeline/<pipeline-id> ``` From 6fe30f57f8f48def7ac9716cc80de32b94ff8ecc Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 09:11:02 -0600 Subject: [PATCH 196/286] Update _api-reference/ingest-apis/create-ingest.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/create-ingest.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/create-ingest.md b/_api-reference/ingest-apis/create-ingest.md index f3f7747564..03d39e01a1 100644 --- a/_api-reference/ingest-apis/create-ingest.md +++ b/_api-reference/ingest-apis/create-ingest.md @@ -21,7 +21,7 @@ PUT _ingest/pipeline/<pipeline-id> ``` #### Example request -Here is an example in JSON format that creates an ingest pipeline with two `set` processors and an `uppercase` processor. The first `set` processor sets the the `grad_year` to `2023`, the second `set` processor sets `graduated` to `true`. The `uppercase` processor converts the `name` field to uppercase. +Here is an example in JSON format that creates an ingest pipeline with two `set` processors and an `uppercase` processor. The first `set` processor sets the `grad_year` to `2023`, and the second `set` processor sets `graduated` to `true`. The `uppercase` processor converts the `name` field to uppercase. 
```json PUT _ingest/pipeline/my-pipeline From a20ac44b2f998767f24856f16f9a046c3a95a436 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 09:11:15 -0600 Subject: [PATCH 197/286] Update _api-reference/ingest-apis/create-ingest.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/create-ingest.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/create-ingest.md b/_api-reference/ingest-apis/create-ingest.md index 03d39e01a1..d78c8a62e0 100644 --- a/_api-reference/ingest-apis/create-ingest.md +++ b/_api-reference/ingest-apis/create-ingest.md @@ -52,7 +52,7 @@ PUT _ingest/pipeline/my-pipeline ``` {% include copy-curl.html %} -To learn more about error handling, see [Handling pipelines failures]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipeline-failures/). +To learn more about error handling, see [Handling pipeline failures]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipeline-failures/). ## Request body fields From fc955826825bd6154bce4b93607ee924d6b1abdc Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 09:11:29 -0600 Subject: [PATCH 198/286] Update _api-reference/ingest-apis/create-ingest.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/create-ingest.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/create-ingest.md b/_api-reference/ingest-apis/create-ingest.md index d78c8a62e0..af86160223 100644 --- a/_api-reference/ingest-apis/create-ingest.md +++ b/_api-reference/ingest-apis/create-ingest.md @@ -56,7 +56,7 @@ To learn more about error handling, see [Handling pipeline failures]({{site.url} ## Request body fields -The following table lists the request body fields used to create, or update, a pipeline. +The following table lists the request body fields used to create or update a pipeline. 
Parameter | Required | Type | Description :--- | :--- | :--- | :--- From 80eb62f75cdb1c844d6bff99a8fd7ca775cd29b2 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 09:11:38 -0600 Subject: [PATCH 199/286] Update _api-reference/ingest-apis/create-ingest.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/create-ingest.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/create-ingest.md b/_api-reference/ingest-apis/create-ingest.md index af86160223..01605c2cb9 100644 --- a/_api-reference/ingest-apis/create-ingest.md +++ b/_api-reference/ingest-apis/create-ingest.md @@ -61,7 +61,7 @@ The following table lists the request body fields used to create or update a pip Parameter | Required | Type | Description :--- | :--- | :--- | :--- `processors` | Required | Array of processor objects | An array of processors, each of which transforms documents. Processors are run sequentially in the order specified. -`description` | Optional | String | Description of your ingest pipeline. +`description` | Optional | String | A description of your ingest pipeline. 
## Path parameters From 967bf9c49f959a98422ab38052e543c830be3589 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 09:12:11 -0600 Subject: [PATCH 200/286] Update _api-reference/ingest-apis/create-ingest.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/create-ingest.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/create-ingest.md b/_api-reference/ingest-apis/create-ingest.md index 01605c2cb9..2110b2bd47 100644 --- a/_api-reference/ingest-apis/create-ingest.md +++ b/_api-reference/ingest-apis/create-ingest.md @@ -80,7 +80,7 @@ Some processor parameters support [Mustache](https://mustache.github.io/) templa #### Example: `set` ingest processor using Mustache template snippet -The following example sets the field `{% raw %}{{{role}}}{% endraw %}` with a value `{% raw %}{{{tenure}}}{% endraw %}`. +The following example sets the field `{% raw %}{{{role}}}{% endraw %}` with a value `{% raw %}{{{tenure}}}{% endraw %}`: ```json PUT _ingest/pipeline/my-pipeline From aafd5d07a76c6b24513fb64b2c38d98bff999994 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 09:12:22 -0600 Subject: [PATCH 201/286] Update _api-reference/ingest-apis/delete-ingest.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/delete-ingest.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/delete-ingest.md b/_api-reference/ingest-apis/delete-ingest.md index 1049dd9cdb..59383fb0aa 100644 --- a/_api-reference/ingest-apis/delete-ingest.md +++ b/_api-reference/ingest-apis/delete-ingest.md @@ -10,7 +10,7 @@ redirect_from: # Delete pipeline -Use the following request to delete pipelines. +Use the following request to delete a pipeline. 
To delete a specific pipeline, pass the pipeline ID as a parameter: From 7f0a5ff37d41ff2b2936299c2c189ead86920859 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 09:12:45 -0600 Subject: [PATCH 202/286] Update _api-reference/ingest-apis/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/index.md b/_api-reference/ingest-apis/index.md index 00512dd22f..462c699fc2 100644 --- a/_api-reference/ingest-apis/index.md +++ b/_api-reference/ingest-apis/index.md @@ -13,7 +13,7 @@ Ingest APIs are a valuable tool for loading data into a system. Ingest APIs work ## Ingest pipeline APIs -Simplify, secure, and scale your data ingestion in OpenSearch with the following APIs: +Simplify, secure, and scale your OpenSearch data ingestion with the following APIs: - [Create pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/create-ingest/): Use this API to create or update a pipeline configuration. - [Get pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/get-ingest/): Use this API to retrieve a pipeline configuration. From 9402e7ba9f13161fdbe94f6a8d3d0cf2e00dd749 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 09:13:17 -0600 Subject: [PATCH 203/286] Update _api-reference/ingest-apis/ingest-pipelines.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-pipelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index 6982bbb156..c89dcffd3e 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -8,7 +8,7 @@ nav_order: 5 # Ingest pipelines -An _ingest pipeline_ is a sequence of _processors_ that are applied to documents as they are ingested into an index. 
Each [processor]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) in the pipeline performs a specific task, such as filtering, transforming, or enriching data. Ingest pipelines are a valuable tool to help you tailor data to your needs. +An _ingest pipeline_ is a sequence of _processors_ that are applied to documents as they are ingested into an index. Each [processor]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) in a pipeline performs a specific task, such as filtering, transforming, or enriching data. Ingest pipelines consist of _processors_. Processors are customizable tasks that run in a sequential order as they appear in the request body. This order is important, as each processor depends on the output of the previous processor. The modified documents appear in your index after the processors are applied. From 47a7040b1860a856edbe6d15d5e1f1c0abc9f24e Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 09:18:12 -0600 Subject: [PATCH 204/286] Update _api-reference/ingest-apis/ingest-pipelines.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-pipelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index c89dcffd3e..20f1fd4c57 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -10,7 +10,7 @@ nav_order: 5 An _ingest pipeline_ is a sequence of _processors_ that are applied to documents as they are ingested into an index. Each [processor]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) in a pipeline performs a specific task, such as filtering, transforming, or enriching data. -Ingest pipelines consist of _processors_. Processors are customizable tasks that run in a sequential order as they appear in the request body. 
This order is important, as each processor depends on the output of the previous processor. The modified documents appear in your index after the processors are applied. +Processors are customizable tasks that run in a sequential order as they appear in the request body. This order is important, as each processor depends on the output of the previous processor. The modified documents appear in your index after the processors are applied. Ingest pipelines in OpenSearch can only be managed using [ingest API operations]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/). {: .note} From 65ba5e0076668fb2f444aa9a2a29a26a48fd21a7 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 09:18:35 -0600 Subject: [PATCH 205/286] Update _api-reference/ingest-apis/ingest-pipelines.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-pipelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index 20f1fd4c57..d69c1191a5 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -12,7 +12,7 @@ An _ingest pipeline_ is a sequence of _processors_ that are applied to documents Processors are customizable tasks that run in a sequential order as they appear in the request body. This order is important, as each processor depends on the output of the previous processor. The modified documents appear in your index after the processors are applied. -Ingest pipelines in OpenSearch can only be managed using [ingest API operations]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/). +Ingest pipelines can only be managed using [ingest API operations]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/). 
{: .note} ## Prerequisites From f5319453f73b3fb19d96c799ed94cd247fe5f20b Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 09:19:36 -0600 Subject: [PATCH 206/286] Update _api-reference/ingest-apis/ingest-pipelines.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-pipelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index d69c1191a5..d88ce4f666 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -19,7 +19,7 @@ Ingest pipelines can only be managed using [ingest API operations]({{site.url}}{ The following are prerequisites for using OpenSearch ingest pipelines: -- When using ingest in production environments, your cluster should contain at least one node with the node roles permission set to `ingest`. For information about setting up node roles within a cluster, see [Cluster Formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/). +- When using ingestion in a production environment, your cluster should contain at least one node with the node roles permission set to `ingest`. For information about setting up node roles within a cluster, see [Cluster Formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/). - If the OpenSearch Security plugin is enabled, you must have the `cluster_manage_pipelines` permission to manage ingest pipelines. 
## Define a pipeline From 32846b19c5e6ec15e12d3b0df505f0b07865c2fe Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 10:18:15 -0600 Subject: [PATCH 207/286] Update _api-reference/ingest-apis/ingest-pipelines.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-pipelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index d88ce4f666..32888ed2aa 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -35,7 +35,7 @@ A _pipeline definition_ describes the steps involved in an ingest pipeline and c Alternatively, you can specify a pipeline directly in the request body without creating a pipeline first. -### ### Request body fields +### Request body fields Field | Required | Type | Description :--- | :--- | :--- | :--- From 904f386c1c536ed0a21afe16d479711b5c876583 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 10:19:08 -0600 Subject: [PATCH 208/286] Update _api-reference/ingest-apis/ingest-pipelines.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-pipelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index 32888ed2aa..084578e554 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -39,7 +39,7 @@ Alternatively, you can specify a pipeline directly in the request body without c Field | Required | Type | Description :--- | :--- | :--- | :--- -`processors` | Required | Array of processor objects | A component that performs a specific task to process data as it's being ingested into OpenSearch. 
+`processors` | Required | Array of processor objects | A component that performs a specific data processing task as the data is being ingested into OpenSearch. `description` | Optional | String | Description of the ingest pipeline. ## Next steps From f4add7c09550825f12fabc749e691034f3ff85c0 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 10:25:43 -0600 Subject: [PATCH 209/286] Update _api-reference/ingest-apis/ingest-pipelines.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-pipelines.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index 084578e554..d8d8594864 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -40,7 +40,7 @@ Alternatively, you can specify a pipeline directly in the request body without c Field | Required | Type | Description :--- | :--- | :--- | :--- `processors` | Required | Array of processor objects | A component that performs a specific data processing task as the data is being ingested into OpenSearch. -`description` | Optional | String | Description of the ingest pipeline. +`description` | Optional | String | A description of the ingest pipeline. 
## Next steps From 69a05d86b70700ef3464ad586418b7cbb96b88cd Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 10:28:36 -0600 Subject: [PATCH 210/286] Update _api-reference/ingest-apis/ingest-processors.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/ingest-processors.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index 028158daa0..31c7193074 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -8,7 +8,7 @@ has_children: true # Ingest processors -Ingest processors are a core component of [ingest pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/), as they preprocess documents before indexing. For example, you can remove fields, extract values from text, convert data format, or append additional information. +Ingest processors are a core component of [ingest pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/) because they preprocess documents before indexing. For example, you can remove fields, extract values from text, convert data formats, or append additional information. OpenSearch provides a standard set of ingest processors within your OpenSearch installation. 
For a list of processors available in OpenSearch, use the [Nodes Info]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-info/) API operation: From c405e9810099c547f1e72b41fbc7caeb6c0fafa9 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 10:30:34 -0600 Subject: [PATCH 211/286] Update _api-reference/ingest-apis/pipeline-failures.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/pipeline-failures.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/pipeline-failures.md b/_api-reference/ingest-apis/pipeline-failures.md index 9c0349fa11..8428103b78 100644 --- a/_api-reference/ingest-apis/pipeline-failures.md +++ b/_api-reference/ingest-apis/pipeline-failures.md @@ -32,7 +32,7 @@ PUT _ingest/pipeline/my-pipeline/ ``` {% include copy-curl.html %} -You can specify the `on_failure` parameter to run immediately after a processor fails. If you have specified `on_failure`, OpenSearch will run the other processors in the pipeline, even if the `on_failure` configuration is empty: +You can specify the `on_failure` parameter to run immediately after a processor fails. 
If you have specified `on_failure`, OpenSearch will run the other processors in the pipeline even if the `on_failure` configuration is empty: ```json PUT _ingest/pipeline/my-pipeline/ From 63effc5df3f815116d4b1af59ab0abc8cd143225 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 10:31:20 -0600 Subject: [PATCH 212/286] Update _api-reference/ingest-apis/pipeline-failures.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/pipeline-failures.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/pipeline-failures.md b/_api-reference/ingest-apis/pipeline-failures.md index 8428103b78..6dc4d02235 100644 --- a/_api-reference/ingest-apis/pipeline-failures.md +++ b/_api-reference/ingest-apis/pipeline-failures.md @@ -130,5 +130,5 @@ The response contains statistics for all ingest pipelines, for example: } ``` -**Troubleshooting ingest pipeline failures:** The first thing you should do is check the logs to see if there are any errors or warning that can help you identify the cause of the failure. OpenSeach logs contain information about the ingest pipeline that failed, including the processor that failed and the reason for the failure. +**Troubleshooting ingest pipeline failures:** The first thing you should do is check the logs to see whether there are any errors or warnings that can help you identify the cause of the failure. OpenSearch logs contain information about the ingest pipeline that failed, including the processor that failed and the reason for the failure. 
{: .tip} From 1dbf2a634564e5b96717255ea5282c052a7e6603 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 10:31:50 -0600 Subject: [PATCH 213/286] Update _api-reference/ingest-apis/processors/append.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index a3662d2639..9fadb2bb05 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -29,7 +29,7 @@ The following is the syntax for the `append` processor: The following table lists the required and optional parameters for the `append` processor. -**Parameter** | **Required** | **Description** | +Parameter | Required | Description | |-----------|-----------|-----------| `field` | Required | Name of the field where the data should be appended. Supports template snippets.| `value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. 
| From baad585ba5ae6c1a9e8403d1f8e7f4558cdf50c4 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 10:34:07 -0600 Subject: [PATCH 214/286] Update _api-reference/ingest-apis/processors/append.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index 9fadb2bb05..2a5a99ea62 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -32,7 +32,7 @@ The following table lists the required and optional parameters for the `append` Parameter | Required | Description | |-----------|-----------|-----------| `field` | Required | Name of the field where the data should be appended. Supports template snippets.| -`value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | +`value` | Required | The value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | `description` | Optional | Brief description of the processor. | `if` | Optional | Condition to run this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. 
| From ced10729bed05c38f834560af690d669d861bef0 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 10:34:33 -0600 Subject: [PATCH 215/286] Update _api-reference/ingest-apis/processors/append.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index 2a5a99ea62..5915bf5e5d 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -33,7 +33,7 @@ Parameter | Required | Description | |-----------|-----------|-----------| `field` | Required | Name of the field where the data should be appended. Supports template snippets.| `value` | Required | The value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | -`description` | Optional | Brief description of the processor. | +`description` | Optional | A brief description of the processor. | `if` | Optional | Condition to run this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `on_failure` | Optional | A list of processors to run if the processor fails. 
| From 4065a792c9a2988c670992c5f1a13520e29a45bf Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 10:35:10 -0600 Subject: [PATCH 216/286] Update _api-reference/ingest-apis/processors/append.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index 5915bf5e5d..68719541d1 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -31,7 +31,7 @@ The following table lists the required and optional parameters for the `append` Parameter | Required | Description | |-----------|-----------|-----------| -`field` | Required | Name of the field where the data should be appended. Supports template snippets.| +`field` | Required | The name of the field where the data should be appended. Supports template snippets.| `value` | Required | The value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | `description` | Optional | A brief description of the processor. | `if` | Optional | Condition to run this processor. 
| From 761c6c6bc5e1e09af9d409f49aee0d9408edbac1 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 10:36:25 -0600 Subject: [PATCH 217/286] Update _api-reference/ingest-apis/processors/append.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index 68719541d1..a9a9e4f2fe 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -45,7 +45,7 @@ Follow these steps to use the processor in a pipeline. **Step 1: Create a pipeline.** -The following query creates a pipeline, named `user-behavior`, that has one append processor. It appends the `event_type` of each new document ingested into OpenSearch to an array field `event_types`: +The following query creates a pipeline, named `user-behavior`, that has one append processor. 
It appends the `event_type` of each new document ingested into OpenSearch to an array field named `event_types`: ```json PUT _ingest/pipeline/user-behavior From c73de521dacae315aefa11c26cb59c52962d7a87 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 10:36:56 -0600 Subject: [PATCH 218/286] Update _api-reference/ingest-apis/processors/append.md Co-authored-by: Heemin Kim Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index a9a9e4f2fe..ae03d535b4 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -55,7 +55,7 @@ PUT _ingest/pipeline/user-behavior { "append": { "field": "event_types", - "value": ["event_type"] + "value": ["page_view"] } } ] From e3616bc9f6663dd38785a37d289dc62bd04848a0 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 10:37:33 -0600 Subject: [PATCH 219/286] Update _api-reference/ingest-apis/processors/append.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index ae03d535b4..596068d69c 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -65,7 +65,7 @@ PUT _ingest/pipeline/user-behavior **Step 2 (Optional): Test the pipeline.** -It is recommended that you test a pipeline before you ingest documents. +It is recommended that you test your pipeline before you ingest documents. 
{: .tip} To test the pipeline, run the following query: From 3f8e587328a14bcf25b78ffa5c6895f7af354e17 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 10:49:36 -0600 Subject: [PATCH 220/286] Update _api-reference/ingest-apis/processors/append.md Co-authored-by: Heemin Kim Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/append.md | 1 - 1 file changed, 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index 596068d69c..df7412f85f 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -95,7 +95,6 @@ The following query ingests a document into an index named `testindex1`: ```json PUT testindex1/_doc/1?pipeline=user-behavior { - "event_type": "page_view" } ``` {% include copy-curl.html %} From 867a2ba97de6c1b8cec9c4c29b40981b4081a221 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 10:51:24 -0600 Subject: [PATCH 221/286] Update _api-reference/ingest-apis/processors/append.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index df7412f85f..68a5c7fcc5 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -108,7 +108,7 @@ GET testindex1/_doc/1 ``` {% include copy-curl.html %} -Because no `event_types` field is in the document, an array field is created and the event is appended to the array: +Because the document does not contain an `event_types` field, an array field is created and the event is appended to the array: ```json { From 8a7e49fdde1c6180bd244f5eeecaa3e034c51715 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 10:58:44 -0600 Subject: [PATCH 222/286] Update 
_api-reference/ingest-apis/processors/append.md Co-authored-by: Heemin Kim Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/append.md | 1 - 1 file changed, 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index 68a5c7fcc5..37e8705809 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -119,7 +119,6 @@ Because the document does not contain an `event_types` field, an array field is "_primary_term": 1, "found": true, "_source": { - "event_type": "page_view", "event_types": [ "page_view" ] From 1e235c7439d9e97a2d494323ca201f28a7fb2306 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 11:00:11 -0600 Subject: [PATCH 223/286] Update _api-reference/ingest-apis/processors/append.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/append.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index 37e8705809..268d022c1b 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -128,7 +128,7 @@ Because the document does not contain an `event_types` field, an array field is #### Response -The following example response confirms the pipeline is working correctly and producing the expected output: +The following example response confirms that the pipeline is working correctly and producing the expected output: ```json { From fd30afbd108c16b57aafe1221e9dd231d533012f Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 11:00:51 -0600 Subject: [PATCH 224/286] Update _api-reference/ingest-apis/processors/bytes.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/bytes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) 
diff --git a/_api-reference/ingest-apis/processors/bytes.md b/_api-reference/ingest-apis/processors/bytes.md index 480aa6e50d..e7c4e8f5a3 100644 --- a/_api-reference/ingest-apis/processors/bytes.md +++ b/_api-reference/ingest-apis/processors/bytes.md @@ -25,7 +25,7 @@ The following is the syntax for the `bytes` processor: The following table lists the required and optional parameters for the `bytes` processor. -**Parameter** | **Required** | **Description** | +Parameter | Required | Description | |-----------|-----------|-----------| `field` | Required | Name of the field where the data should be converted. Supports template snippets.| `description` | Optional | Brief description of the processor. | From ae5ab17559aa17ddc5bf1a81764999d26f45ebcc Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 11:01:16 -0600 Subject: [PATCH 225/286] Update _api-reference/ingest-apis/processors/bytes.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/bytes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/bytes.md b/_api-reference/ingest-apis/processors/bytes.md index e7c4e8f5a3..6813eb5c88 100644 --- a/_api-reference/ingest-apis/processors/bytes.md +++ b/_api-reference/ingest-apis/processors/bytes.md @@ -27,7 +27,7 @@ The following table lists the required and optional parameters for the `bytes` p Parameter | Required | Description | |-----------|-----------|-----------| -`field` | Required | Name of the field where the data should be converted. Supports template snippets.| +`field` | Required | The name of the field where the data should be converted. Supports template snippets. | `description` | Optional | Brief description of the processor. | `if` | Optional | Condition to run this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. 
| From 9985ddde88c6fef3861f487f7b5d3c3c05c99d20 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 11:03:48 -0600 Subject: [PATCH 226/286] Update _api-reference/ingest-apis/processors/bytes.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/bytes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/bytes.md b/_api-reference/ingest-apis/processors/bytes.md index 6813eb5c88..eb52944f52 100644 --- a/_api-reference/ingest-apis/processors/bytes.md +++ b/_api-reference/ingest-apis/processors/bytes.md @@ -28,7 +28,7 @@ The following table lists the required and optional parameters for the `bytes` p Parameter | Required | Description | |-----------|-----------|-----------| `field` | Required | The name of the field where the data should be converted. Supports template snippets. | -`description` | Optional | Brief description of the processor. | +`description` | Optional | A brief description of the processor. | `if` | Optional | Condition to run this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `ignore_missing` | Optional | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. 
| From b6b9f22c680ae60a1627e6e1fa284e1c188cca7d Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 11:13:39 -0600 Subject: [PATCH 227/286] Update _api-reference/ingest-apis/processors/bytes.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/bytes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/bytes.md b/_api-reference/ingest-apis/processors/bytes.md index eb52944f52..af8b792741 100644 --- a/_api-reference/ingest-apis/processors/bytes.md +++ b/_api-reference/ingest-apis/processors/bytes.md @@ -34,7 +34,7 @@ Parameter | Required | Description | `ignore_missing` | Optional | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. | `on_failure` | Optional | A list of processors to run if the processor fails. | `tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | -`target_field` | Optional | Name of the field to store the parsed data in. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | +`target_field` | Optional | The name of the field in which to store the parsed data. If not specified, the value will be stored in place in the `field` field. Default is `field`. 
| ## Using the processor From e53d1ccdf0dc40871d8182173d6550ad8490843c Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 11:19:12 -0600 Subject: [PATCH 228/286] Address editorial feedback Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/create-ingest.md | 2 + .../ingest-apis/ingest-pipelines.md | 6 +- .../ingest-apis/ingest-processors.md | 2 +- .../ingest-apis/processors/append.md | 72 +++++++++---------- _api-reference/ingest-apis/simulate-ingest.md | 2 +- 5 files changed, 38 insertions(+), 46 deletions(-) diff --git a/_api-reference/ingest-apis/create-ingest.md b/_api-reference/ingest-apis/create-ingest.md index 2110b2bd47..38e9b32b54 100644 --- a/_api-reference/ingest-apis/create-ingest.md +++ b/_api-reference/ingest-apis/create-ingest.md @@ -69,6 +69,8 @@ Parameter | Required | Type | Description :--- | :--- | :--- | :--- `pipeline-id` | Required | String | The unique identifier, or pipeline ID, assigned to the ingest pipeline. +## Query parameters + Parameter | Required | Type | Description :--- | :--- | :--- | :--- `cluster_manager_timeout` | Optional | Time | Period to wait for a connection to the cluster manager node. Defaults to 30 seconds. diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index d8d8594864..ecaf6f574c 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -24,7 +24,7 @@ The following are prerequisites for using OpenSearch ingest pipelines: ## Define a pipeline -A _pipeline definition_ describes the steps involved in an ingest pipeline and can be written in JSON format. An ingest pipeline consists of the following: +A _pipeline definition_ describes the sequence of an ingest pipeline and can be written in JSON format. 
An ingest pipeline consists of the following: ```json { @@ -33,9 +33,7 @@ A _pipeline definition_ describes the steps involved in an ingest pipeline and c } ``` -Alternatively, you can specify a pipeline directly in the request body without creating a pipeline first. - -### Request body fields +### ### Request body fields Field | Required | Type | Description :--- | :--- | :--- | :--- diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index 31c7193074..43708c2ec1 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -20,4 +20,4 @@ GET /_nodes/ingest To set up and deploy ingest processors, make sure you have the necessary permissions and access rights. See [Security plugin REST API]({{site.url}}{{site.baseurl}}/security/access-control/api/) to learn more. {:.note} -The following is a generic example of an ingest processor definition within a pipeline. Processor types and their required or optional parameters vary depending on your specific use case. See the [Ingest processors]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) section to learn more about the processor types and defining and configuring them within a pipeline. +Processor types and their required or optional parameters vary depending on your specific use case. See the [Ingest processors]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) section to learn more about the processor types and defining and configuring them within a pipeline. 
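The hunk above removes the sentence that introduced a generic processor example from `ingest-processors.md` without showing that example. For context, a processor definition inside a pipeline takes roughly the following shape; this is a hedged sketch in the documentation's own request style, and the `set` processor, the `status` field, and the `ingested` value are illustrative assumptions rather than content from the patch:

```json
PUT _ingest/pipeline/example-pipeline
{
  "description": "A pipeline with a single processor",
  "processors": [
    {
      "set": {
        "field": "status",
        "value": "ingested",
        "tag": "set-status",
        "description": "Sets a static field value; optional parameters such as tag and description are common to most processor types"
      }
    }
  ]
}
```

Each entry in the `processors` array is keyed by the processor type, with that processor's required and optional parameters nested inside it.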
diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index 268d022c1b..90282e5178 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -31,10 +31,10 @@ The following table lists the required and optional parameters for the `append` Parameter | Required | Description | |-----------|-----------|-----------| -`field` | Required | The name of the field where the data should be appended. Supports template snippets.| -`value` | Required | The value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | -`description` | Optional | A brief description of the processor. | -`if` | Optional | Condition to run this processor. | +`field` | Required | The name of the field to which the data should be appended. Supports template snippets.| +`value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | +`description` | Optional | Brief description of the processor. | +`if` | Optional | A condition for running this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `on_failure` | Optional | A list of processors to run if the processor fails. | `tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | @@ -45,7 +45,7 @@ Follow these steps to use the processor in a pipeline. **Step 1: Create a pipeline.** -The following query creates a pipeline, named `user-behavior`, that has one append processor. It appends the `event_type` of each new document ingested into OpenSearch to an array field named `event_types`: +The following query creates a pipeline, named `user-behavior`, that has one append processor. 
It appends the `page_view` of each new document ingested into OpenSearch to an array field named `event_types`: ```json PUT _ingest/pipeline/user-behavior @@ -72,21 +72,39 @@ To test the pipeline, run the following query: ```json POST _ingest/pipeline/user-behavior/_simulate +{ + "docs":[ + { + "_source":{ + } + } + ] +} +``` +{% include copy-curl.html %} + +#### Reponse + +The following response confirms that the pipeline is working as expected: + { "docs": [ { - "_index": "testindex1", - "_id": "1", - "_source": { - "event_type": "page_view", - "event_types": - "event_type" + "doc": { + "_index": "_index", + "_id": "_id", + "_source": { + "event_types": [ + "page_view" + ] + }, + "_ingest": { + "timestamp": "2023-08-28T16:55:10.621805166Z" + } } } ] } -``` -{% include copy-curl.html %} **Step 3: Ingest a document.** @@ -95,6 +113,7 @@ The following query ingests a document into an index named `testindex1`: ```json PUT testindex1/_doc/1?pipeline=user-behavior { + "event_types": "page_view" } ``` {% include copy-curl.html %} @@ -125,30 +144,3 @@ Because the document does not contain an `event_types` field, an array field is } } ``` - -#### Response - -The following example response confirms that the pipeline is working correctly and producing the expected output: - -```json -{ - "docs": [ - { - "doc": { - "_index": "testindex1", - "_id": "1", - "_source": { - "event_type": "page_view", - "event_types": [ - "event_type", - "event_type" - ] - }, - "_ingest": { - "timestamp": "2023-08-22T16:02:37.893458209Z" - } - } - } - ] -} -``` diff --git a/_api-reference/ingest-apis/simulate-ingest.md b/_api-reference/ingest-apis/simulate-ingest.md index 9c2936c153..a1b89bb68b 100644 --- a/_api-reference/ingest-apis/simulate-ingest.md +++ b/_api-reference/ingest-apis/simulate-ingest.md @@ -191,7 +191,7 @@ When the previous request is run with the `verbose` parameter set to `true`, the ### Example: Specify a pipeline in the request body -Alternatively, you can specify a pipeline 
directly in the request body without creating a pipeline first: +Alternatively, you can specify a pipeline directly in the request body without first creating a pipeline: ```json POST /_ingest/pipeline/_simulate From f5881070825aa2781c20255c0fcc8de4184c5f97 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 12:46:24 -0600 Subject: [PATCH 229/286] Update _api-reference/ingest-apis/processors/convert.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/convert.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md index 676555f8b1..886aa4de31 100644 --- a/_api-reference/ingest-apis/processors/convert.md +++ b/_api-reference/ingest-apis/processors/convert.md @@ -24,7 +24,7 @@ The `convert` processor converts a field in a document to a different type, for The following table lists the required and optional parameters for the `convert` processor. -**Parameter** | **Required** | **Description** | +Parameter | Required | Description | |-----------|-----------|-----------| `field` | Required | Name of the field where the data should be converted. Supports template snippets.| `type` | Required | The type to convert the field value to. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. If the `type` is `boolean`, the value is set to `true` if the field value is a string `true` (ignoring case), and to `false` if the field value is a string `false` (ignoring case). If the value is not one of the allowed values, an error will occur. 
| From da0a4b65ddb178fcf89d6abdcad31212523d2f09 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 12:48:21 -0600 Subject: [PATCH 230/286] Update _api-reference/ingest-apis/processors/convert.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/convert.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md index 886aa4de31..fb28b0a8f7 100644 --- a/_api-reference/ingest-apis/processors/convert.md +++ b/_api-reference/ingest-apis/processors/convert.md @@ -26,7 +26,7 @@ The following table lists the required and optional parameters for the `convert` Parameter | Required | Description | |-----------|-----------|-----------| -`field` | Required | Name of the field where the data should be converted. Supports template snippets.| +`field` | Required | The name of the field where the data should be converted. Supports template snippets. | `type` | Required | The type to convert the field value to. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. If the `type` is `boolean`, the value is set to `true` if the field value is a string `true` (ignoring case), and to `false` if the field value is a string `false` (ignoring case). If the value is not one of the allowed values, an error will occur. | `description` | Optional | Brief description of the processor. | `if` | Optional | Condition to run this processor. 
| From 07ac669ade405a2591a6c4bcc87f871dabd32c14 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 12:48:48 -0600 Subject: [PATCH 231/286] Update _api-reference/ingest-apis/processors/convert.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/convert.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md index fb28b0a8f7..75308e4562 100644 --- a/_api-reference/ingest-apis/processors/convert.md +++ b/_api-reference/ingest-apis/processors/convert.md @@ -27,7 +27,7 @@ The following table lists the required and optional parameters for the `convert` Parameter | Required | Description | |-----------|-----------|-----------| `field` | Required | The name of the field where the data should be converted. Supports template snippets. | -`type` | Required | The type to convert the field value to. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. If the `type` is `boolean`, the value is set to `true` if the field value is a string `true` (ignoring case), and to `false` if the field value is a string `false` (ignoring case). If the value is not one of the allowed values, an error will occur. | +`type` | Required | The type to convert the field value to. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. If the `type` is `boolean`, the value is set to `true` if the field value is a string `true` (ignoring case) and to `false` if the field value is a string `false` (ignoring case). If the value is not one of the allowed values, an error will occur. | `description` | Optional | Brief description of the processor. | `if` | Optional | Condition to run this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. 
| From bf9b669892b25ad9dd0dee21fa06b1edac3ce493 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 12:49:08 -0600 Subject: [PATCH 232/286] Update _api-reference/ingest-apis/processors/convert.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/convert.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md index 75308e4562..5202803525 100644 --- a/_api-reference/ingest-apis/processors/convert.md +++ b/_api-reference/ingest-apis/processors/convert.md @@ -28,7 +28,7 @@ Parameter | Required | Description | |-----------|-----------|-----------| `field` | Required | The name of the field where the data should be converted. Supports template snippets. | `type` | Required | The type to convert the field value to. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. If the `type` is `boolean`, the value is set to `true` if the field value is a string `true` (ignoring case) and to `false` if the field value is a string `false` (ignoring case). If the value is not one of the allowed values, an error will occur. | -`description` | Optional | Brief description of the processor. | +`description` | Optional | A brief description of the processor. | `if` | Optional | Condition to run this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `ignore_missing` | Optional | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. 
| From 1aa34dcef5a34d5010c8008c7abad117de6d034f Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 12:54:50 -0600 Subject: [PATCH 233/286] Update _api-reference/ingest-apis/processors/convert.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/convert.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md index 5202803525..e6ef3e46a9 100644 --- a/_api-reference/ingest-apis/processors/convert.md +++ b/_api-reference/ingest-apis/processors/convert.md @@ -34,7 +34,7 @@ Parameter | Required | Description | `ignore_missing` | Optional | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. | `on_failure` | Optional | A list of processors to run if the processor fails. | `tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | -`target_field` | Optional | Name of the field to store the parsed data in. If not specified, the value will be stored in-place in the `field` field. Default is `field`. | +`target_field` | Optional | The name of the field in which to store the parsed data. If not specified, the value will be stored in place in the `field` field. Default is `field`. 
| ## Using the processor From a166ab1b0cba11d45cd004f2b5dc3f7cc2a33c93 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 12:58:15 -0600 Subject: [PATCH 234/286] Update _api-reference/ingest-apis/processors/convert.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/convert.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md index e6ef3e46a9..63f1342f45 100644 --- a/_api-reference/ingest-apis/processors/convert.md +++ b/_api-reference/ingest-apis/processors/convert.md @@ -42,7 +42,7 @@ Follow these steps to use the processor in a pipeline. **Step 1: Create a pipeline.** -The following query creates a pipeline, named `convert-price`, that converts `price` to a floating-point number and stores the converted value in the `price_float` field and sets the value to `0` if it is less than `0`: +The following query creates a pipeline, named `convert-price`, that converts `price` to a floating-point number, stores the converted value in the `price_float` field, and sets the value to `0` if it is less than `0`: ```json PUT _ingest/pipeline/convert-price From e19c4063654e968dde75af9770312082ffb5747e Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 12:58:47 -0600 Subject: [PATCH 235/286] Update _api-reference/ingest-apis/processors/convert.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/convert.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md index 63f1342f45..63e9636049 100644 --- a/_api-reference/ingest-apis/processors/convert.md +++ b/_api-reference/ingest-apis/processors/convert.md @@ -70,7 +70,7 @@ PUT _ingest/pipeline/convert-price **Step 2 (Optional): Test the pipeline.** -It is recommended that you 
test a pipeline before you ingest documents. +It is recommended that you test your pipeline before you ingest documents. {: .tip} To test the pipeline, run the following query: From 9ae3dc7d7a473dc5497cb6979acb54afffa735ac Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 12:47:32 -0600 Subject: [PATCH 236/286] Address editorial feedback Signed-off-by: Melissa Vagi --- .../ingest-apis/ingest-pipelines.md | 4 +- .../ingest-apis/ingest-processors.md | 2 +- .../ingest-apis/pipeline-failures.md | 2 +- .../ingest-apis/processors/append.md | 6 +- .../ingest-apis/processors/bytes.md | 66 +++++-------------- 5 files changed, 26 insertions(+), 54 deletions(-) diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md index ecaf6f574c..38ea3fc7d5 100644 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ b/_api-reference/ingest-apis/ingest-pipelines.md @@ -33,7 +33,7 @@ A _pipeline definition_ describes the sequence of an ingest pipeline and can be } ``` -### ### Request body fields +### Request body fields Field | Required | Type | Description :--- | :--- | :--- | :--- @@ -45,6 +45,6 @@ Field | Required | Type | Description Learn how to: - [Create a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/create-ingest/). -- [Retrieve information about a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/get-ingest/). - [Test a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/simulate-ingest/). +- [Retrieve information about a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/get-ingest/). - [Delete a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/delete-ingest/). 
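The four "Learn how to" links in the hunk above correspond to the create, test, retrieve, and delete operations on a pipeline. As a rough sketch of how they map to REST calls (the pipeline ID `my-pipeline` and the empty pipeline body are assumptions used only for illustration):

```json
PUT _ingest/pipeline/my-pipeline
{
  "description": "Example pipeline",
  "processors": []
}

POST _ingest/pipeline/my-pipeline/_simulate
{
  "docs": [
    {
      "_source": {}
    }
  ]
}

GET _ingest/pipeline/my-pipeline

DELETE _ingest/pipeline/my-pipeline
```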
diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md index 43708c2ec1..5a9a5e0d41 100644 --- a/_api-reference/ingest-apis/ingest-processors.md +++ b/_api-reference/ingest-apis/ingest-processors.md @@ -13,7 +13,7 @@ Ingest processors are a core component of [ingest pipelines]({{site.url}}{{site. OpenSearch provides a standard set of ingest processors within your OpenSearch installation. For a list of processors available in OpenSearch, use the [Nodes Info]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-info/) API operation: ```json -GET /_nodes/ingest +GET /_nodes/ingest?filter_path=nodes.*.ingest.processors ``` {% include copy-curl.html %} diff --git a/_api-reference/ingest-apis/pipeline-failures.md b/_api-reference/ingest-apis/pipeline-failures.md index 6dc4d02235..f8814f39c2 100644 --- a/_api-reference/ingest-apis/pipeline-failures.md +++ b/_api-reference/ingest-apis/pipeline-failures.md @@ -37,7 +37,7 @@ You can specify the `on_failure` parameter to run immediately after a processor ```json PUT _ingest/pipeline/my-pipeline/ { - "description": "Add timestampto the document", + "description": "Add timestamp to the document", "processors": [ { "date": { diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index 90282e5178..5d496be09a 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -32,8 +32,8 @@ The following table lists the required and optional parameters for the `append` Parameter | Required | Description | |-----------|-----------|-----------| `field` | Required | The name of the field to which the data should be appended. Supports template snippets.| -`value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. 
| -`description` | Optional | Brief description of the processor. | +`value` | Required | The value to be appended. This can be a static value or a dynamic value derived from existing fields. Supports template snippets. | +`description` | Optional | A brief description of the processor. | `if` | Optional | A condition for running this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `on_failure` | Optional | A list of processors to run if the processor fails. | @@ -87,6 +87,7 @@ POST _ingest/pipeline/user-behavior/_simulate The following response confirms that the pipeline is working as expected: +```json { "docs": [ { @@ -105,6 +106,7 @@ The following response confirms that the pipeline is working as expected: } ] } +``` **Step 3: Ingest a document.** diff --git a/_api-reference/ingest-apis/processors/bytes.md b/_api-reference/ingest-apis/processors/bytes.md index af8b792741..329a657911 100644 --- a/_api-reference/ingest-apis/processors/bytes.md +++ b/_api-reference/ingest-apis/processors/bytes.md @@ -29,7 +29,7 @@ Parameter | Required | Description | |-----------|-----------|-----------| `field` | Required | The name of the field where the data should be converted. Supports template snippets. | `description` | Optional | A brief description of the processor. | -`if` | Optional | Condition to run this processor. | +`if` | Optional | A condition for running this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `ignore_missing` | Optional | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. | `on_failure` | Optional | A list of processors to run if the processor fails. | @@ -42,7 +42,7 @@ Follow these steps to use the processor in a pipeline. **Step 1: Create a pipeline.** -The following query creates a pipeline, named `file_upload`, that has one bytes processor. 
It converts the `file_size` to its byte equivalent and stores it in a new field `file_size_bytes`: +The following query creates a pipeline, named `file_upload`, that has one `bytes` processor. It converts the `file_size` to its byte equivalent and stores it in a new field named `file_size_bytes`: ```json PUT _ingest/pipeline/file_upload @@ -62,13 +62,13 @@ PUT _ingest/pipeline/file_upload **Step 2 (Optional): Test the pipeline.** -It is recommended that you test a pipeline before you ingest documents. +It is recommended that you test your pipeline before you ingest documents. {: .tip} To test the pipeline, run the following query: ```json -POST _ingest/pipeline/user-behavior/_simulate +POST _ingest/pipeline/file_upload/_simulate { "docs": [ { @@ -85,30 +85,9 @@ POST _ingest/pipeline/user-behavior/_simulate ``` {% include copy-curl.html %} -**Step 3: Ingest a document.** - -The following query ingests a document into an index named `testindex1`: - -```json -PUT testindex1/_doc/1?pipeline=file_upload -{ - "file_size": "10MB" -} -``` -{% include copy-curl.html %} - -**Step 4 (Optional): Retrieve the document.** - -To retrieve the document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} - -#### Response +#### Response -The following example response confirms the pipeline is working correctly and producing the expected output: +The following response confirms that the pipeline is working as expected: ```json { @@ -133,32 +112,23 @@ The following example response confirms the pipeline is working correctly and pr } ``` -## Using optional parameters +**Step 3: Ingest a document.** -The following query creates a pipeline with the bytes processor and one optional parameter, `on_failure`, which uses the `set` processor to set the `error` field with a specific error message: +The following query ingests a document into an index named `testindex1`: ```json -PUT _ingest/pipeline/file_upload +PUT testindex1/_doc/1?pipeline=file_upload
{ - "description": "Pipeline that converts file size to bytes", - "processors": [ - { - "bytes": { - "field": "file_size", - "target_field": "file_size_bytes", - "on_failure": [ - { - "set": { - "field": "error", - "value": "Failed to convert" - } - } - ] - } - } - ] + "file_size": "10MB" } ``` {% include copy-curl.html %} -Repeat steps 2--4 to confirm the pipeline is working as expected. +**Step 4 (Optional): Retrieve the document.** + +To retrieve the document, run the following query: + +```json +GET testindex1/_doc/1 +``` +{% include copy-curl.html %} From bde9ba95802c284b4f498fd9cb8b40541b054712 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:01:06 -0600 Subject: [PATCH 237/286] Address editorial changes Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/convert.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md index 63e9636049..d09369566e 100644 --- a/_api-reference/ingest-apis/processors/convert.md +++ b/_api-reference/ingest-apis/processors/convert.md @@ -26,10 +26,10 @@ The following table lists the required and optional parameters for the `convert` Parameter | Required | Description | |-----------|-----------|-----------| -`field` | Required | The name of the field where the data should be converted. Supports template snippets. | +`field` | Required | The name of the field that contains the data to be converted. Supports template snippets. | `type` | Required | The type to convert the field value to. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. If the `type` is `boolean`, the value is set to `true` if the field value is a string `true` (ignoring case) and to `false` if the field value is a string `false` (ignoring case). If the value is not one of the allowed values, an error will occur. 
| `description` | Optional | A brief description of the processor. | -`if` | Optional | Condition to run this processor. | +`if` | Optional | A condition for running this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `ignore_missing` | Optional | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. | `on_failure` | Optional | A list of processors to run if the processor fails. | From 9e62879a1d4758ae469f187ddd4ea25b9e972c7d Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:04:24 -0600 Subject: [PATCH 238/286] Update _api-reference/ingest-apis/processors/convert.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/convert.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md index d09369566e..a94a40fbd5 100644 --- a/_api-reference/ingest-apis/processors/convert.md +++ b/_api-reference/ingest-apis/processors/convert.md @@ -114,7 +114,7 @@ GET testindex1/_doc/1 #### Response -The following example response confirms the pipeline is working correctly and producing the expected output: +The following example response confirms that the pipeline is working correctly and producing the expected output: ```json { From be5056c911b1619986a41cf1eb8cb21bbacd5f04 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:04:44 -0600 Subject: [PATCH 239/286] Update _api-reference/ingest-apis/processors/csv.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/csv.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/csv.md b/_api-reference/ingest-apis/processors/csv.md index 838fc46d56..a16c318d12 100644 --- a/_api-reference/ingest-apis/processors/csv.md +++ 
b/_api-reference/ingest-apis/processors/csv.md @@ -8,7 +8,7 @@ nav_order: 40 # CSV -The `csv` processor is used to parse comma-separated values (CSV) and store them as individual fields in a document. The processor ignores empty fields. The following is the syntax for the `csv` processor: +The `csv` processor is used to parse CSVs and store them as individual fields in a document. The processor ignores empty fields. The following is the syntax for the `csv` processor: ```json { From 8212b0a43ce4e9baa267bc7f6bb3128dc6aafdd6 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:06:40 -0600 Subject: [PATCH 240/286] Update _api-reference/ingest-apis/processors/csv.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/csv.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/csv.md b/_api-reference/ingest-apis/processors/csv.md index a16c318d12..1a9c26737e 100644 --- a/_api-reference/ingest-apis/processors/csv.md +++ b/_api-reference/ingest-apis/processors/csv.md @@ -24,7 +24,7 @@ The `csv` processor is used to parse CSVs and store them as individual fields in The following table lists the required and optional parameters for the `csv` processor. -**Parameter** | **Required** | **Description** | +Parameter | Required | Description | |-----------|-----------|-----------| `field` | Required | Name of the field where the data should be converted. Supports template snippets.| `target_fields` | Required | Name of the field to store the parsed data in. 
| From 482b8544a5f884d7924763ce986ab43963c957f8 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:07:07 -0600 Subject: [PATCH 241/286] Update _api-reference/ingest-apis/processors/csv.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/csv.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/csv.md b/_api-reference/ingest-apis/processors/csv.md index 1a9c26737e..fb0e969b86 100644 --- a/_api-reference/ingest-apis/processors/csv.md +++ b/_api-reference/ingest-apis/processors/csv.md @@ -26,7 +26,7 @@ The following table lists the required and optional parameters for the `csv` pro Parameter | Required | Description | |-----------|-----------|-----------| -`field` | Required | Name of the field where the data should be converted. Supports template snippets.| +`field` | Required | The name of the field where the data should be converted. Supports template snippets. | `target_fields` | Required | Name of the field to store the parsed data in. | `value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | `description` | Optional | Brief description of the processor. 
| From c9310ca7a39917a1dbb609c29f28383a46040977 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:07:23 -0600 Subject: [PATCH 242/286] Update _api-reference/ingest-apis/processors/csv.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/csv.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/csv.md b/_api-reference/ingest-apis/processors/csv.md index fb0e969b86..b9f82c61c2 100644 --- a/_api-reference/ingest-apis/processors/csv.md +++ b/_api-reference/ingest-apis/processors/csv.md @@ -27,7 +27,7 @@ The following table lists the required and optional parameters for the `csv` pro Parameter | Required | Description | |-----------|-----------|-----------| `field` | Required | The name of the field where the data should be converted. Supports template snippets. | -`target_fields` | Required | Name of the field to store the parsed data in. | +`target_fields` | Required | The name of the field in which to store the parsed data. | `value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | `description` | Optional | Brief description of the processor. | `empty_value` | Optional | Represents optional parameters that are not required to be present or are not applicable. 
| From 9967966bcc229db96dbf1bcc26a6d70829d5cf22 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:07:47 -0600 Subject: [PATCH 243/286] Update _api-reference/ingest-apis/processors/csv.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/csv.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/csv.md b/_api-reference/ingest-apis/processors/csv.md index b9f82c61c2..6eee697d55 100644 --- a/_api-reference/ingest-apis/processors/csv.md +++ b/_api-reference/ingest-apis/processors/csv.md @@ -28,7 +28,7 @@ Parameter | Required | Description | |-----------|-----------|-----------| `field` | Required | The name of the field where the data should be converted. Supports template snippets. | `target_fields` | Required | The name of the field in which to store the parsed data. | -`value` | Required | Value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | +`value` | Required | The value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | `description` | Optional | Brief description of the processor. | `empty_value` | Optional | Represents optional parameters that are not required to be present or are not applicable. | `if` | Optional | Condition to run this processor. 
| From 042207e7653b3bde73417b2214b4a652c7f6c24c Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:08:19 -0600 Subject: [PATCH 244/286] Update _api-reference/ingest-apis/processors/csv.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/csv.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/csv.md b/_api-reference/ingest-apis/processors/csv.md index 6eee697d55..8f8a2718ad 100644 --- a/_api-reference/ingest-apis/processors/csv.md +++ b/_api-reference/ingest-apis/processors/csv.md @@ -29,7 +29,7 @@ Parameter | Required | Description | `field` | Required | The name of the field where the data should be converted. Supports template snippets. | `target_fields` | Required | The name of the field in which to store the parsed data. | `value` | Required | The value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | -`description` | Optional | Brief description of the processor. | +`description` | Optional | A brief description of the processor. | `empty_value` | Optional | Represents optional parameters that are not required to be present or are not applicable. | `if` | Optional | Condition to run this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. 
| From ca48924d9e8d65158aa1a8a5707370b120ca2212 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:08:51 -0600 Subject: [PATCH 245/286] Update _api-reference/ingest-apis/processors/csv.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/csv.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/csv.md b/_api-reference/ingest-apis/processors/csv.md index 8f8a2718ad..d7af61f195 100644 --- a/_api-reference/ingest-apis/processors/csv.md +++ b/_api-reference/ingest-apis/processors/csv.md @@ -30,7 +30,7 @@ Parameter | Required | Description | `target_fields` | Required | The name of the field in which to store the parsed data. | `value` | Required | The value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | `description` | Optional | A brief description of the processor. | -`empty_value` | Optional | Represents optional parameters that are not required to be present or are not applicable. | +`empty_value` | Optional | Represents optional parameters that are not required or are not applicable. | `if` | Optional | Condition to run this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `ignore_missing` | Optional | If set to `true`, the processor will not fail if the field does not exist. Default is `true`. 
| From c282647e99e771ab941dca096486dce2784594b9 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:05:05 -0600 Subject: [PATCH 246/286] Address editorial feedback Signed-off-by: Melissa Vagi --- .../ingest-apis/processors/convert.md | 46 +++++++++---------- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md index a94a40fbd5..5b12c8e931 100644 --- a/_api-reference/ingest-apis/processors/convert.md +++ b/_api-reference/ingest-apis/processors/convert.md @@ -34,7 +34,7 @@ Parameter | Required | Description | `ignore_missing` | Optional | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. | `on_failure` | Optional | A list of processors to run if the processor fails. | `tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | -`target_field` | Optional | The name of the field in which to store the parsed data. If not specified, the value will be stored in place in the `field` field. Default is `field`. | +`target_field` | Optional | The name of the field in which to store the parsed data. If not specified, the value will be stored in the `field` field. Default is `field`. 
| ## Using the processor @@ -91,30 +91,9 @@ POST _ingest/pipeline/convert-price/_simulate ``` {% include copy-curl.html %} -**Step 3: Ingest a document.** - -The following query ingests a document into an index named `testindex1`: - -```json -PUT testindex1/_doc/1?pipeline=convert-price -{ - "price": "10.5" -} -``` -{% include copy-curl.html %} - -**Step 4 (Optional): Retrieve the document.** - -To retrieve the document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} - #### Response -The following example response confirms that the pipeline is working correctly and producing the expected output: +The following example response confirms that the pipeline is working as expected: ```json { @@ -135,3 +114,24 @@ The following example response confirms that the pipeline is working correctly a ] } ``` + +**Step 3: Ingest a document.** + +The following query ingests a document into an index named `testindex1`: + +```json +PUT testindex1/_doc/1?pipeline=convert-price +{ + "price": "10.5" +} +``` +{% include copy-curl.html %} + +**Step 4 (Optional): Retrieve the document.** + +To retrieve the document, run the following query: + +```json +GET testindex1/_doc/1 +``` +{% include copy-curl.html %} From 2ade25c36b113523f58fe0549fe001c5c4721ca3 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:21:27 -0600 Subject: [PATCH 247/286] Address editorial feedback Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/csv.md | 57 ++++++++++---------- 1 file changed, 28 insertions(+), 29 deletions(-) diff --git a/_api-reference/ingest-apis/processors/csv.md b/_api-reference/ingest-apis/processors/csv.md index d7af61f195..2652758226 100644 --- a/_api-reference/ingest-apis/processors/csv.md +++ b/_api-reference/ingest-apis/processors/csv.md @@ -26,19 +26,18 @@ The following table lists the required and optional parameters for the `csv` pro Parameter | Required | Description | 
|-----------|-----------|-----------| -`field` | Required | The name of the field where the data should be converted. Supports template snippets. | +`field` | Required | The name of the field that contains the data to be converted. Supports template snippets. | `target_fields` | Required | The name of the field in which to store the parsed data. | -`value` | Required | The value to be appended. This can be a static value, a dynamic value derived from existing fields, or a value obtained from external lookups. Supports template snippets. | `description` | Optional | A brief description of the processor. | `empty_value` | Optional | Represents optional parameters that are not required or are not applicable. | -`if` | Optional | Condition to run this processor. | +`if` | Optional | A condition for running this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `ignore_missing` | Optional | If set to `true`, the processor will not fail if the field does not exist. Default is `true`. | `on_failure` | Optional | A list of processors to run if the processor fails. | -`quote` | Optional | The character used to quote fields in the CSV data. | -`separator` | Optional | The delimiter used to separate the fields in the CSV data. | +`quote` | Optional | The character used to quote fields in the CSV data. Default is `"`. | +`separator` | Optional | The delimiter used to separate the fields in the CSV data. Default is `,`. | `tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | -`trim` | Optional | If set to `true`, the processor trims whitespace from the beginning and end of the text. Default is `false`. | +`trim` | Optional | If set to `true`, the processor trims white space from the beginning and end of the text. Default is `false`. 
| ## Using the processor @@ -67,7 +66,7 @@ PUT _ingest/pipeline/csv-processor **Step 2 (Optional): Test the pipeline.** -It is recommended that you test a pipeline before you ingest documents. +It is recommended that you test your pipeline before you ingest documents. {: .tip} To test the pipeline, run the following query: @@ -91,30 +90,9 @@ POST _ingest/pipeline/csv-processor/_simulate ``` {% include copy-curl.html %} -**Step 3: Ingest a document.** - -The following query ingests a document into an index named `testindex1`: - -```json -PUT testindex1/_doc/1?pipeline=csv-processor -{ - "resource_usage": "25,4096,10" -} -``` -{% include copy-curl.html %} - -**Step 4 (Optional): Retrieve the document.** - -To retrieve the document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} - #### Response -The following example response confirms the pipeline is working correctly and producing the expected output: +The following example response confirms that the pipeline is working correctly and producing the expected output: ```json { @@ -137,3 +115,24 @@ The following example response confirms the pipeline is working correctly and pr ] } ``` + +**Step 3: Ingest a document.** + +The following query ingests a document into an index named `testindex1`: + +```json +PUT testindex1/_doc/1?pipeline=csv-processor +{ + "resource_usage": "25,4096,10" +} +``` +{% include copy-curl.html %} + +**Step 4 (Optional): Retrieve the document.** + +To retrieve the document, run the following query: + +```json +GET testindex1/_doc/1 +``` +{% include copy-curl.html %} From 60af276874b9d1098d41be9750c59095a5971902 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:22:17 -0600 Subject: [PATCH 248/286] Update _api-reference/ingest-apis/processors/date.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/date.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git 
a/_api-reference/ingest-apis/processors/date.md b/_api-reference/ingest-apis/processors/date.md index 89168d8e58..8d67a31747 100644 --- a/_api-reference/ingest-apis/processors/date.md +++ b/_api-reference/ingest-apis/processors/date.md @@ -8,7 +8,7 @@ nav_order: 50 # Date -The `date` processor is used to parse dates from fields in a document and add the parsed data to a new field. By default, the parsed data is stored in the `@timestamp` field. The following is the syntax for the `date` processor: +The `date` processor is used to parse dates from document fields and to add the parsed data to a new field. By default, the parsed data is stored in the `@timestamp` field. The following is the syntax for the `date` processor: ```json { From ce1030c29fda9116387b1addfa0ac61f3ab419a7 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:22:32 -0600 Subject: [PATCH 249/286] Update _api-reference/ingest-apis/processors/date.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/date.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/date.md b/_api-reference/ingest-apis/processors/date.md index 8d67a31747..69a299c544 100644 --- a/_api-reference/ingest-apis/processors/date.md +++ b/_api-reference/ingest-apis/processors/date.md @@ -26,7 +26,7 @@ The following table lists the required and optional parameters for the `date` pr **Parameter** | **Required** | **Description** | |-----------|-----------|-----------| -`field` | Required | Name of the field where the data should be converted. Supports template snippets.| +`field` | Required | The name of the field where the data should be converted. Supports template snippets. | `formats` | Required | An array of the expected date formats. Can be a [date format]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats) or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N. 
| `description` | Optional | Brief description of the processor. | `if` | Optional | Condition to run this processor. | From 2fc787047b4a2aa3a23d0ee3040e2bec0807422f Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:28:46 -0600 Subject: [PATCH 250/286] Update _api-reference/ingest-apis/processors/date.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/date.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/date.md b/_api-reference/ingest-apis/processors/date.md index 69a299c544..df189b52fe 100644 --- a/_api-reference/ingest-apis/processors/date.md +++ b/_api-reference/ingest-apis/processors/date.md @@ -24,7 +24,7 @@ The `date` processor is used to parse dates from document fields and to add the The following table lists the required and optional parameters for the `date` processor. -**Parameter** | **Required** | **Description** | +Parameter | Required | Description | |-----------|-----------|-----------| `field` | Required | The name of the field where the data should be converted. Supports template snippets. | `formats` | Required | An array of the expected date formats. Can be a [date format]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats) or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N. 
| From eecc24a0a5046464a69d1f2fb204ac0eb4e01fb6 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:29:08 -0600 Subject: [PATCH 251/286] Update _api-reference/ingest-apis/processors/date.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/date.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/date.md b/_api-reference/ingest-apis/processors/date.md index df189b52fe..b5bbde110f 100644 --- a/_api-reference/ingest-apis/processors/date.md +++ b/_api-reference/ingest-apis/processors/date.md @@ -28,7 +28,7 @@ Parameter | Required | Description | |-----------|-----------|-----------| `field` | Required | The name of the field where the data should be converted. Supports template snippets. | `formats` | Required | An array of the expected date formats. Can be a [date format]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats) or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N. | -`description` | Optional | Brief description of the processor. | +`description` | Optional | A brief description of the processor. | `if` | Optional | Condition to run this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `locale` | Optional | The locale to use when parsing the date. Default is `ENGLISH`. Supports template snippets. 
| From 88a437391b6d088104726768084e8aae7c7206b7 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:29:27 -0600 Subject: [PATCH 252/286] Update _api-reference/ingest-apis/processors/date.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/date.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/date.md b/_api-reference/ingest-apis/processors/date.md index b5bbde110f..9de25fa755 100644 --- a/_api-reference/ingest-apis/processors/date.md +++ b/_api-reference/ingest-apis/processors/date.md @@ -35,7 +35,7 @@ Parameter | Required | Description | `on_failure` | Optional | A list of processors to run if the processor fails. | `output_format` | Optional | The [date format]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats) to use for the target field. Default is `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`. | `tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | -`target_field` | Optional | Name of the field to store the parsed data in. Default target field is `@timestamp`. | +`target_field` | Optional | The name of the field in which to store the parsed data. Default target field is `@timestamp`. | `timezone` | Optional | The time zone to use when parsing the date. Default is `UTC`. 
Supports template snippets.| ## Using the processor From 0e92ea400da5caaf16acaa7d2e7f7558886fd1bd Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:29:50 -0600 Subject: [PATCH 253/286] Update _api-reference/ingest-apis/processors/date.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/date.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/date.md b/_api-reference/ingest-apis/processors/date.md index 9de25fa755..db19e72d45 100644 --- a/_api-reference/ingest-apis/processors/date.md +++ b/_api-reference/ingest-apis/processors/date.md @@ -36,7 +36,7 @@ Parameter | Required | Description | `output_format` | Optional | The [date format]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats) to use for the target field. Default is `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`. | `tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | `target_field` | Optional | The name of the field in which to store the parsed data. Default target field is `@timestamp`. | -`timezone` | Optional | The time zone to use when parsing the date. Default is `UTC`. Supports template snippets.| +`timezone` | Optional | The time zone to use when parsing the date. Default is `UTC`. Supports template snippets. 
| ## Using the processor From e23e4d76899004c1678f7241c850d904680a060d Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:30:07 -0600 Subject: [PATCH 254/286] Update _api-reference/ingest-apis/processors/date.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/date.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/date.md b/_api-reference/ingest-apis/processors/date.md index db19e72d45..05301c87bf 100644 --- a/_api-reference/ingest-apis/processors/date.md +++ b/_api-reference/ingest-apis/processors/date.md @@ -67,7 +67,7 @@ PUT /_ingest/pipeline/date-output-format **Step 2 (Optional): Test the pipeline.** -It is recommended that you test a pipeline before you ingest documents. +It is recommended that you test your pipeline before you ingest documents. {: .tip} To test the pipeline, run the following query: From 11a591e25514b10f3acc01e7b91eaac491fb1248 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:30:28 -0600 Subject: [PATCH 255/286] Update _api-reference/ingest-apis/processors/date.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/date.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/date.md b/_api-reference/ingest-apis/processors/date.md index 05301c87bf..ab5cff4e7d 100644 --- a/_api-reference/ingest-apis/processors/date.md +++ b/_api-reference/ingest-apis/processors/date.md @@ -112,7 +112,7 @@ GET testindex1/_doc/1 #### Response -The following example response confirms the pipeline is working correctly and producing the expected output: +The following example response confirms that the pipeline is working correctly and producing the expected output: ```json { From 0f5b034d11af4196cb73e37d4ccc951201a2ee44 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:30:52 -0600 Subject: [PATCH 
256/286] Update _api-reference/ingest-apis/processors/lowercase.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/lowercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/lowercase.md b/_api-reference/ingest-apis/processors/lowercase.md index 1f88ae17a0..39a7634b30 100644 --- a/_api-reference/ingest-apis/processors/lowercase.md +++ b/_api-reference/ingest-apis/processors/lowercase.md @@ -8,7 +8,7 @@ nav_order: 210 # Lowercase -This processor converts all the text in a specific field to lowercase letters. The following is the syntax for the `lowercase` processor: +The `lowercase` processor converts all the text in a specific field to lowercase letters. The following is the syntax for the `lowercase` processor: ```json { From 749a54965231dcd3c33be76fdde83c48c4e5b9a8 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:31:10 -0600 Subject: [PATCH 257/286] Update _api-reference/ingest-apis/processors/lowercase.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/lowercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/lowercase.md b/_api-reference/ingest-apis/processors/lowercase.md index 39a7634b30..6489077095 100644 --- a/_api-reference/ingest-apis/processors/lowercase.md +++ b/_api-reference/ingest-apis/processors/lowercase.md @@ -25,7 +25,7 @@ The following table lists the required and optional parameters for the `lowercas | Name | Required | Description | |---|---|---| -`field` | Required | Name of the field where the data should be converted. Supports template snippets.| +`field` | Required | The name of the field where the data should be converted. Supports template snippets. | `description` | Optional | Brief description of the processor. | `if` | Optional | Condition to run this processor. 
| `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | From 4beaca84920bc9e5df5aafe1030636348c4abc64 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:31:28 -0600 Subject: [PATCH 258/286] Update _api-reference/ingest-apis/processors/lowercase.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/lowercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/lowercase.md b/_api-reference/ingest-apis/processors/lowercase.md index 6489077095..b9608071c9 100644 --- a/_api-reference/ingest-apis/processors/lowercase.md +++ b/_api-reference/ingest-apis/processors/lowercase.md @@ -26,7 +26,7 @@ The following table lists the required and optional parameters for the `lowercas | Name | Required | Description | |---|---|---| `field` | Required | The name of the field where the data should be converted. Supports template snippets. | -`description` | Optional | Brief description of the processor. | +`description` | Optional | A brief description of the processor. | `if` | Optional | Condition to run this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `on_failure` | Optional | A list of processors to run if the processor fails. 
| From 04f4facff2988cdb6ebaf4d40a803aa52b37772a Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:31:47 -0600 Subject: [PATCH 259/286] Update _api-reference/ingest-apis/processors/lowercase.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/lowercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/lowercase.md b/_api-reference/ingest-apis/processors/lowercase.md index b9608071c9..00f10b0979 100644 --- a/_api-reference/ingest-apis/processors/lowercase.md +++ b/_api-reference/ingest-apis/processors/lowercase.md @@ -32,7 +32,7 @@ The following table lists the required and optional parameters for the `lowercas `on_failure` | Optional | A list of processors to run if the processor fails. | `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`. | `tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | -`target_field` | Optional | Name of the field to store the parsed data in. Default is `field`. By default, `field` is updated in-place. | +`target_field` | Optional | The name of the field in which to store the parsed data. Default is `field`. By default, `field` is updated in place. 
| ## Using the processor From 7ce28405a159d94e11d6e5226141d209971d5364 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:32:08 -0600 Subject: [PATCH 260/286] Update _api-reference/ingest-apis/processors/lowercase.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/lowercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/lowercase.md b/_api-reference/ingest-apis/processors/lowercase.md index 00f10b0979..b8e513d9de 100644 --- a/_api-reference/ingest-apis/processors/lowercase.md +++ b/_api-reference/ingest-apis/processors/lowercase.md @@ -59,7 +59,7 @@ PUT _ingest/pipeline/lowercase-title **Step 2 (Optional): Test the pipeline.** -It is recommended that you test a pipeline before you ingest documents. +It is recommended that you test your pipeline before you ingest documents. {: .tip} To test the pipeline, run the following query: From d3dc5cfb0b93be67a45610a0b007aaa6494ace4d Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:32:36 -0600 Subject: [PATCH 261/286] Update _api-reference/ingest-apis/processors/lowercase.md Co-authored-by: Heemin Kim Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/lowercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/lowercase.md b/_api-reference/ingest-apis/processors/lowercase.md index b8e513d9de..b179755826 100644 --- a/_api-reference/ingest-apis/processors/lowercase.md +++ b/_api-reference/ingest-apis/processors/lowercase.md @@ -72,7 +72,7 @@ POST _ingest/pipeline/lowercase-title/_simulate "_index": "testindex1", "_id": "1", "_source": { - "title": "war and peace" + "title": "WAR AND PEACE" } } ] From 5c34ca7d11dca0c3fae84bf5791260a634a8947d Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:33:04 -0600 Subject: [PATCH 262/286] Update _api-reference/ingest-apis/processors/lowercase.md 
Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/lowercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/lowercase.md b/_api-reference/ingest-apis/processors/lowercase.md index b179755826..0fda2a3852 100644 --- a/_api-reference/ingest-apis/processors/lowercase.md +++ b/_api-reference/ingest-apis/processors/lowercase.md @@ -94,7 +94,7 @@ PUT testindex1/_doc/1?pipeline=lowercase-title **Step 4 (Optional): Retrieve the document.** -To view an ingested document, run the following query: +To retrieve the document, run the following query: ```json GET testindex1/_doc/1 From 3e3fa5b35f0a35a94e400fee4eddf107999b65fd Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:33:25 -0600 Subject: [PATCH 263/286] Update _api-reference/ingest-apis/processors/lowercase.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/lowercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/lowercase.md b/_api-reference/ingest-apis/processors/lowercase.md index 0fda2a3852..020f1e1ebd 100644 --- a/_api-reference/ingest-apis/processors/lowercase.md +++ b/_api-reference/ingest-apis/processors/lowercase.md @@ -103,7 +103,7 @@ GET testindex1/_doc/1 #### Response -The following example response confirms the pipeline is working correctly and producing the expected output: +The following example response confirms that the pipeline is working correctly and producing the expected output: ```json { From eb258db04a7a178d819fd457e3e86eb6df53a4c2 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:36:21 -0600 Subject: [PATCH 264/286] Address editorial feedback Signed-off-by: Melissa Vagi --- .../ingest-apis/processors/lowercase.md | 48 +++++++++---------- 1 file changed, 24 insertions(+), 24 deletions(-) diff --git 
a/_api-reference/ingest-apis/processors/lowercase.md b/_api-reference/ingest-apis/processors/lowercase.md index 020f1e1ebd..535875ff7d 100644 --- a/_api-reference/ingest-apis/processors/lowercase.md +++ b/_api-reference/ingest-apis/processors/lowercase.md @@ -25,9 +25,9 @@ The following table lists the required and optional parameters for the `lowercas | Name | Required | Description | |---|---|---| -`field` | Required | The name of the field where the data should be converted. Supports template snippets. | +`field` | Required | The name of the field that contains the data to be converted. Supports template snippets. | `description` | Optional | A brief description of the processor. | -`if` | Optional | Condition to run this processor. | +`if` | Optional | A condition for running this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `on_failure` | Optional | A list of processors to run if the processor fails. | `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`. 
| @@ -80,30 +80,9 @@ POST _ingest/pipeline/lowercase-title/_simulate ``` {% include copy-curl.html %} -**Step 3: Ingest a document.** - -The following query ingests a document into an index named `testindex1`: - -```json -PUT testindex1/_doc/1?pipeline=lowercase-title -{ - "title": "WAR AND PEACE" -} -``` -{% include copy-curl.html %} - -**Step 4 (Optional): Retrieve the document.** - -To retrieve the document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} - #### Response -The following example response confirms that the pipeline is working correctly and producing the expected output: +The following example response confirms that the pipeline is working as expected: ```json { @@ -123,3 +102,24 @@ The following example response confirms that the pipeline is working correctly a ] } ``` + +**Step 3: Ingest a document.** + +The following query ingests a document into an index named `testindex1`: + +```json +PUT testindex1/_doc/1?pipeline=lowercase-title +{ + "title": "WAR AND PEACE" +} +``` +{% include copy-curl.html %} + +**Step 4 (Optional): Retrieve the document.** + +To retrieve the document, run the following query: + +```json +GET testindex1/_doc/1 +``` +{% include copy-curl.html %} From bf8e9a625b3a808555ed5f7c03a1f22e3cf4bb0a Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:36:47 -0600 Subject: [PATCH 265/286] Update _api-reference/ingest-apis/processors/remove.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/remove.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/remove.md b/_api-reference/ingest-apis/processors/remove.md index e07bf9cbfa..0f1e2d57f9 100644 --- a/_api-reference/ingest-apis/processors/remove.md +++ b/_api-reference/ingest-apis/processors/remove.md @@ -8,7 +8,7 @@ nav_order: 230 # Remove -The remove processor is used to remove a field from a document. 
The following is the syntax for the `remove` processor: +The `remove` processor is used to remove a field from a document. The following is the syntax for the `remove` processor: ```json { From ecd54df843484786a47bf83e541a09a6bca6b645 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:37:11 -0600 Subject: [PATCH 266/286] Update _api-reference/ingest-apis/processors/remove.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/remove.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/remove.md b/_api-reference/ingest-apis/processors/remove.md index 0f1e2d57f9..ea14e2090b 100644 --- a/_api-reference/ingest-apis/processors/remove.md +++ b/_api-reference/ingest-apis/processors/remove.md @@ -25,7 +25,7 @@ The following table lists the required and optional parameters for the `remove` | Name | Required | Description | |---|---|---| -`field` | Required | Name of the field where the data should be appended. Supports template snippets.| +`field` | Required | The name of the field where the data should be appended. Supports template snippets. | `description` | Optional | Brief description of the processor. | `if` | Optional | Condition to run this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. 
| From 1853ec221a727c5ba99c3da9ade94bed1cc74c6b Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:37:40 -0600 Subject: [PATCH 267/286] Update _api-reference/ingest-apis/processors/remove.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi From b528834b485d690d00cbad8fb40cfc4c7f37fd45 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:38:02 -0600 Subject: [PATCH 268/286] Update _api-reference/ingest-apis/processors/remove.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/remove.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/remove.md b/_api-reference/ingest-apis/processors/remove.md index ea14e2090b..b3fdb18f65 100644 --- a/_api-reference/ingest-apis/processors/remove.md +++ b/_api-reference/ingest-apis/processors/remove.md @@ -26,7 +26,7 @@ The following table lists the required and optional parameters for the `remove` | Name | Required | Description | |---|---|---| `field` | Required | The name of the field where the data should be appended. Supports template snippets. | -`description` | Optional | Brief description of the processor. | +`description` | Optional | A brief description of the processor. | `if` | Optional | Condition to run this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `on_failure` | Optional | A list of processors to run if the processor fails. 
| From 1a32fc94e0fa2752c5359b1357b3def6c3e578a5 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:38:32 -0600 Subject: [PATCH 269/286] Update _api-reference/ingest-apis/processors/remove.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/remove.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/remove.md b/_api-reference/ingest-apis/processors/remove.md index b3fdb18f65..6a96d5664b 100644 --- a/_api-reference/ingest-apis/processors/remove.md +++ b/_api-reference/ingest-apis/processors/remove.md @@ -57,7 +57,7 @@ PUT /_ingest/pipeline/remove_ip **Step 2 (Optional): Test the pipeline.** -It is recommended that you test a pipeline before you ingest documents. +It is recommended that you test your pipeline before you ingest documents. {: .tip} To test the pipeline, run the following query: From ef800803eae1deff7a1fb6d475dc0e6f721483c0 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:38:56 -0600 Subject: [PATCH 270/286] Update _api-reference/ingest-apis/processors/remove.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/remove.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/remove.md b/_api-reference/ingest-apis/processors/remove.md index 6a96d5664b..3d53e9846e 100644 --- a/_api-reference/ingest-apis/processors/remove.md +++ b/_api-reference/ingest-apis/processors/remove.md @@ -103,7 +103,7 @@ GET testindex1/_doc/1 #### Response -The following example response confirms the pipeline is working correctly and producing the expected output: +The following example response confirms that the pipeline is working correctly and producing the expected output: ```json { From 9ff0234312ebc203b121e468f32ee868759fe236 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:44:03 -0600 Subject: [PATCH 271/286] 
Address editorial review feedback Signed-off-by: Melissa Vagi --- .../ingest-apis/processors/remove.md | 50 +++++++++---------- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/_api-reference/ingest-apis/processors/remove.md b/_api-reference/ingest-apis/processors/remove.md index 3d53e9846e..db233a0b08 100644 --- a/_api-reference/ingest-apis/processors/remove.md +++ b/_api-reference/ingest-apis/processors/remove.md @@ -25,9 +25,9 @@ The following table lists the required and optional parameters for the `remove` | Name | Required | Description | |---|---|---| -`field` | Required | The name of the field where the data should be appended. Supports template snippets. | +`field` | Required | The name of the field to which the data should be appended. Supports template snippets. | `description` | Optional | A brief description of the processor. | -`if` | Optional | Condition to run this processor. | +`if` | Optional | A condition for running this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `on_failure` | Optional | A list of processors to run if the processor fails. | `tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. 
| @@ -79,31 +79,9 @@ POST _ingest/pipeline/remove_ip/_simulate ``` {% include copy-curl.html %} -**Step 3: Ingest a document.** - -The following query ingests a document into an index named `testindex1`: - -```json -PPUT testindex1/_doc/1?pipeline=remove_ip -{ - "ip_address": "203.0.113.1", - "name": "John Doe" -} -``` -{% include copy-curl.html %} - -**Step 4 (Optional): Retrieve the document.** - -To retrieve the document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} - #### Response -The following example response confirms that the pipeline is working correctly and producing the expected output: +The following example response confirms that the pipeline is working as expected: ```json { @@ -123,3 +101,25 @@ The following example response confirms that the pipeline is working correctly a ] } ``` + +**Step 3: Ingest a document.** + +The following query ingests a document into an index named `testindex1`: + +```json +PUT testindex1/_doc/1?pipeline=remove_ip +{ + "ip_address": "203.0.113.1", + "name": "John Doe" +} +``` +{% include copy-curl.html %} + +**Step 4 (Optional): Retrieve the document.** + +To retrieve the document, run the following query: + +```json +GET testindex1/_doc/1 +``` +{% include copy-curl.html %} From 98726c70bd804b4d287082faac824263519ed15e Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:44:32 -0600 Subject: [PATCH 272/286] Update _api-reference/ingest-apis/processors/uppercase.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/uppercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/uppercase.md b/_api-reference/ingest-apis/processors/uppercase.md index 9c82faa462..08b144cee7 100644 --- a/_api-reference/ingest-apis/processors/uppercase.md +++ b/_api-reference/ingest-apis/processors/uppercase.md @@ -8,7 +8,7 @@ nav_order: 310 # Uppercase -This processor converts 
all the text in a specific field to uppercase letters. The following is the syntax for the `uppercase` processor: +The `uppercase` processor converts all the text in a specific field to uppercase letters. The following is the syntax for the `uppercase` processor: ```json { From 93d3028891075669e79cab836d665357bd7e4c8b Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:44:47 -0600 Subject: [PATCH 273/286] Update _api-reference/ingest-apis/processors/uppercase.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/uppercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/uppercase.md b/_api-reference/ingest-apis/processors/uppercase.md index 08b144cee7..00af1bde1e 100644 --- a/_api-reference/ingest-apis/processors/uppercase.md +++ b/_api-reference/ingest-apis/processors/uppercase.md @@ -25,7 +25,7 @@ The following table lists the required and optional parameters for the `uppercas | Name | Required | Description | |---|---|---| -`field` | Required | Name of the field where the data should be appended. Supports template snippets.| +`field` | Required | The name of the field where the data should be appended. Supports template snippets. | `description` | Optional | Brief description of the processor. | `if` | Optional | Condition to run this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. 
| From d2e6261fe5731d7c9af0c5a70ea914b27a46f6bb Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:45:03 -0600 Subject: [PATCH 274/286] Update _api-reference/ingest-apis/processors/uppercase.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/uppercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/uppercase.md b/_api-reference/ingest-apis/processors/uppercase.md index 00af1bde1e..364dc7f5ff 100644 --- a/_api-reference/ingest-apis/processors/uppercase.md +++ b/_api-reference/ingest-apis/processors/uppercase.md @@ -26,7 +26,7 @@ The following table lists the required and optional parameters for the `uppercas | Name | Required | Description | |---|---|---| `field` | Required | The name of the field where the data should be appended. Supports template snippets. | -`description` | Optional | Brief description of the processor. | +`description` | Optional | A brief description of the processor. | `if` | Optional | Condition to run this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`. 
| From 115ca458e9fab26c827a0aab70b11c29684fd8f9 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:45:22 -0600 Subject: [PATCH 275/286] Update _api-reference/ingest-apis/processors/uppercase.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/uppercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/uppercase.md b/_api-reference/ingest-apis/processors/uppercase.md index 364dc7f5ff..8fc106ff17 100644 --- a/_api-reference/ingest-apis/processors/uppercase.md +++ b/_api-reference/ingest-apis/processors/uppercase.md @@ -32,7 +32,7 @@ The following table lists the required and optional parameters for the `uppercas `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`. | `on_failure` | Optional | A list of processors to run if the processor fails. | `tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | -`target_field` | Optional | Name of the field to store the parsed data in. Default is `field`. By default, `field` is updated in-place. | +`target_field` | Optional | The name of the field in which to store the parsed data. Default is `field`. By default, `field` is updated in place. 
| ## Using the processor From 55b45ac42ebb21d5d95092de60bb4215b9d57a90 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:45:41 -0600 Subject: [PATCH 276/286] Update _api-reference/ingest-apis/processors/uppercase.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/uppercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/uppercase.md b/_api-reference/ingest-apis/processors/uppercase.md index 8fc106ff17..fc44b9c976 100644 --- a/_api-reference/ingest-apis/processors/uppercase.md +++ b/_api-reference/ingest-apis/processors/uppercase.md @@ -59,7 +59,7 @@ PUT _ingest/pipeline/uppercase **Step 2 (Optional): Test the pipeline.** -It is recommended that you test a pipeline before you ingest documents. +It is recommended that you test your pipeline before you ingest documents. {: .tip} To test the pipeline, run the following query: From 3869d730ddf2fbc978d806fa730040e3e3db6883 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:46:11 -0600 Subject: [PATCH 277/286] Update _api-reference/ingest-apis/processors/uppercase.md Co-authored-by: Heemin Kim Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/uppercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/uppercase.md b/_api-reference/ingest-apis/processors/uppercase.md index fc44b9c976..cc21dbac18 100644 --- a/_api-reference/ingest-apis/processors/uppercase.md +++ b/_api-reference/ingest-apis/processors/uppercase.md @@ -72,7 +72,7 @@ POST _ingest/pipeline/uppercase/_simulate "_index": "testindex1", "_id": "1", "_source": { - "name": "{John}" + "name": "John" } } ] From 477b5b5f2f09077d5b52e7951a0718cd4459637c Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:48:58 -0600 Subject: [PATCH 278/286] Update _api-reference/ingest-apis/processors/uppercase.md Co-authored-by: Nathan Bower 
Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/uppercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/uppercase.md b/_api-reference/ingest-apis/processors/uppercase.md index cc21dbac18..b818197162 100644 --- a/_api-reference/ingest-apis/processors/uppercase.md +++ b/_api-reference/ingest-apis/processors/uppercase.md @@ -103,7 +103,7 @@ GET testindex1/_doc/1 #### Response -The following example response confirms the pipeline is working correctly and producing the expected output: +The following example response confirms that the pipeline is working correctly and producing the expected output: ```json { From 2d6281add030039810db050199232ed706d23009 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:53:55 -0600 Subject: [PATCH 279/286] Address editorial review feedback Signed-off-by: Melissa Vagi --- .../ingest-apis/processors/uppercase.md | 40 +++++++++---------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/_api-reference/ingest-apis/processors/uppercase.md b/_api-reference/ingest-apis/processors/uppercase.md index b818197162..dfadab4108 100644 --- a/_api-reference/ingest-apis/processors/uppercase.md +++ b/_api-reference/ingest-apis/processors/uppercase.md @@ -25,9 +25,9 @@ The following table lists the required and optional parameters for the `uppercas | Name | Required | Description | |---|---|---| -`field` | Required | The name of the field where the data should be appended. Supports template snippets. | +`field` | Required | The name of the field to which the data should be appended. Supports template snippets. | `description` | Optional | A brief description of the processor. | -`if` | Optional | Condition to run this processor. | +`if` | Optional | A condition for running this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. 
| `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`. | `on_failure` | Optional | A list of processors to run if the processor fails. | @@ -80,6 +80,24 @@ POST _ingest/pipeline/uppercase/_simulate ``` {% include copy-curl.html %} +#### Response + +The following example response confirms that the pipeline is working correctly and producing the expected output: + +```json +{ + "_index": "testindex1", + "_id": "1", + "_version": 44, + "_seq_no": 43, + "_primary_term": 3, + "found": true, + "_source": { + "name": "JOHN" + } +} +``` + **Step 3: Ingest a document.** The following query ingests a document into an index named `testindex1`: @@ -100,21 +118,3 @@ To retrieve the document, run the following query: GET testindex1/_doc/1 ``` {% include copy-curl.html %} - -#### Response - -The following example response confirms that the pipeline is working correctly and producing the expected output: - -```json -{ - "_index": "testindex1", - "_id": "1", - "_version": 44, - "_seq_no": 43, - "_primary_term": 3, - "found": true, - "_source": { - "name": "JOHN" - } -} -``` \ No newline at end of file From 944522900f9601d1c9b99868f6a88472a90fa054 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 13:55:07 -0600 Subject: [PATCH 280/286] Address editorial review feedback Signed-off-by: Melissa Vagi --- .../ingest-apis/processors/uppercase.md | 23 +++++++++++-------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/_api-reference/ingest-apis/processors/uppercase.md b/_api-reference/ingest-apis/processors/uppercase.md index dfadab4108..314c1785d0 100644 --- a/_api-reference/ingest-apis/processors/uppercase.md +++ b/_api-reference/ingest-apis/processors/uppercase.md @@ -86,15 +86,20 @@ The following example response confirms that the pipeline is working correctly a ```json { - "_index": "testindex1", - "_id": "1", - "_version": 44, - "_seq_no": 43, - "_primary_term": 
3, - "found": true, - "_source": { - "name": "JOHN" - } + "docs": [ + { + "doc": { + "_index": "testindex1", + "_id": "1", + "_source": { + "name": "JOHN" + }, + "_ingest": { + "timestamp": "2023-08-28T19:54:42.289624792Z" + } + } + } + ] } ``` From f7d8a6da25bfa74ee2665ac17427a7581b03e566 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 14:00:55 -0600 Subject: [PATCH 281/286] Address editorial review feedback Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/date.md | 46 +++++++++---------- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/_api-reference/ingest-apis/processors/date.md b/_api-reference/ingest-apis/processors/date.md index ab5cff4e7d..e2aded8daf 100644 --- a/_api-reference/ingest-apis/processors/date.md +++ b/_api-reference/ingest-apis/processors/date.md @@ -26,10 +26,10 @@ The following table lists the required and optional parameters for the `date` pr Parameter | Required | Description | |-----------|-----------|-----------| -`field` | Required | The name of the field where the data should be converted. Supports template snippets. | +`field` | Required | The name of the field containing the date or timestamp to be converted. Supports template snippets. | `formats` | Required | An array of the expected date formats. Can be a [date format]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats) or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N. | `description` | Optional | A brief description of the processor. | -`if` | Optional | Condition to run this processor. | +`if` | Optional | A condition for running this processor. | `ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | `locale` | Optional | The locale to use when parsing the date. Default is `ENGLISH`. Supports template snippets. | `on_failure` | Optional | A list of processors to run if the processor fails.
| @@ -89,27 +89,6 @@ POST _ingest/pipeline/date-output-format/_simulate ``` {% include copy-curl.html %} -**Step 3: Ingest a document.** - -The following query ingests a document into an index named `testindex1`: - -```json -PUT testindex1/_doc/1?pipeline=date-output-format -{ - "date_european": "30/06/2023" -} -``` -{% include copy-curl.html %} - -**Step 4 (Optional): Retrieve the document.** - -To retrieve the document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} - #### Response The following example response confirms that the pipeline is working correctly and producing the expected output: @@ -133,3 +112,24 @@ The following example response confirms that the pipeline is working correctly a ] } ``` + +**Step 3: Ingest a document.** + +The following query ingests a document into an index named `testindex1`: + +```json +PUT testindex1/_doc/1?pipeline=date-output-format +{ + "date_european": "30/06/2023" +} +``` +{% include copy-curl.html %} + +**Step 4 (Optional): Retrieve the document.** + +To retrieve the document, run the following query: + +```json +GET testindex1/_doc/1 +``` +{% include copy-curl.html %} From 09bc013767fc7316a57e7d66cb1ad6625f7e5444 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 14:06:04 -0600 Subject: [PATCH 282/286] Address editorial review feedback Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/simulate-ingest.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/simulate-ingest.md b/_api-reference/ingest-apis/simulate-ingest.md index a1b89bb68b..9ca40b791c 100644 --- a/_api-reference/ingest-apis/simulate-ingest.md +++ b/_api-reference/ingest-apis/simulate-ingest.md @@ -123,7 +123,7 @@ The request returns the following response: ### Example: Verbose mode -When the previous request is run with the `verbose` parameter set to `true`, the response shows the sequence of transformations made on each document. 
For example, for the document with the ID `1`, the response contains the results of applying each processor in the pipeline in turn: +When the previous request is run with the `verbose` parameter set to `true`, the response shows the sequence of transformations for each document. For example, for the document with the ID `1`, the response contains the results of applying each processor in the pipeline in sequence: ```json { @@ -235,6 +235,8 @@ POST /_ingest/pipeline/_simulate ``` {% include copy-curl.html %} +#### Response + The request returns the following response: ```json From 3cf22cde00d917bed2ac2706ee8fe6fab96fbc86 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 14:28:12 -0600 Subject: [PATCH 283/286] Address editorial review feedback Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/csv.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/csv.md b/_api-reference/ingest-apis/processors/csv.md index 2652758226..e4009e162b 100644 --- a/_api-reference/ingest-apis/processors/csv.md +++ b/_api-reference/ingest-apis/processors/csv.md @@ -92,7 +92,7 @@ POST _ingest/pipeline/csv-processor/_simulate #### Response -The following example response confirms that the pipeline is working correctly and producing the expected output: +The following example response confirms that the pipeline is working as expected: ```json { From 74f887b87e02937758ee93a55ff85b6fbecd0df2 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 14:29:10 -0600 Subject: [PATCH 284/286] Address editorial review feedback Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/date.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/date.md b/_api-reference/ingest-apis/processors/date.md index e2aded8daf..46e9b9115f 100644 --- a/_api-reference/ingest-apis/processors/date.md +++ b/_api-reference/ingest-apis/processors/date.md @@ 
-91,7 +91,7 @@ POST _ingest/pipeline/date-output-format/_simulate #### Response -The following example response confirms that the pipeline is working correctly and producing the expected output: +The following example response confirms that the pipeline is working as expected: ```json { From 36f3c5361cc4ad4bd2d38b21a3f60f1f67773917 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 28 Aug 2023 14:30:41 -0600 Subject: [PATCH 285/286] Address editorial review feedback Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/uppercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/uppercase.md b/_api-reference/ingest-apis/processors/uppercase.md index 314c1785d0..6ea5ebb137 100644 --- a/_api-reference/ingest-apis/processors/uppercase.md +++ b/_api-reference/ingest-apis/processors/uppercase.md @@ -82,7 +82,7 @@ POST _ingest/pipeline/uppercase/_simulate #### Response -The following example response confirms that the pipeline is working correctly and producing the expected output: +The following example response confirms that the pipeline is working as expected: ```json { From 7e9c29dbc986093244fc8611c7bfc147ba877660 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 29 Aug 2023 14:01:21 -0600 Subject: [PATCH 286/286] Update _api-reference/ingest-apis/processors/append.md Co-authored-by: Heemin Kim Signed-off-by: Melissa Vagi --- _api-reference/ingest-apis/processors/append.md | 1 - 1 file changed, 1 deletion(-) diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md index 5d496be09a..d0f2363ce5 100644 --- a/_api-reference/ingest-apis/processors/append.md +++ b/_api-reference/ingest-apis/processors/append.md @@ -115,7 +115,6 @@ The following query ingests a document into an index named `testindex1`: ```json PUT testindex1/_doc/1?pipeline=user-behavior { - "event_types": "page_view" } ``` {% include copy-curl.html %}