diff --git a/.github/vale/styles/OpenSearch/AcronymParentheses.yml b/.github/vale/styles/OpenSearch/AcronymParentheses.yml index bebef2c77d..8fbd3cf761 100644 --- a/.github/vale/styles/OpenSearch/AcronymParentheses.yml +++ b/.github/vale/styles/OpenSearch/AcronymParentheses.yml @@ -5,13 +5,12 @@ level: warning scope: summary ignorecase: false # Ensures that the existence of 'first' implies the existence of 'second'. -first: '\b((? **Extension Settings** and select **suggestion** in the **Vale > Vale CLI: Min Alert Level** dropdown list. - -## Getting help - -For help with the contribution process, reach out to one of the [points of contact](README.md#points-of-contact). \ No newline at end of file +See the [LICENSE](LICENSE) file for our project's licensing. diff --git a/FORMATTING_GUIDE.md b/FORMATTING_GUIDE.md index 40536444a5..fc18d2fa4f 100644 --- a/FORMATTING_GUIDE.md +++ b/FORMATTING_GUIDE.md @@ -9,7 +9,6 @@ This guide provides an overview of the formatted elements commonly used in the O * [Adding pages or sections](#adding-pages-or-sections) * [Buttons](#buttons) * [Callouts](#callouts) -* [Collapsible blocks](#collapsible-blocks) * [Dashes](#dashes) * [Horizontal rule](#horizontal-rule) * [Images](#images) @@ -31,23 +30,6 @@ This guide provides an overview of the formatted elements commonly used in the O ## Adding pages or sections -This repository contains [Markdown](https://guides.github.com/features/mastering-markdown/) files organized into Jekyll _collections_ (for example, `_api-reference` or `_dashboards`). Each Markdown file corresponds to one page on the website. - -In addition to the content for a given page, each Markdown file contains some Jekyll [front matter](https://jekyllrb.com/docs/front-matter/) similar to the following: - -``` ---- -layout: default -title: Date -nav_order: 25 -has_children: false -parent: Date field types -grand_parent: Supported field types ---- -``` - -If you want to reorganize content or add a new page, make sure to set the appropriate `has_children`, `parent`, `grand_parent`, and `nav_order` variables, which define the hierarchy of pages in the left navigation. - When adding a page or a section, make the `nav_order` of the child pages multiples of 10. For example, if you have a parent page `Clients`, make child pages `Java`, `Python`, and `JavaScript` have a `nav_order` of 10, 20, and 30, respectively. Doing so makes inserting additional child pages easier because it does not require you to renumber existing pages. Each collection must have an `index.md` file that corresponds to the collection's index page. In the `index.md` file's front matter, specify `nav_excluded: true` so that the page does not appear separately under the collection. @@ -109,31 +91,6 @@ For a callout with multiple paragraphs or lists, use `>`: ``` -## Collapsible blocks - -To insert a collapsible block, use the `
` element as follows: - -````html -
- - Response - - {: .text-delta} - -```json -{ - "_nodes" : { - "total" : 1, - "successful" : 1, - "failed" : 0 - } -} -``` -
-```` - -Collapsible blocks are useful for long responses and for the Table of Contents at the beginning of a page. - ## Dashes Use one dash for hyphens, two for en dashes, and three for em dashes: @@ -422,9 +379,7 @@ Body 1 | List:
 • One
 • Two You can style text in the following ways: * ```**bold**``` -* ```_italic_``` or ```*italic*``` - -For guidance on using code examples and when to use code font, see [Code examples](https://github.com/opensearch-project/documentation-website/blob/main/STYLE_GUIDE.md#code-examples). +* ```_italic_``` or ```*italic*``` ## Variables in curly braces @@ -446,4 +401,4 @@ To insert a video, add a YouTube player include similar to the following: {% include youtube-player.html id='_g46WiGPhFs' %} ``` -Note that the `id` variable refers to the YouTube video ID at the end of the URL. For example, the YouTube video at the URL `https://youtu.be/_g46WiGPhFs` has the ID `_g46WiGPhFs`. The ID must be surrounded with single quotation marks. +Note that the `id` variable refers to the YouTube video ID at the end of the URL. For example, the YouTube video at the URL `https://youtu.be/_g46WiGPhFs` has the ID `_g46WiGPhFs`. The ID must be surrounded with single quotation marks. \ No newline at end of file diff --git a/Gemfile b/Gemfile index 7825dcd02b..7bfb39856e 100644 --- a/Gemfile +++ b/Gemfile @@ -39,4 +39,3 @@ gem "webrick", "~> 1.7" # Link checker gem "typhoeus" gem "ruby-link-checker" -gem "ruby-enum" diff --git a/README.md b/README.md index 13dae8a5bd..a7c5033080 100644 --- a/README.md +++ b/README.md @@ -2,19 +2,12 @@ # About the OpenSearch documentation repo -The `documentation-website` repository contains the user documentation for OpenSearch. You can find the rendered documentation at [opensearch.org/docs](https://opensearch.org/docs). +The documentation repository contains the documentation for OpenSearch, the search, analytics, and visualization suite with advanced security, alerting, SQL support, automated index management, deep performance analysis, and more. You can find the rendered documentation at [opensearch.org/docs](https://opensearch.org/docs). -## Contributing +## How you can help -Community contributions remain essential to keeping the documentation comprehensive, useful, well organized, and up to date. If you are interested in submitting an issue or contributing content, see [CONTRIBUTING](CONTRIBUTING.md). - -The following resources provide important guidance regarding contributions to the documentation: - -- [OpenSearch Project Style Guidelines](STYLE_GUIDE.md) -- The style guide covers the style standards to be observed when creating OpenSearch Project content. -- [OpenSearch terms](TERMS.md) -- The terms list contains key OpenSearch terms and tips on how and when to use them. -- [API Style Guide](API_STYLE_GUIDE.md) -- The API Style Guide provides the basic structure for creating OpenSearch API documentation. -- [Formatting Guide](FORMATTING_GUIDE.md) -- The OpenSearch documentation uses a modified version of the [just-the-docs](https://github.com/pmarsceill/just-the-docs) Jekyll theme. The Formatting Guide provides an overview of the commonly used formatting elements and how to add a page to the website. +Community contributions remain essential in keeping this documentation comprehensive, useful, well-organized, and up-to-date. If you are interested in contributing, please see the [Contribution](https://github.com/opensearch-project/documentation-website/blob/main/CONTRIBUTING.md) file. 
## Points of contact @@ -28,6 +21,128 @@ If you encounter problems or have questions when contributing to the documentati - [vagimeli](https://github.com/vagimeli) +## How the website works + +This repository contains [Markdown](https://guides.github.com/features/mastering-markdown/) files organized into Jekyll "collections" (e.g., `_search-plugins`, `_opensearch`, etc.). Each Markdown file correlates with one page on the website. + +Using plain text on GitHub has many advantages: + +- Everything is free, open source, and works on every operating system. Use your favorite text editor, Ruby, Jekyll, and Git. +- Markdown is easy to learn and looks good in side-by-side diffs. +- The workflow is no different than contributing code. Make your changes, build locally to check your work, and submit a pull request. Reviewers check the PR before merging. +- Alternatives like wikis and WordPress are full web applications that require databases and ongoing maintenance. They also have inferior versioning and content review processes compared to Git. Static websites, such as the ones Jekyll produces, are faster, more secure, and more stable. + +In addition to the content for a given page, each Markdown file contains some Jekyll [front matter](https://jekyllrb.com/docs/front-matter/). Front matter looks like this: + +``` +--- +layout: default +title: Alerting security +nav_order: 10 +parent: Alerting +has_children: false +--- +``` + +If you want to reorganize content or add new pages, keep an eye on `has_children`, `parent`, and `nav_order`, which define the hierarchy and order of pages in the lefthand navigation. For more information, see the documentation for [our upstream Jekyll theme](https://pmarsceill.github.io/just-the-docs/docs/navigation-structure/). + + +## Contribute content + +There are a few ways to contribute content, depending on the magnitude of the change. + +- [Minor changes](#minor-changes) +- [Major changes](#major-changes) +- [Create an issue](https://github.com/opensearch-project/documentation-website/issues) + + +### Minor changes + +If you want to add a few paragraphs across multiple files and are comfortable with Git, try this approach: + +1. Fork this repository. + +1. Download [GitHub Desktop](https://desktop.github.com), install it, and clone your fork. + +1. Navigate to the repository root. + +1. Create a new branch. + +1. Edit the Markdown files in `/docs`. + +1. Commit, [sign off](https://github.com/src-d/guide/blob/9171d013c648236c39faabcad8598be3c0cf8f56/developer-community/fix-DCO.md#how-to-prevent-missing-sign-offs-in-the-future), push your changes to your fork, and submit a pull request. + + +### Major changes + +If you're making major changes to the documentation and need to see the rendered HTML before submitting a pull request, here's how to make the changes and view them locally: + +1. Fork this repository. + +1. Download [GitHub Desktop](https://desktop.github.com), install it, and clone your fork. + +1. Navigate to the repository root. + +1. Install [Ruby](https://www.ruby-lang.org/en/) if you don't already have it. We recommend [RVM](https://rvm.io/), but use whatever method you prefer: + + ``` + curl -sSL https://get.rvm.io | bash -s stable + rvm install 2.6 + ruby -v + ``` + +1. Install [Jekyll](https://jekyllrb.com/) if you don't already have it: + + ``` + gem install bundler jekyll + ``` + +1. Install dependencies: + + ``` + bundle install + ``` + +1. Build: + + ``` + sh build.sh + ``` + +1. 
If the build script doesn't automatically open your web browser (it should), open [http://localhost:4000/docs/](http://localhost:4000/docs/). + +1. Create a new branch. + +1. Edit the Markdown files in each collection (e.g. `_security/`). + + If you're a web developer, you can customize `_layouts/default.html` and `_sass/custom/custom.scss`. + +1. When you save a file, marvel as Jekyll automatically rebuilds the site and refreshes your web browser. This process can take anywhere from 10-30 seconds. + +1. When you're happy with how everything looks, commit, [sign off](https://github.com/src-d/guide/blob/9171d013c648236c39faabcad8598be3c0cf8f56/developer-community/fix-DCO.md#how-to-prevent-missing-sign-offs-in-the-future), push your changes to your fork, and submit a pull request. + + +## Writing tips + +The OpenSearch team released [style guidelines](https://github.com/opensearch-project/documentation-website/blob/main/STYLE_GUIDE.md) for our documentation and marketing content. These guidelines cover the style standards and terms to be observed when creating OpenSearch content. We ask that you please adhere to these guidelines whenever contributing content. + +We also provide guidelines on terminology. For a list of OpenSearch terms, see [Terms](https://github.com/opensearch-project/documentation-website/blob/main/TERMS.md). + + +## Formatting documentation + +The OpenSearch documentation uses a modified version of the [just-the-docs](https://github.com/pmarsceill/just-the-docs) Jekyll theme. For an overview of the commonly used formatted elements, including callouts, videos, and buttons, see the [FORMATTING_GUIDE](FORMATTING_GUIDE.md). + + +## Style linting + +We use the [Vale](https://github.com/errata-ai/vale) linter to ensure that our documentation adheres to the [OpenSearch Project Style Guidelines](STYLE_GUIDE.md). To install Vale locally, follow these steps: + +1. Run `brew install vale`. +2. Run `vale *` from the documentation site root directory to lint all Markdown files. To lint a specific file, run `vale /path/to/file`. + +Optionally, you can install the [Vale VSCode](https://github.com/chrischinchilla/vale-vscode) extension that integrates Vale with Visual Studio Code. By default, only _errors_ and _warnings_ are underlined. To change the minimum alert level to include _suggestions_, go to **Vale VSCode** > **Extension Settings** and select **suggestion** in the **Vale > Vale CLI: Min Alert Level** dropdown list. + ## Code of conduct This project has adopted an [Open Source Code of Conduct](https://opensearch.org/codeofconduct.html). @@ -35,12 +150,12 @@ This project has adopted an [Open Source Code of Conduct](https://opensearch.org ## Security -If you discover a potential security issue in this project, we ask that you notify AWS/Amazon Security using our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Do **not** create a public GitHub issue. +See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. ## License -This project is licensed under the [Apache 2.0 License](LICENSE). +This project is licensed under the Apache-2.0 License. 
## Copyright diff --git a/API_STYLE_GUIDE.md b/STYLE_API_TEMPLATE.md similarity index 88% rename from API_STYLE_GUIDE.md rename to STYLE_API_TEMPLATE.md index 6dc40df017..0002e90846 100644 --- a/API_STYLE_GUIDE.md +++ b/STYLE_API_TEMPLATE.md @@ -1,14 +1,12 @@ -# API Style Guide +# API reference page template -This guide provides the basic structure for creating OpenSearch API documentation. It includes the various elements that we feel are most important to creating complete and useful API documentation, as well as description and examples where appropriate. +This template provides the basic structure for creating OpenSearch API documentation. It includes the most important elements that should appear in the documentation and helpful suggestions to help support them. Depending on the intended purpose of the API, *some sections will be required while others may not be applicable*. -Use the [API_TEMPLATE](templates/API_TEMPLATE.md) to create an API documentation page. - ### A note on terminology ### -Terminology for API parameters varies in the software industry, where two or even three names may be used to label the same type of parameter. For consistency, we use the following nomenclature for parameters in our API documentation: +Terminology for API parameters varies in the software industry, where two or even three names may be used to label the same type of parameter. For the sake of consistency, we use the following nomenclature for parameters in our API documentation: * *Path parameter* – "path parameter" and "URL parameter" are sometimes used synonymously. To avoid confusion, we use "path parameter" in this documentation. * *Query parameter* – This parameter name is often used synonymously with "request parameter." We use "query parameter" to be consistent. @@ -26,7 +24,7 @@ Provide a REST API call example in `json` format. Optionally, also include the ` ## Basic elements for documentation -The following sections describe the basic API documentation structure. Each section is discussed under its respective heading. Include only those elements appropriate to the API. +The following sections describe the basic API documentation structure. Each section is discussed under its respective heading below. You can include only those elements appropriate to the API. Depending on where the documentation appears within a section or subsection, heading levels may be adjusted to fit with other content. @@ -72,11 +70,10 @@ GET /_nodes//stats// While the API endpoint states a point of entry to a resource, the path parameter acts on the resource that precedes it. Path parameters come after the resource name in the URL. -In the following example, the resource is `scroll` and its path parameter is ``: - ```json GET _search/scroll/ ``` +In the example above, the resource is `scroll` and its path parameter is ``. Introduce what the path parameters can do at a high level. Provide a table with parameter names and descriptions. Include a table with the following columns: *Parameter* – Parameter name in plain font. @@ -90,12 +87,12 @@ Parameter | Data type | Description In terms of placement, query parameters are always appended to the end of the URL and located to the right of the operator "?". Query parameters serve the purpose of modifying information to be retrieved from the resource. 
-In the following example, the endpoint is `aliases` and its query parameter is `v` (provides verbose output): - ```json GET _cat/aliases?v ``` +In the example above, the endpoint is `aliases` and its query parameter is `v` (provides verbose output). + Include a paragraph that describes how to use the query parameters with an example in code font. Include the query parameter operator "?" to delineate query parameters from path parameters. For GET and DELETE APIs: Introduce what you can do with the optional parameters. Include a table with the same columns as the path parameter table. @@ -117,7 +114,7 @@ Field | Data type | Description #### Example request -Provide a sentence that describes what is shown in the example, followed by a cut-and-paste-ready API request in JSON format. Make sure that you test the request yourself in the Dashboards Dev Tools console to make sure it works. See the following examples. +Provide a sentence that describes what is shown in the example, followed by a cut-and-paste-ready API request in JSON format. Make sure that you test the request yourself in the Dashboards Dev Tools console to make sure it works. See the examples below. The following request gets all the settings in your index: @@ -141,7 +138,7 @@ POST _reindex #### Example response -Include a JSON example response to show what the API returns. See the following examples. +Include a JSON example response to show what the API returns. See the examples below. The `GET /sample-index1/_settings` request returns the following response fields: diff --git a/STYLE_GUIDE.md b/STYLE_GUIDE.md index fa3b687bb3..f3efe38c10 100644 --- a/STYLE_GUIDE.md +++ b/STYLE_GUIDE.md @@ -14,12 +14,9 @@ The following naming conventions should be observed in OpenSearch Project conten * Capitalize both words when referring to the *OpenSearch Project*. * *OpenSearch* is the name for the distributed search and analytics engine used by Amazon OpenSearch Service. -* Amazon OpenSearch Service is a managed service that makes it easy to deploy, operate, and scale OpenSearch. Use the full name *Amazon OpenSearch Service* on first appearance. The abbreviated service name, *OpenSearch Service*, can be used for subsequent appearances. -* Amazon OpenSearch Serverless is an on-demand serverless configuration for Amazon OpenSearch Service. Use the full name *Amazon OpenSearch Serverless* on first appearance. The abbreviated service name, *OpenSearch Serverless*, can be used for subsequent appearances. -* OpenSearch Dashboards is the UI for OpenSearch. On first appearance, use the full name *OpenSearch Dashboards*. *Dashboards* can be used for subsequent appearances. -* *Security Analytics* is a security information and event management (SIEM) solution for OpenSearch. Capitalize both words when referring to the name of the solution. -* Observability is collection of plugins and applications that let you visualize data-driven events by using Piped Processing Language (PPL). Capitalize *Observability* when referring to the name of the solution. -* Refer to OpenSearch Project customers as *users*, and refer to the larger group of users as *the community*. Do not refer to the OpenSearch Project or to the AWS personnel working on the project as a *team*, as this implies differentiation within the community. +* Amazon OpenSearch Service is a managed service that makes it easy to deploy, operate, and scale OpenSearch. Use the full name *Amazon OpenSearch Service* on first appearance. 
The abbreviated service name, *OpenSearch Service*, can be used for subsequent appearances. +* OpenSearch Dashboards is the UI for OpenSearch. On first appearance, use the full name *OpenSearch Dashboards*. *Dashboards* can be used for subsequent appearances. +* Refer to OpenSearch Project customers as *users*, and refer to the larger group of users as *the community*. #### Product names @@ -46,7 +43,7 @@ Use lowercase when referring to features, unless you are referring to a formally * “*Remote-backed storage* is an experimental feature. Therefore, we do not recommend the use of *remote-backed storage* in a production environment.” * “You can take and restore *snapshots* using the snapshot API.” * “You can use the *VisBuilder* visualization type in OpenSearch Dashboards to create data visualizations by using a drag-and-drop gesture.” (You can refer to VisBuilder alone or qualify the term with “visualization type”.) -* “As of OpenSearch 2.4, the *ML framework* only supports text-embedding models without GPU acceleration.” +* “As of OpenSearch 2.4, the *model-serving framework* only supports text embedding models without GPU acceleration.” #### Plugin names @@ -65,7 +62,7 @@ The voice of the OpenSearch Project is people oriented and focused on empowering Whenever possible, use the active voice instead of the passive voice. The passive form is typically wordier and can often cause writers to obscure the details of the action. For example, change the agentless passive _it is recommended_ to the more direct _we recommend_. -Refer to the reader as _you_ (second person), and refer to the OpenSearch Project as _we_ (first person). If there are multiple authors for a blog post, you can use _we_ to refer to the authors as individuals. Do not refer to the OpenSearch Project or to the AWS personnel working on the project as a *team*, as this implies differentiation within the community. +Refer to the reader as _you_ (second person), and refer to the OpenSearch Project as _we_ (first person). If there are multiple authors for a blog post, you can use _we_ to refer to the authors as individuals. Describe the actions that the user takes, rather than contextualizing from the feature perspective. For example, use phrases such as “With this feature, you can...” or “Use this feature to...” instead of saying a feature *allows*, *enables*, or *lets* the user do something. @@ -123,8 +120,6 @@ The following table lists acronyms that you don't need to spell out. | BASIC | Beginner's All-Purpose Symbolic Instruction Code | | BM25 | Best Match 25 | | CPU | central processing unit | -| CSV | comma-separated values | -| DNS | Domain Name System | | DOS | disk operating system | | FAQ | frequently asked questions | | FTP | File Transfer Protocol | @@ -159,76 +154,14 @@ The following table lists acronyms that you don't need to spell out. | XML | Extensible Markup Language | | YAML | YAML Ain't Markup Language | -### Code examples - -Calling out code within a sentence or code block makes it clear to readers which items are code specific. The following is general guidance about using code examples and when to use `code font`: - -* In Markdown, use single backticks (`` ` ``) for inline code formatting and triple backticks (```` ``` ````) for code blocks. For example, writing `` `discovery.type` `` in Markdown will render as `discovery.type`. A line containing three backticks should be included both before and after an example code block. 
-* In sentences, use code font for things relating to code, for example, “The `from` and `size` parameters are stateless, so the results are based on the latest available data.” -* Use lead-in sentences to clarify the example. Exception: API examples, for which a caption-style lead-in (heading 4) is sufficient. -* Use the phrase *such as* for brief examples within a sentence. -* Use language-specific indentation in code examples. -* Make code blocks as copy-and-paste friendly as possible. Use either the [`copy` or `copy-curl` buttons](https://github.com/opensearch-project/documentation-website/blob/main/FORMATTING_GUIDE.md#buttons). - -#### Code formatting checklist - -The following items should be in `code font`: - -* Field names, variables (including environment variables), and settings (`discovery.type`, `@timestamp`, `PATH`). Use code font for variable and setting values if it improves readability (`false`, `1h`, `5`, or 5). -* Placeholder variables. Use angle brackets for placeholder variables (`docker exec -it /bin/bash`). -* Commands, command-line utilities, and options (`docker container ls -a`, `curl`, `-v`). -* File names, file paths, and directory names (`docker-compose.yml`, `/var/www/simplesamlphp/config/`). -* URLs and URL components (`localhost`, `http://localhost:5601`). -* Index names (`logs-000001`, `.opendistro-ism-config`), endpoints (`_cluster/settings`), and query parameters (`timeout`). -* Language keywords (`if`, `for`, `SELECT`, `AND`, `FROM`). -* Operators and symbols (`/`, `<`, `*`). -* Regular expression, date, or other patterns (`^.*-\d+$`, `yyyy-MM-dd`). -* Class names (`SettingsModule`) and interface names (*`RestHandler`*). Use italics for interface names. -* Text field inputs (Enter the password `admin`). -* Email addresses (`example@example.org`). - -#### Caption-style examples - -If you use a caption-style example, use the heading **Example**, with a colon, as appropriate. The following are caption-style examples: - - **Example: Retrieve a specified document from an index** - - The following example shows a request that retrieves a specific document and its information from an index: - - `GET sample-index1/_doc/1` - - **Example request** - - `GET sample-index1/_doc/1` - -Sometimes, you might not want to break up the flow of the text with a new heading. In these cases, you can use an example with no heading. - - The following command maps ports 9200 and 9600, sets the discovery type to single-node, and requests the newest image of OpenSearch: - - `docker run -d -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" opensearchproject/opensearch:latest` - -#### Lead-in sentences - -When using lead-in sentences, summarize, clarify, or refer to the example that follows. A lead-in sentence is a complete sentence that ends in a colon. - - For example, the following query requests statistics for `docs` and `search`: - - `GET _nodes/stats/indices/docs,search` - -#### Referring to a variable or placeholder - -When introducing a code or command line example that refers to a variable or placeholder in the example, be direct by including the variable or placeholder name in the text. Surround the variable or placeholder name with angle brackets (`<` and `>`), for example, ``. Don't refer to the variable or placeholder by its color or format because these can change. If variable or placeholder texts have a lot in common and there are several for the user to complete, be direct by including a “template” for the input in the replaceable text. 
- - In the following example, replace `` with your own information: - - `~/workspace/project-name$ eb init --modules ` - ### Formatting and organization - Use a colon to introduce example blocks (for example, code and scripts) and most lists. Do not use a colon to introduce tables or images. - Use bold text for all UI elements, including pages, panes, and dialog boxes. In all cases, emphasize what the user must do as opposed to talking about the UI element itself. +- Reference images in the text that precedes them. For example, "..., as shown in the following image." + - Stacked headings should never appear in our content. Stacked headings are any two consecutive headings without intervening text. Even if it is just an introductory sentence, there should always be text under any heading. - Use italics for the titles of books, periodicals, and reference guides. However, do not use italics when the title of a work is also a hyperlink. @@ -238,16 +171,6 @@ When introducing a code or command line example that refers to a variable or pla 2. When referring to API operations by the exact name of the endpoint, use lowercase with code format (example: "`_field_caps` API"). 3. When describing API operations but not using the exact name of the endpoint, use lowercase (example: "field capabilities API operations" or "field capabilities operations"). -### Images - -- Add introductory text that provides sufficient context for each image. - -- Add ALT text that describes the image for screen readers. - -- When you’re describing the location of an image, use words such as *preceding*, *previous*, or *following* instead of *above* and *below*. - -- Text that introduces an image should be a complete sentence and end with a period, not a colon. - ### Links - **Formal cross-references**: In most cases, a formal cross-reference (the title of the page you're linking to) is the preferred style because it provides context and helps readers understand where they're going when they choose the link. Follow these guidelines for formal cross-references: @@ -314,10 +237,10 @@ We follow a slightly modified version of the _Microsoft Writing Style Guide_ gui | **Close** | - Apps and programs
- Dialog boxes
- Files and folders
- Notifications and alerts
- Tabs
- The action a program or app takes when it encounters a problem and can't continue. (Don't confuse with _stop responding_). | - Close the Alarms app.
- Close Excel.
- Save and close the document.
- Closing Excel also closes all open worksheets. | | **Leave** | Websites and webpages | Select **Submit** to complete the survey and leave this page. | | **Go to** | - Opening a menu.
- Going to a tab or another particular place in the UI.
- Going to a website or webpage.
 - It's OK to use _On the **XXX** tab_ if the instruction is brief and continues immediately. | - Go to Search, enter the word **settings**, and then select **Settings**.<br>
- Go to **File**, and then select **Close**.
- On the ribbon, go to the **Design** tab.
 - Go to the **Deploy** tab. In the **Configuration** list ...<br>
- On the **Deploy** tab, in the **Configuration** list ...
- Go to Example.com to register. | -| **Select** | Instructing the user to select a specific item, including:
- Selecting an option, such as a button.
- Selecting a checkbox.
- Selecting a value from a list box.
- Selecting link text to go to a link.
- Selecting an item on a menu or shortcut menu.
- Selecting an item from a gallery. | - Select the **Modify** button.
- For **Alignment**, select **Left**.
- Select the text, open the shortcut menu, and then select **Font**.
- Select **Open in new tab**.
- Select the **LinkName** link. | +| **Select** | Instructing the user to select a specific item, including:
- Selecting an option, such as a button.
- Selecting a check box.
- Selecting a value from a list box.
- Selecting link text to go to a link.
- Selecting an item on a menu or shortcut menu.
- Selecting an item from a gallery. | - Select the **Modify** button.
- For **Alignment**, select **Left**.
- Select the text, open the shortcut menu, and then select **Font**.
- Select **Open in new tab**.
- Select the **LinkName** link. | | **Select and hold, select and hold (or right-click)** | Use to describe pressing and holding an element in the UI. It's OK to use _right-click_ with _select and hold_ when the instruction isn't specific to touch devices. | - To flag a message that you want to deal with later, select and hold it, and then select **Set flag**.
- Select and hold (or right-click) the Windows taskbar, and then select **Cascade windows**.
- Select and hold (or right-click) the **Start** button, and then select **Device Manager**. | | **>** | Use a greater-than symbol (>) to separate sequential steps.
Only use this approach when there's a clear and obvious path through the UI and the selection method is the same for each step. For example, don't mix things that require opening, selecting, and choosing.
Don't bold the greater-than symbol. Include a space before and after the symbol. | Select **Accounts** > **Other accounts** > **Add an account**. | -| **Clear** | Clearing the selection from a checkbox. | Clear the **Header row** checkbox. | +| **Clear** | Clearing the selection from a check box. | Clear the **Header row** checkbox. | | **Choose** | Choosing an option, based on the customer's preference or desired outcome. | On the **Font** tab, choose the effects you want. | | **Switch, turn on, turn off** | Turning a toggle key or toggle switch on or off. | - Use the **Caps lock** key to switch from typing capital letter to typing lowercase letters.
- To keep all applied filters, turn on the **Pass all filters** toggle. | | **Enter** | Instructing the customer to type or otherwise insert a value, or to type or select a value in a combo box. | - In the search box, enter...
- In the **Tab stop position** box, enter the location where you want to set the new tab.
- In the **Deployment script name** box, enter a name for this script. | @@ -327,8 +250,6 @@ We follow a slightly modified version of the _Microsoft Writing Style Guide_ gui ### Punctuation and capitalization -- Use only one space after a period. - - Use contractions carefully for a more casual tone. Use common contractions. Avoid future tense (I’ll), archaic (‘twas), colloquial (ain’t), or compound (couldn’t’ve) contractions. - Use sentence case for titles, headings, and table headers. Titles of standalone documents may use title case. @@ -419,7 +340,7 @@ Follow these basic guidelines when writing UI text. * Keep it short. Users don’t want to read dense text. Remember that UI text can expand by 30% when it’s translated into other languages. * Keep it simple. Try to use simple sentences (one subject, one verb, one main clause and idea) rather than compound or complex sentences. * Prefer active voice over passive voice. For example, "You can attach up to 10 policies" is active voice, and "Up to 10 policies can be attached" is passive voice. -* Use device-agnostic language rather than mouse-specific language. For example, use _choose_ instead of _click_ (exception: use _select_ for checkboxes). +* Use device-agnostic language rather than mouse-specific language. For example, use _choose_ instead of _click_ (exception: use _select_ for check boxes). ##### Tone diff --git a/TERMS.md b/TERMS.md index 02df1fcf46..7e47452768 100644 --- a/TERMS.md +++ b/TERMS.md @@ -75,10 +75,6 @@ Messages and pop-up boxes appear. Windows, pages, and applications open. The ver Do not abbreviate as app server. -**artificial intelligence** - -On first mention, use *artificial intelligence (AI)*. Use *AI* thereafter. There is no need to redefine *AI* when either *AI/ML* or *GenAI* has already been defined. - **as well as** Avoid. Replace with in addition to or and as appropriate. @@ -93,10 +89,6 @@ Lower case scaling, auto scaling, and automatic scaling (but not autoscaling) ar Do not use hyphenated auto-scaling as a compound modifier. Instead, use scaling (for example, scaling policy), or scalable (for example, scalable target or scalable, load-balanced environment). -**AWS Signature Version 4** - -Use on first appearance. On subsequent appearances, *Signature Version 4* may be used. Only use *SigV4* when space is limited. - ## B **below** @@ -154,8 +146,6 @@ certificate authority Use _certificates_ on first mention. It’s OK to use _certs_ thereafter. -**checkbox, checkboxes** - **CI/CD** Use _continuous integration_ and _continuous delivery (CI/CD)_ or _continuous integration and delivery (CI/CD)_ on first mention. @@ -168,10 +158,6 @@ A collection of one or more nodes. A single node that routes requests for the cluster and makes changes to other nodes. Each cluster contains a single cluster manager. -**command line, command-line** - -Two words as a noun. Hyphenate as an adjective. - **console** A tool inside OpenSearch Dashboards used to interact with the OpenSearch REST API. @@ -297,25 +283,9 @@ Use frontend as an adjective and a noun. Do not use front end or front-end. Do n ## G -**GenAI** - -On first mention, use *generative artificial intelligence (GenAI)*. Use *GenAI* thereafter. To avoid the overuse of *GenAI*, *AI/ML-powered applications* may also be used. - -**geodistance** - -**geohash** - -**geohex** - **geopoint** -**geopolygon** - **geoshape** - -**geospatial** - -**geotile** ## H @@ -333,8 +303,6 @@ Do not use. This term is unnecessarily violent for technical documentation. 
Use **hostname** -**Hugging Face** - ## I **i.e.** @@ -365,8 +333,6 @@ A collection of JSON documents. Non-hardcoded references to *indices* should be **Index State Management (ISM)** -**inline** - **install in, on** install in a folder, directory, or path; install on a disk, drive, or instance. @@ -457,7 +423,7 @@ Apache Lucene™ is a high-performance, full-featured search engine library writ **machine learning** -When *machine learning* is used multiple times in a document, use *machine learning (ML)* on first mention and *ML* thereafter. There is no need to redefine *ML* when *AI/ML* has already been defined. If spelled out, write *machine learning* as two words (no hyphen) in all cases, including when used as an adjective before a noun. +Write as two words (no hyphen) in all cases, including when used as an adjective before a noun. Abbreviate to ML after first use if appropriate. **Machine Learning (ML) Commons** @@ -513,16 +479,12 @@ OpenSearch is a community-driven, open-source search and analytics suite derived **OpenSearch Dashboards** -The default visualization tool for data in OpenSearch. On first appearance, use the full name. *Dashboards* may be used on subsequent appearances. +The default visualization tool for data in OpenSearch. On first appearance, use the full name. “Dashboards” may be used on subsequent appearances. open source (n.), open-source (adj.) Use _open source_ as a noun (for example, “The code used throughout this tutorial is open source and can be freely modified”). Use _open-source_ as an adjective _(open-source software)_. -**OpenSearch Playground** - -OpenSearch Playground provides a central location for existing and evaluating users to explore features in OpenSearch and OpenSearch Dashboards without downloading or installing any OpenSearch components locally. - **operating system** When referencing operating systems in documentation, follow these guidelines: @@ -580,8 +542,6 @@ Correct: an on-premises solution Incorrect: an on-premise solution, an on-prem solution -**pretrain** - **primary shard** A Lucene instance that contains data for some or all of an index. diff --git a/_aggregations/bucket/adjacency-matrix.md b/_aggregations/bucket/adjacency-matrix.md deleted file mode 100644 index fd521f8510..0000000000 --- a/_aggregations/bucket/adjacency-matrix.md +++ /dev/null @@ -1,101 +0,0 @@ ---- -layout: default -title: Adjacency matrix -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 10 -redirect_from: - - /query-dsl/aggregations/bucket/adjacency-matrix/ ---- - -# Adjacency matrix aggregations - -The `adjacency_matrix` aggregation lets you define filter expressions and returns a matrix of the intersecting filters where each non-empty cell in the matrix represents a bucket. You can find how many documents fall within any combination of filters. - -Use the `adjacency_matrix` aggregation to discover how concepts are related by visualizing the data as graphs. 
- -For example, in the sample eCommerce dataset, to analyze how the different manufacturing companies are related: - -```json -GET opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "interactions": { - "adjacency_matrix": { - "filters": { - "grpA": { - "match": { - "manufacturer.keyword": "Low Tide Media" - } - }, - "grpB": { - "match": { - "manufacturer.keyword": "Elitelligence" - } - }, - "grpC": { - "match": { - "manufacturer.keyword": "Oceanavigations" - } - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - - ```json - { - ... - "aggregations" : { - "interactions" : { - "buckets" : [ - { - "key" : "grpA", - "doc_count" : 1553 - }, - { - "key" : "grpA&grpB", - "doc_count" : 590 - }, - { - "key" : "grpA&grpC", - "doc_count" : 329 - }, - { - "key" : "grpB", - "doc_count" : 1370 - }, - { - "key" : "grpB&grpC", - "doc_count" : 299 - }, - { - "key" : "grpC", - "doc_count" : 1218 - } - ] - } - } - } -``` - - Let’s take a closer look at the result: - - ```json - { - "key" : "grpA&grpB", - "doc_count" : 590 - } - ``` - -- `grpA`: Products manufactured by Low Tide Media. -- `grpB`: Products manufactured by Elitelligence. -- `590`: Number of products that are manufactured by both. - -You can use OpenSearch Dashboards to represent this data with a network graph. \ No newline at end of file diff --git a/_aggregations/bucket/date-histogram.md b/_aggregations/bucket/date-histogram.md deleted file mode 100644 index e308104e16..0000000000 --- a/_aggregations/bucket/date-histogram.md +++ /dev/null @@ -1,61 +0,0 @@ ---- -layout: default -title: Date histogram -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 20 -redirect_from: - - /query-dsl/aggregations/bucket/date-histogram/ ---- - -# Date histogram aggregations - -The `date_histogram` aggregation uses [date math]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/#date-math) to generate histograms for time-series data. - -For example, you can find how many hits your website gets per month: - -```json -GET opensearch_dashboards_sample_data_logs/_search -{ - "size": 0, - "aggs": { - "logs_per_month": { - "date_histogram": { - "field": "@timestamp", - "interval": "month" - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "logs_per_month" : { - "buckets" : [ - { - "key_as_string" : "2020-10-01T00:00:00.000Z", - "key" : 1601510400000, - "doc_count" : 1635 - }, - { - "key_as_string" : "2020-11-01T00:00:00.000Z", - "key" : 1604188800000, - "doc_count" : 6844 - }, - { - "key_as_string" : "2020-12-01T00:00:00.000Z", - "key" : 1606780800000, - "doc_count" : 5595 - } - ] - } -} -} -``` - -The response has three months worth of logs. If you graph these values, you can see the peak and valleys of the request traffic to your website month over month. diff --git a/_aggregations/bucket/date-range.md b/_aggregations/bucket/date-range.md deleted file mode 100644 index c7d66d729d..0000000000 --- a/_aggregations/bucket/date-range.md +++ /dev/null @@ -1,57 +0,0 @@ ---- -layout: default -title: Date range -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 30 -redirect_from: - - /query-dsl/aggregations/bucket/date-range/ ---- - -# Date range aggregations - -The `date_range` aggregation is conceptually the same as the `range` aggregation, except that it lets you perform date math. -For example, you can get all documents from the last 10 days. 
To make the date more readable, include the format with a `format` parameter: - -```json -GET opensearch_dashboards_sample_data_logs/_search -{ - "size": 0, - "aggs": { - "number_of_bytes": { - "date_range": { - "field": "@timestamp", - "format": "MM-yyyy", - "ranges": [ - { - "from": "now-10d/d", - "to": "now" - } - ] - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "number_of_bytes" : { - "buckets" : [ - { - "key" : "03-2021-03-2021", - "from" : 1.6145568E12, - "from_as_string" : "03-2021", - "to" : 1.615451329043E12, - "to_as_string" : "03-2021", - "doc_count" : 0 - } - ] - } - } -} -``` \ No newline at end of file diff --git a/_aggregations/bucket/diversified-sampler.md b/_aggregations/bucket/diversified-sampler.md deleted file mode 100644 index 303f29f9a3..0000000000 --- a/_aggregations/bucket/diversified-sampler.md +++ /dev/null @@ -1,62 +0,0 @@ ---- -layout: default -title: Diversified sampler -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 40 -redirect_from: - - /query-dsl/aggregations/bucket/diversified-sampler/ ---- - -# Diversified sampler aggregations - -The `diversified_sampler` aggregation lets you reduce the bias in the distribution of the sample pool. You can use the `field` setting to control the maximum number of documents collected on any one shard which shares a common value: - -```json -GET opensearch_dashboards_sample_data_logs/_search -{ - "size": 0, - "aggs": { - "sample": { - "diversified_sampler": { - "shard_size": 1000, - "field": "response.keyword" - }, - "aggs": { - "terms": { - "terms": { - "field": "agent.keyword" - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "sample" : { - "doc_count" : 3, - "terms" : { - "doc_count_error_upper_bound" : 0, - "sum_other_doc_count" : 0, - "buckets" : [ - { - "key" : "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1", - "doc_count" : 2 - }, - { - "key" : "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)", - "doc_count" : 1 - } - ] - } - } - } -} -``` diff --git a/_aggregations/bucket/filter.md b/_aggregations/bucket/filter.md deleted file mode 100644 index 0768ea1148..0000000000 --- a/_aggregations/bucket/filter.md +++ /dev/null @@ -1,56 +0,0 @@ ---- -layout: default -title: Filter -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 50 -redirect_from: - - /query-dsl/aggregations/bucket/filter/ ---- - -# Filter aggregations - -A `filter` aggregation is a query clause, exactly like a search query — `match` or `term` or `range`. You can use the `filter` aggregation to narrow down the entire set of documents to a specific set before creating buckets. - -The following example shows the `avg` aggregation running within the context of a filter. The `avg` aggregation only aggregates the documents that match the `range` query: - -```json -GET opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "low_value": { - "filter": { - "range": { - "taxful_total_price": { - "lte": 50 - } - } - }, - "aggs": { - "avg_amount": { - "avg": { - "field": "taxful_total_price" - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... 
-"aggregations" : { - "low_value" : { - "doc_count" : 1633, - "avg_amount" : { - "value" : 38.363175998928355 - } - } - } -} -``` \ No newline at end of file diff --git a/_aggregations/bucket/filters.md b/_aggregations/bucket/filters.md deleted file mode 100644 index b3977da7c1..0000000000 --- a/_aggregations/bucket/filters.md +++ /dev/null @@ -1,81 +0,0 @@ ---- -layout: default -title: Filters -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 60 -redirect_from: - - /query-dsl/aggregations/bucket/filters/ ---- - -# Filters aggregations - -A `filters` aggregation is the same as the `filter` aggregation, except that it lets you use multiple filter aggregations. -While the `filter` aggregation results in a single bucket, the `filters` aggregation returns multiple buckets, one for each of the defined filters. - -To create a bucket for all the documents that didn't match the any of the filter queries, set the `other_bucket` property to `true`: - -```json -GET opensearch_dashboards_sample_data_logs/_search -{ - "size": 0, - "aggs": { - "200_os": { - "filters": { - "other_bucket": true, - "filters": [ - { - "term": { - "response.keyword": "200" - } - }, - { - "term": { - "machine.os.keyword": "osx" - } - } - ] - }, - "aggs": { - "avg_amount": { - "avg": { - "field": "bytes" - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "200_os" : { - "buckets" : [ - { - "doc_count" : 12832, - "avg_amount" : { - "value" : 5897.852711970075 - } - }, - { - "doc_count" : 2825, - "avg_amount" : { - "value" : 5620.347256637168 - } - }, - { - "doc_count" : 1017, - "avg_amount" : { - "value" : 3247.0963618485744 - } - } - ] - } - } -} -``` \ No newline at end of file diff --git a/_aggregations/bucket/geo-distance.md b/_aggregations/bucket/geo-distance.md deleted file mode 100644 index a111015ac1..0000000000 --- a/_aggregations/bucket/geo-distance.md +++ /dev/null @@ -1,160 +0,0 @@ ---- -layout: default -title: Geodistance -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 70 -redirect_from: - - /query-dsl/aggregations/bucket/geo-distance/ ---- - -# Geodistance aggregations - -The `geo_distance` aggregation groups documents into concentric circles based on distances from an origin `geo_point` field. -It's the same as the `range` aggregation, except that it works on geo locations. - -For example, you can use the `geo_distance` aggregation to find all pizza places within 1 km of you. The search results are limited to the 1 km radius specified by you, but you can add another result found within 2 km. - -You can only use the `geo_distance` aggregation on fields mapped as `geo_point`. - -A point is a single geographical coordinate, such as your current location shown by your smart-phone. A point in OpenSearch is represented as follows: - -```json -{ - "location": { - "type": "point", - "coordinates": { - "lat": 83.76, - "lon": -81.2 - } - } -} -``` - -You can also specify the latitude and longitude as an array `[-81.20, 83.76]` or as a string `"83.76, -81.20"` - -This table lists the relevant fields of a `geo_distance` aggregation: - -Field | Description | Required -:--- | :--- |:--- -`field` | Specify the geopoint field that you want to work on. | Yes -`origin` | Specify the geopoint that's used to compute the distances from. | Yes -`ranges` | Specify a list of ranges to collect documents based on their distance from the target point. | Yes -`unit` | Define the units used in the `ranges` array. 
The `unit` defaults to `m` (meters), but you can switch to other units like `km` (kilometers), `mi` (miles), `in` (inches), `yd` (yards), `cm` (centimeters), and `mm` (millimeters). | No -`distance_type` | Specify how OpenSearch calculates the distance. The default is `sloppy_arc` (faster but less accurate), but can also be set to `arc` (slower but most accurate) or `plane` (fastest but least accurate). Because of high error margins, use `plane` only for small geographic areas. | No - -The syntax is as follows: - -```json -{ - "aggs": { - "aggregation_name": { - "geo_distance": { - "field": "field_1", - "origin": "x, y", - "ranges": [ - { - "to": "value_1" - }, - { - "from": "value_2", - "to": "value_3" - }, - { - "from": "value_4" - } - ] - } - } - } -} -``` - -This example forms buckets from the following distances from a `geo-point` field: - -- Fewer than 10 km -- From 10 to 20 km -- From 20 to 50 km -- From 50 to 100 km -- Above 100 km - -```json -GET opensearch_dashboards_sample_data_logs/_search -{ - "size": 0, - "aggs": { - "position": { - "geo_distance": { - "field": "geo.coordinates", - "origin": { - "lat": 83.76, - "lon": -81.2 - }, - "ranges": [ - { - "to": 10 - }, - { - "from": 10, - "to": 20 - }, - { - "from": 20, - "to": 50 - }, - { - "from": 50, - "to": 100 - }, - { - "from": 100 - } - ] - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "position" : { - "buckets" : [ - { - "key" : "*-10.0", - "from" : 0.0, - "to" : 10.0, - "doc_count" : 0 - }, - { - "key" : "10.0-20.0", - "from" : 10.0, - "to" : 20.0, - "doc_count" : 0 - }, - { - "key" : "20.0-50.0", - "from" : 20.0, - "to" : 50.0, - "doc_count" : 0 - }, - { - "key" : "50.0-100.0", - "from" : 50.0, - "to" : 100.0, - "doc_count" : 0 - }, - { - "key" : "100.0-*", - "from" : 100.0, - "doc_count" : 14074 - } - ] - } - } -} -``` \ No newline at end of file diff --git a/_aggregations/bucket/geohash-grid.md b/_aggregations/bucket/geohash-grid.md deleted file mode 100644 index 778bfb86fe..0000000000 --- a/_aggregations/bucket/geohash-grid.md +++ /dev/null @@ -1,261 +0,0 @@ ---- -layout: default -title: Geohash grid -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 80 -redirect_from: - - /query-dsl/aggregations/bucket/geohash-grid/ ---- - -# Geohash grid aggregations - -The `geohash_grid` aggregation buckets documents for geographical analysis. It organizes a geographical region into a grid of smaller regions of different sizes or precisions. Lower values of precision represent larger geographical areas, and higher values represent smaller, more precise geographical areas. You can aggregate documents on [geopoint]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point/) or [geoshape]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-shape/) fields using a geohash grid aggregation. One notable difference is that a geopoint is only present in one bucket, but a geoshape is counted in all geohash grid cells with which it intersects. - -The number of results returned by a query might be far too many to display each geopoint individually on a map. The `geohash_grid` aggregation buckets nearby geopoints together by calculating the geohash for each point, at the level of precision that you define (between 1 to 12; the default is 5). To learn more about geohash, see [Wikipedia](https://en.wikipedia.org/wiki/Geohash). - -The web logs example data is spread over a large geographical area, so you can use a lower precision value. 
You can zoom in on this map by increasing the precision value: - -```json -GET opensearch_dashboards_sample_data_logs/_search -{ - "size": 0, - "aggs": { - "geo_hash": { - "geohash_grid": { - "field": "geo.coordinates", - "precision": 4 - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "geo_hash" : { - "buckets" : [ - { - "key" : "c1cg", - "doc_count" : 104 - }, - { - "key" : "dr5r", - "doc_count" : 26 - }, - { - "key" : "9q5b", - "doc_count" : 20 - }, - { - "key" : "c20g", - "doc_count" : 19 - }, - { - "key" : "dr70", - "doc_count" : 18 - } - ... - ] - } - } -} -``` - -You can visualize the aggregated response on a map using OpenSearch Dashboards. - -The more accurate you want the aggregation to be, the more resources OpenSearch consumes because of the number of buckets that the aggregation has to calculate. By default, OpenSearch does not generate more than 10,000 buckets. You can change this behavior by using the `size` attribute, but keep in mind that the performance might suffer for very wide queries consisting of thousands of buckets. - -## Aggregating geoshapes - -To run an aggregation on a geoshape field, first create an index and map the `location` field as a `geo_shape`: - -```json -PUT national_parks -{ - "mappings": { - "properties": { - "location": { - "type": "geo_shape" - } - } - } -} -``` -{% include copy-curl.html %} - -Next, index some documents into the `national_parks` index: - -```json -PUT national_parks/_doc/1 -{ - "name": "Yellowstone National Park", - "location": - {"type": "envelope","coordinates": [ [-111.15, 45.12], [-109.83, 44.12] ]} -} -``` -{% include copy-curl.html %} - -```json -PUT national_parks/_doc/2 -{ - "name": "Yosemite National Park", - "location": - {"type": "envelope","coordinates": [ [-120.23, 38.16], [-119.05, 37.45] ]} -} -``` -{% include copy-curl.html %} - -```json -PUT national_parks/_doc/3 -{ - "name": "Death Valley National Park", - "location": - {"type": "envelope","coordinates": [ [-117.34, 37.01], [-116.38, 36.25] ]} -} -``` -{% include copy-curl.html %} - -You can run an aggregation on the `location` field as follows: - -```json -GET national_parks/_search -{ - "aggregations": { - "grouped": { - "geohash_grid": { - "field": "location", - "precision": 1 - } - } - } -} -``` -{% include copy-curl.html %} - -When aggregating geoshapes, one geoshape can be counted for multiple buckets because it overlaps multiple grid cells: - -
- - Response - - {: .text-delta} - -```json -{ - "took" : 24, - "timed_out" : false, - "_shards" : { - "total" : 1, - "successful" : 1, - "skipped" : 0, - "failed" : 0 - }, - "hits" : { - "total" : { - "value" : 3, - "relation" : "eq" - }, - "max_score" : 1.0, - "hits" : [ - { - "_index" : "national_parks", - "_id" : "1", - "_score" : 1.0, - "_source" : { - "name" : "Yellowstone National Park", - "location" : { - "type" : "envelope", - "coordinates" : [ - [ - -111.15, - 45.12 - ], - [ - -109.83, - 44.12 - ] - ] - } - } - }, - { - "_index" : "national_parks", - "_id" : "2", - "_score" : 1.0, - "_source" : { - "name" : "Yosemite National Park", - "location" : { - "type" : "envelope", - "coordinates" : [ - [ - -120.23, - 38.16 - ], - [ - -119.05, - 37.45 - ] - ] - } - } - }, - { - "_index" : "national_parks", - "_id" : "3", - "_score" : 1.0, - "_source" : { - "name" : "Death Valley National Park", - "location" : { - "type" : "envelope", - "coordinates" : [ - [ - -117.34, - 37.01 - ], - [ - -116.38, - 36.25 - ] - ] - } - } - } - ] - }, - "aggregations" : { - "grouped" : { - "buckets" : [ - { - "key" : "9", - "doc_count" : 3 - }, - { - "key" : "c", - "doc_count" : 1 - } - ] - } - } -} -``` -
- -Currently, OpenSearch supports geoshape aggregation through the API but not in OpenSearch Dashboards visualizations. If you'd like to see geoshape aggregation implemented for visualizations, upvote the related [GitHub issue](https://github.com/opensearch-project/dashboards-maps/issues/250). -{: .note} - -## Supported parameters - -Geohash grid aggregation requests support the following parameters. - -Parameter | Data type | Description -:--- | :--- | :--- -field | String | The field on which aggregation is performed. This field must be mapped as a `geo_point` or `geo_shape` field. If the field contains an array, all array values are aggregated. Required. -precision | Integer | The zoom level used to determine grid cells for bucketing results. Valid values are in the [0, 15] range. Optional. Default is 5. -bounds | Object | The bounding box for filtering geopoints and geoshapes. The bounding box is defined by the upper-left and lower-right vertices. Only shapes that intersect with this bounding box or are completely enclosed by this bounding box are included in the aggregation output. The vertices are specified as geopoints in one of the following formats:
- An object with a latitude and longitude
- An array in the [`longitude`, `latitude`] format
- A string in the "`latitude`,`longitude`" format
- A geohash
- WKT
See the [geopoint formats]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point#formats) for formatting examples. Optional. -size | Integer | The maximum number of buckets to return. When there are more buckets than `size`, OpenSearch returns buckets with more documents. Optional. Default is 10,000. -shard_size | Integer | The maximum number of buckets to return from each shard. Optional. Default is max (10, `size` · number of shards), which provides a more accurate count of more highly prioritized buckets. \ No newline at end of file diff --git a/_aggregations/bucket/geotile-grid.md b/_aggregations/bucket/geotile-grid.md deleted file mode 100644 index cb4347288c..0000000000 --- a/_aggregations/bucket/geotile-grid.md +++ /dev/null @@ -1,550 +0,0 @@ ---- -layout: default -title: Geotile grid -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 87 -redirect_from: - - /query-dsl/aggregations/bucket/geotile-grid/ ---- - -# Geotile grid aggregations - -The geotile grid aggregation groups documents into grid cells for geographical analysis. Each grid cell corresponds to a [map tile](https://en.wikipedia.org/wiki/Tiled_web_map) and is identified using the `{zoom}/{x}/{y}` format. You can aggregate documents on [geopoint]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point/) or [geoshape]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-shape/) fields using a geotile grid aggregation. One notable difference is that a geopoint is only present in one bucket, but a geoshape is counted in all geotile grid cells with which it intersects. - -## Precision - -The `precision` parameter controls the level of granularity that determines the grid cell size. The lower the precision, the larger the grid cells. - -The following example illustrates low-precision and high-precision aggregation requests. - -To start, create an index and map the `location` field as a `geo_point`: - -```json -PUT national_parks -{ - "mappings": { - "properties": { - "location": { - "type": "geo_point" - } - } - } -} -``` -{% include copy-curl.html %} - -Index the following documents into the sample index: - -```json -PUT national_parks/_doc/1 -{ - "name": "Yellowstone National Park", - "location": "44.42, -110.59" -} -``` -{% include copy-curl.html %} - -```json -PUT national_parks/_doc/2 -{ - "name": "Yosemite National Park", - "location": "37.87, -119.53" -} -``` -{% include copy-curl.html %} - -```json -PUT national_parks/_doc/3 -{ - "name": "Death Valley National Park", - "location": "36.53, -116.93" -} -``` -{% include copy-curl.html %} - -You can index geopoints in several formats. For a list of all supported formats, see the [geopoint documentation]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point#formats). -{: .note} - -## Low-precision requests - -Run a low-precision request that buckets all three documents together: - -```json -GET national_parks/_search -{ - "aggregations": { - "grouped": { - "geotile_grid": { - "field": "location", - "precision": 1 - } - } - } -} -``` -{% include copy-curl.html %} - -You can use either the `GET` or `POST` HTTP method for geotile grid aggregation queries. -{: .note} - -The response groups all documents together because they are close enough to be bucketed in one grid cell: - -
- - Response - - {: .text-delta} - -```json -{ - "took": 51, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 3, - "relation": "eq" - }, - "max_score": 1, - "hits": [ - { - "_index": "national_parks", - "_id": "1", - "_score": 1, - "_source": { - "name": "Yellowstone National Park", - "location": "44.42, -110.59" - } - }, - { - "_index": "national_parks", - "_id": "2", - "_score": 1, - "_source": { - "name": "Yosemite National Park", - "location": "37.87, -119.53" - } - }, - { - "_index": "national_parks", - "_id": "3", - "_score": 1, - "_source": { - "name": "Death Valley National Park", - "location": "36.53, -116.93" - } - } - ] - }, - "aggregations": { - "grouped": { - "buckets": [ - { - "key": "1/0/0", - "doc_count": 3 - } - ] - } - } -} -``` -
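The single `1/0/0` bucket tells you how many documents it contains but not which ones. One way to inspect bucket membership is to nest a sub-aggregation inside the grid aggregation. The following sketch adds a `terms` sub-aggregation on `name.keyword` (the sub-aggregation name `park_names` is arbitrary); it assumes the default dynamic mapping, which indexes the `name` field with a `.keyword` subfield:

```json
GET national_parks/_search
{
  "size": 0,
  "aggregations": {
    "grouped": {
      "geotile_grid": {
        "field": "location",
        "precision": 1
      },
      "aggregations": {
        "park_names": {
          "terms": {
            "field": "name.keyword"
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

With the low-precision request above, all three park names should appear under the single `1/0/0` bucket.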
- -## High-precision requests - -Now run a high-precision request: - -```json -GET national_parks/_search -{ - "aggregations": { - "grouped": { - "geotile_grid": { - "field": "location", - "precision": 6 - } - } - } -} -``` -{% include copy-curl.html %} - -All three documents are bucketed separately because of higher granularity: - -
- - Response - - {: .text-delta} - -```json -{ - "took": 15, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 3, - "relation": "eq" - }, - "max_score": 1, - "hits": [ - { - "_index": "national_parks", - "_id": "1", - "_score": 1, - "_source": { - "name": "Yellowstone National Park", - "location": "44.42, -110.59" - } - }, - { - "_index": "national_parks", - "_id": "2", - "_score": 1, - "_source": { - "name": "Yosemite National Park", - "location": "37.87, -119.53" - } - }, - { - "_index": "national_parks", - "_id": "3", - "_score": 1, - "_source": { - "name": "Death Valley National Park", - "location": "36.53, -116.93" - } - } - ] - }, - "aggregations": { - "grouped": { - "buckets": [ - { - "key": "6/12/23", - "doc_count": 1 - }, - { - "key": "6/11/25", - "doc_count": 1 - }, - { - "key": "6/10/24", - "doc_count": 1 - } - ] - } - } -} -``` -
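At higher precision values, a large dataset can produce far more cells than you need. The `size` parameter (described in the parameter table at the end of this page) caps the number of buckets returned, keeping only the most populated cells. The following request is a sketch only; with just three indexed parks it still returns three buckets, but against a larger dataset it would return only the three cells containing the most documents:

```json
GET national_parks/_search
{
  "size": 0,
  "aggregations": {
    "grouped": {
      "geotile_grid": {
        "field": "location",
        "precision": 10,
        "size": 3
      }
    }
  }
}
```
{% include copy-curl.html %}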
- -You can also restrict the geographical area by providing the coordinates of the bounding envelope in the `bounds` parameter. Both `bounds` and `geo_bounding_box` coordinates can be specified in any of the [geopoint formats]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point#formats). The following query uses the well-known text (WKT) "POINT(`longitude` `latitude`)" format for the `bounds` parameter: - -```json -GET national_parks/_search -{ - "size": 0, - "aggregations": { - "grouped": { - "geotile_grid": { - "field": "location", - "precision": 6, - "bounds": { - "top_left": "POINT (-120 38)", - "bottom_right": "POINT (-116 36)" - } - } - } - } -} -``` -{% include copy-curl.html %} - -The response contains only the two results that are within the specified bounds: - -
- - Response - - {: .text-delta} - -```json -{ - "took": 48, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 3, - "relation": "eq" - }, - "max_score": 1, - "hits": [ - { - "_index": "national_parks", - "_id": "1", - "_score": 1, - "_source": { - "name": "Yellowstone National Park", - "location": "44.42, -110.59" - } - }, - { - "_index": "national_parks", - "_id": "2", - "_score": 1, - "_source": { - "name": "Yosemite National Park", - "location": "37.87, -119.53" - } - }, - { - "_index": "national_parks", - "_id": "3", - "_score": 1, - "_source": { - "name": "Death Valley National Park", - "location": "36.53, -116.93" - } - } - ] - }, - "aggregations": { - "grouped": { - "buckets": [ - { - "key": "6/11/25", - "doc_count": 1 - }, - { - "key": "6/10/24", - "doc_count": 1 - } - ] - } - } -} -``` -
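Because `bounds` accepts any of the geopoint formats, the same restriction can be written with latitude/longitude objects instead of WKT. The following request is equivalent to the previous one and should return the same two buckets:

```json
GET national_parks/_search
{
  "size": 0,
  "aggregations": {
    "grouped": {
      "geotile_grid": {
        "field": "location",
        "precision": 6,
        "bounds": {
          "top_left": { "lat": 38, "lon": -120 },
          "bottom_right": { "lat": 36, "lon": -116 }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}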
- -The `bounds` parameter can be used with or without the `geo_bounding_box` filter; these two parameters are independent and can have any spatial relationship to each other. - -## Aggregating geoshapes - -To run an aggregation on a geoshape field, first create an index and map the `location` field as a `geo_shape`: - -```json -PUT national_parks -{ - "mappings": { - "properties": { - "location": { - "type": "geo_shape" - } - } - } -} -``` -{% include copy-curl.html %} - -Next, index some documents into the `national_parks` index: - -```json -PUT national_parks/_doc/1 -{ - "name": "Yellowstone National Park", - "location": - {"type": "envelope","coordinates": [ [-111.15, 45.12], [-109.83, 44.12] ]} -} -``` -{% include copy-curl.html %} - -```json -PUT national_parks/_doc/2 -{ - "name": "Yosemite National Park", - "location": - {"type": "envelope","coordinates": [ [-120.23, 38.16], [-119.05, 37.45] ]} -} -``` -{% include copy-curl.html %} - -```json -PUT national_parks/_doc/3 -{ - "name": "Death Valley National Park", - "location": - {"type": "envelope","coordinates": [ [-117.34, 37.01], [-116.38, 36.25] ]} -} -``` -{% include copy-curl.html %} - -You can run an aggregation on the `location` field as follows: - -```json -GET national_parks/_search -{ - "aggregations": { - "grouped": { - "geotile_grid": { - "field": "location", - "precision": 6 - } - } - } -} -``` -{% include copy-curl.html %} - -When aggregating geoshapes, one geoshape can be counted for multiple buckets because it overlaps with multiple grid cells: - -
- - Response - - {: .text-delta} - -```json -{ - "took" : 3, - "timed_out" : false, - "_shards" : { - "total" : 1, - "successful" : 1, - "skipped" : 0, - "failed" : 0 - }, - "hits" : { - "total" : { - "value" : 3, - "relation" : "eq" - }, - "max_score" : 1.0, - "hits" : [ - { - "_index" : "national_parks", - "_id" : "1", - "_score" : 1.0, - "_source" : { - "name" : "Yellowstone National Park", - "location" : { - "type" : "envelope", - "coordinates" : [ - [ - -111.15, - 45.12 - ], - [ - -109.83, - 44.12 - ] - ] - } - } - }, - { - "_index" : "national_parks", - "_id" : "2", - "_score" : 1.0, - "_source" : { - "name" : "Yosemite National Park", - "location" : { - "type" : "envelope", - "coordinates" : [ - [ - -120.23, - 38.16 - ], - [ - -119.05, - 37.45 - ] - ] - } - } - }, - { - "_index" : "national_parks", - "_id" : "3", - "_score" : 1.0, - "_source" : { - "name" : "Death Valley National Park", - "location" : { - "type" : "envelope", - "coordinates" : [ - [ - -117.34, - 37.01 - ], - [ - -116.38, - 36.25 - ] - ] - } - } - } - ] - }, - "aggregations" : { - "grouped" : { - "buckets" : [ - { - "key" : "6/12/23", - "doc_count" : 1 - }, - { - "key" : "6/12/22", - "doc_count" : 1 - }, - { - "key" : "6/11/25", - "doc_count" : 1 - }, - { - "key" : "6/11/24", - "doc_count" : 1 - }, - { - "key" : "6/10/24", - "doc_count" : 1 - } - ] - } - } -} -``` -
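To see exactly which shapes contribute to which cells, you can nest a `top_hits` sub-aggregation inside the grid aggregation. The following sketch (the sub-aggregation name `parks_in_cell` is arbitrary) returns the park name for each matching document in every bucket, which makes the double counting of overlapping shapes visible:

```json
GET national_parks/_search
{
  "size": 0,
  "aggregations": {
    "grouped": {
      "geotile_grid": {
        "field": "location",
        "precision": 6
      },
      "aggregations": {
        "parks_in_cell": {
          "top_hits": {
            "size": 3,
            "_source": ["name"]
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

Based on the bucket keys in the preceding response, Yellowstone and Death Valley should each appear under two different keys, while Yosemite appears under only one.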
- -Currently, OpenSearch supports geoshape aggregation through the API but not in OpenSearch Dashboards visualizations. If you'd like to see geoshape aggregation implemented for visualizations, upvote the related [GitHub issue](https://github.com/opensearch-project/dashboards-maps/issues/250). -{: .note} - -## Supported parameters - -Geotile grid aggregation requests support the following parameters. - -Parameter | Data type | Description -:--- | :--- | :--- -field | String | The field that contains the geopoints. This field must be mapped as a `geo_point` field. If the field contains an array, all array values are aggregated. Required. -precision | Integer | The zoom level used to determine grid cells for bucketing results. Valid values are in the [0, 15] range. Optional. Default is 5. -bounds | Object | The bounding box for filtering geopoints. The bounding box is defined by the upper-left and lower-right vertices. The vertices are specified as geopoints in one of the following formats:
- An object with a latitude and longitude
- An array in the [`longitude`, `latitude`] format
- A string in the "`latitude`,`longitude`" format
- A geohash
- WKT
See the [geopoint formats]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point#formats) for formatting examples. Optional. -size | Integer | The maximum number of buckets to return. When there are more buckets than `size`, OpenSearch returns buckets with more documents. Optional. Default is 10,000. -shard_size | Integer | The maximum number of buckets to return from each shard. Optional. Default is max (10, `size` · number of shards), which provides a more accurate count of more highly prioritized buckets. \ No newline at end of file diff --git a/_aggregations/bucket/global.md b/_aggregations/bucket/global.md deleted file mode 100644 index bfd516b8a3..0000000000 --- a/_aggregations/bucket/global.md +++ /dev/null @@ -1,59 +0,0 @@ ---- -layout: default -title: Global -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 90 -redirect_from: - - /query-dsl/aggregations/bucket/global/ ---- - -# Global aggregations - -The `global` aggregations lets you break out of the aggregation context of a filter aggregation. Even if you have included a filter query that narrows down a set of documents, the `global` aggregation aggregates on all documents as if the filter query wasn't there. It ignores the `filter` aggregation and implicitly assumes the `match_all` query. - -The following example returns the `avg` value of the `taxful_total_price` field from all documents in the index: - -```json -GET opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "query": { - "range": { - "taxful_total_price": { - "lte": 50 - } - } - }, - "aggs": { - "total_avg_amount": { - "global": {}, - "aggs": { - "avg_price": { - "avg": { - "field": "taxful_total_price" - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "total_avg_amount" : { - "doc_count" : 4675, - "avg_price" : { - "value" : 75.05542864304813 - } - } - } -} -``` - -You can see that the average value for the `taxful_total_price` field is 75.05 and not the 38.36 as seen in the `filter` example when the query matched. \ No newline at end of file diff --git a/_aggregations/bucket/histogram.md b/_aggregations/bucket/histogram.md deleted file mode 100644 index 0d9f2bb964..0000000000 --- a/_aggregations/bucket/histogram.md +++ /dev/null @@ -1,54 +0,0 @@ ---- -layout: default -title: Histogram -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 100 -redirect_from: - - /query-dsl/aggregations/bucket/histogram/ ---- - -# Histogram aggregations - -The `histogram` aggregation buckets documents based on a specified interval. - -With `histogram` aggregations, you can visualize the distributions of values in a given range of documents very easily. Now OpenSearch doesn’t give you back an actual graph of course, that’s what OpenSearch Dashboards is for. But it'll give you the JSON response that you can use to construct your own graph. - -The following example buckets the `number_of_bytes` field by 10,000 intervals: - -```json -GET opensearch_dashboards_sample_data_logs/_search -{ - "size": 0, - "aggs": { - "number_of_bytes": { - "histogram": { - "field": "bytes", - "interval": 10000 - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... 
-"aggregations" : { - "number_of_bytes" : { - "buckets" : [ - { - "key" : 0.0, - "doc_count" : 13372 - }, - { - "key" : 10000.0, - "doc_count" : 702 - } - ] - } - } -} -``` diff --git a/_aggregations/bucket/index.md b/_aggregations/bucket/index.md deleted file mode 100644 index 1658c06ea5..0000000000 --- a/_aggregations/bucket/index.md +++ /dev/null @@ -1,45 +0,0 @@ ---- -layout: default -title: Bucket aggregations -has_children: true -has_toc: false -nav_order: 3 -redirect_from: - - /opensearch/bucket-agg/ - - /query-dsl/aggregations/bucket-agg/ - - /query-dsl/aggregations/bucket/ - - /aggregations/bucket-agg/ ---- - -# Bucket aggregations - -Bucket aggregations categorize sets of documents as buckets. The type of bucket aggregation determines the bucket for a given document. - -You can use bucket aggregations to implement faceted navigation (usually placed as a sidebar on a search result landing page) to help your users filter the results. - -## Supported bucket aggregations - -OpenSearch supports the following bucket aggregations: - -- [Adjacency matrix]({{site.url}}{{site.baseurl}}/aggregations/bucket/adjacency-matrix/) -- [Date histogram]({{site.url}}{{site.baseurl}}/aggregations/bucket/date-histogram/) -- [Date range]({{site.url}}{{site.baseurl}}/aggregations/bucket/date-range/) -- [Diversified sampler]({{site.url}}{{site.baseurl}}/aggregations/bucket/diversified-sampler/) -- [Filter]({{site.url}}{{site.baseurl}}/aggregations/bucket/filter/) -- [Filters]({{site.url}}{{site.baseurl}}/aggregations/bucket/filters/) -- [Geodistance]({{site.url}}{{site.baseurl}}/aggregations/bucket/geo-distance/) -- [Geohash grid]({{site.url}}{{site.baseurl}}/aggregations/bucket/geohash-grid/) -- [Geohex grid]({{site.url}}{{site.baseurl}}/aggregations/bucket/geohex-grid/) -- [Geotile grid]({{site.url}}{{site.baseurl}}/aggregations/bucket/geotile-grid/) -- [Global]({{site.url}}{{site.baseurl}}/aggregations/bucket/global/) -- [Histogram]({{site.url}}{{site.baseurl}}/aggregations/bucket/histogram/) -- [IP range]({{site.url}}{{site.baseurl}}/aggregations/bucket/ip-range/) -- [Missing]({{site.url}}{{site.baseurl}}/aggregations/bucket/missing/) -- [Multi-terms]({{site.url}}{{site.baseurl}}/aggregations/bucket/multi-terms/) -- [Nested]({{site.url}}{{site.baseurl}}/aggregations/bucket/nested/) -- [Range]({{site.url}}{{site.baseurl}}/aggregations/bucket/range/) -- [Reverse nested]({{site.url}}{{site.baseurl}}/aggregations/bucket/reverse-nested/) -- [Sampler]({{site.url}}{{site.baseurl}}/aggregations/bucket/sampler/) -- [Significant terms]({{site.url}}{{site.baseurl}}/aggregations/bucket/significant-terms/) -- [Significant text]({{site.url}}{{site.baseurl}}/aggregations/bucket/significant-text/) -- [Terms]({{site.url}}{{site.baseurl}}/aggregations/bucket/terms/) \ No newline at end of file diff --git a/_aggregations/bucket/ip-range.md b/_aggregations/bucket/ip-range.md deleted file mode 100644 index 897827d412..0000000000 --- a/_aggregations/bucket/ip-range.md +++ /dev/null @@ -1,77 +0,0 @@ ---- -layout: default -title: IP range -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 110 -redirect_from: - - /query-dsl/aggregations/bucket/ip-range/ ---- - -# IP range aggregations - -The `ip_range` aggregation is for IP addresses. -It works on `ip` type fields. You can define the IP ranges and masks in the [CIDR](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing) notation. 
- -```json -GET opensearch_dashboards_sample_data_logs/_search -{ - "size": 0, - "aggs": { - "access": { - "ip_range": { - "field": "ip", - "ranges": [ - { - "from": "1.0.0.0", - "to": "126.158.155.183" - }, - { - "mask": "1.0.0.0/8" - } - ] - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "access" : { - "buckets" : [ - { - "key" : "1.0.0.0/8", - "from" : "1.0.0.0", - "to" : "2.0.0.0", - "doc_count" : 98 - }, - { - "key" : "1.0.0.0-126.158.155.183", - "from" : "1.0.0.0", - "to" : "126.158.155.183", - "doc_count" : 7184 - } - ] - } - } -} -``` - -If you add a document with malformed fields to an index that has `ip_range` set to `false` in its mappings, OpenSearch rejects the entire document. You can set `ignore_malformed` to `true` to specify that OpenSearch should ignore malformed fields. The default is `false`. - -```json -... -"mappings": { - "properties": { - "ips": { - "type": "ip_range", - "ignore_malformed": true - } - } -} -``` \ No newline at end of file diff --git a/_aggregations/bucket/missing.md b/_aggregations/bucket/missing.md deleted file mode 100644 index 547076859d..0000000000 --- a/_aggregations/bucket/missing.md +++ /dev/null @@ -1,82 +0,0 @@ ---- -layout: default -title: Missing -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 120 -redirect_from: - - /query-dsl/aggregations/bucket/missing/ ---- - -# Missing aggregations - -If you have documents in your index that don’t contain the aggregating field at all or the aggregating field has a value of NULL, use the `missing` parameter to specify the name of the bucket such documents should be placed in. - -The following example adds any missing values to a bucket named "N/A": - -```json -GET opensearch_dashboards_sample_data_logs/_search -{ - "size": 0, - "aggs": { - "response_codes": { - "terms": { - "field": "response.keyword", - "size": 10, - "missing": "N/A" - } - } - } -} -``` -{% include copy-curl.html %} - -Because the default value for the `min_doc_count` parameter is 1, the `missing` parameter doesn't return any buckets in its response. Set `min_doc_count` parameter to 0 to see the "N/A" bucket in the response: - -```json -GET opensearch_dashboards_sample_data_logs/_search -{ - "size": 0, - "aggs": { - "response_codes": { - "terms": { - "field": "response.keyword", - "size": 10, - "missing": "N/A", - "min_doc_count": 0 - } - } - } -} -``` - -#### Example response - -```json -... -"aggregations" : { - "response_codes" : { - "doc_count_error_upper_bound" : 0, - "sum_other_doc_count" : 0, - "buckets" : [ - { - "key" : "200", - "doc_count" : 12832 - }, - { - "key" : "404", - "doc_count" : 801 - }, - { - "key" : "503", - "doc_count" : 441 - }, - { - "key" : "N/A", - "doc_count" : 0 - } - ] - } - } -} -``` \ No newline at end of file diff --git a/_aggregations/bucket/multi-terms.md b/_aggregations/bucket/multi-terms.md deleted file mode 100644 index eb779e7c48..0000000000 --- a/_aggregations/bucket/multi-terms.md +++ /dev/null @@ -1,125 +0,0 @@ ---- -layout: default -title: Multi-terms -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 130 -redirect_from: - - /query-dsl/aggregations/multi-terms/ ---- - -# Multi-terms aggregations - -Similar to the `terms` bucket aggregation, you can also search for multiple terms using the `multi_terms` aggregation. Multi-terms aggregations are useful when you need to sort by document count, or when you need to sort by a metric aggregation on a composite key and get the top `n` results. 
For example, you could search for a specific number of documents (e.g., 1000) and the number of servers per location that show CPU usage greater than 90%. The top number of results would be returned for this multi-term query. - -The `multi_terms` aggregation does consume more memory than a `terms` aggregation, so its performance might be slower. -{: .tip } - -## Multi-terms aggregation parameters - -Parameter | Description -:--- | :--- -multi_terms | Indicates a multi-terms aggregation that gathers buckets of documents together based on criteria specified by multiple terms. -size | Specifies the number of buckets to return. Default is 10. -order | Indicates the order to sort the buckets. By default, buckets are ordered according to document count per bucket. If the buckets contain the same document count, then `order` can be explicitly set to the term value instead of document count. (e.g., set `order` to "max-cpu"). -doc_count | Specifies the number of documents to be returned in each bucket. By default, the top 10 terms are returned. - -#### Example request - -```json -GET sample-index100/_search -{ - "size": 0, - "aggs": { - "hot": { - "multi_terms": { - "terms": [{ - "field": "region" - },{ - "field": "host" - }], - "order": {"max-cpu": "desc"} - }, - "aggs": { - "max-cpu": { "max": { "field": "cpu" } } - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -{ - "took": 118, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 8, - "relation": "eq" - }, - "max_score": null, - "hits": [] - }, - "aggregations": { - "multi-terms": { - "doc_count_error_upper_bound": 0, - "sum_other_doc_count": 0, - "buckets": [ - { - "key": [ - "dub", - "h1" - ], - "key_as_string": "dub|h1", - "doc_count": 2, - "max-cpu": { - "value": 90.0 - } - }, - { - "key": [ - "dub", - "h2" - ], - "key_as_string": "dub|h2", - "doc_count": 2, - "max-cpu": { - "value": 70.0 - } - }, - { - "key": [ - "iad", - "h2" - ], - "key_as_string": "iad|h2", - "doc_count": 2, - "max-cpu": { - "value": 50.0 - } - }, - { - "key": [ - "iad", - "h1" - ], - "key_as_string": "iad|h1", - "doc_count": 2, - "max-cpu": { - "value": 15.0 - } - } - ] - } - } -} -``` diff --git a/_aggregations/bucket/nested.md b/_aggregations/bucket/nested.md deleted file mode 100644 index 94a0f4416a..0000000000 --- a/_aggregations/bucket/nested.md +++ /dev/null @@ -1,106 +0,0 @@ ---- -layout: default -title: Nested -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 140 -redirect_from: - - /query-dsl/aggregations/bucket/nested/ ---- - -# Nested aggregations - -The `nested` aggregation lets you aggregate on fields inside a nested object. The `nested` type is a specialized version of the object data type that allows arrays of objects to be indexed in a way that they can be queried independently of each other - -With the `object` type, all the data is stored in the same document, so matches for a search can go across sub documents. 
For example, imagine a `logs` index with `pages` mapped as an `object` datatype: - -```json -PUT logs/_doc/0 -{ - "response": "200", - "pages": [ - { - "page": "landing", - "load_time": 200 - }, - { - "page": "blog", - "load_time": 500 - } - ] -} -``` -{% include copy-curl.html %} - -OpenSearch merges all sub-properties of the entity relations that looks something like this: - -```json -{ - "logs": { - "pages": ["landing", "blog"], - "load_time": ["200", "500"] - } -} -``` - -So, if you wanted to search this index with `pages=landing` and `load_time=500`, this document matches the criteria even though the `load_time` value for landing is 200. - -If you want to make sure such cross-object matches don’t happen, map the field as a `nested` type: - -```json -PUT logs -{ - "mappings": { - "properties": { - "pages": { - "type": "nested", - "properties": { - "page": { "type": "text" }, - "load_time": { "type": "double" } - } - } - } - } -} -``` -{% include copy-curl.html %} - -Nested documents allow you to index the same JSON document but will keep your pages in separate Lucene documents, making only searches like `pages=landing` and `load_time=200` return the expected result. Internally, nested objects index each object in the array as a separate hidden document, meaning that each nested object can be queried independently of the others. - -You have to specify a nested path relative to parent that contains the nested documents: - - -```json -GET logs/_search -{ - "query": { - "match": { "response": "200" } - }, - "aggs": { - "pages": { - "nested": { - "path": "pages" - }, - "aggs": { - "min_load_time": { "min": { "field": "pages.load_time" } } - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "pages" : { - "doc_count" : 2, - "min_price" : { - "value" : 200.0 - } - } - } -} -``` diff --git a/_aggregations/bucket/range.md b/_aggregations/bucket/range.md deleted file mode 100644 index 61ec2f6276..0000000000 --- a/_aggregations/bucket/range.md +++ /dev/null @@ -1,78 +0,0 @@ ---- -layout: default -title: Range -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 150 -redirect_from: - - /query-dsl/aggregations/bucket/range/ ---- - -# Range aggregations - -The `range` aggregation lets you define the range for each bucket. - -For example, you can find the number of bytes between 1000 and 2000, 2000 and 3000, and 3000 and 4000. -Within the `range` parameter, you can define ranges as objects of an array. - -```json -GET opensearch_dashboards_sample_data_logs/_search -{ - "size": 0, - "aggs": { - "number_of_bytes_distribution": { - "range": { - "field": "bytes", - "ranges": [ - { - "from": 1000, - "to": 2000 - }, - { - "from": 2000, - "to": 3000 - }, - { - "from": 3000, - "to": 4000 - } - ] - } - } - } -} -``` -{% include copy-curl.html %} - -The response includes the `from` key values and excludes the `to` key values: - -#### Example response - -```json -... 
-"aggregations" : { - "number_of_bytes_distribution" : { - "buckets" : [ - { - "key" : "1000.0-2000.0", - "from" : 1000.0, - "to" : 2000.0, - "doc_count" : 805 - }, - { - "key" : "2000.0-3000.0", - "from" : 2000.0, - "to" : 3000.0, - "doc_count" : 1369 - }, - { - "key" : "3000.0-4000.0", - "from" : 3000.0, - "to" : 4000.0, - "doc_count" : 1422 - } - ] - } - } -} -``` \ No newline at end of file diff --git a/_aggregations/bucket/reverse-nested.md b/_aggregations/bucket/reverse-nested.md deleted file mode 100644 index bfd04986fa..0000000000 --- a/_aggregations/bucket/reverse-nested.md +++ /dev/null @@ -1,92 +0,0 @@ ---- -layout: default -title: Reverse nested -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 160 -redirect_from: - - /query-dsl/aggregations/bucket/reverse-nested/ ---- - -# Reverse nested aggregations - -You can aggregate values from nested documents to their parent; this aggregation is called `reverse_nested`. -You can use `reverse_nested` to aggregate a field from the parent document after grouping by the field from the nested object. The `reverse_nested` aggregation "joins back" the root page and gets the `load_time` for each for your variations. - -The `reverse_nested` aggregation is a sub-aggregation inside a nested aggregation. It accepts a single option named `path`. This option defines how many steps backwards in the document hierarchy OpenSearch takes to calculate the aggregations. - -```json -GET logs/_search -{ - "query": { - "match": { "response": "200" } - }, - "aggs": { - "pages": { - "nested": { - "path": "pages" - }, - "aggs": { - "top_pages_per_load_time": { - "terms": { - "field": "pages.load_time" - }, - "aggs": { - "comment_to_logs": { - "reverse_nested": {}, - "aggs": { - "min_load_time": { - "min": { - "field": "pages.load_time" - } - } - } - } - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "pages" : { - "doc_count" : 2, - "top_pages_per_load_time" : { - "doc_count_error_upper_bound" : 0, - "sum_other_doc_count" : 0, - "buckets" : [ - { - "key" : 200.0, - "doc_count" : 1, - "comment_to_logs" : { - "doc_count" : 1, - "min_load_time" : { - "value" : null - } - } - }, - { - "key" : 500.0, - "doc_count" : 1, - "comment_to_logs" : { - "doc_count" : 1, - "min_load_time" : { - "value" : null - } - } - } - ] - } - } - } -} -``` - -The response shows the logs index has one page with a `load_time` of 200 and one with a `load_time` of 500. \ No newline at end of file diff --git a/_aggregations/bucket/sampler.md b/_aggregations/bucket/sampler.md deleted file mode 100644 index 3668f3c755..0000000000 --- a/_aggregations/bucket/sampler.md +++ /dev/null @@ -1,82 +0,0 @@ ---- -layout: default -title: Sampler -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 170 ---- - -# Sampler aggregations - -If you're aggregating over millions of documents, you can use a `sampler` aggregation to reduce its scope to a small sample of documents for a faster response. The `sampler` aggregation selects the samples by top-scoring documents. - -The results are approximate but closely represent the distribution of the real data. The `sampler` aggregation significantly improves query performance, but the estimated responses are not entirely reliable. - -The basic syntax is: - -```json -“aggs”: { - "SAMPLE": { - "sampler": { - "shard_size": 100 - }, - "aggs": {...} - } -} -``` - -The `shard_size` property tells OpenSearch how many documents (at most) to collect from each shard. 
- -The following example limits the number of documents collected on each shard to 1,000 and then buckets the documents by a `terms` aggregation: - -```json -GET opensearch_dashboards_sample_data_logs/_search -{ - "size": 0, - "aggs": { - "sample": { - "sampler": { - "shard_size": 1000 - }, - "aggs": { - "terms": { - "terms": { - "field": "agent.keyword" - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "sample" : { - "doc_count" : 1000, - "terms" : { - "doc_count_error_upper_bound" : 0, - "sum_other_doc_count" : 0, - "buckets" : [ - { - "key" : "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1", - "doc_count" : 368 - }, - { - "key" : "Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24", - "doc_count" : 329 - }, - { - "key" : "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)", - "doc_count" : 303 - } - ] - } - } - } -} -``` \ No newline at end of file diff --git a/_aggregations/bucket/significant-terms.md b/_aggregations/bucket/significant-terms.md deleted file mode 100644 index 017e3b7dd8..0000000000 --- a/_aggregations/bucket/significant-terms.md +++ /dev/null @@ -1,70 +0,0 @@ ---- -layout: default -title: Significant terms -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 180 ---- - -# Significant terms aggregations - -The `significant_terms` aggregation lets you spot unusual or interesting term occurrences in a filtered subset relative to the rest of the data in an index. - -A foreground set is the set of documents that you filter. A background set is a set of all documents in an index. -The `significant_terms` aggregation examines all documents in the foreground set and finds a score for significant occurrences in contrast to the documents in the background set. - -In the sample web log data, each document has a field containing the `user-agent` of the visitor. This example searches for all requests from an iOS operating system. A regular `terms` aggregation on this foreground set returns Firefox because it has the most number of documents within this bucket. On the other hand, a `significant_terms` aggregation returns Internet Explorer (IE) because IE has a significantly higher appearance in the foreground set as compared to the background set. - -```json -GET opensearch_dashboards_sample_data_logs/_search -{ - "size": 0, - "query": { - "terms": { - "machine.os.keyword": [ - "ios" - ] - } - }, - "aggs": { - "significant_response_codes": { - "significant_terms": { - "field": "agent.keyword" - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "significant_response_codes" : { - "doc_count" : 2737, - "bg_count" : 14074, - "buckets" : [ - { - "key" : "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)", - "doc_count" : 818, - "score" : 0.01462731514608217, - "bg_count" : 4010 - }, - { - "key" : "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1", - "doc_count" : 1067, - "score" : 0.009062566630410223, - "bg_count" : 5362 - } - ] - } - } -} -``` - -If the `significant_terms` aggregation doesn't return any result, you might have not filtered the results with a query. Alternatively, the distribution of terms in the foreground set might be the same as the background set, implying that there isn't anything unusual in the foreground set. 
- -The default source of statistical information for background term frequencies is the entire index. You can narrow this scope with a background filter for more focus - diff --git a/_aggregations/bucket/significant-text.md b/_aggregations/bucket/significant-text.md deleted file mode 100644 index 1c136603d6..0000000000 --- a/_aggregations/bucket/significant-text.md +++ /dev/null @@ -1,132 +0,0 @@ ---- -layout: default -title: Significant text -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 190 ---- - -# Significant text aggregations - -The `significant_text` aggregation is similar to the `significant_terms` aggregation but it's for raw text fields. -Significant text measures the change in popularity measured between the foreground and background sets using statistical analysis. For example, it might suggest Tesla when you look for its stock acronym TSLA. - -The `significant_text` aggregation re-analyzes the source text on the fly, filtering noisy data like duplicate paragraphs, boilerplate headers and footers, and so on, which might otherwise skew the results. - -Re-analyzing high-cardinality datasets can be a very CPU-intensive operation. We recommend using the `significant_text` aggregation inside a sampler aggregation to limit the analysis to a small selection of top-matching documents, for example 200. - -You can set the following parameters: - -- `min_doc_count` - Return results that match more than a configured number of top hits. We recommend not setting `min_doc_count` to 1 because it tends to return terms that are typos or misspellings. Finding more than one instance of a term helps reinforce that the significance is not the result of a one-off accident. The default value of 3 is used to provide a minimum weight-of-evidence. -- `shard_size` - Setting a high value increases stability (and accuracy) at the expense of computational performance. -- `shard_min_doc_count` - If your text contains many low frequency words and you're not interested in these (for example typos), then you can set the `shard_min_doc_count` parameter to filter out candidate terms at a shard level with a reasonable certainty to not reach the required `min_doc_count` even after merging the local significant text frequencies. The default value is 1, which has no impact until you explicitly set it. We recommend setting this value much lower than the `min_doc_count` value. - -Assume that you have the complete works of Shakespeare indexed in an OpenSearch cluster. 
You can find significant texts in relation to the word "breathe" in the `text_entry` field: - -```json -GET shakespeare/_search -{ - "query": { - "match": { - "text_entry": "breathe" - } - }, - "aggregations": { - "my_sample": { - "sampler": { - "shard_size": 100 - }, - "aggregations": { - "keywords": { - "significant_text": { - "field": "text_entry", - "min_doc_count": 4 - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -"aggregations" : { - "my_sample" : { - "doc_count" : 59, - "keywords" : { - "doc_count" : 59, - "bg_count" : 111396, - "buckets" : [ - { - "key" : "breathe", - "doc_count" : 59, - "score" : 1887.0677966101694, - "bg_count" : 59 - }, - { - "key" : "air", - "doc_count" : 4, - "score" : 2.641295376716233, - "bg_count" : 189 - }, - { - "key" : "dead", - "doc_count" : 4, - "score" : 0.9665839666414213, - "bg_count" : 495 - }, - { - "key" : "life", - "doc_count" : 5, - "score" : 0.9090787433467572, - "bg_count" : 805 - } - ] - } - } - } -} -``` - -The most significant texts in relation to `breathe` are `air`, `dead`, and `life`. - -The `significant_text` aggregation has the following limitations: - -- Doesn't support child aggregations because child aggregations come at a high memory cost. As a workaround, you can add a follow-up query using a `terms` aggregation with an include clause and a child aggregation. -- Doesn't support nested objects because it works with the document JSON source. -- The counts of documents might have some (typically small) inaccuracies as it's based on summing the samples returned from each shard. You can use the `shard_size` parameter to fine-tune the trade-off between accuracy and performance. By default, the `shard_size` is set to -1 to automatically estimate the number of shards and the `size` parameter. - -The default source of statistical information for background term frequencies is the entire index. You can narrow this scope with a background filter for more focus: - -```json -GET shakespeare/_search -{ - "query": { - "match": { - "text_entry": "breathe" - } - }, - "aggregations": { - "my_sample": { - "sampler": { - "shard_size": 100 - }, - "aggregations": { - "keywords": { - "significant_text": { - "field": "text_entry", - "background_filter": { - "term": { - "speaker": "JOHN OF GAUNT" - } - } - } - } - } - } - } -} -``` \ No newline at end of file diff --git a/_aggregations/bucket/terms.md b/_aggregations/bucket/terms.md deleted file mode 100644 index 229ded6133..0000000000 --- a/_aggregations/bucket/terms.md +++ /dev/null @@ -1,156 +0,0 @@ ---- -layout: default -title: Terms -parent: Bucket aggregations -grand_parent: Aggregations -nav_order: 200 ---- - -# Terms aggregations - -The `terms` aggregation dynamically creates a bucket for each unique term of a field. - -The following example uses the `terms` aggregation to find the number of documents per response code in web log data: - -```json -GET opensearch_dashboards_sample_data_logs/_search -{ - "size": 0, - "aggs": { - "response_codes": { - "terms": { - "field": "response.keyword", - "size": 10 - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "response_codes" : { - "doc_count_error_upper_bound" : 0, - "sum_other_doc_count" : 0, - "buckets" : [ - { - "key" : "200", - "doc_count" : 12832 - }, - { - "key" : "404", - "doc_count" : 801 - }, - { - "key" : "503", - "doc_count" : 441 - } - ] - } - } -} -``` - -The values are returned with the key `key`. 
-
-`doc_count` specifies the number of documents in each bucket. By default, buckets are sorted in descending order of `doc_count`.
-
-The response also includes two keys named `doc_count_error_upper_bound` and `sum_other_doc_count`.
-
-The `terms` aggregation returns only the top unique terms, so if the data contains many unique terms, some of them might not appear in the results. The `sum_other_doc_count` field is the sum of the documents that are left out of the response. In this case, the number is 0 because all of the unique values appear in the response.
-
-The `doc_count_error_upper_bound` field represents the maximum possible count for a unique value that is left out of the final results. Use this field to estimate the error margin for the count.
-
-The count might not be accurate because the coordinating node responsible for the aggregation prompts each shard only for its top unique terms. For example, if the `size` parameter is set to 3, the `terms` aggregation requests the top 3 unique terms from each shard. The coordinating node then combines these partial results to compute the final result. If a term is not among a shard's top 3, that shard does not report it, and the term might not appear in the response.
-
-This is especially likely if `size` is set to a low number. Because the default size is 10, such an error is unlikely to occur. If you don't need high accuracy and want to improve performance, you can reduce the size.
-
-## Account for pre-aggregated data
-
-While the `doc_count` field represents the number of individual documents aggregated in a bucket, `doc_count` by itself cannot correctly account for documents that store pre-aggregated data. To accurately calculate the number of documents in a bucket when working with pre-aggregated data, you can use the `_doc_count` field to record the number of aggregated documents in a single summary field. When a document includes the `_doc_count` field, all bucket aggregations recognize its value and increase the bucket `doc_count` cumulatively. Keep these considerations in mind when using the `_doc_count` field:
-
-* The field does not support nested arrays; only positive integers can be used.
-* If a document does not contain the `_doc_count` field, the aggregation uses the document to increase the count by 1.
-
-OpenSearch features that rely on an accurate document count illustrate the importance of the `_doc_count` field. To see how this field can be used to support other search tools, refer to [Index rollups](https://opensearch.org/docs/latest/im-plugin/index-rollups/index/), an Index Management (IM) plugin feature that stores documents with pre-aggregated data in rollup indexes. 
-{: .tip} - -#### Example request - -```json -PUT /my_index/_doc/1 -{ - "response_code": 404, - "date":"2022-08-05", - "_doc_count": 20 -} - -PUT /my_index/_doc/2 -{ - "response_code": 404, - "date":"2022-08-06", - "_doc_count": 10 -} - -PUT /my_index/_doc/3 -{ - "response_code": 200, - "date":"2022-08-06", - "_doc_count": 300 -} - -GET /my_index/_search -{ - "size": 0, - "aggs": { - "response_codes": { - "terms": { - "field" : "response_code" - } - } - } -} -``` - -#### Example response - -```json -{ - "took" : 20, - "timed_out" : false, - "_shards" : { - "total" : 1, - "successful" : 1, - "skipped" : 0, - "failed" : 0 - }, - "hits" : { - "total" : { - "value" : 3, - "relation" : "eq" - }, - "max_score" : null, - "hits" : [ ] - }, - "aggregations" : { - "response_codes" : { - "doc_count_error_upper_bound" : 0, - "sum_other_doc_count" : 0, - "buckets" : [ - { - "key" : 200, - "doc_count" : 300 - }, - { - "key" : 404, - "doc_count" : 30 - } - ] - } - } -} -``` \ No newline at end of file diff --git a/_aggregations/metric/average.md b/_aggregations/metric/average.md deleted file mode 100644 index 247d497aef..0000000000 --- a/_aggregations/metric/average.md +++ /dev/null @@ -1,58 +0,0 @@ ---- -layout: default -title: Average -parent: Metric aggregations -grand_parent: Aggregations -nav_order: 10 -redirect_from: - - /query-dsl/aggregations/metric/average/ ---- - -# Average aggregations - -The `avg` metric is a single-value metric aggregations that returns the average value of a field. - -The following example calculates the average of the `taxful_total_price` field: - -```json -GET opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "avg_taxful_total_price": { - "avg": { - "field": "taxful_total_price" - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -{ - "took": 85, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 4675, - "relation": "eq" - }, - "max_score": null, - "hits": [] - }, - "aggregations": { - "sum_taxful_total_price": { - "value": 75.05542864304813 - } - } -} -``` \ No newline at end of file diff --git a/_aggregations/metric/cardinality.md b/_aggregations/metric/cardinality.md deleted file mode 100644 index c40dbb4497..0000000000 --- a/_aggregations/metric/cardinality.md +++ /dev/null @@ -1,62 +0,0 @@ ---- -layout: default -title: Cardinality -parent: Metric aggregations -grand_parent: Aggregations -nav_order: 20 -redirect_from: - - /query-dsl/aggregations/metric/cardinality/ ---- - -# Cardinality aggregations - -The `cardinality` metric is a single-value metric aggregation that counts the number of unique or distinct values of a field. - -The following example finds the number of unique products in an eCommerce store: - -```json -GET opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "unique_products": { - "cardinality": { - "field": "products.product_id" - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... - "aggregations" : { - "unique_products" : { - "value" : 7033 - } - } -} -``` - -Cardinality count is approximate. -If you have tens of thousands of products in your hypothetical store, an accurate cardinality calculation requires loading all the values into a hash set and returning its size. This approach doesn't scale well; it requires huge amounts of memory and can cause high latencies. 
- -You can control the trade-off between memory and accuracy with the `precision_threshold` setting. This setting defines the threshold below which counts are expected to be close to accurate. Above this value, counts might become a bit less accurate. The default value of `precision_threshold` is 3,000. The maximum supported value is 40,000. - -```json -GET opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "unique_products": { - "cardinality": { - "field": "products.product_id", - "precision_threshold": 10000 - } - } - } -} -``` \ No newline at end of file diff --git a/_aggregations/metric/extended-stats.md b/_aggregations/metric/extended-stats.md deleted file mode 100644 index 633407dab0..0000000000 --- a/_aggregations/metric/extended-stats.md +++ /dev/null @@ -1,77 +0,0 @@ ---- -layout: default -title: Extended stats -parent: Metric aggregations -grand_parent: Aggregations -nav_order: 30 -redirect_from: - - /query-dsl/aggregations/metric/extended-stats/ ---- - -# Extended stats aggregations - -The `extended_stats` aggregation is an extended version of the [`stats`]({{site.url}}{{site.baseurl}}/query-dsl/aggregations/metric/stats/) aggregation. Apart from including basic stats, `extended_stats` also returns stats such as `sum_of_squares`, `variance`, and `std_deviation`. -The following example returns extended stats for `taxful_total_price`: -```json -GET opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "extended_stats_taxful_total_price": { - "extended_stats": { - "field": "taxful_total_price" - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "extended_stats_taxful_total_price" : { - "count" : 4675, - "min" : 6.98828125, - "max" : 2250.0, - "avg" : 75.05542864304813, - "sum" : 350884.12890625, - "sum_of_squares" : 3.9367749294174194E7, - "variance" : 2787.59157113862, - "variance_population" : 2787.59157113862, - "variance_sampling" : 2788.187974983536, - "std_deviation" : 52.79764740155209, - "std_deviation_population" : 52.79764740155209, - "std_deviation_sampling" : 52.80329511482722, - "std_deviation_bounds" : { - "upper" : 180.6507234461523, - "lower" : -30.53986616005605, - "upper_population" : 180.6507234461523, - "lower_population" : -30.53986616005605, - "upper_sampling" : 180.66201887270256, - "lower_sampling" : -30.551161586606312 - } - } - } -} -``` - -The `std_deviation_bounds` object provides a visual variance of the data with an interval of plus/minus two standard deviations from the mean. -To set the standard deviation to a different value, say 3, set `sigma` to 3: - -```json -GET opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "extended_stats_taxful_total_price": { - "extended_stats": { - "field": "taxful_total_price", - "sigma": 3 - } - } - } -} -``` \ No newline at end of file diff --git a/_aggregations/metric/geobounds.md b/_aggregations/metric/geobounds.md deleted file mode 100644 index 27b7646ca5..0000000000 --- a/_aggregations/metric/geobounds.md +++ /dev/null @@ -1,229 +0,0 @@ ---- -layout: default -title: Geobounds -parent: Metric aggregations -grand_parent: Aggregations -nav_order: 40 -redirect_from: - - /query-dsl/aggregations/metric/geobounds/ ---- - -## Geobounds aggregations - -The `geo_bounds` metric is a multi-value metric aggregation that calculates the [geographic bounding box](https://docs.ogc.org/is/12-063r5/12-063r5.html#30) containing all values of a given `geo_point` or `geo_shape` field. 
The bounding box is returned as the upper-left and lower-right vertices of the rectangle in terms of latitude and longitude. - -The following example returns the `geo_bounds` metrics for the `geoip.location` field: - -```json -GET opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "geo": { - "geo_bounds": { - "field": "geoip.location" - } - } - } -} -``` - -#### Example response - -```json -"aggregations" : { - "geo" : { - "bounds" : { - "top_left" : { - "lat" : 52.49999997206032, - "lon" : -118.20000001229346 - }, - "bottom_right" : { - "lat" : 4.599999985657632, - "lon" : 55.299999956041574 - } - } - } - } -} -``` - -## Aggregating geoshapes - -To run an aggregation on a geoshape field, first create an index and map the `location` field as a `geo_shape`: - -```json -PUT national_parks -{ - "mappings": { - "properties": { - "location": { - "type": "geo_shape" - } - } - } -} -``` -{% include copy-curl.html %} - -Next, index some documents into the `national_parks` index: - -```json -PUT national_parks/_doc/1 -{ - "name": "Yellowstone National Park", - "location": - {"type": "envelope","coordinates": [ [-111.15, 45.12], [-109.83, 44.12] ]} -} -``` -{% include copy-curl.html %} - -```json -PUT national_parks/_doc/2 -{ - "name": "Yosemite National Park", - "location": - {"type": "envelope","coordinates": [ [-120.23, 38.16], [-119.05, 37.45] ]} -} -``` -{% include copy-curl.html %} - -```json -PUT national_parks/_doc/3 -{ - "name": "Death Valley National Park", - "location": - {"type": "envelope","coordinates": [ [-117.34, 37.01], [-116.38, 36.25] ]} -} -``` -{% include copy-curl.html %} - -You can run a `geo_bounds` aggregation on the `location` field as follows: - -```json -GET national_parks/_search -{ - "aggregations": { - "grouped": { - "geo_bounds": { - "field": "location", - "wrap_longitude": true - } - } - } -} -``` -{% include copy-curl.html %} - -The optional `wrap_longitude` parameter specifies whether the bounding box returned by the aggregation can overlap the international date line (180° meridian). If `wrap_longitude` is set to `true`, the bounding box can overlap the international date line and return a `bounds` object in which the lower-left longitude is greater than the upper-right longitude. The default value for `wrap_longitude` is `true`. - -The response contains the geo-bounding box that encloses all shapes in the `location` field: - -
- - Response - - {: .text-delta} - -```json -{ - "took" : 3, - "timed_out" : false, - "_shards" : { - "total" : 1, - "successful" : 1, - "skipped" : 0, - "failed" : 0 - }, - "hits" : { - "total" : { - "value" : 3, - "relation" : "eq" - }, - "max_score" : 1.0, - "hits" : [ - { - "_index" : "national_parks", - "_id" : "1", - "_score" : 1.0, - "_source" : { - "name" : "Yellowstone National Park", - "location" : { - "type" : "envelope", - "coordinates" : [ - [ - -111.15, - 45.12 - ], - [ - -109.83, - 44.12 - ] - ] - } - } - }, - { - "_index" : "national_parks", - "_id" : "2", - "_score" : 1.0, - "_source" : { - "name" : "Yosemite National Park", - "location" : { - "type" : "envelope", - "coordinates" : [ - [ - -120.23, - 38.16 - ], - [ - -119.05, - 37.45 - ] - ] - } - } - }, - { - "_index" : "national_parks", - "_id" : "3", - "_score" : 1.0, - "_source" : { - "name" : "Death Valley National Park", - "location" : { - "type" : "envelope", - "coordinates" : [ - [ - -117.34, - 37.01 - ], - [ - -116.38, - 36.25 - ] - ] - } - } - } - ] - }, - "aggregations" : { - "Grouped" : { - "bounds" : { - "top_left" : { - "lat" : 45.11999997776002, - "lon" : -120.23000006563962 - }, - "bottom_right" : { - "lat" : 36.249999976716936, - "lon" : -109.83000006526709 - } - } - } - } -} -``` -
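Stepping back to geopoint fields, `geo_bounds` also works as a sub-aggregation, which is useful for computing a viewport for each bucket of a parent aggregation. The following sketch assumes the sample eCommerce data used earlier on this page, including its `geoip.continent_name` keyword field; the aggregation names `continents` and `viewport` are arbitrary:

```json
GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "continents": {
      "terms": {
        "field": "geoip.continent_name"
      },
      "aggs": {
        "viewport": {
          "geo_bounds": {
            "field": "geoip.location"
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

Each `continents` bucket in the response should then contain its own `viewport.bounds` object with `top_left` and `bottom_right` coordinates.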
- -Currently, OpenSearch supports geoshape aggregation through the API but not in OpenSearch Dashboards visualizations. If you'd like to see geoshape aggregation implemented for visualizations, upvote the related [GitHub issue](https://github.com/opensearch-project/dashboards-maps/issues/250). -{: .note} diff --git a/_aggregations/metric/index.md b/_aggregations/metric/index.md deleted file mode 100644 index 7553933c32..0000000000 --- a/_aggregations/metric/index.md +++ /dev/null @@ -1,47 +0,0 @@ ---- -layout: default -title: Metric aggregations -has_children: true -has_toc: false -nav_order: 2 -redirect_from: - - /opensearch/metric-agg/ - - /query-dsl/aggregations/metric-agg/ - - /aggregations/metric-agg/ - - /query-dsl/aggregations/metric/ ---- - -# Metric aggregations - -Metric aggregations let you perform simple calculations such as finding the minimum, maximum, and average values of a field. - -## Types of metric aggregations - -There are two types of metric aggregations: single-value metric aggregations and multi-value metric aggregations. - -### Single-value metric aggregations - -Single-value metric aggregations return a single metric, for example, `sum`, `min`, `max`, `avg`, `cardinality`, or `value_count`. - -### Multi-value metric aggregations - -Multi-value metric aggregations return more than one metric. These include `stats`, `extended_stats`, `matrix_stats`, `percentile`, `percentile_ranks`, `geo_bound`, `top_hits`, and `scripted_metric`. - -## Supported metric aggregations - -OpenSearch supports the following metric aggregations: - -- [Average]({{site.url}}{{site.baseurl}}/aggregations/metric/average/) -- [Cardinality]({{site.url}}{{site.baseurl}}/aggregations/metric/cardinality/) -- [Extended stats]({{site.url}}{{site.baseurl}}/aggregations/metric/extended-stats/) -- [Geobounds]({{site.url}}{{site.baseurl}}/aggregations/metric/geobounds/) -- [Matrix stats]({{site.url}}{{site.baseurl}}/aggregations/metric/matrix-stats/) -- [Maximum]({{site.url}}{{site.baseurl}}/aggregations/metric/maximum/) -- [Minimum]({{site.url}}{{site.baseurl}}/aggregations/metric/minimum/) -- [Percentile ranks]({{site.url}}{{site.baseurl}}/aggregations/metric/percentile-ranks/) -- [Percentile]({{site.url}}{{site.baseurl}}/aggregations/metric/percentile/) -- [Scripted metric]({{site.url}}{{site.baseurl}}/aggregations/metric/scripted-metric/) -- [Stats]({{site.url}}{{site.baseurl}}/aggregations/metric/stats/) -- [Sum]({{site.url}}{{site.baseurl}}/aggregations/metric/sum/) -- [Top hits]({{site.url}}{{site.baseurl}}/aggregations/metric/top-hits/) -- [Value count]({{site.url}}{{site.baseurl}}/aggregations/metric/value-count/) \ No newline at end of file diff --git a/_aggregations/metric/matrix-stats.md b/_aggregations/metric/matrix-stats.md deleted file mode 100644 index 475e0caa24..0000000000 --- a/_aggregations/metric/matrix-stats.md +++ /dev/null @@ -1,87 +0,0 @@ ---- -layout: default -title: Matrix stats -parent: Metric aggregations -grand_parent: Aggregations -nav_order: 50 -redirect_from: - - /query-dsl/aggregations/metric/matrix-stats/ ---- - -# Matrix stats aggregations - -The `matrix_stats` aggregation generates advanced stats for multiple fields in a matrix form. 
-The following example returns advanced stats in a matrix form for the `taxful_total_price` and `products.base_price` fields: - -```json -GET opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "matrix_stats_taxful_total_price": { - "matrix_stats": { - "fields": ["taxful_total_price", "products.base_price"] - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "matrix_stats_taxful_total_price" : { - "doc_count" : 4675, - "fields" : [ - { - "name" : "products.base_price", - "count" : 4675, - "mean" : 34.994239430147196, - "variance" : 360.5035285833703, - "skewness" : 5.530161335032702, - "kurtosis" : 131.16306324042148, - "covariance" : { - "products.base_price" : 360.5035285833703, - "taxful_total_price" : 846.6489362233166 - }, - "correlation" : { - "products.base_price" : 1.0, - "taxful_total_price" : 0.8444765264325268 - } - }, - { - "name" : "taxful_total_price", - "count" : 4675, - "mean" : 75.05542864304839, - "variance" : 2788.1879749835402, - "skewness" : 15.812149139924037, - "kurtosis" : 619.1235507385902, - "covariance" : { - "products.base_price" : 846.6489362233166, - "taxful_total_price" : 2788.1879749835402 - }, - "correlation" : { - "products.base_price" : 0.8444765264325268, - "taxful_total_price" : 1.0 - } - } - ] - } - } -} -``` - -The following table lists all response fields. - -Statistic | Description -:--- | :--- -`count` | The number of samples measured. -`mean` | The average value of the field measured from the sample. -`variance` | How far the values of the field measured are spread out from its mean value. The larger the variance, the more it's spread from its mean value. -`skewness` | An asymmetric measure of the distribution of the field's values around the mean. -`kurtosis` | A measure of the tail heaviness of a distribution. As the tail becomes lighter, kurtosis decreases. As the tail becomes heavier, kurtosis increases. To learn about kurtosis, see [Wikipedia](https://en.wikipedia.org/wiki/Kurtosis). -`covariance` | A measure of the joint variability between two fields. A positive value means their values move in the same direction and the other way around. -`correlation` | A measure of the strength of the relationship between two fields. The valid values are between [-1, 1]. A value of -1 means that the value is negatively correlated and a value of 1 means that it's positively correlated. A value of 0 means that there's no identifiable relationship between them. \ No newline at end of file diff --git a/_aggregations/metric/maximum.md b/_aggregations/metric/maximum.md deleted file mode 100644 index 63b4d62a7b..0000000000 --- a/_aggregations/metric/maximum.md +++ /dev/null @@ -1,58 +0,0 @@ ---- -layout: default -title: Maximum -parent: Metric aggregations -grand_parent: Aggregations -nav_order: 60 -redirect_from: - - /query-dsl/aggregations/metric/maximum/ ---- - -# Maximum aggregations - -The `max` metric is a single-value metric aggregations that returns the maximum value of a field. 
- -The following example calculates the maximum of the `taxful_total_price` field: - -```json -GET opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "max_taxful_total_price": { - "max": { - "field": "taxful_total_price" - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -{ - "took": 17, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 4675, - "relation": "eq" - }, - "max_score": null, - "hits": [] - }, - "aggregations": { - "max_taxful_total_price": { - "value": 2250 - } - } -} -``` \ No newline at end of file diff --git a/_aggregations/metric/minimum.md b/_aggregations/metric/minimum.md deleted file mode 100644 index dd17c854a9..0000000000 --- a/_aggregations/metric/minimum.md +++ /dev/null @@ -1,58 +0,0 @@ ---- -layout: default -title: Minimum -parent: Metric aggregations -grand_parent: Aggregations -nav_order: 70 -redirect_from: - - /query-dsl/aggregations/metric/minimum/ ---- - -# Minimum aggregations - -The `min` metric is a single-value metric aggregations that returns the minimum value of a field. - -The following example calculates the minimum of the `taxful_total_price` field: - -```json -GET opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "min_taxful_total_price": { - "min": { - "field": "taxful_total_price" - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -{ - "took": 13, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 4675, - "relation": "eq" - }, - "max_score": null, - "hits": [] - }, - "aggregations": { - "min_taxful_total_price": { - "value": 6.98828125 - } - } -} -``` \ No newline at end of file diff --git a/_aggregations/metric/percentile-ranks.md b/_aggregations/metric/percentile-ranks.md deleted file mode 100644 index 33ccb3d291..0000000000 --- a/_aggregations/metric/percentile-ranks.md +++ /dev/null @@ -1,47 +0,0 @@ ---- -layout: default -title: Percentile ranks -parent: Metric aggregations -grand_parent: Aggregations -nav_order: 80 -redirect_from: - - /query-dsl/aggregations/metric/percentile-ranks/ ---- - -# Percentile rank aggregations - -Percentile rank is the percentile of values at or below a threshold grouped by a specified value. For example, if a value is greater than or equal to 80% of the values, it has a percentile rank of 80. - -```json -GET opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "percentile_rank_taxful_total_price": { - "percentile_ranks": { - "field": "taxful_total_price", - "values": [ - 10, - 15 - ] - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "percentile_rank_taxful_total_price" : { - "values" : { - "10.0" : 0.055096056411283456, - "15.0" : 0.0830092961834656 - } - } - } -} -``` \ No newline at end of file diff --git a/_aggregations/metric/percentile.md b/_aggregations/metric/percentile.md deleted file mode 100644 index c68b0e0ec7..0000000000 --- a/_aggregations/metric/percentile.md +++ /dev/null @@ -1,54 +0,0 @@ ---- -layout: default -title: Percentile -parent: Metric aggregations -grand_parent: Aggregations -nav_order: 90 -redirect_from: - - /query-dsl/aggregations/metric/percentile/ ---- - -# Percentile aggregations - -Percentile is the percentage of the data that's at or below a certain threshold value. 
- -The `percentile` metric is a multi-value metric aggregation that lets you find outliers in your data or figure out the distribution of your data. - -Like the `cardinality` metric, the `percentile` metric is also approximate. - -The following example calculates the percentile in relation to the `taxful_total_price` field: - -```json -GET opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "percentile_taxful_total_price": { - "percentiles": { - "field": "taxful_total_price" - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "percentile_taxful_total_price" : { - "values" : { - "1.0" : 21.984375, - "5.0" : 27.984375, - "25.0" : 44.96875, - "50.0" : 64.22061688311689, - "75.0" : 93.0, - "95.0" : 156.0, - "99.0" : 222.0 - } - } - } -} -``` diff --git a/_aggregations/metric/scripted-metric.md b/_aggregations/metric/scripted-metric.md deleted file mode 100644 index d1807efbc0..0000000000 --- a/_aggregations/metric/scripted-metric.md +++ /dev/null @@ -1,73 +0,0 @@ ---- -layout: default -title: Scripted metric -parent: Metric aggregations -grand_parent: Aggregations -nav_order: 100 -redirect_from: - - /query-dsl/aggregations/metric/scripted-metric/ ---- - -# Scripted metric aggregations - -The `scripted_metric` metric is a multi-value metric aggregation that returns metrics calculated from a specified script. - -A script has four stages: the initial stage, the map stage, the combine stage, and the reduce stage. - -* `init_script`: (OPTIONAL) Sets the initial state and executes before any collection of documents. -* `map_script`: Checks the value of the `type` field and executes the aggregation on the collected documents. -* `combine_script`: Aggregates the state returned from every shard. The aggregated value is returned to the coordinating node. -* `reduce_script`: Provides access to the variable states; this variable combines the results from the `combine_script` on each shard into an array. - -The following example aggregates the different HTTP response types in web log data: - -```json -GET opensearch_dashboards_sample_data_logs/_search -{ - "size": 0, - "aggregations": { - "responses.counts": { - "scripted_metric": { - "init_script": "state.responses = ['error':0L,'success':0L,'other':0L]", - "map_script": """ - def code = doc['response.keyword'].value; - if (code.startsWith('5') || code.startsWith('4')) { - state.responses.error += 1 ; - } else if(code.startsWith('2')) { - state.responses.success += 1; - } else { - state.responses.other += 1; - } - """, - "combine_script": "state.responses", - "reduce_script": """ - def counts = ['error': 0L, 'success': 0L, 'other': 0L]; - for (responses in states) { - counts.error += responses['error']; - counts.success += responses['success']; - counts.other += responses['other']; - } - return counts; - """ - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... 
-"aggregations" : { - "responses.counts" : { - "value" : { - "other" : 0, - "success" : 12832, - "error" : 1242 - } - } - } -} -``` diff --git a/_aggregations/metric/stats.md b/_aggregations/metric/stats.md deleted file mode 100644 index 0a54831522..0000000000 --- a/_aggregations/metric/stats.md +++ /dev/null @@ -1,46 +0,0 @@ ---- -layout: default -title: Stats -parent: Metric aggregations -grand_parent: Aggregations -nav_order: 110 -redirect_from: - - /query-dsl/aggregations/metric/stats/ ---- - -# Stats aggregations - -The `stats` metric is a multi-value metric aggregation that returns all basic metrics such as `min`, `max`, `sum`, `avg`, and `value_count` in one aggregation query. - -The following example returns the basic stats for the `taxful_total_price` field: - -```json -GET opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "stats_taxful_total_price": { - "stats": { - "field": "taxful_total_price" - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "stats_taxful_total_price" : { - "count" : 4675, - "min" : 6.98828125, - "max" : 2250.0, - "avg" : 75.05542864304813, - "sum" : 350884.12890625 - } - } -} -``` \ No newline at end of file diff --git a/_aggregations/metric/sum.md b/_aggregations/metric/sum.md deleted file mode 100644 index 0320de63fc..0000000000 --- a/_aggregations/metric/sum.md +++ /dev/null @@ -1,58 +0,0 @@ ---- -layout: default -title: Sum -parent: Metric aggregations -grand_parent: Aggregations -nav_order: 120 -redirect_from: - - /query-dsl/aggregations/metric/sum/ ---- - -# Sum aggregations - -The `sum` metric is a single-value metric aggregations that returns the sum of the values of a field. - -The following example calculates the total sum of the `taxful_total_price` field: - -```json -GET opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "sum_taxful_total_price": { - "sum": { - "field": "taxful_total_price" - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -{ - "took": 16, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 4675, - "relation": "eq" - }, - "max_score": null, - "hits": [] - }, - "aggregations": { - "sum_taxful_total_price": { - "value": 350884.12890625 - } - } -} -``` diff --git a/_aggregations/metric/top-hits.md b/_aggregations/metric/top-hits.md deleted file mode 100644 index b6752300b2..0000000000 --- a/_aggregations/metric/top-hits.md +++ /dev/null @@ -1,149 +0,0 @@ ---- -layout: default -title: Top hits -parent: Metric aggregations -grand_parent: Aggregations -nav_order: 130 -redirect_from: - - /query-dsl/aggregations/metric/top-hits/ ---- - -# Top hits aggregations - -The `top_hits` metric is a multi-value metric aggregation that ranks the matching documents based on a relevance score for the field that's being aggregated. - -You can specify the following options: - -- `from`: The starting position of the hit. -- `size`: The maximum size of hits to return. The default value is 3. -- `sort`: How the matching hits are sorted. By default, the hits are sorted by the relevance score of the aggregation query. 
- -The following example returns the top 5 products in your eCommerce data: - -```json -GET opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "top_hits_products": { - "top_hits": { - "size": 5 - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... -"aggregations" : { - "top_hits_products" : { - "hits" : { - "total" : { - "value" : 4675, - "relation" : "eq" - }, - "max_score" : 1.0, - "hits" : [ - { - "_index" : "opensearch_dashboards_sample_data_ecommerce", - "_type" : "_doc", - "_id" : "glMlwXcBQVLeQPrkHPtI", - "_score" : 1.0, - "_source" : { - "category" : [ - "Women's Accessories", - "Women's Clothing" - ], - "currency" : "EUR", - "customer_first_name" : "rania", - "customer_full_name" : "rania Evans", - "customer_gender" : "FEMALE", - "customer_id" : 24, - "customer_last_name" : "Evans", - "customer_phone" : "", - "day_of_week" : "Sunday", - "day_of_week_i" : 6, - "email" : "rania@evans-family.zzz", - "manufacturer" : [ - "Tigress Enterprises" - ], - "order_date" : "2021-02-28T14:16:48+00:00", - "order_id" : 583581, - "products" : [ - { - "base_price" : 10.99, - "discount_percentage" : 0, - "quantity" : 1, - "manufacturer" : "Tigress Enterprises", - "tax_amount" : 0, - "product_id" : 19024, - "category" : "Women's Accessories", - "sku" : "ZO0082400824", - "taxless_price" : 10.99, - "unit_discount_amount" : 0, - "min_price" : 5.17, - "_id" : "sold_product_583581_19024", - "discount_amount" : 0, - "created_on" : "2016-12-25T14:16:48+00:00", - "product_name" : "Snood - white/grey/peach", - "price" : 10.99, - "taxful_price" : 10.99, - "base_unit_price" : 10.99 - }, - { - "base_price" : 32.99, - "discount_percentage" : 0, - "quantity" : 1, - "manufacturer" : "Tigress Enterprises", - "tax_amount" : 0, - "product_id" : 19260, - "category" : "Women's Clothing", - "sku" : "ZO0071900719", - "taxless_price" : 32.99, - "unit_discount_amount" : 0, - "min_price" : 17.15, - "_id" : "sold_product_583581_19260", - "discount_amount" : 0, - "created_on" : "2016-12-25T14:16:48+00:00", - "product_name" : "Cardigan - grey", - "price" : 32.99, - "taxful_price" : 32.99, - "base_unit_price" : 32.99 - } - ], - "sku" : [ - "ZO0082400824", - "ZO0071900719" - ], - "taxful_total_price" : 43.98, - "taxless_total_price" : 43.98, - "total_quantity" : 2, - "total_unique_products" : 2, - "type" : "order", - "user" : "rani", - "geoip" : { - "country_iso_code" : "EG", - "location" : { - "lon" : 31.3, - "lat" : 30.1 - }, - "region_name" : "Cairo Governorate", - "continent_name" : "Africa", - "city_name" : "Cairo" - }, - "event" : { - "dataset" : "sample_ecommerce" - } - } - ... - } - ] - } - } - } -} -``` \ No newline at end of file diff --git a/_aggregations/metric/value-count.md b/_aggregations/metric/value-count.md deleted file mode 100644 index dfddaf9417..0000000000 --- a/_aggregations/metric/value-count.md +++ /dev/null @@ -1,42 +0,0 @@ ---- -layout: default -title: Value count -parent: Metric aggregations -grand_parent: Aggregations -nav_order: 140 -redirect_from: - - /query-dsl/aggregations/metric/value-count/ ---- - -# Value count aggregations - -The `value_count` metric is a single-value metric aggregation that calculates the number of values that an aggregation is based on. - -For example, you can use the `value_count` metric with the `avg` metric to find how many numbers the aggregation uses to calculate an average value. 
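For instance, the following sketch requests both metrics in a single query so that the count backing the average is visible alongside it (field name taken from the sample eCommerce data):

```json
GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "avg_taxful_total_price": {
      "avg": {
        "field": "taxful_total_price"
      }
    },
    "number_of_values": {
      "value_count": {
        "field": "taxful_total_price"
      }
    }
  }
}
```
{% include copy-curl.html %}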
- -```json -GET opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "number_of_values": { - "value_count": { - "field": "taxful_total_price" - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response - -```json -... - "aggregations" : { - "number_of_values" : { - "value" : 4675 - } - } -} -``` \ No newline at end of file diff --git a/_analyzers/index-analyzers.md b/_analyzers/index-analyzers.md deleted file mode 100644 index 72332758d0..0000000000 --- a/_analyzers/index-analyzers.md +++ /dev/null @@ -1,65 +0,0 @@ ---- -layout: default -title: Index analyzers -nav_order: 20 ---- - -# Index analyzers - -Index analyzers are specified at indexing time and are used to analyze [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) fields when indexing a document. - -## Determining which index analyzer to use - -To determine which analyzer to use for a field when a document is indexed, OpenSearch examines the following parameters in order: - -1. The `analyzer` mapping parameter of the field -1. The `analysis.analyzer.default` index setting -1. The `standard` analyzer (default) - -When specifying an index analyzer, keep in mind that in most cases, specifying an analyzer for each `text` field in an index works best. Analyzing both the text field (at indexing time) and the query string (at query time) with the same analyzer ensures that the search uses the same terms as those that are stored in the index. -{: .important } - -For information about verifying which analyzer is associated with which field, see [Verifying analyzer settings]({{site.url}}{{site.baseurl}}/analyzers/index/#verifying-analyzer-settings). - -## Specifying an index analyzer for a field - -When creating index mappings, you can supply the `analyzer` parameter for each [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field. For example, the following request specifies the `simple` analyzer for the `text_entry` field: - -```json -PUT testindex -{ - "mappings": { - "properties": { - "text_entry": { - "type": "text", - "analyzer": "simple" - } - } - } -} -``` -{% include copy-curl.html %} - -## Specifying a default index analyzer for an index - -If you want to use the same analyzer for all text fields in an index, you can specify it in the `analysis.analyzer.default` setting as follows: - -```json -PUT testindex -{ - "settings": { - "analysis": { - "analyzer": { - "default": { - "type": "simple" - } - } - } - } -} -``` -{% include copy-curl.html %} - -If you don't specify a default analyzer, the `standard` analyzer is used. -{: .note} - diff --git a/_analyzers/index.md b/_analyzers/index.md deleted file mode 100644 index ff7fb88094..0000000000 --- a/_analyzers/index.md +++ /dev/null @@ -1,163 +0,0 @@ ---- -layout: default -title: Text analysis -has_children: true -nav_order: 5 -nav_exclude: true -has_toc: false -redirect_from: - - /opensearch/query-dsl/text-analyzers/ - - /query-dsl/analyzers/text-analyzers/ - - /analyzers/text-analyzers/ ---- - -# Text analysis - -When you are searching documents using a full-text search, you want to receive all relevant results and not only exact matches. If you're looking for "walk", you're interested in results that contain any form of the word, like "Walk", "walked", or "walking." To facilitate full-text search, OpenSearch uses text analysis. - -Text analysis consists of the following steps: - -1. 
_Tokenize_ text into terms: For example, after tokenization, the phrase `Actions speak louder than words` is split into tokens `Actions`, `speak`, `louder`, `than`, and `words`. -1. _Normalize_ the terms by converting them into a standard format, for example, converting them to lowercase or performing stemming (reducing the word to its root): For example, after normalization, `Actions` becomes `action`, `louder` becomes `loud`, and `words` becomes `word`. - -## Analyzers - -In OpenSearch, text analysis is performed by an _analyzer_. Each analyzer contains the following sequentially applied components: - -1. **Character filters**: First, a character filter receives the original text as a stream of characters and adds, removes, or modifies characters in the text. For example, a character filter can strip HTML characters from a string so that the text `

<p><b>Actions speak louder than words</b></p>
` becomes `\nActions speak louder than words\n`. The output of a character filter is a stream of characters. - -1. **Tokenizer**: Next, a tokenizer receives the stream of characters that has been processed by the character filter and splits the text into individual _tokens_ (usually, words). For example, a tokenizer can split text on white space so that the preceding text becomes [`Actions`, `speak`, `louder`, `than`, `words`]. Tokenizers also maintain metadata about tokens, such as their starting and ending positions in the text. The output of a tokenizer is a stream of tokens. - -1. **Token filters**: Last, a token filter receives the stream of tokens from the tokenizer and adds, removes, or modifies tokens. For example, a token filter may lowercase the tokens so that `Actions` becomes `action`, remove stopwords like `than`, or add synonyms like `talk` for the word `speak`. - -An analyzer must contain exactly one tokenizer and may contain zero or more character filters and zero or more token filters. -{: .note} - -## Built-in analyzers - -The following table lists the built-in analyzers that OpenSearch provides. The last column of the table contains the result of applying the analyzer to the string `It’s fun to contribute a brand-new PR or 2 to OpenSearch!`. - -Analyzer | Analysis performed | Analyzer output -:--- | :--- | :--- -**Standard** (default) | - Parses strings into tokens at word boundaries
<br> - Removes most punctuation <br> - Converts tokens to lowercase | [`it’s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
-**Simple** | - Parses strings into tokens on any non-letter character <br> - Removes non-letter characters <br> - Converts tokens to lowercase | [`it`, `s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `to`, `opensearch`]
-**Whitespace** | - Parses strings into tokens on white space | [`It’s`, `fun`, `to`, `contribute`, `a`,`brand-new`, `PR`, `or`, `2`, `to`, `OpenSearch!`]
-**Stop** | - Parses strings into tokens on any non-letter character <br> - Removes non-letter characters <br> - Removes stop words <br> - Converts tokens to lowercase | [`s`, `fun`, `contribute`, `brand`, `new`, `pr`, `opensearch`]
-**Keyword** (noop) | - Outputs the entire string unchanged | [`It’s fun to contribute a brand-new PR or 2 to OpenSearch!`]
-**Pattern** | - Parses strings into tokens using regular expressions <br> - Supports converting strings to lowercase <br> - Supports removing stop words | [`it`, `s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
-[**Language**]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/) | Performs analysis specific to a certain language (for example, `english`). | [`fun`, `contribut`, `brand`, `new`, `pr`, `2`, `opensearch`]
-**Fingerprint** | - Parses strings on any non-letter character <br> - Normalizes characters by converting them to ASCII <br> - Converts tokens to lowercase <br> - Sorts, deduplicates, and concatenates tokens into a single token <br> - Supports removing stop words | [`2 a brand contribute fun it's new opensearch or pr to`] <br>
Note that the apostrophe was converted to its ASCII counterpart. - -## Custom analyzers - -If needed, you can combine tokenizers, token filters, and character filters to create a custom analyzer. - -## Text analysis at indexing time and query time - -OpenSearch performs text analysis on text fields when you index a document and when you send a search request. Depending on the time of text analysis, the analyzers used for it are classified as follows: - -- An _index analyzer_ performs analysis at indexing time: When you are indexing a [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field, OpenSearch analyzes it before indexing it. For more information about ways to specify index analyzers, see [Index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/). - -- A _search analyzer_ performs analysis at query time: OpenSearch analyzes the query string when you run a full-text query on a text field. For more information about ways to specify search analyzers, see [Search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/). - -In most cases, you should use the same analyzer at both indexing and search time because the text field and the query string will be analyzed in the same way and the resulting tokens will match as expected. -{: .tip} - -### Example - -When you index a document that has a text field with the text `Actions speak louder than words`, OpenSearch analyzes the text and produces the following list of tokens: - -Text field tokens = [`action`, `speak`, `loud`, `than`, `word`] - -When you search for documents that match the query `speaking loudly`, OpenSearch analyzes the query string and produces the following list of tokens: - -Query string tokens = [`speak`, `loud`] - -Then OpenSearch compares each token in the query string against the list of text field tokens and finds that both lists contain the tokens `speak` and `loud`, so OpenSearch returns this document as part of the search results that match the query. - -## Testing an analyzer - -To test a built-in analyzer and view the list of tokens it generates when a document is indexed, you can use the [Analyze API]({{site.url}}{{site.baseurl}}/api-reference/analyze-apis/#apply-a-built-in-analyzer). - -Specify the analyzer and the text to be analyzed in the request: - -```json -GET /_analyze -{ - "analyzer" : "standard", - "text" : "Let’s contribute to OpenSearch!" -} -``` -{% include copy-curl.html %} - -The following image shows the query string. 
- -![Query string with indices]({{site.url}}{{site.baseurl}}/images/string-indices.png) - -The response contains each token and its start and end offsets that correspond to the starting index in the original string (inclusive) and the ending index (exclusive): - -```json -{ - "tokens": [ - { - "token": "let’s", - "start_offset": 0, - "end_offset": 5, - "type": "", - "position": 0 - }, - { - "token": "contribute", - "start_offset": 6, - "end_offset": 16, - "type": "", - "position": 1 - }, - { - "token": "to", - "start_offset": 17, - "end_offset": 19, - "type": "", - "position": 2 - }, - { - "token": "opensearch", - "start_offset": 20, - "end_offset": 30, - "type": "", - "position": 3 - } - ] -} -``` - -## Verifying analyzer settings - -To verify which analyzer is associated with which field, you can use the get mapping API operation: - -```json -GET /testindex/_mapping -``` -{% include copy-curl.html %} - -The response provides information about the analyzers for each field: - -```json -{ - "testindex": { - "mappings": { - "properties": { - "text_entry": { - "type": "text", - "analyzer": "simple", - "search_analyzer": "whitespace" - } - } - } - } -} -``` - -## Next steps - -- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/). \ No newline at end of file diff --git a/_analyzers/search-analyzers.md b/_analyzers/search-analyzers.md deleted file mode 100644 index b47e739d28..0000000000 --- a/_analyzers/search-analyzers.md +++ /dev/null @@ -1,93 +0,0 @@ ---- -layout: default -title: Search analyzers -nav_order: 30 ---- - -# Search analyzers - -Search analyzers are specified at query time and are used to analyze the query string when you run a full-text query on a [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field. - -## Determining which search analyzer to use - -To determine which analyzer to use for a query string at query time, OpenSearch examines the following parameters in order: - -1. The `analyzer` parameter of the query -1. The `search_analyzer` mapping parameter of the field -1. The `analysis.analyzer.default_search` index setting -1. The `analyzer` mapping parameter of the field -1. The `standard` analyzer (default) - -In most cases, specifying a search analyzer that is different from the index analyzer is not necessary and could negatively impact search result relevance or lead to unexpected search results. -{: .warning} - -For information about verifying which analyzer is associated with which field, see [Verifying analyzer settings]({{site.url}}{{site.baseurl}}/analyzers/index/#verifying-analyzer-settings). - -## Specifying a search analyzer for a query string - -Specify the name of the analyzer you want to use at query time in the `analyzer` field: - -```json -GET shakespeare/_search -{ - "query": { - "match": { - "text_entry": { - "query": "speak the truth", - "analyzer": "english" - } - } - } -} -``` -{% include copy-curl.html %} - -Valid values for [built-in analyzers]({{site.url}}{{site.baseurl}}/analyzers/index#built-in-analyzers) are `standard`, `simple`, `whitespace`, `stop`, `keyword`, `pattern`, `fingerprint`, or any supported [language analyzer]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/). 
- -## Specifying a search analyzer for a field - -When creating index mappings, you can provide the `search_analyzer` parameter for each [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field. When providing the `search_analyzer`, you must also provide the `analyzer` parameter, which specifies the [index analyzer]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) to be used at indexing time. - -For example, the following request specifies the `simple` analyzer as the index analyzer and the `whitespace` analyzer as the search analyzer for the `text_entry` field: - -```json -PUT testindex -{ - "mappings": { - "properties": { - "text_entry": { - "type": "text", - "analyzer": "simple", - "search_analyzer": "whitespace" - } - } - } -} -``` -{% include copy-curl.html %} - -## Specifying the default search analyzer for an index - -If you want to analyze all query strings at search time with the same analyzer, you can specify the search analyzer in the `analysis.analyzer.default_search` setting. When providing the `analysis.analyzer.default_search`, you must also provide the `analysis.analyzer.default` parameter, which specifies the [index analyzer]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) to be used at indexing time. - -For example, the following request specifies the `simple` analyzer as the index analyzer and the `whitespace` analyzer as the search analyzer for the `testindex` index: - -```json -PUT testindex -{ - "settings": { - "analysis": { - "analyzer": { - "default": { - "type": "simple" - }, - "default_search": { - "type": "whitespace" - } - } - } - } -} - -``` -{% include copy-curl.html %} diff --git a/_api-reference/index-apis/alias.md b/_api-reference/alias.md similarity index 94% rename from _api-reference/index-apis/alias.md rename to _api-reference/alias.md index 96272b4698..2a19f1522e 100644 --- a/_api-reference/index-apis/alias.md +++ b/_api-reference/alias.md @@ -1,11 +1,9 @@ --- layout: default title: Alias -parent: Index APIs nav_order: 5 redirect_from: - /opensearch/rest-api/alias/ - - /api-reference/alias/ --- # Alias @@ -50,7 +48,7 @@ All alias parameters are optional. Parameter | Data Type | Description :--- | :--- | :--- -cluster_manager_timeout | Time | The amount of time to wait for a response from the cluster manager node. Default is `30s`. +master_timeout | Time | The amount of time to wait for a response from the master node. Default is `30s`. timeout | Time | The amount of time to wait for a response from the cluster. Default is `30s`. ## Request body diff --git a/_api-reference/analyze-apis/index.md b/_api-reference/analyze-apis/index.md new file mode 100644 index 0000000000..8d415339af --- /dev/null +++ b/_api-reference/analyze-apis/index.md @@ -0,0 +1,12 @@ +--- +layout: default +title: Analyze API +has_children: true +nav_order: 7 +redirect_from: + - /opensearch/rest-api/analyze-apis/ +--- + +# Analyze API + +The analyze API allows you to perform text analysis, which is the process of converting unstructured text into individual tokens (usually words) that are optimized for search. 
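For example, a minimal request names an analyzer and supplies a text string; the analyzer choice and sample text here are only illustrative:

```json
GET /_analyze
{
  "analyzer": "standard",
  "text": "OpenSearch text analysis"
}
```
{% include copy-curl.html %}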
\ No newline at end of file diff --git a/_api-reference/analyze-apis.md b/_api-reference/analyze-apis/perform-text-analysis.md similarity index 93% rename from _api-reference/analyze-apis.md rename to _api-reference/analyze-apis/perform-text-analysis.md index a820dd281e..08a2cbf741 100644 --- a/_api-reference/analyze-apis.md +++ b/_api-reference/analyze-apis/perform-text-analysis.md @@ -1,20 +1,16 @@ --- layout: default -title: Analyze API -has_children: true -nav_order: 7 -redirect_from: - - /opensearch/rest-api/analyze-apis/ - - /api-reference/analyze-apis/ ---- +title: Perform text analysis +parent: Analyze API -# Analyze API +nav_order: 2 +--- -The Analyze API allows you to perform [text analysis]({{site.url}}{{site.baseurl}}/api-reference/analyze-apis/), which is the process of converting unstructured text into individual tokens (usually words) that are optimized for search. +# Perform text analysis -The Analyze API analyzes a text string and returns the resulting tokens. +The perform text analysis API analyzes a text string and returns the resulting tokens. -If you use the Security plugin, you must have the `manage index` privilege. If you only want to analyze text, you must have the `manage cluster` privilege. +If you use the Security plugin, you must have the `manage index` privilege. If you simply want to analyze text, you must have the `manager cluster` privilege. {: .note} ## Path and HTTP methods @@ -26,7 +22,7 @@ POST /_analyze POST /{index}/_analyze ``` -Although you can issue an analyze request using both `GET` and `POST` requests, the two have important distinctions. A `GET` request causes data to be cached in the index so that the next time the data is requested, it is retrieved faster. A `POST` request sends a string that does not already exist to the analyzer to be compared with data that is already in the index. `POST` requests are not cached. +Although you can issue an analyzer request via both `GET` and `POST` requests, the two have important distinctions. A `GET` request causes data to be cached in the index so that the next time the data is requested, it is retrieved faster. A `POST` request sends a string that does not already exist to the analyzer to be compared to data that is already in the index. `POST` requests are not cached. {: .note} ## Path parameter @@ -653,7 +649,7 @@ PUT /books2 ```` {% include copy-curl.html %} -The preceding request is an index API rather than an analyze API. See [DYNAMIC INDEX SETTINGS]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/#dynamic-index-settings) for additional details. +The preceding request is an index API rather than an analyze API. See [DYNAMIC INDEX SETTINGS]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/#dynamic-index-settings) for additional details. {: .note} ### Response fields diff --git a/_api-reference/analyze-apis/terminology.md b/_api-reference/analyze-apis/terminology.md index 17d26308ae..364440545a 100644 --- a/_api-reference/analyze-apis/terminology.md +++ b/_api-reference/analyze-apis/terminology.md @@ -20,7 +20,7 @@ If needed, you can combine tokenizers, token filters, and character filters to c #### Tokenizers -Tokenizers break unstructured text into tokens and maintain metadata about tokens, such as their starting and ending positions in the text. +Tokenizers break unstuctured text into tokens and maintain metadata about tokens, such as their start and ending positions in the text. 
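As a quick illustration, the Analyze API can apply a tokenizer by itself; the `whitespace` tokenizer and sample sentence below are an assumed example:

```json
GET /_analyze
{
  "tokenizer": "whitespace",
  "text": "Actions speak louder than words"
}
```
{% include copy-curl.html %}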
#### Character filters diff --git a/_api-reference/cat/cat-cluster_manager.md b/_api-reference/cat/cat-cluster_manager.md index 152584dd57..2508fce675 100644 --- a/_api-reference/cat/cat-cluster_manager.md +++ b/_api-reference/cat/cat-cluster_manager.md @@ -2,8 +2,7 @@ layout: default title: CAT cluster manager parent: CAT API -redirect_from: - - /opensearch/rest-api/cat/cat-master/ + nav_order: 30 has_children: false --- diff --git a/_api-reference/cat/cat-segment-replication.md b/_api-reference/cat/cat-segment-replication.md index 2b5b5d2e0d..9a84b861eb 100644 --- a/_api-reference/cat/cat-segment-replication.md +++ b/_api-reference/cat/cat-segment-replication.md @@ -34,18 +34,17 @@ Parameter | Type | Description The CAT segment replication API operation supports the following optional query parameters. -Parameter | Data type | Description -:--- |:-----------| :--- -`active_only` | Boolean | If `true`, the response only includes active segment replications. Defaults to `false`. -[`detailed`](#additional-detailed-response-metrics) | String | If `true`, the response includes additional metrics for each stage of a segment replication event. Defaults to `false`. -`shards` | String | A comma-separated list of shards to display. -`bytes` | Byte units | [Units]({{site.url}}{{site.baseurl}}/opensearch/units/) used to display byte size values. -`format` | String | A short version of the HTTP accept header. Valid values include `JSON` and `YAML`. -`h` | String | A comma-separated list of column names to display. -`help` | Boolean | If `true`, the response includes help information. Defaults to `false`. -`time` | Time units | [Units]({{site.url}}{{site.baseurl}}/opensearch/units/) used to display time values. -`v` | Boolean | If `true`, the response includes column headings. Defaults to `false`. -`s` | String | Specifies to sort the results. For example, `s=shardId:desc` sorts by shardId in descending order. +Parameter | Data type | Description +:--- |:---| :--- +`active_only` | Boolean | If `true`, the response only includes active segment replications. Defaults to `false`. +[`detailed`](#additional-detailed-response-metrics) | String | If `true`, the response includes additional metrics for each stage of a segment replication event. Defaults to `false`. +`shards` | String | A comma-separated list of shards to display. +`format` | String | A short version of the HTTP accept header. Valid values include `JSON` and `YAML`. +`h` | String | A comma-separated list of column names to display. +`help` | Boolean | If `true`, the response includes help information. Defaults to `false`. +`time` | Time value | [Units]({{site.url}}{{site.baseurl}}/opensearch/units) used to display time values. Defaults to `ms` (milliseconds). +`v` | Boolean | If `true`, the response includes column headings. Defaults to `false`. +`s` | String | Specifies to sort the results. For example, `s=shardId:desc` sorts by shardId in descending order. ## Examples diff --git a/_api-reference/document-apis/bulk.md b/_api-reference/document-apis/bulk.md index 657c1eb45f..a4b6370629 100644 --- a/_api-reference/document-apis/bulk.md +++ b/_api-reference/document-apis/bulk.md @@ -14,9 +14,6 @@ Introduced 1.0 The bulk operation lets you add, update, or delete multiple documents in a single request. Compared to individual OpenSearch indexing requests, the bulk operation has significant performance benefits. Whenever practical, we recommend batching indexing operations into bulk requests. 
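As a minimal sketch of the request format, a single bulk call can mix operations; the index name and documents below are illustrative only:

```json
POST _bulk
{ "index": { "_index": "movies", "_id": "1" } }
{ "title": "The Lion King", "year": 1994 }
{ "delete": { "_index": "movies", "_id": "2" } }
```
{% include copy-curl.html %}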
-Beginning in OpenSearch 2.9, when indexing documents using the bulk operation, the document `_id` must be 512 bytes or less in size. -{: .note} - ## Example ```json diff --git a/_api-reference/document-apis/index-document.md b/_api-reference/document-apis/index-document.md index 05cb787e92..90247c4dd8 100644 --- a/_api-reference/document-apis/index-document.md +++ b/_api-reference/document-apis/index-document.md @@ -4,7 +4,7 @@ title: Index document parent: Document APIs nav_order: 1 redirect_from: - - /opensearch/rest-api/document-apis/index-document/ + - /opensearch/rest-api/document-apis/index-document --- # Index document @@ -91,6 +91,6 @@ result | The result of the index operation. _shards | Detailed information about the cluster's shards. total | The total number of shards. successful | The number of shards OpenSearch successfully added the document to. -failed | The number of shards OpenSearch failed to add the document to. +failed | The number of shards OpenSearch failed to added the document to. _seq_no | The sequence number assigned when the document was indexed. _primary_term | The primary term assigned when the document was indexed. diff --git a/_api-reference/document-apis/multi-get.md b/_api-reference/document-apis/multi-get.md index ff2e6de59b..cde566d7c7 100644 --- a/_api-reference/document-apis/multi-get.md +++ b/_api-reference/document-apis/multi-get.md @@ -4,57 +4,16 @@ title: Multi-get document parent: Document APIs nav_order: 30 redirect_from: - - /opensearch/rest-api/document-apis/multi-get/ + - /opensearch/rest-api/document-apis/mulit-get/ --- # Multi-get documents Introduced 1.0 {: .label .label-purple } -The multi-get operation allows you to run multiple GET operations in one request, so you can get back all documents that match your criteria. +The multi-get operation allows you to execute multiple GET operations in one request, so you can get back all documents that match your criteria. -## Path and HTTP methods - -``` -GET _mget -GET /_mget -POST _mget -POST /_mget -``` - -## URL parameters - -All multi-get URL parameters are optional. - -Parameter | Type | Description -:--- | :--- | :--- | :--- -<index> | String | Name of the index to retrieve documents from. -preference | String | Specifies the nodes or shards OpenSearch should execute the multi-get operation on. Default is random. -realtime | Boolean | Specifies whether the operation should run in realtime. If false, the operation waits for the index to refresh to analyze the source to retrieve data, which makes the operation near-realtime. Default is `true`. -refresh | Boolean | If true, OpenSearch refreshes shards to make the multi-get operation available to search results. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`. -routing | String | Value used to route the multi-get operation to a specific shard. -stored_fields | Boolean | Specifies whether OpenSearch should retrieve documents fields from the index instead of the document's `_source`. Default is `false`. -_source | String | Whether to include the `_source` field in the query response. Default is `true`. -_source_excludes | String | A comma-separated list of source fields to exclude in the query response. -_source_includes | String | A comma-separated list of source fields to include in the query response. 
- -## Request body - -If you don't specify an index in your request's URL, you must specify your target indexes and the relevant document IDs in the request body. Other fields are optional. - -Field | Type | Description | Required -:--- | :--- | :--- | :--- -docs | Array | The documents you want to retrieve data from. Can contain the attributes: `_id`, `_index`, `_routing`, `_source`, and `_stored_fields`. If you specify an index in the URL, you can omit this field and add IDs of the documents to retrieve. | Yes if an index is not specified in the URL -_id | String | The ID of the document. | Yes if `docs` is specified in the request body -_index | String | Name of the index. | Yes if an index is not specified in the URL -_routing | String | The value of the shard that has the document. | Yes if a routing value was used when indexing the document -_source | Object | Specifies whether to return the `_source` field from an index (boolean), whether to return specific fields (array), or whether to include or exclude certain fields. | No -_source.includes | Array | Specifies which fields to include in the query response. For example, `"_source": { "include": ["Title"] }` retrieves `Title` from the index. | No -_source.excludes | Array | Specifies which fields to exclude in the query response. For example, `"_source": { "exclude": ["Director"] }` excludes `Director` from the query response. | No -ids | Array | IDs of the documents to retrieve. Only allowed when an index is specified in the URL. | No - - -#### Example without specifying index in URL +## Example without specifying index in URL ```json GET _mget @@ -76,10 +35,11 @@ GET _mget ``` {% include copy-curl.html %} -#### Example of specifying index in URL +## Example of specifying index in URL ```json GET sample-index1/_mget + { "docs": [ { @@ -95,7 +55,45 @@ GET sample-index1/_mget ``` {% include copy-curl.html %} -#### Example Response +## Path and HTTP methods + +``` +GET _mget +GET /_mget +``` + +## URL parameters + +All multi-get URL parameters are optional. + +Parameter | Type | Description +:--- | :--- | :--- | :--- +<index> | String | Name of the index to retrieve documents from. +preference | String | Specifies the nodes or shards OpenSearch should execute the multi-get operation on. Default is random. +realtime | Boolean | Specifies whether the operation should run in realtime. If false, the operation waits for the index to refresh to analyze the source to retrieve data, which makes the operation near-realtime. Default is `true`. +refresh | Boolean | If true, OpenSearch refreshes shards to make the multi-get operation available to search results. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is `false`. +routing | String | Value used to route the multi-get operation to a specific shard. +stored_fields | Boolean | Specifies whether OpenSearch should retrieve documents fields from the index instead of the document's `_source`. Default is `false`. +_source | String | Whether to include the `_source` field in the query response. Default is `true`. +_source_excludes | String | A comma-separated list of source fields to exclude in the query response. +_source_includes | String | A comma-separated list of source fields to include in the query response. + +## Request body + +If you don't specify an index in your request's URL, you must specify your target indexes and the relevant document IDs in the request body. Other fields are optional. 
+ +Field | Type | Description | Required +:--- | :--- | :--- | :--- +docs | Array | The documents you want to retrieve data from. Can contain the attributes: `_id`, `_index`, `_routing`, `_source`, and `_stored_fields`. If you specify an index in the URL, you can omit this field and add IDs of the documents to retrieve. | Yes if an index is not specified in the URL +_id | String | The ID of the document. | Yes if `docs` is specified in the request body +_index | String | Name of the index. | Yes if an index is not specified in the URL +_routing | String | The value of the shard that has the document. | Yes if a routing value was used when indexing the document +_source | Object | Specifies whether to return the `_source` field from an index (boolean), whether to return specific fields (array), or whether to include or exclude certain fields. | No +_source.includes | Array | Specifies which fields to include in the query response. For example, `"_source": { "include": ["Title"] }` retrieves `Title` from the index. | No +_source.excludes | Array | Specifies which fields to exclude in the query response. For example, `"_source": { "exclude": ["Director"] }` excludes `Director` from the query response. | No +ids | Array | IDs of the documents to retrieve. Only allowed when an index is specified in the URL. | No + +## Response ```json { "docs": [ diff --git a/_api-reference/document-apis/reindex.md b/_api-reference/document-apis/reindex.md index 845042f3f6..29735c05e9 100644 --- a/_api-reference/document-apis/reindex.md +++ b/_api-reference/document-apis/reindex.md @@ -4,8 +4,7 @@ title: Reindex document parent: Document APIs nav_order: 60 redirect_from: - - /opensearch/reindex-data/ - - /opensearch/rest-api/document-apis/reindex/ + - /opensearch/reindex-data/ --- # Reindex document diff --git a/_api-reference/document-apis/update-by-query.md b/_api-reference/document-apis/update-by-query.md index 528d30e156..03b3bc1e34 100644 --- a/_api-reference/document-apis/update-by-query.md +++ b/_api-reference/document-apis/update-by-query.md @@ -69,7 +69,7 @@ scroll | Time | How long to keep the search context open. scroll_size | Integer | Size of the operation's scroll request. Default is 1000. search_type | String | Whether OpenSearch should use global term and document frequencies calculating relevance scores. Valid choices are `query_then_fetch` and `dfs_query_then_fetch`. `query_then_fetch` scores documents using local term and document frequencies for the shard. It’s usually faster but less accurate. `dfs_query_then_fetch` scores documents using global term and document frequencies across all shards. It’s usually slower but more accurate. Default is `query_then_fetch`. search_timeout | Time | How long to wait until OpenSearch deems the request timed out. Default is no timeout. -slices | String or integer | The number slices to split an operation into for faster processing, specified by integer. When set to `auto` OpenSearch it should decides how many the number of slices for the operation. Default is `1`, which indicates an operation will not be split. +slices | Integer | Number of sub-tasks OpenSearch should divide this task into. Default is 1, which means OpenSearch should not divide this task. sort | List | A comma-separated list of <field> : <direction> pairs to sort by. _source | String | Whether to include the `_source` field in the response. _source_excludes | String | A comma-separated list of source fields to exclude from the response. 
diff --git a/_api-reference/document-apis/update-document.md b/_api-reference/document-apis/update-document.md index 7354277eaf..75241f7b54 100644 --- a/_api-reference/document-apis/update-document.md +++ b/_api-reference/document-apis/update-document.md @@ -50,7 +50,7 @@ Parameter | Type | Description | Required :--- | :--- | :--- | :--- <index> | String | Name of the index. | Yes <_id> | String | The ID of the document to update. | Yes -if_seq_no | Integer | Only perform the update operation if the document has the specified sequence number. | No +if_seq_no | Integer | Only perform the delete operation if the document's version number matches the specified number. | No if_primary_term | Integer | Perform the update operation if the document has the specified primary term. | No lang | String | Language of the script. Default is `painless`. | No require_alias | Boolean | Specifies whether the destination must be an index alias. Default is false. | No @@ -143,10 +143,10 @@ Field | Description _index | The name of the index. _id | The document's ID. _version | The document's version. -_result | The result of the update operation. +_result | The result of the delete operation. _shards | Detailed information about the cluster's shards. total | The total number of shards. -successful | The number of shards OpenSearch successfully updated the document in. -failed | The number of shards OpenSearch failed to update the document in. +successful | The number of shards OpenSearch successfully deleted the document from. +failed | The number of shards OpenSearch failed to delete the document from. _seq_no | The sequence number assigned when the document was indexed. _primary_term | The primary term assigned when the document was indexed. diff --git a/_api-reference/index-apis/clear-index-cache.md b/_api-reference/index-apis/clear-index-cache.md index 3f4e93e67b..fef459f01a 100644 --- a/_api-reference/index-apis/clear-index-cache.md +++ b/_api-reference/index-apis/clear-index-cache.md @@ -1,11 +1,11 @@ --- layout: default -title: Clear cache +title: Clear Index or Data Stream Cache parent: Index APIs -nav_order: 10 +nav_order: 320 --- -# Clear cache +## Clear index or data stream cache The clear cache API operation clears the caches of one or more indexes. For data streams, the API clears the caches of the stream’s backing indexes. @@ -13,14 +13,14 @@ The clear cache API operation clears the caches of one or more indexes. For data If you use the Security plugin, you must have the `manage index` privileges. {: .note} -## Path parameters +### Path parameters | Parameter | Data type | Description | :--- | :--- | :--- -| target | String | Comma-delimited list of data streams, indexes, and index aliases to which cache clearing is applied. Wildcard expressions (`*`) are supported. To target all data streams and indexes in a cluster, omit this parameter or use `_all` or `*`. Optional. | +| target | String | Comma-delimited list of data streams, indexes, and index aliases to which cache clearing will be applied. Wildcard expressions (`*`) are supported. To target all data streams and indexes in a cluster, omit this parameter or use `_all` or `*`. Optional. | -## Query parameters +### Query parameters All query parameters are optional. @@ -29,83 +29,75 @@ All query parameters are optional. | allow_no_indices | Boolean | Whether to ignore wildcards, index aliases, or `_all` target (`target` path parameter) values that don’t match any indexes. 
If `false`, the request returns an error if any wildcard expression, index alias, or `_all` target value doesn't match any indexes. This behavior also applies if the request targets include other open indexes. For example, a request where the target is `fig*,app*` returns an error if an index starts with `fig` but no index starts with `app`. Defaults to `true`. | | expand_wildcards | String | Determines the index types that wildcard expressions can expand to. Accepts multiple values separated by a comma, such as `open,hidden`. Valid values are:

`all` -- Expand to open, closed, and hidden indexes. <br> <br> `open` -- Expand only to open indexes. <br> <br> `closed` -- Expand only to closed indexes <br> <br> `hidden` -- Expand to include hidden indexes. Must be combined with `open`, `closed`, or `both`. <br> <br> `none` -- Expansions are not accepted. <br> <br>
Defaults to `open`. | | fielddata | Boolean | If `true`, clears the fields cache. Use the `fields` parameter to clear specific fields' caches. Defaults to `true`. | -| fields | String | Used in conjunction with the `fielddata` parameter. Comma-delimited list of field names that are cleared out of the cache. Does not support objects or field aliases. Defaults to all fields. | -| file | Boolean | If `true`, clears the unused entries from the file cache on nodes with the Search role. Defaults to `false`. | -| index | String | Comma-delimited list of index names that are cleared out of the cache. | +| fields | String | Used in conjunction with the `fielddata` parameter. Comma-delimited list of field names that will be cleared out of the cache. Does not support objects or field aliases. Defaults to all fields. | +| index | String | Comma-delimited list of index names that will be cleared out of the cache. | | ignore_unavailable | Boolean | If `true`, OpenSearch ignores missing or closed indexes. Defaults to `false`. | | query | Boolean | If `true`, clears the query cache. Defaults to `true`. | | request | Boolean | If `true`, clears the request cache. Defaults to `true`. | -## Example requests +#### Example requests The following example requests show multiple clear cache API uses. -### Clear a specific cache +##### Clear a specific cache The following request clears the fields cache only: -```json +````json POST /my-index/_cache/clear?fielddata=true -``` +```` {% include copy-curl.html %}
The following request clears the query cache only: -```json +````json POST /my-index/_cache/clear?query=true -``` +```` {% include copy-curl.html %}
The following request clears the request cache only: -```json +````json POST /my-index/_cache/clear?request=true -``` +```` {% include copy-curl.html %} -### Clear the cache for specific fields +##### Clear the cache for specific fields The following request clears the fields caches of `fielda` and `fieldb`: -```json +````json POST /my-index/_cache/clear?fields=fielda,fieldb -``` +```` {% include copy-curl.html %} -### Clear caches for specific data streams or indexes +##### Clear caches for specific data streams and indexes The following request clears the cache for two specific indexes: -```json +````json POST /my-index,my-index2/_cache/clear -``` +```` {% include copy-curl.html %} -### Clear caches for all data streams and indexes +##### Clear caches for all data streams and indexes The following request clears the cache for all data streams and indexes: -```json +````json POST /_cache/clear -``` -{% include copy-curl.html %} - -### Clear unused entries from the cache on search-capable nodes - -```json -POST /*/_cache/clear?file=true -``` +```` {% include copy-curl.html %} #### Example response The `POST /books,hockey/_cache/clear` request returns the following fields: -```json +````json { "_shards" : { "total" : 4, @@ -113,9 +105,9 @@ The `POST /books,hockey/_cache/clear` request returns the following fields: "failed" : 0 } } -``` +```` -#### Response fields +### Response fields The `POST /books,hockey/_cache/clear` request returns the following response fields: diff --git a/_api-reference/index-apis/clone.md b/_api-reference/index-apis/clone.md index 52e558e48d..788e7e81f1 100644 --- a/_api-reference/index-apis/clone.md +++ b/_api-reference/index-apis/clone.md @@ -2,7 +2,7 @@ layout: default title: Clone index parent: Index APIs -nav_order: 15 +nav_order: 70 redirect_from: - /opensearch/rest-api/index-apis/clone/ --- @@ -55,14 +55,14 @@ Parameter | Type | Description <source-index> | String | The source index to clone. <target-index> | String | The index to create and add cloned data to. wait_for_active_shards | String | The number of active shards that must be available before OpenSearch processes the request. Default is 1 (only the primary shard). Set to all or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the operation to succeed. -cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`. +master_timeout | Time | How long to wait for a connection to the master node. Default is `30s`. timeout | Time | How long to wait for the request to return. Default is `30s`. wait_for_completion | Boolean | When set to `false`, the request returns immediately instead of after the operation is finished. To monitor the operation status, use the [Tasks API]({{site.url}}{{site.baseurl}}/api-reference/tasks/) with the task ID returned by the request. Default is `true`. task_execution_timeout | Time | The explicit task execution timeout. Only useful when wait_for_completion is set to `false`. Default is `1h`. ## Request body -The clone index API operation creates a new target index, so you can specify any [index settings]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/) and [aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/) to apply to the target index. 
+The clone index API operation creates a new target index, so you can specify any [index settings]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/#index-settings) and [aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias) to apply to the target index. ## Response diff --git a/_api-reference/index-apis/close-index.md b/_api-reference/index-apis/close-index.md index 28675f1c98..5a1d3c25a1 100644 --- a/_api-reference/index-apis/close-index.md +++ b/_api-reference/index-apis/close-index.md @@ -2,7 +2,7 @@ layout: default title: Close index parent: Index APIs -nav_order: 20 +nav_order: 30 redirect_from: - /opensearch/rest-api/index-apis/close-index/ --- @@ -32,12 +32,12 @@ All parameters are optional. Parameter | Type | Description :--- | :--- | :--- -<index-name> | String | The index to close. Can be a comma-separated list of multiple index names. Use `_all` or * to close all indexes. -allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indexes. Default is true. -expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are all (match all indexes), open (match open indexes), closed (match closed indexes), hidden (match hidden indexes), and none (do not accept wildcard expressions). Default is open. -ignore_unavailable | Boolean | If true, OpenSearch does not search for missing or closed indexes. Default is false. +<index-name> | String | The index to close. Can be a comma-separated list of multiple index names. Use `_all` or * to close all indices. +allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indices. Default is true. +expand_wildcards | String | Expands wildcard expressions to different indices. Combine multiple values with commas. Available values are all (match all indices), open (match open indices), closed (match closed indices), hidden (match hidden indices), and none (do not accept wildcard expressions). Default is open. +ignore_unavailable | Boolean | If true, OpenSearch does not search for missing or closed indices. Default is false. wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the request. Default is 1 (only the primary shard). Set to all or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed. -cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`. +master_timeout | Time | How long to wait for a connection to the master node. Default is `30s`. timeout | Time | How long to wait for a response from the cluster. Default is `30s`. diff --git a/_api-reference/index-apis/create-index.md b/_api-reference/index-apis/create-index.md index 6a798fa87e..635e7484f7 100644 --- a/_api-reference/index-apis/create-index.md +++ b/_api-reference/index-apis/create-index.md @@ -2,10 +2,10 @@ layout: default title: Create index parent: Index APIs -nav_order: 25 +nav_order: 1 redirect_from: - /opensearch/rest-api/index-apis/create-index/ - - /opensearch/rest-api/create-index/ + - opensearch/rest-api/create-index/ --- # Create index @@ -14,7 +14,31 @@ Introduced 1.0 While you can create an index by using a document as a base, you can also create an empty index for later use. -When creating an index, you can specify its mappings, settings, and aliases. 
+## Example + +The following example demonstrates how to create an index with a non-default number of primary and replica shards, specifies that `age` is of type `integer`, and assigns a `sample-alias1` alias to the index. + +```json +PUT /sample-index1 +{ + "settings": { + "index": { + "number_of_shards": 2, + "number_of_replicas": 1 + } + }, + "mappings": { + "properties": { + "age": { + "type": "integer" + } + } + }, + "aliases": { + "sample-alias1": {} + } +} +``` ## Path and HTTP methods @@ -32,46 +56,61 @@ OpenSearch indexes have the following naming restrictions: `:`, `"`, `*`, `+`, `/`, `\`, `|`, `?`, `#`, `>`, or `<` -## Path parameters - -| Parameter | Description | -:--- | :--- -| index | String | The index name. Must conform to the [index naming restrictions](#index-naming-restrictions). Required. | +## URL parameters -## Query parameters - -You can include the following query parameters in your request. All parameters are optional. +You can include the following URL parameters in your request. All parameters are optional. Parameter | Type | Description :--- | :--- | :--- wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed. -cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`. +master_timeout | Time | How long to wait for a connection to the master node. Default is `30s`. timeout | Time | How long to wait for the request to return. Default is `30s`. ## Request body -As part of your request, you can optionally specify [index settings]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/), [mappings]({{site.url}}{{site.baseurl}}/field-types/index/), and [aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/) for your newly created index. - -#### Example request - -```json -PUT /sample-index1 -{ - "settings": { - "index": { - "number_of_shards": 2, - "number_of_replicas": 1 - } - }, - "mappings": { - "properties": { - "age": { - "type": "integer" - } - } - }, - "aliases": { - "sample-alias1": {} - } -} -``` +As part of your request, you can supply parameters in your request's body that specify index settings, mappings, and [aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/) for your newly created index. The following sections provide more information about index settings and mappings. + + +### Index settings + +Index settings are separated into two varieties: static index settings and dynamic index settings. Static index settings are settings that you specify at index creation and can't change later. You can change dynamic settings at any time, including at index creation. + +#### Static index settings + +Setting | Description +:--- | :--- +index.number_of_shards | The number of primary shards in the index. Default is 1. +index.number_of_routing_shards | The number of routing shards used to split an index. +index.shard.check_on_startup | Whether the index's shards should be checked for corruption. Available options are `false` (do not check for corruption), `checksum` (check for physical corruption), and `true` (check for both physical and logical corruption). Default is `false`. +index.codec | The compression type to use to compress stored data. 
Available values are `default` (optimizes for retrieval speed) and `best_compression` (optimizes for better compression at the expense of speed, leading to smaller data sizes on disk). +index.routing_partition_size | The number of shards a custom routing value can go to. Routing helps an imbalanced cluster by relocating values to a subset of shards rather than just a single shard. To enable, set this value to greater than 1 but less than `index.number_of_shards`. Default is 1. +index.soft_deletes.retention_lease.period | The maximum amount of time to retain a shard's history of operations. Default is `12h`. +index.load_fixed_bitset_filters_eagerly | Whether OpenSearch should pre-load cached filters. Available options are `true` and `false`. Default is `true`. +index.hidden | Whether the index should be hidden. Hidden indexes are not returned as part of queries that have wildcards. Available options are `true` and `false`. Default is `false`. + +#### Dynamic index Settings + +Setting | Description +:--- | :--- +index.number_of_replicas | The number of replica shards each primary shard should have. For example, if you have 4 primary shards and set `index.number_of_replicas` to 3, the index has 12 replica shards. Default is 1. +index.auto_expand_replicas | Whether the cluster should automatically add replica shards based on the number of data nodes. Specify a lower bound and upper limit (for example, 0-9), or `all` for the upper limit. For example, if you have 5 data nodes and set `index.auto_expand_replicas` to 0-3, then the cluster does not automatically add another replica shard. However, if you set this value to `0-all` and add 2 more nodes for a total of 7, the cluster will expand to now have 6 replica shards. Default is disabled. +index.search.idle.after | Amount of time a shard should wait for a search or get request until it goes idle. Default is `30s`. +index.refresh_interval | How often the index should refresh, which publishes its most recent changes and makes them available for searching. Can be set to `-1` to disable refreshing. Default is `1s`. +index.max_result_window | The maximum value of `from` + `size` for searches to the index. `from` is the starting index to search from, and `size` is the amount of results to return. Default: 10000. +index.max_inner_result_window | Maximum value of `from` + `size` to return nested search hits and most relevant document aggregated during the query. `from` is the starting index to search from, and `size` is the amount of top hits to return. Default is 100. +index.max_rescore_window | The maximum value of `window_size` for rescore requests to the index. Rescore requests reorder the index's documents and return a new score, which can be more precise. Default is the same as index.max_inner_result_window or 10000 by default. +index.max_docvalue_fields_search | Maximum amount of `docvalue_fields` allowed in a query. Default is 100. +index.max_script_fields | Maximum amount of `script_fields` allowed in a query. Default is 32. +index.max_ngram_diff | Maximum difference between `min_gram` and `max_gram` values for `NGramTokenizer` and `NGramTokenFilter` fields. Default is 1. +index.max_shingle_diff | Maximum difference between `max_shingle_size` and `min_shingle_size` to feed into the `shingle` token filter. Default is 3. +index.max_refresh_listeners | Maximum amount of refresh listeners each shard is allowed to have. +index.analyze.max_token_count | Maximum amount of tokens that can return from the `_analyze` API operation. Default is 10000. 
+index.highlight.max_analyzed_offset | The amount of characters a highlight request can analyze. Default is 1000000. +index.max_terms_count | The maximum amount of terms a terms query can accept. Default is 65536. +index.max_regex_length | The maximum character length of regex that can be in a regexp query. Default is 1000. +index.query.default_field | A field or list of fields that OpenSearch uses in queries in case a field isn't specified in the parameters. +index.routing.allocation.enable | Specifies options for the index’s shard allocation. Available options are all (allow allocation for all shards), primaries (allow allocation only for primary shards), new_primaries (allow allocation only for new primary shards), and none (do not allow allocation). Default is all. +index.routing.rebalance.enable | Enables shard rebalancing for the index. Available options are `all` (allow rebalancing for all shards), `primaries` (allow rebalancing only for primary shards), `replicas` (allow rebalancing only for replicas), and `none` (do not allow rebalancing). Default is `all`. +index.gc_deletes | Amount of time to retain a deleted document's version number. Default is `60s`. +index.default_pipeline | The default ingest node pipeline for the index. If the default pipeline is set and the pipeline does not exist, then index requests fail. The pipeline name `_none` specifies that the index does not have an ingest pipeline. +index.final_pipeline | The final ingest node pipeline for the index. If the final pipeline is set and the pipeline does not exist, then index requests fail. The pipeline name `_none` specifies that the index does not have an ingest pipeline. \ No newline at end of file diff --git a/_api-reference/index-apis/dangling-index.md b/_api-reference/index-apis/dangling-index.md index 5f9c184d66..1513a72339 100644 --- a/_api-reference/index-apis/dangling-index.md +++ b/_api-reference/index-apis/dangling-index.md @@ -2,7 +2,7 @@ layout: default title: Dangling indexes parent: index-apis -nav_order: 30 +nav_order: 84 --- # Dangling indexes API @@ -45,7 +45,7 @@ Query parameter | Data type | Description :--- | :--- | :--- accept_data_loss | Boolean | Must be set to `true` for an `import` or `delete` because OpenSearch is unaware of where the dangling index data came from. timeout | Time units | The amount of time to wait for a response. If no response is received in the defined time period, an error is returned. Default is `30` seconds. -cluster_manager_timeout | Time units | The amount of time to wait for a connection to the cluster manager. If no response is received in the defined time period, an error is returned. Default is `30` seconds. +master_timeout | Time units | The amount of time to wait for the connection to the cluster manager. If no response is received in the defined time period, an error is returned. Default is `30` seconds. ## Examples diff --git a/_api-reference/index-apis/delete-index.md b/_api-reference/index-apis/delete-index.md index 91b991b85d..29984b7aa2 100644 --- a/_api-reference/index-apis/delete-index.md +++ b/_api-reference/index-apis/delete-index.md @@ -2,7 +2,7 @@ layout: default title: Delete index parent: Index APIs -nav_order: 35 +nav_order: 10 redirect_from: - /opensearch/rest-api/index-apis/delete-index/ --- @@ -35,7 +35,7 @@ Parameter | Type | Description allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indexes. Default is true. expand_wildcards | String | Expands wildcard expressions to different indexes. 
Combine multiple values with commas. Available values are all (match all indexes), open (match open indexes), closed (match closed indexes), hidden (match hidden indexes), and none (do not accept wildcard expressions), which must be used with open, closed, or both. Default is open. ignore_unavailable | Boolean | If true, OpenSearch does not include missing or closed indexes in the response. -cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`. +master_timeout | Time | How long to wait for a connection to the master node. Default is `30s`. timeout | Time | How long to wait for the response to return. Default is `30s`. diff --git a/_api-reference/index-apis/exists.md b/_api-reference/index-apis/exists.md index f06745157e..88acd1daea 100644 --- a/_api-reference/index-apis/exists.md +++ b/_api-reference/index-apis/exists.md @@ -2,7 +2,7 @@ layout: default title: Index exists parent: Index APIs -nav_order: 50 +nav_order: 5 redirect_from: - /opensearch/rest-api/index-apis/exists/ --- @@ -37,7 +37,7 @@ expand_wildcards | String | Expands wildcard expressions to different indexes. C flat_settings | Boolean | Whether to return settings in the flat form, which can improve readability, especially for heavily nested settings. For example, the flat form of "index": { "creation_date": "123456789" } is "index.creation_date": "123456789". include_defaults | Boolean | Whether to include default settings as part of the response. This parameter is useful for identifying the names and current values of settings you want to update. ignore_unavailable | Boolean | If true, OpenSearch does not search for missing or closed indexes. Default is false. -local | Boolean | Whether to return information from only the local node instead of from the cluster manager node. Default is false. +local | Boolean | Whether to return information from only the local node instead of from the master node. Default is false. ## Response diff --git a/_api-reference/index-apis/get-index.md b/_api-reference/index-apis/get-index.md index f47f737826..2c759587af 100644 --- a/_api-reference/index-apis/get-index.md +++ b/_api-reference/index-apis/get-index.md @@ -2,7 +2,7 @@ layout: default title: Get index parent: Index APIs -nav_order: 40 +nav_order: 20 redirect_from: - /opensearch/rest-api/index-apis/get-index/ --- @@ -37,8 +37,8 @@ expand_wildcards | String | Expands wildcard expressions to different indexes. C flat_settings | Boolean | Whether to return settings in the flat form, which can improve readability, especially for heavily nested settings. For example, the flat form of "index": { "creation_date": "123456789" } is "index.creation_date": "123456789". include_defaults | Boolean | Whether to include default settings as part of the response. This parameter is useful for identifying the names and current values of settings you want to update. ignore_unavailable | Boolean | If true, OpenSearch does not include missing or closed indexes in the response. -local | Boolean | Whether to return information from only the local node instead of from the cluster manager node. Default is false. -cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`. +local | Boolean | Whether to return information from only the local node instead of from the master node. Default is false. +master_timeout | Time | How long to wait for a connection to the master node. Default is `30s`. 
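For reference, a basic get index request that these parameters modify might look like the following sketch; the index name `sample-index1` is hypothetical:

```json
GET /sample-index1
```
{% include copy-curl.html %}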
## Response diff --git a/_api-reference/index-apis/get-settings.md b/_api-reference/index-apis/get-settings.md index c6689a54a9..f695ddb2cc 100644 --- a/_api-reference/index-apis/get-settings.md +++ b/_api-reference/index-apis/get-settings.md @@ -2,7 +2,7 @@ layout: default title: Get settings parent: Index APIs -nav_order: 45 +nav_order: 100 redirect_from: - /opensearch/rest-api/index-apis/get-index/ --- @@ -30,7 +30,7 @@ GET //_settings/ ## URL parameters -All get settings parameters are optional. +All update settings parameters are optional. Parameter | Data type | Description :--- | :--- | :--- @@ -41,8 +41,8 @@ expand_wildcards | String | Expands wildcard expressions to different indexes. C flat_settings | Boolean | Whether to return settings in the flat form, which can improve readability, especially for heavily nested settings. For example, the flat form of “index”: { “creation_date”: “123456789” } is “index.creation_date”: “123456789”. include_defaults | String | Whether to include default settings, including settings used within OpenSearch plugins, in the response. Default is false. ignore_unavailable | Boolean | If true, OpenSearch does not include missing or closed indexes in the response. -local | Boolean | Whether to return information from the local node only instead of the cluster manager node. Default is false. -cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`. +local | Boolean | Whether to return information from the local node only instead of the master node. Default is false. +master_timeout | Time | How long to wait for a connection to the master node. Default is `30s`. ## Response diff --git a/_api-reference/index-apis/open-index.md b/_api-reference/index-apis/open-index.md index 90a4491898..98d020ea8d 100644 --- a/_api-reference/index-apis/open-index.md +++ b/_api-reference/index-apis/open-index.md @@ -2,7 +2,7 @@ layout: default title: Open index parent: Index APIs -nav_order: 55 +nav_order: 40 redirect_from: - /opensearch/rest-api/index-apis/open-index/ --- @@ -37,7 +37,7 @@ allow_no_indices | Boolean | Whether to ignore wildcards that don't match any in expand_wildcards | String | Expands wildcard expressions to different indexes. Combine multiple values with commas. Available values are all (match all indexes), open (match open indexes), closed (match closed indexes), hidden (match hidden indexes), and none (do not accept wildcard expressions). Default is open. ignore_unavailable | Boolean | If true, OpenSearch does not search for missing or closed indexes. Default is false. wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the request. Default is 1 (only the primary shard). Set to all or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed. -cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`. +master_timeout | Time | How long to wait for a connection to the master node. Default is `30s`. timeout | Time | How long to wait for a response from the cluster. Default is `30s`. wait_for_completion | Boolean | When set to `false`, the request returns immediately instead of after the operation is finished. 
To monitor the operation status, use the [Tasks API]({{site.url}}{{site.baseurl}}/api-reference/tasks/) with the task ID returned by the request. Default is `true`. task_execution_timeout | Time | The explicit task execution timeout. Only useful when wait_for_completion is set to `false`. Default is `1h`. diff --git a/_api-reference/index-apis/put-mapping.md b/_api-reference/index-apis/put-mapping.md index dfbbdc139c..528dc9db76 100644 --- a/_api-reference/index-apis/put-mapping.md +++ b/_api-reference/index-apis/put-mapping.md @@ -2,10 +2,9 @@ layout: default title: Create or update mappings parent: Index APIs -nav_order: 27 +nav_order: 220 redirect_from: - /opensearch/rest-api/index-apis/update-mapping/ - - /opensearch/rest-api/update-mapping/ --- # Create or update mappings @@ -51,8 +50,8 @@ You can make the document structure match the structure of the index mapping by ```json { - "dynamic": "strict", "properties":{ + "dynamic": "strict", "color":{ "type": "text" } diff --git a/_api-reference/index-apis/shrink-index.md b/_api-reference/index-apis/shrink-index.md index 0476e19c68..402f894327 100644 --- a/_api-reference/index-apis/shrink-index.md +++ b/_api-reference/index-apis/shrink-index.md @@ -2,7 +2,7 @@ layout: default title: Shrink index parent: Index APIs -nav_order: 65 +nav_order: 50 redirect_from: - /opensearch/rest-api/index-apis/shrink-index/ --- @@ -51,7 +51,7 @@ Parameter | Type | description <index-name> | String | The index to shrink. <target-index> | String | The target index to shrink the source index into. wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the request. Default is 1 (only the primary shard). Set to all or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed. -cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`. +master_timeout | Time | How long to wait for a connection to the master node. Default is `30s`. timeout | Time | How long to wait for the request to return a response. Default is `30s`. wait_for_completion | Boolean | When set to `false`, the request returns immediately instead of after the operation is finished. To monitor the operation status, use the [Tasks API]({{site.url}}{{site.baseurl}}/api-reference/tasks/) with the task ID returned by the request. Default is `true`. task_execution_timeout | Time | The explicit task execution timeout. Only useful when wait_for_completion is set to `false`. Default is `1h`. @@ -63,7 +63,7 @@ You can use the request body to configure some index settings for the target ind Field | Type | Description :--- | :--- | :--- alias | Object | Sets an alias for the target index. Can have the fields `filter`, `index_routing`, `is_hidden`, `is_write_index`, `routing`, or `search_routing`. See [Index Aliases]({{site.url}}{{site.baseurl}}/api-reference/alias/#request-body). -settings | Object | Index settings you can apply to your target index. See [Index Settings]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/). +settings | Object | Index settings you can apply to your target index. See [Index Settings]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/#index-settings). [max_shard_size](#the-max_shard_size-parameter) | Bytes | Specifies the maximum size of a primary shard in the target index. 
Because `max_shard_size` conflicts with the `index.number_of_shards` setting, you cannot set both of them at the same time. ### The `max_shard_size` parameter @@ -78,8 +78,4 @@ The primary shard count of the target index is the smallest factor of the source The maximum number of primary shards for the target index is equal to the number of primary shards in the source index because the shrink operation is used to reduce the primary shard count. As an example, consider a source index with 5 primary shards that occupy a total of 600 GB of storage. If `max_shard_size` is 100 GB, the minimum number of primary shards is 600/100, which is 6. However, because the number of primary shards in the source index is smaller than 6, the number of primary shards in the target index is set to 5. The minimum number of primary shards for the target index is 1. -{: .note} - -## Index codec considerations - -For index codec considerations, see [Index codecs]({{site.url}}{{site.baseurl}}/im-plugin/index-codecs/#splits-and-shrinks). \ No newline at end of file +{: .note} \ No newline at end of file diff --git a/_api-reference/index-apis/split.md b/_api-reference/index-apis/split.md index fcf29998a9..d35030cede 100644 --- a/_api-reference/index-apis/split.md +++ b/_api-reference/index-apis/split.md @@ -2,7 +2,7 @@ layout: default title: Split index parent: Index APIs -nav_order: 70 +nav_order: 80 redirect_from: - /opensearch/rest-api/index-apis/split/ --- @@ -55,14 +55,14 @@ Parameter | Type | Description <source-index> | String | The source index to split. <target-index> | String | The index to create. wait_for_active_shards | String | The number of active shards that must be available before OpenSearch processes the request. Default is 1 (only the primary shard). Set to all or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the operation to succeed. -cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`. +master_timeout | Time | How long to wait for a connection to the master node. Default is `30s`. timeout | Time | How long to wait for the request to return. Default is `30s`. wait_for_completion | Boolean | When set to `false`, the request returns immediately instead of after the operation is finished. To monitor the operation status, use the [Tasks API]({{site.url}}{{site.baseurl}}/api-reference/tasks/) with the task ID returned by the request. Default is `true`. task_execution_timeout | Time | The explicit task execution timeout. Only useful when wait_for_completion is set to `false`. Default is `1h`. ## Request body -The split index API operation creates a new target index, so you can specify any [index settings]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/) and [aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/) to apply to the target index. +The split index API operation creates a new target index, so you can specify any [index settings]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/#index-settings) and [aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias) to apply to the target index. 
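For example, a split request that supplies both a target shard count and an alias in the request body might look like the following sketch; the source index name `sample-index1` and alias `sample-alias1` are hypothetical, and the target shard count must be a multiple of the source index's shard count:

```json
POST /sample-index1/_split/split-index1
{
  "settings": {
    "index": {
      "number_of_shards": 4
    }
  },
  "aliases": {
    "sample-alias1": {}
  }
}
```
{% include copy-curl.html %}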
## Response @@ -73,7 +73,3 @@ The split index API operation creates a new target index, so you can specify any "index": "split-index1" } ``` - -## Index codec considerations - -For index codec considerations, see [Index codecs]({{site.url}}{{site.baseurl}}/im-plugin/index-codecs/#splits-and-shrinks). \ No newline at end of file diff --git a/_api-reference/index-apis/stats.md b/_api-reference/index-apis/stats.md deleted file mode 100644 index de887a47d6..0000000000 --- a/_api-reference/index-apis/stats.md +++ /dev/null @@ -1,647 +0,0 @@ ---- -layout: default -title: Stats -parent: Index APIs -nav_order: 72 ---- - -# Index stats - -The Index Stats API provides index statistics. For data streams, the API provides statistics for the stream's backing indexes. By default, the returned statistics are index level. To receive shard-level statistics, set the `level` parameter to `shards`. - -When a shard moves to a different node, the shard-level statistics for the shard are cleared. Although the shard is no longer part of the node, the node preserves any node-level statistics to which the shard contributed. -{: .note} - -## Path and HTTP methods - -```json -GET /_stats -GET //_stats -GET //_stats/ -``` - -## Path parameters - -The following table lists the available path parameters. All path parameters are optional. - -| Parameter | Data type | Description | -| :--- | :--- | :--- | -| `` | String | A comma-separated list of indexes, data streams, or index aliases used to filter results. Supports wildcard expressions. Defaults to `_all` (`*`). -`` | String | A comma-separated list of metric groups that will be included in the response. For valid values, see [Metrics](#metrics). Defaults to all metrics. | - -### Metrics - -The following table lists all available metric groups. - -Metric | Description -:--- |:---- -`_all` | Return all statistics. -`completion` | Completion suggester statistics. -`docs` | Returns the number of documents and the number of deleted documents that have not yet been merged. Index refresh operations can affect this statistic. -`fielddata` | Field data statistics. -`flush` | Flush statistics. -`get` | Get statistics, including missing stats. -`indexing` | Indexing statistics. -`merge` | Merge statistics. -`query_cache` | Query cache statistics. -`refresh` | Refresh statistics. -`request_cache` | Shard request cache statistics. -`search` | Search statistics, including suggest operation statistics. Search operations can be associated with one or more groups. You can include statistics for custom groups by providing a `groups` parameter, which accepts a comma-separated list of group names. To return statistics for all groups, use `_all`. -`segments` | Statistics about memory use of all open segments. If the `include_segment_file_sizes` parameter is `true`, this metric includes the aggregated disk usage of each Lucene index file. -`store` | Size of the index in byte units. -`translog` | Translog statistics. -`warmer` | Warmer statistics. - -## Query parameters - -The following table lists the available query parameters. All query parameters are optional. - -Parameter | Data type | Description -:--- | :--- | :--- -`expand_wildcards` | String | Specifies the type of indexes to which wildcard expressions can expand. Supports comma-separated values. Valid values are:
- `all`: Expand to all open and closed indexes, including hidden indexes.
- `open`: Expand to open indexes.
- `closed`: Expand to closed indexes.
- `hidden`: Include hidden indexes when expanding. Must be combined with `open`, `closed`, or both.
- `none`: Do not accept wildcard expressions.
Default is `open`. -`fields` | String | A comma-separated list or a wildcard expression specifying fields to include in the statistics. Specifies the default field list if neither `completion_fields` nor `fielddata_fields` is provided. -`completion_fields` | String | A comma-separated list or wildcard expression specifying fields to include in field-level `completion` statistics. -`fielddata_fields` | String | A comma-separated list or wildcard expression specifying fields to include in field-level `fielddata` statistics. -`forbid_closed_indices` | Boolean | Specifies not to collect statistics for closed indexes. Default is `true`. -`groups` | String | A comma-separated list of search groups to include in the `search` statistics. -`level` | String | Specifies the level used to aggregate statistics. Valid values are:
- `cluster`: Cluster-level statistics.
- `indices`: Index-level statistics.
- `shards`: Shard-level statistics.
Default is `indices`. -`include_segment_file_sizes` | Boolean | Specifies whether to report the aggregated disk usage of each Lucene index file. Only applies to `segments` statistics. Default is `false`. -`include_unloaded_segments` | Boolean | Specifies whether to include information from segments that are not loaded into memory. Default is `false`. - -#### Example request: One index - -```json -GET /testindex/_stats -``` -{% include copy-curl.html %} - -#### Example response - -By default, the returned statistics are aggregated in the `primaries` and `total` aggregations. The `primaries` aggregation contains statistics for the primary shards. The `total` aggregation contains statistics for both primary and replica shards. The following is an example Index Stats API response: - -
- - Response - - {: .text-delta} - -```json -{ - "_shards": { - "total": 2, - "successful": 1, - "failed": 0 - }, - "_all": { - "primaries": { - "docs": { - "count": 4, - "deleted": 0 - }, - "store": { - "size_in_bytes": 15531, - "reserved_in_bytes": 0 - }, - "indexing": { - "index_total": 4, - "index_time_in_millis": 10, - "index_current": 0, - "index_failed": 0, - "delete_total": 0, - "delete_time_in_millis": 0, - "delete_current": 0, - "noop_update_total": 0, - "is_throttled": false, - "throttle_time_in_millis": 0 - }, - "get": { - "total": 0, - "time_in_millis": 0, - "exists_total": 0, - "exists_time_in_millis": 0, - "missing_total": 0, - "missing_time_in_millis": 0, - "current": 0 - }, - "search": { - "open_contexts": 0, - "query_total": 12, - "query_time_in_millis": 11, - "query_current": 0, - "fetch_total": 12, - "fetch_time_in_millis": 5, - "fetch_current": 0, - "scroll_total": 0, - "scroll_time_in_millis": 0, - "scroll_current": 0, - "point_in_time_total": 0, - "point_in_time_time_in_millis": 0, - "point_in_time_current": 0, - "suggest_total": 0, - "suggest_time_in_millis": 0, - "suggest_current": 0 - }, - "merges": { - "current": 0, - "current_docs": 0, - "current_size_in_bytes": 0, - "total": 0, - "total_time_in_millis": 0, - "total_docs": 0, - "total_size_in_bytes": 0, - "total_stopped_time_in_millis": 0, - "total_throttled_time_in_millis": 0, - "total_auto_throttle_in_bytes": 20971520 - }, - "refresh": { - "total": 8, - "total_time_in_millis": 58, - "external_total": 7, - "external_total_time_in_millis": 60, - "listeners": 0 - }, - "flush": { - "total": 1, - "periodic": 1, - "total_time_in_millis": 21 - }, - "warmer": { - "current": 0, - "total": 6, - "total_time_in_millis": 0 - }, - "query_cache": { - "memory_size_in_bytes": 0, - "total_count": 0, - "hit_count": 0, - "miss_count": 0, - "cache_size": 0, - "cache_count": 0, - "evictions": 0 - }, - "fielddata": { - "memory_size_in_bytes": 0, - "evictions": 0 - }, - "completion": { - "size_in_bytes": 0 - }, - "segments": { - "count": 4, - "memory_in_bytes": 0, - "terms_memory_in_bytes": 0, - "stored_fields_memory_in_bytes": 0, - "term_vectors_memory_in_bytes": 0, - "norms_memory_in_bytes": 0, - "points_memory_in_bytes": 0, - "doc_values_memory_in_bytes": 0, - "index_writer_memory_in_bytes": 0, - "version_map_memory_in_bytes": 0, - "fixed_bit_set_memory_in_bytes": 0, - "max_unsafe_auto_id_timestamp": -1, - "file_sizes": {} - }, - "translog": { - "operations": 0, - "size_in_bytes": 55, - "uncommitted_operations": 0, - "uncommitted_size_in_bytes": 55, - "earliest_last_modified_age": 142622215 - }, - "request_cache": { - "memory_size_in_bytes": 0, - "evictions": 0, - "hit_count": 0, - "miss_count": 0 - }, - "recovery": { - "current_as_source": 0, - "current_as_target": 0, - "throttle_time_in_millis": 0 - } - }, - "total": { - "docs": { - "count": 4, - "deleted": 0 - }, - "store": { - "size_in_bytes": 15531, - "reserved_in_bytes": 0 - }, - "indexing": { - "index_total": 4, - "index_time_in_millis": 10, - "index_current": 0, - "index_failed": 0, - "delete_total": 0, - "delete_time_in_millis": 0, - "delete_current": 0, - "noop_update_total": 0, - "is_throttled": false, - "throttle_time_in_millis": 0 - }, - "get": { - "total": 0, - "time_in_millis": 0, - "exists_total": 0, - "exists_time_in_millis": 0, - "missing_total": 0, - "missing_time_in_millis": 0, - "current": 0 - }, - "search": { - "open_contexts": 0, - "query_total": 12, - "query_time_in_millis": 11, - "query_current": 0, - "fetch_total": 12, - "fetch_time_in_millis": 5, - 
"fetch_current": 0, - "scroll_total": 0, - "scroll_time_in_millis": 0, - "scroll_current": 0, - "point_in_time_total": 0, - "point_in_time_time_in_millis": 0, - "point_in_time_current": 0, - "suggest_total": 0, - "suggest_time_in_millis": 0, - "suggest_current": 0 - }, - "merges": { - "current": 0, - "current_docs": 0, - "current_size_in_bytes": 0, - "total": 0, - "total_time_in_millis": 0, - "total_docs": 0, - "total_size_in_bytes": 0, - "total_stopped_time_in_millis": 0, - "total_throttled_time_in_millis": 0, - "total_auto_throttle_in_bytes": 20971520 - }, - "refresh": { - "total": 8, - "total_time_in_millis": 58, - "external_total": 7, - "external_total_time_in_millis": 60, - "listeners": 0 - }, - "flush": { - "total": 1, - "periodic": 1, - "total_time_in_millis": 21 - }, - "warmer": { - "current": 0, - "total": 6, - "total_time_in_millis": 0 - }, - "query_cache": { - "memory_size_in_bytes": 0, - "total_count": 0, - "hit_count": 0, - "miss_count": 0, - "cache_size": 0, - "cache_count": 0, - "evictions": 0 - }, - "fielddata": { - "memory_size_in_bytes": 0, - "evictions": 0 - }, - "completion": { - "size_in_bytes": 0 - }, - "segments": { - "count": 4, - "memory_in_bytes": 0, - "terms_memory_in_bytes": 0, - "stored_fields_memory_in_bytes": 0, - "term_vectors_memory_in_bytes": 0, - "norms_memory_in_bytes": 0, - "points_memory_in_bytes": 0, - "doc_values_memory_in_bytes": 0, - "index_writer_memory_in_bytes": 0, - "version_map_memory_in_bytes": 0, - "fixed_bit_set_memory_in_bytes": 0, - "max_unsafe_auto_id_timestamp": -1, - "file_sizes": {} - }, - "translog": { - "operations": 0, - "size_in_bytes": 55, - "uncommitted_operations": 0, - "uncommitted_size_in_bytes": 55, - "earliest_last_modified_age": 142622215 - }, - "request_cache": { - "memory_size_in_bytes": 0, - "evictions": 0, - "hit_count": 0, - "miss_count": 0 - }, - "recovery": { - "current_as_source": 0, - "current_as_target": 0, - "throttle_time_in_millis": 0 - } - } - }, - "indices": { - "testindex": { - "uuid": "0SXXSpe9Rp-FpxXXWLOD8Q", - "primaries": { - "docs": { - "count": 4, - "deleted": 0 - }, - "store": { - "size_in_bytes": 15531, - "reserved_in_bytes": 0 - }, - "indexing": { - "index_total": 4, - "index_time_in_millis": 10, - "index_current": 0, - "index_failed": 0, - "delete_total": 0, - "delete_time_in_millis": 0, - "delete_current": 0, - "noop_update_total": 0, - "is_throttled": false, - "throttle_time_in_millis": 0 - }, - "get": { - "total": 0, - "time_in_millis": 0, - "exists_total": 0, - "exists_time_in_millis": 0, - "missing_total": 0, - "missing_time_in_millis": 0, - "current": 0 - }, - "search": { - "open_contexts": 0, - "query_total": 12, - "query_time_in_millis": 11, - "query_current": 0, - "fetch_total": 12, - "fetch_time_in_millis": 5, - "fetch_current": 0, - "scroll_total": 0, - "scroll_time_in_millis": 0, - "scroll_current": 0, - "point_in_time_total": 0, - "point_in_time_time_in_millis": 0, - "point_in_time_current": 0, - "suggest_total": 0, - "suggest_time_in_millis": 0, - "suggest_current": 0 - }, - "merges": { - "current": 0, - "current_docs": 0, - "current_size_in_bytes": 0, - "total": 0, - "total_time_in_millis": 0, - "total_docs": 0, - "total_size_in_bytes": 0, - "total_stopped_time_in_millis": 0, - "total_throttled_time_in_millis": 0, - "total_auto_throttle_in_bytes": 20971520 - }, - "refresh": { - "total": 8, - "total_time_in_millis": 58, - "external_total": 7, - "external_total_time_in_millis": 60, - "listeners": 0 - }, - "flush": { - "total": 1, - "periodic": 1, - "total_time_in_millis": 21 - }, - 
"warmer": { - "current": 0, - "total": 6, - "total_time_in_millis": 0 - }, - "query_cache": { - "memory_size_in_bytes": 0, - "total_count": 0, - "hit_count": 0, - "miss_count": 0, - "cache_size": 0, - "cache_count": 0, - "evictions": 0 - }, - "fielddata": { - "memory_size_in_bytes": 0, - "evictions": 0 - }, - "completion": { - "size_in_bytes": 0 - }, - "segments": { - "count": 4, - "memory_in_bytes": 0, - "terms_memory_in_bytes": 0, - "stored_fields_memory_in_bytes": 0, - "term_vectors_memory_in_bytes": 0, - "norms_memory_in_bytes": 0, - "points_memory_in_bytes": 0, - "doc_values_memory_in_bytes": 0, - "index_writer_memory_in_bytes": 0, - "version_map_memory_in_bytes": 0, - "fixed_bit_set_memory_in_bytes": 0, - "max_unsafe_auto_id_timestamp": -1, - "file_sizes": {} - }, - "translog": { - "operations": 0, - "size_in_bytes": 55, - "uncommitted_operations": 0, - "uncommitted_size_in_bytes": 55, - "earliest_last_modified_age": 142622215 - }, - "request_cache": { - "memory_size_in_bytes": 0, - "evictions": 0, - "hit_count": 0, - "miss_count": 0 - }, - "recovery": { - "current_as_source": 0, - "current_as_target": 0, - "throttle_time_in_millis": 0 - } - }, - "total": { - "docs": { - "count": 4, - "deleted": 0 - }, - "store": { - "size_in_bytes": 15531, - "reserved_in_bytes": 0 - }, - "indexing": { - "index_total": 4, - "index_time_in_millis": 10, - "index_current": 0, - "index_failed": 0, - "delete_total": 0, - "delete_time_in_millis": 0, - "delete_current": 0, - "noop_update_total": 0, - "is_throttled": false, - "throttle_time_in_millis": 0 - }, - "get": { - "total": 0, - "time_in_millis": 0, - "exists_total": 0, - "exists_time_in_millis": 0, - "missing_total": 0, - "missing_time_in_millis": 0, - "current": 0 - }, - "search": { - "open_contexts": 0, - "query_total": 12, - "query_time_in_millis": 11, - "query_current": 0, - "fetch_total": 12, - "fetch_time_in_millis": 5, - "fetch_current": 0, - "scroll_total": 0, - "scroll_time_in_millis": 0, - "scroll_current": 0, - "point_in_time_total": 0, - "point_in_time_time_in_millis": 0, - "point_in_time_current": 0, - "suggest_total": 0, - "suggest_time_in_millis": 0, - "suggest_current": 0 - }, - "merges": { - "current": 0, - "current_docs": 0, - "current_size_in_bytes": 0, - "total": 0, - "total_time_in_millis": 0, - "total_docs": 0, - "total_size_in_bytes": 0, - "total_stopped_time_in_millis": 0, - "total_throttled_time_in_millis": 0, - "total_auto_throttle_in_bytes": 20971520 - }, - "refresh": { - "total": 8, - "total_time_in_millis": 58, - "external_total": 7, - "external_total_time_in_millis": 60, - "listeners": 0 - }, - "flush": { - "total": 1, - "periodic": 1, - "total_time_in_millis": 21 - }, - "warmer": { - "current": 0, - "total": 6, - "total_time_in_millis": 0 - }, - "query_cache": { - "memory_size_in_bytes": 0, - "total_count": 0, - "hit_count": 0, - "miss_count": 0, - "cache_size": 0, - "cache_count": 0, - "evictions": 0 - }, - "fielddata": { - "memory_size_in_bytes": 0, - "evictions": 0 - }, - "completion": { - "size_in_bytes": 0 - }, - "segments": { - "count": 4, - "memory_in_bytes": 0, - "terms_memory_in_bytes": 0, - "stored_fields_memory_in_bytes": 0, - "term_vectors_memory_in_bytes": 0, - "norms_memory_in_bytes": 0, - "points_memory_in_bytes": 0, - "doc_values_memory_in_bytes": 0, - "index_writer_memory_in_bytes": 0, - "version_map_memory_in_bytes": 0, - "fixed_bit_set_memory_in_bytes": 0, - "max_unsafe_auto_id_timestamp": -1, - "file_sizes": {} - }, - "translog": { - "operations": 0, - "size_in_bytes": 55, - "uncommitted_operations": 
0, - "uncommitted_size_in_bytes": 55, - "earliest_last_modified_age": 142622215 - }, - "request_cache": { - "memory_size_in_bytes": 0, - "evictions": 0, - "hit_count": 0, - "miss_count": 0 - }, - "recovery": { - "current_as_source": 0, - "current_as_target": 0, - "throttle_time_in_millis": 0 - } - } - } - } -} -``` -
- -#### Example request: Comma-separated list of indexes - -```json -GET /testindex1,testindex2/_stats -``` -{% include copy-curl.html %} - -#### Example request: Wildcard expression - -```json -GET /testindex*/_stats -``` -{% include copy-curl.html %} - -#### Example request: Specific stats - -```json -GET /testindex/_stats/refresh,flush -``` -{% include copy-curl.html %} - -#### Example request: Expand wildcards - -```json -GET /testindex*/_stats?expand_wildcards=open,hidden -``` -{% include copy-curl.html %} - -#### Example request: Shard-level statistics - -```json -GET /testindex/_stats?level=shards -``` -{% include copy-curl.html %} \ No newline at end of file diff --git a/_api-reference/index-apis/update-settings.md b/_api-reference/index-apis/update-settings.md index ef20e395f6..16a4a9d32b 100644 --- a/_api-reference/index-apis/update-settings.md +++ b/_api-reference/index-apis/update-settings.md @@ -2,7 +2,7 @@ layout: default title: Update settings parent: Index APIs -nav_order: 75 +nav_order: 120 redirect_from: - /opensearch/rest-api/index-apis/update-settings/ --- @@ -11,7 +11,7 @@ redirect_from: Introduced 1.0 {: .label .label-purple } -You can use the update settings API operation to update index-level settings. You can change dynamic index settings at any time, but static settings cannot be changed after index creation. For more information about static and dynamic index settings, see [Create index]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/). +You can use the update settings API operation to update index-level settings. You can change dynamic index settings at any time, but static settings cannot be changed after index creation. For more information about static and dynamic index settings, see [Create index]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index). Aside from the static and dynamic index settings, you can also update individual plugins' settings. To get the full list of updatable settings, run `GET /_settings?include_defaults=true`. @@ -34,7 +34,7 @@ PUT /sample-index1/_settings PUT //_settings ``` -## Query parameters +## URL parameters All update settings parameters are optional. @@ -45,7 +45,7 @@ expand_wildcards | String | Expands wildcard expressions to different indexes. C flat_settings | Boolean | Whether to return settings in the flat form, which can improve readability, especially for heavily nested settings. For example, the flat form of “index”: { “creation_date”: “123456789” } is “index.creation_date”: “123456789”. ignore_unavailable | Boolean | If true, OpenSearch does not include missing or closed indexes in the response. preserve_existing | Boolean | Whether to preserve existing index settings. Default is false. -cluster_manager_timeout | Time | How long to wait for a connection to the cluster manager node. Default is `30s`. +master_timeout | Time | How long to wait for a connection to the master node. Default is `30s`. timeout | Time | How long to wait for a connection to return. Default is `30s`. ## Request body diff --git a/_api-reference/index.md b/_api-reference/index.md index 5022502c52..b53ad4eae9 100644 --- a/_api-reference/index.md +++ b/_api-reference/index.md @@ -2,8 +2,7 @@ layout: default title: REST API reference nav_order: 1 -has_toc: false -has_children: true +has_toc: true nav_exclude: true redirect_from: - /opensearch/rest-api/index/ @@ -11,52 +10,7 @@ redirect_from: # REST API reference -You can use REST APIs for most operations in OpenSearch. 
In this reference, we provide a description of the API, and details that include the paths and HTTP methods, supported parameters, and example requests and responses. +OpenSearch uses its REST API for most operations. This _incomplete_ section includes REST API paths, HTTP verbs, supported parameters, request body details, and example responses. -This reference includes the REST APIs supported by OpenSearch. If a REST API is missing, please provide feedback or submit a pull request in GitHub. +In general, the OpenSearch REST API is no different from the Elasticsearch OSS REST API; most client code that worked with Elasticsearch OSS should also work with OpenSearch. {: .tip } - -## Related articles - -- [Analyze API]({{site.url}}{{site.baseurl}}/api-reference/analyze-apis/) -- [Access control API]({{site.url}}{{site.baseurl}}/security/access-control/api/) -- [Alerting API]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/api/) -- [Anomaly detection API]({{site.url}}{{site.baseurl}}/observing-your-data/ad/api/) -- [CAT APIs]({{site.url}}{{site.baseurl}}/api-reference/cat/index/) -- [Cluster APIs]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/index/) -- [Common REST parameters]({{site.url}}{{site.baseurl}}/api-reference/common-parameters/) -- [Count]({{site.url}}{{site.baseurl}}/api-reference/count/) -- [Cross-cluster replication API]({{site.url}}{{site.baseurl}}/tuning-your-cluster/replication-plugin/api/) -- [Document APIs]({{site.url}}{{site.baseurl}}/api-reference/document-apis/index/) -- [Explain]({{site.url}}{{site.baseurl}}/api-reference/explain/) -- [Index APIs]({{site.url}}{{site.baseurl}}/api-reference/index-apis/index/) -- [Index rollups API]({{site.url}}{{site.baseurl}}/im-plugin/index-rollups/rollup-api/) -- [Index state management API]({{site.url}}{{site.baseurl}}/im-plugin/ism/api/) -- [ISM error prevention API]({{site.url}}{{site.baseurl}}/im-plugin/ism/error-prevention/api/) -- [Ingest APIs]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) -- [k-NN plugin API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api/) -- [ML Commons API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/) -- [Multi-search]({{site.url}}{{site.baseurl}}/api-reference/multi-search/) -- [Nodes APIs]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/index/) -- [Notifications API]({{site.url}}{{site.baseurl}}/observing-your-data/notifications/api/) -- [Performance analyzer API]({{site.url}}{{site.baseurl}}/monitoring-your-cluster/pa/api/) -- [Point in Time API]({{site.url}}{{site.baseurl}}/search-plugins/point-in-time-api/) -- [Popular APIs]({{site.url}}{{site.baseurl}}/api-reference/popular-api/) -- [Ranking evaluation]({{site.url}}{{site.baseurl}}/api-reference/rank-eval/) -- [Refresh search analyzer]({{site.url}}{{site.baseurl}}/im-plugin/refresh-analyzer/) -- [Remove cluster information]({{site.url}}{{site.baseurl}}/api-reference/remote-info/) -- [Root cause analysis API]({{site.url}}{{site.baseurl}}/monitoring-your-cluster/pa/rca/api/) -- [Snapshot management API]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/snapshots/sm-api/) -- [Script APIs]({{site.url}}{{site.baseurl}}/api-reference/script-apis/index/) -- [Scroll]({{site.url}}{{site.baseurl}}/api-reference/scroll/) -- [Search]({{site.url}}{{site.baseurl}}/api-reference/search/) -- [Search relevance stats API]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/stats-api/) -- [Security analytics APIs]({{site.url}}{{site.baseurl}}/security-analytics/api-tools/index/) -- 
[Snapshot APIs]({{site.url}}{{site.baseurl}}/api-reference/snapshots/index/) -- [Stats API]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/stats-api/) -- [Supported units]({{site.url}}{{site.baseurl}}/api-reference/units/) -- [Tasks]({{site.url}}{{site.baseurl}}/api-reference/tasks/) -- [Transforms API]({{site.url}}{{site.baseurl}}/im-plugin/index-transforms/transforms-apis/) - - - diff --git a/_api-reference/ingest-apis/create-ingest.md b/_api-reference/ingest-apis/create-ingest.md deleted file mode 100644 index 38e9b32b54..0000000000 --- a/_api-reference/ingest-apis/create-ingest.md +++ /dev/null @@ -1,100 +0,0 @@ ---- -layout: default -title: Create pipeline -parent: Ingest pipelines -grand_parent: Ingest APIs -nav_order: 10 -redirect_from: - - /opensearch/rest-api/ingest-apis/create-update-ingest/ ---- - -# Create pipeline - -Use the create pipeline API operation to create or update pipelines in OpenSearch. Note that the pipeline requires you to define at least one processor that specifies how to change the documents. - -## Path and HTTP method - -Replace `` with your pipeline ID: - -```json -PUT _ingest/pipeline/ -``` -#### Example request - -Here is an example in JSON format that creates an ingest pipeline with two `set` processors and an `uppercase` processor. The first `set` processor sets the `grad_year` to `2023`, and the second `set` processor sets `graduated` to `true`. The `uppercase` processor converts the `name` field to uppercase. - -```json -PUT _ingest/pipeline/my-pipeline -{ - "description": "This pipeline processes student data", - "processors": [ - { - "set": { - "description": "Sets the graduation year to 2023", - "field": "grad_year", - "value": 2023 - } - }, - { - "set": { - "description": "Sets graduated to true", - "field": "graduated", - "value": true - } - }, - { - "uppercase": { - "field": "name" - } - } - ] -} -``` -{% include copy-curl.html %} - -To learn more about error handling, see [Handling pipeline failures]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipeline-failures/). - -## Request body fields - -The following table lists the request body fields used to create or update a pipeline. - -Parameter | Required | Type | Description -:--- | :--- | :--- | :--- -`processors` | Required | Array of processor objects | An array of processors, each of which transforms documents. Processors are run sequentially in the order specified. -`description` | Optional | String | A description of your ingest pipeline. - -## Path parameters - -Parameter | Required | Type | Description -:--- | :--- | :--- | :--- -`pipeline-id` | Required | String | The unique identifier, or pipeline ID, assigned to the ingest pipeline. - -## Query parameters - -Parameter | Required | Type | Description -:--- | :--- | :--- | :--- -`cluster_manager_timeout` | Optional | Time | Period to wait for a connection to the cluster manager node. Defaults to 30 seconds. -`timeout` | Optional | Time | Period to wait for a response. Defaults to 30 seconds. - -## Template snippets - -Some processor parameters support [Mustache](https://mustache.github.io/) template snippets. To get the value of a field, surround the field name in three curly braces, for example, `{% raw %}{{{field-name}}}{% endraw %}`. 
- -#### Example: `set` ingest processor using Mustache template snippet - -The following example sets the field `{% raw %}{{{role}}}{% endraw %}` with a value `{% raw %}{{{tenure}}}{% endraw %}`: - -```json -PUT _ingest/pipeline/my-pipeline -{ - "processors": [ - { - "set": { - "field": "{% raw %}{{{role}}}{% endraw %}", - "value": "{% raw %}{{{tenure}}}{% endraw %}" - } - } - ] -} -``` -{% include copy-curl.html %} diff --git a/_api-reference/ingest-apis/create-update-ingest.md b/_api-reference/ingest-apis/create-update-ingest.md new file mode 100644 index 0000000000..de2ea4ac77 --- /dev/null +++ b/_api-reference/ingest-apis/create-update-ingest.md @@ -0,0 +1,79 @@ +--- +layout: default +title: Create or update ingest pipeline +parent: Ingest APIs +nav_order: 11 +redirect_from: + - /opensearch/rest-api/ingest-apis/create-update-ingest/ +--- + +# Create and update a pipeline + +The create ingest pipeline API operation creates or updates an ingest pipeline. Each pipeline requires an ingest definition defining how each processor transforms your documents. + +## Example + +``` +PUT _ingest/pipeline/12345 +{ + "description" : "A description for your pipeline", + "processors" : [ + { + "set" : { + "field": "field-name", + "value": "value" + } + } + ] +} +``` +{% include copy-curl.html %} + +## Path and HTTP methods +``` +PUT _ingest/pipeline/{id} +``` + +## Request body fields + +Field | Required | Type | Description +:--- | :--- | :--- | :--- +description | Optional | string | Description of your ingest pipeline. +processors | Required | Array of processor objects | A processor that transforms documents. Runs in the order specified. Appears in index once ran. + +```json +{ + "description" : "A description for your pipeline", + "processors" : [ + { + "set" : { + "field": "field-name", + "value": "value" + } + } + ] +} +``` + +## URL parameters + +All URL parameters are optional. + +Parameter | Type | Description +:--- | :--- | :--- +master_timeout | time | How long to wait for a connection to the master node. +timeout | time | How long to wait for the request to return. + +## Response + +```json +{ + "acknowledged" : true +} +``` + + + + + + diff --git a/_api-reference/ingest-apis/delete-ingest.md b/_api-reference/ingest-apis/delete-ingest.md index 59383fb0aa..c5065d1e28 100644 --- a/_api-reference/ingest-apis/delete-ingest.md +++ b/_api-reference/ingest-apis/delete-ingest.md @@ -1,27 +1,44 @@ --- layout: default -title: Delete pipeline -parent: Ingest pipelines -grand_parent: Ingest APIs -nav_order: 13 +title: Delete a pipeline +parent: Ingest APIs +nav_order: 14 redirect_from: - /opensearch/rest-api/ingest-apis/delete-ingest/ --- -# Delete pipeline +# Delete a pipeline -Use the following request to delete a pipeline. +If you no longer want to use an ingest pipeline, use the delete ingest pipeline API operation. -To delete a specific pipeline, pass the pipeline ID as a parameter: +## Example -```json -DELETE /_ingest/pipeline/ +``` +DELETE _ingest/pipeline/12345 ``` {% include copy-curl.html %} -To delete all pipelines in a cluster, use the wildcard character (`*`): +## Path and HTTP methods + +Delete an ingest pipeline based on that pipeline's ID. -```json -DELETE /_ingest/pipeline/* ``` -{% include copy-curl.html %} +DELETE _ingest/pipeline/ +``` + +## URL parameters + +All URL parameters are optional. + +Parameter | Type | Description +:--- | :--- | :--- +master_timeout | time | How long to wait for a connection to the master node. 
+timeout | time | How long to wait for the request to return. + +## Response + +```json +{ + "acknowledged" : true +} +``` \ No newline at end of file diff --git a/_api-reference/ingest-apis/get-ingest.md b/_api-reference/ingest-apis/get-ingest.md index a56d7da584..f8e18f8a56 100644 --- a/_api-reference/ingest-apis/get-ingest.md +++ b/_api-reference/ingest-apis/get-ingest.md @@ -1,62 +1,59 @@ --- layout: default -title: Get pipeline -parent: Ingest pipelines -grand_parent: Ingest APIs -nav_order: 12 +title: Get ingest pipeline +parent: Ingest APIs +nav_order: 10 redirect_from: - /opensearch/rest-api/ingest-apis/get-ingest/ --- -# Get pipeline +## Get ingest pipeline -Use the get ingest pipeline API operation to retrieve all the information about the pipeline. +After you create a pipeline, use the get ingest pipeline API operation to return all the information about a specific ingest pipeline. -## Retrieving information about all pipelines +## Example -The following example request returns information about all ingest pipelines: - -```json -GET _ingest/pipeline/ +``` +GET _ingest/pipeline/12345 ``` {% include copy-curl.html %} -## Retrieving information about a specific pipeline +## Path and HTTP methods -The following example request returns information about a specific pipeline, which for this example is `my-pipeline`: +Return all ingest pipelines. -```json -GET _ingest/pipeline/my-pipeline ``` -{% include copy-curl.html %} +GET _ingest/pipeline +``` + +Returns a single ingest pipeline based on the pipeline's ID. + +``` +GET _ingest/pipeline/{id} +``` + +## URL parameters + +All parameters are optional. -The response contains the pipeline information: +Parameter | Type | Description +:--- | :--- | :--- +master_timeout | time | How long to wait for a connection to the master node. + +## Response ```json { - "my-pipeline": { - "description": "This pipeline processes student data", - "processors": [ + "pipeline-id" : { + "description" : "A description for your pipeline", + "processors" : [ { - "set": { - "description": "Sets the graduation year to 2023", - "field": "grad_year", - "value": 2023 - } - }, - { - "set": { - "description": "Sets graduated to true", - "field": "graduated", - "value": true - } - }, - { - "uppercase": { - "field": "name" + "set" : { + "field" : "field-name", + "value" : "value" } } ] } } -``` +``` \ No newline at end of file diff --git a/_api-reference/ingest-apis/index.md b/_api-reference/ingest-apis/index.md index 462c699fc2..1df68b70cc 100644 --- a/_api-reference/ingest-apis/index.md +++ b/_api-reference/ingest-apis/index.md @@ -9,13 +9,6 @@ redirect_from: # Ingest APIs -Ingest APIs are a valuable tool for loading data into a system. Ingest APIs work together with [ingest pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/) and [ingest processors]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) to process or transform data from a variety of sources and in a variety of formats. +Before you index your data, OpenSearch's ingest APIs help transform your data by creating and managing ingest pipelines. Pipelines consist of **processors**, customizable tasks that run in the order they appear in the request body. The transformed data appears in your index after each of the processor completes. 
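+
+For example, a pipeline definition similar to the following (the pipeline name, field names, and values here are only illustrative) chains two processors that run in order: the `set` processor runs first, followed by the `lowercase` processor:
+
+```json
+PUT _ingest/pipeline/my-pipeline
+{
+  "description" : "Example pipeline that sets a category and lowercases the title",
+  "processors" : [
+    {
+      "set" : {
+        "field" : "category",
+        "value" : "catalog"
+      }
+    },
+    {
+      "lowercase" : {
+        "field" : "title"
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+Each document indexed with this pipeline (for example, by adding `?pipeline=my-pipeline` to an indexing request) passes through both processors before it is stored.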
-## Ingest pipeline APIs - -Simplify, secure, and scale your OpenSearch data ingestion with the following APIs: - -- [Create pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/create-ingest/): Use this API to create or update a pipeline configuration. -- [Get pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/get-ingest/): Use this API to retrieve a pipeline configuration. -- [Simulate pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/simulate-ingest/): Use this pipeline to test a pipeline configuration. -- [Delete pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/delete-ingest/): Use this API to delete a pipeline configuration. +Ingest pipelines in OpenSearch can only be managed using ingest API operations. When using ingest in production environments, your cluster should contain at least one node with the node roles permission set to `ingest`. For more information on setting up node roles within a cluster, see [Cluster Formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/). diff --git a/_api-reference/ingest-apis/ingest-pipelines.md b/_api-reference/ingest-apis/ingest-pipelines.md deleted file mode 100644 index 38ea3fc7d5..0000000000 --- a/_api-reference/ingest-apis/ingest-pipelines.md +++ /dev/null @@ -1,50 +0,0 @@ ---- -layout: default -title: Ingest pipelines -parent: Ingest APIs -has_children: true -nav_order: 5 ---- - -# Ingest pipelines - -An _ingest pipeline_ is a sequence of _processors_ that are applied to documents as they are ingested into an index. Each [processor]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) in a pipeline performs a specific task, such as filtering, transforming, or enriching data. - -Processors are customizable tasks that run in a sequential order as they appear in the request body. This order is important, as each processor depends on the output of the previous processor. The modified documents appear in your index after the processors are applied. - -Ingest pipelines can only be managed using [ingest API operations]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/). -{: .note} - -## Prerequisites - -The following are prerequisites for using OpenSearch ingest pipelines: - -- When using ingestion in a production environment, your cluster should contain at least one node with the node roles permission set to `ingest`. For information about setting up node roles within a cluster, see [Cluster Formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/). -- If the OpenSearch Security plugin is enabled, you must have the `cluster_manage_pipelines` permission to manage ingest pipelines. - -## Define a pipeline - -A _pipeline definition_ describes the sequence of an ingest pipeline and can be written in JSON format. An ingest pipeline consists of the following: - -```json -{ - "description" : "..." - "processors" : [...] -} -``` - -### Request body fields - -Field | Required | Type | Description -:--- | :--- | :--- | :--- -`processors` | Required | Array of processor objects | A component that performs a specific data processing task as the data is being ingested into OpenSearch. -`description` | Optional | String | A description of the ingest pipeline. - -## Next steps - -Learn how to: - -- [Create a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/create-ingest/). -- [Test a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/simulate-ingest/). 
-- [Retrieve information about a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/get-ingest/). -- [Delete a pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/delete-ingest/). diff --git a/_api-reference/ingest-apis/ingest-processors.md b/_api-reference/ingest-apis/ingest-processors.md deleted file mode 100644 index 5a9a5e0d41..0000000000 --- a/_api-reference/ingest-apis/ingest-processors.md +++ /dev/null @@ -1,23 +0,0 @@ ---- -layout: default -title: Ingest processors -parent: Ingest APIs -nav_order: 10 -has_children: true ---- - -# Ingest processors - -Ingest processors are a core component of [ingest pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/) because they preprocess documents before indexing. For example, you can remove fields, extract values from text, convert data formats, or append additional information. - -OpenSearch provides a standard set of ingest processors within your OpenSearch installation. For a list of processors available in OpenSearch, use the [Nodes Info]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-info/) API operation: - -```json -GET /_nodes/ingest?filter_path=nodes.*.ingest.processors -``` -{% include copy-curl.html %} - -To set up and deploy ingest processors, make sure you have the necessary permissions and access rights. See [Security plugin REST API]({{site.url}}{{site.baseurl}}/security/access-control/api/) to learn more. -{:.note} - -Processor types and their required or optional parameters vary depending on your specific use case. See the [Ingest processors]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) section to learn more about the processor types and defining and configuring them within a pipeline. diff --git a/_api-reference/ingest-apis/ip2geo.md b/_api-reference/ingest-apis/ip2geo.md new file mode 100644 index 0000000000..b12c106ae5 --- /dev/null +++ b/_api-reference/ingest-apis/ip2geo.md @@ -0,0 +1,249 @@ +--- +layout: default +title: IP2Geo +parent: Ingest processors +grand_parent: Ingest APIs +nav_order: 130 +--- + +# IP2Geo +Introduced 2.10 +{: .label .label-purple } + +The `ip2geo` processor adds information about the geographical location of an IPv4 or IPv6 address. The `ip2geo` processor uses IP geolocation (GeoIP) data from an external endpoint and therefore requires an additional component, `datasource`, that defines from where to download GeoIP data and how frequently to update the data. + +The `ip2geo` processor maintains the GeoIP data mapping in system indexes. The GeoIP mapping is retrieved from these indexes during data ingestion to perform the IP-to-geolocation conversion on the incoming data. For optimal performance, it is preferable to have a node with both ingest and data roles, as this configuration avoids internode calls reducing latency. Also, as the `ip2geo` processor searches GeoIP mapping data from the indexes, search performance is impacted. +{: .note} + +## Getting started + +To get started with the `ip2geo` processor, the `opensearch-geospatial` plugin must be installed. See [Installing plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/) to learn more. + +## Cluster settings + +The IP2Geo data source and `ip2geo` processor node settings are listed in the following table. + +| Key | Description | Default | +|--------------------|-------------|---------| +| plugins.geospatial.ip2geo.datasource.endpoint | Default endpoint for creating the data source API. 
| Defaults to https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json. |
+| plugins.geospatial.ip2geo.datasource.update_interval_in_days | Default update interval for creating the data source API. | Defaults to 3. |
+| plugins.geospatial.ip2geo.datasource.batch_size | Maximum number of documents to ingest in a bulk request during the IP2Geo data source creation process. | Defaults to 10,000. |
+| plugins.geospatial.ip2geo.processor.cache_size | Maximum number of results that can be cached. There is only one cache used for all IP2Geo processors in each node. | Defaults to 1,000. |
+
+## Creating the IP2Geo data source
+
+Before creating the pipeline that uses the `ip2geo` processor, create the IP2Geo data source. The data source defines the endpoint from which to download GeoIP data and specifies the update interval.
+
+OpenSearch provides the following endpoints for GeoLite2 City, GeoLite2 Country, and GeoLite2 ASN databases from [MaxMind](https://dev.maxmind.com/geoip/geolite2-free-geolocation-data), which is shared under the [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) license:
+
+* GeoLite2 City: https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json
+* GeoLite2 Country: https://geoip.maps.opensearch.org/v1/geolite2-country/manifest.json
+* GeoLite2 ASN: https://geoip.maps.opensearch.org/v1/geolite2-asn/manifest.json
+
+If an OpenSearch cluster cannot update a data source from the endpoints within 30 days, the cluster does not add GeoIP data to the documents and instead adds `"error":"ip2geo_data_expired"`.
+
+### Data source options
+
+The following table lists the data source options for the `ip2geo` processor.
+
+| Name | Required | Default | Description |
+|------|----------|---------|-------------|
+| `endpoint` | Optional | https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json | The endpoint that downloads the GeoIP data. |
+| `update_interval_in_days` | Optional | 3 | How frequently, in days, the GeoIP data is updated. The minimum value is 1. |
+
+To create an IP2Geo data source, run the following query:
+
+```json
+PUT /_plugins/geospatial/ip2geo/datasource/my-datasource
+{
+   "endpoint" : "https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json",
+   "update_interval_in_days" : 3
+}
+```
+{% include copy-curl.html %}
+
+A `true` response means that the request was successful and that the server was able to process the request. A `false` response indicates that you should check the request to make sure it is valid, check the URL to make sure it is correct, or try again. 
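+
+For example, a successful creation request returns an acknowledgment similar to the following:
+
+```json
+{
+  "acknowledged" : true
+}
+```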
+ +### Sending a GET request + +To get information about one or more IP2Geo data sources, send a GET request: + +```json +GET /_plugins/geospatial/ip2geo/datasource/my-datasource +``` +{% include copy-curl.html %} + +You'll receive the following response: + +```json +{ + "datasources": [ + { + "name": "my-datasource", + "state": "AVAILABLE", + "endpoint": "https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json", + "update_interval_in_days": 3, + "next_update_at_in_epoch_millis": 1685125612373, + "database": { + "provider": "maxmind", + "sha256_hash": "0SmTZgtTRjWa5lXR+XFCqrZcT495jL5XUcJlpMj0uEA=", + "updated_at_in_epoch_millis": 1684429230000, + "valid_for_in_days": 30, + "fields": [ + "country_iso_code", + "country_name", + "continent_name", + "region_iso_code", + "region_name", + "city_name", + "time_zone", + "location" + ] + }, + "update_stats": { + "last_succeeded_at_in_epoch_millis": 1684866730192, + "last_processing_time_in_millis": 317640, + "last_failed_at_in_epoch_millis": 1684866730492, + "last_skipped_at_in_epoch_millis": 1684866730292 + } + } + ] +} +``` + +### Updating an IP2Geo data source + +See the Creating the IP2Geo data source section for a list of endpoints and request field descriptions. + +To update the date source, run the following query: + +```json +PUT /_plugins/geospatial/ip2geo/datasource/my-datasource/_settings +{ + "endpoint": https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json, + "update_interval_in_days": 10 +} +``` +{% include copy-curl.html %} + +### Deleting the IP2Geo data source + +To delete the IP2Geo data source, you must first delete all processors associated with the data source. Otherwise, the request fails. + +To delete the data source, run the following query: + +```json +DELETE /_plugins/geospatial/ip2geo/datasource/my-datasource +``` +{% include copy-curl.html %} + +## Creating the pipeline + +Once the data source is created, you can create the pipeline. The following is the syntax for the `ip2geo` processor: + +```json +{ + "ip2geo": { + "field":"ip", + "datasource":"my-datasource" + } +} +``` +{% include copy-curl.html %} + +### Configuration parameters + +The following table lists the required and optional parameters for the `ip2geo` processor. + +| Name | Required | Default | Description | +|------|----------|---------|-------------| +| `datasource` | Required | - | The data source name to use to retrieve geographical information. | +| `field` | Required | - | The field that contains the IP address for geographical lookup. | +| `ignore_missing` | Optional | false | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. | +| `properties` | Optional | All fields in `datasource`. | The field that controls which properties are added to `target_field` from `datasource`. | +| `target_field` | Optional | ip2geo | The field that contains the geographical information retrieved from the data source. | + +## Using the processor + +Follow these steps to use the processor in a pipeline. 
+ +**Step 1: Create a pipeline.** + +The following query creates a pipeline, named `my-pipeline`, that converts the IP address to geographical information: + +```json +PUT /_ingest/pipeline/my-pipeline +{ + "description":"convert ip to geo", + "processors":[ + { + "ip2geo":{ + "field":"ip", + "datasource":"my-datasource" + } + } + ] +} +``` +{% include copy-curl.html %} + +**Step 2 (Optional): Test the pipeline.** + +It is recommended that you test your pipeline before you ingest documents. +{: .tip} + +To test the pipeline, run the following query: + +```json +POST _ingest/pipeline/my-id/_simulate +{ + "docs": [ + { + "_index":"my-index", + "_id":"my-id", + "_source":{ + "my_ip_field":"172.0.0.1", + "ip2geo":{ + "continent_name":"North America", + "region_iso_code":"AL", + "city_name":"Calera", + "country_iso_code":"US", + "country_name":"United States", + "region_name":"Alabama", + "location":"33.1063,-86.7583", + "time_zone":"America/Chicago" + } + } + } + ] +} +``` +{% include copy-curl.html %} + +#### Response + +The following response confirms that the pipeline is working as expected: + + + +**Step 3: Ingest a document.** + +The following query ingests a document into an index named `my-index`: + +```json +PUT /my-index/_doc/my-id?pipeline=ip2geo +{ + "ip": "172.0.0.1" +} +``` +{% include copy-curl.html %} + +**Step 4 (Optional): Retrieve the document.** + +To retrieve the document, run the following query: + +```json +GET /my-index/_doc/my-id +``` +{% include copy-curl.html %} diff --git a/_api-reference/ingest-apis/pipeline-failures.md b/_api-reference/ingest-apis/pipeline-failures.md deleted file mode 100644 index f8814f39c2..0000000000 --- a/_api-reference/ingest-apis/pipeline-failures.md +++ /dev/null @@ -1,134 +0,0 @@ ---- -layout: default -title: Handling pipeline failures -parent: Ingest pipelines -grand_parent: Ingest APIs -nav_order: 15 ---- - -# Handling pipeline failures - -Each ingest pipeline consists of a series of processors that are applied to the documents in sequence. If a processor fails, the entire pipeline will fail. You have two options for handling failures: - -- **Fail the entire pipeline:** If a processor fails, the entire pipeline will fail and the document will not be indexed. -- **Fail the current processor and continue with the next processor:** This can be useful if you want to continue processing the document even if one of the processors fails. - -By default, an ingest pipeline stops if one of its processors fails. If you want the pipeline to continue running when a processor fails, you can set the `ignore_failure` parameter for that processor to `true` when creating the pipeline: - -```json -PUT _ingest/pipeline/my-pipeline/ -{ - "description": "Rename 'provider' field to 'cloud.provider'", - "processors": [ - { - "rename": { - "field": "provider", - "target_field": "cloud.provider", - "ignore_failure": true - } - } - ] -} -``` -{% include copy-curl.html %} - -You can specify the `on_failure` parameter to run immediately after a processor fails. 
If you have specified `on_failure`, OpenSearch will run the other processors in the pipeline even if the `on_failure` configuration is empty: - -```json -PUT _ingest/pipeline/my-pipeline/ -{ - "description": "Add timestamp to the document", - "processors": [ - { - "date": { - "field": "timestamp_field", - "formats": ["yyyy-MM-dd HH:mm:ss"], - "target_field": "@timestamp", - "on_failure": [ - { - "set": { - "field": "ingest_error", - "value": "failed" - } - } - ] - } - } - ] -} -``` -{% include copy-curl.html %} - -If the processor fails, OpenSearch logs the failure and continues to run all remaining processors in the search pipeline. To check whether there were any failures, you can use [ingest pipeline metrics]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipeline-failures/#ingest-pipeline-metrics). -{: tip} - -## Ingest pipeline metrics - -To view ingest pipeline metrics, use the [Nodes Stats API]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/): - -```json -GET /_nodes/stats/ingest?filter_path=nodes.*.ingest -``` -{% include copy-curl.html %} - -The response contains statistics for all ingest pipelines, for example: - -```json - { - "nodes": { - "iFPgpdjPQ-uzTdyPLwQVnQ": { - "ingest": { - "total": { - "count": 28, - "time_in_millis": 82, - "current": 0, - "failed": 9 - }, - "pipelines": { - "user-behavior": { - "count": 5, - "time_in_millis": 0, - "current": 0, - "failed": 0, - "processors": [ - { - "append": { - "type": "append", - "stats": { - "count": 5, - "time_in_millis": 0, - "current": 0, - "failed": 0 - } - } - } - ] - }, - "remove_ip": { - "count": 5, - "time_in_millis": 9, - "current": 0, - "failed": 2, - "processors": [ - { - "remove": { - "type": "remove", - "stats": { - "count": 5, - "time_in_millis": 8, - "current": 0, - "failed": 2 - } - } - } - ] - } - } - } - } - } -} -``` - -**Troubleshooting ingest pipeline failures:** The first thing you should do is check the logs to see whether there are any errors or warnings that can help you identify the cause of the failure. OpenSearch logs contain information about the ingest pipeline that failed, including the processor that failed and the reason for the failure. -{: .tip} diff --git a/_api-reference/ingest-apis/processors/append.md b/_api-reference/ingest-apis/processors/append.md deleted file mode 100644 index dee484a6aa..0000000000 --- a/_api-reference/ingest-apis/processors/append.md +++ /dev/null @@ -1,147 +0,0 @@ ---- -layout: default -title: Append -parent: Ingest processors -grand_parent: Ingest APIs -nav_order: 10 ---- - -# Append - -The `append` processor is used to add values to a field: -- If the field is an array, the `append` processor appends the specified values to that array. -- If the field is a scalar field, the `append` processor converts it to an array and appends the specified values to that array. -- If the field does not exist, the `append` processor creates an array with the specified values. - -The following is the syntax for the `append` processor: - -```json -{ - "append": { - "field": "your_target_field", - "value": ["your_appended_value"] - } -} -``` -{% include copy-curl.html %} - -## Configuration parameters - -The following table lists the required and optional parameters for the `append` processor. - -Parameter | Required | Description | -|-----------|-----------|-----------| -`field` | Required | The name of the field to which the data should be appended. Supports template snippets.| -`value` | Required | The value to be appended. 
This can be a static value or a dynamic value derived from existing fields. Supports template snippets. | -`description` | Optional | A brief description of the processor. | -`if` | Optional | A condition for running this processor. | -`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | -`on_failure` | Optional | A list of processors to run if the processor fails. | -`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | - -## Using the processor - -Follow these steps to use the processor in a pipeline. - -**Step 1: Create a pipeline.** - -The following query creates a pipeline, named `user-behavior`, that has one append processor. It appends the `page_view` of each new document ingested into OpenSearch to an array field named `event_types`: - -```json -PUT _ingest/pipeline/user-behavior -{ - "description": "Pipeline that appends event type", - "processors": [ - { - "append": { - "field": "event_types", - "value": ["page_view"] - } - } - ] -} -``` -{% include copy-curl.html %} - -**Step 2 (Optional): Test the pipeline.** - -It is recommended that you test your pipeline before you ingest documents. -{: .tip} - -To test the pipeline, run the following query: - -```json -POST _ingest/pipeline/user-behavior/_simulate -{ - "docs":[ - { - "_source":{ - } - } - ] -} -``` -{% include copy-curl.html %} - -#### Response - -The following response confirms that the pipeline is working as expected: - -```json -{ - "docs": [ - { - "doc": { - "_index": "_index", - "_id": "_id", - "_source": { - "event_types": [ - "page_view" - ] - }, - "_ingest": { - "timestamp": "2023-08-28T16:55:10.621805166Z" - } - } - } - ] -} -``` - -**Step 3: Ingest a document.** - -The following query ingests a document into an index named `testindex1`: - -```json -PUT testindex1/_doc/1?pipeline=user-behavior -{ -} -``` -{% include copy-curl.html %} - -**Step 4 (Optional): Retrieve the document.** - -To retrieve the document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} - -Because the document does not contain an `event_types` field, an array field is created and the event is appended to the array: - -```json -{ - "_index": "testindex1", - "_id": "1", - "_version": 2, - "_seq_no": 1, - "_primary_term": 1, - "found": true, - "_source": { - "event_types": [ - "page_view" - ] - } -} -``` diff --git a/_api-reference/ingest-apis/processors/bytes.md b/_api-reference/ingest-apis/processors/bytes.md deleted file mode 100644 index 7d07766cbd..0000000000 --- a/_api-reference/ingest-apis/processors/bytes.md +++ /dev/null @@ -1,134 +0,0 @@ ---- -layout: default -title: Bytes -parent: Ingest processors -grand_parent: Ingest APIs -nav_order: 20 ---- - -# Bytes - -The `bytes` processor converts a human-readable byte value to its equivalent value in bytes. The field can be a scalar or an array. If the field is a scalar, the value is converted and stored in the field. If the field is an array, all values of the array are converted. - -The following is the syntax for the `bytes` processor: - -```json -{ - "bytes": { - "field": "your_field_name" - } -} -``` -{% include copy-curl.html %} - -## Configuration parameters - -The following table lists the required and optional parameters for the `bytes` processor. - -Parameter | Required | Description | -|-----------|-----------|-----------| -`field` | Required | The name of the field where the data should be converted. 
Supports template snippets. | -`description` | Optional | A brief description of the processor. | -`if` | Optional | A condition for running this processor. | -`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | -`ignore_missing` | Optional | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. | -`on_failure` | Optional | A list of processors to run if the processor fails. | -`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | -`target_field` | Optional | The name of the field in which to store the parsed data. If not specified, the value will be stored in place in the `field` field. Default is `field`. | - -## Using the processor - -Follow these steps to use the processor in a pipeline. - -**Step 1: Create a pipeline.** - -The following query creates a pipeline, named `file_upload`, that has one `bytes` processor. It converts the `file_size` to its byte equivalent and stores it in a new field named `file_size_bytes`: - -```json -PUT _ingest/pipeline/file_upload -{ - "description": "Pipeline that converts file size to bytes", - "processors": [ - { - "bytes": { - "field": "file_size", - "target_field": "file_size_bytes" - } - } - ] -} -``` -{% include copy-curl.html %} - -**Step 2 (Optional): Test the pipeline.** - -It is recommended that you test your pipeline before you ingest documents. -{: .tip} - -To test the pipeline, run the following query: - -```json -POST _ingest/pipeline/file_upload/_simulate -{ - "docs": [ - { - "_index": "testindex1", - "_id": "1", - "_source": { - "file_size_bytes": "10485760", - "file_size": - "10MB" - } - } - ] -} -``` -{% include copy-curl.html %} - -#### Response - -The following response confirms that the pipeline is working as expected: - -```json -{ - "docs": [ - { - "doc": { - "_index": "testindex1", - "_id": "1", - "_source": { - "event_types": [ - "event_type" - ], - "file_size_bytes": "10485760", - "file_size": "10MB" - }, - "_ingest": { - "timestamp": "2023-08-22T16:09:42.771569211Z" - } - } - } - ] -} -``` - -**Step 3: Ingest a document.** - -The following query ingests a document into an index named `testindex1`: - -```json -PUT testindex1/_doc/1?pipeline=file_upload -{ - "file_size": "10MB" -} -``` -{% include copy-curl.html %} - -**Step 4 (Optional): Retrieve the document.** - -To retrieve the document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} diff --git a/_api-reference/ingest-apis/processors/convert.md b/_api-reference/ingest-apis/processors/convert.md deleted file mode 100644 index 5b12c8e931..0000000000 --- a/_api-reference/ingest-apis/processors/convert.md +++ /dev/null @@ -1,137 +0,0 @@ ---- -layout: default -title: Convert -parent: Ingest processors -grand_parent: Ingest APIs -nav_order: 30 ---- - -# Convert - -The `convert` processor converts a field in a document to a different type, for example, a string to an integer or an integer to a string. For an array field, all values in the array are converted. The following is the syntax for the `convert` processor: - -```json -{ - "convert": { - "field": "field_name", - "type": "type-value" - } -} -``` -{% include copy-curl.html %} - -## Configuration parameters - -The following table lists the required and optional parameters for the `convert` processor. 
- -Parameter | Required | Description | -|-----------|-----------|-----------| -`field` | Required | The name of the field that contains the data to be converted. Supports template snippets. | -`type` | Required | The type to convert the field value to. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. If the `type` is `boolean`, the value is set to `true` if the field value is a string `true` (ignoring case) and to `false` if the field value is a string `false` (ignoring case). If the value is not one of the allowed values, an error will occur. | -`description` | Optional | A brief description of the processor. | -`if` | Optional | A condition for running this processor. | -`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | -`ignore_missing` | Optional | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. | -`on_failure` | Optional | A list of processors to run if the processor fails. | -`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | -`target_field` | Optional | The name of the field in which to store the parsed data. If not specified, the value will be stored in the `field` field. Default is `field`. | - -## Using the processor - -Follow these steps to use the processor in a pipeline. - -**Step 1: Create a pipeline.** - -The following query creates a pipeline, named `convert-price`, that converts `price` to a floating-point number, stores the converted value in the `price_float` field, and sets the value to `0` if it is less than `0`: - -```json -PUT _ingest/pipeline/convert-price -{ - "description": "Pipeline that converts price to floating-point number and sets value to zero if price less than zero", - "processors": [ - { - "convert": { - "field": "price", - "type": "float", - "target_field": "price_float" - } - }, - { - "set": { - "field": "price", - "value": "0", - "if": "ctx.price_float < 0" - } - } - ] -} -``` -{% include copy-curl.html %} - -**Step 2 (Optional): Test the pipeline.** - -It is recommended that you test your pipeline before you ingest documents. 
-{: .tip} - -To test the pipeline, run the following query: - -```json -POST _ingest/pipeline/convert-price/_simulate -{ - "docs": [ - { - "_index": "testindex1", - "_id": "1", - "_source": { - "price": "-10.5" - } - } - ] -} -``` -{% include copy-curl.html %} - -#### Response - -The following example response confirms that the pipeline is working as expected: - -```json -{ - "docs": [ - { - "doc": { - "_index": "testindex1", - "_id": "1", - "_source": { - "price_float": -10.5, - "price": "0" - }, - "_ingest": { - "timestamp": "2023-08-22T15:38:21.180688799Z" - } - } - } - ] -} -``` - -**Step 3: Ingest a document.** - -The following query ingests a document into an index named `testindex1`: - -```json -PUT testindex1/_doc/1?pipeline=convert-price -{ - "price": "10.5" -} -``` -{% include copy-curl.html %} - -**Step 4 (Optional): Retrieve the document.** - -To retrieve the document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} diff --git a/_api-reference/ingest-apis/processors/csv.md b/_api-reference/ingest-apis/processors/csv.md deleted file mode 100644 index e4009e162b..0000000000 --- a/_api-reference/ingest-apis/processors/csv.md +++ /dev/null @@ -1,138 +0,0 @@ ---- -layout: default -title: CSV -parent: Ingest processors -grand_parent: Ingest APIs -nav_order: 40 ---- - -# CSV - -The `csv` processor is used to parse CSVs and store them as individual fields in a document. The processor ignores empty fields. The following is the syntax for the `csv` processor: - -```json -{ - "csv": { - "field": "field_name", - "target_fields": ["field1, field2, ..."] - } -} -``` -{% include copy-curl.html %} - -## Configuration parameters - -The following table lists the required and optional parameters for the `csv` processor. - -Parameter | Required | Description | -|-----------|-----------|-----------| -`field` | Required | The name of the field that contains the data to be converted. Supports template snippets. | -`target_fields` | Required | The name of the field in which to store the parsed data. | -`description` | Optional | A brief description of the processor. | -`empty_value` | Optional | Represents optional parameters that are not required or are not applicable. | -`if` | Optional | A condition for running this processor. | -`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | -`ignore_missing` | Optional | If set to `true`, the processor will not fail if the field does not exist. Default is `true`. | -`on_failure` | Optional | A list of processors to run if the processor fails. | -`quote` | Optional | The character used to quote fields in the CSV data. Default is `"`. | -`separator` | Optional | The delimiter used to separate the fields in the CSV data. Default is `,`. | -`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | -`trim` | Optional | If set to `true`, the processor trims white space from the beginning and end of the text. Default is `false`. | - -## Using the processor - -Follow these steps to use the processor in a pipeline. 
- -**Step 1: Create a pipeline.** - -The following query creates a pipeline, named `csv-processor`, that splits `resource_usage` into three new fields named `cpu_usage`, `memory_usage`, and `disk_usage`: - -```json -PUT _ingest/pipeline/csv-processor -{ - "description": "Split resource usage into individual fields", - "processors": [ - { - "csv": { - "field": "resource_usage", - "target_fields": ["cpu_usage", "memory_usage", "disk_usage"], - "separator": "," - } - } - ] -} -``` -{% include copy-curl.html %} - -**Step 2 (Optional): Test the pipeline.** - -It is recommended that you test your pipeline before you ingest documents. -{: .tip} - -To test the pipeline, run the following query: - -```json -POST _ingest/pipeline/csv-processor/_simulate -{ - "docs": [ - { - "_index": "testindex1", - "_id": "1", - "_source": { - "resource_usage": "25,4096,10", - "memory_usage": "4096", - "disk_usage": "10", - "cpu_usage": "25" - } - } - ] -} -``` -{% include copy-curl.html %} - -#### Response - -The following example response confirms that the pipeline is working as expected: - -```json -{ - "docs": [ - { - "doc": { - "_index": "testindex1", - "_id": "1", - "_source": { - "memory_usage": "4096", - "disk_usage": "10", - "resource_usage": "25,4096,10", - "cpu_usage": "25" - }, - "_ingest": { - "timestamp": "2023-08-22T16:40:45.024796379Z" - } - } - } - ] -} -``` - -**Step 3: Ingest a document.** - -The following query ingests a document into an index named `testindex1`: - -```json -PUT testindex1/_doc/1?pipeline=csv-processor -{ - "resource_usage": "25,4096,10" -} -``` -{% include copy-curl.html %} - -**Step 4 (Optional): Retrieve the document.** - -To retrieve the document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} diff --git a/_api-reference/ingest-apis/processors/date.md b/_api-reference/ingest-apis/processors/date.md deleted file mode 100644 index 46e9b9115f..0000000000 --- a/_api-reference/ingest-apis/processors/date.md +++ /dev/null @@ -1,135 +0,0 @@ ---- -layout: default -title: Date -parent: Ingest processors -grand_parent: Ingest APIs -nav_order: 50 ---- - -# Date - -The `date` processor is used to parse dates from document fields and to add the parsed data to a new field. By default, the parsed data is stored in the `@timestamp` field. The following is the syntax for the `date` processor: - -```json -{ - "date": { - "field": "date_field", - "formats": ["yyyy-MM-dd'T'HH:mm:ss.SSSZZ"] - } -} -``` -{% include copy-curl.html %} - -## Configuration parameters - -The following table lists the required and optional parameters for the `date` processor. - -Parameter | Required | Description | -|-----------|-----------|-----------| -`field` | Required | The name of the field to which the data should be converted. Supports template snippets. | -`formats` | Required | An array of the expected date formats. Can be a [date format]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats) or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N. | -`description` | Optional | A brief description of the processor. | -`if` | Optional | A condition for running this processor. | -`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | -`locale` | Optional | The locale to use when parsing the date. Default is `ENGLISH`. Supports template snippets. | -`on_failure` | Optional | A list of processors to run if the processor fails. 
| -`output_format` | Optional | The [date format]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/#formats) to use for the target field. Default is `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`. | -`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | -`target_field` | Optional | The name of the field in which to store the parsed data. Default target field is `@timestamp`. | -`timezone` | Optional | The time zone to use when parsing the date. Default is `UTC`. Supports template snippets. | - -## Using the processor - -Follow these steps to use the processor in a pipeline. - -**Step 1: Create a pipeline.** - -The following query creates a pipeline, named `date-output-format`, that uses the `date` processor to convert from European date format to US date format, adding the new field `date_us` with the desired `output_format`: - -```json -PUT /_ingest/pipeline/date-output-format -{ - "description": "Pipeline that converts European date format to US date format", - "processors": [ - { - "date": { - "field" : "date_european", - "formats" : ["dd/MM/yyyy", "UNIX"], - "target_field": "date_us", - "output_format": "MM/dd/yyy", - "timezone" : "UTC" - } - } - ] -} -``` -{% include copy-curl.html %} - -**Step 2 (Optional): Test the pipeline.** - -It is recommended that you test your pipeline before you ingest documents. -{: .tip} - -To test the pipeline, run the following query: - -```json -POST _ingest/pipeline/date-output-format/_simulate -{ - "docs": [ - { - "_index": "testindex1", - "_id": "1", - "_source": { - "date_us": "06/30/2023", - "date_european": "30/06/2023" - } - } - ] -} -``` -{% include copy-curl.html %} - -#### Response - -The following example response confirms that the pipeline is working as expected: - -```json -{ - "docs": [ - { - "doc": { - "_index": "testindex1", - "_id": "1", - "_source": { - "date_us": "06/30/2023", - "date_european": "30/06/2023" - }, - "_ingest": { - "timestamp": "2023-08-22T17:08:46.275195504Z" - } - } - } - ] -} -``` - -**Step 3: Ingest a document.** - -The following query ingests a document into an index named `testindex1`: - -```json -PUT testindex1/_doc/1?pipeline=date-output-format -{ - "date_european": "30/06/2023" -} -``` -{% include copy-curl.html %} - -**Step 4 (Optional): Retrieve the document.** - -To retrieve the document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} diff --git a/_api-reference/ingest-apis/processors/lowercase.md b/_api-reference/ingest-apis/processors/lowercase.md deleted file mode 100644 index 535875ff7d..0000000000 --- a/_api-reference/ingest-apis/processors/lowercase.md +++ /dev/null @@ -1,125 +0,0 @@ ---- -layout: default -title: Lowercase -parent: Ingest processors -grand_parent: Ingest APIs -nav_order: 210 ---- - -# Lowercase - -The `lowercase` processor converts all the text in a specific field to lowercase letters. The following is the syntax for the `lowercase` processor: - -```json -{ - "lowercase": { - "field": "field_name" - } -} -``` -{% include copy-curl.html %} - -#### Configuration parameters - -The following table lists the required and optional parameters for the `lowercase` processor. - -| Name | Required | Description | -|---|---|---| -`field` | Required | The name of the field that contains the data to be converted. Supports template snippets. | -`description` | Optional | A brief description of the processor. | -`if` | Optional | A condition for running this processor. 
| -`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | -`on_failure` | Optional | A list of processors to run if the processor fails. | -`ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`. | -`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | -`target_field` | Optional | The name of the field in which to store the parsed data. Default is `field`. By default, `field` is updated in place. | - -## Using the processor - -Follow these steps to use the processor in a pipeline. - -**Step 1: Create a pipeline.** - -The following query creates a pipeline, named `lowercase-title`, that uses the `lowercase` processor to lowercase the `title` field of a document: - -```json -PUT _ingest/pipeline/lowercase-title -{ - "description" : "Pipeline that lowercases the title field", - "processors" : [ - { - "lowercase" : { - "field" : "title" - } - } - ] -} -``` -{% include copy-curl.html %} - -**Step 2 (Optional): Test the pipeline.** - -It is recommended that you test your pipeline before you ingest documents. -{: .tip} - -To test the pipeline, run the following query: - -```json -POST _ingest/pipeline/lowercase-title/_simulate -{ - "docs": [ - { - "_index": "testindex1", - "_id": "1", - "_source": { - "title": "WAR AND PEACE" - } - } - ] -} -``` -{% include copy-curl.html %} - -#### Response - -The following example response confirms that the pipeline is working as expected: - -```json -{ - "docs": [ - { - "doc": { - "_index": "testindex1", - "_id": "1", - "_source": { - "title": "war and peace" - }, - "_ingest": { - "timestamp": "2023-08-22T17:39:39.872671834Z" - } - } - } - ] -} -``` - -**Step 3: Ingest a document.** - -The following query ingests a document into an index named `testindex1`: - -```json -PUT testindex1/_doc/1?pipeline=lowercase-title -{ - "title": "WAR AND PEACE" -} -``` -{% include copy-curl.html %} - -**Step 4 (Optional): Retrieve the document.** - -To retrieve the document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} diff --git a/_api-reference/ingest-apis/processors/remove.md b/_api-reference/ingest-apis/processors/remove.md deleted file mode 100644 index db233a0b08..0000000000 --- a/_api-reference/ingest-apis/processors/remove.md +++ /dev/null @@ -1,125 +0,0 @@ ---- -layout: default -title: Remove -parent: Ingest processors -grand_parent: Ingest APIs -nav_order: 230 ---- - -# Remove - -The `remove` processor is used to remove a field from a document. The following is the syntax for the `remove` processor: - -```json -{ - "remove": { - "field": "field_name" - } -} -``` -{% include copy-curl.html %} - -#### Configuration parameters - -The following table lists the required and optional parameters for the `remove` processor. - -| Name | Required | Description | -|---|---|---| -`field` | Required | The name of the field to which the data should be appended. Supports template snippets. | -`description` | Optional | A brief description of the processor. | -`if` | Optional | A condition for running this processor. | -`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | -`on_failure` | Optional | A list of processors to run if the processor fails. | -`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. 
| - -## Using the processor - -Follow these steps to use the processor in a pipeline. - -**Step 1: Create a pipeline.** - -The following query creates a pipeline, named `remove_ip`, that removes the `ip_address` field from a document: - -```json -PUT /_ingest/pipeline/remove_ip -{ - "description": "Pipeline that excludes the ip_address field.", - "processors": [ - { - "remove": { - "field": "ip_address" - } - } - ] -} -``` -{% include copy-curl.html %} - -**Step 2 (Optional): Test the pipeline.** - -It is recommended that you test your pipeline before you ingest documents. -{: .tip} - -To test the pipeline, run the following query: - -```json -POST _ingest/pipeline/remove_ip/_simulate -{ - "docs": [ - { - "_index": "testindex1", - "_id": "1", - "_source":{ - "ip_address": "203.0.113.1", - "name": "John Doe" - } - } - ] -} -``` -{% include copy-curl.html %} - -#### Response - -The following example response confirms that the pipeline is working as expected: - -```json -{ - "docs": [ - { - "doc": { - "_index": "testindex1", - "_id": "1", - "_source": { - "name": "John Doe" - }, - "_ingest": { - "timestamp": "2023-08-24T18:02:13.218986756Z" - } - } - } - ] -} -``` - -**Step 3: Ingest a document.** - -The following query ingests a document into an index named `testindex1`: - -```json -PPUT testindex1/_doc/1?pipeline=remove_ip -{ - "ip_address": "203.0.113.1", - "name": "John Doe" -} -``` -{% include copy-curl.html %} - -**Step 4 (Optional): Retrieve the document.** - -To retrieve the document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} diff --git a/_api-reference/ingest-apis/processors/uppercase.md b/_api-reference/ingest-apis/processors/uppercase.md deleted file mode 100644 index 6ea5ebb137..0000000000 --- a/_api-reference/ingest-apis/processors/uppercase.md +++ /dev/null @@ -1,125 +0,0 @@ ---- -layout: default -title: Uppercase -parent: Ingest processors -grand_parent: Ingest APIs -nav_order: 310 ---- - -# Uppercase - -The `uppercase` processor converts all the text in a specific field to uppercase letters. The following is the syntax for the `uppercase` processor: - -```json -{ - "uppercase": { - "field": "field_name" - } -} -``` -{% include copy-curl.html %} - -#### Configuration parameters - -The following table lists the required and optional parameters for the `uppercase` processor. - -| Name | Required | Description | -|---|---|---| -`field` | Required | The name of the field to which the data should be appended. Supports template snippets. | -`description` | Optional | A brief description of the processor. | -`if` | Optional | A condition for running this processor. | -`ignore_failure` | Optional | If set to `true`, failures are ignored. Default is `false`. | -`ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not have the specified field. Default is `false`. | -`on_failure` | Optional | A list of processors to run if the processor fails. | -`tag` | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. | -`target_field` | Optional | The name of the field in which to store the parsed data. Default is `field`. By default, `field` is updated in place. | - -## Using the processor - -Follow these steps to use the processor in a pipeline. 
- -**Step 1: Create a pipeline.** - -The following query creates a pipeline, named `uppercase`, that converts the text in the `field` field to uppercase: - -```json -PUT _ingest/pipeline/uppercase -{ - "processors": [ - { - "uppercase": { - "field": "name" - } - } - ] -} -``` -{% include copy-curl.html %} - - -**Step 2 (Optional): Test the pipeline.** - -It is recommended that you test your pipeline before you ingest documents. -{: .tip} - -To test the pipeline, run the following query: - -```json -POST _ingest/pipeline/uppercase/_simulate -{ - "docs": [ - { - "_index": "testindex1", - "_id": "1", - "_source": { - "name": "John" - } - } - ] -} -``` -{% include copy-curl.html %} - -#### Response - -The following example response confirms that the pipeline is working as expected: - -```json -{ - "docs": [ - { - "doc": { - "_index": "testindex1", - "_id": "1", - "_source": { - "name": "JOHN" - }, - "_ingest": { - "timestamp": "2023-08-28T19:54:42.289624792Z" - } - } - } - ] -} -``` - -**Step 3: Ingest a document.** - -The following query ingests a document into an index named `testindex1`: - -```json -PUT testindex1/_doc/1?pipeline=uppercase -{ - "name": "John" -} -``` -{% include copy-curl.html %} - -**Step 4 (Optional): Retrieve the document.** - -To retrieve the document, run the following query: - -```json -GET testindex1/_doc/1 -``` -{% include copy-curl.html %} diff --git a/_api-reference/ingest-apis/simulate-ingest.md b/_api-reference/ingest-apis/simulate-ingest.md index 9ca40b791c..e8d858134f 100644 --- a/_api-reference/ingest-apis/simulate-ingest.md +++ b/_api-reference/ingest-apis/simulate-ingest.md @@ -1,119 +1,147 @@ --- layout: default -title: Simulate pipeline -parent: Ingest pipelines -grand_parent: Ingest APIs -nav_order: 11 +title: Simulate an ingest pipeline +parent: Ingest APIs +nav_order: 13 redirect_from: - /opensearch/rest-api/ingest-apis/simulate-ingest/ --- -# Simulate pipeline +# Simulate a pipeline -Use the simulate ingest pipeline API operation to run or test the pipeline. +Simulates an ingest pipeline with any example documents you specify. + +## Example + +``` +POST /_ingest/pipeline/35678/_simulate +{ + "docs": [ + { + "_index": "index", + "_id": "id", + "_source": { + "location": "document-name" + } + }, + { + "_index": "index", + "_id": "id", + "_source": { + "location": "document-name" + } + } + ] +} +``` +{% include copy-curl.html %} ## Path and HTTP methods -The following requests **simulate the latest ingest pipeline created**: +Simulate the last ingest pipeline created. ``` GET _ingest/pipeline/_simulate POST _ingest/pipeline/_simulate ``` -{% include copy-curl.html %} -The following requests **simulate a single pipeline based on the pipeline ID**: +Simulate a single pipeline based on the pipeline's ID. ``` -GET _ingest/pipeline//_simulate -POST _ingest/pipeline//_simulate +GET _ingest/pipeline/{id}/_simulate +POST _ingest/pipeline/{id}/_simulate ``` -{% include copy-curl.html %} -## Request body fields +## URL parameters -The following table lists the request body fields used to run a pipeline. +All URL parameters are optional. -Field | Required | Type | Description -:--- | :--- | :--- | :--- -`docs` | Required | Array | The documents to be used to test the pipeline. -`pipeline` | Optional | Object | The pipeline to be simulated. If the pipeline identifier is not included, then the response simulates the latest pipeline created. +Parameter | Type | Description +:--- | :--- | :--- +verbose | boolean | Verbose mode. 
Display data output for each processor in executed pipeline. -The `docs` field can include subfields listed in the following table. +## Request body fields Field | Required | Type | Description :--- | :--- | :--- | :--- -`source` | Required | Object | The document's JSON body. -`id` | Optional | String | A unique document identifier. The identifier cannot be used elsewhere in the index. -`index` | Optional | String | The index where the document's transformed data appears. - -## Query parameters +`pipeline` | Optional | object | The pipeline you want to simulate. When included without the pipeline `{id}` inside the request path, the response simulates the last pipeline created. +`docs` | Required | array of objects | The documents you want to use to test the pipeline. -The following table lists the query parameters for running a pipeline. +The `docs` field can include the following subfields: -Parameter | Type | Description +Field | Required | Type | Description :--- | :--- | :--- -`verbose` | Boolean | Verbose mode. Display data output for each processor in the executed pipeline. +`id` | Optional |string | An optional identifier for the document. The identifier cannot be used elsewhere in the index. +`index` | Optional | string | The index where the document's transformed data appears. +`source` | Required | object | The document's JSON body. + +## Response -#### Example: Specify a pipeline in the path +Responses vary based on which path and HTTP method you choose. + +### Specify pipeline in request body ```json -POST /_ingest/pipeline/my-pipeline/_simulate { - "docs": [ + "docs" : [ { - "_index": "my-index", - "_id": "1", - "_source": { - "grad_year": 2024, - "graduated": false, - "name": "John Doe" + "doc" : { + "_index" : "index", + "_id" : "id", + "_source" : { + "location" : "new-new", + "field2" : "_value" + }, + "_ingest" : { + "timestamp" : "2022-02-07T18:47:57.479230835Z" + } } }, { - "_index": "my-index", - "_id": "2", - "_source": { - "grad_year": 2025, - "graduated": false, - "name": "Jane Doe" + "doc" : { + "_index" : "index", + "_id" : "id", + "_source" : { + "location" : "new-new", + "field2" : "_value" + }, + "_ingest" : { + "timestamp" : "2022-02-07T18:47:57.47933496Z" + } } } ] } ``` -{% include copy-curl.html %} -The request returns the following response: +### Specify pipeline ID inside HTTP path ```json { - "docs": [ + "docs" : [ { - "doc": { - "_index": "my-index", - "_id": "1", - "_source": { - "name": "JOHN DOE", - "grad_year": 2023, - "graduated": true + "doc" : { + "_index" : "index", + "_id" : "id", + "_source" : { + "field-name" : "value", + "location" : "document-name" }, - "_ingest": { - "timestamp": "2023-06-20T23:19:54.635306588Z" + "_ingest" : { + "timestamp" : "2022-02-03T21:47:05.382744877Z" } } }, { - "doc": { - "_index": "my-index", - "_id": "2", - "_source": { - "name": "JANE DOE", - "grad_year": 2023, - "graduated": true + "doc" : { + "_index" : "index", + "_id" : "id", + "_source" : { + "field-name" : "value", + "location" : "document-name" }, - "_ingest": { - "timestamp": "2023-06-20T23:19:54.635746046Z" + "_ingest" : { + "timestamp" : "2022-02-03T21:47:05.382803544Z" } } } @@ -121,65 +149,48 @@ The request returns the following response: } ``` -### Example: Verbose mode +### Receive verbose response -When the previous request is run with the `verbose` parameter set to `true`, the response shows the sequence of transformations for each document. 
For example, for the document with the ID `1`, the response contains the results of applying each processor in the pipeline in sequence: +With the `verbose` parameter set to `true`, the response shows how each processor transforms the specified document. ```json { - "docs": [ + "docs" : [ { - "processor_results": [ + "processor_results" : [ { - "processor_type": "set", - "status": "success", - "description": "Sets the graduation year to 2023", - "doc": { - "_index": "my-index", - "_id": "1", - "_source": { - "name": "John Doe", - "grad_year": 2023, - "graduated": false + "processor_type" : "set", + "status" : "success", + "doc" : { + "_index" : "index", + "_id" : "id", + "_source" : { + "field-name" : "value", + "location" : "document-name" }, - "_ingest": { - "pipeline": "my-pipeline", - "timestamp": "2023-06-20T23:23:26.656564631Z" + "_ingest" : { + "pipeline" : "35678", + "timestamp" : "2022-02-03T21:45:09.414049004Z" } } - }, - { - "processor_type": "set", - "status": "success", - "description": "Sets 'graduated' to true", - "doc": { - "_index": "my-index", - "_id": "1", - "_source": { - "name": "John Doe", - "grad_year": 2023, - "graduated": true - }, - "_ingest": { - "pipeline": "my-pipeline", - "timestamp": "2023-06-20T23:23:26.656564631Z" - } - } - }, + } + ] + }, + { + "processor_results" : [ { - "processor_type": "uppercase", - "status": "success", - "doc": { - "_index": "my-index", - "_id": "1", - "_source": { - "name": "JOHN DOE", - "grad_year": 2023, - "graduated": true + "processor_type" : "set", + "status" : "success", + "doc" : { + "_index" : "index", + "_id" : "id", + "_source" : { + "field-name" : "value", + "location" : "document-name" }, - "_ingest": { - "pipeline": "my-pipeline", - "timestamp": "2023-06-20T23:23:26.656564631Z" + "_ingest" : { + "pipeline" : "35678", + "timestamp" : "2022-02-03T21:45:09.414093212Z" } } } @@ -187,89 +198,4 @@ When the previous request is run with the `verbose` parameter set to `true`, the } ] } -``` - -### Example: Specify a pipeline in the request body - -Alternatively, you can specify a pipeline directly in the request body without first creating a pipeline: - -```json -POST /_ingest/pipeline/_simulate -{ - "pipeline" : - { - "description": "Splits text on whitespace characters", - "processors": [ - { - "csv" : { - "field" : "name", - "separator": ",", - "target_fields": ["last_name", "first_name"], - "trim": true - } - }, - { - "uppercase": { - "field": "last_name" - } - } - ] - }, - "docs": [ - { - "_index": "second-index", - "_id": "1", - "_source": { - "name": "Doe,John" - } - }, - { - "_index": "second-index", - "_id": "2", - "_source": { - "name": "Doe, Jane" - } - } - ] -} -``` -{% include copy-curl.html %} - -#### Response - -The request returns the following response: - -```json -{ - "docs": [ - { - "doc": { - "_index": "second-index", - "_id": "1", - "_source": { - "name": "Doe,John", - "last_name": "DOE", - "first_name": "John" - }, - "_ingest": { - "timestamp": "2023-08-24T19:20:44.816219673Z" - } - } - }, - { - "doc": { - "_index": "second-index", - "_id": "2", - "_source": { - "name": "Doe, Jane", - "last_name": "DOE", - "first_name": "Jane" - }, - "_ingest": { - "timestamp": "2023-08-24T19:20:44.816492381Z" - } - } - } - ] -} -``` +``` \ No newline at end of file diff --git a/_api-reference/nodes-apis/nodes-stats.md b/_api-reference/nodes-apis/nodes-stats.md index 77e5f0d095..eefd2bf555 100644 --- a/_api-reference/nodes-apis/nodes-stats.md +++ b/_api-reference/nodes-apis/nodes-stats.md @@ -106,14 +106,6 @@ GET 
_nodes/stats/ #### Example response -Select the arrow to view the example response. - -
- - Response - - {: .text-delta} - ```json { "_nodes" : { @@ -517,64 +509,6 @@ Select the arrow to view the example response. }, "pipelines" : { } }, - "search_pipeline" : { - "total_request" : { - "count" : 5, - "time_in_millis" : 158, - "current" : 0, - "failed" : 0 - }, - "total_response" : { - "count" : 2, - "time_in_millis" : 1, - "current" : 0, - "failed" : 0 - }, - "pipelines" : { - "public_info" : { - "request" : { - "count" : 3, - "time_in_millis" : 71, - "current" : 0, - "failed" : 0 - }, - "response" : { - "count" : 0, - "time_in_millis" : 0, - "current" : 0, - "failed" : 0 - }, - "request_processors" : [ - { - "filter_query:abc" : { - "type" : "filter_query", - "stats" : { - "count" : 1, - "time_in_millis" : 0, - "current" : 0, - "failed" : 0 - } - } - }, - ] - ... - "response_processors" : [ - { - "rename_field" : { - "type" : "rename_field", - "stats" : { - "count" : 2, - "time_in_millis" : 1, - "current" : 0, - "failed" : 0 - } - } - } - ] - }, - ... - } - }, "adaptive_selection" : { "F-ByTQzVQ3GQeYzQJArJGQ" : { "outgoing_searches" : 0, @@ -642,7 +576,6 @@ Select the arrow to view the example response. } } ``` -
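The full example above includes every statistics group. If you only need a subset, you can list specific metric groups in the request path, as in the following sketch (`ingest` and `jvm` are example metric names; any of the groups described in the response fields below can be combined in a comma-separated list):

```json
GET _nodes/stats/ingest,jvm
```
{% include copy-curl.html %}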
## Response fields @@ -685,7 +618,6 @@ http.total_opened | Integer | The total number of HTTP connections the node has [script_cache](#script-and-script_cache)| Object | Script cache statistics for the node. [discovery](#discovery) | Object | Node discovery statistics for the node. [ingest](#ingest) | Object | Ingest statistics for the node. -[search_pipeline](#search_pipeline) | Object | Statistics related to [search pipelines]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/). [adaptive_selection](#adaptive_selection) | Object | Statistics about adaptive selections for the node. [indexing_pressure](#indexing_pressure) | Object | Statistics related to the node's indexing pressure. [shard_indexing_pressure](#shard_indexing_pressure) | Object | Statistics related to indexing pressure at the shard level. @@ -1004,34 +936,6 @@ pipelines._id_.time_in_millis | Integer | The total amount of time for preproces pipelines._id_.failed | Integer | The total number of failed ingestions for the ingest pipeline. pipelines._id_.processors | Array of objects | Statistics for the ingest processors. Includes the number of documents that are currently transformed, the total number of transformed documents, the number of failed transformations, and the time spent transforming documents. -### `search_pipeline` - -The `search_pipeline` object contains the statistics related to [search pipelines]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) and has the following properties. - -Field | Field type | Description -:--- | :--- | :--- -total_request | Object | Cumulative statistics related to all search request processors. -total_request.count | Integer | The total number of search request processor executions. -total_request.time_in_millis | Integer | The total amount of time for all search request processor executions, in milliseconds. -total_request.current | Integer | The total number of search request processor executions currently in progress. -total_request.failed | Integer | The total number of failed search request processor executions. -total_response | Object | Cumulative statistics related to all search response processors. -total_response.count | Integer | The total number of search response processor executions. -total_response.time_in_millis | Integer | The total amount of time for all search response processor executions, in milliseconds. -total_response.current | Integer | The total number of search response processor executions currently in progress. -total_response.failed | Integer | The total number of failed search response processor executions. -pipelines | Object | Search pipeline statistics. Each pipeline is a nested object specified by its ID, with the properties listed in the following rows. If a processor has a `tag`, statistics for the processor are provided in the object with the name `:` (for example, `filter_query:abc`). Statistics for all processors of the same type that do not have a `tag` are aggregated and provided in the object with the name `` (for example, `filter_query`). -pipelines._id_.request.count | Integer | The number of search request processor executions performed by the search pipeline. -pipelines._id_.request.time_in_millis | Integer | The total amount of time for search request processor executions in the search pipeline, in milliseconds. -pipelines._id_.request.current | Integer | The number of search request processor executions currently in progress for the search pipeline. 
-pipelines._id_.request.failed | Integer | The number of failed search request processor executions for the search pipeline. -pipelines._id_.request_processors | Array of objects | Statistics for the search request processors. Includes the total number of executions, the total amount of time of executions, the total number of executions currently in progress, and the number of failed executions. -pipelines._id_.response.count | Integer | The number of search response processor executions performed by the search pipeline. -pipelines._id_.response.time_in_millis | Integer | The total amount of time for search response processor executions in the search pipeline, in milliseconds. -pipelines._id_.response.current | Integer | The number of search response processor executions currently in progress for the search pipeline. -pipelines._id_.response.failed | Integer | The number of failed search response processor executions for the search pipeline. -pipelines._id_.response_processors | Array of objects | Statistics for the search response processors. Includes the total number of executions, the total amount of time of executions, the total number of executions currently in progress, and the number of failed executions. - ### `adaptive_selection` The `adaptive_selection` object contains the adaptive selection statistics. Each entry is specified by the node ID and has the properties listed below. diff --git a/_api-reference/profile.md b/_api-reference/profile.md deleted file mode 100644 index a09b5b8753..0000000000 --- a/_api-reference/profile.md +++ /dev/null @@ -1,756 +0,0 @@ ---- -layout: default -title: Profile -nav_order: 55 ---- - -# Profile - -The Profile API provides timing information about the execution of individual components of a search request. Using the Profile API, you can debug slow requests and understand how to improve their performance. The Profile API does not measure the following: - -- Network latency -- Time spent in the search fetch phase -- Amount of time a request spends in queues -- Idle time while merging shard responses on the coordinating node - -The Profile API is a resource-consuming operation that adds overhead to search operations. -{: .warning} - -#### Example request - -To use the Profile API, include the `profile` parameter set to `true` in the search request sent to the `_search` endpoint: - -```json -GET /testindex/_search -{ - "profile": true, - "query" : { - "match" : { "title" : "wind" } - } -} -``` -{% include copy-curl.html %} - -To turn on human-readable format, include the `?human=true` query parameter in the request: - -```json -GET /testindex/_search?human=true -{ - "profile": true, - "query" : { - "match" : { "title" : "wind" } - } -} -``` -{% include copy-curl.html %} - -The response contains an additional `time` field with human-readable units, for example: - -```json -"collector": [ - { - "name": "SimpleTopScoreDocCollector", - "reason": "search_top_hits", - "time": "113.7micros", - "time_in_nanos": 113711 - } -] -``` - -The Profile API response is verbose, so if you're running the request through the `curl` command, include the `?pretty` query parameter to make the response easier to understand. -{: .tip} - -#### Example response - -The response contains profiling information: - -
- - Response - - {: .text-delta} - -```json -{ - "took": 21, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 2, - "relation": "eq" - }, - "max_score": 0.19363807, - "hits": [ - { - "_index": "testindex", - "_id": "1", - "_score": 0.19363807, - "_source": { - "title": "The wind rises" - } - }, - { - "_index": "testindex", - "_id": "2", - "_score": 0.17225474, - "_source": { - "title": "Gone with the wind", - "description": "A 1939 American epic historical film" - } - } - ] - }, - "profile": { - "shards": [ - { - "id": "[LidyZ1HVS-u93-73Z49dQg][testindex][0]", - "inbound_network_time_in_millis": 0, - "outbound_network_time_in_millis": 0, - "searches": [ - { - "query": [ - { - "type": "BooleanQuery", - "description": "title:wind title:rise", - "time_in_nanos": 2473919, - "breakdown": { - "set_min_competitive_score_count": 0, - "match_count": 0, - "shallow_advance_count": 0, - "set_min_competitive_score": 0, - "next_doc": 5209, - "match": 0, - "next_doc_count": 2, - "score_count": 2, - "compute_max_score_count": 0, - "compute_max_score": 0, - "advance": 9209, - "advance_count": 2, - "score": 20751, - "build_scorer_count": 4, - "create_weight": 1404458, - "shallow_advance": 0, - "create_weight_count": 1, - "build_scorer": 1034292 - }, - "children": [ - { - "type": "TermQuery", - "description": "title:wind", - "time_in_nanos": 813581, - "breakdown": { - "set_min_competitive_score_count": 0, - "match_count": 0, - "shallow_advance_count": 0, - "set_min_competitive_score": 0, - "next_doc": 3291, - "match": 0, - "next_doc_count": 2, - "score_count": 2, - "compute_max_score_count": 0, - "compute_max_score": 0, - "advance": 7208, - "advance_count": 2, - "score": 18666, - "build_scorer_count": 6, - "create_weight": 616375, - "shallow_advance": 0, - "create_weight_count": 1, - "build_scorer": 168041 - } - }, - { - "type": "TermQuery", - "description": "title:rise", - "time_in_nanos": 191083, - "breakdown": { - "set_min_competitive_score_count": 0, - "match_count": 0, - "shallow_advance_count": 0, - "set_min_competitive_score": 0, - "next_doc": 0, - "match": 0, - "next_doc_count": 0, - "score_count": 0, - "compute_max_score_count": 0, - "compute_max_score": 0, - "advance": 0, - "advance_count": 0, - "score": 0, - "build_scorer_count": 2, - "create_weight": 188625, - "shallow_advance": 0, - "create_weight_count": 1, - "build_scorer": 2458 - } - } - ] - } - ], - "rewrite_time": 192417, - "collector": [ - { - "name": "SimpleTopScoreDocCollector", - "reason": "search_top_hits", - "time_in_nanos": 77291 - } - ] - } - ], - "aggregations": [] - } - ] - } -} -``` -
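For reference, the profiled search above can also be sent through `curl` with both readability parameters applied. This is only a sketch and assumes a local, unsecured cluster listening on port 9200:

```bash
curl -s "localhost:9200/testindex/_search?pretty&human=true" \
  -H 'Content-Type: application/json' \
  -d '{
    "profile": true,
    "query": { "match": { "title": "wind" } }
  }'
```
{% include copy.html %}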
- -## Response fields - -The response includes the following fields. - -Field | Data type | Description -:--- | :--- | :--- -`profile` | Object | Contains profiling information. -`profile.shards` | Array of objects | A search request can be executed against one or more shards in the index, and a search may involve one or more indexes. Thus, the `profile.shards` array contains profiling information for each shard that was involved in the search. -`profile.shards.id` | String | The shard ID of the shard in the `[node-ID][index-name][shard-ID]` format. -`profile.shards.searches` | Array of objects | A search represents a query executed against the underlying Lucene index. Most search requests execute a single search against a Lucene index, but some search requests can execute more than one search. For example, including a global aggregation results in a secondary `match_all` query for the global context. The `profile.shards` array contains profiling information about each search execution. -[`profile.shards.searches.query`](#the-query-object) | Array of objects | Profiling information about the query execution. -`profile.shards.searches.rewrite_time` | Integer | All Lucene queries are rewritten. A query and its children may be rewritten more than once, until the query stops changing. The rewriting process involves performing optimizations, such as removing redundant clauses or replacing a query path with a more efficient one. After the rewriting process, the original query may change significantly. The `rewrite_time` field contains the cumulative total rewrite time for the query and all its children, in nanoseconds. -[`profile.shards.searches.collector`](#the-collector-array) | Array of objects | Profiling information about the Lucene collectors that ran the search. -[`profile.shards.aggregations`](#aggregations) | Array of objects | Profiling information about the aggregation execution. - -### The `query` object - -The `query` object contains the following fields. - -Field | Data type | Description -:--- | :--- | :--- -`type` | String | The Lucene query type into which the search query was rewritten. Corresponds to the Lucene class name (which often has the same name in OpenSearch). -`description` | String | Contains a Lucene explanation of the query. Helps differentiate queries with the same type. -`time_in_nanos` | Long | The amount of time the query took to execute, in nanoseconds. In a parent query, the time is inclusive of the execution times of all the child queries. -[`breakdown`](#the-breakdown-object) | Object | Contains timing statistics about low-level Lucene execution. -`children` | Array of objects | If a query has subqueries (children), this field contains information about the subqueries. - -### The `breakdown` object - -The `breakdown` object represents the timing statistics about low-level Lucene execution, broken down by method. Timings are listed in wall-clock nanoseconds and are not normalized. The `breakdown` timings are inclusive of all child times. The `breakdown` object comprises the following fields. All fields contain integer values. - -Field | Description -:--- | :--- -`create_weight` | A `Query` object in Lucene is immutable. Yet, Lucene should be able to reuse `Query` objects in multiple `IndexSearcher` objects. Thus, `Query` objects need to keep temporary state and statistics associated with the index in which the query is executed. 
To achieve reuse, every `Query` object generates a `Weight` object, which keeps the temporary context (state) associated with the `` tuple. The `create_weight` field contains the amount of time spent creating the `Weight` object. -`build_scorer` | A `Scorer` iterates over matching documents and generates a score for each document. The `build_scorer` field contains the amount of time spent generating the `Scorer` object. This does not include the time spent scoring the documents. The `Scorer` initialization time depends on the optimization and complexity of a particular query. The `build_scorer` parameter also includes the amount of time associated with caching, if caching is applicable and enabled for the query. -`next_doc` | The `next_doc` Lucene method returns the document ID of the next document that matches the query. This method is a special type of the `advance` method and is equivalent to `advance(docId() + 1)`. The `next_doc` method is more convenient for many Lucene queries. The `next_doc` field contains the amount of time required to determine the next matching document, which varies depending on the query type. -`advance` | The `advance` method is a lower-level version of the `next_doc` method in Lucene. It also finds the next matching document but necessitates that the calling query perform additional tasks, such as identifying skips. Some queries, such as conjunctions (`must` clauses in Boolean queries), cannot use `next_doc`. For those queries, `advance` is timed. -`match` | For some queries, document matching is performed in two steps. First, the document is matched approximately. Second, those documents that are approximately matched are examined through a more comprehensive process. For example, a phrase query first checks whether a document contains all terms in the phrase. Next, it verifies that the terms are in order (which is a more expensive process). The `match` field is non-zero only for those queries that use the two-step verification process. -`score` | Contains the time taken for a `Scorer` to score a particular document. -`shallow_advance` | Contains the amount of time required to execute the `advanceShallow` Lucene method. -`compute_max_score` | Contains the amount of time required to execute the `getMaxScore` Lucene method. -`set_min_competitive_score` | Contains the amount of time required to execute the `setMinCompetitiveScore` Lucene method. -`_count` | Contains the number of invocations of a ``. For example, `advance_count` contains the number of invocations of the `advance` method. Different invocations of the same method occur because the method is called on different documents. You can determine the selectivity of a query by comparing counts in different query components. - -### The `collector` array - -The `collector` array contains information about Lucene Collectors. A Collector is responsible for coordinating document traversal and scoring and collecting matching documents. Using Collectors, individual queries can record aggregation results and execute global queries or post-query filters. - -Field | Description -:--- | :--- -`name` | The collector name. In the [example response](#example-response), the `collector` is a single `SimpleTopScoreDocCollector`---the default scoring and sorting collector. -`reason` | Contains a description of the collector. For possible field values, see [Collector reasons](#collector-reasons). -`time_in_nanos` | A wall-clock time, including timing for all children. 
-`children` | If a collector has subcollectors (children), this field contains information about the subcollectors. - -Collector times are calculated, combined, and normalized independently, so they are independent of query times. -{: .note} - -#### Collector reasons - -The following table describes all available collector reasons. - -Reason | Description -:--- | :--- -`search_sorted` | A collector that scores and sorts documents. Present in most simple searches. -`search_count` | A collector that counts the number of matching documents but does not fetch the source. Present when `size: 0` is specified. -`search_terminate_after_count` | A collector that searches for matching documents and terminates the search when it finds a specified number of documents. Present when the `terminate_after_count` query parameter is specified. -`search_min_score` | A collector that returns matching documents that have a score greater than a minimum score. Present when the `min_score` parameter is specified. -`search_multi` | A wrapper collector for other collectors. Present when search, aggregations, global aggregations, and post filters are combined in a single search. -`search_timeout` | A collector that stops running after a specified period of time. Present when a `timeout` parameter is specified. -`aggregation` | A collector for aggregations that is run against the specified query scope. OpenSearch uses a single `aggregation` collector to collect documents for all aggregations. -`global_aggregation` | A collector that is run against the global query scope. Global scope is different from a specified query scope, so in order to collect the entire dataset, a `match_all` query must be run. - -## Aggregations - -To profile aggregations, send an aggregation request and provide the `profile` parameter set to `true`. - -#### Example request: Global aggregation - -```json -GET /opensearch_dashboards_sample_data_ecommerce/_search -{ - "profile": "true", - "size": 0, - "query": { - "match": { "manufacturer": "Elitelligence" } - }, - "aggs": { - "all_products": { - "global": {}, - "aggs": { - "avg_price": { "avg": { "field": "taxful_total_price" } } - } - }, - "elitelligence_products": { "avg": { "field": "taxful_total_price" } } - } -} -``` -{% include copy-curl.html %} - -#### Example response: Global aggregation - -The response contains profiling information: - -
- - Response - - {: .text-delta} - -```json -{ - "took": 10, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 1370, - "relation": "eq" - }, - "max_score": null, - "hits": [] - }, - "aggregations": { - "all_products": { - "doc_count": 4675, - "avg_price": { - "value": 75.05542864304813 - } - }, - "elitelligence_products": { - "value": 68.4430200729927 - } - }, - "profile": { - "shards": [ - { - "id": "[LidyZ1HVS-u93-73Z49dQg][opensearch_dashboards_sample_data_ecommerce][0]", - "inbound_network_time_in_millis": 0, - "outbound_network_time_in_millis": 0, - "searches": [ - { - "query": [ - { - "type": "ConstantScoreQuery", - "description": "ConstantScore(manufacturer:elitelligence)", - "time_in_nanos": 1367487, - "breakdown": { - "set_min_competitive_score_count": 0, - "match_count": 0, - "shallow_advance_count": 0, - "set_min_competitive_score": 0, - "next_doc": 634321, - "match": 0, - "next_doc_count": 1370, - "score_count": 0, - "compute_max_score_count": 0, - "compute_max_score": 0, - "advance": 173250, - "advance_count": 2, - "score": 0, - "build_scorer_count": 4, - "create_weight": 132458, - "shallow_advance": 0, - "create_weight_count": 1, - "build_scorer": 427458 - }, - "children": [ - { - "type": "TermQuery", - "description": "manufacturer:elitelligence", - "time_in_nanos": 1174794, - "breakdown": { - "set_min_competitive_score_count": 0, - "match_count": 0, - "shallow_advance_count": 0, - "set_min_competitive_score": 0, - "next_doc": 470918, - "match": 0, - "next_doc_count": 1370, - "score_count": 0, - "compute_max_score_count": 0, - "compute_max_score": 0, - "advance": 172084, - "advance_count": 2, - "score": 0, - "build_scorer_count": 4, - "create_weight": 114041, - "shallow_advance": 0, - "create_weight_count": 1, - "build_scorer": 417751 - } - } - ] - } - ], - "rewrite_time": 42542, - "collector": [ - { - "name": "MultiCollector", - "reason": "search_multi", - "time_in_nanos": 778406, - "children": [ - { - "name": "EarlyTerminatingCollector", - "reason": "search_count", - "time_in_nanos": 70290 - }, - { - "name": "ProfilingAggregator: [elitelligence_products]", - "reason": "aggregation", - "time_in_nanos": 502780 - } - ] - } - ] - }, - { - "query": [ - { - "type": "ConstantScoreQuery", - "description": "ConstantScore(*:*)", - "time_in_nanos": 995345, - "breakdown": { - "set_min_competitive_score_count": 0, - "match_count": 0, - "shallow_advance_count": 0, - "set_min_competitive_score": 0, - "next_doc": 930803, - "match": 0, - "next_doc_count": 4675, - "score_count": 0, - "compute_max_score_count": 0, - "compute_max_score": 0, - "advance": 2209, - "advance_count": 2, - "score": 0, - "build_scorer_count": 4, - "create_weight": 23875, - "shallow_advance": 0, - "create_weight_count": 1, - "build_scorer": 38458 - }, - "children": [ - { - "type": "MatchAllDocsQuery", - "description": "*:*", - "time_in_nanos": 431375, - "breakdown": { - "set_min_competitive_score_count": 0, - "match_count": 0, - "shallow_advance_count": 0, - "set_min_competitive_score": 0, - "next_doc": 389875, - "match": 0, - "next_doc_count": 4675, - "score_count": 0, - "compute_max_score_count": 0, - "compute_max_score": 0, - "advance": 1167, - "advance_count": 2, - "score": 0, - "build_scorer_count": 4, - "create_weight": 9458, - "shallow_advance": 0, - "create_weight_count": 1, - "build_scorer": 30875 - } - } - ] - } - ], - "rewrite_time": 8792, - "collector": [ - { - "name": "ProfilingAggregator: [all_products]", - 
"reason": "aggregation_global", - "time_in_nanos": 1310536 - } - ] - } - ], - "aggregations": [ - { - "type": "AvgAggregator", - "description": "elitelligence_products", - "time_in_nanos": 319918, - "breakdown": { - "reduce": 0, - "post_collection_count": 1, - "build_leaf_collector": 130709, - "build_aggregation": 2709, - "build_aggregation_count": 1, - "build_leaf_collector_count": 2, - "post_collection": 584, - "initialize": 4750, - "initialize_count": 1, - "reduce_count": 0, - "collect": 181166, - "collect_count": 1370 - } - }, - { - "type": "GlobalAggregator", - "description": "all_products", - "time_in_nanos": 1519340, - "breakdown": { - "reduce": 0, - "post_collection_count": 1, - "build_leaf_collector": 134625, - "build_aggregation": 59291, - "build_aggregation_count": 1, - "build_leaf_collector_count": 2, - "post_collection": 5041, - "initialize": 24500, - "initialize_count": 1, - "reduce_count": 0, - "collect": 1295883, - "collect_count": 4675 - }, - "children": [ - { - "type": "AvgAggregator", - "description": "avg_price", - "time_in_nanos": 775967, - "breakdown": { - "reduce": 0, - "post_collection_count": 1, - "build_leaf_collector": 98999, - "build_aggregation": 33083, - "build_aggregation_count": 1, - "build_leaf_collector_count": 2, - "post_collection": 2209, - "initialize": 1708, - "initialize_count": 1, - "reduce_count": 0, - "collect": 639968, - "collect_count": 4675 - } - } - ] - } - ] - } - ] - } -} -``` -
- -#### Example request: Non-global aggregation - -```json -GET /opensearch_dashboards_sample_data_ecommerce/_search -{ - "size": 0, - "aggs": { - "avg_taxful_total_price": { - "avg": { - "field": "taxful_total_price" - } - } - } -} -``` -{% include copy-curl.html %} - -#### Example response: Non-global aggregation - -The response contains profiling information: - -
- - Response - - {: .text-delta} - -```json -{ - "took": 13, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 4675, - "relation": "eq" - }, - "max_score": null, - "hits": [] - }, - "aggregations": { - "avg_taxful_total_price": { - "value": 75.05542864304813 - } - }, - "profile": { - "shards": [ - { - "id": "[LidyZ1HVS-u93-73Z49dQg][opensearch_dashboards_sample_data_ecommerce][0]", - "inbound_network_time_in_millis": 0, - "outbound_network_time_in_millis": 0, - "searches": [ - { - "query": [ - { - "type": "ConstantScoreQuery", - "description": "ConstantScore(*:*)", - "time_in_nanos": 1690820, - "breakdown": { - "set_min_competitive_score_count": 0, - "match_count": 0, - "shallow_advance_count": 0, - "set_min_competitive_score": 0, - "next_doc": 1614112, - "match": 0, - "next_doc_count": 4675, - "score_count": 0, - "compute_max_score_count": 0, - "compute_max_score": 0, - "advance": 2708, - "advance_count": 2, - "score": 0, - "build_scorer_count": 4, - "create_weight": 20250, - "shallow_advance": 0, - "create_weight_count": 1, - "build_scorer": 53750 - }, - "children": [ - { - "type": "MatchAllDocsQuery", - "description": "*:*", - "time_in_nanos": 770902, - "breakdown": { - "set_min_competitive_score_count": 0, - "match_count": 0, - "shallow_advance_count": 0, - "set_min_competitive_score": 0, - "next_doc": 721943, - "match": 0, - "next_doc_count": 4675, - "score_count": 0, - "compute_max_score_count": 0, - "compute_max_score": 0, - "advance": 1042, - "advance_count": 2, - "score": 0, - "build_scorer_count": 4, - "create_weight": 5041, - "shallow_advance": 0, - "create_weight_count": 1, - "build_scorer": 42876 - } - } - ] - } - ], - "rewrite_time": 22000, - "collector": [ - { - "name": "MultiCollector", - "reason": "search_multi", - "time_in_nanos": 3672676, - "children": [ - { - "name": "EarlyTerminatingCollector", - "reason": "search_count", - "time_in_nanos": 78626 - }, - { - "name": "ProfilingAggregator: [avg_taxful_total_price]", - "reason": "aggregation", - "time_in_nanos": 2834566 - } - ] - } - ] - } - ], - "aggregations": [ - { - "type": "AvgAggregator", - "description": "avg_taxful_total_price", - "time_in_nanos": 1973702, - "breakdown": { - "reduce": 0, - "post_collection_count": 1, - "build_leaf_collector": 199292, - "build_aggregation": 13584, - "build_aggregation_count": 1, - "build_leaf_collector_count": 2, - "post_collection": 6125, - "initialize": 6916, - "initialize_count": 1, - "reduce_count": 0, - "collect": 1747785, - "collect_count": 4675 - } - } - ] - } - ] - } -} -``` -
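As in the global example, the profiling section is returned only when the request enables it. A minimal version of the non-global request with profiling turned on might look like the following sketch, which mirrors the request shown earlier on this page:

```json
GET /opensearch_dashboards_sample_data_ecommerce/_search
{
  "profile": true,
  "size": 0,
  "aggs": {
    "avg_taxful_total_price": {
      "avg": {
        "field": "taxful_total_price"
      }
    }
  }
}
```
{% include copy-curl.html %}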
- -### Response fields - -The `aggregations` array contains aggregation objects with the following fields. - -Field | Data type | Description -:--- | :--- | :--- -`type` | String | The aggregator type. In the [non-global aggregation example response](#example-response-non-global-aggregation), the aggregator type is `AvgAggregator`. [Global aggregation example response](#example-request-global-aggregation) contains a `GlobalAggregator` with an `AvgAggregator` child. -`description` | String | Contains a Lucene explanation of the aggregation. Helps differentiate aggregations with the same type. -`time_in_nanos` | Long | The amount of time taken to execute the aggregation, in nanoseconds. In a parent aggregation, the time is inclusive of the execution times of all the child aggregations. -[`breakdown`](#the-breakdown-object-1) | Object | Contains timing statistics about low-level Lucene execution. -`children` | Array of objects | If an aggregation has subaggregations (children), this field contains information about the subaggregations. -`debug` | Object | Some aggregations return a `debug` object that describes the details of the underlying execution. - -### The `breakdown` object - -The `breakdown` object represents the timing statistics about low-level Lucene execution, broken down by method. Each field in the `breakdown` object represents an internal Lucene method executed within the aggregation. Timings are listed in wall-clock nanoseconds and are not normalized. The `breakdown` timings are inclusive of all child times. The `breakdown` object is comprised of the following fields. All fields contain integer values. - -Field | Description -:--- | :--- -`initialize` | Contains the amount of time taken to execute the `preCollection()` callback method during `AggregationCollectorManager` creation. -`build_leaf_collector`| Contains the time spent running the `getLeafCollector()` method of the aggregation, which creates a new collector to collect the given context. -`collect`| Contains the time spent collecting the documents into buckets. -`post_collection`| Contains the time spent running the aggregation’s `postCollection()` callback method. -`build_aggregation`| Contains the time spent running the aggregation’s `buildAggregations()` method, which builds the results of this aggregation. -`reduce`| Contains the time spent in the `reduce` phase. -`_count` | Contains the number of invocations of a ``. For example, `build_leaf_collector_count` contains the number of invocations of the `build_leaf_collector` method. \ No newline at end of file diff --git a/_api-reference/reload-search-analyzer.md b/_api-reference/reload-search-analyzer.md new file mode 100644 index 0000000000..a07267d619 --- /dev/null +++ b/_api-reference/reload-search-analyzer.md @@ -0,0 +1,60 @@ +--- +layout: default +title: Reload search analyzer +nav_order: 65 +--- + +# Reload search analyzer + +The reload search analyzer API operation detects any changes to [synonym]({{site.url}}{{site.baseurl}}/opensearch/ux/) files for any configured [search analyzers]({{site.url}}{{site.baseurl}}/im-plugin/refresh-analyzer/index/). The reload search analyzer request needs to be run on all nodes. Additionally, the synonym token filter must be set to `true`. + +## Path and HTTP methods + +``` +POST //_reload_search_analyzers +GET //_reload_search_analyzers +``` + +## Request body fields + +Request body parameters are optional. 
+ +Field Type | Data type | Description +:--- | :--- | :--- +allow_no_indices | Boolean | When set to `false`, an error is returned for indexes that are closed or missing and match any wildcard expression. Default is set to `true`. +expand_wildcards | String | Allows you to set the wildcards that can be matched to a type of index. Available options are `open`, `closed`, `all`, `none`, and `hidden`. Default is set to `open`. +ignore_unavailable | Boolean | If an index is closed or missing, an error is returned when ignore_unavailable is set to `false`. Default is set to `false`. + +## Examples + +The following are an example request and response. + +#### Example request + +````json +POST /shakespeare/_reload_search_analyzers +```` +{% include copy-curl.html %} + +#### Example response + +````json +{ + "_shards": { + "total": 1, + "successful": 1, + "failed": 0 + }, + "reload_details": [ + { + "index": "shakespeare", + "reloaded_analyzers": [ + "analyzers-synonyms-test" + ], + "reloaded_node_ids": [ + "opensearch-node1" + ] + } + ] +} +```` \ No newline at end of file diff --git a/_api-reference/script-apis/index.md b/_api-reference/script-apis/index.md index 836728894c..650bc68dda 100644 --- a/_api-reference/script-apis/index.md +++ b/_api-reference/script-apis/index.md @@ -10,3 +10,11 @@ redirect_from: # Script APIs The script APIs allow you to work with stored scripts. Stored scripts are part of the cluster state and reduce compilation time and enhance search speed. The default scripting language is Painless. + +You can perform the following operations on stored scripts: +* [Create or update stored script]({{site.url}}{{site.baseurl}}/api-reference/script-apis/create-stored-script/) +* [Execute Painless stored script]({{site.url}}{{site.baseurl}}/api-reference/script-apis/exec-stored-script/) +* [Get stored script]({{site.url}}{{site.baseurl}}/api-reference/script-apis/get-stored-script/) +* [Delete script]({{site.url}}{{site.baseurl}}/api-reference/script-apis/delete-script/) +* [Get stored script contexts]({{site.url}}{{site.baseurl}}/api-reference/script-apis/get-script-contexts/). +* [Get script language]({{site.url}}{{site.baseurl}}/api-reference/script-apis/get-script-language/) diff --git a/_api-reference/scroll.md b/_api-reference/scroll.md index f8bf82e598..4f373627c5 100644 --- a/_api-reference/scroll.md +++ b/_api-reference/scroll.md @@ -2,8 +2,6 @@ layout: default title: Scroll nav_order: 71 -redirect_from: - - /opensearch/rest-api/scroll/ --- # Scroll diff --git a/_api-reference/search.md b/_api-reference/search.md index 71aa10e2c8..a449ee4246 100644 --- a/_api-reference/search.md +++ b/_api-reference/search.md @@ -60,7 +60,7 @@ ignore_unavailable | Boolean | Specifies whether to include missing or closed in lenient | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is false. max_concurrent_shard_requests | Integer | How many concurrent shard requests this request should execute on each node. Default is 5. pre_filter_shard_size | Integer | A prefilter size threshold that triggers a prefilter operation if the request exceeds the threshold. Default is 128 shards. -preference | String | Specifies the shards or nodes on which OpenSearch should perform the search. For valid values, see [The `preference` query parameter](#the-preference-query-parameter). +preference | String | Specifies which shard or node OpenSearch should perform the count operation on. 
q | String | Lucene query string’s query. request_cache | Boolean | Specifies whether OpenSearch should use the request cache. Default is whether it’s enabled in the index’s settings. rest_total_hits_as_int | Boolean | Whether to return `hits.total` as an integer. Returns an object otherwise. Default is false. @@ -86,20 +86,6 @@ track_total_hits | Boolean or Integer | Whether to return how many documents mat typed_keys | Boolean | Whether returned aggregations and suggested terms should include their types in the response. Default is true. version | Boolean | Whether to include the document version as a match. -### The `preference` query parameter - -The `preference` query parameter specifies the shards or nodes on which OpenSearch should perform the search. The following are valid values: - -- `_primary`: Perform the search only on primary shards. -- `_replica`: Perform the search only on replica shards. -- `_primary_first`: Perform the search on primary shards but fail over to other available shards if primary shards are not available. -- `_replica_first`: Perform the search on replica shards but fail over to other available shards if replica shards are not available. -- `_local`: If possible, perform the search on the local node's shards. -- `_prefer_nodes:,`: If possible, perform the search on the specified nodes. Use a comma-separated list to specify multiple nodes. -- `_shards:,`: Perform the search only on the specified shards. Use a comma-separated list to specify multiple shards. When combined with other preferences, the `_shards` preference must be listed first. For example, `_shards:1,2|_replica`. -- `_only_nodes:,`: Perform the search only on the specified nodes. Use a comma-separated list to specify multiple nodes. -- ``: Specifies a custom string to use for the search. The string cannot start with an underscore character (`_`). Searches with the same custom string are routed to the same shards. - ## Request body All fields are optional. diff --git a/_api-reference/tasks.md b/_api-reference/tasks.md index 72f88f0fc5..fd8099bcb8 100644 --- a/_api-reference/tasks.md +++ b/_api-reference/tasks.md @@ -28,7 +28,7 @@ GET _tasks/ Note that if a task finishes running, it won't be returned as part of your request. For an example of a task that takes a little longer to finish, you can run the [`_reindex`]({{site.url}}{{site.baseurl}}/opensearch/reindex-data) API operation on a larger document, and then run `tasks`. -#### Example response +**Sample Response** ```json { "nodes": { @@ -93,18 +93,18 @@ Parameter | Data type | Description | `wait_for_completion` | Boolean | Waits for the matching tasks to complete. (Default: false) `group_by` | Enum | Groups tasks by parent/child relationships or nodes. (Default: nodes) `timeout` | Time | An explicit operation timeout. (Default: 30 seconds) -`cluster_manager_timeout` | Time | The time to wait for a connection to the primary node. (Default: 30 seconds) +`master_timeout` | Time | The time to wait for a connection to the primary node. 
(Default: 30 seconds) For example, this request returns tasks currently running on a node named `opensearch-node1`: -#### Example request +**Sample Request** -```json +``` GET /_tasks?nodes=opensearch-node1 ``` {% include copy-curl.html %} -#### Example response +**Sample Response** ```json { @@ -150,14 +150,14 @@ GET /_tasks?nodes=opensearch-node1 The following request returns detailed information about active search tasks: -#### Example request +**Sample Request** ```bash curl -XGET "localhost:9200/_tasks?actions=*search&detailed ``` {% include copy.html %} -#### Example response +**Sample Response** ```json { @@ -190,25 +190,9 @@ curl -XGET "localhost:9200/_tasks?actions=*search&detailed "cancelled" : false, "headers" : { }, "resource_stats" : { - "average" : { - "cpu_time_in_nanos" : 0, - "memory_in_bytes" : 0 - }, "total" : { "cpu_time_in_nanos" : 0, "memory_in_bytes" : 0 - }, - "min" : { - "cpu_time_in_nanos" : 0, - "memory_in_bytes" : 0 - }, - "max" : { - "cpu_time_in_nanos" : 0, - "memory_in_bytes" : 0 - }, - "thread_info" : { - "thread_executions" : 0, - "active_threads" : 0 } } } @@ -219,22 +203,6 @@ curl -XGET "localhost:9200/_tasks?actions=*search&detailed ``` -### The `resource_stats` object - -The `resource_stats` object is only updated for tasks that support resource tracking. These stats are computed based on scheduled thread executions, including both threads that have finished working on the task and threads currently working on the task. Because the same thread may be scheduled to work on the same task multiple times, each instance of a given thread being scheduled to work on a given task is considered to be a single thread execution. - -The following table lists all response fields in the `resource_stats` object. - -Response field | Description | -:--- | :--- | -`average` | The average resource usage across all scheduled thread executions. | -`total` | The sum of resource usages across all scheduled thread executions. | -`min` | The minimum resource usage across all scheduled thread executions. | -`max` | The maximum resource usage across all scheduled thread executions. | -`thread_info` | Thread-count-related stats.| -`thread_info.active_threads` | The number of threads currently working on the task. | -`thread_info.thread_executions` | The number of threads that have been scheduled to work on the task. | - ## Task canceling After getting a list of tasks, you can cancel all cancelable tasks with the following request: @@ -328,4 +296,4 @@ This operation supports the same parameters as the `tasks` operation. The follow ```bash curl -i -H "X-Opaque-Id: 123456" "https://localhost:9200/_tasks?nodes=opensearch-node1" -u 'admin:admin' --insecure ``` -{% include copy.html %} +{% include copy.html %} \ No newline at end of file diff --git a/_benchmark/commands/compare.md b/_benchmark/commands/compare.md deleted file mode 100644 index c2326fe0a6..0000000000 --- a/_benchmark/commands/compare.md +++ /dev/null @@ -1,132 +0,0 @@ ---- -layout: default -title: compare -nav_order: 55 -parent: Command reference ---- - -# compare - -The `compare` command helps you analyze the difference between two benchmark tests. This can help you analyze the performance impact of changes made from a previous test based on a specific Git revision. - -## Usage - -You can compare two different workload tests using their `TestExecution IDs`. To find a list of tests run from a specific workload, use `opensearch-benchmark list test_executions`. 
You should receive an output similar to the following: - - -``` - ____ _____ __ ____ __ __ - / __ \____ ___ ____ / ___/___ ____ ___________/ /_ / __ )___ ____ _____/ /_ ____ ___ ____ ______/ /__ - / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \ / __ / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/ -/ /_/ / /_/ / __/ / / /__/ / __/ /_/ / / / /__/ / / / / /_/ / __/ / / / /__/ / / / / / / / / /_/ / / / ,< -\____/ .___/\___/_/ /_/____/\___/\__,_/_/ \___/_/ /_/ /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/ /_/|_| - /_/ -Recent test-executions: - -Recent test_executions: - -TestExecution ID TestExecution Timestamp Workload Workload Parameters TestProcedure ProvisionConfigInstance User Tags workload Revision Provision Config Revision ------------------------------------- ------------------------- ---------- --------------------- ------------------- ------------------------- ----------- ------------------- --------------------------- -729291a0-ee87-44e5-9b75-cc6d50c89702 20230524T181718Z geonames append-no-conflicts 4gheap 30260cf -f91c33d0-ec93-48e1-975e-37476a5c9fe5 20230524T170134Z geonames append-no-conflicts 4gheap 30260cf -d942b7f9-6506-451d-9dcf-ef502ab3e574 20230524T144827Z geonames append-no-conflicts 4gheap 30260cf -a33845cc-c2e5-4488-a2db-b0670741ff9b 20230523T213145Z geonames append-no-conflicts - -``` - -Then, use `compare` to call a `--baseline` test and a `--contender` test for comparison. - -``` -opensearch-benchmark compare --baseline=417ed42-6671-9i79-11a1-e367636068ce --contender=beb154e4-0a05-4f45-ad9f-e34f9a9e51f7 -``` - -You should receive the following response comparing the final benchmark metrics for both tests: - -``` - ____ _____ __ ____ __ __ - / __ \____ ___ ____ / ___/___ ____ ___________/ /_ / __ )___ ____ _____/ /_ ____ ___ ____ ______/ /__ - / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \ / __ / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/ -/ /_/ / /_/ / __/ / / /__/ / __/ /_/ / / / /__/ / / / / /_/ / __/ / / / /__/ / / / / / / / / /_/ / / / ,< -\____/ .___/\___/_/ /_/____/\___/\__,_/_/ \___/_/ /_/ /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/ /_/|_| - /_/ - -Comparing baseline - TestExecution ID: 729291a0-ee87-44e5-9b75-cc6d50c89702 - TestExecution timestamp: 2023-05-24 18:17:18 - -with contender - TestExecution ID: a33845cc-c2e5-4488-a2db-b0670741ff9b - TestExecution timestamp: 2023-05-23 21:31:45 - - ------------------------------------------------------- - _______ __ _____ - / ____(_)___ ____ _/ / / ___/_________ ________ - / /_ / / __ \/ __ `/ / \__ \/ ___/ __ \/ ___/ _ \ - / __/ / / / / / /_/ / / ___/ / /__/ /_/ / / / __/ -/_/ /_/_/ /_/\__,_/_/ /____/\___/\____/_/ \___/ ------------------------------------------------------- - Metric Baseline Contender Diff --------------------------------------------------------- ---------- ----------- ----------------- - Min Indexing Throughput [docs/s] 19501 19118 -383.00000 - Median Indexing Throughput [docs/s] 20232 19927.5 -304.45833 - Max Indexing Throughput [docs/s] 21172 20849 -323.00000 - Total indexing time [min] 55.7989 56.335 +0.53603 - Total merge time [min] 12.9766 13.3115 +0.33495 - Total refresh time [min] 5.20067 5.20097 +0.00030 - Total flush time [min] 0.0648667 0.0681833 +0.00332 - Total merge throttle time [min] 0.796417 0.879267 +0.08285 - Query latency term (50.0 percentile) [ms] 2.10049 2.15421 +0.05372 - Query latency term (90.0 percentile) [ms] 2.77537 2.84168 +0.06630 - Query latency term (100.0 percentile) [ms] 4.52081 5.15368 +0.63287 - Query latency country_agg (50.0 
percentile) [ms] 112.049 110.385 -1.66392 - Query latency country_agg (90.0 percentile) [ms] 128.426 124.005 -4.42138 - Query latency country_agg (100.0 percentile) [ms] 155.989 133.797 -22.19185 - Query latency scroll (50.0 percentile) [ms] 16.1226 14.4974 -1.62519 - Query latency scroll (90.0 percentile) [ms] 17.2383 15.4079 -1.83043 - Query latency scroll (100.0 percentile) [ms] 18.8419 18.4241 -0.41784 - Query latency country_agg_cached (50.0 percentile) [ms] 1.70223 1.64502 -0.05721 - Query latency country_agg_cached (90.0 percentile) [ms] 2.34819 2.04318 -0.30500 -Query latency country_agg_cached (100.0 percentile) [ms] 3.42547 2.86814 -0.55732 - Query latency default (50.0 percentile) [ms] 5.89058 5.83409 -0.05648 - Query latency default (90.0 percentile) [ms] 6.71282 6.64662 -0.06620 - Query latency default (100.0 percentile) [ms] 7.65307 7.3701 -0.28297 - Query latency phrase (50.0 percentile) [ms] 1.82687 1.83193 +0.00506 - Query latency phrase (90.0 percentile) [ms] 2.63714 2.46286 -0.17428 - Query latency phrase (100.0 percentile) [ms] 5.39892 4.22367 -1.17525 - Median CPU usage (index) [%] 668.025 679.15 +11.12499 - Median CPU usage (stats) [%] 143.75 162.4 +18.64999 - Median CPU usage (search) [%] 223.1 229.2 +6.10000 - Total Young Gen GC time [s] 39.447 40.456 +1.00900 - Total Young Gen GC count 10 11 +1.00000 - Total Old Gen GC time [s] 7.108 7.703 +0.59500 - Total Old Gen GC count 10 11 +1.00000 - Index size [GB] 3.25475 3.25098 -0.00377 - Total written [GB] 17.8434 18.3143 +0.47083 - Heap used for segments [MB] 21.7504 21.5901 -0.16037 - Heap used for doc values [MB] 0.16436 0.13905 -0.02531 - Heap used for terms [MB] 20.0293 19.9159 -0.11345 - Heap used for norms [MB] 0.105469 0.0935669 -0.01190 - Heap used for points [MB] 0.773487 0.772155 -0.00133 - Heap used for points [MB] 0.677795 0.669426 -0.00837 - Segment count 136 121 -15.00000 - Indices Stats(90.0 percentile) [ms] 3.16053 3.21023 +0.04969 - Indices Stats(99.0 percentile) [ms] 5.29526 3.94132 -1.35393 - Indices Stats(100.0 percentile) [ms] 5.64971 7.02374 +1.37403 - Nodes Stats(90.0 percentile) [ms] 3.19611 3.15251 -0.04360 - Nodes Stats(99.0 percentile) [ms] 4.44111 4.87003 +0.42892 - Nodes Stats(100.0 percentile) [ms] 5.22527 5.66977 +0.44450 -``` - -## Options - -You can use the following options to customize the results of your test comparison: - -- `--baseline`: The baseline TestExecution ID used to compare the contender TestExecution. -- `--contender`: The TestExecution ID for the contender being compared to the baseline. -- `--results-format`: Defines the output format for the command line results, either `markdown` or `csv`. Default is `markdown`. -- `--results-number-align`: Defines the column number alignment for when the `compare` command outputs results. Default is `right`. -- `--results-file`: When provided a file path, writes the compare results to the file indicated in the path. -- `--show-in-results`: Determines whether or not to include the comparison in the results file. - - diff --git a/_benchmark/commands/download.md b/_benchmark/commands/download.md deleted file mode 100644 index 3afd9dfea4..0000000000 --- a/_benchmark/commands/download.md +++ /dev/null @@ -1,40 +0,0 @@ ---- -layout: default -title: download -nav_order: 60 -parent: Command reference ---- - -# download - -Use the `download` command to select which OpenSearch distribution version to download. 
- -## Usage - -The following example downloads OpenSearch version 2.7.0: - -``` -opensearch-benchmark download --distribution-version=2.7.0 -``` - -Benchmark then returns the location of the OpenSearch artifact: - -``` -{ - "opensearch": "/Users/.benchmark/benchmarks/distributions/opensearch-2.7.0.tar.gz" -} -``` - -## Options - -Use the following options to customize how OpenSearch Benchmark downloads OpenSearch: - -- `--provision-config-repository`: Defines the repository from which OpenSearch Benchmark loads `provision-configs` and `provision-config-instances`. -- `--provision-config-revision`: Defines a specific Git revision in the `provision-config` that OpenSearch Benchmark should use. -- `--provision-config-path`: Defines the path to the `--provision-config-instance` and any OpenSearch plugin configurations to use. -- `--distribution-version`: Downloads the specified OpenSearch distribution based on version number. For a list of released OpenSearch versions, see [Version history](https://opensearch.org/docs/version-history/). -- `--distribution-repository`: Defines the repository from where the OpenSearch distribution should be downloaded. Default is `release`. -- `--provision-config-instance`: Defines the `--provision-config-instance` to use. You can view possible configuration instances using the command `opensearch-benchmark list provision-config-instances`. -- `--provision-config-instance-params`: A comma-separated list of key-value pairs injected verbatim as variables for the `provision-config-instance`. -- `--target-os`: The target operating system (OS) for which the OpenSearch artifact should be downloaded. Default is the current OS. -- `--target-arch`: The name of the CPU architecture for which an artifact should be downloaded. diff --git a/_benchmark/commands/execute-test.md b/_benchmark/commands/execute-test.md deleted file mode 100644 index e307cfa6b2..0000000000 --- a/_benchmark/commands/execute-test.md +++ /dev/null @@ -1,178 +0,0 @@ ---- -layout: default -title: execute-test -nav_order: 65 -parent: Command reference ---- - -# execute-test - -Whether you're using the included [OpenSearch Benchmark workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) or a [custom workload]({{site.url}}{{site.baseurl}}/benchmark/creating-custom-workloads/), use the `execute-test` command to gather data about the performance of your OpenSearch cluster according to the selected workload. 
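Before the full option reference, the following sketch shows one common invocation: running a workload against an already running cluster using the `benchmark-only` pipeline. It relies only on options documented later on this page and assumes the target cluster is reachable at `localhost:9200`:

```
opensearch-benchmark execute-test --workload=geonames --pipeline=benchmark-only --target-hosts=localhost:9200 --test-mode
```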
- -## Usage - -The following example executes a test using the `geonames` workload in test mode: - -``` -opensearch-benchmark execute-test --workload=geonames --test-mode -``` - -After the test runs, OpenSearch Benchmark responds with a summary of the benchmark metrics: - -``` ------------------------------------------------------- - _______ __ _____ - / ____(_)___ ____ _/ / / ___/_________ ________ - / /_ / / __ \/ __ `/ / \__ \/ ___/ __ \/ ___/ _ \ - / __/ / / / / / /_/ / / ___/ / /__/ /_/ / / / __/ -/_/ /_/_/ /_/\__,_/_/ /____/\___/\____/_/ \___/ ------------------------------------------------------- - -| Metric | Task | Value | Unit | -|-------------------------------:|---------------------:|----------:|-------:| -| Total indexing time | | 28.0997 | min | -| Total merge time | | 6.84378 | min | -| Total refresh time | | 3.06045 | min | -| Total flush time | | 0.106517 | min | -| Total merge throttle time | | 1.28193 | min | -| Median CPU usage | | 471.6 | % | -| Total Young Gen GC | | 16.237 | s | -| Total Old Gen GC | | 1.796 | s | -| Index size | | 2.60124 | GB | -| Total written | | 11.8144 | GB | -| Heap used for segments | | 14.7326 | MB | -| Heap used for doc values | | 0.115917 | MB | -| Heap used for terms | | 13.3203 | MB | -| Heap used for norms | | 0.0734253 | MB | -| Heap used for points | | 0.5793 | MB | -| Heap used for stored fields | | 0.643608 | MB | -| Segment count | | 97 | | -| Min Throughput | index-append | 31925.2 | docs/s | -| Median Throughput | index-append | 39137.5 | docs/s | -| Max Throughput | index-append | 39633.6 | docs/s | -| 50.0th percentile latency | index-append | 872.513 | ms | -| 90.0th percentile latency | index-append | 1457.13 | ms | -| 99.0th percentile latency | index-append | 1874.89 | ms | -| 100th percentile latency | index-append | 2711.71 | ms | -| 50.0th percentile service time | index-append | 872.513 | ms | -| 90.0th percentile service time | index-append | 1457.13 | ms | -| 99.0th percentile service time | index-append | 1874.89 | ms | -| 100th percentile service time | index-append | 2711.71 | ms | -| ... | ... | ... | ... | -| ... | ... | ... | ... | -| Min Throughput | painless_dynamic | 2.53292 | ops/s | -| Median Throughput | painless_dynamic | 2.53813 | ops/s | -| Max Throughput | painless_dynamic | 2.54401 | ops/s | -| 50.0th percentile latency | painless_dynamic | 172208 | ms | -| 90.0th percentile latency | painless_dynamic | 310401 | ms | -| 99.0th percentile latency | painless_dynamic | 341341 | ms | -| 99.9th percentile latency | painless_dynamic | 344404 | ms | -| 100th percentile latency | painless_dynamic | 344754 | ms | -| 50.0th percentile service time | painless_dynamic | 393.02 | ms | -| 90.0th percentile service time | painless_dynamic | 407.579 | ms | -| 99.0th percentile service time | painless_dynamic | 430.806 | ms | -| 99.9th percentile service time | painless_dynamic | 457.352 | ms | -| 100th percentile service time | painless_dynamic | 459.474 | ms | - ----------------------------------- -[INFO] SUCCESS (took 2634 seconds) ----------------------------------- -``` - -## Options - -Use the following options to customize the `execute-test` command for your use case. Options in this section are categorized by their use case. - -## General settings - -The following options shape how each test runs and how results appear: - -- `--test-mode`: Runs the given workload in test mode, which is useful when checking a workload for errors. 
-- `--user-tag`: Defines user-specific key-value pairs to be used in metric record as meta information, for example, `intention:baseline-ticket-12345`. -- `--results-format`: Defines the output format for the command line results, either `markdown` or `csv`. Default is `markdown`. -- `--results-number-align`: Defines the column number alignment for when the `compare` command outputs results. Default is `right`. -- `--results-file`: When provided a file path, writes the compare results to the file indicated in the path. -- `--show-in-results`: Determines whether or not to include the comparison in the results file. - - -### Distributions - -The following options set which version of OpenSearch and the OpenSearch plugins the benchmark test uses: - -- `--distribution-version`: Downloads the specified OpenSearch distribution based on version number. For a list of released OpenSearch versions, see [Version history](https://opensearch.org/docs/version-history/). -- `--distribution-repository`: Defines the repository from where the OpenSearch distribution should be downloaded. Default is `release`. -- `--revision`: Defines the current source code revision to use for running a benchmark test. Default is `current`. - - `current`: Uses the source tree's current revision based on your OpenSearch distribution. - - `latest`: Fetches the latest revision from the main branch of the source tree. - - You can also use a timestamp or commit ID from the source tree. When using a timestamp, specify `@ts`, where "ts" is a valid ISO 8601 timestamp, for example, `@2013-07-27T10:37:00Z`. -- `--opensearch-plugins`: Defines which [OpenSearch plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/) to install. By default, no plugins are installed. -- `--plugin-params:` Defines a comma-separated list of key:value pairs that are injected verbatim into all plugins as variables. -- `--runtime-jdk`: The major version of JDK to use. -- `--client-options`: Defines a comma-separated list of clients to use. All options are passed to the OpenSearch Python client. Default is `timeout:60`. - -### Cluster - -The following option relates to the target cluster of the benchmark. - -- `--target-hosts`: Defines a comma-separated list of host-port pairs that should be targeted if using the pipeline `benchmark-only`. Default is `localhost:9200`. - - -### Distributed workload generation - -The following options help those who want to use multiple hosts to generate load to the benchmark cluster: - -- `--load-worker-coordinator-hosts`: Defines a comma-separated list of hosts that coordinate loads. Default is `localhost`. -- `--enable-worker-coordinator-profiling`: Enables an analysis of the performance of OpenSearch Benchmark's worker coordinator. Default is `false`. - -### Provisioning - -The following options help customize how OpenSearch Benchmark provisions OpenSearch and workloads: - -- `--provision-config-repository`: Defines the repository from which OpenSearch Benchmark loads `provision-configs` and `provision-config-instances`. -- `--provision-config-path`: Defines the path to the `--provision-config-instance` and any OpenSearch plugin configurations to use. -- `--provision-config-revision`: Defines a specific Git revision in the `provision-config` that OpenSearch Benchmark should use. -- `--provision-config-instance`: Defines the `--provision-config-instance` to use. You can see possible configuration instances using the command `opensearch-benchmark list provision-config-instances`. 
-- `--provision-config-instance-params`: A comma-separated list of key-value pairs injected verbatim as variables for the `provision-config-instance`. - - -### Workload - -The following options determine which workload is used to run the test: - -- `--workload-repository`: Defines the repository from which OpenSearch Benchmark loads workloads. -- `--workload-path`: Defines the path to a downloaded or custom workload. -- `--workload-revision`: Defines a specific revision from the workload source tree that OpenSearch Benchmark should use. -- `--workload`: Defines the workload to use based on the workload's name. You can find a list of preloaded workloads using `opensearch-benchmark list workloads`. - -### Test procedures - -The following options define what test procedures the test uses and which operations are contained inside the procedure: - -- `--test-execution-id`: Defines a unique ID for this test run. -- `--test-procedure`: Defines a test procedure to use. You can find a list of test procedures using `opensearch-benchmark list test-procedures`. -- `--include-tasks`: Defines a comma-separated list of test procedure tasks to run. By default, all tasks listed in a test procedure array are run. -- `--exclude-tasks`: Defines a comma-separated list of test procedure tasks not to run. -- `--enable-assertions`: Enables assertion checks for tasks. Default is `false`. - -### Pipelines - -The `--pipeline` option selects a pipeline to run. You can find a list of pipelines supported by OpenSearch Benchmark by running `opensearch-benchmark list pipelines`. - - -### Telemetry - -The following options enable telemetry devices on OpenSearch Benchmark: - -- `--telemetry`: Enables the provided telemetry devices when the devices are provided using a comma-separated list. You can find a list of possible telemetry devices by using `opensearch-benchmark list telemetry`. -- `--telemetry-params`: Defines a comma-separated list of key-value pairs that are injected verbatim into the telemetry devices as parameters. - - -### Errors - -The following options set how OpenSearch Benchmark handles errors when running tests: - -- `--on-error`: Controls how OpenSearch Benchmark responds to errors. Default is `continue`. - - `continue`: Continues to run the test despite the error. - - `abort`: Aborts the test when an error occurs. -- `--preserve-install`: Keeps the Benchmark candidate and its index. Default is `false`. -- `--kill-running-processes`: When set to `true`, stops any OpenSearch Benchmark processes currently running and allows OpenSearch Benchmark to continue to run. Default is `false`. diff --git a/_benchmark/commands/generate.md b/_benchmark/commands/generate.md deleted file mode 100644 index 040982ad0c..0000000000 --- a/_benchmark/commands/generate.md +++ /dev/null @@ -1,24 +0,0 @@ ---- -layout: default -title: generate -nav_order: 70 -parent: Command reference ---- - -The `generate` command generates visualizations based on benchmark results. - -## Usage - -The following example generates a time-series chart, which outputs into the `.benchmark` directory: - -``` -opensearch-benchmark generate --chart-type="time-series" -``` - -## Options - -The following options customize the visualization produced by the `generate` command: - -- `--chart-spec-path`: Sets the path to the JSON files containing chart specifications that can be used to generate charts. -- `--chart-type`: Generates the indicated chart type, either `time-series` or `bar`. Default is `time-series`. 
-- `--output-path`: The path and name where the chart outputs. Default is `stdout`. diff --git a/_benchmark/commands/index.md b/_benchmark/commands/index.md deleted file mode 100644 index e5272b4383..0000000000 --- a/_benchmark/commands/index.md +++ /dev/null @@ -1,26 +0,0 @@ ---- -layout: default -title: Command reference -nav_order: 50 -has_children: true ---- - -# OpenSearch Benchmark command reference - -This section provides a list of commands supported by OpenSearch Benchmark, including commonly used commands such as `execute-test` and `list`. - -- [compare]({{site.url}}{{site.baseurl}}/benchmark/commands/compare/) -- [download]({{site.url}}{{site.baseurl}}/benchmark/commands/download/) -- [execute-test]({{site.url}}{{site.baseurl}}/benchmark/commands/execute-test/) -- [generate]({{site.url}}{{site.baseurl}}/benchmark/commands/generate/) -- [info]({{site.url}}{{site.baseurl}}/benchmark/commands/info/) -- [list]({{site.url}}{{site.baseurl}}/benchmark/commands/list/) - -## List of common options - -All OpenSearch Benchmark commands support the following options: - -- `--h` or `--help`: Provides options and other useful information about each command. -- `--quiet`: Hides as much of the results output as possible. Default is `false`. -- `--offline`: Indicates whether OpenSearch Benchmark has a connection to the internet. Default is `false`. - diff --git a/_benchmark/commands/info.md b/_benchmark/commands/info.md deleted file mode 100644 index d0be33209c..0000000000 --- a/_benchmark/commands/info.md +++ /dev/null @@ -1,158 +0,0 @@ ---- -layout: default -title: info -nav_order: 75 -parent: Command reference ---- - -# info - -The `info` command prints details about an OpenSearch Benchmark component. - -## Usage - -The following example returns information about a workload named `nyc_taxis`: - -``` -opensearch-benchmark info --workload=nyc_taxis -``` - -OpenSearch Benchmark returns information about the workload, as shown in the following example response: - -``` - ____ _____ __ ____ __ __ - / __ \____ ___ ____ / ___/___ ____ ___________/ /_ / __ )___ ____ _____/ /_ ____ ___ ____ ______/ /__ - / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \ / __ / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/ -/ /_/ / /_/ / __/ / / /__/ / __/ /_/ / / / /__/ / / / / /_/ / __/ / / / /__/ / / / / / / / / /_/ / / / ,< -\____/ .___/\___/_/ /_/____/\___/\__,_/_/ \___/_/ /_/ /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/ /_/|_| - /_/ - -Showing details for workload [nyc_taxis]: - -* Description: Taxi rides in New York in 2015 -* Documents: 165,346,692 -* Compressed Size: 4.5 GB -* Uncompressed Size: 74.3 GB - -=================================== -TestProcedure [searchable-snapshot] -=================================== - -Measuring performance for Searchable Snapshot feature. Based on the default test procedure 'append-no-conflicts'. - -Schedule: ----------- - -1. delete-index -2. create-index -3. check-cluster-health -4. index (8 clients) -5. refresh-after-index -6. force-merge -7. refresh-after-force-merge -8. wait-until-merges-finish -9. create-snapshot-repository -10. delete-snapshot -11. create-snapshot -12. wait-for-snapshot-creation -13. delete-local-index -14. restore-snapshot -15. default -16. range -17. distance_amount_agg -18. autohisto_agg -19. 
date_histogram_agg - -==================================================== -TestProcedure [append-no-conflicts] (run by default) -==================================================== - -Indexes the entire document corpus using a setup that will lead to a larger indexing throughput than the default settings and produce a smaller index (higher compression rate). Document IDs are unique, so all index operations are append only. After that, a couple of queries are run. - -Schedule: ----------- - -1. delete-index -2. create-index -3. check-cluster-health -4. index (8 clients) -5. refresh-after-index -6. force-merge -7. refresh-after-force-merge -8. wait-until-merges-finish -9. default -10. range -11. distance_amount_agg -12. autohisto_agg -13. date_histogram_agg - -============================================== -TestProcedure [append-no-conflicts-index-only] -============================================== - -Indexes the whole document corpus using a setup that will lead to a larger indexing throughput than the default settings and produce a smaller index (higher compression rate). Document ids are unique so all index operations are append only. - -Schedule: ----------- - -1. delete-index -2. create-index -3. check-cluster-health -4. index (8 clients) -5. refresh-after-index -6. force-merge -7. refresh-after-force-merge -8. wait-until-merges-finish - -===================================================== -TestProcedure [append-sorted-no-conflicts-index-only] -===================================================== - -Indexes the whole document corpus in an index sorted by pickup_datetime field in descending order (most recent first) and using a setup that will lead to a larger indexing throughput than the default settings and produce a smaller index (higher compression rate). Document ids are unique so all index operations are append only. - -Schedule: ----------- - -1. delete-index -2. create-index -3. check-cluster-health -4. index (8 clients) -5. refresh-after-index -6. force-merge -7. refresh-after-force-merge -8. wait-until-merges-finish - -====================== -TestProcedure [update] -====================== - -Schedule: ----------- - -1. delete-index -2. create-index -3. check-cluster-health -4. update (8 clients) -5. refresh-after-index -6. force-merge -7. refresh-after-force-merge -8. wait-until-merges-finish - - -------------------------------- -[INFO] SUCCESS (took 2 seconds) -------------------------------- -``` - -## Options - -You can use the following options with the `info` command: - - -- `--workload-repository`: Defines the repository from where OpenSearch Benchmark loads workloads. -- `--workload-path`: Defines the path to a downloaded or custom workload. -- `--workload-revision`: Defines a specific revision from the workload source tree that OpenSearch Benchmark should use. -- `--workload`: Defines the workload to use based on the workload's name. You can find a list of preloaded workloads using `opensearch-benchmark list workloads`. -- `--test-procedure`: Defines a test procedure to use. You can find a list of test procedures using `opensearch-benchmark list test_procedures`. -- `--include-tasks`: Defines a comma-separated list of test procedure tasks to run. By default, all tasks listed in a test procedure array are run. -- `--exclude-tasks`: Defines a comma-separated list of test procedure tasks not to run. 
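-
-For example, the following invocation prints details for only one of the `nyc_taxis` workload's test procedures. This is a sketch that combines the options listed above; the procedure name comes from the example output shown earlier:
-
-```
-opensearch-benchmark info --workload=nyc_taxis --test-procedure=append-no-conflicts-index-only
-```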
diff --git a/_benchmark/commands/list.md b/_benchmark/commands/list.md deleted file mode 100644 index fffcc17c04..0000000000 --- a/_benchmark/commands/list.md +++ /dev/null @@ -1,70 +0,0 @@ ---- -layout: default -title: list -nav_order: 80 -parent: Command reference ---- - -# list - -The `list` command lists the following elements used by OpenSearch Benchmark: - -- `telemetry`: Telemetry devices -- `workloads`: Workloads -- `pipelines`: Pipelines -- `test_executions`: Single run of a workload -- `provision_config_instances`: Provisioned configuration instances -- `opensearch-plugins`: OpenSearch plugins - - -## Usage - -The following example lists any workload test runs and detailed information about each test: - -``` -`opensearch-benchmark list test_executions -``` - -OpenSearch Benchmark returns information about each test. - -``` -benchmark list test_executions - - ____ _____ __ ____ __ __ - / __ \____ ___ ____ / ___/___ ____ ___________/ /_ / __ )___ ____ _____/ /_ ____ ___ ____ ______/ /__ - / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \ / __ / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/ -/ /_/ / /_/ / __/ / / /__/ / __/ /_/ / / / /__/ / / / / /_/ / __/ / / / /__/ / / / / / / / / /_/ / / / ,< -\____/ .___/\___/_/ /_/____/\___/\__,_/_/ \___/_/ /_/ /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/ /_/|_| - /_/ - - -Recent test_executions: - -TestExecution ID TestExecution Timestamp Workload Workload Parameters TestProcedure ProvisionConfigInstance User Tags workload Revision Provision Config Revision ------------------------------------- ------------------------- ---------- --------------------- ------------------- ------------------------- ----------- ------------------- --------------------------- -729291a0-ee87-44e5-9b75-cc6d50c89702 20230524T181718Z geonames append-no-conflicts 4gheap 30260cf -f91c33d0-ec93-48e1-975e-37476a5c9fe5 20230524T170134Z geonames append-no-conflicts 4gheap 30260cf -d942b7f9-6506-451d-9dcf-ef502ab3e574 20230524T144827Z geonames append-no-conflicts 4gheap 30260cf -a33845cc-c2e5-4488-a2db-b0670741ff9b 20230523T213145Z geonames append-no-conflicts 4gheap 30260cf -ba643ed3-0db5-452e-a680-2b0dc0350cf2 20230522T224450Z geonames append-no-conflicts external 30260cf -8d366ec5-3322-4e09-b041-a4b02e870033 20230519T201514Z geonames append-no-conflicts external 30260cf -4574c13e-8742-41af-a4fa-79480629ecf0 20230519T195617Z geonames append-no-conflicts external 30260cf -3e240d18-fc87-4c49-9712-863196efcef4 20230519T195412Z geonames append-no-conflicts external 30260cf -90f066ae-3d83-41e9-bbeb-17cb0480d578 20230519T194448Z geonames append-no-conflicts external 30260cf -78602e07-0ff8-4f00-9a0e-746fb64e4129 20230519T193258Z geonames append-no-conflicts external 30260cf - -------------------------------- -[INFO] SUCCESS (took 0 seconds) -------------------------------- -``` - -## Options - -You can use the following options with the `test` command: - -- `--limit`: Limits the number of search results for recent test runs. Default is `10`. -- `--workload-repository`: Defines the repository from where OpenSearch Benchmark loads workloads. -- `--workload-path`: Defines the path to a downloaded or custom workload. -- `--workload-revision`: Defines a specific revision from the workload source tree that OpenSearch Benchmark should use. 
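-
-For example, the following invocation (a sketch combining the options above) shows only the five most recent test runs:
-
-```
-opensearch-benchmark list test_executions --limit=5
-```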
- - diff --git a/_benchmark/configuring-benchmark.md b/_benchmark/configuring-benchmark.md deleted file mode 100644 index aa097f55ae..0000000000 --- a/_benchmark/configuring-benchmark.md +++ /dev/null @@ -1,198 +0,0 @@ ---- -layout: default -title: Configuring OpenSearch Benchmark -nav_order: 7 -has_children: false ---- - -# Configuring OpenSearch Benchmark - -OpenSearch Benchmark configuration data is stored in `~/.benchmark/benchmark.ini`, which is automatically created the first time OpenSearch Benchmark runs. - -The file is separated into the following sections, which you can customize based on the needs of your cluster. - -## meta - -This section contains meta information about the configuration file. - -| Parameter | Type | Description | -| :---- | :---- | :---- | -| `config.version` | Integer | The version of the configuration file format. This property is managed by OpenSearch Benchmark and should not be changed. | - -## system - -This section contains global information for the current benchmark environment. This information should be identical on all machines on which OpenSearch Benchmark is installed. - -| Parameter | Type | Description | -| :---- | :---- | :---- | -| `env.name` | String | The name of the benchmark environment used as metadata in metrics documents when an OpenSearch metrics store is configured. Only alphanumeric characters are allowed. Default is `local`. | -| `available.cores` | Integer | Determines the number of available CPU cores. OpenSearch Benchmark aims to create one asyncio event loop per core and distributes it to clients evenly across event loops. Defaults to the number of logical CPU cores for your cluster. | -| `async.debug` | Boolean | Enables debug mode on OpenSearch Benchmark's asyncio event loop. Default is `false`. | -| `passenv` | String | A comma-separated list of environment variable names that should be passed to OpenSearch for processing. | - -## node - -This section contains node-specific information that can be customized according to the needs of your cluster. - -| Parameter | Type | Description | -| :---- | :---- | :---- | -| `root.dir` | String | The directory that stores all OpenSearch Benchmark data. OpenSearch Benchmark assumes control over this directory and all its subdirectories. | -| `src.root.dir` | String | The directory from which the OpenSearch source code and any OpenSearch plugins are called. Only relevant for benchmarks from [sources](#source). | - -## source - -This section contains more details about the OpenSearch source tree. - -| Parameter | Type | Description | -| :---- | :---- | :---- | -| `remote.repo.url` | URL | The URL from which to check out OpenSearch. Default is `https://github.com/opensearch-project/OpenSearch.git`. -| `opensearch.src.subdir` | String | The local path relative to the `src.root.dir` of the OpenSearch search tree. Default is `OpenSearch`. -| `cache` | Boolean | Enables OpenSearch's internal source artifact cache, `opensearch*.tar.gz`, and any plugin zip files. Artifacts are cached based on their Git revision. Default is `true`. | -| `cache.days` | Integer | The number of days that an artifact should be kept in the source artifact cache. Default is `7`. | - -## benchmarks - -This section contains the settings that can be customized in the OpenSearch Benchmark data directory. - -| Parameter | Type | Description | -| :---- | :---- | :---- | -| `local.dataset.cache` | String | The directory in which benchmark datasets are stored. 
Depending on the benchmarks that are run, this directory may contain hundreds of GB of data. Default path is `$HOME/.benchmark/benchmarks/data`. | - -## results_publishing - -This section defines how benchmark metrics are stored. - -| Parameter | Type | Description | -| :---- | :---- | :---- | -| `datastore.type` | String | If set to `in-memory` all metrics are kept in memory while running the benchmark. If set to `opensearch` all metrics are instead written to a persistent metrics store and the data is made available for further analysis. Default is `in-memory`. | -| `sample.queue.size` | Function | The number of metrics samples that can be stored in OpenSearch Benchmark’s in-memory queue. Default is `2^20`. | -| metrics.request.downsample.factor | Integer| (default: 1): Determines how many service time and latency samples are saved in the metrics store. By default, all values are saved. If you want to, for example. keep only every 100th sample, specify `100`. This is useful to avoid overwhelming the metrics store in benchmarks with many clients. Default is `1`. | -| `output.processingtime` | Boolean | If set to `true`, OpenSearch shows the additional metric processing time in the command line report. Default is `false`. | - -### `datastore.type` parameters - -When `datastore.type` is set to `opensearch`, the following reporting settings can be customized. - -| Parameter | Type | Description | -| :---- | :---- | :---- | -| `datastore.host` | IP address | The hostname of the metrics store, for example, `124.340.200.22`. | -| datastore.port| Port | The port number of the metrics store, for example, `9200`. | -| `datastore.secure` | Boolean | If set to `false`, OpenSearch assumes an HTTP connection. If set to true, it assumes an HTTPS connection. | -| `datastore.ssl.verification_mode` | String | When set to the default `full`, the metrics store’s SSL certificate is checked. To disable certificate verification, set this value to `none`. | -| `datastore.ssl.certificate_authorities` | String | Determines the local file system path to the certificate authority’s signing certificate. -| `datastore.user` | Username | Sets the username for the metrics store | -| `datastore.password` | String | Sets the password for the metrics store. Alternatively, this password can be configured using the `OSB_DATASTORE_PASSWORD` environment variable, which avoids storing credentials in a plain text file. The environment variable takes precedence over the config file if both define a password. | -| `datastore.probe.cluster_version` | String | Enables automatic detection of the metrics store’s version. Default is `true`. | -| `datastore.number_of_shards` | Integer | The number of primary shards that the `opensearch-*` indexes should have. Any updates to this setting after initial index creation will only be applied to new `opensearch-*` indexes. Default is the [OpenSearch static index value]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/#static-index-settings). | -| `datastore.number_of_replicas` | Integer | The number of replicas each primary shard in the datastore contains. Any updates to this setting after initial index creation will only be applied to new `opensearch-* `indexes. Default is the [OpenSearch static index value]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/#static-index-settings). | - -### Examples - -You can use the following examples to set reporting values in your cluster. 
- -This example defines an unprotected metrics store in the local network: - -``` -[results_publishing] -datastore.type = opensearch -datastore.host = 192.168.10.17 -datastore.port = 9200 -datastore.secure = false -datastore.user = -datastore.password = -``` - -This example defines a secure connection to a metrics store in the local network with a self-signed certificate: - -``` -[results_publishing] -datastore.type = opensearch -datastore.host = 192.168.10.22 -datastore.port = 9200 -datastore.secure = true -datastore.ssl.verification_mode = none -datastore.user = user-name -datastore.password = the-password-to-your-cluster -``` - -## workloads - -This section defines how workloads are retrieved. All keys are read by OpenSearch using the syntax `<>.url`, which you can select using the OpenSearch Benchmark CLI `--workload-repository=workload-repository-name"` option. By default, OpenSearch chooses the workload repository using the `default.url` `https://github.com/opensearch-project/opensearch-benchmark-workloads`. - - -## defaults - -This section defines the default values of certain OpenSearch Benchmark CLI parameters. - -| Parameter | Type | Description | -| :---- | :---- | :---- | -| `preserve_benchmark_candidate` | Boolean | Determines whether OpenSearch installations are preserved or wiped by default after a benchmark. To preserve an installation for a single benchmark, use the command line flag `--preserve-install`. Default is `false`. - -## distributions - -This section defines how OpenSearch versions are distributed. - -| Parameter | Type | Description | -| :---- | :---- | :---- | -| `release.cache` | Boolean | Determines whether newly released OpenSearch versions should be cached locally. | - -## Proxy configurations - -OpenSearch automatically downloads all the necessary proxy data for you, including: - -- OpenSearch distributions, when you specify `--distribution-version=`. -- OpenSearch source code, when you specify a Git revision number, for example, `--revision=1e04b2w`. -- Any metadata tracked from the [OpenSearch GitHub repository](https://github.com/opensearch-project/OpenSearch). - -As of OpenSearch Benchmark 0.5.0, only `http_proxy` is supported. -{: .warning} - -You can use an `http_proxy` to connect OpenSearch Benchmark to a specific proxy and connect the proxy to a benchmark workload. To add the proxy: - - -1. Add your proxy URL to your shell profile: - - ``` - export http_proxy=http://proxy.proxy.org:4444/ - ``` - -2. Source your shell profile and verify that the proxy URL is set correctly: - - ``` - source ~/.bash_profile ; echo $http_proxy - ``` - -3. Configure Git to connect to your proxy by using the following command. For more information, see the [Git documentation](https://git-scm.com/docs/git-config). - - ``` - git config --global http_proxy $http_proxy - ``` - -4. Use `git clone` to clone the workloads repository by using the following command. If the proxy configured correctly, the clone is successful. - - ``` - git clone http://github.com/opensearch-project/opensearch-benchmark-workloads.git - ``` - -5. Lastly, verify that OpenSearch Benchmark can connect to the proxy server by checking the `/.benchmark/logs/benchmark.log` log. When OpenSearch Benchmark starts, you should see the following at the top of the log: - - ``` - Connecting via proxy URL [http://proxy.proxy.org:4444/] to the Internet (picked up from the environment variable [http_proxy]). 
- ``` - -## Logging - -Logs from OpenSearch Benchmark can be configured in the `~/.benchmark/logging.json` file. For more information about how to format the log file, see the following Python documentation: - -- For general tips and tricks, use the [Python Logging Cookbook](https://docs.python.org/3/howto/logging-cookbook.html). -- For the file format, see the Python [logging configuration schema](https://docs.python.org/3/library/logging.config.html#logging-config-dictschema). -- For instructions on how to customize where the log output is written, see the [logging handlers documentation](https://docs.python.org/3/library/logging.handlers.html). - -By default, OpenSearch Benchmark logs all output to `~/.benchmark/logs/benchmark.log`. - - - - - - - diff --git a/_benchmark/creating-custom-workloads.md b/_benchmark/creating-custom-workloads.md deleted file mode 100644 index b5474fafe6..0000000000 --- a/_benchmark/creating-custom-workloads.md +++ /dev/null @@ -1,374 +0,0 @@ ---- -layout: default -title: Creating custom workloads -nav_order: 10 -has_children: false ---- - -# Creating custom workloads - -OpenSearch Benchmark includes a set of [workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) that you can use to benchmark data from your cluster. Additionally, if you want to create a workload that is tailored to your own data, you can create a custom workload using one of the following options: - -- [Creating a workload from an existing cluster](#creating-a-workload-from-an-existing-cluster) -- [Creating a workload without an existing cluster](#creating-a-workload-without-an-existing-cluster) - -## Creating a workload from an existing cluster - -If you already have an OpenSearch cluster with indexed data, use the following steps to create a custom workload for your cluster. - -### Prerequisites - -Before creating a custom workload, make sure you have the following prerequisites: - -- An OpenSearch cluster with an index that contains 1000 or more documents. If your cluster's index does not contain at least 1000 documents, the workload can still run tests, however, you cannot run workloads using `--test-mode`. -- You must have the correct permissions to access your OpenSearch cluster. For more information about cluster permissions, see [Permissions]({{site.url}}{{site.baseurl}}/security/access-control/permissions/). - -### Customizing the workload - -To begin creating a custom workload, use the `opensearch-benchmark create-workload` command. - -``` -opensearch-benchmark create-workload \ ---workload="" \ ---target-hosts="" \ ---client-options="basic_auth_user:'',basic_auth_password:''" \ ---indices="" \ ---output-path="" -``` - -Replace the following options in the preceding example with information specific to your existing cluster: - -- `--workload`: A custom name for your custom workload. -- `--target-hosts:` A comma-separated list of host:port pairs from which the cluster extracts data. -- `--client-options`: The basic authentication client options that OpenSearch Benchmark uses to access the cluster. -- `--indices`: One or more indexes inside your OpenSearch cluster that contain data. -- `--output-path`: The directory in which OpenSearch Benchmark creates the workload and its configuration files. - -The following example response creates a workload named `movies` from a cluster with an index named `movies-info`. The `movies-info` index contains over 2,000 documents. 
- -``` - ____ _____ __ ____ __ __ - / __ \____ ___ ____ / ___/___ ____ ___________/ /_ / __ )___ ____ _____/ /_ ____ ___ ____ ______/ /__ - / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \ / __ / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/ -/ /_/ / /_/ / __/ / / /__/ / __/ /_/ / / / /__/ / / / / /_/ / __/ / / / /__/ / / / / / / / / /_/ / / / ,< -\____/ .___/\___/_/ /_/____/\___/\__,_/_/ \___/_/ /_/ /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/ /_/|_| - /_/ - -[INFO] You did not provide an explicit timeout in the client options. Assuming default of 10 seconds. -[INFO] Connected to OpenSearch cluster [380d8fd64dd85b5f77c0ad81b0799e1e] version [1.1.0]. - -Extracting documents for index [movies] for test mode... 1000/1000 docs [100.0% done] -Extracting documents for index [movies]... 2000/2000 docs [100.0% done] - -[INFO] Workload movies has been created. Run it with: opensearch-benchmark --workload-path=/Users/hoangia/Desktop/workloads/movies - -------------------------------- -[INFO] SUCCESS (took 2 seconds) -------------------------------- -``` - -As part of workload creation, OpenSearch Benchmark generates the following files. You can access them in the directory specified by the `--output-path` option. - -- `workload.json`: Contains general workload specifications. -- `.json`: Contains mappings and settings for the extracted indexes. -- `-documents.json`: Contains the sources of every document from the extracted indexes. Any sources suffixed with `-1k` encompass only a fraction of the document corpus of the workload and are only used when running the workload in test mode. - -By default, OpenSearch Benchmark does not contain a reference to generate queries. Because you have the best understanding of your data, we recommend adding a query to `workload.json` that matches your index's specifications. Use the following `match_all` query as an example of a query added to your workload: - -```json -{ - "operation": { - "name": "query-match-all", - "operation-type": "search", - "body": { - "query": { - "match_all": {} - } - } - }, - "clients": 8, - "warmup-iterations": 1000, - "iterations": 1000, - "target-throughput": 100 - } -``` - -### Creating a workload without an existing cluster - -If you want to create a custom workload but do not have an existing OpenSearch cluster with indexed data, you can create the workload by building the workload source files directly. All you need is data that can be exported into a JSON format. - -To build a workload with source files, create a directory for your workload and perform the following steps: - -1. Build a `-documents.json` file that contains rows of documents that comprise the document corpora of the workload and houses all data to be ingested and queried into the cluster. 
The following example shows the first few rows of a `movies-documents.json` file that contains rows of documents about famous movies: - - ```json - # First few rows of movies-documents.json - {"title": "Back to the Future", "director": "Robert Zemeckis", "revenue": "$212,259,762 USD", "rating": "8.5 out of 10", "image_url": "https://imdb.com/images/32"} - {"title": "Avengers: Endgame", "director": "Anthony and Joe Russo", "revenue": "$2,800,000,000 USD", "rating": "8.4 out of 10", "image_url": "https://imdb.com/images/2"} - {"title": "The Grand Budapest Hotel", "director": "Wes Anderson", "revenue": "$173,000,000 USD", "rating": "8.1 out of 10", "image_url": "https://imdb.com/images/65"} - {"title": "The Godfather: Part II", "director": "Francis Ford Coppola", "revenue": "$48,000,000 USD", "rating": "9 out of 10", "image_url": "https://imdb.com/images/7"} - ``` - -2. In the same directory, build a `index.json` file. The workload uses this file as a reference for data mappings and index settings for the documents contained in `-documents.json`. The following example creates mappings and settings specific to the `movie-documents.json` data from the previous step: - - ```json - { - "settings": { - "index.number_of_replicas": 0 - }, - "mappings": { - "dynamic": "strict", - "properties": { - "title": { - "type": "text" - }, - "director": { - "type": "text" - }, - "revenue": { - "type": "text" - }, - "rating": { - "type": "text" - }, - "image_url": { - "type": "text" - } - } - } - } - ``` - -3. Next, build a `workload.json` file that provides a high-level overview of your workload and determines how your workload runs benchmark tests. The `workload.json` file contains the following sections: - - - `indices`: Defines the name of the index to be created in your OpenSearch cluster using the mappings from the workload's `index.json` file created in the previous step. - - `corpora`: Defines the corpora and the source file, including the: - - `document-count`: The number of documents in `-documents.json`. To get an accurate number of documents, run `wc -l -documents.json`. - - `uncompressed-bytes`: The number of bytes inside the index. To get an accurate number of bytes, run `stat -f %z -documents.json` on macOS or `stat -c %s -documents.json` on GNU/Linux. Alternatively, run `ls -lrt | grep -documents.json`. - - `schedule`: Defines the sequence of operations and available test procedures for the workload. - - The following example `workload.json` file provides the entry point for the `movies` workload. The `indices` section creates an index called `movies`. The corpora section refers to the source file created in step one, `movie-documents.json`, and provides the document count and the amount of uncompressed bytes. Lastly, the schedule section defines a few operations the workload performs when invoked, including: - - - Deleting any current index named `movies`. - - Creating an index named `movies` based on data from `movie-documents.json` and the mappings from `index.json`. - - Verifying that the cluster is in good health and can ingest the new index. - - Ingesting the data corpora from `workload.json` into the cluster. - - Querying the results. 
- - ```json - { - "version": 2, - "description": "Tutorial benchmark for OpenSearch Benchmark", - "indices": [ - { - "name": "movies", - "body": "index.json" - } - ], - "corpora": [ - { - "name": "movies", - "documents": [ - { - "source-file": "movies-documents.json", - "document-count": 11658903, # Fetch document count from command line - "uncompressed-bytes": 1544799789 # Fetch uncompressed bytes from command line - } - ] - } - ], - "schedule": [ - { - "operation": { - "operation-type": "delete-index" - } - }, - { - "operation": { - "operation-type": "create-index" - } - }, - { - "operation": { - "operation-type": "cluster-health", - "request-params": { - "wait_for_status": "green" - }, - "retry-until-success": true - } - }, - { - "operation": { - "operation-type": "bulk", - "bulk-size": 5000 - }, - "warmup-time-period": 120, - "clients": 8 - }, - { - "operation": { - "operation-type": "force-merge" - } - }, - { - "operation": { - "name": "query-match-all", - "operation-type": "search", - "body": { - "query": { - "match_all": {} - } - } - }, - "clients": 8, - "warmup-iterations": 1000, - "iterations": 1000, - "target-throughput": 100 - } - ] - } - ``` - -4. For all the workload files created, verify that the workload is functional by running a test. To verify the workload, run the following command, replacing `--workload-path` with a path to your workload directory: - - ``` - opensearch-benchmark list workloads --workload-path= - ``` - -## Invoking your custom workload - -Use the `opensearch-benchmark execute-test` command to invoke your new workload and run a benchmark test against your OpenSearch cluster, as shown in the following example. Replace `--workload-path` with the path to your custom workload, `--target-host` with the `host:port` pairs for your cluster, and `--client-options` with any authorization options required to access the cluster. - -``` -opensearch-benchmark execute_test \ ---pipeline="benchmark-only" \ ---workload-path="" \ ---target-host="" \ ---client-options="basic_auth_user:'',basic_auth_password:''" -``` - -Results from the test appear in the directory set by `--output-path` option in `workloads.json`. - -## Advanced options - -You can enhance your custom workload's functionality with the following advanced options. - -### Test mode - -If you want run the test in test mode to make sure your workload operates as intended, add the `--test-mode` option to the `execute-test` command. Test mode ingests only the first 1000 documents from each index provided and runs query operations against them. - -To use test mode, create a `-documents-1k.json` file that contains the first 1000 documents from `-documents.json` using the following command: - -``` -head -n 1000 -documents.json > -documents-1k.json -``` - -Then, run `opensearch-benchmark execute-test` with the option `--test-mode`. Test mode runs a quick version of the workload test. - -``` -opensearch-benchmark execute_test \ ---pipeline="benchmark-only" \ ---workload-path="" \ ---target-host="" \ ---client-options"basic_auth_user:'',basic_auth_password:''" \ ---test-mode -``` - -### Adding variance to test procedures - -After using your custom workload several times, you might want to use the same workload but perform the workload's operations in a different order. Instead of creating a new workload or reorganizing the procedures directly, you can provide test procedures to vary workload operations. 
- -To add variance to your workload operations, go to your `workload.json` file and replace the `schedule` section with a `test_procedures` array, as shown in the following example. Each item in the array contains the following: - -- `name`: The name of the test procedure. -- `default`: When set to `true`, OpenSearch Benchmark defaults to the test procedure specified as `default` in the workload if no other test procedures are specified. -- `schedule`: All the operations the test procedure will run. - - -```json -"test_procedures": [ - { - "name": "index-and-query", - "default": true, - "schedule": [ - { - "operation": { - "operation-type": "delete-index" - } - }, - { - "operation": { - "operation-type": "create-index" - } - }, - { - "operation": { - "operation-type": "cluster-health", - "request-params": { - "wait_for_status": "green" - }, - "retry-until-success": true - } - }, - { - "operation": { - "operation-type": "bulk", - "bulk-size": 5000 - }, - "warmup-time-period": 120, - "clients": 8 - }, - { - "operation": { - "operation-type": "force-merge" - } - }, - { - "operation": { - "name": "query-match-all", - "operation-type": "search", - "body": { - "query": { - "match_all": {} - } - } - }, - "clients": 8, - "warmup-iterations": 1000, - "iterations": 1000, - "target-throughput": 100 - } - ] - } - ] -} -``` - -### Separate operations and test procedures - -If you want to make your `workload.json` file more readable, you can separate your operations and test procedures into different directories and reference the path to each in `workload.json`. To separate operations and procedures, perform the following steps: - -1. Add all test procedures to a single file. You can give the file any name. Because the `movies` workload in the preceding contains and index task and queries, this step names the test procedures file `index-and-query.json`. -2. Add all operations to a file named `operations.json`. -3. Reference the new files in `workloads.json` by adding the following syntax, replacing `parts` with the relative path to each file, as shown in the following example: - - ```json - "operations": [ - {% raw %}{{ benchmark.collect(parts="operations/*.json") }}{% endraw %} - ] - # Reference test procedure files in workload.json - "test_procedures": [ - {% raw %}{{ benchmark.collect(parts="test_procedures/*.json") }}{% endraw %} - ] - ``` - -## Next steps - -- For more information about configuring OpenSearch Benchmark, see [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/). -- To show a list of prepackaged workloads for OpenSearch Benchmark, see the [opensearch-benchmark-workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) repository. - - - - - - diff --git a/_benchmark/index.md b/_benchmark/index.md deleted file mode 100644 index dcfa629c5d..0000000000 --- a/_benchmark/index.md +++ /dev/null @@ -1,33 +0,0 @@ ---- -layout: default -title: OpenSearch Benchmark -nav_order: 1 -has_children: false -nav_exclude: true -has_toc: false ---- - -# OpenSearch Benchmark - -OpenSearch Benchmark is a macrobenchmark utility provided by the [OpenSearch Project](https://github.com/opensearch-project). You can use OpenSearch Benchmark to gather performance metrics from an OpenSearch cluster for a variety of purposes, including: - -- Tracking the overall performance of an OpenSearch cluster. -- Informing decisions about when to upgrade your cluster to a new version. 
-- Determining how changes to your workflow---such as modifying mappings or queries---might impact your cluster. - -OpenSearch Benchmark can be installed directly on a compatible host running Linux and macOS. You can also run OpenSearch Benchmark in a Docker container. See [Installing OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/installing-benchmark/) for more information. - -## Concepts - -Before using OpenSearch Benchmark, familiarize yourself with the following concepts: - -- **Workload**: The description of one or more benchmarking scenarios that use a specific document corpus from which to perform a benchmark against your cluster. The document corpus contains any indexes, data files, and operations invoked when the workflow runs. You can list the available workloads by using `opensearch-benchmark list workloads` or view any included workloads inside the [OpenSearch Benchmark Workloads repository](https://github.com/opensearch-project/opensearch-benchmark-workloads/). For information about building a custom workload, see [Creating custom workloads]({{site.url}}{{site.baseurl}}/benchmark/creating-custom-workloads/). - -- **Pipeline**: A series of steps before and after a workload is run that determines benchmark results. OpenSearch Benchmark supports three pipelines: - - `from-sources`: Builds and provisions OpenSearch, runs a benchmark, and then publishes the results. - - `from-distribution`: Downloads an OpenSearch distribution, provisions it, runs a benchmark, and then publishes the results. - - `benchmark-only`: The default pipeline. Assumes an already running OpenSearch instance, runs a benchmark on that instance, and then publishes the results. - -- **Test**: A single invocation of the OpenSearch Benchmark binary. - - diff --git a/_benchmark/installing-benchmark.md b/_benchmark/installing-benchmark.md deleted file mode 100644 index 4749160796..0000000000 --- a/_benchmark/installing-benchmark.md +++ /dev/null @@ -1,156 +0,0 @@ ---- -layout: default -title: Installing OpenSearch Benchmark -nav_order: 5 -has_children: false ---- - -# Installing OpenSearch Benchmark - -You can install OpenSearch Benchmark directly on a host running Linux or macOS, or you can run OpenSearch Benchmark in a Docker container on any compatible host. This page provides general considerations for your OpenSearch Benchmark host as well as instructions for installing OpenSearch Benchmark. - - -## Choosing appropriate hardware - -OpenSearch Benchmark can be used to provision OpenSearch nodes for testing. If you intend to use OpenSearch Benchmark to provision nodes in your environment, then install OpenSearch Benchmark directly on each host in the cluster. Additionally, you must configure each host in the cluster for OpenSearch. See [Installing OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/index/) for guidance on important host settings. - -Remember that OpenSearch Benchmark cannot be used to provision OpenSearch nodes when you run OpenSearch Benchmark in a Docker container. If you want to use OpenSearch Benchmark to provision nodes, or if you want to distribute the benchmark workload with the OpenSearch Benchmark daemon, then you must install OpenSearch Benchmark directly on each host using Python and pip. -{: .important} - -When you select a host, you should also think about which workloads you want to run. 
To see a list of default benchmark workloads, visit the [opensearch-benchmark-workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) repository on GitHub. As a general rule, make sure that the OpenSearch Benchmark host has enough free storage space to store the compressed data and the fully decompressed data corpus once OpenSearch Benchmark is installed. - -If you want to benchmark with a default workload, then use the following table to determine the approximate minimum amount of required free space needed by adding the compressed size with the uncompressed size. - -| Workload name | Document count | Compressed size | Uncompressed size | -| :----: | :----: | :----: | :----: | -| eventdata | 20,000,000 | 756.0 MB | 15.3 GB | -| geonames | 11,396,503 | 252.9 MB | 3.3 GB | -| geopoint | 60,844,404 | 482.1 MB | 2.3 GB | -| geopointshape | 60,844,404 | 470.8 MB | 2.6 GB | -| geoshape | 60,523,283 | 13.4 GB | 45.4 GB | -| http_logs | 247,249,096 | 1.2 GB | 31.1 GB | -| nested | 11,203,029 | 663.3 MB | 3.4 GB | -| noaa | 33,659,481 | 949.4 MB | 9.0 GB | -| nyc_taxis | 165,346,692 | 4.5 GB | 74.3 GB | -| percolator | 2,000,000 | 121.1 kB | 104.9 MB | -| pmc | 574,199 | 5.5 GB | 21.7 GB | -| so | 36,062,278 | 8.9 GB | 33.1 GB | - -Your OpenSearch Benchmark host should use solid-state drives (SSDs) for storage because they perform read and write operations significantly faster than traditional spinning-disk hard drives. Spinning-disk hard drives can introduce performance bottlenecks, which can make benchmark results unreliable and inconsistent. -{: .tip} - -## Installing on Linux and macOS - -If you want to run OpenSearch Benchmark in a Docker container, see [Installing with Docker](#installing-with-docker). The OpenSearch Benchmark Docker image includes all of the required software, so there are no additional steps required. -{: .important} - -To install OpenSearch Benchmark directly on a UNIX host, such as Linux or macOS, make sure you have **Python 3.8 or later** installed. - -If you need help installing Python, refer to the official [Python Setup and Usage](https://docs.python.org/3/using/index.html) documentation. - -### Checking software dependencies - -Before you begin installing OpenSearch Benchmark, check the following software dependencies. - -Use [pyenv](https://github.com/pyenv/pyenv) to manage multiple versions of Python on your host. This is especially useful if your "system" version of Python is earlier than version 3.8. -{: .tip} - -- Check that Python 3.8 or later is installed: - - ```bash - python3 --version - ``` - {% include copy.html %} - -- Check that `pip` is installed and functional: - - ```bash - pip --version - ``` - {% include copy.html %} - -- _Optional_: Check that your installed version of `git` is **Git 1.9 or later** using the following command. `git` is not required for OpenSearch Benchmark installation, but it is required in order to fetch benchmark workload resources from a repository when you want to perform tests. See the official Git [Documentation](https://git-scm.com/doc) for help installing Git. 
-
-  ```bash
-  git --version
-  ```
-  {% include copy.html %}
-
-### Completing the installation
-
-After the required software is installed, you can install OpenSearch Benchmark using the following command:
-
-```bash
-pip install opensearch-benchmark
-```
-{% include copy.html %}
-
-After the installation completes, you can use the following command to display help information:
-
-```bash
-opensearch-benchmark -h
-```
-{% include copy.html %}
-
-
-Now that OpenSearch Benchmark is installed on your host, you can learn about [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/).
-
-## Installing with Docker
-
-You can find the official Docker images for OpenSearch Benchmark on [Docker Hub](https://hub.docker.com/r/opensearchproject/opensearch-benchmark) or on the [Amazon ECR Public Gallery](https://gallery.ecr.aws/opensearchproject/opensearch-benchmark).
-
-
-### Docker limitations
-
-Some OpenSearch Benchmark functionality is unavailable when you run OpenSearch Benchmark in a Docker container. Specifically, the following restrictions apply:
-
-- OpenSearch Benchmark cannot distribute load from multiple hosts, such as load worker coordinator hosts.
-- OpenSearch Benchmark cannot provision OpenSearch nodes and can only run tests on previously existing clusters. You can only invoke OpenSearch Benchmark commands using the `benchmark-only` pipeline.
-
-### Pulling the Docker images
-
-To pull the image from Docker Hub, run the following command:
-
-```bash
-docker pull opensearchproject/opensearch-benchmark:latest
-```
-{% include copy.html %}
-
-To pull the image from Amazon Elastic Container Registry (Amazon ECR):
-
-```bash
-docker pull public.ecr.aws/opensearchproject/opensearch-benchmark:latest
-```
-{% include copy.html %}
-
-### Running Benchmark with Docker
-
-To run OpenSearch Benchmark, use `docker run` to launch a container. OpenSearch Benchmark subcommands are passed as arguments when you start the container. OpenSearch Benchmark then processes the command and stops the container after the requested operation completes.
-
-For example, the following command prints the help text for OpenSearch Benchmark to the command line and then stops the container:
-
-```bash
-docker run opensearchproject/opensearch-benchmark -h
-```
-{% include copy.html %}
-
-
-### Establishing volume persistence in a Docker container
-
-To make sure your benchmark data and logs persist after your Docker container stops, specify a Docker volume to mount to the image when you work with OpenSearch Benchmark.
-
-Use the `-v` option to specify a local directory to mount and a directory in the container where the volume is attached.
-
-The following example command creates a volume in a user's home directory, mounts the volume to the OpenSearch Benchmark container at `/opensearch-benchmark/.benchmark`, and then runs a test benchmark using the `geonames` workload. Some client options are also specified:
-
-```bash
-docker run -v $HOME/benchmarks:/opensearch-benchmark/.benchmark opensearchproject/opensearch-benchmark execute_test --target-hosts https://198.51.100.25:9200 --pipeline benchmark-only --workload geonames --client-options basic_auth_user:admin,basic_auth_password:admin,verify_certs:false --test-mode
-```
-{% include copy.html %}
-
-See [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/) to learn more about the files and subdirectories located in `/opensearch-benchmark/.benchmark`.
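-
-As a quick check, you can confirm that the benchmark data and logs persisted on the host after the container exits. The following is a sketch that assumes the volume mount shown in the previous example and the default log location inside `.benchmark`:
-
-```bash
-# List the persisted benchmark data and read the most recent log entries
-ls $HOME/benchmarks
-tail $HOME/benchmarks/logs/benchmark.log
-```
-{% include copy.html %}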
- -## Next steps - -- [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/) -- [Creating custom workloads]({{site.url}}{{site.baseurl}}/benchmark/creating-custom-workloads/) \ No newline at end of file diff --git a/_benchmark/workloads/corpora.md b/_benchmark/workloads/corpora.md deleted file mode 100644 index 930baa5cad..0000000000 --- a/_benchmark/workloads/corpora.md +++ /dev/null @@ -1,56 +0,0 @@ ---- -layout: default -title: corpora -parent: Workload reference -nav_order: 70 ---- - -# corpora - -The `corpora` element contains all the document corpora used by the workload. You can use document corpora across workloads by copying and pasting any corpora definitions. - -## Example - -The following example defines a single corpus called `movies` with `11658903` documents and `1544799789` uncompressed bytes: - -```json - "corpora": [ - { - "name": "movies", - "documents": [ - { - "source-file": "movies-documents.json", - "document-count": 11658903, # Fetch document count from command line - "uncompressed-bytes": 1544799789 # Fetch uncompressed bytes from command line - } - ] - } - ] -``` - -## Configuration options - -Use the following options with `corpora`. - -Parameter | Required | Type | Description -:--- | :--- | :--- | :--- -`name` | Yes | String | The name of the document corpus. Because OpenSearch Benchmark uses this name in its directories, use only lowercase names without white spaces. -`documents` | Yes | JSON array | An array of document files. -`meta` | No | String | A mapping of key-value pairs with additional metadata for a corpus. - - -Each entry in the `documents` array consists of the following options. - -Parameter | Required | Type | Description -:--- | :--- | :--- | :--- -`source-file` | Yes | String | The file name containing the corresponding documents for the workload. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a `base_url`, use a compressed file format: `.zip`, `.bz2`, `.gz`, `.tar`, `.tar.gz`, `.tgz`, or `.tar.bz2`. The compressed file must have one JSON file containing the name. -`document-count` | Yes | Integer | The number of documents in the `source-file`, which determines which client indexes correlate to which parts of the document corpus. Each N client receives an Nth of the document corpus. When using a source that contains a document with a parent-child relationship, specify the number of parent documents. -`base-url` | No | String | An http(s), Amazon Simple Storage Service (Amazon S3), or Google Cloud Storage URL that points to the root path where OpenSearch Benchmark can obtain the corresponding source file. -`source-format` | No | String | Defines the format OpenSearch Benchmark uses to interpret the data file specified in `source-file`. Only `bulk` is supported. -`compressed-bytes` | No | Integer | The size, in bytes, of the compressed source file, indicating how much data OpenSearch Benchmark downloads. -`uncompressed-bytes` | No | Integer | The size, in bytes, of the source file after decompression, indicating how much disk space the decompressed source file needs. -`target-index` | No | String | Defines the name of the index that the `bulk` operation should target. OpenSearch Benchmark automatically derives this value when only one index is defined in the `indices` element. The value of `target-index` is ignored when the `includes-action-and-meta-data` setting is `true`. 
-`target-type` | No | String | Defines the document type of the target index targeted in bulk operations. OpenSearch Benchmark automatically derives this value when only one index is defined in the `indices` element and the index has only one type. The value of `target-type` is ignored when the `includes-action-and-meta-data` setting is `true`. -`includes-action-and-meta-data` | No | Boolean | When set to `true`, indicates that the document's file already contains an `action` line and a `meta-data` line. When `false`, indicates that the document's file contains only documents. Default is `false`. -`meta` | No | String | A mapping of key-value pairs with additional metadata for a corpus. - diff --git a/_benchmark/workloads/index.md b/_benchmark/workloads/index.md deleted file mode 100644 index 771e98309f..0000000000 --- a/_benchmark/workloads/index.md +++ /dev/null @@ -1,250 +0,0 @@ ---- -layout: default -title: Workload reference -nav_order: 60 -has_children: true ---- - -# OpenSearch Benchmark workload reference - -A workload is a specification of one or more benchmarking scenarios. A workload typically includes the following: - -- One or more data streams that are ingested into indices -- A set of queries and operations that are invoked as part of the benchmark - -## Anatomy of a workload - -The following example workload shows all of the essential elements needed to create a workload.json file. You can run this workload in your own benchmark configuration in order to understand how all of the elements work together: - -```json -{ - "description": "Tutorial benchmark for OpenSearch Benchmark", - "indices": [ - { - "name": "movies", - "body": "index.json" - } - ], - "corpora": [ - { - "name": "movies", - "documents": [ - { - "source-file": "movies-documents.json", - "document-count": 11658903, # Fetch document count from command line - "uncompressed-bytes": 1544799789 # Fetch uncompressed bytes from command line - } - ] - } - ], - "schedule": [ - { - "operation": { - "operation-type": "create-index" - } - }, - { - "operation": { - "operation-type": "cluster-health", - "request-params": { - "wait_for_status": "green" - }, - "retry-until-success": true - } - }, - { - "operation": { - "operation-type": "bulk", - "bulk-size": 5000 - }, - "warmup-time-period": 120, - "clients": 8 - }, - { - "operation": { - "name": "query-match-all", - "operation-type": "search", - "body": { - "query": { - "match_all": {} - } - } - }, - "iterations": 1000, - "target-throughput": 100 - } - ] -} -``` - -A workload usually consists of the following elements: - -- [indices]({{site.url}}{{site.baseurl}}/benchmark/workloads/indices/): Defines the relevant indices and index templates used for the workload. -- [corpora]({{site.url}}{{site.baseurl}}/benchmark/workloads/corpora/): Defines all document corpora used for the workload. -- `schedule`: Defines operations and in what order the operations run in-line. Alternatively, you can use `operations` to group operations and the `test_procedures` parameter to specify the order of operations. -- `operations`: **Optional**. Describes which operations are available for the workload and how they are parameterized. - -### Indices - -To create an index, specify its `name`. To add definitions to your index, use the `body` option and point it to the JSON file containing the index definitions. For more information, see [indices]({{site.url}}{{site.baseurl}}/benchmark/workloads/indices/). 
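-
-For reference, a minimal sketch of an `index.json` body file might look like the following (the settings and field names are illustrative only):
-
-```json
-{
-  "settings": {
-    "index.number_of_replicas": 0
-  },
-  "mappings": {
-    "properties": {
-      "title": { "type": "text" }
-    }
-  }
-}
-```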
For more information, see [indices]({{site.url}}{{site.baseurl}}/benchmark/workloads/indices/). - -### Corpora - -The `corpora` element requires the name of the index containing the document corpus, for example, `movies`, and a list of parameters that define the document corpora. This list includes the following parameters: - -- `source-file`: The file name that contains the workload's corresponding documents. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a `base_url`, use a compressed file format: `.zip`, `.bz2`, `.gz`, `.tar`, `.tar.gz`, `.tgz`, or `.tar.bz2`. The compressed file must have one JSON file containing the name. -- `document-count`: The number of documents in the `source-file`, which determines which client indices correlate to which parts of the document corpus. Each N client receives an Nth of the document corpus. When using a source that contains a document with a parent-child relationship, specify the number of parent documents. -- `uncompressed-bytes`: The size, in bytes, of the source file after decompression, indicating how much disk space the decompressed source file needs. -- `compressed-bytes`: The size, in bytes, of the source file before decompression. This can help you assess the amount of time needed for the cluster to ingest documents. - -### Operations - -The `operations` element lists the OpenSearch API operations performed by the workload. For example, you can set an operation to `create-index`, which creates an index in the test cluster that OpenSearch Benchmark can write documents into. Operations are usually listed inside of `schedule`. - -### Schedule - -The `schedule` element contains a list of actions and operations that are run by the workload. Operations run according to the order in which they appear in the `schedule`. The following example illustrates a `schedule` with multiple operations, each defined by its `operation-type`: - -```json - "schedule": [ - { - "operation": { - "operation-type": "create-index" - } - }, - { - "operation": { - "operation-type": "cluster-health", - "request-params": { - "wait_for_status": "green" - }, - "retry-until-success": true - } - }, - { - "operation": { - "operation-type": "bulk", - "bulk-size": 5000 - }, - "warmup-time-period": 120, - "clients": 8 - }, - { - "operation": { - "name": "query-match-all", - "operation-type": "search", - "body": { - "query": { - "match_all": {} - } - } - }, - "iterations": 1000, - "target-throughput": 100 - } - ] -} -``` - -According to this schedule, the actions will run in the following order: - -1. The `create-index` operation creates an index. The index remains empty until the `bulk` operation adds documents with benchmarked data. -2. The `cluster-health` operation assesses the health of the cluster before running the workload. In this example, the workload waits until the status of the cluster's health is `green`. - - The `bulk` operation runs the `bulk` API to index `5000` documents simultaneously. - - Before benchmarking, the workload waits until the specified `warmup-time-period` passes. In this example, the warmup period is `120` seconds. -5. The `clients` field defines the number of clients that will run the remaining actions in the schedule concurrently. -6. The `search` runs a `match_all` query to match all documents after they have been indexed by the `bulk` API using the 8 clients specified. - - The `iterations` field indicates the number of times each client runs the `search` operation. 
The report generated by the benchmark automatically adjusts the percentile numbers based on this number. To generate a precise percentile, the benchmark needs to run at least 1,000 iterations. - - Lastly, the `target-throughput` field defines the number of requests per second each client performs, which, when set, can help reduce the latency of the benchmark. For example, a `target-throughput` of 100 requests divided by 8 clients means that each client will issue 12 requests per second. - - -## More workload examples - -If you want to try certain workloads before creating your own, use the following examples. - -### Running unthrottled - -In the following example, OpenSearch Benchmark runs an unthrottled bulk index operation for 1 hour against the `movies` index: - -```json -{ - "description": "Tutorial benchmark for OpenSearch Benchmark", - "indices": [ - { - "name": "movies", - "body": "index.json" - } - ], - "corpora": [ - { - "name": "movies", - "documents": [ - { - "source-file": "movies-documents.json", - "document-count": 11658903, # Fetch document count from command line - "uncompressed-bytes": 1544799789 # Fetch uncompressed bytes from command line - } - ] - } - ], - "schedule": [ - { - "operation": "bulk", - "warmup-time-period": 120, - "time-period": 3600, - "clients": 8 - } -] -} -``` - -### Workload with a single task - -The following workload runs a benchmark with a single task: a `match_all` query. Because no `clients` are indicated, only one client is used. According to the `schedule`, the workload runs the `match_all` query at 10 operations per second with 1 client, uses 100 iterations to warm up, and uses the next 100 iterations to measure the benchmark: - -```json -{ - "description": "Tutorial benchmark for OpenSearch Benchmark", - "indices": [ - { - "name": "movies", - "body": "index.json" - } - ], - "corpora": [ - { - "name": "movies", - "documents": [ - { - "source-file": "movies-documents.json", - "document-count": 11658903, # Fetch document count from command line - "uncompressed-bytes": 1544799789 # Fetch uncompressed bytes from command line - } - ] - } - ], -{ - "schedule": [ - { - "operation": { - "operation-type": "search", - "index": "_all", - "body": { - "query": { - "match_all": {} - } - } - }, - "warmup-iterations": 100, - "iterations": 100, - "target-throughput": 10 - } - ] -} -} -``` - -## Next steps - -- For more information about configuring OpenSearch Benchmark, see [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/). -- For a list of prepackaged workloads for OpenSearch Benchmark, see the [opensearch-benchmark-workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads) repository. diff --git a/_benchmark/workloads/indices.md b/_benchmark/workloads/indices.md deleted file mode 100644 index 1aae3e536e..0000000000 --- a/_benchmark/workloads/indices.md +++ /dev/null @@ -1,30 +0,0 @@ ---- -layout: default -title: indices -parent: Workload reference -nav_order: 65 ---- - -# indices - -The `indices` element contains a list of all indices used in the workload. - -## Example - -```json -"indices": [ - { - "name": "geonames", - "body": "geonames-index.json", - } -] -``` - -## Configuration options - -Use the following options with `indices`: - -Parameter | Required | Type | Description -:--- | :--- | :--- | :--- -`name` | Yes | String | The name of the index template. -`body` | No | String | The file name corresponding to the index definition used in the body of the Create Index API. 
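To make the `body` option more concrete, the following is a minimal sketch of an index definition file that `body` might point to. The settings and field names are illustrative only and are not taken from any prepackaged workload.

```bash
# Minimal sketch of an index definition file referenced by the "body" option.
# The settings and field names ("title", "year") are illustrative, not taken from a prepackaged workload.
cat > index.json <<'EOF'
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "year": { "type": "integer" }
    }
  }
}
EOF
```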
diff --git a/_clients/OSC-dot-net.md b/_clients/OSC-dot-net.md index 9af9d42866..98a0b8e8fe 100644 --- a/_clients/OSC-dot-net.md +++ b/_clients/OSC-dot-net.md @@ -299,19 +299,6 @@ internal class Program PrintResponse(searchResponse); } - private static void SearchForAllStudentsWithANonEmptyLastName() - { - var searchResponse = osClient.Search(s => s - .Index("students") - .Query(q => q - .Bool(b => b - .Must(m => m.Exists(fld => fld.LastName)) - .MustNot(m => m.Term(t => t.Verbatim().Field(fld => fld.LastName).Value(string.Empty))) - ))); - - PrintResponse(searchResponse); - } - private static void SearchLowLevel() { // Search for the student using the low-level client diff --git a/_clients/OpenSearch-dot-net.md b/_clients/OpenSearch-dot-net.md index 9e41fffe18..df8f7b06ae 100644 --- a/_clients/OpenSearch-dot-net.md +++ b/_clients/OpenSearch-dot-net.md @@ -12,10 +12,6 @@ OpenSearch.Net is a low-level .NET client that provides the foundational layer o This getting started guide illustrates how to connect to OpenSearch, index documents, and run queries. For the client source code, see the [opensearch-net repo](https://github.com/opensearch-project/opensearch-net). -## Stable Release - -This documentation reflects the latest updates available in the [GitHub repository](https://github.com/opensearch-project/opensearch-net) and may include changes unavailable in the current stable release. The current stable release in NuGet is [1.2.0](https://www.nuget.org/packages/OpenSearch.Net.Auth.AwsSigV4/1.2.0). - ## Example The following example illustrates connecting to OpenSearch, indexing documents, and sending queries on the data. It uses the Student class to represent one student, which is equivalent to one document in the index. @@ -471,4 +467,4 @@ internal class Program } } ``` -{% include copy.html %} +{% include copy.html %} \ No newline at end of file diff --git a/_clients/java-rest-high-level.md b/_clients/java-rest-high-level.md index e4364994e5..23e28791a0 100644 --- a/_clients/java-rest-high-level.md +++ b/_clients/java-rest-high-level.md @@ -6,10 +6,10 @@ nav_order: 20 # Java high-level REST client -The OpenSearch Java high-level REST client is deprecated. Support will be removed in OpenSearch version 3.0.0. We recommend switching to the [Java client]({{site.url}}{{site.baseurl}}/clients/java/) instead. +The OpenSearch Java high-level REST client will be deprecated starting with OpenSearch version 3.0.0 and will be removed in a future release. We recommend switching to the [Java client]({{site.url}}{{site.baseurl}}/clients/java/) instead. {: .warning} -The OpenSearch Java high-level REST client lets you interact with your OpenSearch clusters and indexes through Java methods and data structures rather than HTTP methods and JSON. +The OpenSearch Java high-level REST client lets you interact with your OpenSearch clusters and indices through Java methods and data structures rather than HTTP methods and JSON. ## Setup diff --git a/_clients/java.md b/_clients/java.md index 2b8c776d23..e345f9d053 100644 --- a/_clients/java.md +++ b/_clients/java.md @@ -18,7 +18,7 @@ To start using the OpenSearch Java client, you need to provide a transport. 
The org.opensearch.client opensearch-java - 2.6.0 + 2.4.0 ``` {% include copy.html %} @@ -27,7 +27,7 @@ If you're using Gradle, add the following dependencies to your project: ``` dependencies { - implementation 'org.opensearch.client:opensearch-java:2.6.0' + implementation 'org.opensearch.client:opensearch-java:2.4.0' } ``` {% include copy.html %} @@ -48,7 +48,7 @@ Alternatively, you can create a Java client by using the `RestClient`-based tran org.opensearch.client opensearch-java - 2.6.0 + 2.4.0 ``` {% include copy.html %} @@ -57,8 +57,8 @@ If you're using Gradle, add the following dependencies to your project" ``` dependencies { - implementation 'org.opensearch.client:opensearch-rest-client:{{site.opensearch_version}}' - implementation 'org.opensearch.client:opensearch-java:2.6.0' + implementation 'org.opensearch.client:opensearch-rest-client: {{site.opensearch_version}}' + implementation 'org.opensearch.client:opensearch-java:2.4.0' } ``` {% include copy.html %} @@ -198,11 +198,11 @@ This code example uses basic credentials that come with the default OpenSearch c The following sample code initializes a client with SSL and TLS enabled: ```java -import org.apache.http.HttpHost; -import org.apache.http.auth.AuthScope; -import org.apache.http.auth.UsernamePasswordCredentials; -import org.apache.http.impl.nio.client.HttpAsyncClientBuilder; -import org.apache.http.impl.client.BasicCredentialsProvider; +import org.apache.hc.client5.http.auth.AuthScope; +import org.apache.hc.client5.http.auth.UsernamePasswordCredentials; +import org.apache.hc.client5.http.impl.async.HttpAsyncClientBuilder; +import org.apache.hc.client5.http.impl.auth.BasicCredentialsProvider; +import org.apache.hc.core5.http.HttpHost; import org.opensearch.client.RestClient; import org.opensearch.client.RestClientBuilder; import org.opensearch.client.json.jackson.JacksonJsonpMapper; @@ -215,10 +215,10 @@ public class OpenSearchClientExample { System.setProperty("javax.net.ssl.trustStore", "/full/path/to/keystore"); System.setProperty("javax.net.ssl.trustStorePassword", "password-to-keystore"); - final HttpHost host = new HttpHost("https", 9200, "localhost"); + final HttpHost host = new HttpHost("https", "localhost", 9200); final BasicCredentialsProvider credentialsProvider = new BasicCredentialsProvider(); //Only for demo purposes. Don't specify your credentials in code. - credentialsProvider.setCredentials(new AuthScope(host), new UsernamePasswordCredentials("admin", "admin")); + credentialsProvider.setCredentials(new AuthScope(host), new UsernamePasswordCredentials("admin", "admin".toCharArray())); //Initialize the client with SSL and TLS enabled final RestClient restClient = RestClient.builder(host). 
@@ -291,7 +291,7 @@ You can create an index with non-default settings using the following code: ```java String index = "sample-index"; -CreateIndexRequest createIndexRequest = new CreateIndexRequest.Builder().index(index).build(); +CreateRequest createIndexRequest = new CreateRequest.Builder().index(index).build(); client.indices().create(createIndexRequest); IndexSettings indexSettings = new IndexSettings.Builder().autoExpandReplicas("0-all").build(); @@ -338,8 +338,22 @@ client.delete(b -> b.index(index).id("1")); The following sample code deletes an index: ```java -DeleteIndexRequest deleteIndexRequest = new DeleteRequest.Builder().index(index).build(); -DeleteIndexResponse deleteIndexResponse = client.indices().delete(deleteIndexRequest); +DeleteRequest deleteRequest = new DeleteRequest.Builder().index(index).build(); +DeleteResponse deleteResponse = client.indices().delete(deleteRequest); + +} catch (IOException e){ + System.out.println(e.toString()); +} finally { + try { + if (restClient != null) { + restClient.close(); + } + } catch (IOException e) { + System.out.println(e.toString()); + } + } + } +} ``` {% include copy.html %} @@ -372,53 +386,54 @@ public class OpenSearchClientExample { public static void main(String[] args) { RestClient restClient = null; try{ - System.setProperty("javax.net.ssl.trustStore", "/full/path/to/keystore"); - System.setProperty("javax.net.ssl.trustStorePassword", "password-to-keystore"); + System.setProperty("javax.net.ssl.trustStore", "/full/path/to/keystore"); + System.setProperty("javax.net.ssl.trustStorePassword", "password-to-keystore"); - //Only for demo purposes. Don't specify your credentials in code. - final CredentialsProvider credentialsProvider = new BasicCredentialsProvider(); - credentialsProvider.setCredentials(AuthScope.ANY, - new UsernamePasswordCredentials("admin", "admin")); + //Only for demo purposes. Don't specify your credentials in code. + final CredentialsProvider credentialsProvider = new BasicCredentialsProvider(); + credentialsProvider.setCredentials(AuthScope.ANY, + new UsernamePasswordCredentials("admin", "admin")); - //Initialize the client with SSL and TLS enabled - restClient = RestClient.builder(new HttpHost("localhost", 9200, "https")). 
- setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() { - @Override - public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) { - return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider); - } - }).build(); - Transport transport = new RestClientTransport(restClient, new JacksonJsonpMapper()); - OpenSearchClient client = new OpenSearchClient(transport); - - //Create the index - String index = "sample-index"; - CreateIndexRequest createIndexRequest = new CreateIndexRequest.Builder().index(index).build(); - client.indices().create(createIndexRequest); - - //Add some settings to the index - IndexSettings indexSettings = new IndexSettings.Builder().autoExpandReplicas("0-all").build(); - IndexSettingsBody settingsBody = new IndexSettingsBody.Builder().settings(indexSettings).build(); - PutSettingsRequest putSettingsRequest = new PutSettingsRequest.Builder().index(index).value(settingsBody).build(); - client.indices().putSettings(putSettingsRequest); - - //Index some data - IndexData indexData = new IndexData("first_name", "Bruce"); - IndexRequest indexRequest = new IndexRequest.Builder().index(index).id("1").document(indexData).build(); - client.index(indexRequest); - - //Search for the document - SearchResponse searchResponse = client.search(s -> s.index(index), IndexData.class); - for (int i = 0; i< searchResponse.hits().hits().size(); i++) { - System.out.println(searchResponse.hits().hits().get(i).source()); - } + //Initialize the client with SSL and TLS enabled + restClient = RestClient.builder(new HttpHost("localhost", 9200, "https")). + setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() { + @Override + public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) { + return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider); + } + }).build(); + Transport transport = new RestClientTransport(restClient, new JacksonJsonpMapper()); + OpenSearchClient client = new OpenSearchClient(transport); + + //Create the index + String index = "sample-index"; + CreateRequest createIndexRequest = new CreateRequest.Builder().index(index).build(); + client.indices().create(createIndexRequest); + + //Add some settings to the index + IndexSettings indexSettings = new IndexSettings.Builder().autoExpandReplicas("0-all").build(); + IndexSettingsBody settingsBody = new IndexSettingsBody.Builder().settings(indexSettings).build(); + PutSettingsRequest putSettingsRequest = new PutSettingsRequest.Builder().index(index).value(settingsBody).build(); + client.indices().putSettings(putSettingsRequest); + + //Index some data + IndexData indexData = new IndexData("first_name", "Bruce"); + IndexRequest indexRequest = new IndexRequest.Builder().index(index).id("1").document(indexData).build(); + client.index(indexRequest); + + //Search for the document + SearchResponse searchResponse = client.search(s -> s.index(index), IndexData.class); + for (int i = 0; i< searchResponse.hits().hits().size(); i++) { + System.out.println(searchResponse.hits().hits().get(i).source()); + } + + //Delete the document + client.delete(b -> b.index(index).id("1")); - //Delete the document - client.delete(b -> b.index(index).id("1")); + // Delete the index + DeleteRequest deleteRequest = new DeleteRequest.Builder().index(index).build(); + DeleteResponse deleteResponse = client.indices().delete(deleteRequest); - // Delete the index - DeleteIndexRequest deleteIndexRequest = new 
DeleteRequest.Builder().index(index).build(); - DeleteIndexResponse deleteIndexResponse = client.indices().delete(deleteIndexRequest); } catch (IOException e){ System.out.println(e.toString()); } finally { diff --git a/_clients/javascript/helpers.md b/_clients/javascript/helpers.md index b03af21f94..9efd74d305 100644 --- a/_clients/javascript/helpers.md +++ b/_clients/javascript/helpers.md @@ -7,7 +7,7 @@ nav_order: 2 # Helper methods -Helper methods simplify the use of complicated API tasks. For the client's complete API documentation and additional examples, see the [JS client API documentation](https://opensearch-project.github.io/opensearch-js/2.2/index.html). +Helper methods simplify the use of complicated API tasks. ## Bulk helper @@ -68,7 +68,7 @@ When creating a new bulk helper instance, you can use the following configuratio ### Examples -The following examples illustrate the index, create, update, and delete bulk helper operations. For more information and advanced index actions, see the [`opensearch-js` guides](https://github.com/opensearch-project/opensearch-js/tree/main/guides) in GitHub. +The following examples illustrate the index, create, update, and delete bulk helper operations. #### Index diff --git a/_clients/javascript/index.md b/_clients/javascript/index.md index 9adcd65be7..ba58ad04f4 100644 --- a/_clients/javascript/index.md +++ b/_clients/javascript/index.md @@ -9,11 +9,9 @@ redirect_from: # JavaScript client -The OpenSearch JavaScript (JS) client provides a safer and easier way to interact with your OpenSearch cluster. Rather than using OpenSearch from the browser and potentially exposing your data to the public, you can build an OpenSearch client that takes care of sending requests to your cluster. For the client's complete API documentation and additional examples, see the [JS client API documentation](https://opensearch-project.github.io/opensearch-js/2.2/index.html). +The OpenSearch JavaScript (JS) client provides a safer and easier way to interact with your OpenSearch cluster. Rather than using OpenSearch from the browser and potentially exposing your data to the public, you can build an OpenSearch client that takes care of sending requests to your cluster. For the client's complete API documentation and additional examples, see the [JS client API documentation](https://opensearch-project.github.io/opensearch-js/2.1/index.html). -The client contains a library of APIs that let you perform different operations on your cluster and return a standard response body. The example here demonstrates some basic operations like creating an index, adding documents, and searching your data. - -You can use helper methods to simplify the use of complicated API tasks. For more information, see [Helper methods]({{site.url}}{{site.baseurl}}/clients/javascript/helpers/). For more advanced index actions, see the [`opensearch-js` guides](https://github.com/opensearch-project/opensearch-js/tree/main/guides) in GitHub. +The client contains a library of APIs that let you perform different operations on your cluster and return a standard response body. The example here demonstrates some basic operations like creating an index, adding documents, and searching your data. 
## Setup diff --git a/_clients/ruby.md b/_clients/ruby.md index 7d582927c6..59fa413a6c 100644 --- a/_clients/ruby.md +++ b/_clients/ruby.md @@ -634,7 +634,7 @@ puts MultiJson.dump(response, pretty: "true") # Ruby AWS Sigv4 Client -The [opensearch-aws-sigv4](https://github.com/opensearch-project/opensearch-ruby-aws-sigv4) gem provides the `OpenSearch::Aws::Sigv4Client` class, which has all features of `OpenSearch::Client`. The only difference between these two clients is that `OpenSearch::Aws::Sigv4Client` requires an instance of `Aws::Sigv4::Signer` during instantiation to authenticate with AWS: +The [opensearch-aws-sigv4](https://github.com/opensearch-project/opensearch-ruby/tree/main/opensearch-aws-sigv4) gem provides the `OpenSearch::Aws::Sigv4Client` class, which has all features of `OpenSearch::Client`. The only difference between these two clients is that `OpenSearch::Aws::Sigv4Client` requires an instance of `Aws::Sigv4::Signer` during instantiation to authenticate with AWS: ```ruby require 'opensearch-aws-sigv4' diff --git a/_config.yml b/_config.yml index f8b9bd4cac..25a612a07f 100644 --- a/_config.yml +++ b/_config.yml @@ -5,10 +5,10 @@ baseurl: "/docs/latest" # the subpath of your site, e.g. /blog url: "https://opensearch.org" # the base hostname & protocol for your site, e.g. http://example.com permalink: /:path/ -opensearch_version: '2.9.0' -opensearch_dashboards_version: '2.9.0' -opensearch_major_minor_version: '2.9' -lucene_version: '9_7_0' +opensearch_version: '2.7.0' +opensearch_dashboards_version: '2.7.0' +opensearch_major_minor_version: '2.7' +lucene_version: '9_5_0' # Build settings markdown: kramdown @@ -40,9 +40,6 @@ collections: dashboards: permalink: /:collection/:path/ output: true - integrations: - permalink: /:collection/:path/ - output: true tuning-your-cluster: permalink: /:collection/:path/ output: true @@ -67,27 +64,15 @@ collections: observing-your-data: permalink: /:collection/:path/ output: true - reporting: - permalink: /:collection/:path/ - output: true - analyzers: - permalink: /:collection/:path/ - output: true query-dsl: permalink: /:collection/:path/ output: true - aggregations: - permalink: /:collection/:path/ - output: true field-types: permalink: /:collection/:path/ output: true clients: permalink: /:collection/:path/ output: true - benchmark: - permalink: /:collection/:path/ - output: true data-prepper: permalink: /:collection/:path/ output: true @@ -103,9 +88,6 @@ collections: external_links: permalink: /:collection/:path/ output: true - developer-documentation: - permalink: /:collection/:path/ - output: true just_the_docs: # Define the collections used in the theme @@ -124,11 +106,8 @@ just_the_docs: dashboards: name: OpenSearch Dashboards nav_fold: true - integrations: - name: OpenSearch Integrations - nav_fold: true tuning-your-cluster: - name: Creating and tuning your cluster + name: Tuning your cluster nav_fold: true security: name: Security in OpenSearch @@ -136,39 +115,30 @@ just_the_docs: security-analytics: name: Security analytics nav_fold: true - field-types: - name: Mappings and field types - nav_fold: true - analyzers: - name: Text analysis - nav_fold: true - query-dsl: - name: Query DSL - nav_fold: true - aggregations: - name: Aggregations - nav_fold: true search-plugins: name: Search nav_fold: true ml-commons-plugin: name: Machine learning nav_fold: true + tuning-your-cluster: + name: Creating and tuning your cluster + nav_fold: true monitoring-your-cluster: name: Monitoring your cluster nav_fold: true observing-your-data: name: 
Observability nav_fold: true - reporting: - name: Reporting + query-dsl: + name: Query DSL, Aggregations, and Analyzers + nav_fold: true + field-types: + name: Mappings and field types nav_fold: true clients: name: Clients nav_fold: true - benchmark: - name: OpenSearch Benchmark - nav_fold: true data-prepper: name: Data Prepper nav_fold: true @@ -181,9 +151,6 @@ just_the_docs: troubleshoot: name: Troubleshooting nav_fold: true - developer-documentation: - name: Developer documentation - nav_fold: true # Enable or disable the site search @@ -256,5 +223,4 @@ exclude: - vendor/gems/ - vendor/ruby/ - README.md - - .idea - - templates + - .idea \ No newline at end of file diff --git a/_dashboards/dev-tools/index-dev.md b/_dashboards/dev-tools/index-dev.md index 4941f49eee..24695524a7 100644 --- a/_dashboards/dev-tools/index-dev.md +++ b/_dashboards/dev-tools/index-dev.md @@ -7,14 +7,8 @@ has_children: true # Dev Tools -Interact directly with OpenSearch by using **Dev Tools** to set up your OpenSearch Dashboards environment, run queries, explore data, and debug problems. To access the Dev Tools console, select **Dev Tools** from the **Management** menu on the OpenSearch Dashboards home page. The following are examples of how you can use the Dev Tools console in OpenSearch Dashboards: +**Dev Tools** allows you to set up your OpenSearch Dashboards environment, identify and fix bugs, and customize your dashboards' appearance and behavior. -- Set up your OpenSearch Dashboards environment. For example, you can use the console to configure authentication settings for your OpenSearch Dashboards instance. -- [Run queries to explore your data]({{site.url}}{{site.baseurl}}/dashboards/dev-tools/run-queries/). For example, you can use the console to run a query to find all the documents in your index that contain a specific word. -- Debug problems with your queries. For example, if your query is not returning the results you expect, you can use the console to look for error messages and identify the problem. -- Learn about the APIs in OpenSearch. For example, you can use the API reference documentation linked in the console (select the question circle icon ({::nomarkdown}question circle icon{:/})) to look up the syntax for different API calls. -- Develop custom visualizations. For example, you can use the console to create Vega visualizations. -- Customize the appearance and behavior of dashboards. For example, you can use the console to customize dashboard visualization colors or to add new filters. -- Identify and fix bugs. For example, you can use the console to view logs and identify the cause of the problem. +To access the Dev Tools console, select **Dev Tools** in the menu on the OpenSearch Dashboards home page. You'll see an interface like the one shown in the following image. -The Dev Tools console is a valuable resource for developers, analysts, and anyone else who works with OpenSearch data. +Dev Tools interface from home page diff --git a/_dashboards/dev-tools/run-queries.md b/_dashboards/dev-tools/run-queries.md index 8e09438fc7..fe1e8e3d17 100644 --- a/_dashboards/dev-tools/run-queries.md +++ b/_dashboards/dev-tools/run-queries.md @@ -9,7 +9,17 @@ redirect_from: # Running queries in the Dev Tools console -Use the Dev Tools console to send queries to OpenSearch. To access the Dev Tools console, select **Dev Tools** under the **Management** menu on the OpenSearch Dashboards home page. +You can use the OpenSearch Dev Tools Console to send queries to OpenSearch. 
+ +## Navigating to the console + +To open the console, select **Dev Tools** on the main OpenSearch Dashboards page: + +Dev Tools Console from main page{: .img-fluid } + +You can open the console from any other page by navigating to the main menu and selecting **Management** > **Dev Tools**. + +Dev Tools Console from all pages ## Writing queries @@ -17,7 +27,7 @@ Write your queries in the editor pane on the left side of the console: Request pane{: .img-fluid } -Collapse or expand your query by selecting the triangle next to the line numbers. +You can collapse and expand parts of your query by selecting the small triangles next to the line numbers. {: .tip} To learn more about writing queries in OpenSearch domain-specific language (DSL), see [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl). @@ -46,7 +56,6 @@ The console uses an easier syntax to format REST requests than the `curl` comman For example, the following `curl` command runs a search query: -```` ```bash curl -XGET http://localhost:9200/shakespeare/_search?pretty -H 'Content-Type: application/json' -d' { @@ -57,12 +66,9 @@ curl -XGET http://localhost:9200/shakespeare/_search?pretty -H 'Content-Type: ap } }' ``` -{% include copy.html %} -```` The same query has a simpler syntax in the console format: -```` ```json GET shakespeare/_search { @@ -73,8 +79,6 @@ GET shakespeare/_search } } ``` -{% include copy-curl.html %} -```` If you paste a `curl` command directly into the console, the command is automatically converted into the format the console uses. diff --git a/_dashboards/discover/multi-data-sources.md b/_dashboards/discover/multi-data-sources.md index d9203c5713..5134419011 100644 --- a/_dashboards/discover/multi-data-sources.md +++ b/_dashboards/discover/multi-data-sources.md @@ -21,9 +21,9 @@ To enable multiple data sources: 2. Open your local copy of the Dashboards configuration file, `opensearch_dashboards.yml`. If you don't have a copy, [`opensearch_dashboards.yml`](https://github.com/opensearch-project/OpenSearch-Dashboards/blob/main/config/opensearch_dashboards.yml) is available on GitHub. 3. Set `data_source.enabled:` to `true` and save the YAML file. 4. Restart the Dashboards container. -5. Verify that the configuration settings were created and configured properly by connecting to Dashboards and viewing the **Dashboards Management** console. **Data Sources** appears in the sidebar, as shown in the following image. +5. Verify that the configuration settings were created and configured properly by connecting to Dashboards and viewing the **Stack Management** console. **Data Sources** appears in the sidebar, as shown in the following image. -Data sources sidebar on the Dashboards Management interface +![Data Sources navigation menu]({{site.url}}{{site.baseurl}}/images/dashboards/data-sources.png) ## Creating a data source connection @@ -32,26 +32,23 @@ A data source connection specifies the parameters needed to connect to a data so To create a new data source connection: 1. Go to [`http://localhost:5601`](http://localhost:5601/) and log in with the username `admin` and password `admin`. If you’re running the Security plugin, go to [`https://localhost:5601`](https://localhost:5601/). -2. From the OpenSearch Dashboards main menu, select **Dashboards Management** > **Data sources** > **Create data source connection**. -3. Add information to each field to configure **Connection Details** and **Authentication Method**. +2. 
From the OpenSearch Dashboards main menu, select **Stack Management**, **Data Sources**, and then **Create data source connection**. +3. Add information to each field to configure **Connection Details**, **Endpoint URL**, and **Authentication Method**. - Under **Connection Details**, enter a title and endpoint URL. For this tutorial, use the URL `http://localhost:5601/app/management/opensearch-dashboards/dataSources`. Entering a description is optional. + In the **Connection Details** window, enter a title. Entering a description is optional. - Under **Authentication Method**, select an authentication method from the dropdown list. Once an authentication method is selected, the applicable fields for that method appear. You can then enter the required details. The authentication method options are: + In the **Endpoint** window, enter the **Endpoint URL**. For this tutorial, use the URL `http://localhost:5601/app/management/opensearch-dashboards/dataSources`. + + In the **Authentication** window, select an **Authentication Method**. The options are: - **No authentication**: No authentication is used to connect to the data source. - **Username & Password**: A basic username and password are used to connect to the data source. - - **AWS SigV4**: An AWS Signature Version 4 authenticating request is used to connect to the data source. AWS Signature Version 4 requires an access key and a secret key. - - For AWS Signature Version 4 authentication, first specify the **Region**. Next, select the OpenSearch service in the **Service Name** list. The options are **Amazon OpenSearch Service** and **Amazon OpenSearch Serverless**. Last, enter the **Access Key** and **Secret Key** for authorization. For an example setup, see the following image. + - **AWS Sigv4**: An AWS Signature Version 4 authenticating request is used to connect to the data source. AWS Sigv4 requires an access key and a secret key. First specify the **Region**, and then enter the **Access Key** and **Secret Key** for authorization. For information about available AWS Regions for AWS accounts, see [Available Regions](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions). For more about Sigv4 authentication requests, see [Authenticating Requests (AWS Signature Version 4)](https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html). + + When you select the authentication method, the applicable fields appear for the selected method. Enter the required details. - AWS Signature Version 4 auth type setup - - For information about available AWS Regions for AWS accounts, see [Available Regions](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions). For more information about AWS Signature Version 4 authentication requests, see [Authenticating Requests (AWS Signature Version 4)](https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html). - {: .note } + After you have entered the appropriate details in all of the required fields, the **Test connection** and **Create data source connection** buttons become active. You can select **Test connection** to confirm that the connection is valid. - After you have entered the appropriate details in all of the required fields, the **Test connection** and **Create data source** buttons become active. You can select **Test connection** to confirm that the connection is valid. - -4. Select **Create data source** to save your settings. 
The connection is created. The active window returns to the **Data Sources** main page, and the new connection appears in the list of data sources. +4. Select **Create data source connection** to save your settings. The connection is created. The active window returns to the **Data Sources** main page, and the new connection appears in the list of data sources. 5. Delete the data source connection by selecting the check box to the left of the title and then choosing **Delete 1 connection**. Selecting multiple check boxes for multiple connections is supported. @@ -61,9 +58,9 @@ To make changes to the data source connection, select a connection in the list o To make changes to **Connection Details**, edit one or both of the **Title** and **Description** fields and select **Save changes** in the lower-right corner of the screen. You can also cancel changes here. To change the **Authentication Method**, choose a different authentication method, enter your credentials (if applicable), and then select **Save changes** in the lower-right corner of the screen. The changes are saved. -When **Username & Password** is the selected authentication method, you can update the password by choosing **Update stored password** next to the **Password** field. In the pop-up window, enter a new password in the first field and then enter it again in the second field to confirm. Select **Update stored password** in the pop-up window. The new password is saved. Select **Test connection** to confirm that the connection is valid. +When **Username & Password** is the selected authentication method, you can update the password by choosing **Update stored password** next to the **Password** field. In the pop-up window, enter a a new password in the first field and then enter it again in the second field to confirm. Select **Update stored password** in the pop-up window. The new password is saved. Select **Test connection** to confirm that the connection is valid. -When **AWS SigV4** is the selected authentication method, you can update the credentials by selecting **Update stored AWS credential**. In the pop-up window, enter a new access key in the first field and a new secret key in the second field. Select **Update stored AWS credential** in the pop-up window. The new credentials are saved. Select **Test connection** in the upper-right corner of the screen to confirm that the connection is valid. +When **AWS Sigv4** is the selected authentication method, you can update the credentials by selecting **Update stored AWS credential**. In the pop-up window, enter a new access key in the first field and a new secret key in the second field. Select **Update stored AWS credential** in the pop-up window. The new credentials are saved. Select **Test connection** in the upper-right corner of the screen to confirm that the connection is valid. To delete the data source connection, select the trash can icon ({::nomarkdown}trash can icon{:/}). @@ -92,7 +89,9 @@ To set the time filter: 2. Select the calendar icon ({::nomarkdown}calendar icon{:/}) to change the time field. The default time period is **Last 15 minutes**. 3. Change the time field to a particular time period, for example, **Last 7 days**, and then select **Refresh**. 4. Change start or end times by selecting the start or end time in the search bar. -5. In the pop-up window, choose **Absolute**, **Relative**, or **Now** and then specify the date. +5. 
In the pop-up window, choose **Absolute**, **Relative**, or **Now** and then specify the date, for example, as shown in the following image. + +![Time filter with search bar]({{site.url}}{{site.baseurl}}/images/dashboards/time-filter-data-sources.png) ### Selecting a time range from the histogram @@ -124,18 +123,18 @@ Selecting multiple data sources in the Dev Tools console allows you to work with To create data visualizations for a dashboard, follow these steps: -1. In the Dashboards console, choose **Visualize** > **Create visualization**. +1. In the Dashboards console, choose **Visualize** and then **Create visualization**. 2. Select the visualization type. For this tutorial, choose **Line**. 3. Select a source. For this tutorial, choose the index pattern `opensearch_dashboards_sample_data_ecommerce`. -4. Under **Buckets**, choose **Add** > **X-axis**. -5. In the **Aggregation** field, choose **Date Histogram** > **Update**. +4. Under **Buckets**, choose **Add** and then **X-axis**. +5. In the **Aggregation** field, choose **Date Histogram** and then choose **Update**. 6. Choose **Save** and add the file name. ## Connecting visualizations in a single dashboard To connect your visualizations in a single dashboard, follow these steps: -1. In the Dashboards console, choose **Dashboard** > **Create dashboard**. +1. In the Dashboards console, choose **Dashboard** and then **Create dashboard**. 2. Choose **Add an existing** and then select the data you want to add. 3. Choose **Save** and add the dashboard name in the **Title field**. This tutorial uses preconfigured dashboards, so you won’t be able to save your dashboard. 4. Click on the white space left of **Add panels** to view the visualizations in a single dashboard. @@ -144,10 +143,10 @@ Your dashboard might look like the one in the following image. Example dashboard using data visualizations from many data sources -## Limitations +## Understanding feature limitations This feature has the following limitations: * The multiple data sources feature is supported for index-pattern-based visualizations only. * The visualization types Time Series Visual Builder (TSVB), Vega and Vega-Lite, and timeline are not supported. -* External plugins, such as Gantt chart, and non-visualization plugins, such as the developer console, are not supported. +* External plugins, such as Gantt chart, and non-visualization plugins, such as the developer console, are not supported. \ No newline at end of file diff --git a/_dashboards/discover/time-filter.md b/_dashboards/discover/time-filter.md index 730339bb04..70afa1fa53 100644 --- a/_dashboards/discover/time-filter.md +++ b/_dashboards/discover/time-filter.md @@ -11,12 +11,12 @@ redirect_from: You can change the time range to display dashboard data over minutes, hours, days, weeks, months, or years. -The default time range is **Last 15 minutes**. You can change the time range at the dashboard level or under **Dashboards Management** > **Advanced Settings** > **Time filter defaults**. +The default time range is **Last 15 minutes**. You can change the time range at the dashboard level or under **Stack Management > Advanced Settings > Time filter defaults**. {: .note} To change the time range at the dashboard level, perform the following steps: -1. From an OpenSearch Dashboards application (Discover, Dashboards, or Visualize), select the calendar icon ({::nomarkdown}calendar icon{:/}) on the right of the search bar. +1. 
From an OpenSearch Dashboards application (Discover, Dashboard, or Visualize), select the time clock or calendar icon. 2. Select one of the time filter options, as shown in the following image: - **Quick select:** Choose a time based on the last or next number of seconds, minutes, hours, days, or another time unit. - **Commonly used:** Choose a common time range like **Today**, **Last 7 days**, or **Last 30 days**. diff --git a/_dashboards/im-dashboards/datastream.md b/_dashboards/im-dashboards/datastream.md index 67b8133e4b..e9d381063c 100644 --- a/_dashboards/im-dashboards/datastream.md +++ b/_dashboards/im-dashboards/datastream.md @@ -91,25 +91,3 @@ To perform a force merge operation on two or more indexes, perform the following 1. Optionally, under **Advanced settings** you can to choose to **Flush indices** or **Only expunge delete** and then specify the **Max number of segments** to merge to as shown in the following image. ![Force Merge]({{site.url}}{{site.baseurl}}/images/admin-ui-index/forcemerge2.png) - -## Refreshing a data stream - -Refreshing a data stream makes new updates to the index visible to search operations. - -The refresh operation can be applied only to open indexes associated with the specified data streams. - -To refresh a data stream, select the data stream from the **Data streams** list under **Index Management**. Then select **Refresh** from the **Actions** dropdown list. - -## Flushing a data stream - -The flush operation performs a Lucene commit, writing segments to disk and starting a new translog. - -The flush operation can be applied only to open indexes associated with the specified data streams. - -To flush a data stream, select the data stream from the **Data streams** list under **Index Management**. Then select **Flush** from the **Actions** dropdown list. - -## Clearing a data stream cache - -The [clear cache operation]({{site.url}}{{site.baseurl}}/api-reference/index-apis/clear-index-cache/) can be applied only to open indexes associated with the specified data streams. - -To clear a data stream cache, select the index from the **Indices** list under **Index Management**. Then select **Clear cache** from the **Actions** dropdown list. \ No newline at end of file diff --git a/_dashboards/im-dashboards/index-management.md b/_dashboards/im-dashboards/index-management.md index 56d562f81d..4bf62c6394 100644 --- a/_dashboards/im-dashboards/index-management.md +++ b/_dashboards/im-dashboards/index-management.md @@ -125,34 +125,6 @@ To split an index, select the index you want to split from the **Indices** list User interface showing split page -### Refreshing an index - -Refreshing an index makes new updates to the index visible to search operations. - -The refresh operation can be applied only to open indexes. - -To refresh all indexes, select **Refresh** from the **Actions** dropdown list. - -To refresh a particular index, select the index from the **Indices** list under **Index Management**. Then select **Refresh** from the **Actions** dropdown list. - -### Flushing an index - -The flush operation performs a Lucene commit, writing segments to disk and starting a new translog. - -The flush operation can be applied only to open indexes. - -To flush all indexes, select **Flush** from the **Actions** dropdown list. - -To flush a particular index, select the index from the **Indices** list under **Index Management**. Then select **Flush** from the **Actions** dropdown list. 
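For reference, the refresh, flush, and clear cache actions described in this section correspond to OpenSearch REST endpoints. The following is a minimal curl sketch, assuming a local cluster at `http://localhost:9200` without the Security plugin and an illustrative index named `my-index`.

```bash
# REST counterparts of the refresh, flush, and clear cache actions described in this section.
# The index name "my-index" is illustrative; the calls assume a local cluster without security enabled.
curl -X POST "http://localhost:9200/my-index/_refresh"      # make new documents visible to search
curl -X POST "http://localhost:9200/my-index/_flush"        # commit segments to disk and start a new translog
curl -X POST "http://localhost:9200/my-index/_cache/clear"  # clear the index cache
```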
- -### Clearing an index cache - -The [clear cache operation]({{site.url}}{{site.baseurl}}/api-reference/index-apis/clear-index-cache/) can be applied only to open indexes. - -To clear all index caches, select **Clear cache** from the **Actions** dropdown list. - -To clear a particular index cache, select the index from the **Indices** list under **Index Management**. Then select **Clear cache** from the **Actions** dropdown list. - ### Deleting an index If you no longer need an index, you can use the [delete index]({{site.url}}{{site.baseurl}}/api-reference/index-apis/delete-index/) operation to delete it. @@ -198,8 +170,6 @@ An alias is a virtual index name that can point to one or more indexes. If your User interface showing Alias page -### Creating an alias - To create an alias, perform the following steps: 1. Choose the **Create Alias** button on the **Aliases** page under **Index Management**. @@ -207,9 +177,7 @@ To create an alias, perform the following steps: 3. Enter the index, or index patterns, to be included in the alias. 4. Choose **Create alias** as shown in the following image. -User interface showing create Alias page - -### Editing an alias +User interface showing creat Alias page To edit an alias, perform the following steps: @@ -217,36 +185,12 @@ To edit an alias, perform the following steps: 2. Choose the **Actions** button. 3. Choose **Edit** from the dropdown list. -### Deleting an alias - To delete an alias, perform the following steps: 1. Select the alias you want to edit. 2. Choose the **Actions** button. 3. Choose **Delete** from the dropdown list. -### Refreshing an alias - -Refreshing an alias makes new updates to the index visible to search operations. - -The refresh operation can be applied only to open indexes associated with the specified aliases. - -To refresh a particular alias, select the alias from the **Aliases** list under **Index Management**. Then select **Refresh** from the **Actions** dropdown list. - -### Flushing an alias - -The flush operation performs a Lucene commit, writing segments to disk and starting a new translog. - -The flush operation can be applied only to open indexes associated with the specified aliases. - -To flush an alias, select the alias from the **Aliases** list under **Index Management**. Then select **Flush** from the **Actions** dropdown list. - -### Clearing an alias cache - -The [clear cache operation]({{site.url}}{{site.baseurl}}/api-reference/index-apis/clear-index-cache/) can be applied only to open indexes associated with the specified aliases. - -To clear an alias cache, select the alias from the **Aliases** list under **Index Management**. Then select **Clear cache** from the **Actions** dropdown list. - ## Rollup jobs The **Rollup Jobs** section under **Index Management** allows you to create or update index rollup jobs. diff --git a/_dashboards/im-dashboards/index.md b/_dashboards/im-dashboards/index.md index 0accb122a5..03d3688e8d 100644 --- a/_dashboards/im-dashboards/index.md +++ b/_dashboards/im-dashboards/index.md @@ -11,30 +11,6 @@ redirect_from: Introduced 2.5 {: .label .label-purple } -The Index Management interface in OpenSearch Dashboards provides a unified solution for managing common indexing and data stream operations. 
The interface allows you to perform create, read, update, and delete (CRUD) and mapping operations for indexes, index templates, and aliases instead of using REST APIs or YAML configurations for basic administrative operations and interventions, along with other operations such as open, close, reindex, shrink, and split indexes. The interface also provides you with the capabilities to run index status and data validation before submitting requests and compare changes with previously saved settings before making updates. +Previously, users relied on REST APIs or YAML configurations for basic administrative operations and interventions. This release takes the first step toward a unified administration panel in OpenSearch Dashboards with the launch of several index management UI enhancements. The new interface provides a more user-friendly way to run common indexing and data stream operations. Now you can perform create, read, update, and delete (CRUD) and mapping operations for indexes, index templates, and aliases through the UI. Additionally, you can open, close, reindex, shrink, and split indexes. The UI runs index status and data validation before submitting requests and lets you compare changes with previously saved settings before making updates. -## Get started with index management using Dashboards - -**Step 1: Open Index Management** -Once you're in OpenSearch Dashboards, select **Index Management** from the **OpenSearch Plugins** main menu. Then select **Indices**. - -**Step 2: View indexes** -In the Indices interface you will see a list of existing indexes in your OpenSearch cluster. The list provides information such as index name, health state, document count, index size, and other relevant details. - -**Step 3: Create an index** -To create a new index, select the **Create index** button in the upper-right corner. You will be prompted to enter the index name and configure the index settings, such as number of shards and replicas. Fill in the required information and select **Create** to create the index. - -**Step 4: Delete an index** -To delete an index, locate the index and select the checkbox next to it. Then select the **Actions** button and choose **Delete** from the dropdown list. Use caution when deleting indexes because this action is irreversible. - -**Step 5: Modify an index** -To modify the settings of an existing index, locate the index in the list and select its name. This takes you to the index details page. Here you can update settings such as the numbers of shards, replicas, and other advanced configurations. After making the desired changes, select **Save**. - -**Step 7: Refresh indexes** -To refresh an index, locate the index and select the checkbox next to it. Then select the **Actions** button and choose **Refresh** from the dropdown list. - -**Step 8: Filter and search indexes** -If you have a large number of indexes and want to filter or search for specific indexes, you can use the search bar located above the list of indexes. Enter the relevant keywords or filters to narrow the list of indexes. - -**Step 9: Additional operations** -Index Management provides additional functionalities such as creating index patterns, managing lifecycle policies, and configuring index templates. These options are available in their respective sections of the Index Management interface. 
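As a point of comparison, Step 3 is roughly equivalent to calling the Create Index API directly. The following is a minimal curl sketch; the index name and setting values are illustrative, and the call assumes a local cluster without the Security plugin.

```bash
# Rough REST counterpart of Step 3: create an index and configure shard and replica counts.
# The index name and setting values are illustrative.
curl -X PUT "http://localhost:9200/my-index" \
  -H 'Content-Type: application/json' \
  -d '{
    "settings": {
      "index": {
        "number_of_shards": 2,
        "number_of_replicas": 1
      }
    }
  }'
```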
+Index management demo gif{: .img-fluid} \ No newline at end of file diff --git a/_dashboards/im-dashboards/notifications.md b/_dashboards/im-dashboards/notifications.md deleted file mode 100644 index 85187f3d69..0000000000 --- a/_dashboards/im-dashboards/notifications.md +++ /dev/null @@ -1,60 +0,0 @@ ---- -layout: default -title: Notification settings -parent: Index management in Dashboards -nav_order: 60 ---- - -# Notification settings - -You can configure global default notification settings for index operations on the **Notification settings** page. You can also configure additional notification settings for individual index operations. - -## Configuring default notification settings - -In the **Notification settings** interface, you can configure the default notification settings for the following index operations that may take longer to complete: - -- Open -- Reindex -- Split -- Shrink -- Clone -- Force merge - -To get started, from the OpenSearch Dashboards main menu, select **OpenSearch Plugins** > **Index Management**. Under **Index Management**, select **Notification settings**. - -You can choose to be notified when the operation has completed or failed. Additionally, you can select the notification channels for this notification, as shown in the following image. - -![Default notification settings]({{site.url}}{{site.baseurl}}/images/admin-ui-index/notifications.png) - -If you don't have permission to view notification settings, you cannot view the default settings. -{: .note} - -## Configuring notification settings for an individual operation - -You can view default notification settings when you perform an indexing operation as well as set up additional notifications. For example, if you want to configure an additional notification for a reindex operation, perform the following steps: - -1. Select **OpenSearch Plugins** > **Index Management**. - -1. In the **Index Management** interface, select **Indices**. - -1. Select the index you want to reindex. - -1. Select **Reindex** from the **Actions** dropdown list. - -1. After selecting all reindex options, expand **Advanced settings**. Under **Notifications**, default notifications are listed. - - If you don't have permission to view notification settings, you will not be able to view the default settings. - {: .note} - -1. To receive additional notifications, select **Send additional notifications**, as shown in the following image. - - ![Individual notification settings]({{site.url}}{{site.baseurl}}/images/admin-ui-index/notifications-individual.png) - -1. Select whether you want to be notified when the operation has failed or completed. - -1. Select a channel from the **Notification channels** dropdown list. If you want to configure a new notification channel, select **Manage channels**. - - To configure a new notification channel, confirm that the `dashboards-notification` plugin is enabled in OpenSearch Dashboards. - {: .note} - -1. Select the **Reindex** button. 
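For context, the reindex operation configured in these steps corresponds to the Reindex API, which copies documents from a source index to a destination index. The following is a minimal curl sketch with illustrative index names, assuming a local cluster without the Security plugin.

```bash
# The UI reindex action corresponds to the Reindex API; index names are illustrative.
curl -X POST "http://localhost:9200/_reindex" \
  -H 'Content-Type: application/json' \
  -d '{
    "source": { "index": "my-source-index" },
    "dest": { "index": "my-destination-index" }
  }'
```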
diff --git a/_reporting/rep-cli-create.md b/_dashboards/reporting-cli/rep-cli-create.md similarity index 90% rename from _reporting/rep-cli-create.md rename to _dashboards/reporting-cli/rep-cli-create.md index ebff432467..795903425a 100644 --- a/_reporting/rep-cli-create.md +++ b/_dashboards/reporting-cli/rep-cli-create.md @@ -1,14 +1,12 @@ --- layout: default -title: Create and request visualization reports +title: Creating and requesting a visualization report nav_order: 15 -parent: Reporting using the CLI -grand_parent: Reporting -redirect_from: - - /dashboards/reporting-cli/rep-cli-create/ +parent: Creating reports with the Reporting CLI + --- -# Create and request visualization reports +# Creating and requesting a visualization report First, you need to get the URL for the visualization that you want to download as an image file or PDF. diff --git a/_reporting/rep-cli-cron.md b/_dashboards/reporting-cli/rep-cli-cron.md similarity index 85% rename from _reporting/rep-cli-cron.md rename to _dashboards/reporting-cli/rep-cli-cron.md index e83bb75beb..f4115e5473 100644 --- a/_reporting/rep-cli-cron.md +++ b/_dashboards/reporting-cli/rep-cli-cron.md @@ -1,14 +1,12 @@ --- layout: default -title: Schedule reports with the cron utility +title: Scheduling reports with the cron utility nav_order: 20 -parent: Reporting using the CLI -grand_parent: Reporting -redirect_from: - - /dashboards/reporting-cli/rep-cli-cron/ +parent: Creating reports with the Reporting CLI + --- -# Schedule reports with the cron utility +# Scheduling reports with the cron utility You can use the cron command-line utility to initiate a report request with the Reporting CLI that runs periodically at any date or time interval. Follow the cron expression syntax to specify the date and time that precedes the command that you want to initiate. diff --git a/_reporting/rep-cli-env-var.md b/_dashboards/reporting-cli/rep-cli-env-var.md similarity index 92% rename from _reporting/rep-cli-env-var.md rename to _dashboards/reporting-cli/rep-cli-env-var.md index a4e079501d..90aa0f6924 100644 --- a/_reporting/rep-cli-env-var.md +++ b/_dashboards/reporting-cli/rep-cli-env-var.md @@ -1,14 +1,12 @@ --- layout: default -title: Use environment variables with the Reporting CLI +title: Using environment variables with the Reporting CLI nav_order: 35 -parent: Reporting using the CLI -grand_parent: Reporting -redirect_from: - - /dashboards/reporting-cli/rep-cli-env-var/ +parent: Creating reports with the Reporting CLI + --- -# Use environment variables with the Reporting CLI +# Using environment variables with the Reporting CLI Instead of explicitly providing values in the command line, you can save them as environment variables. The Reporting CLI reads environment variables from the current directory inside the project. @@ -101,4 +99,4 @@ The following limitations apply to environment variable usage with the Reporting ## Troubleshooting -To resolve **MessageRejected: Email address is not verified**, see [Why am I getting a 400 "message rejected" error with the message "Email address is not verified" from Amazon SES?](https://repost.aws/knowledge-center/ses-554-400-message-rejected-error) in the AWS Knowledge Center. \ No newline at end of file +To resolve **MessageRejected: Email address is not verified**, see [Why am I getting a 400 "message rejected" error with the message "Email address is not verified" from Amazon SES?](https://aws.amazon.com/premiumsupport/knowledge-center/ses-554-400-message-rejected-error/) in the AWS Knowledge Center. 
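To tie the cron scheduling described earlier in this section to a concrete crontab entry, the following is a hedged sketch. The `--url` and `--format` options are assumptions; confirm the exact flags against the Reporting CLI options documentation for your version.

```bash
# Sketch of a crontab entry that runs a Reporting CLI request every Monday at 08:00.
# The --url and --format options are assumptions; verify the exact flags for your version before scheduling.
0 8 * * 1 opensearch-reporting-cli --url "https://localhost:5601/app/dashboards#/view/<dashboard-id>" --format pdf
```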
\ No newline at end of file diff --git a/_reporting/rep-cli-index.md b/_dashboards/reporting-cli/rep-cli-index.md similarity index 94% rename from _reporting/rep-cli-index.md rename to _dashboards/reporting-cli/rep-cli-index.md index b7620def9d..7a43b78a42 100644 --- a/_reporting/rep-cli-index.md +++ b/_dashboards/reporting-cli/rep-cli-index.md @@ -1,13 +1,11 @@ --- layout: default -title: Reporting using the CLI -nav_order: 10 +title: Creating reports with the Reporting CLI +nav_order: 75 has_children: true -redirect_from: - - /dashboards/reporting-cli/rep-cli-index/ --- -# Reporting using the CLI +# Creating reports with the Reporting CLI You can programmatically create dashboard reports in PDF or PNG format with the Reporting CLI without using OpenSearch Dashboards or the Reporting plugin. This allows you to create reports automatically within your email workflows. diff --git a/_reporting/rep-cli-install.md b/_dashboards/reporting-cli/rep-cli-install.md similarity index 86% rename from _reporting/rep-cli-install.md rename to _dashboards/reporting-cli/rep-cli-install.md index b50ecb5b92..a83cdc5c6c 100644 --- a/_reporting/rep-cli-install.md +++ b/_dashboards/reporting-cli/rep-cli-install.md @@ -1,14 +1,12 @@ --- layout: default -title: Download and install the Reporting CLI tool +title: Downloading and installing the Reporting CLI tool nav_order: 10 -parent: Reporting using the CLI -grand_parent: Reporting -redirect_from: - - /dashboards/reporting-cli/rep-cli-install/ +parent: Creating reports with the Reporting CLI + --- -# Download and install the Reporting CLI tool +# Downloading and installing the Reporting CLI tool You can download and install the Reporting CLI tool from either the npm software registry or the OpenSearch.org [Artifacts](https://opensearch.org/artifacts) hub. Refer to the following sections for instructions. 
diff --git a/_reporting/rep-cli-lambda.md b/_dashboards/reporting-cli/rep-cli-lambda.md similarity index 98% rename from _reporting/rep-cli-lambda.md rename to _dashboards/reporting-cli/rep-cli-lambda.md index 4d5dbc10fb..47b9507ced 100644 --- a/_reporting/rep-cli-lambda.md +++ b/_dashboards/reporting-cli/rep-cli-lambda.md @@ -1,11 +1,9 @@ --- layout: default -title: Schedule reports with AWS Lambda +title: Scheduling reports with AWS Lambda nav_order: 30 -parent: Reporting using the CLI -grand_parent: Reporting -redirect_from: - - /dashboards/reporting-cli/rep-cli-lambda/ +parent: Creating reports with the Reporting CLI + --- # Scheduling reports with AWS Lambda diff --git a/_reporting/rep-cli-options.md b/_dashboards/reporting-cli/rep-cli-options.md similarity index 96% rename from _reporting/rep-cli-options.md rename to _dashboards/reporting-cli/rep-cli-options.md index 9b7d016eab..3631b0de30 100644 --- a/_reporting/rep-cli-options.md +++ b/_dashboards/reporting-cli/rep-cli-options.md @@ -2,10 +2,8 @@ layout: default title: Reporting CLI options nav_order: 30 -parent: Reporting using the CLI -grand_parent: Reporting -redirect_from: - - /dashboards/reporting-cli/rep-cli-options/ +parent: Creating reports with the Reporting CLI + --- # Reporting CLI options diff --git a/_reporting/report-dashboard-index.md b/_dashboards/reporting.md similarity index 93% rename from _reporting/report-dashboard-index.md rename to _dashboards/reporting.md index 0df87a965c..1778816e36 100644 --- a/_reporting/report-dashboard-index.md +++ b/_dashboards/reporting.md @@ -1,20 +1,18 @@ --- layout: default -title: Reporting using OpenSearch Dashboards -nav_order: 5 -redirect_from: - - /dashboards/reporting/ +title: Creating reports with the Dashboards interface +nav_order: 70 --- -# Reporting using OpenSearch Dashboards +# Creating reports with the Dashboards interface You can use OpenSearch Dashboards to create PNG, PDF, and CSV reports. To create reports, you must have the correct permissions. For a summary of the predefined roles and the permissions they grant, see the [Security plugin]({{site.url}}{{site.baseurl}}/security/access-control/users-roles#predefined-roles). CSV reports have a non-configurable 10,000 row limit. They have no explicit size limit (for example, MB), but extremely large documents could cause report generation to fail with an out of memory error from the V8 JavaScript engine. {: .tip } -## Generating reports +## Generating reports with the interface To generate a report from the interface: @@ -46,8 +44,6 @@ Definitions let you generate reports on a periodic schedule. ## Troubleshooting -You can use the following topics to troubleshoot and resolve issues with reporting. - ### Chromium fails to launch with OpenSearch Dashboards While creating a report for dashboards or visualizations, you might see a the following error: diff --git a/_dashboards/sm-dashboards.md b/_dashboards/sm-dashboards.md index fb0c0cbf79..b7860b925a 100644 --- a/_dashboards/sm-dashboards.md +++ b/_dashboards/sm-dashboards.md @@ -6,9 +6,11 @@ redirect_from: - /dashboards/admin-ui-index/sm-dashboards/ --- -# Snapshot Management in Dashboards +# Snapshot management -[Snapshots]({{site.url}}{{site.baseurl}}/opensearch/snapshots/index/) are backups of a cluster’s indexes and state. The state includes cluster settings, node information, index metadata (mappings, settings, templates), and shard allocation. 
The Snapshot Management (SM) interface in OpenSearch Dashboards provides a unified solution for taking and restoring snapshots. +You can set up Snapshot Management (SM) in OpenSearch Dashboards. + +[Snapshots]({{site.url}}{{site.baseurl}}/opensearch/snapshots/index/) are backups of a cluster’s indexes and state. The state includes cluster settings, node information, index metadata (mappings, settings, templates), and shard allocation. Snapshots have two main uses: @@ -20,11 +22,15 @@ Snapshots have two main uses: For example, if you’re moving from a proof of concept to a production cluster, you might take a snapshot of the former and restore it on the latter. +You can take and restore snapshots using snapshot management in OpenSearch Dashboards. + +If you need to automate snapshots creation, you can use a snapshot policy. + ## Creating a repository -Before you create an SM policy, set up a repository for snapshots. +Before you create an SM policy, you need to set up a repository for snapshots. -1. From the OpenSearch Dashboards main menu, select **OpenSearch Plugins** > **Snapshot Management**. +1. On the top menu bar, go to **OpenSearch Plugins > Snapshot Management**. 1. In the left panel, under **Snapshot Management**, select **Repositories**. 1. Choose the **Create Repository** button. 1. Enter the repository name, type, and location. @@ -40,9 +46,6 @@ Before you create an SM policy, set up a repository for snapshots. ``` 1. Choose the **Add** button. -If you need to automate snapshot creation, you can use a snapshot policy. -{: .note} - ## Deleting a repository To delete a snapshot repository configuration, select the repository from the **Repositories** list and then choose the **Delete** button. @@ -51,7 +54,7 @@ To delete a snapshot repository configuration, select the repository from the ** Create an SM policy to set up automatic snapshots. An SM policy defines an automated snapshot creation schedule and an optional automated deletion schedule. -1. From the OpenSearch Dashboards main menu, select **OpenSearch Plugins** > **Snapshot Management**. +1. On the top menu bar, go to **OpenSearch Plugins > Snapshot Management**. 1. In the left panel, under **Snapshot Management**, select **Snapshot Policies**. 1. Select the **Create Policy** button. 1. In the **Policy settings** section: @@ -77,7 +80,7 @@ Create an SM policy to set up automatic snapshots. An SM policy defines an autom You can view, edit, or delete an SM policy on the policy details page. -1. From the OpenSearch Dashboards main menu, select **OpenSearch Plugins** > **Snapshot Management**. +1. On the top menu bar, go to **OpenSearch Plugins > Snapshot Management**. 1. In the left panel, under **Snapshot Management**, select **Snapshot Policies**. 1. Click on the **Policy name** of the policy you want to view, edit, or delete.
The policy details page displays the policy settings, snapshot schedule, snapshot retention period, notifications, and the last snapshot creation and deletion.
If a snapshot creation or deletion fails, you can view information about the failure in the **Last Creation/Deletion** section. To view the failure message, click on the **cause** in the **Info** column. @@ -85,23 +88,23 @@ The policy settings, snapshot schedule, snapshot retention period, notifications ## Enable, disable, or delete SM policies -1. From the OpenSearch Dashboards main menu, select **OpenSearch Plugins** > **Snapshot Management**. +1. On the top menu bar, go to **OpenSearch Plugins > Snapshot Management**. 1. In the left panel, under **Snapshot Management**, select **Snapshot Policies**. 1. Select one or more policies in the list. 1. To enable or disable selected SM policies, select the **Enable** or **Disable** button. To delete selected SM policies, in the **Actions** list, select the **Delete** option. ## View snapshots -1. From the OpenSearch Dashboards main menu, select **OpenSearch Plugins** > **Snapshot Management**. +1. On the top menu bar, go to **OpenSearch Plugins > Snapshot Management**. 1. In the left panel, under **Snapshot Management**, select **Snapshots**. All automatically or manually taken snapshots appear in the list. 1. To view a snapshot, click on its **Name**. ## Take a snapshot -Follow these steps to take a snapshot manually: +Use the steps below to take a snapshot manually: -1. From the OpenSearch Dashboards main menu, select **OpenSearch Plugins** > **Snapshot Management**. +1. On the top menu bar, go to **OpenSearch Plugins > Snapshot Management**. 1. In the left panel, under **Snapshot Management**, select **Snapshots**. 1. Select the **Take snapshot** button. 1. Enter the snapshot name. @@ -122,7 +125,7 @@ The **Delete** button [deletes]({{site.url}}{{site.baseurl}}/api-reference/snaps ## Restoring a snapshot -1. From the OpenSearch Dashboards main menu, select **OpenSearch Plugins** > **Snapshot Management**. +1. On the top menu bar, go to **OpenSearch Plugins > Snapshot Management**. 1. In the left panel, under **Snapshot Management**, select **Snapshots**. The **Snapshots** tab is selected by default. 1. Select the checkbox next to the snapshot you want to restore, as shown in the following image: Snapshots{: .img-fluid} @@ -151,7 +154,7 @@ The **Delete** button [deletes]({{site.url}}{{site.baseurl}}/api-reference/snaps Custom settings - For more information about index settings, see [Index settings]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/). + For more information about index settings, see [Index settings]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/#index-settings). For a list of settings that you cannot change or ignore, see [Restore snapshots]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore#restore-snapshots). diff --git a/_dashboards/visualize/selfhost-maps-server.md b/_dashboards/visualize/selfhost-maps-server.md index 925c5449fe..9b41608989 100644 --- a/_dashboards/visualize/selfhost-maps-server.md +++ b/_dashboards/visualize/selfhost-maps-server.md @@ -21,7 +21,7 @@ You can access the `maps-server` image via the official OpenSearch [Docker Hub r Open your terminal and run the following command: -`docker pull opensearchproject/opensearch-maps-server:1.0.0` +`docker pull opensearch/opensearch-maps-server` ## Setting up the server @@ -96,7 +96,7 @@ Configure the manifest URL in `opensearch_dashboards.yml`: ### Option 2: Configure Default WMS properties in OpenSearch Dashboards -1. On the OpenSearch Dashboards console, select **Dashboards Management** > **Advanced Settings**. +1. 
On the OpenSearch Dashboards console, select **Stack Management > Advanced Settings**. 2. Locate `visualization:tileMap:WMSdefaults` under **Default WMS properties**. 3. Change `"enabled": false` to `"enabled": true` and add the URL for the valid map server. @@ -107,4 +107,4 @@ Tiles are generated per [Terms of Use for Natural Earth vector map data](https:/ ## Related articles * [Configuring a Web Map Service (WMS)]({{site.url}}{{site.baseurl}}/dashboards/visualize/maptiles/) -* [Using coordinate and region maps]({{site.url}}{{site.baseurl}}/dashboards/visualize/geojson-regionmaps/) +* [Using coordinate and region maps]({{site.url}}{{site.baseurl}}/dashboards/visualize/geojson-regionmaps/) \ No newline at end of file diff --git a/_dashboards/visualize/visbuilder.md b/_dashboards/visualize/visbuilder.md index 7b32e818f5..677768eeac 100644 --- a/_dashboards/visualize/visbuilder.md +++ b/_dashboards/visualize/visbuilder.md @@ -41,7 +41,7 @@ Follow these steps to create a new visualization using VisBuilder in your enviro - If you're running the Security plugin, go to https://localhost:5601 and log in with your username and password (default is admin/admin). 2. Confirm that the **Enable experimental visualizations** option is turned on. - - From the top menu, select **Management** > **Dashboards Management** > **Advanced Settings**. + - From the top menu, select **Management** **>** **Stack Management** **>** **Advanced Settings**. - Select **Visualization** and verify that the option is turned on. Enable experimental visualizations diff --git a/_data-prepper/managing-data-prepper/configuring-data-prepper.md b/_data-prepper/managing-data-prepper/configuring-data-prepper.md index b27ba8e49d..d4f26610d0 100644 --- a/_data-prepper/managing-data-prepper/configuring-data-prepper.md +++ b/_data-prepper/managing-data-prepper/configuring-data-prepper.md @@ -5,12 +5,11 @@ parent: Managing Data Prepper nav_order: 5 redirect_from: - /clients/data-prepper/data-prepper-reference/ - - /monitoring-plugins/trace/data-prepper-reference/ --- # Configuring Data Prepper -You can customize your Data Prepper configuration by editing the `data-prepper-config.yaml` file in your Data Prepper installation. The following configuration options are independent from pipeline configuration options. +You can customize your Data Prepper confiuration by editing the `data-prepper-config.yaml` file in your Data Prepper installation. The following configuration options are independent from pipeline configuration options. ## Data Prepper configuration diff --git a/_data-prepper/managing-data-prepper/source-coordination.md b/_data-prepper/managing-data-prepper/source-coordination.md deleted file mode 100644 index 3c60b45280..0000000000 --- a/_data-prepper/managing-data-prepper/source-coordination.md +++ /dev/null @@ -1,148 +0,0 @@ ---- -layout: default -title: Source coordination -nav_order: 35 -parent: Managing Data Prepper ---- - -# Source coordination - -_Source coordination_ is the concept of coordinating and distributing work between Data Prepper data sources in a multi-node environment. Some data sources, such as Amazon Kinesis or Amazon Simple Queue Service (Amazon SQS), handle coordination natively. Other data sources, such as OpenSearch, Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, and JDBC/ODBC, do not support source coordination. - -Data Prepper source coordination decides which partition of work is performed by each node in the Data Prepper cluster and prevents duplicate partitions of work. 
- -Inspired by the [Kinesis Client Library](https://docs.aws.amazon.com/streams/latest/dev/shared-throughput-kcl-consumers.html), Data Prepper utilizes a distributed store in the form of a lease to handle the distribution and deduplication of work. - -## Formatting partitions - -Source coordination separates sources into "partitions of work." For example, an S3 object would be a partition of work for Amazon S3, or an OpenSearch index would be a partition of work for OpenSearch. - -Data Prepper takes each partition of work that is chosen by the source and creates corresponding items in the distributed store that Data Prepper uses for source coordination. Each of these items has the following standard format, which can be extended by the distributed store implementation. - -| Value | Type | Description | -| :--- | :--- | :--- | -| `sourceIdentifier` | String | The identifier for which the Data Prepper pipeline works on this partition. By default, the `sourceIdentifier` is prefixed by the sub-pipeline name, but an additional prefix can be configured with `partition_prefix` in your data-prepper-config.yaml file. | -| `sourcePartitionKey` | String | The identifier for the partition of work associated with this item. For example, for an `s3` source with scan capabilities, this identifier is the S3 bucket's `objectKey` combination. -| `partitionOwner` | String | An identifier for the node that actively owns and is working on this partition. This ID contains the hostname of the node but is `null` when this partition is not owned. | -| `partitionProgressState` | String | A JSON string object representing the progress made on a partition of work or any additional metadata that may be needed by the source in the case of another node resuming where the last node stopped during a crash. | -| `partitionOwnershipTimeout` | Timestamp | Whenever a Data Prepper node acquires a partition, a 10-minute timeout is given to the owner of the partition to handle the event of a node crashing. The ownership is renewed with another 10 minutes when the owner saves the state of the partition. | -| `sourcePartitionStatus` | Enum | Represents the current state of the partition: `ASSIGNED` means the partition is currently being processed, `UNASSIGNED` means the partition is waiting to be processed, `CLOSED` means the partition is waiting to be processed at a later date, and `COMPLETED` means the partition has already been processed. | -| `reOpenAt` | Timestamp | Represents the time at which CLOSED partitions reopen and are considered to be available for processing. Only applies to CLOSED partitions. | -| `closedCount` | Long | Tracks how many times the partition has been marked as `CLOSED`.| - - -## Acquiring partitions - -Partitions are acquired in the order that they are returned in the `List` provided by the source. When a node attempts to acquire a partition, Data Prepper performs the following steps: - -1. Data Prepper queries the `ASSIGNED` partitions to check whether any `ASSIGNED` partitions have expired partition owners. This is intended to assign priority to partitions that have had nodes crash in the middle of processing, which can allow for using a partition state that may be time sensitive. -2. After querying `ASSIGNED` partitions, Data Prepper queries the `CLOSED` partitions to determine whether any of the partition's `reOpenAt` timestamps have been reached. -3. 
If there are no `ASSIGNED` or `CLOSED` partitions available, then Data Prepper queries the `UNASSIGNED` partitions until on of these partitions is `ASSIGNED`. - -If this flow occurs and no partition is acquired by the node, then the partition supplier function provided in the `getNextPartition` method of `SourceCoordinator` will create new partitions. After the supplier function completes, Data Prepper again queries the partitions for `ASSIGNED`, `CLOSED`, and `UNASSIGNED`. - -## Global state - -Any function that is passed to the `getNextPartition` method creates new partitions with a global state of `Map`. This state is shared between all of the nodes in the cluster and will only be run by a single node at a time, as determined by the source. - -## Configuration - -The following table provide optional configuration values for `source_coordination`. - -| Value | Type | Description | -| :--- | :--- | :--- | -| `partition_prefix` | String | A prefix to the `sourceIdentifier` used to differentiate between Data Prepper clusters that share the same distributed store. | -| `store` | Object | The object that comprises the configuration for the store to be used, where the key is the name of the store, such as `in_memory` or `dynamodb`, and the value is any configuration available on that store type. | - -### Supported stores -As of Data Prepper 2.4, only `in_memory` and `dynamodb` stores are supported: - -- The `in_memory` store is the -default when no `source_coordination` settings are configured in the `data-prepper-config.yaml` file and should only be used for single-node configurations. -- The `dynamodb` store is used for multi-node Data Prepper environments. The `dynamodb` store can be shared between one or more Data Prepper clusters that need to utilize source coordination. - -#### DynamoDB store - -Data Prepper will attempt to create the `dynamodb` table on startup unless the `skip_table_creation` flag is configured to `true`. Optionally, you can configure the [time-to-live](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html) (`ttl`) on the table, which results in the store cleaning up items over time. Some sources rely on source coordination for the deduplication of data, so be sure to configure a large enough `ttl` for the pipeline duration. - -If `ttl` is not configured on the table, any items no longer needed in the table must be cleaned manually. - -The following shows the full set of permissions needed for Data Prepper to create the table, enable `ttl`, and interact with the table: - -```json -{ - "Sid": "ReadWriteSourceCoordinationDynamoStore", - "Effect": "Allow", - "Action": [ - "dynamodb:DescribeTimeToLive", - "dynamodb:UpdateTimeToLive", - "dynamodb:DescribeTable", - "dynamodb:CreateTable", - "dynamodb:GetItem", - "dynamodb:PutItem", - "dynamodb:Query" - ], - "Resource": [ - "arn:aws:dynamodb:${REGION}:${AWS_ACCOUNT_ID}:table/${TABLE_NAME}", - "arn:aws:dynamodb:${REGION}:${AWS_ACCOUNT_ID}:table/${TABLE_NAME}/index/source-status" - ] -} -``` - - -| Value | Required | Type | Description | -| :--- | :--- | :--- | :--- | -| `table_name` | Yes | String | The name of the table to be used for source coordination. | -| `region` | Yes | String | The region of the DynamoDB table. | -| `sts_role_arn` | No | String | The `sts` role that contains the table permissions. Uses default credentials when not provided. | -| `sts_external_id` | No | String | The external ID used in the API call to assume the `sts_role_arn`. 
| -| `skip_table_creation` | No | Boolean | If set to `true` when using an existing store, the attempt to create the store is skipped. Default is `false`. | -| `provisioned_write_capacity_units` | No | Integer | The number of write capacity units to configure on the table. Default is `10`. | -| `provisioned_read_capacity_units` | No | Integer | The number of read capacity units to configure on the table. Default is `10`. | -| `ttl` | Duration | Optional. The duration of the TTL for the items in the table. The TTL is extended by this duration when an update is made to the item. Defaults to no TTL being used on the table. | - -The following example shows a `dynamodb` store: - -```yaml -source_coordination: - store: - dynamodb: - table_name: "DataPrepperSourceCoordinationStore" - region: "us-east-1" - sts_role_arn: "arn:aws:iam::##########:role/SourceCoordinationTableRole" - ttl: "P7D" - skip_table_creation: true -``` - -#### In-memory store (default) - -The following example shows an `in_memory` store, which is best used with a single-node cluster: - - -```yaml -source_coordination: - store: - in_memory: -``` - - -## Metrics - -Source coordination metrics are interpreted differently depending on which source is configured. The format of a source coordination metric is `_source_coordinator_`. You can use the sub-pipeline name to identify the source for these metrics because each sub-pipeline is unique to each source. - -### Progress metrics - -The following are metrics related to partition progress: - -* `partitionsCreatedCount`: The number of partition items that have been created. For an S3 scan, this is the number of objects that have had partitions created for them. -* `partitionsCompleted`: The number of partitions that have been fully processed and marked as `COMPLETED`. For an S3 scan, this is the number of objects that have been processed. -* `noPartitionsAcquired`: The number of times that a node has attempted to acquire a partition on which to perform work but has found no available partitions in the store. Use this to indicate that there is no more data coming into the source. -* `partitionsAcquired`: The number of partitions that have been acquired by nodes on which to perform work. In non-error scenarios, this should be equal to the number of partitions created. -* `partitionsClosed`: The number of partitions that have been marked as `CLOSED`. This is only applicable to sources that use the CLOSED functionality. - -The following are metrics related to partition errors: - -* `partitionNotFoundErrors`: Indicates that a partition item that is actively owned by a node does not have a corresponding store item. This should only occur if an item in the table has been manually deleted. -* `partitionNotOwnedErrors`: Indicates that a node that owns a partition has lost ownership due to the partition ownership timeout expiring. Unless the source is able to checkpoint the partition with `saveState`, this error results in duplicate item processing. -* `partitionUpdateErrors`: The number of errors that were received when an update to the store for this partition item failed. Is prefixed with either `saveState`, `close`, or `complete` to indicate which update action is failing. 
- diff --git a/_data-prepper/pipelines/configuration/processors/obfuscate.md b/_data-prepper/pipelines/configuration/processors/obfuscate.md deleted file mode 100644 index 4c33d8baab..0000000000 --- a/_data-prepper/pipelines/configuration/processors/obfuscate.md +++ /dev/null @@ -1,95 +0,0 @@ ---- -layout: default -title: obfuscate -parent: Processors -grand_parent: Pipelines -nav_order: 71 ---- - -# obfuscate - -The `obfuscate` process enables obfuscation of fields inside your documents in order to protect sensitive data. - -## Usage - -In this example, a document contains a `log` field and a `phone` field, as shown in the following object: - -```json -{ - "id": 1, - "phone": "(555) 555 5555", - "log": "My name is Bob and my email address is abc@example.com" -} -``` - - -To obfuscate the `log` and `phone` fields, add the `obfuscate` processor and call each field in the `source` option. To account for both the `log` and `phone` fields, the following example uses multiple `obfuscate` processors because each processor can only obfuscate one source. - -In the first `obfuscate` processor in the pipeline, the source `log` uses several configuration options to mask the data in the log field, as shown in the following example. For more details on these options, see [configuration](#configuration). - -```yaml -pipeline: - source: - http: - processor: - - obfuscate: - source: "log" - target: "new_log" - patterns: - - "[A-Za-z0-9+_.-]+@([\\w-]+\\.)+[\\w-]{2,4}" - action: - mask: - mask_character: "#" - mask_character_length: 6 - - obfuscate: - source: "phone" - sink: - - stdout: -``` - -When run, the `obfuscate` processor parses the fields into the following output: - -```json -{ - "id": 1, - "phone": "***", - "log": "My name is Bob and my email address is abc@example.com", - "newLog": "My name is Bob and my email address is ######" -} -``` - -## Configuration - -Use the following configuration options with the `obfuscate` processor. - -| Parameter | Required | Description | -| :--- | :--- | :--- | -| `source` | Yes | The source field to obfuscate. | -| `target` | No | The new field in which to store the obfuscated value. This leaves the original source field unchanged. When no `target` is provided, the source field updates with the obfuscated value. | -| `patterns` | No | A list of regex patterns that allow you to obfuscate specific parts of a field. Only parts that match the regex pattern will obfuscate. When not provided, the processor obfuscates the whole field. | -| `action` | No | The obfuscation action. As of Data Prepper 2.3, only the `mask` action is supported. | - -You can customize the `mask` action with the following optional configuration options. - -| Parameter | Default | Description | -| :--- | :--- | :--- | -`mask_character` | `*` | The character to use when masking. Valid characters are !, #, $, %, &, *, and @. | -`mask_character_length` | `3` | The number of characters to mask in the field. The value must be between 1 and 10. | - - -## Predefined patterns - -When using the `patterns` configuration option, you can use a set of predefined obfuscation patterns for common fields. The `obfuscate` processor supports the following predefined patterns. - -You cannot use multiple patterns for one obfuscate processor. Use one pattern for each obfuscate processor. 
-{: .note}
-
-
-| Pattern name | Examples |
-|-----------------------|----------|
-| %{EMAIL_ADDRESS} | abc@test.com<br>123@test.com<br>abc123@test.com<br>abc_123@test.com<br>a-b@test.com<br>a.b@test.com<br>abc@test-test.com<br>abc@test.com.cn<br>abc@test.mail.com.org |
-| %{IP_ADDRESS_V4} | 1.1.1.1<br>192.168.1.1<br>255.255.255.0 |
-| %{BASE_NUMBER} | 1.1<br>.1<br>2000 |
-| %{CREDIT_CARD_NUMBER} | 5555555555554444<br>4111111111111111<br>1234567890123456<br>1234 5678 9012 3456<br>1234-5678-9012-3456 |
-| %{US_PHONE_NUMBER} | 1555 555 5555<br>5555555555<br>1-555-555-5555<br>1-(555)-555-5555<br>1(555) 555 5555<br>(555) 555 5555<br>+1-555-555-5555<br>
| -| %{US_SSN_NUMBER} | 123-11-1234 diff --git a/_data-prepper/pipelines/configuration/processors/otel-trace-group.md b/_data-prepper/pipelines/configuration/processors/otel-trace-group.md index 06bc754a98..e38e850584 100644 --- a/_data-prepper/pipelines/configuration/processors/otel-trace-group.md +++ b/_data-prepper/pipelines/configuration/processors/otel-trace-group.md @@ -3,7 +3,7 @@ layout: default title: otel_trace_group parent: Processors grand_parent: Pipelines -nav_order: 73 +nav_order: 45 --- # otel_trace_group diff --git a/_data-prepper/pipelines/configuration/processors/user-agent.md b/_data-prepper/pipelines/configuration/processors/user-agent.md deleted file mode 100644 index 8d2592a596..0000000000 --- a/_data-prepper/pipelines/configuration/processors/user-agent.md +++ /dev/null @@ -1,63 +0,0 @@ ---- -layout: default -title: user_agent -parent: Processors -grand_parent: Pipelines -nav_order: 130 ---- - -# user_agent - -The `user_agent` processor parses any user agent (UA) string in an event and then adds the parsing results to the event's write data. - -## Usage - -In this example, the `user_agent` processor calls the source that contains the UA string, the `ua` field, and indicates the key to which the parsed string will write, `user_agent`, as shown in the following example: - -```yaml - processor: - - user_agent: - source: "ua" - target: "user_agent" -``` - -The following example event contains the `ua` field with a string that provides information about a user: - -```json -{ - "ua": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Mobile/15E148 Safari/604.1" -} -``` - -The `user_agent` processor parses the string into a format compatible with Elastic Common Schema (ECS) and then adds the result to the specified target, as shown in the following example: - -```json -{ - "user_agent": { - "original": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Mobile/15E148 Safari/604.1", - "os": { - "version": "13.5.1", - "full": "iOS 13.5.1", - "name": "iOS" - }, - "name": "Mobile Safari", - "version": "13.1.1", - "device": { - "name": "iPhone" - } - }, - "ua": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Mobile/15E148 Safari/604.1" -} -``` - -## Configuration options - -You can use the following configuration options with the `user_agent` processor. - -| Option | Required | Description | -| :--- | :--- | :--- | -| `source` | Yes | The field in the event that will be parsed. -| `target` | No | The field to which the parsed event will write. Default is `user_agent`. -| `exclude_original` | No | Determines whether to exclude the original UA string from the parsing result. Defaults to `false`. -| `cache_size` | No | The cache size of the parser in megabytes. Defaults to `1000`. | -| `tags_on_parse_failure` | No | The tag to add to an event if the `user_agent` processor fails to parse the UA string. | diff --git a/_data-prepper/pipelines/configuration/sinks/file.md b/_data-prepper/pipelines/configuration/sinks/file.md index 74af5a1803..05b2dd6ff1 100644 --- a/_data-prepper/pipelines/configuration/sinks/file.md +++ b/_data-prepper/pipelines/configuration/sinks/file.md @@ -1,31 +1,25 @@ --- layout: default -title: file +title: file sink parent: Sinks grand_parent: Pipelines nav_order: 45 --- -# file +# file sink -Use the `file` sink to create a flat file output, usually a `.log` file. 
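To make the `path` option in the following table more concrete, here is a minimal sketch of a pipeline that writes events to a flat file. The pipeline name and the `http` source are assumptions made for illustration, and the output path reuses the sample value from the options table.

```yaml
# Hypothetical pipeline: receives events over HTTP and writes them to a flat file.
log-to-file-pipeline:
  source:
    http:            # assumed source; any Data Prepper source works here
  sink:
    - file:
        path: "logs/my-transformed-log.log"   # sample path from the options table
```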
+## Overview -## Configuration options - -The following table describes options you can configure for the `file` sink. +You can use the `file` sink to create a flat file output. The following table describes options you can configure for the `file` sink. Option | Required | Type | Description :--- | :--- | :--- | :--- path | Yes | String | Path for the output file (e.g. `logs/my-transformed-log.log`). -## Usage + \ No newline at end of file diff --git a/_data-prepper/pipelines/configuration/sinks/opensearch.md b/_data-prepper/pipelines/configuration/sinks/opensearch.md index 8da02bd41b..0990e5f7dc 100644 --- a/_data-prepper/pipelines/configuration/sinks/opensearch.md +++ b/_data-prepper/pipelines/configuration/sinks/opensearch.md @@ -1,12 +1,12 @@ --- layout: default -title: opensearch +title: OpenSearch sink parent: Sinks grand_parent: Pipelines -nav_order: 50 +nav_order: 45 --- -# opensearch +# OpenSearch sink You can use the `opensearch` sink plugin to send data to an OpenSearch cluster, a legacy Elasticsearch cluster, or an Amazon OpenSearch Service domain. @@ -66,8 +66,7 @@ insecure | No | Boolean | Whether or not to verify SSL certificates. If set to t proxy | No | String | The address of a [forward HTTP proxy server](https://en.wikipedia.org/wiki/Proxy_server). The format is "<host name or IP>:<port>". Examples: "example.com:8100", "http://example.com:8100", "112.112.112.112:8100". Port number cannot be omitted. index | Conditionally | String | Name of the export index. Applicable and required only when the `index_type` is `custom`. index_type | No | String | This index type tells the Sink plugin what type of data it is handling. Valid values: `custom`, `trace-analytics-raw`, `trace-analytics-service-map`, `management-disabled`. Default value is `custom`. -template_type | No | String | Defines what type of OpenSearch template to use. The available options are `v1` and `index-template`. The default value is `v1`, which uses the original OpenSearch templates available at the `_template` API endpoints. The `index-template` option uses composable [index templates]({{site.url}}{{site.baseurl}}/opensearch/index-templates/) which are available through OpenSearch's `_index_template` API. Composable index types offer more flexibility than the default and are necessary when an OpenSearch cluster has already existing index templates. Composable templates are available for all versions of OpenSearch and some later versions of Elasticsearch. When `distribution_version` is set to `es6`, Data Prepper enforces the `template_type` as `v1`. -template_file | No | String | The path to a JSON [index template]({{site.url}}{{site.baseurl}}/opensearch/index-templates/) file such as `/your/local/template-file.json` when `index_type` is set to `custom`. For an example template file, see [otel-v1-apm-span-index-template.json](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/src/main/resources/otel-v1-apm-span-index-template.json). If you supply a template file it must match the template format specified by the `template_type` parameter. +template_file | No | String | Path to a JSON [index template]({{site.url}}{{site.baseurl}}/opensearch/index-templates/) file (for example, `/your/local/template-file.json`) if `index_type` is `custom`. See [otel-v1-apm-span-index-template.json](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/src/main/resources/otel-v1-apm-span-index-template.json) for an example. 
document_id_field | No | String | The field from the source data to use for the OpenSearch document ID (for example, `"my-field"`) if `index_type` is `custom`. dlq_file | No | String | The path to your preferred dead letter queue file (for example, `/your/local/dlq-file`). Data Prepper writes to this file when it fails to index a document on the OpenSearch cluster. dlq | No | N/A | DLQ configurations. See [Dead Letter Queues]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/dlq/) for details. If the `dlq_file` option is also available, the sink will fail. @@ -75,8 +74,6 @@ bulk_size | No | Integer (long) | The maximum size (in MiB) of bulk requests sen ism_policy_file | No | String | The absolute file path for an ISM (Index State Management) policy JSON file. This policy file is effective only when there is no built-in policy file for the index type. For example, `custom` index type is currently the only one without a built-in policy file, thus it would use the policy file here if it's provided through this parameter. For more information, see [ISM policies]({{site.url}}{{site.baseurl}}/im-plugin/ism/policies/). number_of_shards | No | Integer | The number of primary shards that an index should have on the destination OpenSearch server. This parameter is effective only when `template_file` is either explicitly provided in Sink configuration or built-in. If this parameter is set, it would override the value in index template file. For more information, see [Create index]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/). number_of_replicas | No | Integer | The number of replica shards each primary shard should have on the destination OpenSearch server. For example, if you have 4 primary shards and set number_of_replicas to 3, the index has 12 replica shards. This parameter is effective only when `template_file` is either explicitly provided in Sink configuration or built-in. If this parameter is set, it would override the value in index template file. For more information, see [Create index]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/). -distribution_version | No | String | Indicates whether the sink backend version is Elasticsearch 6 or later. `es6` represents Elasticsearch 6. `default` represents the latest compatible backend version, such as Elasticsearch 7.x, OpenSearch 1.x, or OpenSearch 2.x. Default is `default`. -enable_request_compression | No | Boolean | Whether to enable compression when sending requests to OpenSearch. When `distribution_version` is set to `es6`, default is `false`. For all other distribution versions, default is `true`. ### Configure max_retries diff --git a/_data-prepper/pipelines/configuration/sinks/pipeline.md b/_data-prepper/pipelines/configuration/sinks/pipeline.md index 3cba75a220..614a9c4efb 100644 --- a/_data-prepper/pipelines/configuration/sinks/pipeline.md +++ b/_data-prepper/pipelines/configuration/sinks/pipeline.md @@ -1,30 +1,25 @@ --- layout: default -title: pipeline +title: Pipeline sink parent: Sinks grand_parent: Pipelines -nav_order: 55 +nav_order: 45 --- -# pipeline +# Pipeline sink -Use the `pipeline` sink to write to another pipeline. +## Overview -## Configuration options - -The `pipeline` sink supports the following configuration options. +You can use the `pipeline` sink to write to another pipeline. Option | Required | Type | Description :--- | :--- | :--- | :--- name | Yes | String | Name of the pipeline to write to. 
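Because `name` is the only option, chaining pipelines amounts to pointing the `pipeline` sink of one pipeline at the `pipeline` source of another. The following sketch assumes two hypothetical pipelines, `entry-pipeline` and `downstream-pipeline`; the `http` source and `stdout` sink are placeholders.

```yaml
# Hypothetical two-stage pipeline: the first stage forwards events to the second.
entry-pipeline:
  source:
    http:                             # assumed source
  sink:
    - pipeline:
        name: "downstream-pipeline"   # write to the pipeline below
downstream-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"          # read from the pipeline above
  sink:
    - stdout:                         # placeholder destination
```

The same pattern can fan events out to multiple downstream pipelines by listing one `pipeline` sink entry per destination.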
-## Usage + \ No newline at end of file diff --git a/_data-prepper/pipelines/configuration/sinks/s3.md b/_data-prepper/pipelines/configuration/sinks/s3.md deleted file mode 100644 index cb881e814a..0000000000 --- a/_data-prepper/pipelines/configuration/sinks/s3.md +++ /dev/null @@ -1,158 +0,0 @@ ---- -layout: default -title: s3 -parent: Sinks -grand_parent: Pipelines -nav_order: 55 ---- - -# s3 - -The `s3` sink saves batches of events to [Amazon Simple Storage Service (Amazon S3)](https://aws.amazon.com/s3/) objects. - -## Usage - -The following example creates a pipeline configured with an s3 sink. It contains additional options for customizing the event and size thresholds for which the pipeline sends record events and sets the codec type `ndjson`: - -``` -pipeline: - ... - sink: - - s3: - aws: - region: us-east-1 - sts_role_arn: arn:aws:iam::123456789012:role/Data-Prepper - sts_header_overrides: - max_retries: 5 - bucket: - name: bucket_name - object_key: - path_prefix: my-elb/%{yyyy}/%{MM}/%{dd}/ - threshold: - event_count: 2000 - maximum_size: 50mb - event_collect_timeout: 15s - codec: - ndjson: - buffer_type: in_memory -``` - -## Configuration - -Use the following options when customizing the `s3` sink. - -Option | Required | Type | Description -:--- | :--- | :--- | :--- -`bucket` | Yes | String | The object from which the data is retrieved and then stored. The `name` must match the name of your object store. -`codec` | Yes | [Buffer type](#buffer-type) | Determines the buffer type. -`aws` | Yes | AWS | The AWS configuration. See [aws](#aws) for more information. -`threshold` | Yes | [Threshold](#threshold-configuration) | Configures when to write an object to S3. -`object_key` | No | Sets the `path_prefix` and the `file_pattern` of the object store. Defaults to the S3 object `events-%{yyyy-MM-dd'T'hh-mm-ss}` found inside the root directory of the bucket. -`compression` | No | String | The compression algorithm to apply: `none`, `gzip`, or `snappy`. Default is `none`. -`buffer_type` | No | [Buffer type](#buffer-type) | Determines the buffer type. -`max_retries` | No | Integer | The maximum number of times a single request should retry when ingesting data to S3. Defaults to `5`. - -## aws - -Option | Required | Type | Description -:--- | :--- | :--- | :--- -`region` | No | String | The AWS Region to use for credentials. Defaults to [standard SDK behavior to determine the Region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html). -`sts_role_arn` | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon SQS and Amazon S3. Defaults to `null`, which will use the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html). -`sts_header_overrides` | No | Map | A map of header overrides that the IAM role assumes for the sink plugin. -`sts_external_id` | No | String | The external ID to attach to AssumeRole requests from AWS STS. - - -## Threshold configuration - -Use the following options to set ingestion thresholds for the `s3` sink. - -Option | Required | Type | Description -:--- | :--- | :--- | :--- -`event_count` | Yes | Integer | The maximum number of events the S3 bucket can ingest. -`maximum_size` | Yes | String | The maximum number of bytes that the S3 bucket can ingest after compression. Defaults to `50mb`. -`event_collect_timeout` | Yes | String | Sets the time period during which events are collected before ingestion. 
All values are strings that represent duration, either an ISO_8601 notation string, such as `PT20.345S`, or a simple notation, such as `60s` or `1500ms`. - - -## Buffer type - -`buffer_type` is an optional configuration that records stored events temporarily before flushing them into an S3 bucket. The default value is `in_memory`. Use one of the following options: - -- `in_memory`: Stores the record in memory. -- `local_file`: Flushes the record into a file on your machine. -- `multipart`: Writes using the [S3 multipart upload](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html). Every 10 MB is written as a part. - -## Object key configuration - -Option | Required | Type | Description -:--- | :--- | :--- | :--- -`path_prefix` | Yes | String | The S3 key prefix path to use. Accepts date-time formatting. For example, you can use `%{yyyy}/%{MM}/%{dd}/%{HH}/` to create hourly folders in S3. By default, events write to the root of the bucket. - - -## codec - -The `codec` determines how the `s3` source formats data written to each S3 object. - -### avro codec - -The `avro` codec writes an event as an [Apache Avro](https://avro.apache.org/) document. - -Because Avro requires a schema, you may either define the schema yourself, or Data Prepper will automatically generate a schema. -In general, you should define your own schema because it will most accurately reflect your needs. - -We recommend that you make your Avro fields use a null [union](https://avro.apache.org/docs/current/specification/#unions). -Without the null union, each field must be present or the data will fail to write to the sink. -If you can be certain that each each event has a given field, you can make it non-nullable. - -When you provide your own Avro schema, that schema defines the final structure of your data. -Therefore, any extra values inside any incoming events that are not mapped in the Arvo schema will not be included in the final destination. -To avoid confusion between a custom Arvo schema and the `include_keys` or `exclude_keys` sink configurations, Data Prepper does not allow the use of the `include_keys` or `exclude_keys` with a custom schema. - -In cases where your data is uniform, you may be able to automatically generate a schema. -Automatically generated schemas are based on the first event received by the codec. -The schema will only contain keys from this event. -Therefore, you must have all keys present in all events in order for the automatically generated schema to produce a working schema. -Automatically generated schemas make all fields nullable. -Use the sink's `include_keys` and `exclude_keys` configurations to control what data is included in the auto-generated schema. - - -Option | Required | Type | Description -:--- | :--- | :--- | :--- -`schema` | Yes | String | The Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration). Not required if `auto_schema` is set to true. -`auto_schema` | No | Boolean | When set to `true`, automatically generates the Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration) from the first event. - - -### ndjson codec - -The `ndjson` codec writes each line as a JSON object. - -The `ndjson` codec does not take any configurations. - - -### json codec - -The `json` codec writes events in a single large JSON file. -Each event is written into an object within a JSON array. 
- - -Option | Required | Type | Description -:--- | :--- | :--- | :--- -`key_name` | No | String | The name of the key for the JSON array. By default this is `events`. - - -### parquet codec - -The `parquet` codec writes events into a Parquet file. -When using the Parquet codec, set the `buffer_type` to `in_memory`. - -The Parquet codec writes data using the Avro schema. -Because Parquet requires an Avro schema, you may either define the schema yourself, or Data Prepper will automatically generate a schema. -However, we generally recommend that you define your own schema so that it can best meet your needs. - -For details on the Avro schema and recommendations, see the [Avro codec](#avro-codec) documentation. - - -Option | Required | Type | Description -:--- | :--- | :--- | :--- -`schema` | Yes | String | The Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration). Not required if `auto_schema` is set to true. -`auto_schema` | No | Boolean | When set to `true`, automatically generates the Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration) from the first event. - diff --git a/_data-prepper/pipelines/configuration/sinks/sinks.md b/_data-prepper/pipelines/configuration/sinks/sinks.md index 0f3af6ab25..89fc15ba7e 100644 --- a/_data-prepper/pipelines/configuration/sinks/sinks.md +++ b/_data-prepper/pipelines/configuration/sinks/sinks.md @@ -14,9 +14,6 @@ Sinks define where Data Prepper writes your data to. The following table describes options you can use to configure the `sinks` sink. -Option | Required | Type | Description -:--- | :--- |:------------| :--- -routes | No | String list | A list of routes for which this sink applies. If not provided, this sink receives all events. See [conditional routing]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines#conditional-routing) for more information. -tags_target_key | No | String | When specified, includes event tags in the output of the provided key. -include_keys | No | String list | When specified, provides the keys in this list in the data sent to the sink. Some codecs and sinks do not allow use of this field. -exclude_keys | No | String list | When specified, excludes the keys given from the data sent to the sink. Some codecs and sinks do not allow use of this field. +Option | Required | Type | Description +:--- | :--- | :--- | :--- +routes | No | List | List of routes that the sink accepts. If not specified, the sink accepts all upstream events. \ No newline at end of file diff --git a/_data-prepper/pipelines/configuration/sinks/stdout.md b/_data-prepper/pipelines/configuration/sinks/stdout.md index 35b1b08126..7b55cb0a10 100644 --- a/_data-prepper/pipelines/configuration/sinks/stdout.md +++ b/_data-prepper/pipelines/configuration/sinks/stdout.md @@ -8,4 +8,12 @@ nav_order: 45 # stdout sink -Use the `stdout` sink for console output and testing. It has no configurable options. +## Overview + +You can use the `stdout` sink for console output and testing. It has no configurable options. 
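Although the sink takes no options, it is useful for checking what a pipeline emits before configuring a real destination. The following sketch is illustrative only; the pipeline name, the `http` source, and the `grok` processor are assumptions, not part of the sink itself.

```yaml
# Hypothetical test pipeline: prints processed events to the console.
grok-test-pipeline:
  source:
    http:                                # assumed source
  processor:
    - grok:
        match:
          log: [ "%{COMMONAPACHELOG}" ]  # assumed processor, included so stdout has parsed output to print
  sink:
    - stdout:                            # no options to configure
```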
+ + \ No newline at end of file diff --git a/_data-prepper/pipelines/configuration/sources/kafka.md b/_data-prepper/pipelines/configuration/sources/kafka.md deleted file mode 100644 index 4df72cfdd6..0000000000 --- a/_data-prepper/pipelines/configuration/sources/kafka.md +++ /dev/null @@ -1,144 +0,0 @@ ---- -layout: default -title: kafka -parent: Sources -grand_parent: Pipelines -nav_order: 6 ---- - -# kafka - -You can use the Apache Kafka source (`kafka`) in Data Prepper to read records from one or more Kafka [topics](https://kafka.apache.org/intro#intro_concepts_and_terms). These records hold events that your Data Prepper pipeline can ingest. The `kafka` source uses Kafka's [Consumer API](https://kafka.apache.org/documentation/#consumerapi) to consume messages from the Kafka broker, which then creates Data Prepper events for further processing by the Data Prepper pipeline. - -## Usage - -The following example shows the `kafka` source in a Data Prepper pipeline: - -```json -kafka-pipeline: - source: - kafka: - bootstrap_servers: - - 127.0.0.1:9093 - topics: - - name: Topic1 - group_id: groupID1 - - name: Topic2 - group_id: groupID1 -``` - -## Configuration - -Use the following configuration options with the `kafka` source. - -Option | Required | Type | Description -:--- | :--- | :--- | :--- -`bootstrap_servers` | Yes, when not using Amazon Managed Streaming for Apache Kafka (Amazon MSK) as a cluster. | IP address | The host or port for the initial connection to the Kafka cluster. You can configure multiple Kafka brokers by using the IP address or port number for each broker. When using [Amazon MSK](https://aws.amazon.com/msk/) as your Kafka cluster, the bootstrap server information is obtained from MSK using the MSK Amazon Resource Name (ARN) provided in the configuration. -`topics` | Yes | JSON array | The Kafka topics that the Data Prepper `kafka` source uses to read messages. You can configure up to 10 topics. For more information about `topics` configuration options, see [Topics](#topics). -`schema` | No | JSON object | The schema registry configuration. For more information, see [Schema](#schema). -`authentication` | No | JSON object | Set the authentication options for both the pipeline and Kafka. For more information, see [Authentication](#authentication). -`encryption` | No | JSON object | The encryption configuration. For more information, see [Encryption](#encryption). -`aws` | No | JSON object | The AWS configuration. For more information, see [aws](#aws). -`acknowledgments` | No | Boolean | If `true`, enables the `kafka` source to receive [end-to-end acknowledgments]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/#end-to-end-acknowledgments) when events are received by OpenSearch sinks. Default is `false`. -`client_dns_lookup` | Yes, when a DNS alias is used. | String | Sets Kafka's `client.dns.lookup` option. Default is `default`. - -### Topics - -Use the following options in the `topics` array. - -Option | Required | Type | Description -:--- | :--- | :--- | :--- -`name` | Yes | String | The name of each Kafka topic. -`group_id` | Yes | String | Sets Kafka's `group.id` option. -`workers` | No | Integer | The number of multithreaded consumers associated with each topic. Default is `2`. The maximum value is `200`. -`serde_format` | No | String | Indicates the serialization and deserialization format of the messages in the topic. Default is `plaintext`. 
-`auto_commit` | No | Boolean | When `false`, the consumer's offset will not be periodically committed to Kafka in the background. Default is `false`. -`commit_interval` | No | Integer | When `auto_commit` is set to `true`, sets how frequently, in seconds, the consumer offsets are auto-committed to Kafka through Kafka's `auto.commit.interval.ms` option. Default is `5s`. -`session_timeout` | No | Integer | The amount of time during which the source detects client failures when using Kafka's group management features, which can be used to balance the data stream. Default is `45s`. -`auto_offset_reset` | No | String | Automatically resets the offset to an earlier or the latest offset through Kafka's `auto.offset.reset` option. Default is `latest`. -`thread_waiting_time` | No | Integer | The amount of time that threads wait for the preceding thread to complete its task and to signal the next thread. The Kafka consumer API poll timeout value is set to half of this setting. Default is `5s`. -`max_partition_fetch_bytes` | No | Integer | Sets the maximum limit in megabytes for max data returns from each partition through Kafka's `max.partition.fetch.bytes` setting. Default is `1mb`. -`heart_beat_interval` | No | Integer | The expected amount of time between heartbeats to the consumer coordinator when using Kafka's group management facilities through Kafka's `heartbeat.interval.ms` setting. Default is `5s`. -`fetch_max_wait` | No | Integer | The maximum amount of time during which the server blocks a fetch request when there isn't sufficient data to satisfy the `fetch_min_bytes` requirement through Kafka's `fetch.max.wait.ms` setting. Default is `500ms`. -`fetch_max_bytes` | No | Integer | The maximum record size accepted by the broker through Kafka's `fetch.max.bytes` setting. Default is `50mb`. -`fetch_min_bytes` | No | Integer | The minimum amount of data the server returns during a fetch request through Kafka's `retry.backoff.ms` setting. Default is `1b`. -`retry_backoff` | No | Integer | The amount of time to wait before attempting to retry a failed request to a given topic partition. Default is `10s`. -`max_poll_interval` | No | Integer | The maximum delay between invocations of a `poll()` when using group management through Kafka's `max.poll.interval.ms` option. Default is `300s`. -`consumer_max_poll_records` | No | Integer | The maximum number of records returned in a single `poll()` call through Kafka's `max.poll.records` setting. Default is `500`. -`key_mode` | No | String | Indicates how the key field of the Kafka message should be handled. The default setting is `include_as_field`, which includes the key in the `kafka_key` event. The `include_as_metadata` setting includes the key in the event's metadata. The `discard` setting discards the key. - -### Schema - -The following option is required inside the `schema` configuration. - -Option | Type | Description -:--- | :--- | :--- -`type` | String | Sets the type of schema based on your registry, either the AWS Glue Schema Registry, `aws_glue`, or the Confluent Schema Registry, `confluent`. When using the `aws_glue` registry, set any [AWS](#aws) configuration options. - -The following configuration options are only required when using a `confluent` registry. - -Option | Type | Description -:--- | :--- | :--- -`registry_url` | String | Deserializes a record value from a `bytearray` into a string. Default is `org.apache.kafka.common.serialization.StringDeserializer`. 
-`version` | String | Deserializes a record key from a `bytearray` into a string. Default is `org.apache.kafka.common.serialization.StringDeserializer`. -`schema_registry_api_key` | String | The schema registry API key. -`schema_registry_api_secret` | String | The schema registry API secret. - -### Authentication - -The following option is required inside the `authentication` object. - -Option | Type | Description -:--- | :--- | :--- -`sasl` | JSON object | The Simple Authentication and Security Layer (SASL) authentication configuration. - -### SASL - -Use one of the following options when configuring SASL authentication. - - -Option | Type | Description -:--- | :--- | :--- -`plaintext` | JSON object | The [PLAINTEXT](#sasl-plaintext) authentication configuration. -`aws_msk_iam` | String | The Amazon MSK AWS Identity and Access Management (IAM) configuration. If set to `role`, the `sts_role_arm` set in the `aws` configuration is used. Default is `default`. - - - -#### SASL PLAINTEXT - -The following options are required when using the [SASL PLAINTEXT](https://kafka.apache.org/10/javadoc/org/apache/kafka/common/security/auth/SecurityProtocol.html) protocol. - -Option | Type | Description -:--- | :--- | :--- -`username` | String | The username for the PLAINTEXT auth. -`password` | String | The password for the PLAINTEXT auth. - -#### Encryption - -Use the following options when setting SSL encryption. - -Option | Required | Type | Description -:--- | :--- | :--- | :--- -`type` | No | String | The encryption type. Use `none` to disable encryption. Default is `ssl`. -`Insecure` | No | Boolean | A Boolean flag used to turn off SSL certificate verification. If set to `true`, certificate authority (CA) certificate verification is turned off and insecure HTTP requests are sent. Default is `false`. - - -#### AWS - -Use the following options when setting up authentication for `aws` services. - -Option | Required | Type | Description -:--- | :--- | :--- | :--- -`region` | No | String | The AWS Region to use for credentials. Defaults to [standard SDK behavior to determine the Region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html). -`sts_role_arn` | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon Simple Queue Service (Amazon SQS) and Amazon Simple Storage Service (Amazon S3). Default is `null`, which will use the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html). -`msk` | No | JSON object | The [MSK](#msk) configuration settings. - -#### MSK - -Use the following options inside the `msk` object. - -Option | Required | Type | Description -:--- | :--- | :--- | :--- -`arn` | Yes | String | The [MSK ARN](https://docs.aws.amazon.com/msk/1.0/apireference/configurations-arn.html) to use. -`broker_connection_type` No | String | The type of connector to use with the MSK broker, either `public`, `single_vpc`, or `multip_vpc`. Default is `single_vpc`. - diff --git a/_data-prepper/pipelines/configuration/sources/otel-trace.md b/_data-prepper/pipelines/configuration/sources/otel-trace.md index 4b17647768..c129427939 100644 --- a/_data-prepper/pipelines/configuration/sources/otel-trace.md +++ b/_data-prepper/pipelines/configuration/sources/otel-trace.md @@ -13,6 +13,7 @@ nav_order: 15 The `otel_trace` source is a source for the OpenTelemetry Collector. The following table describes options you can use to configure the `otel_trace` source. 
+ Option | Required | Type | Description :--- | :--- | :--- | :--- @@ -33,6 +34,10 @@ awsRegion | Conditionally | String | Represents the AWS region used by ACM or Am authentication | No | Object | An authentication configuration. By default, an unauthenticated server is created for the pipeline. This parameter uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide customer authentication, use or create a plugin that implements [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/1.2.0/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java). + + ## Metrics ### Counters diff --git a/_data-prepper/pipelines/configuration/sources/s3.md b/_data-prepper/pipelines/configuration/sources/s3.md index 5624bed46f..2eb1a53c71 100644 --- a/_data-prepper/pipelines/configuration/sources/s3.md +++ b/_data-prepper/pipelines/configuration/sources/s3.md @@ -21,11 +21,7 @@ In order to use the `s3` source, configure your AWS Identity and Access Manageme { "Sid": "s3-access", "Effect": "Allow", - "Action": [ - "s3:GetObject", - "s3:ListBucket", - "s3:DeleteObject" - ], + "Action": "s3:GetObject", "Resource": "arn:aws:s3:::/*" }, { @@ -49,31 +45,6 @@ In order to use the `s3` source, configure your AWS Identity and Access Manageme If your S3 objects or Amazon SQS queues do not use [AWS Key Management Service (AWS KMS)](https://aws.amazon.com/kms/), remove the `kms:Decrypt` permission. -## Cross-account S3 access - -When Data Prepper fetches data from an S3 bucket, it verifies the ownership of the bucket using the -[bucket owner condition](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-owner-condition.html). -By default, Data Prepper expects an S3 bucket to be owned by the same that owns the correlating SQS queue. -When no SQS is provided, Data Prepper uses the Amazon Resource Name (ARN) role in the `aws` configuration. - -If you plan to ingest data from multiple S3 buckets but each bucket is associated with a different S3 account, you need to configure Data Prepper to check for cross-account S3 access, according to the following conditions: - -- If all S3 buckets you want data from belong to an account other than that of the SQS queue, set `default_bucket_owner` to the account ID of the bucket account holder. -- If your S3 buckets are in multiple accounts, use a `bucket_owners` map. - -In the following example, the SQS queue is owned by account `000000000000`. The SQS queue contains data from two S3 buckets: `my-bucket-01` and `my-bucket-02`. -Because `my-bucket-01` is owned by `123456789012` and `my-bucket-02` is owned by `999999999999`, the `bucket_owners` map calls both bucket owners with their account IDs, as shown in the following configuration: - -``` -s3: - sqs: - queue_url: "https://sqs.us-east-1.amazonaws.com/000000000000/MyQueue" - bucket_owners: - my-bucket-01: 123456789012 - my-bucket-02: 999999999999 -``` - -You can use both `bucket_owners` and `default_bucket_owner` together. ## Configuration @@ -81,21 +52,17 @@ You can use the following options to configure the `s3` source. Option | Required | Type | Description :--- | :--- | :--- | :--- -`notification_type` | Yes | String | Must be `sqs`. -`notification_source` | No | String | Determines how notifications are received by SQS. Must be `s3` or `eventbridge`. 
`s3` represents notifications that are directly sent from Amazon S3 to Amazon SQS or fanout notifications from Amazon S3 to Amazon Simple Notification Service (Amazon SNS) to Amazon SQS. `eventbridge` represents notifications from [Amazon EventBridge](https://aws.amazon.com/eventbridge/) and [Amazon Security Lake](https://aws.amazon.com/security-lake/). Default is `s3`. -`compression` | No | String | The compression algorithm to apply: `none`, `gzip`, or `automatic`. Default is `none`. -`codec` | Yes | Codec | The [codec](#codec) to apply. -`sqs` | Yes | SQS | The SQS configuration. See [sqs](#sqs) for more information. -`aws` | Yes | AWS | The AWS configuration. See [aws](#aws) for more information. -`on_error` | No | String | Determines how to handle errors in Amazon SQS. Can be either `retain_messages` or `delete_messages`. `retain_messages` leaves the message in the Amazon SQS queue and tries to send the message again. This is recommended for dead-letter queues. `delete_messages` deletes failed messages. Default is `retain_messages`. -buffer_timeout | No | Duration | The amount of time allowed for writing events to the Data Prepper buffer before timeout occurs. Any events that the Amazon S3 source cannot write to the buffer during the set amount of time are discarded. Default is `10s`. -`records_to_accumulate` | No | Integer | The number of messages that accumulate before being written to the buffer. Default is `100`. -`metadata_root_key` | No | String | The base key for adding S3 metadata to each event. The metadata includes the key and bucket for each S3 object. Default is `s3/`. -`disable_bucket_ownership_validation` | No | Boolean | When `true`, the S3 source does not attempt to validate that the bucket is owned by the expected account. The expected account is the same account that owns the Amazon SQS queue. Default is `false`. -`acknowledgments` | No | Boolean | When `true`, enables `s3` sources to receive [end-to-end acknowledgments]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines#end-to-end-acknowledgments) when events are received by OpenSearch sinks. -`s3_select` | No | [s3_select](#s3_select) | The Amazon S3 Select configuration. -`scan` | No | [scan](#scan) | The S3 scan configuration. -`delete_s3_objects_on_read` | No | Boolean | When `true`, the S3 scan attempts to delete S3 objects after all events from the S3 object are successfully acknowledged by all sinks. `acknowledgments` should be enabled when deleting S3 objects. Default is `false`. +notification_type | Yes | String | Must be `sqs`. +compression | No | String | The compression algorithm to apply: `none`, `gzip`, or `automatic`. Default value is `none`. +codec | Yes | Codec | The [codec](#codec) to apply. +sqs | Yes | sqs | The SQS configuration. See [sqs](#sqs) for details. +aws | Yes | aws | The AWS configuration. See [aws](#aws) for details. +on_error | No | String | Determines how to handle errors in Amazon SQS. Can be either `retain_messages` or `delete_messages`. If `retain_messages`, then Data Prepper will leave the message in the Amazon SQS queue and try again. This is recommended for dead-letter queues. If `delete_messages`, then Data Prepper will delete failed messages. Default value is `retain_messages`. +buffer_timeout | No | Duration | The amount of time allowed for for writing events to the Data Prepper buffer before timeout occurs. Any events that the Amazon S3 source cannot write to the buffer in this time will be discarded. Default value is 10 seconds. 
+records_to_accumulate | No | Integer | The number of messages that accumulate before writing to the buffer. Default value is 100. +metadata_root_key | No | String | Base key for adding S3 metadata to each Event. The metadata includes the key and bucket for each S3 object. Defaults to `s3/`. +disable_bucket_ownership_validation | No | Boolean | If `true`, the S3Source will not attempt to validate that the bucket is owned by the expected account. The expected account is the same account that owns the Amazon SQS queue. Defaults to `false`. +acknowledgments | No | Boolean | If `true`, enables `s3` sources to receive [end-to-end acknowledgments]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/#end-to-end-acknowledgments) when events are received by OpenSearch sinks. ## sqs @@ -104,20 +71,20 @@ The following parameters allow you to configure usage for Amazon SQS in the `s3` Option | Required | Type | Description :--- | :--- | :--- | :--- -`queue_url` | Yes | String | The URL of the Amazon SQS queue from which messages are received. -`maximum_messages` | No | Integer | The maximum number of messages to receive from the Amazon SQS queue in any single request. Default is `10`. -`visibility_timeout` | No | Duration | The visibility timeout to apply to messages read from the Amazon SQS queue. This should be set to the amount of time that Data Prepper may take to read all the S3 objects in a batch. Default is `30s`. -`wait_time` | No | Duration | The amount of time to wait for long polling on the Amazon SQS API. Default is `20s`. -`poll_delay` | No | Duration | A delay placed between the reading and processing of a batch of Amazon SQS messages and making a subsequent request. Default is `0s`. +queue_url | Yes | String | The URL of the Amazon SQS queue from which messages are received. +maximum_messages | No | Integer | The maximum number of messages to receive from the Amazon SQS queue in any single request. Default value is `10`. +visibility_timeout | No | Duration | The visibility timeout to apply to messages read from the Amazon SQS queue. This should be set to the amount of time that Data Prepper may take to read all the S3 objects in a batch. Default value is `30s`. +wait_time | No | Duration | The amount of time to wait for long polling on the Amazon SQS API. Default value is `20s`. +poll_delay | No | Duration | A delay to place between reading/processing a batch of Amazon SQS messages and making a subsequent request. Default value is `0s`. ## aws Option | Required | Type | Description :--- | :--- | :--- | :--- -`region` | No | String | The AWS Region to use for credentials. Defaults to [standard SDK behavior to determine the Region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html). -`sts_role_arn` | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon SQS and Amazon S3. Defaults to `null`, which will use the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html). -`aws_sts_header_overrides` | No | Map | A map of header overrides that the IAM role assumes for the sink plugin. +region | No | String | The AWS Region to use for credentials. Defaults to [standard SDK behavior to determine the Region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html). +sts_role_arn | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon SQS and Amazon S3. 
Defaults to null, which will use the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html). +aws_sts_header_overrides | No | Map | A map of header overrides that the IAM role assumes for the sink plugin. ## codec @@ -131,8 +98,8 @@ Use the following options to configure the `newline` codec. Option | Required | Type | Description :--- | :--- |:--------| :--- -`skip_lines` | No | Integer | The number of lines to skip before creating events. You can use this configuration to skip common header rows. Default is `0`. -`header_destination` | No | String | A key value to assign to the header line of the S3 object. If this option is specified, then each event will contain a `header_destination` field. +skip_lines | No | Integer | The number of lines to skip before creating events. You can use this configuration to skip common header rows. Default is `0`. +header_destination | No | String | A key value to assign to the header line of the S3 object. If this option is specified, then each event will contain a header_destination field. ### json codec @@ -144,23 +111,23 @@ The `csv` codec parses objects in comma-separated value (CSV) format, with each Option | Required | Type | Description :--- |:---------|:------------| :--- -`delimiter` | Yes | Integer | The delimiter separating columns. Default is `,`. -`quote_character` | Yes | String | The character used as a text qualifier for CSV data. Default is `"`. -`header` | No | String list | The header containing the column names used to parse CSV data. -`detect_header` | No | Boolean | Whether the first line of the S3 object should be interpreted as a header. Default is `true`. +delimiter | Yes | Integer | The delimiter separating columns. Default is `,`. +quote_character | Yes | String | The character used as a text qualifier for CSV data. Default is `"`. +header | No | String list | The header containing the column names used to parse CSV data. +detect_header | No | Boolean | Whether the first line of the S3 object should be interpreted as a header. Default is `true`. -## Using `s3_select` with the `s3` source +## Using `s3_select` with the `s3` source When configuring `s3_select` to parse S3 objects, use the following options. Option | Required | Type | Description :--- |:-----------------------|:------------| :--- -`expression` | Yes, when using `s3_select` | String | The expression used to query the object. Maps directly to the [expression](https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html#AmazonS3-SelectObjectContent-request-Expression) property. -`expression_type` | No | String | The type of the provided expression. Default value is `SQL`. Maps directly to the [ExpressionType](https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html#AmazonS3-SelectObjectContent-request-ExpressionType). -`input_serialization` | Yes, when using `s3_select` | String | Provides the S3 Select file format. Amazon S3 uses this format to parse object data into records and returns only records that match the specified SQL expression. May be `csv`, `json`, or `parquet`. -`compression_type` | No | String | Specifies an object's compression format. Maps directly to the [CompressionType](https://docs.aws.amazon.com/AmazonS3/latest/API/API_InputSerialization.html#AmazonS3-Type-InputSerialization-CompressionType). -`csv` | No | [csv](#s3_select_csv) | Provides the CSV configuration for processing CSV data. 
-`json` | No | [json](#s3_select_json) | Provides the JSON configuration for processing JSON data. +expression | Yes, when using `s3_select` | String | The expression used to query the object. Maps directly to the [expression](https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html#AmazonS3-SelectObjectContent-request-Expression) property. +expression_type | No | String | The type of the provided expression. Default value is `SQL`. Maps directly to the [ExpressionType](https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html#AmazonS3-SelectObjectContent-request-ExpressionType). +input_serialization | Yes, when using `s3_select` | String | Provides the S3 Select file format. Amazon S3 uses this format to parse object data into records and returns only records that match the specified SQL expression. May be `csv`, `json`, or `parquet`. +compression_type | No | String | Specifies an object's compression format. Maps directly to the [CompressionType](https://docs.aws.amazon.com/AmazonS3/latest/API/API_InputSerialization.html#AmazonS3-Type-InputSerialization-CompressionType). +csv | No | [csv](#s3_select_csv) | Provides the CSV configuration for processing CSV data. +json | No | [json](#s3_select_json) | Provides the JSON configuration for processing JSON data. ### csv @@ -170,59 +137,18 @@ These options map directly to options available in the S3 Select [CSVInput](http Option | Required | Type | Description :--- |:---------|:------------| :--- -`file_header_info` | No | String | Describes the first line of input. Maps directly to the [FileHeaderInfo](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CSVInput.html#AmazonS3-Type-CSVInput-FileHeaderInfo) property. -`quote_escape` | No | String | A single character used for escaping the quotation mark character inside an already escaped value. Maps directly to the [QuoteEscapeCharacter](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CSVInput.html#AmazonS3-Type-CSVInput-QuoteEscapeCharacter) property. -`comments` | No | String | A single character used to indicate that a row should be ignored when the character is present at the start of that row. Maps directly to the [Comments](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CSVInput.html#AmazonS3-Type-CSVInput-Comments) property. +file_header_info | No | String | Describes the first line of input. Maps directly to the [FileHeaderInfo](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CSVInput.html#AmazonS3-Type-CSVInput-FileHeaderInfo) property. +quote_escape | No | String | A single character used for escaping the quotation mark character inside an already escaped value. Maps directly to the [QuoteEscapeCharacter](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CSVInput.html#AmazonS3-Type-CSVInput-QuoteEscapeCharacter) property. +comments | No | String | A single character used to indicate that a row should be ignored when the character is present at the start of that row. Maps directly to the [Comments](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CSVInput.html#AmazonS3-Type-CSVInput-Comments) property. #### json Use the following option in conjunction with `json` for `s3_select` to determine how S3 Select processes the JSON file. -Option | Required | Type | Description -:--- | :--- | :--- | :--- -`type` | No | String | The type of JSON array. May be either `DOCUMENT` or `LINES`. Maps directly to the [Type](https://docs.aws.amazon.com/AmazonS3/latest/API/API_JSONInput.html#AmazonS3-Type-JSONInput-Type) property. 
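To show how these pieces fit together, the following is a minimal, unvalidated sketch of an `s3` source that pairs `s3_select` with a SQL expression and CSV input serialization. The queue URL, role ARN, and expression are placeholders, and the exact nesting of the `s3_select` options should be confirmed against your Data Prepper version:

```
source:
  s3:
    notification_type: "sqs"
    codec:
      csv:
    sqs:
      queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/MyQueue"
    aws:
      region: "us-east-1"
      sts_role_arn: "arn:aws:iam::123456789012:role/Data-Prepper"
    s3_select:
      # Placeholder SQL expression; expression_type defaults to SQL.
      expression: "SELECT * FROM S3Object s"
      # One of csv, json, or parquet.
      input_serialization: csv
```

Because Amazon S3 returns only the records that match the expression, the pipeline buffers less data than it would when reading whole objects.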
- -## Using `scan` with the `s3` source -The following parameters allow you to scan S3 objects. All options can be configured at the bucket level. - -Option | Required | Type | Description -:--- | :--- | :--- | :--- -`start_time` | No | String | The time from which to start scanning objects modified after the given `start_time`. This should follow [ISO LocalDateTime](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#ISO_LOCAL_DATE_TIME) format, for example, `023-01-23T10:00:00`. If `end_time` is configured along with `start_time`, all objects after `start_time` and before `end_time` will be processed. `start_time` and `range` cannot be used together. -`end_time` | No | String | The time after which no objects will be scanned after the given `end_time`. This should follow [ISO LocalDateTime](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#ISO_LOCAL_DATE_TIME) format, for example, `023-01-23T10:00:00`. If `start_time` is configured along with `end_time`, all objects after `start_time` and before `end_time` will be processed. `end_time` and `range` cannot be used together. -`range` | No | String | The time range from which objects are scanned from all buckets. Supports ISO_8601 notation strings, such as `PT20.345S` or `PT15M`, and notation strings for seconds (`60s`) and milliseconds (`1600ms`). `start_time` and `end_time` cannot be used with `range`. Range `P12H` scans all the objects modified in the last 12 hours from the time pipeline started. -`buckets` | Yes | List | A list of [buckets](#bucket) to scan. -`scheduling` | No | List | The configuration for scheduling periodic scans on all buckets. `start_time`, `end_time` and `range` can not be used if scheduling is configured. - -### bucket - -Option | Required | Type | Description -:--- | :--- |:-----| :--- -`bucket` | Yes | Map | Provides options for each bucket. - -You can configure the following options inside the [bucket](#bucket) setting. - -Option | Required | Type | Description -:--- | :--- | :--- | :--- -`name` | Yes | String | The string representing the S3 bucket name to scan. -`filter` | No | [Filter](#filter) | Provides the filter configuration. -`start_time` | No | String | The time from which to start scanning objects modified after the given `start_time`. This should follow [ISO LocalDateTime](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#ISO_LOCAL_DATE_TIME) format, for example, `023-01-23T10:00:00`. If `end_time` is configured along with `start_time`, all objects after `start_time` and before `end_time` will be processed. `start_time` and `range` cannot be used together. This will overwrites the `start_time` at the [scan](#scan) level. -`end_time` | No | String | The time after which no objects will be scanned after the given `end_time`. This should follow [ISO LocalDateTime](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#ISO_LOCAL_DATE_TIME) format, for example, `023-01-23T10:00:00`. If `start_time` is configured along with `end_time`, all objects after `start_time` and before `end_time` will be processed. This overwrites the `end_time` at the [scan](#scan) level. -`range` | No | String | The time range from which objects are scanned from all buckets. Supports ISO_8601 notation strings, such as `PT20.345S` or `PT15M`, and notation strings for seconds (`60s`) and milliseconds (`1600ms`). `start_time` and `end_time` cannot be used with `range`. 
Range `P12H` scans all the objects modified in the last 12 hours from the time pipeline started. This overwrites the `range` at the [scan](#scan) level. - -### filter - -Use the following options inside the `filter` configuration. - -Option | Required | Type | Description -:--- | :--- | :--- | :--- -`include_prefix` | No | List | A list of S3 key prefix strings included in the scan. By default, all the objects in a bucket are included. -`exclude_suffix` | No | List | A list of S3 key suffix strings excluded from the scan. By default, no objects in a bucket are excluded. +Option | Required | Type | Description +:--- |:---------|:------------| :--- +type | No | String | The type of JSON array. May be either `DOCUMENT` or `LINES`. Maps directly to the [Type](https://docs.aws.amazon.com/AmazonS3/latest/API/API_JSONInput.html#AmazonS3-Type-JSONInput-Type) property. -### scheduling -Option | Required | Type | Description -:--- | :--- | :--- | :--- -`interval` | Yes | String | Indicates the minimum interval between each scan. The next scan in the interval will start after the interval duration from the last scan ends and when all the objects from the previous scan are processed. Supports ISO_8601 notation strings, such as `PT20.345S` or `PT15M`, and notation strings for seconds (`60s`) and milliseconds (`1600ms`). -`count` | No | Integer | Specifies how many times a bucket will be scanned. Defaults to `Integer.MAX_VALUE`. ## Metrics @@ -237,10 +163,6 @@ The `s3` source includes the following metrics. * `sqsMessagesReceived`: The number of Amazon SQS messages received from the queue by the `s3` source. * `sqsMessagesDeleted`: The number of Amazon SQS messages deleted from the queue by the `s3` source. * `sqsMessagesFailed`: The number of Amazon SQS messages that the `s3` source failed to parse. -* `s3ObjectNoRecordsFound` -- The number of S3 objects that resulted in 0 records added to the buffer by the `s3` source. -* `sqsMessagesDeleteFailed` -- The number of SQS messages that the `s3` source failed to delete from the SQS queue. -* `s3ObjectsDeleted` -- The number of S3 objects deleted by the `s3` source. -* `s3ObjectsDeleteFailed` -- The number of S3 objects that the `s3` source failed to delete. ### Timers @@ -253,7 +175,7 @@ The `s3` source includes the following metrics. * `s3ObjectProcessedBytes`: Measures the bytes processed by the `s3` source for a given object. For compressed objects, this is the uncompressed size. * `s3ObjectsEvents`: Measures the number of events (sometimes called records) produced by an S3 object. 
-## Example: Uncompressed logs with sqs +## Example: Uncompressed logs The following pipeline.yaml file shows the minimum configuration for reading uncompressed newline-delimited logs: @@ -270,27 +192,3 @@ source: region: "us-east-1" sts_role_arn: "arn:aws:iam::123456789012:role/Data-Prepper" ``` - -## Example: Uncompressed logs with scan - -The following pipeline.yaml file shows the minimum configuration for scanning objects with uncompressed newline-delimited logs: - -``` -source: - s3: - codec: - newline: - compression: none - aws: - region: "us-east-1" - sts_role_arn: "arn:aws:iam::123456789012:role/Data-Prepper" - scan: - start_time: 2023-01-01T00:00:00 - range: "P365D" - buckets: - - bucket: - name: "s3-scan-test" - filter: - exclude_suffix: - - "*.log" -``` diff --git a/_data-prepper/pipelines/expression-syntax.md b/_data-prepper/pipelines/expression-syntax.md index 8257ab8978..cc7e0ad2ef 100644 --- a/_data-prepper/pipelines/expression-syntax.md +++ b/_data-prepper/pipelines/expression-syntax.md @@ -22,7 +22,6 @@ Operators are listed in order of precedence (top to bottom, left to right). | `and`, `or` | Conditional Expression | left-to-right | ## Reserved for possible future functionality - Reserved symbol set: `^`, `*`, `/`, `%`, `+`, `-`, `xor`, `=`, `+=`, `-=`, `*=`, `/=`, `%=`, `++`, `--`, `${}` ## Set initializer @@ -34,19 +33,15 @@ The set initializer defines a set or term and/or expressions. The following are examples of set initializer syntax. #### HTTP status codes - ``` {200, 201, 202} ``` - #### HTTP response payloads - ``` {"Created", "Accepted"} ``` #### Handle multiple event types with different keys - ``` {/request_payload, /request_message} ``` @@ -62,11 +57,9 @@ A priority expression identifies an expression that will be evaluated at the hig ``` ## Relational operators - Relational operators are used to test the relationship of two numeric values. The operands must be numbers or JSON Pointers that resolve to numbers. ### Syntax - ``` < <= @@ -75,13 +68,11 @@ Relational operators are used to test the relationship of two numeric values. Th ``` ### Example - ``` /status_code >= 200 and /status_code < 300 ``` ## Equality operators - Equality operators are used to test whether two values are equivalent. ### Syntax @@ -115,7 +106,6 @@ null != /response ``` #### Conditional expression - A conditional expression is used to chain together multiple expressions and/or values. #### Syntax @@ -218,30 +208,3 @@ White space is **required** surrounding set initializers, priority expressions, | `==`, `!=` | Equality operators | No | `/status == 200`
`/status_code==200` | | | `and`, `or`, `not` | Conditional operators | Yes | `/a<300 and /b>200` | `/b<300and/b>200` | | `,` | Set value delimiter | No | `/a in {200, 202}`
`/a in {200,202}`
`/a in {200 , 202}` | `/a in {200,}` | - - -## Functions - -Data Prepper supports the following built-in functions that can be used in an expression. - -### `length()` - -The `length()` function takes one argument of the JSON pointer type and returns the length of the value passed. For example, `length(/message)` returns a length of `10` when a key message exists in the event and has a value of `1234567890`. - -### `hasTags()` - -The `hastags()` function takes one or more string type arguments and returns `true` if all the arguments passed are present in an event's tags. When an argument does not exist in the event's tags, the function returns `false`. For example, if you use the expression `hasTags("tag1")` and the event contains `tag1`, Data Prepper returns `true`. If you use the expression `hasTags("tag2")` but the event only contains a `tag1` tag, Data Prepper returns `false`. - -### `getMetadata()` - -The `getMetadata()` function takes one literal string argument to look up specific keys in a an event's metadata. If the key contains a `/`, then the function looks up the metadata recursively. When passed, the expression returns the value corresponding to the key. The value returned can be of any type. For example, if the metadata contains `{"key1": "value2", "key2": 10}`, then the function, `getMetadata("key1")`, returns `value2`. The function, `getMetadata("key2")`, returns 10. - -### `contains()` - -The `contains()` function takes two string arguments and determines whether either a literal string or a JSON pointer is contained within an event. When the second argument contains a substring of the first argument, such as `contains("abcde", "abcd")`, the function returns `true`. If the second argument does not contain any substrings, such as `contains("abcde", "xyz")`, it returns `false`. - -### `cidrContains()` - -The `cidrContains()` function takes two or more arguments. The first argument is a JSON pointer, which represents the key to the IP address that is checked. It supports both IPv4 and IPv6 addresses. Every argument that comes after the key is a string type that represents CIDR blocks that are checked against. - -If the IP address in the first argument is in the range of any of the given CIDR blocks, the function returns `true`. If the IP address is not in the range of the CIDR blocks, the function returns `false`. For example, `cidrContains(/sourceIp,"192.0.2.0/24","10.0.1.0/16")` will return `true` if the `sourceIp` field indicated in the JSON pointer has a value of `192.0.2.5`. diff --git a/_data-prepper/pipelines/pipelines-configuration-options.md b/_data-prepper/pipelines/pipelines-configuration-options.md index 5667906af1..73f8adad8e 100644 --- a/_data-prepper/pipelines/pipelines-configuration-options.md +++ b/_data-prepper/pipelines/pipelines-configuration-options.md @@ -14,5 +14,4 @@ This page provides information about pipeline configuration options in Data Prep Option | Required | Type | Description :--- | :--- | :--- | :--- workers | No | Integer | Essentially the number of application threads. As a starting point for your use case, try setting this value to the number of CPU cores on the machine. Default is 1. -delay | No | Integer | Amount of time in milliseconds workers wait between buffer read attempts. Default is `3000`. - +delay | No | Integer | Amount of time in milliseconds workers wait between buffer read attempts. Default is 3,000. 
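For reference, the following minimal sketch shows where these two options sit in a pipeline definition. It assumes the built-in `random` source and `stdout` sink so that the example stays self-contained; substitute your own source and sink:

```yml
simple-sample-pipeline:
  workers: 2      # Number of application threads; a reasonable starting point is the number of CPU cores.
  delay: "5000"   # Milliseconds to wait between buffer read attempts.
  source:
    random:
  sink:
    - stdout:
```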
\ No newline at end of file diff --git a/_data-prepper/pipelines/pipelines.md b/_data-prepper/pipelines/pipelines.md index 50063079e7..fca8cd16a3 100644 --- a/_data-prepper/pipelines/pipelines.md +++ b/_data-prepper/pipelines/pipelines.md @@ -56,12 +56,13 @@ Alternatively, the source sends a negative acknowledgment when an event cannot b When any component of a pipeline fails and is unable to send an event, the source receives no acknowledgment. In the case of a failure, the pipeline's source times out. This gives you the ability to take any necessary actions to address the source failure, including rerunning the pipeline or logging the failure. +As of Data Prepper 2.2, only the `s3` source and `opensearch` sink support E2E acknowledgments. ## Conditional routing -Pipelines also support **conditional routing** which allows you to route events to different sinks based on specific conditions. To add conditional routing to a pipeline, specify a list of named routes under the `route` component and add specific routes to sinks under the `routes` property. Any sink with the `routes` property will only accept events that match at least one of the routing conditions. +Pipelines also support **conditional routing** which allows you to route Events to different sinks based on specific conditions. To add conditional routing to a pipeline, specify a list of named routes under the `route` component and add specific routes to sinks under the `routes` property. Any sink with the `routes` property will only accept Events that match at least one of the routing conditions. -In the following example, `application-logs` is a named route with a condition set to `/log_type == "application"`. The route uses [Data Prepper expressions](https://github.com/opensearch-project/data-prepper/tree/main/examples) to define the conditions. Data Prepper only routes events that satisfy the condition to the first OpenSearch sink. By default, Data Prepper routes all events to a sink which does not define a route. In the example, all events route into the third OpenSearch sink. +In the following example, `application-logs` is a named route with a condition set to `/log_type == "application"`. The route uses [Data Prepper expressions](https://github.com/opensearch-project/data-prepper/tree/main/examples) to define the conditions. Data Prepper only routes events that satisfy the condition to the first OpenSearch sink. By default, Data Prepper routes all Events to a sink which does not define a route. In the example, all Events route into the third OpenSearch sink. ```yml conditional-routing-sample-pipeline: @@ -131,8 +132,8 @@ log-pipeline: # username and password above. #aws_sigv4: true #aws_region: us-east-1 - # Since we are Grok matching for Apache logs, it makes sense to send them to an OpenSearch index named apache_logs. - # You should change this to correspond with how your OpenSearch indexes are set up. + # Since we are grok matching for apache logs, it makes sense to send them to an OpenSearch index named apache_logs. + # You should change this to correspond with how your OpenSearch indices are set up. index: apache_logs ``` @@ -308,7 +309,7 @@ docker run --name data-prepper \ ## Configure peer forwarder -Data Prepper provides an HTTP service to forward events between Data Prepper nodes for aggregation. This is required for operating Data Prepper in a clustered deployment. Currently, peer forwarding is supported in `aggregate`, `service_map_stateful`, and `otel_trace_raw` processors. 
Peer forwarder groups events based on the identification keys provided by the processors. For `service_map_stateful` and `otel_trace_raw` it's `traceId` by default and can not be configured. For `aggregate` processor, it is configurable using `identification_keys` option. +Data Prepper provides an HTTP service to forward Events between Data Prepper nodes for aggregation. This is required for operating Data Prepper in a clustered deployment. Currently, peer forwarding is supported in `aggregate`, `service_map_stateful`, and `otel_trace_raw` processors. Peer forwarder groups events based on the identification keys provided by the processors. For `service_map_stateful` and `otel_trace_raw` it's `traceId` by default and can not be configured. For `aggregate` processor, it is configurable using `identification_keys` option. Peer forwarder supports peer discovery through one of three options: a static list, a DNS record lookup , or AWS Cloud Map. Peer discovery can be configured using `discovery_mode` option. Peer forwarder also supports SSL for verification and encryption, and mTLS for mutual authentication in a peer forwarding service. diff --git a/_data/top_nav.yml b/_data/top_nav.yml index 5830305eb8..61c76586b1 100644 --- a/_data/top_nav.yml +++ b/_data/top_nav.yml @@ -1,13 +1,4 @@ items: - - - label: OpenSearchCon - children: - - - label: Register for OpenSearchCon! - url: https://opensearchcon2023.splashthat.com - - - label: OpenSearchCon 2023 CFP! - url: /opensearchcon2023-cfp.html - label: Download fragments: @@ -37,7 +28,6 @@ items: - community_projects - blog - partners - - slack children: - label: Blog @@ -45,9 +35,6 @@ items: - label: Forum url: https://forum.opensearch.org/ - - - label: Slack - url: /slack.html - label: Events url: /events @@ -61,16 +48,3 @@ items: label: Documentation fragment: docs url: /docs/ - - - label: Platform - children: - - - label: Vector Database - url: /platform/search/vector-database.html - - - label: Live Demo - url: https://playground.opensearch.org/ - - - label: Performance Benchmarks - url: /benchmarks - diff --git a/_data/versions.json b/_data/versions.json index da2a35fb48..0515ac80de 100644 --- a/_data/versions.json +++ b/_data/versions.json @@ -1,12 +1,10 @@ { - "current": "2.9", + "current": "2.7", "all": [ - "2.9", + "2.7", "1.3" ], "archived": [ - "2.8", - "2.7", "2.6", "2.5", "2.4", @@ -18,7 +16,7 @@ "1.1", "1.0" ], - "latest": "2.9" + "latest": "2.7" } diff --git a/_developer-documentation/extensions.md b/_developer-documentation/extensions.md deleted file mode 100644 index fd1e279ff2..0000000000 --- a/_developer-documentation/extensions.md +++ /dev/null @@ -1,46 +0,0 @@ ---- -layout: default -title: Extensions -nav_order: 10 ---- - -# Extensions - -Extensions is an experimental feature. Therefore, we do not recommend the use of extensions in a production environment. For updates on the progress of extensions, or if you want leave feedback that could help improve the feature, refer to the [issue on GitHub](https://github.com/opensearch-project/OpenSearch/issues/2447). -{: .warning} - -Until extensions were introduced, plugins were the only way to extend OpenSearch functionality. However, plugins have significant shortcomings: they require frequent updates to stay up to date with OpenSearch core, they pose a security risk because they run in the same process as OpenSearch, and updating or installing them requires a full cluster restart. Moreover, plugins can fatally impact the cluster in the event of failure. 
- -Extensions provide an easier, more secure way to customize OpenSearch. Extensions support all plugin functionality and let you build additional modular features for OpenSearch. The [OpenSearch SDK for Java](https://github.com/opensearch-project/opensearch-sdk-java/) provides the library of classes and interfaces that you can use to develop extensions. Extensions are decoupled from OpenSearch core and do not need frequent updates. Additionally, they can run in a separate process or on another node and can be installed while a cluster is running. - -## Getting started - -Use the following documentation to get started with extensions: - -### Step 1: Learn the basics - -Read the [design documentation](https://opensearch-project.github.io/opensearch-sdk-java/DESIGN.html) to learn about extension architecture and how extensions work. - -### Step 2: Try it out - -Try running the sample Hello World extension by following detailed steps in the [Getting started section of the Developer Guide](https://opensearch-project.github.io/opensearch-sdk-java/DEVELOPER_GUIDE.html#getting-started). - -### Step 3: Create your own extension - -Develop a custom create, read, update, delete (CRUD) extension by following the instructions in [this tutorial](https://opensearch-project.github.io/opensearch-sdk-java/CREATE_YOUR_FIRST_EXTENSION.html). - -### Step 4: Learn how to deploy your extension - -For instructions on building, testing, and running an extension, see the [Developing your own extension section of the Developer Guide](https://opensearch-project.github.io/opensearch-sdk-java/DEVELOPER_GUIDE.html#developing-your-own-extension). - - - -## Plugin migration - -The [Anomaly Detection plugin](https://github.com/opensearch-project/anomaly-detection) is now [implemented as an extension](https://github.com/opensearch-project/anomaly-detection/tree/feature/extensions). For details, see [this GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/3635). - -For tips on migrating an existing plugin to an extension, see the [plugin migration documentation](https://opensearch-project.github.io/opensearch-sdk-java/PLUGIN_MIGRATION.html). \ No newline at end of file diff --git a/_developer-documentation/index.md b/_developer-documentation/index.md deleted file mode 100644 index 5c43dc9a30..0000000000 --- a/_developer-documentation/index.md +++ /dev/null @@ -1,24 +0,0 @@ ---- -layout: default -title: Developer documentation -nav_order: 1 -has_children: false -has_toc: false -nav_exclude: true ---- - -# Developer documentation - -We welcome your contributions to the OpenSearch Project. Here are some helpful links to explore the OpenSearch repositories and learn how to contribute: - -- [OpenSearch Project GitHub repo](https://github.com/opensearch-project/) -- [Javadoc documentation](https://opensearch.org/javadocs/) -- [Getting started as an OpenSearch contributor](https://github.com/opensearch-project/.github/blob/main/ONBOARDING.md) -- [OpenSearch Dashboards Developer Guide](https://github.com/opensearch-project/OpenSearch-Dashboards/blob/main/DEVELOPER_GUIDE.md) -- [OpenSearch release schedule and maintenance policy](https://opensearch.org/releases.html) -- [OpenSearch Project roadmap](https://github.com/orgs/opensearch-project/projects/1) -- [OpenSearch Community Forum](https://forum.opensearch.org/) - -## What's new - -New in version 2.9, OpenSearch introduces _extensions_---an easier-to-develop and more secure alternative to plugins---to simplify creating custom functionality for OpenSearch. 
To learn more about building extensions using _OpenSearch SDK for Java_, see [Extensions]({{site.url}}{{site.baseurl}}/developer-documentation/extensions/). diff --git a/_field-types/supported-field-types/flat-object.md b/_field-types/flat-object.md similarity index 84% rename from _field-types/supported-field-types/flat-object.md rename to _field-types/flat-object.md index adde1f3dc7..1f76f6d14b 100644 --- a/_field-types/supported-field-types/flat-object.md +++ b/_field-types/flat-object.md @@ -5,8 +5,6 @@ nav_order: 43 has_children: false parent: Object field types grand_parent: Supported field types -redirect_from: - - /field-types/flat-object/ --- # Flat object field type @@ -45,16 +43,16 @@ Flat objects do not support: The flat object field type supports the following queries: -- [Term]({{site.url}}{{site.baseurl}}/query-dsl/term/term/) -- [Terms]({{site.url}}{{site.baseurl}}/query-dsl/term/terms/) -- [Terms set]({{site.url}}{{site.baseurl}}/query-dsl/term/terms-set/) -- [Prefix]({{site.url}}{{site.baseurl}}/query-dsl/term/prefix/) -- [Range]({{site.url}}{{site.baseurl}}/query-dsl/term/range/) +- [Term]({{site.url}}{{site.baseurl}}/query-dsl/term#term) +- [Terms]({{site.url}}{{site.baseurl}}/query-dsl/term#terms) +- [Terms set]({{site.url}}{{site.baseurl}}/query-dsl/term#terms-set) +- [Prefix]({{site.url}}{{site.baseurl}}/query-dsl/term#prefix) +- [Range]({{site.url}}{{site.baseurl}}/query-dsl/term#range) - [Match]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#match) - [Multi-match]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#multi-match) -- [Query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/query-string/) +- [Query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#query-string) - [Simple query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#simple-query-string) -- [Exists]({{site.url}}{{site.baseurl}}/query-dsl/term/exists/) +- [Exists]({{site.url}}{{site.baseurl}}/query-dsl/term#exists) ## Limitations @@ -69,6 +67,9 @@ This functionality is planned for a future release. The following example illustrates mapping a field as a flat object, indexing documents with flat object fields, and searching for leaf values of the flat object in those documents. +Only the root field of a document can be defined as a flat object. You cannot define an object that is part of another JSON object as a flat object because when a flat object is flattened to a string, the nested architecture of the leaves is lost. +{: .note} + First, create a mapping for your index, where `issue` is of type `flat_object`: ```json @@ -215,30 +216,3 @@ GET /test-index/_search } } ``` - -## Defining a subfield as a flat object - -You can define a subfield of a JSON object as a flat object. For example, use the following query to define the `issue.labels` as `flat_object`: - -```json -PUT /test-index/ -{ - "mappings": { - "properties": { - "issue": { - "properties": { - "number": { - "type": "double" - }, - "labels": { - "type": "flat_object" - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -Because `issue.number` is not part of the flat object, you can use it to aggregate and sort documents. 
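As a hypothetical illustration (this request is not part of the original page), a search against the `test-index` defined above can sort on `issue.number` while the labels remain flattened:

```json
GET /test-index/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "issue.number": "desc" }
  ]
}
```
{% include copy-curl.html %}

The same field can also feed numeric aggregations such as `avg` or `stats`.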
\ No newline at end of file diff --git a/_field-types/supported-field-types/date-nanos.md b/_field-types/supported-field-types/date-nanos.md deleted file mode 100644 index 12399a69d4..0000000000 --- a/_field-types/supported-field-types/date-nanos.md +++ /dev/null @@ -1,290 +0,0 @@ ---- -layout: default -title: Date nanoseconds -nav_order: 35 -has_children: false -parent: Date field types -grand_parent: Supported field types ---- - -# Date nanoseconds field type - -The `date_nanos` field type is similar to the [`date`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/) field type in that it holds a date. However, `date` stores the date in millisecond resolution, while `date_nanos` stores the date in nanosecond resolution. Dates are stored as `long` values that correspond to nanoseconds since the epoch. Therefore, the range of supported dates is approximately 1970--2262. - -Queries on `date_nanos` fields are converted to range queries on the field value's `long` representation. Then the stored fields and aggregation results are converted to a string using the format set on the field. - -The `date_nanos` field supports all [formats]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date#formats) and [parameters]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date#parameters) that `date` supports. You can use multiple formats separated by `||`. -{: .note} - -For `date_nanos` fields, you can use the `strict_date_optional_time_nanos` format to preserve nanosecond resolution. If you don't specify the format when mapping a field as `date_nanos`, the default format is `strict_date_optional_time||epoch_millis` that lets you pass values in either `strict_date_optional_time` or `epoch_millis` format. The `strict_date_optional_time` format supports dates in nanosecond resolution, but the `epoch_millis` format supports dates in millisecond resolution only. 
- -## Example - -Create a mapping with the `date` field of type `date_nanos` that has the `strict_date_optional_time_nanos` format: - -```json -PUT testindex/_mapping -{ - "properties": { - "date": { - "type": "date_nanos", - "format" : "strict_date_optional_time_nanos" - } - } -} -``` -{% include copy-curl.html %} - -Index two documents into the index: - -```json -PUT testindex/_doc/1 -{ "date": "2022-06-15T10:12:52.382719622Z" } -``` -{% include copy-curl.html %} - -```json -PUT testindex/_doc/2 -{ "date": "2022-06-15T10:12:52.382719624Z" } -``` -{% include copy-curl.html %} - -You can use a range query to search for a date range: - -```json -GET testindex/_search -{ - "query": { - "range": { - "date": { - "gte": "2022-06-15T10:12:52.382719621Z", - "lte": "2022-06-15T10:12:52.382719623Z" - } - } - } -} -``` -{% include copy-curl.html %} - -The response contains the document whose date is in the specified range: - -```json -{ - "took": 43, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 1, - "relation": "eq" - }, - "max_score": 1, - "hits": [ - { - "_index": "testindex", - "_id": "1", - "_score": 1, - "_source": { - "date": "2022-06-15T10:12:52.382719622Z" - } - } - ] - } -} -``` - -When querying documents with `date_nanos` fields, you can use `fields` or `docvalue_fields`: - -```json -GET testindex/_search -{ - "fields": ["date"] -} -``` -{% include copy-curl.html %} - -```json -GET testindex/_search -{ - "docvalue_fields" : [ - { - "field" : "date" - } - ] -} -``` -{% include copy-curl.html %} - -The response to either of the preceding queries contains both indexed documents: - -```json -{ - "took": 4, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 2, - "relation": "eq" - }, - "max_score": 1, - "hits": [ - { - "_index": "testindex", - "_id": "1", - "_score": 1, - "_source": { - "date": "2022-06-15T10:12:52.382719622Z" - }, - "fields": { - "date": [ - "2022-06-15T10:12:52.382719622Z" - ] - } - }, - { - "_index": "testindex", - "_id": "2", - "_score": 1, - "_source": { - "date": "2022-06-15T10:12:52.382719624Z" - }, - "fields": { - "date": [ - "2022-06-15T10:12:52.382719624Z" - ] - } - } - ] - } -} -``` - -You can sort on a `date_nanos` field as follows: - -```json -GET testindex/_search -{ - "sort": { - "date": "asc" - } -} -``` -{% include copy-curl.html %} - -The response contains the sorted documents: - -```json -{ - "took": 5, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 2, - "relation": "eq" - }, - "max_score": null, - "hits": [ - { - "_index": "testindex", - "_id": "1", - "_score": null, - "_source": { - "date": "2022-06-15T10:12:52.382719622Z" - }, - "sort": [ - 1655287972382719700 - ] - }, - { - "_index": "testindex", - "_id": "2", - "_score": null, - "_source": { - "date": "2022-06-15T10:12:52.382719624Z" - }, - "sort": [ - 1655287972382719700 - ] - } - ] - } -} -``` - -You can also use a Painless script to access the nanoseconds part of the field: - -```json -GET testindex/_search -{ - "script_fields" : { - "my_field" : { - "script" : { - "lang" : "painless", - "source" : "doc['date'].value.nano" - } - } - } -} -``` -{% include copy-curl.html %} - -The response contains only the nanosecond parts of the fields: - -```json -{ - "took": 4, - "timed_out": false, - "_shards": { - "total": 1, - 
"successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 2, - "relation": "eq" - }, - "max_score": 1, - "hits": [ - { - "_index": "testindex", - "_id": "1", - "_score": 1, - "fields": { - "my_field": [ - 382719622 - ] - } - }, - { - "_index": "testindex", - "_id": "2", - "_score": 1, - "fields": { - "my_field": [ - 382719624 - ] - } - } - ] - } -} -``` \ No newline at end of file diff --git a/_field-types/supported-field-types/date.md b/_field-types/supported-field-types/date.md index 09b1110707..ea09311718 100644 --- a/_field-types/supported-field-types/date.md +++ b/_field-types/supported-field-types/date.md @@ -3,8 +3,7 @@ layout: default title: Date nav_order: 25 has_children: false -parent: Date field types -grand_parent: Supported field types +parent: Supported field types redirect_from: - /opensearch/supported-field-types/date/ - /field-types/date/ @@ -221,7 +220,7 @@ GET testindex/_search ## Date math -The date field type supports using date math to specify durations in queries. For example, the `gt`, `gte`, `lt`, and `lte` parameters in [range queries]({{site.url}}{{site.baseurl}}/query-dsl/term/range/) and the `from` and `to` parameters in [date range aggregations]({{site.url}}{{site.baseurl}}/query-dsl/aggregations/bucket/date-range/) accept date math expressions. +The date field type supports using date math to specify durations in queries. For example, the `gt`, `gte`, `lt`, and `lte` parameters in [range queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/term/#range) and the `from` and `to` parameters in [date range aggregations]({{site.url}}{{site.baseurl}}/opensearch/bucket-agg/#range-date_range-ip_range) accept date math expressions. A date math expression contains a fixed date, optionally followed by one or more mathematical expressions. The fixed date may be either `now` (current date and time in milliseconds since the epoch) or a string ending with `||` that specifies a date (for example, `2022-05-18||`). The date must be in the `strict_date_optional_time||epoch_millis` format. @@ -256,7 +255,7 @@ The following example expressions illustrate using date math: ### Using date math in a range query -The following example illustrates using date math in a [range query]({{site.url}}{{site.baseurl}}/query-dsl/term/range/). +The following example illustrates using date math in a [range query]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/term/#range). Set up an index with `release_date` mapped as `date`: diff --git a/_field-types/supported-field-types/dates.md b/_field-types/supported-field-types/dates.md deleted file mode 100644 index 7c6e47cb60..0000000000 --- a/_field-types/supported-field-types/dates.md +++ /dev/null @@ -1,17 +0,0 @@ ---- -layout: default -title: Date field types -nav_order: 25 -has_children: true -has_toc: false -parent: Supported field types ---- - -# Date field types - -Date field types contain a date value that can be formatted using different date formats. The following table lists all date field types that OpenSearch supports. - -Field data type | Description -:--- | :--- -[`date`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/) | A date stored in millisecond resolution. -[`date_nanos`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date-nanos/) | A date stored in nanosecond resolution. 
diff --git a/_field-types/supported-field-types/geographic.md b/_field-types/supported-field-types/geographic.md index cbe3982a4d..07d0382082 100644 --- a/_field-types/supported-field-types/geographic.md +++ b/_field-types/supported-field-types/geographic.md @@ -12,7 +12,7 @@ redirect_from: # Geographic field types -Geographic fields contain values that represent points or shapes on a map. The following table lists all geographic field types that OpenSearch supports. +The following table lists all geographic field types that OpenSearch supports. Field data type | Description :--- | :--- diff --git a/_field-types/supported-field-types/index.md b/_field-types/supported-field-types/index.md index 8d0b29afa1..230f635124 100644 --- a/_field-types/supported-field-types/index.md +++ b/_field-types/supported-field-types/index.md @@ -13,22 +13,21 @@ redirect_from: You can specify data types for your fields when creating a mapping. The following table lists all data field types that OpenSearch supports. -Category | Field types and descriptions -:--- | :--- -Alias | [`alias`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/alias/): An additional name for an existing field. -Binary | [`binary`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/binary/): A binary value in Base64 encoding. -[Numeric]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/numeric/) | A numeric value (`byte`, `double`, `float`, `half_float`, `integer`, `long`, [`unsigned_long`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/unsigned-long/), `scaled_float`, `short`). -Boolean | [`boolean`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/boolean/): A Boolean value. -[Date]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/dates/)| [`date`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/): A date stored in milliseconds.
[`date_nanos`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date-nanos/): A date stored in nanoseconds. -IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/): An IP address in IPv4 or IPv6 format. -[Range]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/range/) | A range of values (`integer_range`, `long_range`, `double_range`, `float_range`, `date_range`, `ip_range`). -[Object]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/object-fields/)| [`object`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/object/): A JSON object.
[`nested`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/nested/): Used when objects in an array need to be indexed independently as separate documents.
[`flat_object`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/flat-object/): A JSON object treated as a string.
[`join`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/): Establishes a parent-child relationship between documents in the same index. -[String]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/string/)|[`keyword`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/keyword/): Contains a string that is not analyzed.
[`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/): Contains a string that is analyzed.
[`token_count`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/token-count/): Stores the number of analyzed tokens in a string. -[Autocomplete]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/autocomplete/) |[`completion`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/completion/): Provides autocomplete functionality through a completion suggester.
[`search_as_you_type`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/search-as-you-type/): Provides search-as-you-type functionality using both prefix and infix completion. -[Geographic]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geographic/)| [`geo_point`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-point/): A geographic point.
[`geo_shape`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-shape/): A geographic shape. -[Rank]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/rank/) | Boosts or decreases the relevance score of documents (`rank_feature`, `rank_features`). -[k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) | Allows indexing a k-NN vector into OpenSearch and performing different kinds of k-NN search. -Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query. +Field data type | Description +:--- | :--- +[`alias`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/alias/) | An additional name for an existing field. +[`binary`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary/) | A binary value in Base64 encoding. +[Numeric]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/numeric/) | `byte`, `double`, `float`, `half_float`, `integer`, `long`, `scaled_float`, `short`. +[`boolean`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/boolean/) | A Boolean value. +[`date`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/) | A date value as a formatted string, a long value, or an integer. +[`ip`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/ip/) | An IP address in IPv4 or IPv6 format. +[Range]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/range/) | `integer_range`, `long_range`,`double_range`, `float_range`, `date_range`,`ip_range`. +[Object]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object/) | `object`, `nested`, `join`. +String | [`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/), [`text`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/), [`token_count`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/token-count/). +[Autocomplete]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/autocomplete/) | `completion`, `search_as_you_type`. +[Geographic]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geographic/) | `geo_point`, `geo_shape`. +[Rank]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/rank/) | `rank_feature`, `rank_features`. +[`percolator`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/percolator/) | Specifies to treat this field as a query. ## Arrays diff --git a/_field-types/supported-field-types/knn-vector.md b/_field-types/supported-field-types/knn-vector.md deleted file mode 100644 index e6bc4b1f64..0000000000 --- a/_field-types/supported-field-types/knn-vector.md +++ /dev/null @@ -1,270 +0,0 @@ ---- -layout: default -title: k-NN vector -nav_order: 58 -has_children: false -parent: Supported field types -has_math: true ---- - -# k-NN vector - -The [k-NN plugin]({{site.url}}{{site.baseurl}}/search-plugins/knn/index/) introduces a custom data type, the `knn_vector`, that allows users to ingest their k-NN vectors. -into an OpenSearch index and perform different kinds of k-NN search. The `knn_vector` field is highly configurable and can serve many different k-NN workloads. In general, a `knn_vector` field can be built either by providing a method definition or specifying a model id. 
- -## Example - -For example, to map `my_vector1` as a `knn_vector`, use the following request: - -```json -PUT test-index -{ - "settings": { - "index": { - "knn": true, - "knn.algo_param.ef_search": 100 - } - }, - "mappings": { - "properties": { - "my_vector1": { - "type": "knn_vector", - "dimension": 3, - "method": { - "name": "hnsw", - "space_type": "l2", - "engine": "lucene", - "parameters": { - "ef_construction": 128, - "m": 24 - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -## Method definitions - -[Method definitions]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#method-definitions) are used when the underlying [approximate k-NN]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) algorithm does not require training. For example, the following `knn_vector` field specifies that *nmslib*'s implementation of *hnsw* should be used for approximate k-NN search. During indexing, *nmslib* will build the corresponding *hnsw* segment files. - -```json -"my_vector": { - "type": "knn_vector", - "dimension": 4, - "method": { - "name": "hnsw", - "space_type": "l2", - "engine": "nmslib", - "parameters": { - "ef_construction": 128, - "m": 24 - } - } -} -``` - -## Model IDs - -Model IDs are used when the underlying Approximate k-NN algorithm requires a training step. As a prerequisite, the -model has to be created with the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model). The -model contains the information needed to initialize the native library segment files. - -```json - "type": "knn_vector", - "model_id": "my-model" -} -``` - -However, if you intend to use Painless scripting or a k-NN score script, you only need to pass the dimension. - ```json - "type": "knn_vector", - "dimension": 128 - } - ``` - -## Lucene byte vector - -By default, k-NN vectors are `float` vectors, where each dimension is 4 bytes. If you want to save storage space, you can use `byte` vectors with the `lucene` engine. In a `byte` vector, each dimension is a signed 8-bit integer in the [-128, 127] range. - -Byte vectors are supported only for the `lucene` engine. They are not supported for the `nmslib` and `faiss` engines. -{: .note} - -In [k-NN benchmarking tests](https://github.com/opensearch-project/k-NN/tree/main/benchmarks/perf-tool), the use of `byte` rather than `float` vectors resulted in a significant reduction in storage and memory usage as well as improved indexing throughput and reduced query latency. Additionally, precision on recall was not greatly affected (note that recall can depend on various factors, such as the [quantization technique](#quantization-techniques) and data distribution). - -When using `byte` vectors, expect some loss of precision in the recall compared to using `float` vectors. Byte vectors are useful in large-scale applications and use cases that prioritize a reduced memory footprint in exchange for a minimal loss of recall. -{: .important} - -Introduced in k-NN plugin version 2.9, the optional `data_type` parameter defines the data type of a vector. The default value of this parameter is `float`. 
- -To use a `byte` vector, set the `data_type` parameter to `byte` when creating mappings for an index: - - ```json -PUT test-index -{ - "settings": { - "index": { - "knn": true, - "knn.algo_param.ef_search": 100 - } - }, - "mappings": { - "properties": { - "my_vector1": { - "type": "knn_vector", - "dimension": 3, - "data_type": "byte", - "method": { - "name": "hnsw", - "space_type": "l2", - "engine": "lucene", - "parameters": { - "ef_construction": 128, - "m": 24 - } - } - } - } - } -} -``` -{% include copy-curl.html %} - -Then ingest documents as usual. Make sure each dimension in the vector is in the supported [-128, 127] range: - -```json -PUT test-index/_doc/1 -{ - "my_vector1": [-126, 28, 127] -} -``` -{% include copy-curl.html %} - -```json -PUT test-index/_doc/2 -{ - "my_vector1": [100, -128, 0] -} -``` -{% include copy-curl.html %} - -When querying, be sure to use a `byte` vector: - -```json -GET test-index/_search -{ - "size": 2, - "query": { - "knn": { - "my_vector1": { - "vector": [26, -120, 99], - "k": 2 - } - } - } -} -``` -{% include copy-curl.html %} - -### Quantization techniques - -If your vectors are of the type `float`, you need to first convert them to the `byte` type before ingesting the documents. This conversion is accomplished by _quantizing the dataset_---reducing the precision of its vectors. There are many quantization techniques, such as scalar quantization or product quantization (PQ), which is used in the Faiss engine. The choice of quantization technique depends on the type of data you're using and can affect the accuracy of recall values. The following sections describe the scalar quantization algorithms that were used to quantize the [k-NN benchmarking test](https://github.com/opensearch-project/k-NN/tree/main/benchmarks/perf-tool) data for the [L2](#scalar-quantization-for-the-l2-space-type) and [cosine similarity](#scalar-quantization-for-the-cosine-similarity-space-type) space types. The provided pseudocode is for illustration purposes only. - -#### Scalar quantization for the L2 space type - -The following example pseudocode illustrates the scalar quantization technique used for the benchmarking tests on Euclidean datasets with the L2 space type. Euclidean distance is shift invariant. If you shift both $$x$$ and $$y$$ by the same $$z$$, then the distance remains the same ($$\lVert x-y\rVert =\lVert (x-z)-(y-z)\rVert$$). - -```python -# Random dataset (Example to create a random dataset) -dataset = np.random.uniform(-300, 300, (100, 10)) -# Random query set (Example to create a random queryset) -queryset = np.random.uniform(-350, 350, (100, 10)) -# Number of values -B = 256 - -# INDEXING: -# Get min and max -dataset_min = np.min(dataset) -dataset_max = np.max(dataset) -# Shift coordinates to be non-negative -dataset -= dataset_min -# Normalize into [0, 1] -dataset *= 1. / (dataset_max - dataset_min) -# Bucket into 256 values -dataset = np.floor(dataset * (B - 1)) - int(B / 2) - -# QUERYING: -# Clip (if queryset range is out of datset range) -queryset = queryset.clip(dataset_min, dataset_max) -# Shift coordinates to be non-negative -queryset -= dataset_min -# Normalize -queryset *= 1. 
/ (dataset_max - dataset_min) -# Bucket into 256 values -queryset = np.floor(queryset * (B - 1)) - int(B / 2) -``` -{% include copy.html %} - -#### Scalar quantization for the cosine similarity space type - -The following example pseudocode illustrates the scalar quantization technique used for the benchmarking tests on angular datasets with the cosine similarity space type. Cosine similarity is not shift invariant ($$cos(x, y) \neq cos(x-z, y-z)$$). - -The following pseudocode is for positive numbers: - -```python -# For Positive Numbers - -# INDEXING and QUERYING: - -# Get Max of train dataset -max = np.max(dataset) -min = 0 -B = 127 - -# Normalize into [0,1] -val = (val - min) / (max - min) -val = (val * B) - -# Get int and fraction values -int_part = floor(val) -frac_part = val - int_part - -if 0.5 < frac_part: - bval = int_part + 1 -else: - bval = int_part - -return Byte(bval) -``` -{% include copy.html %} - -The following pseudocode is for negative numbers: - -```python -# For Negative Numbers - -# INDEXING and QUERYING: - -# Get Min of train dataset -min = 0 -max = -np.min(dataset) -B = 128 - -# Normalize into [0,1] -val = (val - min) / (max - min) -val = (val * B) - -# Get int and fraction values -int_part = floor(var) -frac_part = val - int_part - -if 0.5 < frac_part: - bval = int_part + 1 -else: - bval = int_part - -return Byte(bval) -``` -{% include copy.html %} \ No newline at end of file diff --git a/_field-types/supported-field-types/nested.md b/_field-types/supported-field-types/nested.md index d61ccd53df..d09caf0ea8 100644 --- a/_field-types/supported-field-types/nested.md +++ b/_field-types/supported-field-types/nested.md @@ -37,7 +37,7 @@ When these objects are stored, they are flattened, so their internal representat { "patients.name" : ["John Doe", "Mary Major"], "patients.age" : [56, 85], - "patients.smoker" : [true, false] + "smoker" : [true, false] } ``` @@ -172,118 +172,11 @@ PUT testindex1/_doc/100 ``` {% include copy-curl.html %} -You can use the following nested query to search for patients older than 75 OR smokers: +Now if you run the same query to search for patients older than 75 AND smokers, nothing is returned, which is correct. 
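For reference, the query being re-run here is the `bool` search that requires both a smoker match and an age of at least 75; with the `nested` mapping, it is wrapped in a `nested` clause, roughly as in the following sketch (the `testindex1` index and the `patients.*` field paths are assumptions based on this page's mapping examples):

```json
GET testindex1/_search
{
  "query": {
    "nested": {
      "path": "patients",
      "query": {
        "bool": {
          "must": [
            { "term": { "patients.smoker": true } },
            { "range": { "patients.age": { "gte": 75 } } }
          ]
        }
      }
    }
  }
}
```
{% include copy-curl.html %}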
```json -GET testindex1/_search -{ - "query": { - "nested": { - "path": "patients", - "query": { - "bool": { - "should": [ - { - "term": { - "patients.smoker": true - } - }, - { - "range": { - "patients.age": { - "gte": 75 - } - } - } - ] - } - } - } - } -} -``` -{% include copy-curl.html %} - -The query correctly returns both patients: - -```json -{ - "took" : 7, - "timed_out" : false, - "_shards" : { - "total" : 1, - "successful" : 1, - "skipped" : 0, - "failed" : 0 - }, - "hits" : { - "total" : { - "value" : 1, - "relation" : "eq" - }, - "max_score" : 0.8465736, - "hits" : [ - { - "_index" : "testindex1", - "_id" : "100", - "_score" : 0.8465736, - "_source" : { - "patients" : [ - { - "name" : "John Doe", - "age" : 56, - "smoker" : true - }, - { - "name" : "Mary Major", - "age" : 85, - "smoker" : false - } - ] - } - } - ] - } -} -``` - -You can use the following nested query to search for patients older than 75 AND smokers: - -```json -GET testindex1/_search { - "query": { - "nested": { - "path": "patients", - "query": { - "bool": { - "must": [ - { - "term": { - "patients.smoker": true - } - }, - { - "range": { - "patients.age": { - "gte": 75 - } - } - } - ] - } - } - } - } -} -``` -{% include copy-curl.html %} - -The previous query returns no results, as expected: - -```json -{ - "took" : 7, + "took" : 3, "timed_out" : false, "_shards" : { "total" : 1, @@ -310,5 +203,5 @@ Parameter | Description :--- | :--- [`dynamic`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/object#the-dynamic-parameter) | Specifies whether new fields can be dynamically added to this object. Valid values are `true`, `false`, and `strict`. Default is `true`. `include_in_parent` | A Boolean value that specifies whether all fields in the child nested object should also be added to the parent document in flattened form. Default is `false`. -`include_in_root` | A Boolean value that specifies whether all fields in the child nested object should also be added to the root document in flattened form. Default is `false`. +`incude_in_root` | A Boolean value that specifies whether all fields in the child nested object should also be added to the root document in flattened form. Default is `false`. `properties` | Fields of this object, which can be of any supported type. New properties can be dynamically added to this object if `dynamic` is set to `true`. diff --git a/_field-types/supported-field-types/numeric.md b/_field-types/supported-field-types/numeric.md index b887b8be2d..d31cad8fc0 100644 --- a/_field-types/supported-field-types/numeric.md +++ b/_field-types/supported-field-types/numeric.md @@ -3,7 +3,7 @@ layout: default title: Numeric field types parent: Supported field types nav_order: 15 -has_children: true +has_children: false redirect_from: - /opensearch/supported-field-types/numeric/ - /field-types/numeric/ @@ -15,14 +15,15 @@ The following table lists all numeric field types that OpenSearch supports. Field data type | Description :--- | :--- -`byte` | A signed 8-bit integer. Minimum is −128. Maximum is 127. -`double` | A double-precision 64-bit IEEE 754 floating-point value. Minimum magnitude is 2−1074 . Maximum magnitude is (2 − 2−52) · 21023. The number of significant bits is 53. The number of significant digits is 15.95. -`float` | A single-precision 32-bit IEEE 754 floating-point value. Minimum magnitude is 2−149 . Maximum magnitude is (2 − 2−23) · 2127. The number of significant bits is 24. The number of significant digits is 7.22. 
-`half_float` | A half-precision 16-bit IEEE 754 floating-point value. Minimum magnitude is 2−24 . Maximum magnitude is 65504. The number of significant bits is 11. The number of significant digits is 3.31. -`integer` | A signed 32-bit integer. Minimum is −231. Maximum is 231 − 1. -`long` | A signed 64-bit integer. Minimum is −263. Maximum is 263 − 1. -[`unsigned_long`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/unsigned-long/) | An unsigned 64-bit integer. Minimum is 0. Maximum is 264 − 1. -`short` | A signed 16-bit integer. Minimum is −215. Maximum is 215 − 1. +`byte` | A signed 8-bit integer. Minimum is -128. Maximum is 127. +`double` | A double-precision 64-bit IEEE 754 floating-point value. Minimum magnitude is 2-1074 . Maximum magnitude is (2 − 2-52) · 21023. The number of significant bits is 53. The number of significant digits is 15.95. +`float` | A single-precision 32-bit IEEE 754 floating-point value. Minimum magnitude is 2-149 . Maximum magnitude is (2 − 2-23) · 2127. The number of significant bits is 24. The number of significant digits is 7.22. +`half_float` | A half-precision 16-bit IEEE 754 floating-point value. Minimum magnitude is 2-24 . Maximum magnitude is 65504. The number of significant bits is 11. The number of significant digits is 3.31. +`integer` | A signed 32-bit integer. Minimum is -231. Maximum is 231 − 1. +`long` | A signed 64-bit integer. Minimum is -263. Maximum is 263 − 1. +`short` | A signed 16-bit integer. Minimum is -215. Maximum is 215 − 1. + +:--- | :--- [`scaled_float`](#scaled-float-field-type) | A floating-point value that is multiplied by the double scale factor and stored as a long value. Integer, long, float, and double field types have corresponding [range field types]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/range/). diff --git a/_field-types/supported-field-types/object-fields.md b/_field-types/supported-field-types/object-fields.md index 429c5b94c7..64869fc34d 100644 --- a/_field-types/supported-field-types/object-fields.md +++ b/_field-types/supported-field-types/object-fields.md @@ -12,7 +12,7 @@ redirect_from: # Object field types -Object field types contain values that are objects or relations. The following table lists all object field types that OpenSearch supports. +The following table lists all object field types that OpenSearch supports. Field data type | Description :--- | :--- diff --git a/_field-types/supported-field-types/range.md b/_field-types/supported-field-types/range.md index 22ae1d619e..af770678ff 100644 --- a/_field-types/supported-field-types/range.md +++ b/_field-types/supported-field-types/range.md @@ -61,45 +61,6 @@ PUT testindex/_doc/1 ``` {% include copy-curl.html %} -## IP address ranges - -You can specify IP address ranges in two formats: as a range and in [CIDR notation](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation). 
- -Create a mapping with an IP address range: - -```json -PUT testindex -{ - "mappings" : { - "properties" : { - "ip_address_range" : { - "type" : "ip_range" - }, - "ip_address_cidr" : { - "type" : "ip_range" - } - } - } -} -``` -{% include copy-curl.html %} - -Index a document with IP address ranges in both formats: - -```json -PUT testindex/_doc/2 -{ - "ip_address_range" : { - "gte" : "10.24.34.0", - "lte" : "10.24.35.255" - }, - "ip_address_cidr" : "10.24.34.0/24" -} -``` -{% include copy-curl.html %} - -## Querying range fields - You can use a [Term query](#term-query) or a [Range query](#range-query) to search for values within range fields. ### Term query @@ -124,7 +85,17 @@ GET testindex/_search ### Range query -A range query on a range field returns documents within that range. +A range query on a range field returns documents within that range. Along with the field to be matched, you can further specify a date format or relational operators with the following optional parameters: + +Parameter | Description +:--- | :--- +format | A [format]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/#formats) for dates in this query. Default is the field's mapped format. +relation | Provides a relation between the query's date range and the document's date range. There are three types of relations that you can specify:
1. `intersects` matches documents for which there are dates that belong to both the query's date range and the document's date range. This is the default.<br>
2. `contains` matches documents for which the query's date range is a subset of the document's date range.
3. `within` matches documents for which the document's date range is a subset of the query's date range. + +To use a date format other than the field's mapped format in a query, specify it in the `format` field. + +For a full description of range query usage, including all range query parameters, see [Range query]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/term/#range). +{: .tip } Query for all graduation dates in 2019, providing the date range in a "MM/dd/yyyy" format: @@ -145,7 +116,44 @@ GET testindex1/_search ``` {% include copy-curl.html %} -The preceding query will return document 1 for the `within` and `intersects` relations but will not return it for the `contains` relation. For more information about relation types, see [range query parameters]({{site.url}}{{site.baseurl}}/query-dsl/term/range#parameters). +The above query will return document 1 for the `within` and `intersects` relations but will not return it for the `contains` relation. + +### IP address ranges + +You can specify IP address ranges in two formats: as a range and in [CIDR notation](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation). + +Create a mapping with an IP address range: + +```json +PUT testindex +{ + "mappings" : { + "properties" : { + "ip_address_range" : { + "type" : "ip_range" + }, + "ip_address_cidr" : { + "type" : "ip_range" + } + } + } +} +``` +{% include copy-curl.html %} + +Index a document with IP address ranges in both formats: + +```json +PUT testindex/_doc/2 +{ + "ip_address_range" : { + "gte" : "10.24.34.0", + "lte" : "10.24.35.255" + }, + "ip_address_cidr" : "10.24.34.0/24" +} +``` +{% include copy-curl.html %} ## Parameters diff --git a/_field-types/supported-field-types/rank.md b/_field-types/supported-field-types/rank.md index a4ec0fac4c..c46467f8a5 100644 --- a/_field-types/supported-field-types/rank.md +++ b/_field-types/supported-field-types/rank.md @@ -23,7 +23,7 @@ Rank feature and rank features fields can be queried with [rank feature queries] ## Rank feature -A rank feature field type uses a positive [float]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/numeric/) value to boost or decrease the relevance score of a document in a `rank_feature` query. By default, this value boosts the relevance score. To decrease the relevance score, set the optional `positive_score_impact` parameter to false. +A rank feature field type uses a positive [float]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/numeric/) value to boost or decrease the relevance score of a document in a `rank_feature` query. By default, this value boosts the relevance score. To decrease the relevance score, set the optional `positive_score_impact` parameter to false. ### Example diff --git a/_field-types/supported-field-types/string.md b/_field-types/supported-field-types/string.md index f24dea2325..21cee52dad 100644 --- a/_field-types/supported-field-types/string.md +++ b/_field-types/supported-field-types/string.md @@ -12,7 +12,7 @@ redirect_from: # String field types -String field types contain text values or values derived from text. The following table lists all string field types that OpenSearch supports. +The following table lists all string field types that OpenSearch supports. 
Field data type | Description :--- | :--- diff --git a/_field-types/supported-field-types/unsigned-long.md b/_field-types/supported-field-types/unsigned-long.md deleted file mode 100644 index dde8d25dee..0000000000 --- a/_field-types/supported-field-types/unsigned-long.md +++ /dev/null @@ -1,164 +0,0 @@ ---- -layout: default -title: Unsigned long -parent: Numeric field types -grand_parent: Supported field types -nav_order: 15 -has_children: false ---- - -# Unsigned long field type - -The `unsigned_long` field type is a numeric field type that represents an unsigned 64-bit integer with a minimum value of 0 and a maximum value of 264 − 1. In the following example, `counter` is mapped as an `unsigned_long` field: - - -```json -PUT testindex -{ - "mappings" : { - "properties" : { - "counter" : { - "type" : "unsigned_long" - } - } - } -} -``` -{% include copy-curl.html %} - -## Indexing - -To index a document with an `unsigned_long` value, use the following request: - -```json -PUT testindex/_doc/1 -{ - "counter" : 10223372036854775807 -} -``` -{% include copy-curl.html %} - -Alternatively, you can use the [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/) as follows: - -```json -POST _bulk -{ "index": { "_index": "testindex", "_id": "1" } } -{ "counter": 10223372036854775807 } -``` -{% include copy-curl.html %} - -If a field of type `unsigned_long` has the `store` parameter set to `true` (that is, the field is a stored field), it will be stored and returned as a string. `unsigned_long` values do not support the decimal part, so, if supplied, the decimal part is truncated. -{: .note} - -## Querying - -`unsigned_long` fields support most of the queries that other numeric types support. For example, you can use a term query on `unsigned_long` fields: - -```json -POST _search -{ - "query": { - "term": { - "counter": { - "value": 10223372036854775807 - } - } - } -} -``` -{% include copy-curl.html %} - -You can also use a range query: - -```json -POST _search -{ - "query": { - "range": { - "counter": { - "gte": 10223372036854775807 - } - } - } -} -``` -{% include copy-curl.html %} - -## Sorting - -You can use `sort` values with `unsigned_long` fields to order the search results, for example: - -```json -POST _search -{ - "sort" : [ - { - "counter" : { - "order" : "asc" - } - } - ], - "query": { - "range": { - "counter": { - "gte": 10223372036854775807 - } - } - } -} -``` -{% include copy-curl.html %} - - -An `unsigned_long` field cannot be used as an index sort field (in the `sort.field` index setting). -{: .warning} - -## Aggregations - -Like other numeric fields, `unsigned_long` fields support aggregations. For `terms` and `multi_terms` aggregations, `unsigned_long` values are used as is, but for other aggregation types, the values are converted to the `double` type (with possible loss of precision). 
The following is an example of the `terms` aggregation: - -```json -POST _search -{ - "query": { - "match_all": {} - }, - "aggs": { - "counters": { - "terms": { - "field": "counter" - } - } - } -} -``` -{% include copy-curl.html %} - -## Scripting - -In scripts, `unsigned_long` fields are returned as instances of the `BigInteger` class: - -```json -POST _search -{ - "query": { - "bool": { - "filter": { - "script": { - "script": "BigInteger amount = doc['counter'].value; return amount.compareTo(BigInteger.ZERO) > 0;" - } - } - } - } -} -``` -{% include copy-curl.html %} - - -## Limitations - -Note the following limitations of the `unsigned_long` field type: - -- When aggregations are performed across different numeric types and one of the types is `unsigned_long`, the values are converted to the `double` type and `double` arithmetic is used, with high likelihood of precision loss. - -- An `unsigned_long` field cannot be used as an index sort field (in the `sort.field` index setting). This limitation also applies when a search is performed on multiple indexes and the results are sorted by the field that has the `unsigned_long` type in at least one of the indexes but a different numeric type or types in others. \ No newline at end of file diff --git a/_im-plugin/index-codecs.md b/_im-plugin/index-codecs.md deleted file mode 100644 index eaf1b80e75..0000000000 --- a/_im-plugin/index-codecs.md +++ /dev/null @@ -1,82 +0,0 @@ ---- -layout: default -title: Index codecs -nav_order: 3 -parent: Index settings ---- - -# Index codecs - -Index codecs determine how the index’s stored fields are compressed and stored on disk. The index codec is controlled by the static `index.codec` setting that specifies the compression algorithm. The setting impacts the index shard size and index operation performance. - -## Supported codecs - -OpenSearch provides support for four codecs that can be used for compressing the stored fields. Each codec offers different tradeoffs between compression ratio (storage size) and indexing performance (speed): - -* `default` -- This codec employs the [LZ4 algorithm](https://en.wikipedia.org/wiki/LZ4_(compression_algorithm)) with a preset dictionary, which prioritizes performance over compression ratio. It offers faster indexing and search operations when compared with `best_compression` but may result in larger index/shard sizes. If no codec is provided in the index settings, then LZ4 is used as the default algorithm for compression. -* `best_compression` -- This codec uses [zlib](https://en.wikipedia.org/wiki/Zlib) as an underlying algorithm for compression. It achieves high compression ratios that result in smaller index sizes. However, this may incur additional CPU usage during index operations and may subsequently result in high indexing and search latencies. -* `zstd` (OpenSearch 2.9 and later) -- This codec uses the [Zstandard compression algorithm](https://github.com/facebook/zstd), which provides a good balance between compression ratio and speed. It provides significant compression comparable to the `best_compression` codec with reasonable CPU usage and improved indexing and search performance compared to the `default` codec. -* `zstd_no_dict` (OpenSearch 2.9 and later) -- This codec is similar to `zstd` but excludes the dictionary compression feature. It provides faster indexing and search operations compared to `zstd` at the expense of a slightly larger index size. 
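The codec is selected through the static `index.codec` index setting. As a minimal sketch (the index name `testindex` is illustrative), choosing one of the Zstandard codecs at index creation might look like the following:

```json
PUT /testindex
{
  "settings": {
    "index": {
      "codec": "zstd_no_dict"
    }
  }
}
```
{% include copy-curl.html %}

The optional `index.codec.compression_level` setting, described next, applies only to the two Zstandard codecs.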
- -For the `zstd` and `zstd_no_dict` codecs, you can optionally specify a compression level in the `index.codec.compression_level` setting. This setting takes integers in the [1, 6] range. A higher compression level results in a higher compression ratio (smaller storage size) with a tradeoff in speed (slower compression and decompression speeds lead to greater indexing and search latencies). - -When an index segment is created, it uses the current index codec for compression. If you update the index codec, any segment created after the update will use the new compression algorithm. For specific operation considerations, see [Index codec considerations for index operations](#index-codec-considerations-for-index-operations). -{: .note} - -## Choosing a codec - -The choice of index codec impacts the amount of disk space required to store the index data. Codecs like `best_compression`, `zstd`, and `zstd_no_dict` can achieve higher compression ratios, resulting in smaller index sizes. Conversely, the `default` codec doesn’t prioritize compression ratio, resulting in larger index sizes but with faster search operations than `best_compression`. The `zstd` and `zstd_no_dict` codecs ensure better search performance than the other two codecs. - -## Index codec considerations for index operations - -The following index codec considerations apply to various index operations. - -### Writes - -Every index consists of shards, each of which is further divided into Lucene segments. During index writes, the new segments are created based on the codec specified in the index settings. If you update the codec for an index, the new segments will use the new codec algorithm. - -### Merges - -During segment merges, OpenSearch combines smaller index segments into larger segments in order to provide optimal resource utilization and improve performance. The index codec setting influences the speed and efficiency of the merge operations. The number of merges that happen on an index is a factor of the segment size, and a smaller segment size directly translates into smaller merge sizes. If you update the `index.codec` setting, the new merge operations will use the new codec when creating merged segments. The merged segments will have the compression characteristics of the new codec. - -### Splits and shrinks - -The [Split API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/split/) splits an original index into a new index where each original primary shard is divided into two or more primary shards. The [Shrink API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/shrink-index/) shrinks an existing index to a new index with a smaller number of primary shards. As part of split or shrink operations, any newly created segments will use the latest codec settings. - -### Snapshots - -When creating a [snapshot]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/snapshots/index/), the index codec setting influences the size of the snapshot and the time required for its creation. If the codec of an index is updated, newly created snapshots will use the latest codec setting. The resulting snapshot size will reflect the compression characteristics of the latest codec setting. Existing segments included in the snapshot will retain their original compression characteristics. - -When you restore the indexes from a snapshot of a cluster to another cluster, it is important to verify that the target cluster supports the codecs of the segments in the source snapshot. 
For example, if the source snapshot contains segments of the `zstd` or `zstd_no_dict` codecs (introduced in OpenSearch 2.9), you won't be able to restore the snapshot to a cluster that runs on an older OpenSearch version because it doesn't support these codecs. - -### Reindexing - -When you are performing a [reindex]({{site.url}}{{site.baseurl}}/im-plugin/reindex-data/) operation from a source index, the new segments created in the target index will have the properties of the codec settings of the target index. - -### Index rollups and transforms - -When an index [rollup]({{site.url}}{{site.baseurl}}/im-plugin/index-rollups/) or [transform]({{site.url}}{{site.baseurl}}/im-plugin/index-transforms/) job is completed, the segments created in the target index will have the properties of the index codec specified during target index creation, irrespective of the source index codec. If the target index is created dynamically through a rollup job, the default codec is used for segments of the target index. - -Changing the index codec setting does not affect the size of existing segments. Only new segments created after the update will reflect the new codec setting. To ensure consistent segment sizes and compression ratios, it may be necessary to perform a reindexing or other indexing operation, such as a merge. -{: .important} - -## Performance tuning and benchmarking - -Depending on your specific use case, you might need to experiment with different index codec settings to fine-tune the performance of your OpenSearch cluster. Conducting benchmark tests with different codecs and measuring the impact on indexing speed, search performance, and resource utilization can help you identify the optimal index codec setting for your workload. With the `zstd` and `zstd_no_dict` codecs, you can also fine-tune the compression level in order to identify the optimal configuration for your cluster. - -### Benchmarking - -The following table provides a performance comparison of the `best_compression`, `zstd`, and `zstd_no_dict` codecs against the `default` codec. The tests were performed with the [`nyc_taxi`](https://github.com/topics/nyc-taxi-dataset) dataset. The results are listed in terms of percent change, and bold results indicate performance improvement. - -| | `best_compression` | `zstd` | `zstd_no_dict` | -|:--- |:--- |:--- |:--- | -|**Write** | | | -|Median Latency |0% |0% |−1% | -|p90 Latency |3% |2% |**−5%** | -|Throughput |−2% |**7%** |**14%** | -|**Read** | | | -|Median Latency |0% |1% |0% | -|p90 Latency |1% |1% |**−2%** | -|**Disk** | | | -| Compression ratio |**−34%** |**−35%** |**−30%** | - diff --git a/_im-plugin/index-rollups/index.md b/_im-plugin/index-rollups/index.md index 59cd304dde..e2cac72911 100644 --- a/_im-plugin/index-rollups/index.md +++ b/_im-plugin/index-rollups/index.md @@ -343,7 +343,7 @@ POST example_rollup/_search } ``` -#### Example response +#### Sample Response ```json { @@ -829,8 +829,4 @@ The response contains two buckets, "Error" and "Success", and the document count } } } -``` - -## Index codec considerations - -For index codec considerations, see [Index codecs]({{site.url}}{{site.baseurl}}/im-plugin/index-codecs/#index-rollups-and-transforms). 
\ No newline at end of file +``` \ No newline at end of file diff --git a/_im-plugin/index-settings.md b/_im-plugin/index-settings.md deleted file mode 100644 index d105d89b88..0000000000 --- a/_im-plugin/index-settings.md +++ /dev/null @@ -1,122 +0,0 @@ ---- -layout: default -title: Index settings -nav_order: 3 -has_children: true ---- - -# Index settings - -You can specify index settings at index creation. There are two types of index settings: - -- [Static index settings](#static-index-settings) are settings that you cannot update while the index is open. To update a static setting, you must close the index, update the setting, and then reopen the index. -- [Dynamic index settings](#dynamic-index-settings) are settings that you can update at any time. - -## Specifying a setting when creating an index - -When creating an index, you can specify its static or dynamic settings as follows: - -```json -PUT /testindex -{ - "settings": { - "index.number_of_shards": 1, - "index.number_of_replicas": 2 - } -} -``` -{% include copy-curl.html %} - -## Static index settings - -The following table lists all available static index settings. - -Setting | Description -:--- | :--- -index.number_of_shards | The number of primary shards in the index. Default is 1. -index.number_of_routing_shards | The number of routing shards used to split an index. -index.shard.check_on_startup | Whether the index's shards should be checked for corruption. Available options are `false` (do not check for corruption), `checksum` (check for physical corruption), and `true` (check for both physical and logical corruption). Default is `false`. -index.codec | Determines how the index’s stored fields are compressed and stored on disk. This setting impacts the size of the index shards and the performance of the index operations. Valid values are:
- `default`
- `best_compression`
- `zstd` (OpenSearch 2.9 and later)
- `zstd_no_dict`(OpenSearch 2.9 and later).
For `zstd` and `zstd_no_dict`, you can specify the compression level in the `index.codec.compression_level` setting. For more information, see [Index codec settings]({{site.url}}{{site.baseurl}}/im-plugin/index-codecs/). Optional. Default is `default`. -index.codec.compression_level | The compression level setting provides a tradeoff between compression ratio and speed. A higher compression level results in a higher compression ratio (smaller storage size) with a tradeoff in speed (slower compression and decompression speeds lead to greater indexing and search latencies). Can only be specified if `index.codec` is set to `zstd` and `zstd_no_dict` compression levels in OpenSearch 2.9 and later. Valid values are integers in the [1, 6] range. For more information, see [Index codec settings]({{site.url}}{{site.baseurl}}/im-plugin/index-codecs/). Optional. Default is 3. -index.routing_partition_size | The number of shards a custom routing value can go to. Routing helps an imbalanced cluster by relocating values to a subset of shards rather than a single shard. To enable routing, set this value to greater than 1 but less than `index.number_of_shards`. Default is 1. -index.soft_deletes.retention_lease.period | The maximum amount of time to retain a shard's history of operations. Default is `12h`. -index.load_fixed_bitset_filters_eagerly | Whether OpenSearch should preload cached filters. Available options are `true` and `false`. Default is `true`. -index.hidden | Whether the index should be hidden. Hidden indexes are not returned as part of queries that have wildcards. Available options are `true` and `false`. Default is `false`. - -## Updating a static index setting - -You can update a static index setting only on a closed index. The following example demonstrates updating the index codec setting. - -First, close an index: - -```json -POST /testindex/_close -``` -{% include copy-curl.html %} - -Then update the settings by sending a request to the `_settings` endpoint: - -```json -PUT /testindex/_settings -{ - "index": { - "codec": "zstd_no_dict", - "codec.compression_level": 3 - } -} -``` -{% include copy-curl.html %} - -Last, reopen the index to enable read and write operations: - -```json -POST /testindex/_open -``` -{% include copy-curl.html %} - -For more information about updating settings, including supported query parameters, see [Update settings]({{site.url}}{{site.baseurl}}/api-reference/index-apis/update-settings/). - -## Dynamic index settings - -The following table lists all available dynamic index settings. - -Setting | Description -:--- | :--- -index.number_of_replicas | The number of replica shards each primary shard should have. For example, if you have 4 primary shards and set `index.number_of_replicas` to 3, the index has 12 replica shards. Default is 1. -index.auto_expand_replicas | Whether the cluster should automatically add replica shards based on the number of data nodes. Specify a lower bound and upper limit (for example, 0--9) or `all` for the upper limit. For example, if you have 5 data nodes and set `index.auto_expand_replicas` to 0--3, then the cluster does not automatically add another replica shard. However, if you set this value to `0-all` and add 2 more nodes for a total of 7, the cluster will expand to now have 6 replica shards. Default is disabled. -index.search.idle.after | The amount of time a shard should wait for a search or get request until it goes idle. Default is `30s`. 
-index.refresh_interval | How often the index should refresh, which publishes its most recent changes and makes them available for searching. Can be set to `-1` to disable refreshing. Default is `1s`. -index.max_result_window | The maximum value of `from` + `size` for searches of the index. `from` is the starting index to search from, and `size` is the number of results to return. Default is 10000. -index.max_inner_result_window | The maximum value of `from` + `size` that specifies the number of returned nested search hits and most relevant document aggregated during the query. `from` is the starting index to search from, and `size` is the number of top hits to return. Default is 100. -index.max_rescore_window | The maximum value of `window_size` for rescore requests to the index. Rescore requests reorder the index's documents and return a new score, which can be more precise. Default is the same as `index.max_inner_result_window` or 10000 by default. -index.max_docvalue_fields_search | The maximum number of `docvalue_fields` allowed in a query. Default is 100. -index.max_script_fields | The maximum number of `script_fields` allowed in a query. Default is 32. -index.max_ngram_diff | The maximum difference between `min_gram` and `max_gram` values for the NGramTokenizer and NGramTokenFilter. Default is 1. -index.max_shingle_diff | The maximum difference between `max_shingle_size` and `min_shingle_size` to feed into the `shingle` token filter. Default is 3. -index.max_refresh_listeners | The maximum number of refresh listeners each shard is allowed to have. -index.analyze.max_token_count | The maximum number of tokens that can be returned from the `_analyze` API operation. Default is 10000. -index.highlight.max_analyzed_offset | The number of characters a highlight request can analyze. Default is 1000000. -index.max_terms_count | The maximum number of terms a terms query can accept. Default is 65536. -index.max_regex_length | The maximum character length of regex that can be in a regexp query. Default is 1000. -index.query.default_field | A field or list of fields that OpenSearch uses in queries in case a field isn't specified in the parameters. -index.routing.allocation.enable | Specifies options for the index’s shard allocation. Available options are `all` (allow allocation for all shards), `primaries` (allow allocation only for primary shards), `new_primaries` (allow allocation only for new primary shards), and `none` (do not allow allocation). Default is `all`. -index.routing.rebalance.enable | Enables shard rebalancing for the index. Available options are `all` (allow rebalancing for all shards), `primaries` (allow rebalancing only for primary shards), `replicas` (allow rebalancing only for replicas), and `none` (do not allow rebalancing). Default is `all`. -index.gc_deletes | The amount of time to retain a deleted document's version number. Default is `60s`. -index.default_pipeline | The default ingest node pipeline for the index. If the default pipeline is set and the pipeline does not exist, then index requests fail. The pipeline name `_none` specifies that the index does not have an ingest pipeline. -index.final_pipeline | The final ingest node pipeline for the index. If the final pipeline is set and the pipeline does not exist, then index requests fail. The pipeline name `_none` specifies that the index does not have an ingest pipeline. - -## Updating a dynamic index setting - -You can update a dynamic index setting at any time through the API. 
For example, to update the refresh interval, use the following request: - -```json -PUT /testindex/_settings -{ - "index": { - "refresh_interval": "2s" - } -} -``` -{% include copy-curl.html %} - -For more information about updating settings, including supported query parameters, see [Update settings]({{site.url}}{{site.baseurl}}/api-reference/index-apis/update-settings/). \ No newline at end of file diff --git a/_im-plugin/index-transforms/index.md b/_im-plugin/index-transforms/index.md index 3488eae3d2..7fd1957102 100644 --- a/_im-plugin/index-transforms/index.md +++ b/_im-plugin/index-transforms/index.md @@ -151,7 +151,3 @@ GET finished_flight_job/_search } ``` - -## Index codec considerations - -For index codec considerations, see [Index codecs]({{site.url}}{{site.baseurl}}/im-plugin/index-codecs/#index-rollups-and-transforms). \ No newline at end of file diff --git a/_im-plugin/index.md b/_im-plugin/index.md index fd48aa2898..f912c41d2e 100644 --- a/_im-plugin/index.md +++ b/_im-plugin/index.md @@ -11,13 +11,13 @@ redirect_from: --- # Managing indexes +OpenSearch Dashboards +{: .label .label-yellow :} You index data using the OpenSearch REST API. Two APIs exist: the index API and the `_bulk` API. For situations in which new data arrives incrementally (for example, customer orders from a small business), you might use the index API to add documents individually as they arrive. For situations in which the flow of data is less frequent (for example, weekly updates to a marketing website), you might prefer to generate a file and send it to the `_bulk` API. For large numbers of documents, lumping requests together and using the `_bulk` API offers superior performance. If your documents are enormous, however, you might need to index them individually. -When indexing documents, the document `_id` must be 512 bytes or less in size. - ## Introduction to indexing @@ -93,8 +93,6 @@ OpenSearch indexes have the following naming restrictions: `:`, `"`, `*`, `+`, `/`, `\`, `|`, `?`, `#`, `>`, or `<` - - ## Read data After you index a document, you can retrieve it by sending a GET request to the same endpoint that you used for indexing: diff --git a/_im-plugin/ism/index.md b/_im-plugin/ism/index.md index 90744e1a17..9d16c20c56 100644 --- a/_im-plugin/ism/index.md +++ b/_im-plugin/ism/index.md @@ -9,6 +9,8 @@ has_toc: false --- # Index State Management +OpenSearch Dashboards +{: .label .label-yellow :} If you analyze time-series data, you likely prioritize new data over old data. You might periodically perform certain operations on older indexes, such as reducing replica count or deleting them. diff --git a/_im-plugin/ism/managedindexes.md b/_im-plugin/ism/managedindexes.md index 8de7a3e981..91fcca3c74 100644 --- a/_im-plugin/ism/managedindexes.md +++ b/_im-plugin/ism/managedindexes.md @@ -4,8 +4,6 @@ title: Managed indexes nav_order: 3 parent: Index State Management has_children: false -redirect_from: - - /im-plugin/ism/managedindices/ --- # Managed indexes diff --git a/_im-plugin/notifications-settings.md b/_im-plugin/notifications-settings.md deleted file mode 100644 index 57d31f008a..0000000000 --- a/_im-plugin/notifications-settings.md +++ /dev/null @@ -1,237 +0,0 @@ ---- -layout: default -title: Notification settings -nav_order: 100 ---- - -# Notification settings - -Introduced 2.8 -{: .label .label-purple } - -You can use notification settings to configure notifications about long-running index operations. 
Set up automatic [notifications]({{site.url}}{{site.baseurl}}/observing-your-data/notifications/index/) when long-running index operations are complete by [using Notifications in OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/im-dashboards/notifications/) or through the API. - -Configuring notification settings is useful for long-running index operations, such as `open`, `reindex`, `resize`, and `force merge`. When you send a request for those operations and set the `wait_for_completion` parameter to `false`, the operation returns immediately and the response contains a task ID. You can use that task ID to configure notifications for this operation. - -## Configuring notification settings - -You can configure long-running operation notifications through the API by using the `task_id` and `action_name` parameters: - -- **One-time setting**: If you pass `task_id` in the `lron_config` object, the task runs one time and the setting is automatically deleted when the task ends. If you pass both `task_id` and `action_name`, `action_name` is ignored but may be useful to you for searching and debugging notification settings. -- **Global, persistent setting**: If you pass `action_name` and not `task_id` in the `lron_config` object, the task is global and persistent and applies to all operations of this action type. - -The following table lists the parameters for long-running index operation notifications. - -| Parameter | Type | Description | -| :--- | :--- | :--- | -| `lron_config` | Object | Long-running index operation notification configuration. | -| `task_id` | String | The task ID of the task that you want to be notified about. Optional. One of `task_id` and `action_name` must be specified.| -| `action_name` | String | The operation type that you want to be notified about. Provide `action_name` but not `task_id` to be notified of all operations of this type. Supported values are `indices:data/write/reindex`, `indices:admin/resize`, `indices:admin/forcemerge`, and `indices:admin/open`. Optional. One of `task_id` and `action_name` must be specified. | -| `lron_condition` | Object | Specifies which events you want to be notified about. Optional. If not provided, you'll be notified of both the operation success and failure. | -| `lron_condition.success` | Boolean | Set this parameter to `true` to be notified when the operation succeeds. Optional. Default is `true`. | -| `lron_condition.failure` | Boolean | Set this parameter to `true` to be notified when the operation fails or times out. Optional. Default is `true`. | -| `channels` | Object | Supported communication channels include Amazon Chime, Amazon Simple Notification Service (Amazon SNS), Amazon Simple Email Service (Amazon SES), email through SMTP, Slack, and custom webhooks. If either `lron_condition.success` or `lron_condition.failure` is `true`, `channels` must contain at least one channel. Learn how to configure notification channels in [Notifications]({{site.url}}{{site.baseurl}}/observing-your-data/notifications/index/). 
| - -## Create notification settings - -The following example request sets up notifications on a failure of a reindex task: - -```json -POST /_plugins/_im/lron -{ - "lron_config": { - "task_id":"dQlcQ0hQS2mwF-AQ7icCMw:12354", - "action_name":"indices:data/write/reindex", - "lron_condition": { - "success": false, - "failure": true - }, - "channels":[ - {"id":"channel1"}, - {"id":"channel2"} - ] - } -} -``` -{% include copy-curl.html %} - -The preceding request results in the following response: - -```json -{ - "_id": "LRON:dQlcQ0hQS2mwF-AQ7icCMw:12354", - "lron_config": { - "lron_condition": { - "success": false, - "failure": true - }, - "task_id": "dQlcQ0hQS2mwF-AQ7icCMw:12354", - "action_name": "indices:data/write/reindex", - "channels": [ - { - "id": "channel1" - }, - { - "id": "channel2" - } - ] - } -} -``` - -### Notification setting ID - -The response returns an ID for the notification setting in the `_id` field. You can use this ID to read, update, or delete this notification setting. For a global `lron_config`, the ID is in the form `LRON:` (for example, `LRON:indices:data/write/reindex`). - -The `action_name` may contain a slash character (`/`), which must be HTTP encoded as `%2F` if you use it the Dev Tools console. For example, `LRON:indices:data/write/reindex` becomes `LRON:indices:data%2Fwrite%2Freindex`. -{: .important} - -For a task `lron_config`, the ID is in the form `LRON:`. - -## Retrieve notification settings - -The following examples retrieve the current configured notification settings. - -Use the following request to retrieve a notification setting with the specified [notification setting ID](#notification-setting-id): - -```json - GET /_plugins/_im/lron/ -``` -{% include copy-curl.html %} - -For example, the following request retrieves the notification setting for the `reindex` operation: - -```json -{ - "lron_configs": [ - { - "_id": "LRON:indices:data/write/reindex", - "lron_config": { - "lron_condition": { - "success": false, - "failure": true - }, - "action_name": "indices:data/write/reindex", - "channels": [ - { - "id": "my_chime" - } - ] - } - } - ], - "total_number": 1 -} -``` -{% include copy-curl.html %} - -Use the following request to retrieve all notification settings: - -```json -GET /_plugins/_im/lron -``` -{% include copy-curl.html %} - -The response contains all configured notification settings with their IDs: - -```json -{ - "lron_configs": [ - { - "_id": "LRON:indices:admin/open", - "lron_config": { - "lron_condition": { - "success": false, - "failure": false - }, - "action_name": "indices:admin/open", - "channels": [] - } - }, - { - "_id": "LRON:indices:data/write/reindex", - "lron_config": { - "lron_condition": { - "success": false, - "failure": true - }, - "action_name": "indices:data/write/reindex", - "channels": [ - { - "id": "my_chime" - } - ] - } - } - ], - "total_number": 2 -} -``` - -## Update notification settings - -The following example modifies an existing notification setting with the specified [notification setting ID](#notification-setting-id): - -```json -PUT /_plugins/_im/lron/ -{ - "lron_config": { - "task_id":"dQlcQ0hQS2mwF-AQ7icCMw:12354", - "action_name":"indices:data/write/reindex", - "lron_condition": { - "success": false, - "failure": true - }, - "channels":[ - {"id":"channel1"}, - {"id":"channel2"} - ] - } -} -``` -{% include copy-curl.html %} - -The response contains the updated setting: - -```json -{ - "_id": "LRON:dQlcQ0hQS2mwF-AQ7icCMw:12354", - "lron_config": { - "lron_condition": { - "success": false, - "failure": true 
- }, - "task_id": "dQlcQ0hQS2mwF-AQ7icCMw:12354", - "action_name": "indices:data/write/reindex", - "channels": [ - { - "id": "channel1" - }, - { - "id": "channel2" - } - ] - } -} -``` - -## Delete notification settings - -The following example removes a notifications setting with the specified [notification setting ID](#notification-setting-id): - -```json -DELETE /_plugins/_im/lron/ -``` -{% include copy-curl.html %} - -For example, the following request deletes the notification setting for the `reindex` operation: - -```json -DELETE _plugins/_im/lron/LRON:indices:data%2Fwrite%2Freindex -``` -{% include copy-curl.html %} - -## Next steps - -- Learn more about the [ISM API]({{site.url}}{{site.baseurl}}/im-plugin/ism/api/). -- Learn more about the [Notifications]({{site.url}}{{site.baseurl}}/observing-your-data/notifications/index/) application. diff --git a/_im-plugin/reindex-data.md b/_im-plugin/reindex-data.md index 2e3288087a..fcb127a649 100644 --- a/_im-plugin/reindex-data.md +++ b/_im-plugin/reindex-data.md @@ -262,7 +262,3 @@ Option | Valid values | Description | Required :--- | :--- | :--- `index` | String | The name of the destination index. | Yes `version_type` | Enum | The version type for the indexing operation. Valid values: internal, external, external_gt, external_gte. | No - -## Index codec considerations - -For index codec considerations, see [Index codecs]({{site.url}}{{site.baseurl}}/im-plugin/index-codecs/#reindexing). diff --git a/_includes/nav.html b/_includes/nav.html index c3e93175b0..4330a15b93 100644 --- a/_includes/nav.html +++ b/_includes/nav.html @@ -1,9 +1,4 @@ -