[Backport 2.9] Add new Benchmark IA #5055

Merged
merged 1 commit on Sep 20, 2023
17 changes: 9 additions & 8 deletions _benchmark/index.md
@@ -17,17 +17,18 @@ OpenSearch Benchmark is a macrobenchmark utility provided by the [OpenSearch Pro

OpenSearch Benchmark can be installed directly on a compatible host running Linux and macOS. You can also run OpenSearch Benchmark in a Docker container. See [Installing OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/installing-benchmark/) for more information.

## Concepts
The following diagram visualizes how OpenSearch Benchmark works when run against a local host:

Before using OpenSearch Benchmark, familiarize yourself with the following concepts:
![Benchmark workflow]({{site.url}}{{site.baseurl}}/images/benchmark/OSB-workflow.png)

- **Workload**: The description of one or more benchmarking scenarios that use a specific document corpus from which to perform a benchmark against your cluster. The document corpus contains any indexes, data files, and operations invoked when the workflow runs. You can list the available workloads by using `opensearch-benchmark list workloads` or view any included workloads inside the [OpenSearch Benchmark Workloads repository](https://github.com/opensearch-project/opensearch-benchmark-workloads/). For information about building a custom workload, see [Creating custom workloads]({{site.url}}{{site.baseurl}}/benchmark/creating-custom-workloads/).
The OpenSearch Benchmark documentation is split into five sections:

- [Quickstart]({{site.url}}{{site.baseurl}}/benchmark/quickstart/): Learn how to quickly install and run OpenSearch Benchmark.
- [User guide]({{site.url}}{{site.baseurl}}/benchmark/user-guide/index/): Dive deep into how OpenSearch Benchmark can help you track the performance of your cluster.
- [Tutorials]({{site.url}}{{site.baseurl}}/benchmark/tutorials/index/): Use step-by-step guides for more advanced benchmarking configurations and functionality.
- [Commands]({{site.url}}{{site.baseurl}}/benchmark/commands/index/): A detailed reference of commands and command options supported by OpenSearch Benchmark.
- [Workloads]({{site.url}}{{site.baseurl}}/benchmark/workloads/index/): A detailed reference of options available for both default and custom workloads.

- **Pipeline**: A series of steps before and after a workload is run that determines benchmark results. OpenSearch Benchmark supports three pipelines:
- `from-sources`: Builds and provisions OpenSearch, runs a benchmark, and then publishes the results.
- `from-distribution`: Downloads an OpenSearch distribution, provisions it, runs a benchmark, and then publishes the results.
- `benchmark-only`: The default pipeline. Assumes an already running OpenSearch instance, runs a benchmark on that instance, and then publishes the results.

- **Test**: A single invocation of the OpenSearch Benchmark binary.


401 changes: 401 additions & 0 deletions _benchmark/quickstart.md

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions _benchmark/tutorials/index.md
@@ -0,0 +1,10 @@
---
layout: default
title: Tutorials
nav_order: 10
has_children: true
---

# Tutorials

This section of the OpenSearch Benchmark documentation provides a set of tutorials for those who want to learn more advanced OpenSearch Benchmark concepts.
32 changes: 32 additions & 0 deletions _benchmark/tutorials/sigv4.md
@@ -0,0 +1,32 @@
---
layout: default
title: AWS Signature Version 4 support
nav_order: 70
parent: Tutorials
---

# Running OpenSearch Benchmark with AWS Signature Version 4


OpenSearch Benchmark supports AWS Signature Version 4 authentication. To run Benchmark with Signature Version 4, use the following steps:

1. Set up an [IAM user](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create.html) and provide it access to the OpenSearch cluster using Signature Version 4 authentication.


2. Set up the following environment variables for your IAM user:

```bash
OSB_AWS_ACCESS_KEY_ID=<IAM USER AWS ACCESS KEY ID>
OSB_AWS_SECRET_ACCESS_KEY=<IAM USER AWS SECRET ACCESS KEY>
OSB_REGION=<YOUR REGION>
OSB_SERVICE=aos
```
{% include copy.html %}

3. Customize and run the following `execute-test` command with the `--client-options=amazon_aws_log_in:environment` flag. This flag tells OpenSearch Benchmark to read your exported credentials from the environment.

```bash
opensearch-benchmark execute-test \
--target-hosts=<CLUSTER ENDPOINT> \
--pipeline=benchmark-only \
--workload=geonames \
--client-options=timeout:120,amazon_aws_log_in:environment
```
171 changes: 171 additions & 0 deletions _benchmark/user-guide/concepts.md
@@ -0,0 +1,171 @@
---
layout: default
title: Concepts
nav_order: 3
parent: User guide
---

# Concepts

Before using OpenSearch Benchmark, familiarize yourself with the following concepts.

## Core concepts and definitions

- **Workload**: The description of one or more benchmarking scenarios that use a specific document corpus to perform a benchmark against your cluster. The document corpus contains any indexes, data files, and operations invoked when the workflow runs. You can list the available workloads by using `opensearch-benchmark list workloads` or view any included workloads in the [OpenSearch Benchmark Workloads repository](https://github.com/opensearch-project/opensearch-benchmark-workloads/). For more information about the elements of a workload, see [Anatomy of a workload](#anatomy-of-a-workload). For information about building a custom workload, see [Creating custom workloads]({{site.url}}{{site.baseurl}}/benchmark/creating-custom-workloads/).

- **Pipeline**: A series of steps occurring before and after a workload is run that determines benchmark results. OpenSearch Benchmark supports three pipelines:
- `from-sources`: Builds and provisions OpenSearch, runs a benchmark, and then publishes the results.
- `from-distribution`: Downloads an OpenSearch distribution, provisions it, runs a benchmark, and then publishes the results.
- `benchmark-only`: The default pipeline. Assumes an already running OpenSearch instance, runs a benchmark on that instance, and then publishes the results.

- **Test**: A single invocation of the OpenSearch Benchmark binary.

A workload is a specification of one or more benchmarking scenarios. A workload typically includes the following:

- One or more data streams that are ingested into indexes.
- A set of queries and operations that are invoked as part of the benchmark.

## Anatomy of a workload

The following example workload shows all of the essential elements needed to create a `workload.json` file. You can run this workload in your own benchmark configuration to understand how all of the elements work together:

```json
{
"description": "Tutorial benchmark for OpenSearch Benchmark",
"indices": [
{
"name": "movies",
"body": "index.json"
}
],
"corpora": [
{
"name": "movies",
"documents": [
{
"source-file": "movies-documents.json",
"document-count": 11658903, # Fetch document count from command line
"uncompressed-bytes": 1544799789 # Fetch uncompressed bytes from command line
}
]
}
],
"schedule": [
{
"operation": {
"operation-type": "create-index"
}
},
{
"operation": {
"operation-type": "cluster-health",
"request-params": {
"wait_for_status": "green"
},
"retry-until-success": true
}
},
{
"operation": {
"operation-type": "bulk",
"bulk-size": 5000
},
"warmup-time-period": 120,
"clients": 8
},
{
"operation": {
"name": "query-match-all",
"operation-type": "search",
"body": {
"query": {
"match_all": {}
}
}
},
"iterations": 1000,
"target-throughput": 100
}
]
}
```

A workload usually includes the following elements:

- [indices]({{site.url}}{{site.baseurl}}/benchmark/workloads/indices/): Defines the relevant indexes and index templates used for the workload.

- [corpora]({{site.url}}{{site.baseurl}}/benchmark/workloads/corpora/): Defines all document corpora used for the workload.
- `schedule`: Defines the operations and the order in which they run inline. Alternatively, you can use `operations` to group operations and the `test_procedures` parameter to specify the order of operations.
- `operations`: **Optional**. Describes which operations are available for the workload and how they are parameterized.
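Before running a workload, it can help to confirm that the file parses as JSON and contains the elements listed above. The following is a minimal sketch, not part of OpenSearch Benchmark itself; the inline document stands in for a real `workload.json` file:

```python
import json

# Illustrative stand-in for a workload.json file.
workload_text = """
{
  "description": "Tutorial benchmark for OpenSearch Benchmark",
  "indices": [{"name": "movies", "body": "index.json"}],
  "corpora": [{"name": "movies", "documents": []}],
  "schedule": [{"operation": {"operation-type": "create-index"}}]
}
"""

# Parse the file and check for the core workload elements.
workload = json.loads(workload_text)
for element in ("indices", "corpora", "schedule"):
    assert element in workload, f"missing workload element: {element}"
print("workload elements present")
```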

### Indices


To create an index, specify its `name`. To add definitions to your index, use the `body` option and point it to the JSON file containing the index definitions. For more information, see [indices]({{site.url}}{{site.baseurl}}/benchmark/workloads/indices/).


### Corpora

The `corpora` element requires the name of the index containing the document corpus, for example, `movies`, and a list of parameters that define the document corpora. This list includes the following parameters:

- `source-file`: The file name that contains the workload's corresponding documents. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a `base_url`, use a compressed file format: `.zip`, `.bz2`, `.gz`, `.tar`, `.tar.gz`, `.tgz`, or `.tar.bz2`. The compressed file must contain a single JSON file of the same name.
- `document-count`: The number of documents in the `source-file`, which determines which client indexes correlate to which parts of the document corpus. Each of the N clients receives an Nth of the document corpus. When using a source that contains a document with a parent-child relationship, specify the number of parent documents.
- `uncompressed-bytes`: The size, in bytes, of the source file after decompression, indicating how much disk space the decompressed source file needs.
- `compressed-bytes`: The size, in bytes, of the source file before decompression. This can help you assess the amount of time needed for the cluster to ingest documents.
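Because corpus files store one JSON document per line, the `document-count` and `uncompressed-bytes` values can be derived from the source file itself. The following is a sketch; the tiny two-document corpus it creates is illustrative only:

```shell
# Create a tiny two-document corpus for illustration, then compute the
# values referenced in the workload's corpora section.
printf '{"title":"Carol"}\n{"title":"Arrival"}\n' > movies-documents.json

doc_count=$(wc -l < movies-documents.json)      # document-count: one doc per line
uncompressed=$(wc -c < movies-documents.json)   # uncompressed-bytes: size in bytes
echo "document-count=$doc_count uncompressed-bytes=$uncompressed"
```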

### Operations

The `operations` element lists the OpenSearch API operations performed by the workload. For example, you can set an operation to `create-index`, which creates an index in the test cluster to which OpenSearch Benchmark can write documents. Operations are usually listed inside of `schedule`.

### Schedule

The `schedule` element contains a list of actions and operations that are run by the workload. Operations run according to the order in which they appear in the `schedule`. The following example illustrates a `schedule` with multiple operations, each defined by its `operation-type`:

```json
"schedule": [
{
"operation": {
"operation-type": "create-index"
}
},
{
"operation": {
"operation-type": "cluster-health",
"request-params": {
"wait_for_status": "green"
},
"retry-until-success": true
}
},
{
"operation": {
"operation-type": "bulk",
"bulk-size": 5000
},
"warmup-time-period": 120,
"clients": 8
},
{
"operation": {
"name": "query-match-all",
"operation-type": "search",
"body": {
"query": {
"match_all": {}
}
}
},
"iterations": 1000,
"target-throughput": 100
}
]
```

According to this schedule, the actions run in the following order:

1. The `create-index` operation creates an index. The index remains empty until the `bulk` operation adds documents with benchmarked data.
2. The `cluster-health` operation assesses the health of the cluster before running the workload. In this example, the workload waits until the cluster's health status is `green`.
3. The `bulk` operation runs the Bulk API to index `5000` documents simultaneously.
   - Before benchmarking, the workload waits until the specified `warmup-time-period` passes. In this example, the warmup period is `120` seconds.
4. The `clients` field defines the number of clients, in this example 8, that will run the remaining actions in the schedule concurrently.
5. The `search` operation runs a `match_all` query to match all documents after they have been indexed by the Bulk API using the 8 clients specified.
   - The `iterations` field indicates the number of times each client runs the `search` operation. The report generated by the benchmark automatically adjusts the percentile numbers based on this number. To generate a precise percentile, the benchmark needs to run at least 1,000 iterations.
   - Lastly, the `target-throughput` field defines the number of requests per second that each client performs, which, when set, can help reduce the latency of the benchmark. For example, a `target-throughput` of 100 requests divided by 8 clients means that each client issues 12.5 requests per second.
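The per-client throughput arithmetic can be checked directly. The following is a quick sketch using the example's numbers:

```python
# Numbers from the example schedule above.
clients = 8
target_throughput = 100  # total requests per second across all clients

# target-throughput is divided evenly among the clients.
per_client = target_throughput / clients
print(per_client)  # 12.5 requests per second per client
```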
@@ -2,7 +2,8 @@
layout: default
title: Configuring OpenSearch Benchmark
nav_order: 7
has_children: false
parent: User guide
redirect_from: /benchmark/configuring-benchmark/
---

# Configuring OpenSearch Benchmark
@@ -2,7 +2,8 @@
layout: default
title: Creating custom workloads
nav_order: 10
has_children: false
parent: User guide
redirect_from: /benchmark/creating-custom-workloads/
---

# Creating custom workloads
10 changes: 10 additions & 0 deletions _benchmark/user-guide/index.md
@@ -0,0 +1,10 @@
---
layout: default
title: User guide
nav_order: 5
has_children: true
---

# OpenSearch Benchmark User Guide


The OpenSearch Benchmark User Guide includes core [concepts]({{site.url}}{{site.baseurl}}/benchmark/user-guide/concepts/), [installation]({{site.url}}{{site.baseurl}}/benchmark/installing-benchmark/) instructions, and [configuration options]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/) to help you get the most out of OpenSearch Benchmark.
@@ -2,7 +2,8 @@
layout: default
title: Installing OpenSearch Benchmark
nav_order: 5
has_children: false
parent: User guide
redirect_from: /benchmark/installing-benchmark/
---

# Installing OpenSearch Benchmark
@@ -150,6 +151,59 @@ run -v $HOME/benchmarks:/opensearch-benchmark/.benchmark opensearchproject/opens

See [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/) to learn more about the files and subdirectories located in `/opensearch-benchmark/.benchmark`.

## Provisioning an OpenSearch cluster with a test

OpenSearch Benchmark is compatible with JDK versions 17, 16, 15, 14, 13, 12, 11, and 8.
{: .note}

If you installed OpenSearch Benchmark with PyPI, you can also provision a new OpenSearch cluster by specifying a `--distribution-version` in the `execute-test` command.

If you plan on having OpenSearch Benchmark provision a cluster, you need to tell it which `JAVA_HOME` path to use for the provisioned cluster. To set the `JAVA_HOME` path and provision a cluster:

1. Find the `JAVA_HOME` path you're currently using. Open a terminal and enter `/usr/libexec/java_home`.

2. Set your corresponding JDK version environment variable by entering the path from the previous step. Enter `export JAVA17_HOME=<Java Path>`.

3. Run the `execute-test` command and indicate the distribution version of OpenSearch you want to use:

```bash
opensearch-benchmark execute-test --distribution-version=2.3.0 --workload=geonames --test-mode
```
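Steps 1 and 2 can be combined in one snippet. The following is a sketch: it assumes the macOS `java_home` helper and falls back to a placeholder JDK path on other systems, so substitute your actual JDK location:

```shell
# Resolve the current JDK path via the macOS helper; on systems without
# /usr/libexec/java_home, fall back to a placeholder location.
JAVA_PATH=$(/usr/libexec/java_home 2>/dev/null || echo "/usr/lib/jvm/java-17")

# Export the version-specific variable that OpenSearch Benchmark reads.
export JAVA17_HOME="$JAVA_PATH"
echo "JAVA17_HOME=$JAVA17_HOME"
```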

## Directory structure

After running OpenSearch Benchmark for the first time, you can search through all related files, including configuration files, in the `~/.benchmark` directory. The directory includes the following file tree:

```
# ~/.benchmark Tree
.
├── benchmark.ini
├── benchmarks
│ ├── data
│ │ └── geonames
│ ├── distributions
│ │ ├── opensearch-1.0.0-linux-x64.tar.gz
│ │ └── opensearch-2.3.0-linux-x64.tar.gz
│ ├── test_executions
│ │ ├── 0279b13b-1e54-49c7-b1a7-cde0b303a797
│ │ └── 0279c542-a856-4e88-9cc8-04306378cd38
│ └── workloads
│ └── default
│ └── geonames
├── logging.json
├── logs
│ └── benchmark.log
```

* `benchmark.ini`: Contains any adjustable configurations for tests. For information about how to configure OpenSearch Benchmark, see [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/).
* `data`: Contains all the data corpora and documents related to OpenSearch Benchmark's [official workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/geonames).
* `distributions`: Contains all the OpenSearch distributions downloaded from [OpenSearch.org](https://opensearch.org/) and used to provision clusters.
* `test_executions`: Contains all the test `execution_id`s from previous runs of OpenSearch Benchmark.
* `workloads`: Contains all files related to workloads, except for the data corpora.
* `logging.json`: Contains all of the configuration options related to how logging is performed within OpenSearch Benchmark.
* `logs`: Contains all the logs from OpenSearch Benchmark runs. This can be helpful when you've encountered errors during runs.


## Next steps

- [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/)