Skip to content

Commit

Permalink
Add new Bechmark IA (#5022)
Browse files Browse the repository at this point in the history
* Rework Benchmark IA

Signed-off-by: Naarcha-AWS <[email protected]>

* Add new Benchmark IA.

Signed-off-by: Naarcha-AWS <[email protected]>

* Add Quickstart steps and Sigv4 guide.

Signed-off-by: Naarcha-AWS <[email protected]>

* Add tutorial text

Signed-off-by: Naarcha-AWS <[email protected]>

* Fix links

Signed-off-by: Naarcha-AWS <[email protected]>

* Add technical feedback

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Chris Moore <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Chris Moore <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Remove section

Signed-off-by: Naarcha-AWS <[email protected]>

* Update quickstart.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Co-authored-by: Chris Moore <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Update concepts.md

Signed-off-by: Naarcha-AWS <[email protected]>

---------

Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: Chris Moore <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
  • Loading branch information
3 people authored Sep 20, 2023
1 parent 70be12b commit 796076b
Show file tree
Hide file tree
Showing 11 changed files with 695 additions and 155 deletions.
17 changes: 9 additions & 8 deletions _benchmark/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,17 +17,18 @@ OpenSearch Benchmark is a macrobenchmark utility provided by the [OpenSearch Pro

OpenSearch Benchmark can be installed directly on a compatible host running Linux and macOS. You can also run OpenSearch Benchmark in a Docker container. See [Installing OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/installing-benchmark/) for more information.

## Concepts
The following diagram visualizes how OpenSearch Benchmark works when run against a local host:

Before using OpenSearch Benchmark, familiarize yourself with the following concepts:
![Benchmark workflow]({{site.url}}{{site.baseurl}}/images/benchmark/OSB-workflow.png).

- **Workload**: The description of one or more benchmarking scenarios that use a specific document corpus from which to perform a benchmark against your cluster. The document corpus contains any indexes, data files, and operations invoked when the workflow runs. You can list the available workloads by using `opensearch-benchmark list workloads` or view any included workloads inside the [OpenSearch Benchmark Workloads repository](https://github.com/opensearch-project/opensearch-benchmark-workloads/). For information about building a custom workload, see [Creating custom workloads]({{site.url}}{{site.baseurl}}/benchmark/creating-custom-workloads/).
The OpenSearch Benchmark documentation is split into five sections:

- [Quickstart]({{site.url}}{{site.baseurl}}/benchmark/quickstart/): Learn how to quickly run and install OpenSearch Benchmark.
- [User guide]({{site.url}}{{site.baseurl}}/benchmark/user-guide/index/): Dive deep into how OpenSearch Benchmark can help you track the performance of your cluster.
- [Tutorials]({{site.url}}{{site.baseurl}}/benchmark/tutorials/index/): Use step-by-step guides for more advanced benchmarking configurations and functionality.
- [Commands]({{site.url}}{{site.baseurl}}/benchmark/commands/index/): A detailed reference of commands and command options supported by OpenSearch.
- [Workloads]({{site.url}}{{site.baseurl}}/benchmark/workloads/index/): A detailed reference of options available for both default and custom workloads.

- **Pipeline**: A series of steps before and after a workload is run that determines benchmark results. OpenSearch Benchmark supports three pipelines:
- `from-sources`: Builds and provisions OpenSearch, runs a benchmark, and then publishes the results.
- `from-distribution`: Downloads an OpenSearch distribution, provisions it, runs a benchmark, and then publishes the results.
- `benchmark-only`: The default pipeline. Assumes an already running OpenSearch instance, runs a benchmark on that instance, and then publishes the results.

- **Test**: A single invocation of the OpenSearch Benchmark binary.


401 changes: 401 additions & 0 deletions _benchmark/quickstart.md

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions _benchmark/tutorials/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
---
layout: default
title: Tutorials
nav_order: 10
has_children: true
---

# Tutorial

This section of the OpenSearch Benchmark documentation provides a set of tutorials for those who want to learn more advanced OpenSearch Benchmark concepts.
32 changes: 32 additions & 0 deletions _benchmark/tutorials/sigv4.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
---
layout: default
title: AWS Signature Version 4 support
nav_order: 70
parent: Tutorials
---

# Running OpenSearch Benchmark with AWS Signature Version 4

OpenSearch Benchmark supports AWS Signature Version 4 authentication. To run Benchmark with Signature Version 4, use the following steps:

1. Set up an [IAM user](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create.html) and provide it access to the OpenSearch cluster using Signature Version 4 authentication.

2. Set up the following environment variables for your IAM user:

```bash
OSB_AWS_ACCESS_KEY_ID=<<IAM USER AWS ACCESS KEY ID>
OSB_AWS_SECRET_ACCESS_KEY=<IAM USER AWS SECRET ACCESS KEY>
OSB_REGION=<YOUR REGION>
OSB_SERVICE=aos
```
{% include copy.html %}
3. Customize and run the following `execute-test` command with the ` --client-options=amazon_aws_log_in:environment` flag. This flag tells OpenSearch Benchmark the location of your exported credentials.
```bash
opensearch-benchmark execute-test \
--target-hosts=<CLUSTER ENDPOINT> \
--pipeline=benchmark-only \
--workload=geonames \
--client-options=timeout:120,amazon_aws_log_in:environment \
```
171 changes: 171 additions & 0 deletions _benchmark/user-guide/concepts.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,171 @@
---
layout: default
title: Concepts
nav_order: 3
parent: User guide
---

# Concepts

Before using OpenSearch Benchmark, familiarize yourself with the following concepts.

## Core concepts and definitions

- **Workload**: The description of one or more benchmarking scenarios that use a specific document corpus to perform a benchmark against your cluster. The document corpus contains any indexes, data files, and operations invoked when the workflow runs. You can list the available workloads by using `opensearch-benchmark list workloads` or view any included workloads in the [OpenSearch Benchmark Workloads repository](https://github.com/opensearch-project/opensearch-benchmark-workloads/). For more information about the elements of a workload, see [Anatomy of a workload](#anatomy-of-a-workload). For information about building a custom workload, see [Creating custom workloads]({{site.url}}{{site.baseurl}}/benchmark/creating-custom-workloads/).

- **Pipeline**: A series of steps occurring before and after a workload is run that determines benchmark results. OpenSearch Benchmark supports three pipelines:
- `from-sources`: Builds and provisions OpenSearch, runs a benchmark, and then publishes the results.
- `from-distribution`: Downloads an OpenSearch distribution, provisions it, runs a benchmark, and then publishes the results.
- `benchmark-only`: The default pipeline. Assumes an already running OpenSearch instance, runs a benchmark on that instance, and then publishes the results.

- **Test**: A single invocation of the OpenSearch Benchmark binary.

A workload is a specification of one or more benchmarking scenarios. A workload typically includes the following:

- One or more data streams that are ingested into indexes.
- A set of queries and operations that are invoked as part of the benchmark.

## Anatomy of a workload

The following example workload shows all of the essential elements needed to create a `workload.json` file. You can run this workload in your own benchmark configuration to understand how all of the elements work together:

```json
{
"description": "Tutorial benchmark for OpenSearch Benchmark",
"indices": [
{
"name": "movies",
"body": "index.json"
}
],
"corpora": [
{
"name": "movies",
"documents": [
{
"source-file": "movies-documents.json",
"document-count": 11658903, # Fetch document count from command line
"uncompressed-bytes": 1544799789 # Fetch uncompressed bytes from command line
}
]
}
],
"schedule": [
{
"operation": {
"operation-type": "create-index"
}
},
{
"operation": {
"operation-type": "cluster-health",
"request-params": {
"wait_for_status": "green"
},
"retry-until-success": true
}
},
{
"operation": {
"operation-type": "bulk",
"bulk-size": 5000
},
"warmup-time-period": 120,
"clients": 8
},
{
"operation": {
"name": "query-match-all",
"operation-type": "search",
"body": {
"query": {
"match_all": {}
}
}
},
"iterations": 1000,
"target-throughput": 100
}
]
}
```

A workload usually includes the following elements:

- [indices]({{site.url}}{{site.baseurl}}/benchmark/workloads/indices/): Defines the relevant indexes and index templates used for the workload.
- [corpora]({{site.url}}{{site.baseurl}}/benchmark/workloads/corpora/): Defines all document corpora used for the workload.
- `schedule`: Defines operations and the order in which the operations run inline. Alternatively, you can use `operations` to group operations and the `test_procedures` parameter to specify the order of operations.
- `operations`: **Optional**. Describes which operations are available for the workload and how they are parameterized.

### Indices

To create an index, specify its `name`. To add definitions to your index, use the `body` option and point it to the JSON file containing the index definitions. For more information, see [indices]({{site.url}}{{site.baseurl}}/benchmark/workloads/indices/).

### Corpora

The `corpora` element requires the name of the index containing the document corpus, for example, `movies`, and a list of parameters that define the document corpora. This list includes the following parameters:

- `source-file`: The file name that contains the workload's corresponding documents. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a `base_url`, use a compressed file format: `.zip`, `.bz2`, `.gz`, `.tar`, `.tar.gz`, `.tgz`, or `.tar.bz2`. The compressed file must have one JSON file containing the name.
- `document-count`: The number of documents in the `source-file`, which determines which client indexes correlate to which parts of the document corpus. Each N client receives an Nth of the document corpus. When using a source that contains a document with a parent-child relationship, specify the number of parent documents.
- `uncompressed-bytes`: The size, in bytes, of the source file after decompression, indicating how much disk space the decompressed source file needs.
- `compressed-bytes`: The size, in bytes, of the source file before decompression. This can help you assess the amount of time needed for the cluster to ingest documents.

### Operations

The `operations` element lists the OpenSearch API operations performed by the workload. For example, you can set an operation to `create-index`, an index in the test cluster to which OpenSearch Benchmark can write documents. Operations are usually listed inside of `schedule`.

### Schedule

The `schedule` element contains a list of actions and operations that are run by the workload. Operations run according to the order in which they appear in the `schedule`. The following example illustrates a `schedule` with multiple operations, each defined by its `operation-type`:

```json
"schedule": [
{
"operation": {
"operation-type": "create-index"
}
},
{
"operation": {
"operation-type": "cluster-health",
"request-params": {
"wait_for_status": "green"
},
"retry-until-success": true
}
},
{
"operation": {
"operation-type": "bulk",
"bulk-size": 5000
},
"warmup-time-period": 120,
"clients": 8
},
{
"operation": {
"name": "query-match-all",
"operation-type": "search",
"body": {
"query": {
"match_all": {}
}
}
},
"iterations": 1000,
"target-throughput": 100
}
]
}
```

According to this schedule, the actions will run in the following order:

1. The `create-index` operation creates an index. The index remains empty until the `bulk` operation adds documents with benchmarked data.
2. The `cluster-health` operation assesses the health of the cluster before running the workload. In this example, the workload waits until the status of the cluster's health is `green`.
- The `bulk` operation runs the `bulk` API to index `5000` documents simultaneously.
- Before benchmarking, the workload waits until the specified `warmup-time-period` passes. In this example, the warmup period is `120` seconds.
5. The `clients` field defines the number of clients that will run the remaining actions in the schedule concurrently.
6. The `search` runs a `match_all` query to match all documents after they have been indexed by the `bulk` API using the 8 clients specified.
- The `iterations` field indicates the number of times each client runs the `search` operation. The report generated by the benchmark automatically adjusts the percentile numbers based on this number. To generate a precise percentile, the benchmark needs to run at least 1,000 iterations.
- Lastly, the `target-throughput` field defines the number of requests per second each client performs, which, when set, can help reduce the latency of the benchmark. For example, a `target-throughput` of 100 requests divided by 8 clients means that each client will issue 12 requests per second.
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
layout: default
title: Configuring OpenSearch Benchmark
nav_order: 7
has_children: false
parent: User guide
redirect_from: /benchmark/configuring-benchmark/
---

# Configuring OpenSearch Benchmark
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
layout: default
title: Creating custom workloads
nav_order: 10
has_children: false
parent: User guide
redirect_from: /benchmark/creating-custom-workloads/
---

# Creating custom workloads
Expand Down
10 changes: 10 additions & 0 deletions _benchmark/user-guide/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
---
layout: default
title: User guide
nav_order: 5
has_children: true
---

# OpenSearch Benchmark User Guide

The OpenSearch Benchmark User Guide includes core [concepts]({{site.url}}{{site.baseurl}}/benchmark/user-guide/concepts/), [installation]({{site.url}}{{site.baseurl}}/benchmark/installing-benchmark/) instructions, and [configuration options]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/) to help you get the most out of OpenSearch Benchmark.
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
layout: default
title: Installing OpenSearch Benchmark
nav_order: 5
has_children: false
parent: User guide
redirect_from: /benchmark/installing-benchmark/
---

# Installing OpenSearch Benchmark
Expand Down Expand Up @@ -150,6 +151,59 @@ run -v $HOME/benchmarks:/opensearch-benchmark/.benchmark opensearchproject/opens

See [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/) to learn more about the files and subdirectories located in `/opensearch-benchmark/.benchmark`.

## Provisioning an OpenSearch cluster with a test

OpenSearch Benchmark is compatible with JDK versions 17, 16, 15, 14, 13, 12, 11, and 8.
{: .note}

If you installed OpenSearch with PyPi, you can also provision a new OpenSearch cluster by specifying a `distribution-version` in the `execute-test` command.

If you plan on having Benchmark provision a cluster, you'll need to inform Benchmark of the location of the `JAVA_HOME` path for the Benchmark cluster. To set the `JAVA_HOME` path and provision a cluster:

1. Find the `JAVA_HOME` path you're currently using. Open a terminal and enter `/usr/libexec/java_home`.

2. Set your corresponding JDK version environment variable by entering the path from the previous step. Enter `export JAVA17_HOME=<Java Path>`.

3. Run the `execute-test` command and indicate the distribution version of OpenSearch you want to use:

```bash
opensearch-benchmark execute-test --distribution-version=2.3.0 --workload=geonames --test-mode
```

## Directory structure

After running OpenSearch Benchmark for the first time, you can search through all related files, including configuration files, in the `~/.benchmark` directory. The directory includes the following file tree:

```
# ~/.benchmark Tree
.
├── benchmark.ini
├── benchmarks
│ ├── data
│ │ └── geonames
│ ├── distributions
│ │ ├── opensearch-1.0.0-linux-x64.tar.gz
│ │ └── opensearch-2.3.0-linux-x64.tar.gz
│ ├── test_executions
│ │ ├── 0279b13b-1e54-49c7-b1a7-cde0b303a797
│ │ └── 0279c542-a856-4e88-9cc8-04306378cd38
│ └── workloads
│ └── default
│ └── geonames
├── logging.json
├── logs
│ └── benchmark.log
```

* `benchmark.ini`: Contains any adjustable configurations for tests. For information about how to configure OpenSearch Benchmark, see [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/).
* `data`: Contains all the data corpora and documents related to OpenSearch Benchmark's [official workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/geonames).
* `distributions`: Contains all the OpenSearch distributions downloaded from [OpenSearch.org](http://opensearch.org/) and used to provision clusters.
* `test_executions`: Contains all the test `execution_id`s from previous runs of OpenSearch Benchmark.
* `workloads`: Contains all files related to workloads, except for the data corpora.
* `logging.json`: Contains all of the configuration options related to how logging is performed within OpenSearch Benchmark.
* `logs`: Contains all the logs from OpenSearch Benchmark runs. This can be helpful when you've encountered errors during runs.


## Next steps

- [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/)
Expand Down
Loading

0 comments on commit 796076b

Please sign in to comment.