Add new Benchmark IA (#5022)

* Rework Benchmark IA Signed-off-by: Naarcha-AWS <[email protected]>
* Add new Benchmark IA. Signed-off-by: Naarcha-AWS <[email protected]>
* Add Quickstart steps and Sigv4 guide. Signed-off-by: Naarcha-AWS <[email protected]>
* Add tutorial text Signed-off-by: Naarcha-AWS <[email protected]>
* Fix links Signed-off-by: Naarcha-AWS <[email protected]>
* Add technical feedback Signed-off-by: Naarcha-AWS <[email protected]>
* Apply suggestions from code review Signed-off-by: Naarcha-AWS <[email protected]>
* Apply suggestions from code review Signed-off-by: Naarcha-AWS <[email protected]>
* Apply suggestions from code review Co-authored-by: Chris Moore <[email protected]> Signed-off-by: Naarcha-AWS <[email protected]>
* Apply suggestions from code review Co-authored-by: Chris Moore <[email protected]> Signed-off-by: Naarcha-AWS <[email protected]>
* Remove section Signed-off-by: Naarcha-AWS <[email protected]>
* Update quickstart.md Signed-off-by: Naarcha-AWS <[email protected]>
* Apply suggestions from code review Co-authored-by: Nathan Bower <[email protected]> Co-authored-by: Chris Moore <[email protected]> Signed-off-by: Naarcha-AWS <[email protected]>
* Update concepts.md Signed-off-by: Naarcha-AWS <[email protected]>

---------

Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: Chris Moore <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
opensearch-project · Sep 20, 2023 · 796076b · 796076b
1 parent 70be12b
commit 796076b
Showing 11 changed files with 695 additions and 155 deletions.
diff --git a/_benchmark/index.md b/_benchmark/index.md
@@ -17,17 +17,18 @@ OpenSearch Benchmark is a macrobenchmark utility provided by the [OpenSearch Pro
 
 OpenSearch Benchmark can be installed directly on a compatible host running Linux and macOS. You can also run OpenSearch Benchmark in a Docker container. See [Installing OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/installing-benchmark/) for more information.
 
-## Concepts
+The following diagram visualizes how OpenSearch Benchmark works when run against a local host:
 
-Before using OpenSearch Benchmark, familiarize yourself with the following concepts:
+![Benchmark workflow]({{site.url}}{{site.baseurl}}/images/benchmark/OSB-workflow.png)
 
-- **Workload**: The description of one or more benchmarking scenarios that use a specific document corpus from which to perform a benchmark against your cluster. The document corpus contains any indexes, data files, and operations invoked when the workflow runs. You can list the available workloads by using `opensearch-benchmark list workloads` or view any included workloads inside the [OpenSearch Benchmark Workloads repository](https://github.com/opensearch-project/opensearch-benchmark-workloads/). For information about building a custom workload, see [Creating custom workloads]({{site.url}}{{site.baseurl}}/benchmark/creating-custom-workloads/).
+The OpenSearch Benchmark documentation is split into five sections:
+
+- [Quickstart]({{site.url}}{{site.baseurl}}/benchmark/quickstart/): Learn how to quickly install and run OpenSearch Benchmark.
+- [User guide]({{site.url}}{{site.baseurl}}/benchmark/user-guide/index/): Dive deep into how OpenSearch Benchmark can help you track the performance of your cluster.
+- [Tutorials]({{site.url}}{{site.baseurl}}/benchmark/tutorials/index/): Use step-by-step guides for more advanced benchmarking configurations and functionality.
+- [Commands]({{site.url}}{{site.baseurl}}/benchmark/commands/index/): A detailed reference of commands and command options supported by OpenSearch Benchmark.
+- [Workloads]({{site.url}}{{site.baseurl}}/benchmark/workloads/index/): A detailed reference of options available for both default and custom workloads.
 
-- **Pipeline**: A series of steps before and after a workload is run that determines benchmark results. OpenSearch Benchmark supports three pipelines:
-  - `from-sources`: Builds and provisions OpenSearch, runs a benchmark, and then publishes the results.
-  - `from-distribution`: Downloads an OpenSearch distribution, provisions it, runs a benchmark, and then publishes the results.
-  - `benchmark-only`: The default pipeline. Assumes an already running OpenSearch instance, runs a benchmark on that instance, and then publishes the results.
 
-- **Test**: A single invocation of the OpenSearch Benchmark binary.
 
 
diff --git a/_benchmark/quickstart.md b/_benchmark/quickstart.md
diff --git a/_benchmark/tutorials/index.md b/_benchmark/tutorials/index.md
@@ -0,0 +1,10 @@
+---
+layout: default
+title: Tutorials
+nav_order: 10
+has_children: true
+---
+
+# Tutorials
+
+This section of the OpenSearch Benchmark documentation provides a set of tutorials for those who want to learn more advanced OpenSearch Benchmark concepts.
diff --git a/_benchmark/tutorials/sigv4.md b/_benchmark/tutorials/sigv4.md
@@ -0,0 +1,32 @@
+---
+layout: default
+title: AWS Signature Version 4 support
+nav_order: 70
+parent: Tutorials
+---
+
+# Running OpenSearch Benchmark with AWS Signature Version 4
+
+OpenSearch Benchmark supports AWS Signature Version 4 authentication. To run Benchmark with Signature Version 4, use the following steps:
+
+1. Set up an [IAM user](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create.html) and provide it access to the OpenSearch cluster using Signature Version 4 authentication.
+
+2. Set up the following environment variables for your IAM user:
+
+   ```bash
+   export OSB_AWS_ACCESS_KEY_ID=<IAM USER AWS ACCESS KEY ID>
+   export OSB_AWS_SECRET_ACCESS_KEY=<IAM USER AWS SECRET ACCESS KEY>
+   export OSB_REGION=<YOUR REGION>
+   export OSB_SERVICE=aos
+   ```
+   {% include copy.html %}
+
+3. Customize and run the following `execute-test` command with the `--client-options=amazon_aws_log_in:environment` flag. This flag tells OpenSearch Benchmark to read your credentials from the exported environment variables.
+
+   ```bash
+   opensearch-benchmark execute-test \
+   --target-hosts=<CLUSTER ENDPOINT> \
+   --pipeline=benchmark-only \
+   --workload=geonames \
+   --client-options=timeout:120,amazon_aws_log_in:environment
+   ```
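Before running the command in step 3, it can help to confirm that all four variables from step 2 are exported. The following is a minimal sketch and not part of OpenSearch Benchmark: the function name `missing_sigv4_vars` is hypothetical, and the check relies only on the standard `printenv` utility.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not an OpenSearch Benchmark command).
# Prints the names of the Signature Version 4 variables from step 2 that are
# missing from the environment or empty, separated by spaces.
missing_sigv4_vars() {
  local name missing=""
  for name in OSB_AWS_ACCESS_KEY_ID OSB_AWS_SECRET_ACCESS_KEY OSB_REGION OSB_SERVICE; do
    # printenv only sees exported variables, which is what a child process
    # such as opensearch-benchmark would inherit.
    if [ -z "$(printenv "$name")" ]; then
      missing="${missing:+$missing }$name"
    fi
  done
  printf '%s\n' "$missing"
}
```

An empty result means every variable is visible to child processes. A non-empty result lists the names that still need an `export`.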
diff --git a/_benchmark/user-guide/concepts.md b/_benchmark/user-guide/concepts.md
@@ -0,0 +1,171 @@
+---
+layout: default
+title: Concepts
+nav_order: 3
+parent: User guide
+---
+
+# Concepts
+
+Before using OpenSearch Benchmark, familiarize yourself with the following concepts.
+
+## Core concepts and definitions
+
+- **Workload**: The description of one or more benchmarking scenarios that use a specific document corpus to perform a benchmark against your cluster. The document corpus contains any indexes, data files, and operations invoked when the workload runs. You can list the available workloads by using `opensearch-benchmark list workloads` or view any included workloads in the [OpenSearch Benchmark Workloads repository](https://github.com/opensearch-project/opensearch-benchmark-workloads/). For more information about the elements of a workload, see [Anatomy of a workload](#anatomy-of-a-workload). For information about building a custom workload, see [Creating custom workloads]({{site.url}}{{site.baseurl}}/benchmark/creating-custom-workloads/).
+
+- **Pipeline**: A series of steps occurring before and after a workload is run that determines benchmark results. OpenSearch Benchmark supports three pipelines:
+  - `from-sources`: Builds and provisions OpenSearch, runs a benchmark, and then publishes the results.
+  - `from-distribution`: Downloads an OpenSearch distribution, provisions it, runs a benchmark, and then publishes the results.
+  - `benchmark-only`: The default pipeline. Assumes an already running OpenSearch instance, runs a benchmark on that instance, and then publishes the results.
+
+- **Test**: A single invocation of the OpenSearch Benchmark binary.
+
+A workload is a specification of one or more benchmarking scenarios. A workload typically includes the following:
+
+- One or more data streams that are ingested into indexes.
+- A set of queries and operations that are invoked as part of the benchmark.
+
+## Anatomy of a workload
+
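+The following example workload shows all of the essential elements needed to create a `workload.json` file. You can run this workload in your own benchmark configuration to understand how all of the elements work together. The `document-count` and `uncompressed-bytes` values are specific to this corpus. For your own source file, fetch both values from the command line: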
+
+```json
+{
+  "description": "Tutorial benchmark for OpenSearch Benchmark",
+  "indices": [
+    {
+      "name": "movies",
+      "body": "index.json"
+    }
+  ],
+  "corpora": [
+    {
+      "name": "movies",
+      "documents": [
+        {
+          "source-file": "movies-documents.json",
+          "document-count": 11658903,
+          "uncompressed-bytes": 1544799789
+        }
+      ]
+    }
+  ],
+  "schedule": [
+    {
+      "operation": {
+        "operation-type": "create-index"
+      }
+    },
+    {
+      "operation": {
+        "operation-type": "cluster-health",
+        "request-params": {
+          "wait_for_status": "green"
+        },
+        "retry-until-success": true
+      }
+    },
+    {
+      "operation": {
+        "operation-type": "bulk",
+        "bulk-size": 5000
+      },
+      "warmup-time-period": 120,
+      "clients": 8
+    },
+    {
+      "operation": {
+        "name": "query-match-all",
+        "operation-type": "search",
+        "body": {
+          "query": {
+            "match_all": {}
+          }
+        }
+      },
+      "iterations": 1000,
+      "target-throughput": 100
+    }
+  ]
+}
+```
+
+A workload usually includes the following elements:
+
+- [indices]({{site.url}}{{site.baseurl}}/benchmark/workloads/indices/): Defines the relevant indexes and index templates used for the workload.
+- [corpora]({{site.url}}{{site.baseurl}}/benchmark/workloads/corpora/): Defines all document corpora used for the workload.
+- `schedule`: Defines operations and the order in which the operations run inline. Alternatively, you can use `operations` to group operations and the `test_procedures` parameter to specify the order of operations. 
+- `operations`: **Optional**. Describes which operations are available for the workload and how they are parameterized. 
+
+### Indices
+
+To create an index, specify its `name`. To add definitions to your index, use the `body` option and point it to the JSON file containing the index definitions. For more information, see [indices]({{site.url}}{{site.baseurl}}/benchmark/workloads/indices/).
+
+### Corpora
+
+The `corpora` element requires the name of the index containing the document corpus, for example, `movies`, and a list of parameters that define the document corpora. This list includes the following parameters:
+
+- `source-file`: The name of the file that contains the workload's corresponding documents. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a `base_url`, use a compressed file format: `.zip`, `.bz2`, `.gz`, `.tar`, `.tar.gz`, `.tgz`, or `.tar.bz2`. The compressed file must contain exactly one JSON file with the same name.
+- `document-count`: The number of documents in the `source-file`, which determines which client indexes which part of the document corpus. Each of the N clients receives one Nth of the document corpus. When using a source that contains documents with a parent-child relationship, specify the number of parent documents.
+- `uncompressed-bytes`: The size, in bytes, of the source file after decompression, indicating how much disk space the decompressed source file needs.
+- `compressed-bytes`: The size, in bytes, of the source file before decompression. This can help you assess the amount of time needed for the cluster to ingest documents.
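The `document-count` and `uncompressed-bytes` values can be read from the source file with standard tools. The sketch below assumes the documents file holds one JSON document per line and that every line ends with a newline. The helper name `corpus_stats` is hypothetical and not part of OpenSearch Benchmark.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the two corpus values for a decompressed
# documents file, formatted so they can be pasted into workload.json.
corpus_stats() {
  local file="$1" docs bytes
  # wc -l counts newline characters, so every document, including the last,
  # must end with a newline. The $(( )) wrapper strips the padding that some
  # wc implementations add to their output.
  docs=$(( $(wc -l < "$file") ))
  bytes=$(( $(wc -c < "$file") ))
  printf '"document-count": %d,\n"uncompressed-bytes": %d\n' "$docs" "$bytes"
}
```

For example, `corpus_stats movies-documents.json` prints both entries for the corpus used in the example workload. Running `wc -c` on the compressed archive gives the matching `compressed-bytes` value.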
+
+### Operations
+
+The `operations` element lists the OpenSearch API operations performed by the workload. For example, you can set an operation to `create-index`, which creates an index in the test cluster to which OpenSearch Benchmark can write documents. Operations are usually listed inside of `schedule`.
+
+### Schedule
+
+The `schedule` element contains a list of actions and operations that are run by the workload. Operations run according to the order in which they appear in the `schedule`. The following example illustrates a `schedule` with multiple operations, each defined by its `operation-type`: 
+
+```json
+  "schedule": [
+    {
+      "operation": {
+        "operation-type": "create-index"
+      }
+    },
+    {
+      "operation": {
+        "operation-type": "cluster-health",
+        "request-params": {
+          "wait_for_status": "green"
+        },
+        "retry-until-success": true
+      }
+    },
+    {
+      "operation": {
+        "operation-type": "bulk",
+        "bulk-size": 5000
+      },
+      "warmup-time-period": 120,
+      "clients": 8
+    },
+    {
+      "operation": {
+        "name": "query-match-all",
+        "operation-type": "search",
+        "body": {
+          "query": {
+            "match_all": {}
+          }
+        }
+      },
+      "iterations": 1000,
+      "target-throughput": 100
+    }
+  ]
+```
+
+According to this schedule, the actions will run in the following order:
+
+1. The `create-index` operation creates an index. The index remains empty until the `bulk` operation adds documents with benchmarked data.
+2. The `cluster-health` operation assesses the health of the cluster before running the workload. In this example, the workload waits until the status of the cluster's health is `green`.
+3. The `bulk` operation runs the `bulk` API to index documents in batches of `5000`.
+4. Before benchmarking, the workload waits until the specified `warmup-time-period` passes. In this example, the warmup period is `120` seconds.
+5. The `clients` field defines the number of clients that will run the remaining actions in the schedule concurrently.
+6. The `search` operation runs a `match_all` query to match all documents after they have been indexed by the `bulk` API using the 8 clients specified.
+   - The `iterations` field indicates the number of times each client runs the `search` operation. The report generated by the benchmark automatically adjusts the percentile numbers based on this number. To generate a precise percentile, the benchmark needs to run at least 1,000 iterations.
+   - Lastly, the `target-throughput` field defines the number of requests per second performed across all clients, which, when set, can help reduce the latency of the benchmark. For example, a `target-throughput` of 100 requests divided by 8 clients means that each client will issue 12.5 requests per second.
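The division in the `target-throughput` example can be checked from the command line. This sketch assumes, as the example does, that the target is shared evenly among the clients. `per_client_rate` is a hypothetical helper name, not an OpenSearch Benchmark command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: requests per second issued by each client when a
# target throughput is divided evenly among the clients.
per_client_rate() {
  # awk keeps the fractional part; shell integer arithmetic would truncate
  # 100 / 8 to 12.
  awk -v total="$1" -v clients="$2" 'BEGIN { print total / clients }'
}

per_client_rate 100 8   # prints 12.5
```

The exact quotient for 100 requests and 8 clients is 12.5, so integer division understates the per-client rate.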
diff --git a/_benchmark/configuring-benchmark.md → ...hmark/user-guide/configuring-benchmark.md b/_benchmark/configuring-benchmark.md → ...hmark/user-guide/configuring-benchmark.md
@@ -2,7 +2,8 @@
 layout: default
 title: Configuring OpenSearch Benchmark
 nav_order: 7
-has_children: false
+parent: User guide
+redirect_from: /benchmark/configuring-benchmark/
 ---
 
 # Configuring OpenSearch Benchmark

diff --git a/_benchmark/creating-custom-workloads.md → ...k/user-guide/creating-custom-workloads.md b/_benchmark/creating-custom-workloads.md → ...k/user-guide/creating-custom-workloads.md
@@ -2,7 +2,8 @@
 layout: default
 title: Creating custom workloads
 nav_order: 10
-has_children: false
+parent: User guide
+redirect_from: /benchmark/creating-custom-workloads/
 ---
 
 # Creating custom workloads

diff --git a/_benchmark/user-guide/index.md b/_benchmark/user-guide/index.md
@@ -0,0 +1,10 @@
+---
+layout: default
+title: User guide
+nav_order: 5
+has_children: true
+---
+
+# OpenSearch Benchmark User Guide
+
+The OpenSearch Benchmark User Guide includes core [concepts]({{site.url}}{{site.baseurl}}/benchmark/user-guide/concepts/), [installation]({{site.url}}{{site.baseurl}}/benchmark/installing-benchmark/) instructions, and [configuration options]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/) to help you get the most out of OpenSearch Benchmark.
diff --git a/_benchmark/installing-benchmark.md → ...chmark/user-guide/installing-benchmark.md b/_benchmark/installing-benchmark.md → ...chmark/user-guide/installing-benchmark.md
@@ -2,7 +2,8 @@
 layout: default
 title: Installing OpenSearch Benchmark
 nav_order: 5
-has_children: false
+parent: User guide
+redirect_from: /benchmark/installing-benchmark/
 ---
 
 # Installing OpenSearch Benchmark
@@ -150,6 +151,59 @@ run -v $HOME/benchmarks:/opensearch-benchmark/.benchmark opensearchproject/opens
 
 See [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/) to learn more about the files and subdirectories located in `/opensearch-benchmark/.benchmark`.
 
+## Provisioning an OpenSearch cluster with a test
+
+OpenSearch Benchmark is compatible with JDK versions 17, 16, 15, 14, 13, 12, 11, and 8.
+{: .note}
+
+If you installed OpenSearch Benchmark with PyPI, you can also provision a new OpenSearch cluster by specifying a `distribution-version` in the `execute-test` command.
+
+If you plan on having Benchmark provision a cluster, you'll need to inform Benchmark of the location of the `JAVA_HOME` path for the Benchmark cluster. To set the `JAVA_HOME` path and provision a cluster:
+
+1. Find the `JAVA_HOME` path you're currently using. On macOS, open a terminal and enter `/usr/libexec/java_home`.
+
+2. Set the environment variable that corresponds to your JDK version to the path from the previous step. For example, for JDK 17, enter `export JAVA17_HOME=<Java Path>`.
+
+3. Run the `execute-test` command and indicate the distribution version of OpenSearch you want to use: 
+
+   ```bash
+   opensearch-benchmark execute-test --distribution-version=2.3.0 --workload=geonames --test-mode
+   ```
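The variable name in step 2 follows the pattern `JAVA<version>_HOME`. As a small sketch, the hypothetical helper below builds that name for the JDK versions listed in the note above. It assumes the pattern shown for JDK 17 also applies to the other versions, which these steps do not confirm.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the environment variable name for a JDK major
# version, or fail for a version outside the compatibility note.
java_home_var() {
  case "$1" in
    8|11|12|13|14|15|16|17)
      printf 'JAVA%s_HOME\n' "$1"
      ;;
    *)
      echo "unsupported JDK version: $1" >&2
      return 1
      ;;
  esac
}
```

On macOS, `export "$(java_home_var 17)=$(/usr/libexec/java_home -v 17)"` would then set `JAVA17_HOME` in one step, provided a JDK 17 installation is present.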
+
+## Directory structure
+
+After running OpenSearch Benchmark for the first time, you can search through all related files, including configuration files, in the `~/.benchmark` directory. The directory includes the following file tree:
+
+```
+# ~/.benchmark Tree
+.
+├── benchmark.ini
+├── benchmarks
+│   ├── data
+│   │   └── geonames
+│   ├── distributions
+│   │   ├── opensearch-1.0.0-linux-x64.tar.gz
+│   │   └── opensearch-2.3.0-linux-x64.tar.gz
+│   ├── test_executions
+│   │   ├── 0279b13b-1e54-49c7-b1a7-cde0b303a797
+│   │   └── 0279c542-a856-4e88-9cc8-04306378cd38
+│   └── workloads
+│       └── default
+│           └── geonames
+├── logging.json
+└── logs
+    └── benchmark.log
+```
+
+* `benchmark.ini`: Contains any adjustable configurations for tests. For information about how to configure OpenSearch Benchmark, see [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/).
+* `data`: Contains all the data corpora and documents related to OpenSearch Benchmark's [official workloads](https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/geonames).
+* `distributions`: Contains all the OpenSearch distributions downloaded from [OpenSearch.org](http://opensearch.org/) and used to provision clusters.
+* `test_executions`: Contains all the test `execution_id`s from previous runs of OpenSearch Benchmark.
+* `workloads`: Contains all files related to workloads, except for the data corpora.
+* `logging.json`: Contains all of the configuration options related to how logging is performed within OpenSearch Benchmark.
+* `logs`: Contains all the logs from OpenSearch Benchmark runs. This can be helpful when you've encountered errors during runs.
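To see which test executions are stored on disk, you can list the `test_executions` directory shown in the tree above. This is a minimal sketch that only reads the directory. The function name `list_test_executions` is hypothetical, and the default path assumes the `~/.benchmark` layout described in this section.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list stored test execution IDs, one per line.
# An optional argument overrides the default directory.
list_test_executions() {
  local dir="${1:-$HOME/.benchmark/benchmarks/test_executions}"
  # Before the first run the directory does not exist; print nothing.
  [ -d "$dir" ] || return 0
  ls -1 "$dir"
}
```

Each printed name corresponds to one subdirectory of `test_executions`, such as `0279b13b-1e54-49c7-b1a7-cde0b303a797` in the tree above.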
+
+
 ## Next steps
 
 - [Configuring OpenSearch Benchmark]({{site.url}}{{site.baseurl}}/benchmark/configuring-benchmark/)