diff --git a/_benchmark/glossary.md b/_benchmark/glossary.md
new file mode 100644
index 0000000000..f86591d3d9
--- /dev/null
+++ b/_benchmark/glossary.md
@@ -0,0 +1,21 @@
+---
+layout: default
+title: Glossary
+nav_order: 10
+---
+
+# OpenSearch Benchmark glossary
+
+The following terms are commonly used in OpenSearch Benchmark:
+
+- **Corpora**: A collection of documents.
+- **Latency**: If `target-throughput` is disabled (has no value or a value of `0`), then latency is equal to service time. If `target-throughput` is enabled (has a value of `1` or greater), then latency is equal to the service time plus the amount of time the request waits in the queue before being sent. For example, if a request waits 20 ms in the queue and its service time is 100 ms, then the reported latency is 120 ms.
+- **Metric keys**: The metrics stored by OpenSearch Benchmark, based on the configuration in the [metrics record]({{site.url}}{{site.baseurl}}/benchmark/metrics/metric-records/).
+- **Operations**: A list of the API operations performed by a workload.
+- **Pipeline**: A series of steps, occurring both before and after running a workload, that determines benchmark results.
+- **Schedule**: A list of two or more operations performed in the order in which they appear when a workload is run.
+- **Service time**: The amount of time taken for `opensearch-py`, the primary client for OpenSearch Benchmark, to send a request and receive a response from the OpenSearch cluster. It includes the amount of time taken for the server to process a request as well as network latency, load balancer overhead, and serialization/deserialization.
+- **Summary report**: A report generated at the end of a test based on the metric keys defined in the workload.
+- **Test**: A single invocation of the OpenSearch Benchmark binary.
+- **Throughput**: The number of operations completed in a given period of time.
+- **Workload**: A collection of one or more benchmarking tests that use a specific document corpus to perform a benchmark against a cluster. The document corpus contains any indexes, data files, or operations invoked when the workload runs.
\ No newline at end of file diff --git a/_clients/index.md b/_clients/index.md index fc8c23d912..a15f0539d2 100644 --- a/_clients/index.md +++ b/_clients/index.md @@ -53,8 +53,8 @@ To view the compatibility matrix for a specific client, see the `COMPATIBILITY.m Client | Recommended version :--- | :--- -[Elasticsearch Java low-level REST client](https://search.maven.org/artifact/org.elasticsearch.client/elasticsearch-rest-client/7.13.4/jar) | 7.13.4 -[Elasticsearch Java high-level REST client](https://search.maven.org/artifact/org.elasticsearch.client/elasticsearch-rest-high-level-client/7.13.4/jar) | 7.13.4 +[Elasticsearch Java low-level REST client](https://central.sonatype.com/artifact/org.elasticsearch.client/elasticsearch-rest-client/7.13.4) | 7.13.4 +[Elasticsearch Java high-level REST client](https://central.sonatype.com/artifact/org.elasticsearch.client/elasticsearch-rest-high-level-client/7.13.4) | 7.13.4 [Elasticsearch Python client](https://pypi.org/project/elasticsearch/7.13.4/) | 7.13.4 [Elasticsearch Node.js client](https://www.npmjs.com/package/@elastic/elasticsearch/v/7.13.0) | 7.13.0 [Elasticsearch Ruby client](https://rubygems.org/gems/elasticsearch/versions/7.13.0) | 7.13.0 diff --git a/_field-types/supported-field-types/knn-vector.md b/_field-types/supported-field-types/knn-vector.md index 4c00b94de8..da784aeefe 100644 --- a/_field-types/supported-field-types/knn-vector.md +++ b/_field-types/supported-field-types/knn-vector.md @@ -175,7 +175,7 @@ By default, k-NN vectors are `float` vectors, in which each dimension is 4 bytes Byte vectors are supported only for the `lucene` and `faiss` engines. They are not supported for the `nmslib` engine. {: .note} -In [k-NN benchmarking tests](https://github.com/opensearch-project/k-NN/tree/main/benchmarks/perf-tool), the use of `byte` rather than `float` vectors resulted in a significant reduction in storage and memory usage as well as improved indexing throughput and reduced query latency. Additionally, precision on recall was not greatly affected (note that recall can depend on various factors, such as the [quantization technique](#quantization-techniques) and data distribution). +In [k-NN benchmarking tests](https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/vectorsearch), the use of `byte` rather than `float` vectors resulted in a significant reduction in storage and memory usage as well as improved indexing throughput and reduced query latency. Additionally, precision on recall was not greatly affected (note that recall can depend on various factors, such as the [quantization technique](#quantization-techniques) and data distribution). When using `byte` vectors, expect some loss of precision in the recall compared to using `float` vectors. Byte vectors are useful in large-scale applications and use cases that prioritize a reduced memory footprint in exchange for a minimal loss of recall. {: .important} @@ -411,7 +411,7 @@ As an example, assume that you have 1 million vectors with a dimension of 256 an ### Quantization techniques -If your vectors are of the type `float`, you need to first convert them to the `byte` type before ingesting the documents. This conversion is accomplished by _quantizing the dataset_---reducing the precision of its vectors. There are many quantization techniques, such as scalar quantization or product quantization (PQ), which is used in the Faiss engine. 
The choice of quantization technique depends on the type of data you're using and can affect the accuracy of recall values. The following sections describe the scalar quantization algorithms that were used to quantize the [k-NN benchmarking test](https://github.com/opensearch-project/k-NN/tree/main/benchmarks/perf-tool) data for the [L2](#scalar-quantization-for-the-l2-space-type) and [cosine similarity](#scalar-quantization-for-the-cosine-similarity-space-type) space types. The provided pseudocode is for illustration purposes only. +If your vectors are of the type `float`, you need to first convert them to the `byte` type before ingesting the documents. This conversion is accomplished by _quantizing the dataset_---reducing the precision of its vectors. There are many quantization techniques, such as scalar quantization or product quantization (PQ), which is used in the Faiss engine. The choice of quantization technique depends on the type of data you're using and can affect the accuracy of recall values. The following sections describe the scalar quantization algorithms that were used to quantize the [k-NN benchmarking test](https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/vectorsearch) data for the [L2](#scalar-quantization-for-the-l2-space-type) and [cosine similarity](#scalar-quantization-for-the-cosine-similarity-space-type) space types. The provided pseudocode is for illustration purposes only. #### Scalar quantization for the L2 space type diff --git a/_install-and-configure/plugins.md b/_install-and-configure/plugins.md index 3a5d6a1834..e96b29e822 100644 --- a/_install-and-configure/plugins.md +++ b/_install-and-configure/plugins.md @@ -181,7 +181,7 @@ Continue with installation? [y/N]y ### Install a plugin using Maven coordinates -The `opensearch-plugin install` tool also allows you to specify Maven coordinates for available artifacts and versions hosted on [Maven Central](https://search.maven.org/search?q=org.opensearch.plugin). The tool parses the Maven coordinates you provide and constructs a URL. As a result, the host must be able to connect directly to the Maven Central site. The plugin installation fails if you pass coordinates to a proxy or local repository. +The `opensearch-plugin install` tool also allows you to specify Maven coordinates for available artifacts and versions hosted on [Maven Central](https://central.sonatype.com/namespace/org.opensearch.plugin). The tool parses the Maven coordinates you provide and constructs a URL. As a result, the host must be able to connect directly to the Maven Central site. The plugin installation fails if you pass coordinates to a proxy or local repository. #### Usage ```bash diff --git a/_monitoring-your-cluster/pa/index.md b/_monitoring-your-cluster/pa/index.md index bb4f9c6c30..156e985e8b 100644 --- a/_monitoring-your-cluster/pa/index.md +++ b/_monitoring-your-cluster/pa/index.md @@ -60,7 +60,7 @@ private-key-file-path = specify_path The Performance Analyzer plugin is included in the installations for [Docker]({{site.url}}{{site.baseurl}}/opensearch/install/docker/) and [tarball]({{site.url}}{{site.baseurl}}/opensearch/install/tar/), but you can also install the plugin manually. -To install the Performance Analyzer plugin manually, download the plugin from [Maven](https://search.maven.org/search?q=org.opensearch.plugin) and install it using the standard [plugin installation]({{site.url}}{{site.baseurl}}/opensearch/install/plugins/) process. Performance Analyzer runs on each node in a cluster. 
+To install the Performance Analyzer plugin manually, download the plugin from [Maven](https://central.sonatype.com/namespace/org.opensearch.plugin) and install it using the standard [plugin installation]({{site.url}}{{site.baseurl}}/opensearch/install/plugins/) process. Performance Analyzer runs on each node in a cluster. To start the Performance Analyzer root cause analysis (RCA) agent on a tarball installation, run the following command: diff --git a/_search-plugins/knn/disk-based-vector-search.md b/_search-plugins/knn/disk-based-vector-search.md new file mode 100644 index 0000000000..82da30a0ac --- /dev/null +++ b/_search-plugins/knn/disk-based-vector-search.md @@ -0,0 +1,193 @@ +--- +layout: default +title: Disk-based vector search +nav_order: 16 +parent: k-NN search +has_children: false +--- + +# Disk-based vector search +**Introduced 2.17** +{: .label .label-purple} + +For low-memory environments, OpenSearch provides _disk-based vector search_, which significantly reduces the operational costs for vector workloads. Disk-based vector search uses [binary quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#binary-quantization), compressing vectors and thereby reducing the memory requirements. This memory optimization provides large memory savings at the cost of slightly increased search latency while still maintaining strong recall. + +To use disk-based vector search, set the [`mode`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#vector-workload-modes) parameter to `on_disk` for your vector field type. This parameter will configure your index to use secondary storage. + +## Creating an index for disk-based vector search + +To create an index for disk-based vector search, send the following request: + +```json +PUT my-vector-index +{ + "mappings": { + "properties": { + "my_vector_field": { + "type": "knn_vector", + "dimension": 8, + "space_type": "innerproduct", + "data_type": "float", + "mode": "on_disk" + } + } + } +} +``` +{% include copy-curl.html %} + +By default, the `on_disk` mode configures the index to use the `faiss` engine and `hnsw` method. The default [`compression_level`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#compression-levels) of `32x` reduces the amount of memory the vectors require by a factor of 32. To preserve the search recall, rescoring is enabled by default. A search on a disk-optimized index runs in two phases: The compressed index is searched first, and then the results are rescored using full-precision vectors loaded from disk. + +To reduce the compression level, provide the `compression_level` parameter when creating the index mapping: + +```json +PUT my-vector-index +{ + "mappings": { + "properties": { + "my_vector_field": { + "type": "knn_vector", + "dimension": 8, + "space_type": "innerproduct", + "data_type": "float", + "mode": "on_disk", + "compression_level": "16x" + } + } + } +} +``` +{% include copy-curl.html %} + +For more information about the `compression_level` parameter, see [Compression levels]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#compression-levels). Note that for `4x` compression, the `lucene` engine will be used. +{: .note} + +If you need more granular fine-tuning, you can override additional k-NN parameters in the method definition. 
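Parameters that you don't explicitly override keep the defaults set by the `on_disk` mode.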
For example, to improve recall, increase the `ef_construction` parameter value: + +```json +PUT my-vector-index +{ + "mappings": { + "properties": { + "my_vector_field": { + "type": "knn_vector", + "dimension": 8, + "space_type": "innerproduct", + "data_type": "float", + "mode": "on_disk", + "method": { + "params": { + "ef_construction": 512 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +The `on_disk` mode only works with the `float` data type. +{: .note} + +## Ingestion + +You can perform document ingestion for a disk-optimized vector index in the same way as for a regular vector index. To index several documents in bulk, send the following request: + +```json +POST _bulk +{ "index": { "_index": "my-vector-index", "_id": "1" } } +{ "my_vector_field": [1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5], "price": 12.2 } +{ "index": { "_index": "my-vector-index", "_id": "2" } } +{ "my_vector_field": [2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5], "price": 7.1 } +{ "index": { "_index": "my-vector-index", "_id": "3" } } +{ "my_vector_field": [3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5], "price": 12.9 } +{ "index": { "_index": "my-vector-index", "_id": "4" } } +{ "my_vector_field": [4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5], "price": 1.2 } +{ "index": { "_index": "my-vector-index", "_id": "5" } } +{ "my_vector_field": [5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5], "price": 3.7 } +{ "index": { "_index": "my-vector-index", "_id": "6" } } +{ "my_vector_field": [6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 6.5], "price": 10.3 } +{ "index": { "_index": "my-vector-index", "_id": "7" } } +{ "my_vector_field": [7.5, 7.5, 7.5, 7.5, 7.5, 7.5, 7.5, 7.5], "price": 5.5 } +{ "index": { "_index": "my-vector-index", "_id": "8" } } +{ "my_vector_field": [8.5, 8.5, 8.5, 8.5, 8.5, 8.5, 8.5, 8.5], "price": 4.4 } +{ "index": { "_index": "my-vector-index", "_id": "9" } } +{ "my_vector_field": [9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5], "price": 8.9 } +``` +{% include copy-curl.html %} + +## Search + +Search is also performed in the same way as in other index configurations. The key difference is that, by default, the `oversample_factor` of the rescore parameter is set to `3.0` (unless you override the `compression_level`). For more information, see [Rescoring quantized results using full precision]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#rescoring-quantized-results-using-full-precision). To perform vector search on a disk-optimized index, provide the search vector: + +```json +GET my-vector-index/_search +{ + "query": { + "knn": { + "my_vector_field": { + "vector": [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5], + "k": 5 + } + } + } +} +``` +{% include copy-curl.html %} + +Similarly to other index configurations, you can override k-NN parameters in the search request: + +```json +GET my-vector-index/_search +{ + "query": { + "knn": { + "my_vector_field": { + "vector": [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5], + "k": 5, + "method_parameters": { + "ef_search": 512 + }, + "rescore": { + "oversample_factor": 10.0 + } + } + } + } +} +``` +{% include copy-curl.html %} + +[Radial search]({{site.url}}{{site.baseurl}}/search-plugins/knn/radial-search-knn/) does not support disk-based vector search. +{: .note} + +## Model-based indexes + +For [model-based indexes]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model), you can specify the `on_disk` parameter in the training request in the same way that you would specify it during index creation. 
By default, `on_disk` mode will use the [Faiss IVF method]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#supported-faiss-methods) and a compression level of `32x`. To run the training API, send the following request: + +```json +POST /_plugins/_knn/models/_train/test-model +{ + "training_index": "train-index-name", + "training_field": "train-field-name", + "dimension": 8, + "max_training_vector_count": 1200, + "search_size": 100, + "description": "My model", + "space_type": "innerproduct", + "mode": "on_disk" +} +``` +{% include copy-curl.html %} + +This command assumes that training data has been ingested into the `train-index-name` index. For more information, see [Building a k-NN index from a model]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model). +{: .note} + +You can override the `compression_level` for disk-optimized indexes in the same way as for regular k-NN indexes. + + +## Next steps + +- For more information about binary quantization, see [Binary quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#binary-quantization). +- For more information about k-NN vector workload modes, see [Vector workload modes]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#vector-workload-modes). \ No newline at end of file diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md index 508f9e6535..a911dc91c9 100644 --- a/_search-plugins/knn/knn-vector-quantization.md +++ b/_search-plugins/knn/knn-vector-quantization.md @@ -436,7 +436,7 @@ GET my-vector-index/_search "my_vector_field": { "vector": [1.5, 5.5, 1.5, 5.5, 1.5, 5.5, 1.5, 5.5], "k": 10, - "method_params": { + "method_parameters": { "ef_search": 10 }, "rescore": { diff --git a/_security/configuration/index.md b/_security/configuration/index.md index e351e8865f..f68667d92d 100644 --- a/_security/configuration/index.md +++ b/_security/configuration/index.md @@ -3,7 +3,7 @@ layout: default title: Configuration nav_order: 2 has_children: true -has_toc: false +has_toc: true redirect_from: - /security-plugin/configuration/ - /security-plugin/configuration/index/ @@ -11,21 +11,105 @@ redirect_from: # Security configuration -The plugin includes demo certificates so that you can get up and running quickly. To use OpenSearch in a production environment, you must configure it manually: +The Security plugin includes demo certificates so that you can get up and running quickly. To use OpenSearch with the Security plugin in a production environment, you must make changes to the demo certificates and other configuration options manually. -1. [Replace the demo certificates]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/docker/#configuring-basic-security-settings). -1. [Reconfigure `opensearch.yml` to use your certificates]({{site.url}}{{site.baseurl}}/security/configuration/tls). -1. [Reconfigure `config.yml` to use your authentication backend]({{site.url}}{{site.baseurl}}/security/configuration/configuration/) (if you don't plan to use the internal user database). -1. [Modify the configuration YAML files]({{site.url}}{{site.baseurl}}/security/configuration/yaml). -1. If you plan to use the internal user database, [set a password policy in `opensearch.yml`]({{site.url}}{{site.baseurl}}/security/configuration/yaml/#opensearchyml). -1. [Apply changes using the `securityadmin` script]({{site.url}}{{site.baseurl}}/security/configuration/security-admin). -1. 
Start OpenSearch.
-1. [Add users, roles, role mappings, and tenants]({{site.url}}{{site.baseurl}}/security/access-control/index/).
+## Replace the demo certificates
-If you don't want to use the plugin, see [Disable security]({{site.url}}{{site.baseurl}}/security/configuration/disable-enable-security/).
+OpenSearch ships with demo certificates intended for quick setup and demonstration purposes. For a production environment, it's critical to replace them with your own trusted certificates to ensure secure communication. Use the following steps:
-The Security plugin has several default users, roles, action groups, permissions, and settings for OpenSearch Dashboards that use kibana in their names. We will change these names in a future release.
+1. **Generate your own certificates:** Use a tool like OpenSSL or a certificate authority (CA) to generate your own certificates. For more information about generating certificates with OpenSSL, see [Generating self-signed certificates]({{site.url}}{{site.baseurl}}/security/configuration/generate-certificates/).
+2. **Store the generated certificates and private key in the appropriate directory:** Generated certificates are typically stored in `/config/`. For more information, see [Add certificate files to opensearch.yml]({{site.url}}{{site.baseurl}}/security/configuration/generate-certificates/#add-certificate-files-to-opensearchyml).
+3. **Set the following file permissions:**
+ - Private key (`.key` files): Set the file mode to `600`. This restricts access so that only the file owner (the OpenSearch user) can read and write to the file, ensuring that the private key remains secure and inaccessible to unauthorized users.
+ - Public certificates (`.crt`, `.pem` files): Set the file mode to `644`. This allows the file owner to read and write to the file, while other users can only read it.
+
+For additional guidance on file modes, see the following table.
+
+ | Item | Sample | Numeric | Bitwise |
+ |-------------|---------------------|---------|--------------|
+ | Public key | `~/.ssh/id_rsa.pub` | `644` | `-rw-r--r--` |
+ | Private key | `~/.ssh/id_rsa` | `600` | `-rw-------` |
+ | SSH folder | `~/.ssh` | `700` | `drwx------` |
+
+For more information, see [Configuring basic security settings]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/docker/#configuring-basic-security-settings).
+
+## Reconfigure `opensearch.yml` to use your certificates
+
+The `opensearch.yml` file is the main configuration file for OpenSearch; you can find it at `/config/opensearch.yml`. Update this file to point to your custom certificates:
+
+In `opensearch.yml`, set the correct paths for your certificates and keys, as shown in the following example:
+ ```
+ plugins.security.ssl.transport.pemcert_filepath: /path/to/your/cert.pem
+ plugins.security.ssl.transport.pemkey_filepath: /path/to/your/key.pem
+ plugins.security.ssl.transport.pemtrustedcas_filepath: /path/to/your/ca.pem
+ plugins.security.ssl.http.enabled: true
+ plugins.security.ssl.http.pemcert_filepath: /path/to/your/cert.pem
+ plugins.security.ssl.http.pemkey_filepath: /path/to/your/key.pem
+ plugins.security.ssl.http.pemtrustedcas_filepath: /path/to/your/ca.pem
+ ```
+For more information, see [Configuring TLS certificates]({{site.url}}{{site.baseurl}}/security/configuration/tls/).
+
+## Reconfigure `config.yml` to use your authentication backend
+
+The `config.yml` file allows you to configure the authentication and authorization mechanisms for OpenSearch.
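OpenSearch supports several authentication backends, including the internal user database, LDAP, Active Directory, SAML, and OpenID Connect.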
Update the authentication backend settings in `/config/opensearch-security/config.yml` according to your requirements.
+
+For example, to use basic authentication with the internal user database, add the following settings:
+
+ ```
+ authc:
+   basic_internal_auth:
+     http_enabled: true
+     transport_enabled: true
+     order: 1
+     http_authenticator:
+       type: basic
+       challenge: true
+     authentication_backend:
+       type: internal
+ ```
+For more information, see [Configuring the Security backend]({{site.url}}{{site.baseurl}}/security/configuration/configuration/).
+
+## Modify the configuration YAML files
+
+Determine whether any additional YAML files need modification, for example, the `roles.yml`, `roles_mapping.yml`, or `internal_users.yml` files. Update the files with any additional configuration information. For more information, see [Modifying the YAML files]({{site.url}}{{site.baseurl}}/security/configuration/yaml/).
+
+## Set a password policy
+
+When using the internal user database, we recommend enforcing a password policy to ensure that strong passwords are used. For information about strong password policies, see [Password settings]({{site.url}}{{site.baseurl}}/security/configuration/yaml/#password-settings).
+
+## Apply changes using the `securityadmin` script
+
+The following steps do not apply to first-time users because the security index is automatically initialized from the YAML configuration files when OpenSearch starts.
+{: .note}
+
+After initial setup, if you make changes to your security configuration or disable automatic initialization by setting `plugins.security.allow_default_init_securityindex` to `false` (which prevents security index initialization from the YAML files), you need to manually apply changes using the `securityadmin` script:
+
+1. Find the `securityadmin` script. The script is typically stored in the OpenSearch plugins directory, `plugins/opensearch-security/tools/securityadmin.[sh|bat]`.
+ - Note: If you're using OpenSearch 1.x, the `securityadmin` script is located in the `plugins/opendistro_security/tools/` directory.
+ - For more information, see [Basic usage](https://opensearch.org/docs/latest/security/configuration/security-admin/#basic-usage).
+2. Run the script by using the following command:
+ ```
+ ./plugins/opensearch-security/tools/securityadmin.[sh|bat]
+ ```
+3. Check the OpenSearch logs and configuration to ensure that the changes have been successfully applied.
+
+For more information about using the `securityadmin` script, see [Applying changes to configuration files]({{site.url}}{{site.baseurl}}/security/configuration/security-admin/).
+
+## Add users, roles, role mappings, and tenants
+
+After applying your security configuration, you can add users, roles, role mappings, and tenants. For more information, see [Access control]({{site.url}}{{site.baseurl}}/security/access-control/index/).
+
+## Disable the Security plugin
+
+If you don't want to use the Security plugin, you can disable it by adding the following setting to the `opensearch.yml` file:
+
+```
+plugins.security.disabled: true
+```
+
+You can then re-enable the plugin by removing the `plugins.security.disabled` setting.
+
+For more information about disabling the Security plugin, see [Disable security]({{site.url}}{{site.baseurl}}/security/configuration/disable-enable-security/).
+
+The Security plugin has several default users, roles, action groups, permissions, and settings for OpenSearch Dashboards that contain "Kibana" in their names. We will change these names in a future version.
+{: .note }
-For a full list of `opensearch.yml` Security plugin settings, Security plugin settings, see [Security settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/security-settings/).
+For a full list of `opensearch.yml` Security plugin settings, see [Security settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/security-settings/).
 {: .note}
+
diff --git a/_security/configuration/yaml.md b/_security/configuration/yaml.md
index 1686c8332e..2694e3a24f 100644
--- a/_security/configuration/yaml.md
+++ b/_security/configuration/yaml.md
@@ -265,7 +265,7 @@ kibana_server:
 
 ## roles.yml
 
-This file contains any initial roles that you want to add to the Security plugin. Aside from some metadata, the default file is empty, because the Security plugin has a number of static roles that it adds automatically.
+This file contains any initial roles that you want to add to the Security plugin. By default, this file contains predefined roles that grant access to the plugins included in the default distribution of OpenSearch. The Security plugin will also add a number of static roles automatically.
 
 ```yml
 ---
diff --git a/_tuning-your-cluster/index.md b/_tuning-your-cluster/index.md
index 99db78565f..fa0973395f 100644
--- a/_tuning-your-cluster/index.md
+++ b/_tuning-your-cluster/index.md
@@ -192,11 +192,27 @@ To better understand and monitor your cluster, use the [CAT API]({{site.url}}{{s
 
 ## (Advanced) Step 6: Configure shard allocation awareness or forced awareness
 
+To further fine-tune your shard allocation, you can set custom node attributes for shard allocation awareness or forced awareness.
+
 ### Shard allocation awareness
 
-If your nodes are spread across several geographical zones, you can configure shard allocation awareness to allocate all replica shards to a zone that’s different from their primary shard.
+You can set custom node attributes on OpenSearch nodes to be used for shard allocation awareness. For example, you can set the `zone` attribute on each node to represent the zone in which the node is located. You can also use the `zone` attribute to ensure that the primary shard and its replica shards are allocated in a balanced manner across available, distinct zones. In this scenario, the maximum number of shard copies per zone equals `ceil(number_of_shard_copies / number_of_distinct_zones)`.
+
+By default, OpenSearch allocates the copies of a single shard across different nodes. When only one zone is available, such as after a zone failure, OpenSearch allocates replica shards to the only remaining zone---it considers only available zones (attribute values) when calculating the maximum number of allowed shard copies per zone.
+
+For example, if your index has a total of 5 shard copies (1 primary and 4 replicas) and nodes in 3 distinct zones, then OpenSearch will perform the following to allocate all 5 shard copies:
+
+- Allocate no more than 2 shard copies per zone, which requires at least 2 nodes in each of 2 zones.
+- Allocate the last shard copy in the third zone, which requires at least 1 node in the third zone.
-With shard allocation awareness, if the nodes in one of your zones fail, you can be assured that your replica shards are spread across your other zones. It adds a layer of fault tolerance to ensure your data survives a zone failure beyond just individual node failures.
+Alternatively, if you have 3 nodes in the first zone and 1 node in each remaining zone, then OpenSearch will allocate:
+
+- 2 shard copies in the first zone.
+- 1 shard copy in the remaining 2 zones.
+
+The final shard copy will remain unallocated due to the lack of nodes.
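+
+In both of the preceding examples, the per-zone limit follows from the formula: `ceil(5 shard copies / 3 zones) = 2` shard copies per zone.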
+
+With shard allocation awareness, if the nodes in one of your zones fail, you can be assured that your replica shards are spread across your other zones, adding a layer of fault tolerance to ensure that your data survives zone failures.
 
 To configure shard allocation awareness, add zone attributes to `opensearch-d1` and `opensearch-d2`, respectively:
 
@@ -219,6 +235,8 @@ PUT _cluster/settings
 }
 ```
 
+You can also use multiple attributes for shard allocation awareness by providing them as a comma-separated string, for example, `zone,rack`, as shown in the example at the end of this section.
+
 You can either use `persistent` or `transient` settings. We recommend the `persistent` setting because it persists through a cluster reboot. Transient settings don't persist through a cluster reboot.
 
 Shard allocation awareness attempts to separate primary and replica shards across multiple zones. However, if only one zone is available (such as after a zone failure), OpenSearch allocates replica shards to the only remaining zone.
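+
+The following is a minimal sketch of enabling awareness across two attributes. The `rack` attribute name is illustrative; use the attribute names that you set on your nodes:
+
+```json
+PUT _cluster/settings
+{
+  "persistent": {
+    "cluster.routing.allocation.awareness.attributes": "zone,rack"
+  }
+}
+```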