From 709a54f59cc96f615f2d66572222df64e436ffb1 Mon Sep 17 00:00:00 2001
From: Vikasht34
Date: Tue, 17 Sep 2024 08:19:49 -0700
Subject: [PATCH 1/5] Documentation for Binary Quantization Support with KNN Vector Search (#8281)

* Documentation for Binary Quantization Support with KNN Vector Search

Signed-off-by: VIKASH TIWARI

* Apply suggestions from code review

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Apply suggestions from code review

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _search-plugins/knn/knn-vector-quantization.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _search-plugins/knn/knn-vector-quantization.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Apply suggestions from code review

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _search-plugins/knn/knn-vector-quantization.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: VIKASH TIWARI
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower
---
 .../knn/knn-vector-quantization.md | 174 +++++++++++++++++-
 1 file changed, 173 insertions(+), 1 deletion(-)

diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md
index fbdcb4ad2e..508f9e6535 100644
--- a/_search-plugins/knn/knn-vector-quantization.md
+++ b/_search-plugins/knn/knn-vector-quantization.md
@@ -11,7 +11,7 @@ has_math: true

By default, the k-NN plugin supports the indexing and querying of vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors can be expensive because OpenSearch needs to construct, load, save, and search graphs (for native `nmslib` and `faiss` engines). To reduce the memory footprint, you can use vector quantization.

-OpenSearch supports many varieties of quantization. In general, the level of quantization will provide a trade-off between the accuracy of the nearest neighbor search and the size of the memory footprint consumed by the vector search. The supported types include byte vectors, 16-bit scalar quantization, and product quantization (PQ).
+OpenSearch supports many varieties of quantization. In general, the level of quantization provides a trade-off between the accuracy of the nearest neighbor search and the size of the memory footprint consumed by the vector search. The supported types include byte vectors, 16-bit scalar quantization, product quantization (PQ), and binary quantization (BQ).

## Byte vectors

@@ -310,3 +310,175 @@ For example, assume that you have 1 million vectors with a dimension of 256, `iv

```r
1.1*((8 / 8 * 64 + 24) * 1000000 + 100 * (2^8 * 4 * 256 + 4 * 512 * 256)) ~= 0.171 GB
```
+
+## Binary quantization
+
+Starting with version 2.17, OpenSearch supports BQ with binary vector support for the Faiss engine. BQ compresses vectors into a binary format (0s and 1s), making it highly efficient in terms of memory usage. You can choose to represent each vector dimension using 1, 2, or 4 bits, depending on the desired precision. One of the advantages of using BQ is that the training process is handled automatically during indexing. This means that no separate training step is required, unlike other quantization techniques such as PQ.
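To put the compression in concrete terms (an illustrative calculation, not part of the patch itself): a 1,024-dimension `float` vector occupies 1,024 * 4 = 4,096 bytes, while its 1-bit quantized form occupies 1,024 / 8 = 128 bytes, which is the source of the 32x compression figure cited in the compression levels below.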
+
+### Using BQ
+To configure BQ for the Faiss engine, define a `knn_vector` field and specify the `mode` as `on_disk`. This configuration defaults to 1-bit BQ, with both `ef_search` and `ef_construction` set to `100`:
+
```json
PUT my-vector-index
{
  "mappings": {
    "properties": {
      "my_vector_field": {
        "type": "knn_vector",
        "dimension": 8,
        "space_type": "l2",
        "data_type": "float",
        "mode": "on_disk"
      }
    }
  }
}
```
{% include copy-curl.html %}
+
+To further optimize the configuration, you can specify additional parameters, such as the compression level, and fine-tune the search parameters. For example, you can override the `ef_construction` value or define the compression level, which corresponds to the number of bits used for quantization:
+
+- **32x compression** for 1-bit quantization
+- **16x compression** for 2-bit quantization
+- **8x compression** for 4-bit quantization
+
+This allows for greater control over memory usage and recall performance, providing flexibility to balance precision and storage efficiency.
+
+To specify the compression level, set the `compression_level` parameter:
+
```json
PUT my-vector-index
{
  "mappings": {
    "properties": {
      "my_vector_field": {
        "type": "knn_vector",
        "dimension": 8,
        "space_type": "l2",
        "data_type": "float",
        "mode": "on_disk",
        "compression_level": "16x",
        "method": {
          "parameters": {
            "ef_construction": 16
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}
+
+The following example further fine-tunes the configuration by defining `ef_construction`, `encoder`, and the number of `bits` (which can be `1`, `2`, or `4`):
+
```json
PUT my-vector-index
{
  "mappings": {
    "properties": {
      "my_vector_field": {
        "type": "knn_vector",
        "dimension": 8,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": {
            "m": 16,
            "ef_construction": 512,
            "encoder": {
              "name": "binary",
              "parameters": {
                "bits": 1
              }
            }
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}
+
+### Search using binary quantized vectors
+
+You can perform a k-NN search on your index by providing a vector and specifying the number of nearest neighbors (k) to return:
+
```json
GET my-vector-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector_field": {
        "vector": [1.5, 5.5, 1.5, 5.5, 1.5, 5.5, 1.5, 5.5],
        "k": 10
      }
    }
  }
}
```
{% include copy-curl.html %}
+
+You can also fine-tune search by providing the `ef_search` and `oversample_factor` parameters.
+The `oversample_factor` parameter controls the factor by which the search oversamples the candidate vectors before ranking them. Using a higher oversample factor means that more candidates will be considered before ranking, improving accuracy but also increasing search time. When selecting the `oversample_factor` value, consider the trade-off between accuracy and efficiency. For example, setting the `oversample_factor` to `2.0` will double the number of candidates considered during the ranking phase, which may help achieve better results.
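As a worked example (an editorial illustration tied to the request that follows): with `k` set to `10` and an `oversample_factor` of `10.0`, the search first gathers 10 * 10.0 = 100 candidate vectors from the quantized index and then rescores them, using the full-precision vectors in the disk-based design, before returning the top 10 results.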
+
+The following request specifies the `ef_search` and `oversample_factor` parameters:
+
```json
GET my-vector-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector_field": {
        "vector": [1.5, 5.5, 1.5, 5.5, 1.5, 5.5, 1.5, 5.5],
        "k": 10,
        "method_parameters": {
          "ef_search": 10
        },
        "rescore": {
          "oversample_factor": 10.0
        }
      }
    }
  }
}
```
{% include copy-curl.html %}
+
+
+#### HNSW memory estimation
+
+The memory required for the Hierarchical Navigable Small World (HNSW) graph can be estimated as `1.1 * (dimension * bits / 8 + 8 * m)` bytes/vector, where `bits` is the number of bits used for quantization and `m` is the maximum number of bidirectional links created for each element during the construction of the graph.
+
+As an example, assume that you have 1 million vectors with a dimension of 256 and an `m` of 16. The following sections provide memory requirement estimations for various compression values.
+
+##### 1-bit quantization (32x compression)
+
+In 1-bit quantization, each dimension is represented using 1 bit, equivalent to a 32x compression factor. The memory requirement can be estimated as follows:
+
```r
Memory = 1.1 * ((256 * 1 / 8) + 8 * 16) * 1,000,000
       ~= 0.176 GB
```
+
+##### 2-bit quantization (16x compression)
+
+In 2-bit quantization, each dimension is represented using 2 bits, equivalent to a 16x compression factor. The memory requirement can be estimated as follows:
+
```r
Memory = 1.1 * ((256 * 2 / 8) + 8 * 16) * 1,000,000
       ~= 0.211 GB
```
+
+##### 4-bit quantization (8x compression)
+
+In 4-bit quantization, each dimension is represented using 4 bits, equivalent to an 8x compression factor. The memory requirement can be estimated as follows:
+
```r
Memory = 1.1 * ((256 * 4 / 8) + 8 * 16) * 1,000,000
       ~= 0.282 GB
```
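An editorial note on validating these estimates: graph files are loaded into memory lazily on first search, so measurements taken immediately after indexing can be misleading. The k-NN plugin provides a warmup API that loads an index's graphs into memory ahead of time. A minimal sketch, assuming the index name used in the preceding examples:

```json
GET /_plugins/_knn/warmup/my-vector-index
```
{% include copy-curl.html %}

The call returns once the graphs are loaded (or the request times out), after which search latencies and memory statistics reflect a warm cache.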
From db292d93250fb737d0648fa8c80d86e4a44981b1 Mon Sep 17 00:00:00 2001
From: Bhavana Ramaram
Date: Tue, 17 Sep 2024 13:01:44 -0500
Subject: [PATCH 2/5] Get offline batch inference details using task API in m (#8305)

* get offline batch inference details using task API in ml

Signed-off-by: Bhavana Ramaram

* Doc review

Signed-off-by: Fanit Kolchina

* Typo fix

Signed-off-by: Fanit Kolchina

* Apply suggestions from code review

Co-authored-by: Nathan Bower
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _ml-commons-plugin/api/model-apis/batch-predict.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _ml-commons-plugin/api/model-apis/batch-predict.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Add parameter values

Signed-off-by: Fanit Kolchina

* Extra spaces

Signed-off-by: Fanit Kolchina

---------

Signed-off-by: Bhavana Ramaram
Signed-off-by: Fanit Kolchina
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Fanit Kolchina
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower
---
 .../api/model-apis/batch-predict.md | 140 +++++++++++++-----
 1 file changed, 102 insertions(+), 38 deletions(-)

diff --git a/_ml-commons-plugin/api/model-apis/batch-predict.md b/_ml-commons-plugin/api/model-apis/batch-predict.md
index b32fbb108d..c1dc7348fe 100644
--- a/_ml-commons-plugin/api/model-apis/batch-predict.md
+++ b/_ml-commons-plugin/api/model-apis/batch-predict.md
@@ -31,7 +31,13 @@ POST /_plugins/_ml/models/<model_id>/_batch_predict

## Prerequisites

-Before using the Batch Predict API, you need to create a connector to the externally hosted model. For example, to create a connector to an OpenAI `text-embedding-ada-002` model, send the following request:
+Before using the Batch Predict API, you need to create a connector to the externally hosted model. For each action, specify the `action_type` parameter that describes the action:
+
+- `batch_predict`: Runs the batch predict operation.
+- `batch_predict_status`: Checks the batch predict operation status.
+- `cancel_batch_predict`: Cancels the batch predict operation.
+
+For example, to create a connector to an OpenAI `text-embedding-ada-002` model, send the following request. The `cancel_batch_predict` action is optional and enables canceling a batch job running on OpenAI:

```json
POST /_plugins/_ml/connectors/_create
{
  "name": "OpenAI Embedding model",
  "description": "OpenAI embedding model for testing offline batch",
  "version": "1",
  "protocol": "http",
  "parameters": {
    "model": "text-embedding-ada-002",
    "input_docs_processed_step_size": 1000
  },
  "credential": {
    "openAI_key": ""
  },
  "actions": [
    {
      "action_type": "batch_predict",
      "method": "POST",
      "url": "https://api.openai.com/v1/batches",
      "headers": {
        "Authorization": "Bearer ${credential.openAI_key}"
      },
      "request_body": "{ \"input_file_id\": \"${parameters.input_file_id}\", \"endpoint\": \"${parameters.endpoint}\", \"completion_window\": \"24h\" }"
+    },
+    {
+      "action_type": "batch_predict_status",
+      "method": "GET",
+      "url": "https://api.openai.com/v1/batches/${parameters.id}",
+      "headers": {
+        "Authorization": "Bearer ${credential.openAI_key}"
+      }
+    },
+    {
+      "action_type": "cancel_batch_predict",
+      "method": "POST",
+      "url": "https://api.openai.com/v1/batches/${parameters.id}/cancel",
+      "headers": {
+        "Authorization": "Bearer ${credential.openAI_key}"
+      }
    }
  ]
}
```
{% include copy-curl.html %}

@@ -123,45 +145,87 @@ POST /_plugins/_ml/models/lyjxwZABNrAVdFa9zrcZ/_batch_predict

#### Example response

+The response contains the task ID for the batch predict operation:
+
```json
{
-  "inference_results": [
-    {
-      "output": [
-        {
-          "name": "response",
-          "dataAsMap": {
-            "id": "batch_",
-            "object": "batch",
-            "endpoint": "/v1/embeddings",
-            "errors": null,
-            "input_file_id": "file-",
-            "completion_window": "24h",
-            "status": "validating",
-            "output_file_id": null,
-            "error_file_id": null,
-            "created_at": 1722037257,
-            "in_progress_at": null,
-            "expires_at": 1722123657,
-            "finalizing_at": null,
-            "completed_at": null,
-            "failed_at": null,
-            "expired_at": null,
-            "cancelling_at": null,
-            "cancelled_at": null,
-            "request_counts": {
-              "total": 0,
-              "completed": 0,
-              "failed": 0
-            },
-            "metadata": null
-          }
-        }
-      ],
-      "status_code": 200
-    }
-  ]
+  "task_id": "KYZSv5EBqL2d0mFvs80C",
+  "status": "CREATED"
}
```

-For the definition of each field in the result, see [OpenAI Batch API](https://platform.openai.com/docs/guides/batch). Once the batch inference is complete, you can download the output by calling the [OpenAI Files API](https://platform.openai.com/docs/api-reference/files) and providing the file name specified in the `id` field of the response.
\ No newline at end of file
+To check the status of the batch predict job, provide the task ID to the [Tasks API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/tasks-apis/get-task/). You can find the job details in the `remote_job` field in the task. Once the prediction is complete, the task `state` changes to `COMPLETED`.
+
+#### Example request
+
```json
GET /_plugins/_ml/tasks/KYZSv5EBqL2d0mFvs80C
```
{% include copy-curl.html %}
+
+#### Example response
+
+The response contains the batch predict operation details in the `remote_job` field:
+
```json
{
  "model_id": "JYZRv5EBqL2d0mFvKs1E",
  "task_type": "BATCH_PREDICTION",
  "function_name": "REMOTE",
  "state": "RUNNING",
  "input_type": "REMOTE",
  "worker_node": [
    "Ee5OCIq0RAy05hqQsNI1rg"
  ],
  "create_time": 1725491751455,
  "last_update_time": 1725491751455,
  "is_async": false,
  "remote_job": {
    "cancelled_at": null,
    "metadata": null,
    "request_counts": {
      "total": 3,
      "completed": 3,
      "failed": 0
    },
    "input_file_id": "file-XXXXXXXXXXXX",
    "output_file_id": "file-XXXXXXXXXXXXX",
    "error_file_id": null,
    "created_at": 1725491753,
    "in_progress_at": 1725491753,
    "expired_at": null,
    "finalizing_at": 1725491757,
    "completed_at": null,
    "endpoint": "/v1/embeddings",
    "expires_at": 1725578153,
    "cancelling_at": null,
    "completion_window": "24h",
    "id": "batch_XXXXXXXXXXXXXXX",
    "failed_at": null,
    "errors": null,
    "object": "batch",
    "status": "in_progress"
  }
}
```
+
+For the definition of each field in the result, see [OpenAI Batch API](https://platform.openai.com/docs/guides/batch). Once the batch inference is complete, you can download the output by calling the [OpenAI Files API](https://platform.openai.com/docs/api-reference/files) and providing the file name specified in the `id` field of the response.
+
+### Canceling a batch predict job
+
+You can also cancel the batch predict operation running on the remote platform using the task ID returned by the batch predict request. To add this capability, set the `action_type` to `cancel_batch_predict` in the connector configuration when creating the connector.
+
+#### Example request
+
```json
POST /_plugins/_ml/tasks/KYZSv5EBqL2d0mFvs80C/_cancel_batch
```
{% include copy-curl.html %}
+
+#### Example response
+
```json
{
  "status": "OK"
}
```
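One assumption running through this patch is that the connector has already been attached to a model, which is what produces the model ID (`lyjxwZABNrAVdFa9zrcZ`) used in the batch predict calls; the diff leaves that unchanged step out. For completeness, the attachment is typically done with the Register Model API, roughly as in the following sketch (the name, description, and `connector_id` placeholder are illustrative):

```json
POST /_plugins/_ml/models/_register
{
  "name": "OpenAI embedding model",
  "function_name": "remote",
  "description": "OpenAI text-embedding-ada-002 model for offline batch inference",
  "connector_id": "<your connector ID>"
}
```
{% include copy-curl.html %}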
From 22975b9e1a5d72684bf69e76f2896dd9875ce96a Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Tue, 17 Sep 2024 17:04:50 -0400
Subject: [PATCH 3/5] Add 2.17 version (#8308)

Signed-off-by: Fanit Kolchina
---
 _config.yml         | 6 +++---
 _data/versions.json | 7 ++++---
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/_config.yml b/_config.yml
index 8a43e2f61a..4ead6344c2 100644
--- a/_config.yml
+++ b/_config.yml
@@ -5,9 +5,9 @@ baseurl: "/docs/latest" # the subpath of your site, e.g. /blog
url: "https://opensearch.org" # the base hostname & protocol for your site, e.g. http://example.com
permalink: /:path/

-opensearch_version: '2.16.0'
-opensearch_dashboards_version: '2.16.0'
-opensearch_major_minor_version: '2.16'
+opensearch_version: '2.17.0'
+opensearch_dashboards_version: '2.17.0'
+opensearch_major_minor_version: '2.17'
lucene_version: '9_11_1'

# Build settings

diff --git a/_data/versions.json b/_data/versions.json
index 4f7e55c21b..c14e91fa0c 100644
--- a/_data/versions.json
+++ b/_data/versions.json
@@ -1,10 +1,11 @@
{
-  "current": "2.16",
+  "current": "2.17",
  "all": [
-    "2.16",
+    "2.17",
    "1.3"
  ],
  "archived": [
+    "2.16",
    "2.15",
    "2.14",
    "2.13",
    "2.12",
    "2.11",
    "2.10",
    "2.9",
    "2.8",
    "2.7",
    "2.6",
    "2.5",
    "2.4",
    "2.3",
    "2.2",
    "2.1",
    "2.0",
    "1.2",
    "1.1",
    "1.0"
  ],
-  "latest": "2.16"
+  "latest": "2.17"
}

From a1a15c04ea1e453f02d1f4ce1c23e03a9f1bbae7 Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Tue, 17 Sep 2024 17:08:49 -0400
Subject: [PATCH 4/5] Add release notes 2.17 (#8311)

Signed-off-by: Fanit Kolchina
---
 ...arch-documentation-release-notes-2.17.0.md | 36 +++++++++++++++++++
 1 file changed, 36 insertions(+)
 create mode 100644 release-notes/opensearch-documentation-release-notes-2.17.0.md

diff --git a/release-notes/opensearch-documentation-release-notes-2.17.0.md b/release-notes/opensearch-documentation-release-notes-2.17.0.md
new file mode 100644
index 0000000000..d9ed51737c
--- /dev/null
+++ b/release-notes/opensearch-documentation-release-notes-2.17.0.md
@@ -0,0 +1,36 @@
+# OpenSearch Documentation Website 2.17.0 Release Notes
+
+The OpenSearch 2.17.0 documentation includes the following additions and updates.
+
+## New documentation for 2.17.0
+
+- Get offline batch inference details using task API in m [#8305](https://github.com/opensearch-project/documentation-website/pull/8305)
+- Documentation for Binary Quantization Support with KNN Vector Search [#8281](https://github.com/opensearch-project/documentation-website/pull/8281)
+- add offline batch ingestion tech doc [#8251](https://github.com/opensearch-project/documentation-website/pull/8251)
+- Add documentation changes for disk-based k-NN [#8246](https://github.com/opensearch-project/documentation-website/pull/8246)
+- Derived field updates for 2.17 [#8244](https://github.com/opensearch-project/documentation-website/pull/8244)
+- Add changes for multiple signing keys [#8243](https://github.com/opensearch-project/documentation-website/pull/8243)
+- Add documentation changes for Snapshot Status API [#8235](https://github.com/opensearch-project/documentation-website/pull/8235)
+- Update flow framework additional fields in previous_node_inputs [#8233](https://github.com/opensearch-project/documentation-website/pull/8233)
+- Add documentation changes for shallow snapshot v2 [#8207](https://github.com/opensearch-project/documentation-website/pull/8207)
+- Add documentation for context and ABC templates [#8197](https://github.com/opensearch-project/documentation-website/pull/8197)
+- Create documentation for snapshots with hashed prefix path type [#8196](https://github.com/opensearch-project/documentation-website/pull/8196)
+- Adding documentation for remote index use in AD [#8191](https://github.com/opensearch-project/documentation-website/pull/8191)
+- Doc update for concurrent search [#8181](https://github.com/opensearch-project/documentation-website/pull/8181)
+- Adding new cluster search setting docs [#8180](https://github.com/opensearch-project/documentation-website/pull/8180)
+- Add new settings for remote publication [#8176](https://github.com/opensearch-project/documentation-website/pull/8176)
+- Grouping Top N queries documentation [#8173](https://github.com/opensearch-project/documentation-website/pull/8173)
+- Document reprovision param for Update Workflow API [#8172](https://github.com/opensearch-project/documentation-website/pull/8172)
+- Add documentation for Faiss byte vector [#8170](https://github.com/opensearch-project/documentation-website/pull/8170)
+- Terms query can accept encoded terms input as bitmap [#8133](https://github.com/opensearch-project/documentation-website/pull/8133)
+- Update doc for adding new param in cat shards action for cancellation… [#8127](https://github.com/opensearch-project/documentation-website/pull/8127)
+- Add docs on skip_validating_missing_parameters in ml-commons connector [#8118](https://github.com/opensearch-project/documentation-website/pull/8118)
+- Add Split Response Processor to 2.17 Search Pipeline docs [#8081](https://github.com/opensearch-project/documentation-website/pull/8081)
+- Added documentation for FGAC for Flow Framework [#8076](https://github.com/opensearch-project/documentation-website/pull/8076)
+- Remove composite agg limitations for concurrent search [#7904](https://github.com/opensearch-project/documentation-website/pull/7904)
+- Add doc for nodes stats search.request.took fields [#7887](https://github.com/opensearch-project/documentation-website/pull/7887)
+- Add documentation for ignore_hosts config option for ip-based rate limiting [#7859](https://github.com/opensearch-project/documentation-website/pull/7859)
+
+## Documentation for 2.17.0 experimental features
+
+- Document new experimental ingestion streaming APIs [#8123](https://github.com/opensearch-project/documentation-website/pull/8123)

From 842cd9e1fe6d9aa853fa13fc1ed7878d750b1fb5 Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Tue, 17 Sep 2024 17:16:33 -0400
Subject: [PATCH 5/5] Add 2.17 to version history (#8309)

Signed-off-by: Fanit Kolchina
---
 _about/version-history.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/_about/version-history.md b/_about/version-history.md
index fd635aff5b..47253558e9 100644
--- a/_about/version-history.md
+++ b/_about/version-history.md
@@ -9,6 +9,7 @@ permalink: /version-history/

OpenSearch version | Release highlights | Release date
:--- | :--- | :---
+[2.17.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.17.0.md) | Includes disk-optimized vector search, binary quantization, and byte vector encoding in k-NN. Adds asynchronous batch ingestion for ML tasks. Provides search and query performance enhancements and a new custom trace source in trace analytics. Includes application-based configuration templates. For a full list of release highlights, see the Release Notes. | 17 September 2024
[2.16.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.16.0.md) | Includes built-in byte vector quantization and binary vector support in k-NN. Adds new sort, split, and ML inference search processors for search pipelines. Provides application-based configuration templates and additional plugins to integrate multiple data sources in OpenSearch Dashboards. Includes an experimental Batch Predict ML Commons API. For a full list of release highlights, see the Release Notes. | 06 August 2024
[2.15.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.15.0.md) | Includes parallel ingestion processing, SIMD support for exact search, and the ability to disable doc values for the k-NN field. Adds wildcard and derived field types. Improves performance for single-cardinality aggregations, rolling upgrades to remote-backed clusters, and more metrics for top N queries. For a full list of release highlights, see the Release Notes. | 25 June 2024
[2.14.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.14.0.md) | Includes performance improvements to hybrid search and date histogram queries with multi-range traversal, ML model integration within the Ingest API, semantic cache for LangChain applications, low-level vector query interface for neural sparse queries, and improved k-NN search filtering. Provides an experimental tiered cache feature. For a full list of release highlights, see the Release Notes. | 14 May 2024