Updating the K-NN Filters documentation due to recent enhancements in… #4987

Merged
merged 6 commits into from
Sep 22, 2023
23 changes: 13 additions & 10 deletions _search-plugins/knn/filter-search-knn.md
@@ -13,7 +13,7 @@ To refine k-NN results, you can filter a k-NN search using one of the following

- [Efficient k-NN filtering](#efficient-k-nn-filtering): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned (if there are at least `k` results in total). This approach is supported by the following engines:
- Lucene engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.4 and later)
- Faiss engine with an HNSW algorithm (k-NN plugin versions 2.9 or later)
- Faiss engine with an HNSW algorithm (k-NN plugin versions 2.9 or later) or IVF algorithm (k-NN plugin versions 2.10 or later)

- [Post-filtering](#post-filtering): Because it is performed after the k-NN search, this approach may return significantly fewer than `k` results for a restrictive filter. You can use the following two filtering strategies for this approach:
- [Boolean post-filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) search and then applies a filter to the results. The two query parts are executed independently, and then the results are combined based on the query operator (`should`, `must`, and so on) provided in the query.
@@ -25,7 +25,7 @@ The following table summarizes the preceding filtering use cases.

Filter | When the filter is applied | Type of search | Supported engines and methods | Where to place the `filter` clause
:--- | :--- | :--- | :--- | :---
Efficient k-NN filtering | During search (a hybrid of pre- and post-filtering) | Approximate | - `lucene` (`hnsw`) <br> - `faiss` (`hnsw`) | Inside the k-NN query clause.
Efficient k-NN filtering | During search (a hybrid of pre- and post-filtering) | Approximate | - `lucene` (`hnsw`) <br> - `faiss` (`hnsw`, `ivf`) | Inside the k-NN query clause.
Boolean filter | After search (post-filtering) | Approximate | - `lucene`<br> - `nmslib`<br> - `faiss` | Outside the k-NN query clause. Must be a leaf clause.
The `post_filter` parameter | After search (post-filtering) | Approximate | - `lucene`<br> - `nmslib`<br> - `faiss` | Outside the k-NN query clause.
Scoring script filter | Before search (pre-filtering) | Exact | N/A | Inside the script score query clause.
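
The note that post-filtering may return far fewer than `k` results can be illustrated with a toy brute-force sketch in Python. Everything here is hypothetical and for illustration only: the one-dimensional "vectors", the `rating` field, and the helper names are not part of the k-NN plugin, and the exact search over the filtered subset is only a stand-in for the guarantee of `k` results, not the engine's actual graph traversal.

```python
def nearest(query, docs, k):
    """Exact (brute-force) k nearest documents by 1-D distance."""
    return sorted(docs, key=lambda d: abs(d["vector"] - query))[:k]


def post_filter_search(query, docs, k, keep):
    """Search first, then drop results that fail the filter."""
    return [d for d in nearest(query, docs, k) if keep(d)]


def search_filtered_subset(query, docs, k, keep):
    """Apply the filter to the candidates, then take the k nearest."""
    return nearest(query, [d for d in docs if keep(d)], k)


# Hypothetical data set: 1,000 documents, 10% of which pass the filter.
docs = [{"id": i, "vector": i / 1000, "rating": i % 10} for i in range(1000)]


def keep(doc):
    return doc["rating"] == 0


post = post_filter_search(0.5, docs, 10, keep)
subset = search_filtered_subset(0.5, docs, 10, keep)
print(len(post), len(subset))  # prints "1 10"
```

With this restrictive filter, only one of the ten nearest neighbors survives post-filtering, while searching within the filtered candidates still yields ten results.
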
@@ -42,12 +42,12 @@ Once you've estimated the number of documents in your index, the restrictiveness

| Number of documents in an index | Percentage of documents the filter returns | k | Filtering method to use for higher recall | Filtering method to use for lower latency |
| :-- | :-- | :-- | :-- | :-- |
| 10M | 2.5 | 100 | Scoring script | Scoring script |
| 10M | 38 | 100 | Efficient k-NN filtering | Boolean filter |
| 10M | 80 | 100 | Scoring script | Efficient k-NN filtering |
| 1M | 2.5 | 100 | Efficient k-NN filtering | Scoring script |
| 1M | 38 | 100 | Efficient k-NN filtering | Efficient k-NN filtering/scoring script |
| 1M | 80 | 100 | Efficient k-NN filtering | Boolean filter |
| 10M | 2.5 | 100 | Efficient k-NN filtering/Scoring script | Scoring script |
| 10M | 38 | 100 | Efficient k-NN filtering | Efficient k-NN filtering |
| 10M | 80 | 100 | Efficient k-NN filtering | Efficient k-NN filtering |
| 1M | 2.5 | 100 | Efficient k-NN filtering/Scoring script | Scoring script |
| 1M | 38 | 100 | Efficient k-NN filtering | Efficient k-NN filtering |
| 1M | 80 | 100 | Efficient k-NN filtering | Efficient k-NN filtering |

## Efficient k-NN filtering

@@ -261,13 +261,16 @@ For more ways to construct a filter, see [Constructing a filter](#constructing-a

### Faiss k-NN filter implementation

Starting with k-NN plugin version 2.9, you can use `faiss` filters for k-NN searches.
For k-NN searches, you can use `faiss` filters with an HNSW algorithm (k-NN plugin versions 2.9 or later) or IVF algorithm (k-NN plugin versions 2.10 or later).

When you specify a Faiss filter for a k-NN search, the Faiss algorithm decides whether to perform an exact k-NN search with pre-filtering or an approximate search with modified post-filtering. The algorithm uses the following variables:

- N: The number of documents in the index.
- P: The number of documents in the document subset after the filter is applied (P <= N).
- k: The maximum number of vectors to return in the response.
- R: The number of results returned after performing the filtered approximate nearest neighbor search.
- FT: An index-level threshold, defined in the [`knn.advanced.filtered_exact_search_threshold` setting]({{site.url}}{{site.baseurl}}/search-plugins/knn/settings/), that specifies when to switch to exact search.
- MDC: The maximum number of distance computations allowed in exact search if the filtered threshold (`FT`) is not set. This value cannot be changed.
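
One plausible reading of how these variables interact can be sketched in Python. This is a sketch under stated assumptions, not the plugin's implementation: the function names are hypothetical, `ASSUMED_MDC` is a placeholder because the real value is not given here, the strictness of each comparison is a guess based on the settings description, and treating one filtered document as one distance computation is a simplification.

```python
from typing import Optional

# Hypothetical placeholder; the real MDC value is fixed inside the plugin
# and is not stated in this document.
ASSUMED_MDC = 2_000_000


def should_use_exact_search(p: int, k: int, ft: Optional[int],
                            mdc: int = ASSUMED_MDC) -> bool:
    """Decide up front whether to run exact search on the filtered subset."""
    if p <= k:
        # The filter leaves at most k candidates, so score all of them.
        return True
    if ft is not None:
        # FT is set: exact search when fewer than FT documents pass the filter.
        return p < ft
    # FT is not set: fall back to the fixed distance-computation budget
    # (assumption: one distance computation per filtered document).
    return p <= mdc


def needs_exact_fallback(r: int, k: int, p: int) -> bool:
    """Assumed post-check: ANN returned fewer than k results although
    at least k documents pass the filter."""
    return r < k and p >= k


print(should_use_exact_search(p=500, k=100, ft=1000))   # True
print(should_use_exact_search(p=5000, k=100, ft=1000))  # False
print(needs_exact_fallback(r=40, k=100, p=5000))        # True
```

The sketch separates the decision made before the approximate search from the assumed check on `R` afterward, which matches the roles the variable list assigns to `FT`, `MDC`, and `R`.
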

The following flow chart outlines the Faiss algorithm.

@@ -699,4 +702,4 @@ POST /hotels-index/_search
}
}
```
{% include copy-curl.html %}
{% include copy-curl.html %}
1 change: 1 addition & 0 deletions _search-plugins/knn/settings.md
@@ -23,3 +23,4 @@ Setting | Default | Description
`knn.plugin.enabled`| true | Enables or disables the k-NN plugin.
`knn.model.index.number_of_shards`| 1 | Number of shards to use for the model system index, the OpenSearch index that stores the models used for Approximate k-NN Search.
`knn.model.index.number_of_replicas`| 1 | Number of replica shards to use for the model system index. Generally, in a multi-node cluster, this should be at least 1 to increase stability.
`knn.advanced.filtered_exact_search_threshold`| null | Threshold on the number of filtered IDs, used to switch to exact search during a filtered ANN search. If the number of filtered IDs in a segment is less than this setting's value, exact search is performed on the filtered IDs.
Binary file modified images/faiss-algorithm.jpg