Skip to content

Latest commit

 

History

History
288 lines (243 loc) · 14 KB

knn.md

File metadata and controls

288 lines (243 loc) · 14 KB

k-NN Plugin

Short for k-nearest neighbors, the k-NN plugin enables users to search for the k-nearest neighbors to a query point across an index of vectors. See the plugin's documentation for more information.

Basic Approximate k-NN

In the following example we create a 5-dimensional k-NN index with random data. You can find a synchronous version of this working sample in samples/src/main/java/org/opensearch/client/samples/knn/KnnBasics.java.

$ ./gradlew :samples:run -Dsamples.mainClass=knn.KnnBasics

[Main] INFO  - Running main class: org.opensearch.client.samples.knn.KnnBasics
[KnnBasics] INFO  - Server: [email protected]
[KnnBasics] INFO  - Creating index my-index
[KnnBasics] INFO  - Indexing 10 vectors
[KnnBasics] INFO  - Waiting for indexing to finish
[KnnBasics] INFO  - Searching for vector [0.67, 0.67, 0.37, 0.0, 0.72]
[KnnBasics] INFO  - Found {values=[0.32, 0.96, 0.41, 0.04, 0.9]} with score 0.8050233
[KnnBasics] INFO  - Found {values=[0.04, 0.58, 0.13, 0.27, 0.37]} with score 0.6031363
[KnnBasics] INFO  - Found {values=[0.96, 0.88, 0.8, 0.41, 0.18]} with score 0.5640794
[KnnBasics] INFO  - Deleting index my-index

Create an Index

final var indexName = "my-index";
final var dimensions = 5;

client.indices().create(r -> r
        .index(indexName)
        .settings(s -> s.knn(true))
        .mappings(m -> m
                .properties("values", p -> p
                        .knnVector(k -> k.dimension(dimensions)))));

Index Vectors

Given the following document class definition:

public static class Doc {
    private float[] values;

    public Doc() {}

    public Doc(float[] values) {
        this.values = values;
    }

    public static Doc rand(int dimensions) {
        var values = new float[dimensions];
        for (var i = 0; i < dimensions; ++i) {
            values[i] = Math.round(Math.random() * 100.0) / 100.0f;
        }
        return new Doc(values);
    }

    // Getters/Setters & toString elided
}

Create 10 random vectors and insert them using the bulk API:

final var nVectors = 10;
var bulkRequest = new BulkRequest.Builder();
for (var i = 0; i < nVectors; ++i) {
    var id = Integer.toString(i);
    var doc = Doc.rand(dimensions);
    bulkRequest.operations(b -> b
            .index(o -> o
                    .index(indexName)
                    .id(id)
                    .document(doc)));
}

client.bulk(bulkRequest.build());

client.indices().refresh(i -> i.index(indexName));

Search for Nearest Neighbors

Create a random vector of the same size and search for its nearest neighbors.

final var searchVector = new float[dimensions];
for (var i = 0; i < dimensions; ++i) {
    searchVector[i] = Math.round(Math.random() * 100.0) / 100.0f;
}

var searchResponse = client.search(s -> s
        .index(indexName)
        .query(q -> q
                .knn(k -> k
                        .field("values")
                        .vector(searchVector)
                        .k(3))),
        Doc.class);

for (var hit : searchResponse.hits().hits()) {
    System.out.println(hit.source());
}

Approximate k-NN with a Boolean Filter

In the KnnBooleanFilter.java sample we create a 5-dimensional k-NN index with random data and a metadata field that contains a book genre (e.g. fiction). The search query is a k-NN search filtered by genre. The filter clause is outside the k-NN query clause and is applied after the k-NN search.

var searchResponse = client.search(s -> s
        .index(indexName)
        .query(q -> q
                .bool(b -> b
                        .filter(f -> f
                                .bool(b2 -> b2
                                        .must(m -> m
                                                .term(t -> t
                                                        .field("metadata.genre")
                                                        .value(v -> v.stringValue(searchGenre))))))
                        .must(m -> m
                                .knn(k -> k
                                        .field("values")
                                        .vector(searchVector)
                                        .k(5))))),
        Doc.class);
$ ./gradlew :samples:run -Dsamples.mainClass=knn.KnnBooleanFilter

[Main] INFO  - Running main class: org.opensearch.client.samples.knn.KnnBooleanFilter
[KnnBooleanFilter] INFO  - Server: [email protected]
[KnnBooleanFilter] INFO  - Creating index my-index
[KnnBooleanFilter] INFO  - Indexing 3000 vectors
[KnnBooleanFilter] INFO  - Waiting for indexing to finish
[KnnBooleanFilter] INFO  - Searching for vector [0.18, 0.71, 0.44, 0.03, 0.42] with the 'drama' genre
[KnnBooleanFilter] INFO  - Found {values=[0.21, 0.58, 0.55, 0.09, 0.45], metadata={genre=drama}} with score 0.966744
[KnnBooleanFilter] INFO  - Deleting index my-index

Approximate k-NN with an Efficient Filter

In the KnnEfficientFilter.java sample we implement the example in the k-NN documentation, which creates an index that uses the Lucene engine and HNSW as the method in the mapping, containing hotel location and parking data, then search for the top three hotels near the location with the coordinates [5, 4] that are rated between 8 and 10, inclusive, and provide parking.

var searchResponse = client.search(s -> s
        .index(indexName)
        .size(3)
        .query(q -> q
                .knn(k -> k
                        .field("location")
                        .vector(searchLocation)
                        .k(3)
                        .filter(Query.of(f -> f
                                .bool(b -> b
                                        .must(m -> m
                                                .range(r -> r
                                                        .field("rating")
                                                        .gte(JsonData.of(searchRatingMin))
                                                        .lte(JsonData.of(searchRatingMax))))
                                        .must(m -> m
                                                .term(t -> t
                                                        .field("parking")
                                                        .value(FieldValue.of(searchParking))))))))),
        Hotel.class);
$ ./gradlew :samples:run -Dsamples.mainClass=knn.KnnEfficientFilter

[Main] INFO  - Running main class: org.opensearch.client.samples.knn.KnnEfficientFilter
[KnnEfficientFilter] INFO  - Server: [email protected]
[KnnEfficientFilter] INFO  - Creating index hotels-index
[KnnEfficientFilter] INFO  - Indexing hotel {location=[5.2, 4.0], parking=true, rating=5} with id 1
[KnnEfficientFilter] INFO  - Indexing hotel {location=[5.2, 3.9], parking=false, rating=4} with id 2
[KnnEfficientFilter] INFO  - Indexing hotel {location=[4.9, 3.4], parking=true, rating=9} with id 3
[KnnEfficientFilter] INFO  - Indexing hotel {location=[4.2, 4.6], parking=false, rating=6} with id 4
[KnnEfficientFilter] INFO  - Indexing hotel {location=[3.3, 4.5], parking=true, rating=8} with id 5
[KnnEfficientFilter] INFO  - Indexing hotel {location=[6.4, 3.4], parking=true, rating=9} with id 6
[KnnEfficientFilter] INFO  - Indexing hotel {location=[4.2, 6.2], parking=true, rating=5} with id 7
[KnnEfficientFilter] INFO  - Indexing hotel {location=[2.4, 4.0], parking=true, rating=8} with id 8
[KnnEfficientFilter] INFO  - Indexing hotel {location=[1.4, 3.2], parking=false, rating=5} with id 9
[KnnEfficientFilter] INFO  - Indexing hotel {location=[7.0, 9.9], parking=true, rating=9} with id 10
[KnnEfficientFilter] INFO  - Indexing hotel {location=[3.0, 2.3], parking=false, rating=6} with id 11
[KnnEfficientFilter] INFO  - Indexing hotel {location=[5.0, 1.0], parking=true, rating=3} with id 12
[KnnEfficientFilter] INFO  - Indexing 12 documents
[KnnEfficientFilter] INFO  - Waiting for indexing to finish
[KnnEfficientFilter] INFO  - Searching for hotel near [5.0, 4.0] with rating >=8,<=10 and parking=true
[KnnEfficientFilter] INFO  - Found {location=[4.9, 3.4], parking=true, rating=9} with score 0.72992706
[KnnEfficientFilter] INFO  - Found {location=[6.4, 3.4], parking=true, rating=9} with score 0.3012048
[KnnEfficientFilter] INFO  - Found {location=[3.3, 4.5], parking=true, rating=8} with score 0.24154587
[KnnEfficientFilter] INFO  - Deleting index hotels-index

Exact k-NN with a scoring script

In the KnnScriptScore.java sample we create a 5-dimensional k-NN index with random data. The search query uses the k-NN scoring script to calculate exact nearest neighbors.

var searchResponse = client.search(s -> s
        .index(indexName)
        .query(q -> q
                .scriptScore(ss -> ss
                        .query(qq -> qq.matchAll(m -> m))
                        .script(sss -> sss
                                .inline(i -> i
                                        .source("knn_score")
                                        .lang("knn")
                                        .params("field", JsonData.of("values"))
                                        .params("query_value", JsonData.of(searchVector))
                                        .params("space_type", JsonData.of("cosinesimil")))))),
        Doc.class);
$ ./gradlew :samples:run -Dsamples.mainClass=knn.KnnScriptScore 

[Main] INFO  - Running main class: org.opensearch.client.samples.knn.KnnScriptScore
[KnnScriptScore] INFO  - Server: [email protected]
[KnnScriptScore] INFO  - Creating index my-index
[KnnScriptScore] INFO  - Indexing 10 vectors
[KnnScriptScore] INFO  - Waiting for indexing to finish
[KnnScriptScore] INFO  - Searching for vector [0.94, 0.1, 0.39, 0.63, 0.42]
[KnnScriptScore] INFO  - Found {values=[0.66, 0.23, 0.15, 0.44, 0.13]} with score 1.9564294
[KnnScriptScore] INFO  - Found {values=[0.94, 0.05, 0.86, 0.68, 0.05]} with score 1.90958
[KnnScriptScore] INFO  - Found {values=[0.88, 0.72, 0.29, 0.48, 0.56]} with score 1.8788767
[KnnScriptScore] INFO  - Found {values=[0.97, 0.99, 0.66, 0.61, 0.91]} with score 1.847905
[KnnScriptScore] INFO  - Found {values=[0.18, 0.29, 0.43, 0.63, 0.25]} with score 1.7819176
[KnnScriptScore] INFO  - Found {values=[0.35, 0.2, 0.62, 0.4, 0.96]} with score 1.7673628
[KnnScriptScore] INFO  - Found {values=[0.34, 0.59, 0.05, 0.47, 0.54]} with score 1.7316635
[KnnScriptScore] INFO  - Found {values=[0.55, 0.98, 0.07, 0.57, 0.06]} with score 1.6385877
[KnnScriptScore] INFO  - Found {values=[0.03, 0.72, 0.89, 0.83, 0.46]} with score 1.6147845
[KnnScriptScore] INFO  - Found {values=[0.17, 0.81, 0.09, 0.21, 0.3]} with score 1.4616101
[KnnScriptScore] INFO  - Deleting index my-index

Exact k-NN with the Painless scripting extensions

In the KnnPainlessScript.java sample we create a 5-dimensional k-NN index with random data. The search query uses the k-NN Painless extensions to calculate exact nearest neighbors.

var searchResponse = client.search(s -> s
        .index(indexName)
        .query(q -> q
                .scriptScore(ss -> ss
                        .query(qq -> qq.matchAll(m -> m))
                        .script(sss -> sss
                                .inline(i -> i
                                        .source("1.0 + cosineSimilarity(params.query_value, doc[params.field])")
                                        .params("field", JsonData.of("values"))
                                        .params("query_value", JsonData.of(searchVector)))))),
        Doc.class);
$ ./gradlew :samples:run -Dsamples.mainClass=knn.KnnPainlessScript

[Main] INFO  - Running main class: org.opensearch.client.samples.knn.KnnPainlessScript
[KnnPainlessScript] INFO  - Server: [email protected]
[KnnPainlessScript] INFO  - Creating index my-index
[KnnPainlessScript] INFO  - Indexing 10 vectors
[KnnPainlessScript] INFO  - Waiting for indexing to finish
[KnnPainlessScript] INFO  - Searching for vector [0.57, 0.86, 0.37, 0.07, 0.38]
[KnnPainlessScript] INFO  - Found {values=[1.0, 0.6, 0.66, 0.03, 0.18]} with score 1.8911908
[KnnPainlessScript] INFO  - Found {values=[0.4, 0.39, 0.63, 0.09, 0.39]} with score 1.8776901
[KnnPainlessScript] INFO  - Found {values=[0.32, 0.98, 0.7, 0.7, 0.77]} with score 1.8616674
[KnnPainlessScript] INFO  - Found {values=[0.93, 0.35, 0.27, 0.45, 0.81]} with score 1.789043
[KnnPainlessScript] INFO  - Found {values=[0.81, 0.36, 0.87, 0.78, 0.56]} with score 1.7457235
[KnnPainlessScript] INFO  - Found {values=[0.55, 0.19, 0.61, 0.42, 0.4]} with score 1.743325
[KnnPainlessScript] INFO  - Found {values=[0.12, 0.54, 0.09, 0.83, 0.28]} with score 1.6045148
[KnnPainlessScript] INFO  - Found {values=[0.0, 0.04, 0.63, 0.07, 0.9]} with score 1.479921
[KnnPainlessScript] INFO  - Found {values=[0.41, 0.05, 0.52, 1.0, 0.18]} with score 1.4306322
[KnnPainlessScript] INFO  - Found {values=[0.22, 0.1, 0.59, 0.89, 0.15]} with score 1.4274814
[KnnPainlessScript] INFO  - Deleting index my-index