[BUG] OpenSearch 2.17 K-NN efficient filtering with a Date Range Filter No Results #2339
Comments
@kristib did you try reproducing this on a smaller index with, say, 10-20 docs? We can try reproducing it with an example index. Also, when you say you upgraded the index, which version of OpenSearch did you upgrade from? |
I will try to determine more factors for reproducing it. We are seeing the issue when we have an alias for at least 4 or so indexes that have around 1 million documents. I haven't been able to reproduce it on a small index. We upgraded from 2.13 to 2.17. Thanks! |
@kristib one thing I would suggest trying: instead of sending the alias in the search, pass all the indices behind the alias as comma-separated values in the search URL.
indices: my-index1, my-index2
Alias-based search:
Listing all indices instead of the alias:
|
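A minimal sketch of the two request forms suggested above, assuming a hypothetical alias name `my-alias` (the index names come from the comment). It only builds the request paths and does not contact a cluster.

```python
from urllib.parse import quote


def search_path(targets):
    """Build the _search path for an alias or for several index names."""
    # Multiple targets are joined with commas in the URL path.
    return "/" + ",".join(quote(t, safe="") for t in targets) + "/_search"


# Hypothetical alias name, used only for illustration.
alias_path = search_path(["my-alias"])
# The same search addressed to the indices behind the alias.
indices_path = search_path(["my-index1", "my-index2"])

print(alias_path)
print(indices_path)
```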
The issue seems weirdly related to querying using an alias or across indexes, along with the range knn filter. I just tried the comma separated list too and still see the issue. To reproduce, I created an index with these settings and mappings:
I then populated that index with around 35 documents. Everything works fine with my knn query using this index. Next, I created a second index with the same settings and mappings, testindex2, and populated it with 20 other documents. Everything works fine with my knn query using this index as well. Then I created an alias and added testindex1 and testindex2 to it.
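As a rough sketch of that last step, the body below could be sent with `POST /_aliases` to attach both indices to one alias. The alias name `testalias` is an assumption; the thread does not say which name was used.

```python
import json


def alias_actions(alias, indices):
    """Build an _aliases request body that adds each index to the alias."""
    return {"actions": [{"add": {"index": name, "alias": alias}} for name in indices]}


# "testalias" is a hypothetical name; the index names come from the comment.
alias_body = alias_actions("testalias", ["testindex1", "testindex2"])
print(json.dumps(alias_body, indent=2))
```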
If I query the alias with my knn query, the hits in the response start to vary and return 0 sometimes, which isn't expected behavior.
I tried testindex1,testindex2/_search and see the same unexpected results. Also, if I remove the knn filter range portion of the query, the query results look as expected. |
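One way to make the "hits vary" symptom measurable is to repeat the identical request and tally the hit totals. This is only a sketch: `run_search` is a hypothetical callable that performs the search and returns the parsed JSON response, so any HTTP client can be plugged in.

```python
from collections import Counter


def hit_count_distribution(run_search, attempts=20):
    """Run the same search repeatedly and tally the total hit counts seen."""
    counts = Counter()
    for _ in range(attempts):
        response = run_search()
        total = response["hits"]["total"]
        # hits.total is normally an object such as {"value": 35, "relation": "eq"}.
        value = total["value"] if isinstance(total, dict) else total
        counts[value] += 1
    return counts


# Stand-in for a real client call, returning canned responses.
canned = iter([{"hits": {"total": {"value": n}}} for n in (35, 0, 35)])
distribution = hit_count_distribution(lambda: next(canned), attempts=3)
print(dict(distribution))
```

A deterministic query against unchanged data should yield a single key in the tally; more than one key matches the inconsistency described above.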
Thanks @kristib for providing the reproduction steps. We will start looking into this. This could be an issue in OpenSearch core too, but we can say more once the deep-dive is complete (@buddharajusahil is doing the deep-dive). |
Please assign this issue to me |
Hi @kristib I tried recreating the issue on OpenSearch 2.17 but was unable to. I used the same index settings except with dimension 4. Here are the commands I used in sequence:
I was able to get consistent search results using these commands. Do you have more context to your issue? |
@buddharajusahil thanks for investigating! I tried your exact steps and could not reproduce with that set of data. However, I tried modifying the dates and I could reproduce it. Here's a set of data where I see inconsistent results returned. Sometimes I even get results outside of the date range.
Using this query:
Is it related to the date format? |
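To flag the "results outside of the date range" case automatically, a small check like the following could be run over the returned hits. It is a sketch under assumptions: the date field name `date` is hypothetical, and values are assumed to be ISO 8601 strings that `datetime.fromisoformat` can parse, which may not match the formats actually indexed.

```python
from datetime import datetime


def hits_outside_range(hits, field, gte, lte):
    """Return the _id of every hit whose date field falls outside [gte, lte]."""
    low, high = datetime.fromisoformat(gte), datetime.fromisoformat(lte)
    offenders = []
    for hit in hits:
        value = datetime.fromisoformat(hit["_source"][field])
        if not (low <= value <= high):
            offenders.append(hit["_id"])
    return offenders


# Made-up hits for illustration only.
sample_hits = [
    {"_id": "1", "_source": {"date": "2024-01-10"}},
    {"_id": "2", "_source": {"date": "2024-03-01"}},
]
out_of_range = hits_outside_range(sample_hits, "date", "2024-01-01", "2024-01-31")
print(out_of_range)
```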
Hi @kristib. Thanks for the data! With your commands I was able to reproduce the issue, will look into it! Additionally, I seem to get inconsistent results even when querying the individual indexes, so maybe not an alias problem. |
ok thanks! maybe it is just more noticeable across indexes. I'm experimenting with indexing different date formats to try to help isolate where it occurs |
I see there is a PR in progress, thanks for continuing to work on solving this issue! I was wondering, is there any workaround for this issue? For example, is there any way I could modify the query I used in this example #2339 (comment) to get the correct results? Or would indexing docs with epoch_millis date values instead work around it? I'm guessing the answer is that there isn't a workaround, but I thought I'd ask :). Otherwise I need to start the process of downgrading/migrating our cluster to an older version of OpenSearch and wait until 2.19 is available in AWS. |
@navneet1v @buddharajusahil Is there any workaround to this issue, besides downgrading to a version before 2.17? |
Hi @kristib. Unfortunately, since this was an all-around filter problem, there is no workaround I can think of other than using post-filtering, which will have slightly different behavior. |
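A minimal sketch of what the post-filtering variant could look like, assuming a hypothetical date field `created_date` and a placeholder vector. The range filter moves out of the `knn` clause into a top-level `post_filter`, so it is applied after the k nearest neighbors are found and the response can contain fewer than k hits.

```python
def knn_post_filter_query(vector_field, vector, k, date_field, gte, lte):
    """Build a k-NN search body that applies the date range as a post filter."""
    return {
        "size": k,
        "query": {"knn": {vector_field: {"vector": vector, "k": k}}},
        # Applied to the k neighbors already selected, not during the search.
        "post_filter": {"range": {date_field: {"gte": gte, "lte": lte}}},
    }


# Field name "created_date", the dates, and the zero vector are placeholders.
post_filter_body = knn_post_filter_query(
    "embeddings.OAI_TE3L_256", [0.0] * 256, 10,
    "created_date", "2024-01-01", "2024-01-31",
)
print(sorted(post_filter_body))
```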
@kristib it would be better if you reached out to the AWS team on this. You can mention this GH issue and see how they respond. Also, can you help confirm that with the fix added by @buddharajusahil you are no longer seeing the issue? |
Somehow this issue got resolved when the PR was merged. Reopening it so that we close it only once @kristib confirms that the issue is fixed. |
What is the bug?
We have a cluster with about 60 million documents spread across weekly indexes. The weekly indexes are all added to a "documents" alias. Each weekly index has about 1 million documents, and each doc has a faiss hnsw knn_vector field with 256 dimensions.
We upgraded to OpenSearch 2.17 last week, and some of our k-NN queries no longer return hits. We have isolated the issue to k-NN queries (that are using efficient filtering) that have a range filter for a date field.
Here is the vector field (embeddings.OAI_TE3L_256) mapping:
Here is an example query that returns 0 hits:
The filtered subset for this query should contain 968 docs, which is less than k. I believe this should trigger an exact k-NN search through the efficient k-NN filtering algorithm, but we are not seeing that behavior. Instead, we are just getting 0 hits.
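The expectation in the paragraph above can be illustrated with a brute-force sketch: when the filtered subset is no larger than k, an exact search simply scores every surviving document, so it should return the whole subset rather than 0 hits. This is not the plugin's implementation; the L2 distance and the document layout are assumptions made for illustration.

```python
import heapq
import math


def exact_knn(docs, query_vector, k, keep):
    """Score every document that passes the filter and return the k closest."""
    candidates = [doc for doc in docs if keep(doc)]
    # L2 distance is assumed; the space type of the real index is not shown.
    return heapq.nsmallest(
        k, candidates, key=lambda doc: math.dist(doc["vector"], query_vector)
    )


# Tiny made-up corpus: two documents pass the filter, one does not.
docs = [
    {"id": "a", "vector": [0.0, 0.0], "in_range": True},
    {"id": "b", "vector": [1.0, 1.0], "in_range": True},
    {"id": "c", "vector": [0.1, 0.1], "in_range": False},
]
neighbors = exact_knn(docs, [0.0, 0.0], k=5, keep=lambda doc: doc["in_range"])
print([doc["id"] for doc in neighbors])
```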
Overall, we are seeing inconsistent query results. For example:
What is the expected behavior?
K-NN efficient filtering should work as described at https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/ for k-NN range filters on a date field.
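For reference, a sketch of the general shape of an efficient-filtering query with a date range, where the filter sits inside the `knn` clause. The vector field name comes from this report; the date field `created_date`, the date values, and the zero vector are placeholders, not the query that was actually run.

```python
def knn_efficient_filter_query(vector_field, vector, k, date_field, gte, lte):
    """Build a k-NN search body with the date range inside the knn clause."""
    date_range = {"range": {date_field: {"gte": gte, "lte": lte}}}
    return {
        "size": k,
        "query": {
            "knn": {
                vector_field: {
                    "vector": vector,
                    "k": k,
                    # Efficient filtering: the filter is part of the knn clause.
                    "filter": {"bool": {"must": [date_range]}},
                }
            }
        },
    }


# "created_date", the dates, and the vector values are hypothetical.
efficient_body = knn_efficient_filter_query(
    "embeddings.OAI_TE3L_256", [0.0] * 256, 1000,
    "created_date", "2024-01-01", "2024-01-31",
)
print(sorted(efficient_body["query"]["knn"]["embeddings.OAI_TE3L_256"]))
```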
What is your host/environment?