Docs: explain how to use Vector Search maxResultCount combined with additional criteria #658

Open
1 task done
jonny07 opened this issue Aug 2, 2024 · 2 comments
Labels
documentation Improvements or additions to documentation

jonny07 commented Aug 2, 2024

Is there an existing issue?

Build info

  • objectbox version: [4.0.1]
  • Flutter/Dart version: [Flutter 3.22.3, Dart 3.4.4]
  • Build OS: [Windows 11]
  • Deployment OS or device: [Android 10, API29, Huawei P30 Pro]

Steps to reproduce


  1. Create a vector search query (I used embeddings) that should return 2 results:
    final query = box
        .query(Message_.embedding.nearestNeighborsF32(search_embedding, 2))
        .build();
    => Works fine.

  2. Combine the search with a second criterion:
    final query = box
        .query(Message_.embedding
            .nearestNeighborsF32(search_embedding, 2)
            .and(Message_.chatid.equals(character)))
        .build();

    => The query returns 0, 1, or 2 results. Expected: 2 results.

I assume the first condition is evaluated on its own: the nearest neighbor search finds 2 results regardless of the second criterion. The second criterion is then applied to those 2 results, so 0, 1, or 2 of them remain.
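The behavior assumed above can be reproduced with a small plain-Dart simulation. It does not call ObjectBox and says nothing about how ObjectBox is implemented internally; `Msg`, `squaredDistance`, `nearestThenFilter`, and `sampleMessages` are hypothetical names made up for this sketch, and the brute-force sort merely stands in for the nearest neighbor search.

```dart
// Plain-Dart simulation of "cap the nearest neighbors first, filter second".
// All names here are hypothetical; no ObjectBox API is used.
class Msg {
  final int id;
  final String chatId;
  final List<double> embedding;
  const Msg(this.id, this.chatId, this.embedding);
}

// Squared Euclidean distance between two vectors of equal length.
double squaredDistance(List<double> a, List<double> b) {
  var sum = 0.0;
  for (var i = 0; i < a.length; i++) {
    final d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Stage 1: keep only the maxResultCount nearest messages.
// Stage 2: apply the additional condition to those candidates only.
List<Msg> nearestThenFilter(
    List<Msg> all, List<double> target, int maxResultCount, String chatId) {
  final sorted = [...all]
    ..sort((x, y) => squaredDistance(x.embedding, target)
        .compareTo(squaredDistance(y.embedding, target)));
  return sorted
      .take(maxResultCount)
      .where((m) => m.chatId == chatId)
      .toList();
}

// Made-up sample data: four messages in two chats.
const sampleMessages = [
  Msg(1, 'alice', [0.0, 0.0]),
  Msg(2, 'bob', [0.1, 0.0]),
  Msg(3, 'bob', [0.2, 0.0]),
  Msg(4, 'alice', [0.9, 0.0]),
];
```

With the target `[0.0, 0.0]` and `maxResultCount` set to 2, the two nearest messages are 1 and 2. Filtering for `'alice'` or `'bob'` then leaves one message each, and filtering for a chat that is not among the two nearest leaves none, even though `'alice'` and `'bob'` each have two messages in the data.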

Expected behavior

The vector search returns 2 results even when combined with additional conditions.

Actual behavior

The number of results varies, depending on whether the results found in step 1 fulfill the second criterion.

Note after analysis

I assume the following example from your documentation will also not work as expected:
https://docs.objectbox.io/on-device-vector-search
final query = box
    .query(City_.location
        .nearestNeighborsF32(madrid, 2)
        .and(City_.name.startsWith("B")))
    .build();

I just figured out that I could probably use "limit" from https://docs.objectbox.io/queries on the query and drop the limit from the vector search (dropping it is not possible, so I just set it to several million). My question to you would be how this behaves regarding resources, given how the vector search is implemented, but I assume it will work.
Another workaround for me might be to work with a stream and stop reading it after 2 results.

So it might not be a bug; maybe just the documentation above needs to be adapted.

jonny07 added the bug label Aug 2, 2024
greenrobot-team removed the bug label Aug 5, 2024
Member

greenrobot-team commented Aug 5, 2024

Thanks for this issue! Note that the maxResultCount parameter only applies to the results of the nearest neighbor search, see also the API documentation on how to use it. It does not apply to the final query. As you have guessed, use the limit API for that. This is also hinted at in the API documentation.

Quoting jonny07: "leave it out is not possible, I just set it to several million"

The maxResultCount parameter exists to improve performance of the nearest neighbor search. The higher the allowed number of results, the longer the nearest neighbor search sub-query will take to compute (obviously with a larger impact if the data set is large).
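The interplay of the two caps can be illustrated with a plain-Dart simulation. This is only a sketch under stated assumptions: it does not call ObjectBox, `Candidate`, `searchWithLimit`, and `candidates` are hypothetical names, the distances are made up, and the final results are kept in distance order, which a real query may or may not do.

```dart
// Plain-Dart simulation of two separate caps: one on the nearest neighbor
// stage and one on the final result. All names are hypothetical.
class Candidate {
  final int id;
  final String chatId;
  final double distance; // distance to the query vector, precomputed here
  const Candidate(this.id, this.chatId, this.distance);
}

List<Candidate> searchWithLimit(List<Candidate> all, String chatId,
    {required int maxResultCount, required int limit}) {
  final byDistance = [...all]
    ..sort((a, b) => a.distance.compareTo(b.distance));
  return byDistance
      .take(maxResultCount) // cap on the nearest neighbor stage
      .where((c) => c.chatId == chatId) // additional condition
      .take(limit) // cap on the final result
      .toList();
}

// Made-up sample data, already annotated with distances.
const candidates = [
  Candidate(1, 'alice', 0.0),
  Candidate(2, 'bob', 0.1),
  Candidate(3, 'bob', 0.2),
  Candidate(4, 'alice', 0.3),
  Candidate(5, 'alice', 0.4),
];
```

With `maxResultCount: 2, limit: 2`, a search for `'alice'` returns only candidate 1. Raising `maxResultCount` to 4 returns candidates 1 and 4, at the price of a nearest neighbor stage that has to produce twice as many candidates. How large the first cap must be depends on how selective the additional condition is, which is why no single value fits every data set.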

We should probably copy this from the API documentation and add this to the web documentation at https://docs.objectbox.io/on-device-vector-search

greenrobot-team added the documentation label Aug 5, 2024
greenrobot-team changed the title from "Vector Search combined with additional criteria not working correct" to "Docs: explain how to use Vector Search maxResultCount combined with additional criteria" Aug 5, 2024
Author

jonny07 commented Aug 5, 2024

Thanks a lot for your feedback!
It currently works fine for me and I'm very happy with objectbox, thanks a lot for your great work!
Currently my database is quite small, so my solution of setting the "maxResultCount" parameter of the nearest neighbour search very high will probably lead to performance problems for large databases. Using the stream as I mentioned before will probably not work either, because it would also require setting the "maxResultCount" parameter, whose right value is unknown when I combine the search with other criteria.
I assume that, to work on very large databases in combination with other criteria, the algorithm would need to search only until enough results are found, without the user specifying "maxResultCount". This could be done, for example, by first applying all other criteria and then running the nearest neighbour search on that result until enough results are found, or by evaluating both in parallel until the wanted number of results is found. For my case, I could also split the data into separate databases, e.g. 20 databases with the same structure, and search just one of them, so I don't need to combine nearest neighbour search with other criteria.
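The first idea, applying the other criteria before the nearest neighbour search, can be sketched in plain Dart as a brute-force scan. This is a hypothetical illustration, not something ObjectBox is known to offer: `Item`, `filterThenNearest`, and `items` are made-up names, the distances are made up, and scanning every matching object gives up whatever speed-up an index-based search would provide.

```dart
// Plain-Dart sketch of "filter first, then take the nearest matches".
// All names are hypothetical; no ObjectBox API is used.
class Item {
  final int id;
  final String chatId;
  final double distance; // distance to the query vector, precomputed here
  const Item(this.id, this.chatId, this.distance);
}

// Apply the additional condition first, then sort the remaining items by
// distance and keep the `wanted` nearest ones.
List<Item> filterThenNearest(List<Item> all, String chatId, int wanted) {
  final matching = all.where((i) => i.chatId == chatId).toList()
    ..sort((a, b) => a.distance.compareTo(b.distance));
  return matching.take(wanted).toList();
}

// Made-up sample data in arbitrary order.
const items = [
  Item(5, 'alice', 0.4),
  Item(2, 'bob', 0.1),
  Item(1, 'alice', 0.0),
  Item(3, 'bob', 0.2),
  Item(4, 'alice', 0.3),
];
```

Here `filterThenNearest(items, 'alice', 2)` returns items 1 and 4. The result only falls short of 2 when fewer than 2 objects match the condition at all, but the cost grows with the number of matching objects rather than with the number of wanted results.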
