[FEATURE] Add support for Neural and Hybrid queries via the DSL builder API #735
Comments
Here's some additional information about the commit I referenced:

def add_neural_search(self, embedding_field, query_text, model_id, k=10):
    """
    Add a neural search condition to the query.

    Args:
        embedding_field (str): The field under which neural search parameters are placed.
        query_text (str): The search query string for neural search.
        model_id (str): The ID of the neural model used for generating embeddings.
        k (int): The number of nearest neighbors (k) to return.
    """
    if not model_id:
        raise ValueError("Model ID must be provided for neural search.")
    neural_query = Q("neural",
                     embedding_field=embedding_field,
                     query_text=query_text,
                     model_id=model_id,
                     k=k)
    # This really shouldn't be combined with anything else: if we have a
    # neural query, it simply overwrites the existing query.
    self.query = neural_query

Working test:

def test_neural_search():
    # prepare fixtures
    search_instance_fixture = Search('movies')
    search_instance_fixture.add_neural_search(
        embedding_field='passage_embedding',
        query_text="find similar movies",
        model_id="model123",
        k=5
    )
    expected_query_fixture = {
        "neural": {
            "passage_embedding": {
                "query_text": "find similar movies",
                "model_id": "model123",
                "k": 5
            }
        }
    }
    # execute
    built_query = search_instance_fixture.build()
    assert built_query == expected_query_fixture, "Neural search not matching"
Thanks! This looks great. At a high level, we want as much code as possible generated from https://github.com/opensearch-project/opensearch-api-specification and all the interesting stuff to be hand-rolled here, like you're proposing. Could some of these request objects be expressed in the API specification, produce auto-generated code, and then be used by the high-level constructs? In either case, could you make a PR with your proposal, update the user guides, etc.?
Just got approval from my company to work on this. We've added it to our next sprint, which starts Friday, so I'll begin working on this feature request on Monday of next week. Thank you!
Just an update: I'm not 100% confident in my implementation yet and still need to do a large refactor, but the functionality is working; it's just messy. Once I'm confident, I'll work on creating user guides for use with the higher-level Search client. A link to the current diff: main...MikeyCymantix:opensearch-py:main
Is your feature request related to a problem?
There seems to be a gap in the functionality of the opensearch-py high-level Search client, specifically with respect to 'Hybrid' and 'Neural' search queries. These query mechanisms are definitely advanced, and are mainly used with the ml-commons/NLP functionality that has been rolled out.
It would be great to support these kinds of search queries in the high-level Python Search client. Attached is a commit where I added support for neural query types.
Without support for these types, we are forced to construct the DSL manually, which is exactly the kind of thing the high-level Search client should abstract away. Not only is this confusing (why aren't these queries supported?), it also increases the surface area for bugs.
What solution would you like?
Support for Advanced Query Types
The addition of 'Neural' and 'Hybrid' query types to the OpenSearch Python client's high-level Search API would be great.
Implementation Details
Neural search is unusual because the embedding field is specified dynamically, rather than being a static field such as the "passage_embedding" often cited in the documentation. This flexibility is pretty important, and I wasn't quite sure how to reflect it in the code. However, I included a working prototype. It's based largely on the FunctionScore query, which also has an init method attached to it:
main...MikeyCymantix:opensearch-py:Cymantix_MichaelAlmeida/neural_query
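To illustrate the dynamic-field point, here is a minimal, self-contained sketch. It is not the code from the linked branch and not an opensearch-py class: `NeuralQuery` and its `to_dict` method are hypothetical stand-ins that only show how the caller-supplied embedding field can become the key under "neural".

```python
from typing import Any, Dict


class NeuralQuery:
    """Hypothetical stand-in for a DSL query class (not part of opensearch-py)."""

    name = "neural"

    def __init__(self, embedding_field: str, query_text: str,
                 model_id: str, k: int = 10) -> None:
        if not embedding_field:
            raise ValueError("An embedding field name must be provided.")
        if not model_id:
            raise ValueError("Model ID must be provided for neural search.")
        self.embedding_field = embedding_field
        self.params = {"query_text": query_text, "model_id": model_id, "k": k}

    def to_dict(self) -> Dict[str, Any]:
        # The embedding field name is not a fixed parameter; it is used
        # as the key that wraps the remaining neural search parameters.
        return {self.name: {self.embedding_field: dict(self.params)}}


query = NeuralQuery("passage_embedding", "find similar movies", "model123", k=5)
print(query.to_dict())
```

The field name is kept separate from the other parameters so that serialization can nest them under it, which is the shape the test fixture in the comments expects.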
What alternatives have you considered?
We can construct the DSL manually for these types of queries and it would be fine.
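For reference, the manual alternative might look like the sketch below. The helper names `neural_clause` and `hybrid_body`, and the `text` field in the match clause, are made up for illustration. The dictionary shapes follow my understanding of the neural and hybrid query DSL and should be checked against the OpenSearch documentation.

```python
from typing import Any, Dict, List


def neural_clause(field: str, query_text: str, model_id: str,
                  k: int = 10) -> Dict[str, Any]:
    """Build a raw neural query clause keyed by the embedding field."""
    return {"neural": {field: {"query_text": query_text,
                               "model_id": model_id,
                               "k": k}}}


def hybrid_body(clauses: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap several query clauses in a hybrid query request body."""
    if not clauses:
        raise ValueError("A hybrid query needs at least one sub-query.")
    return {"query": {"hybrid": {"queries": list(clauses)}}}


# Combine a lexical match clause with a neural clause.
body = hybrid_body([
    {"match": {"text": "find similar movies"}},  # "text" is an assumed field name
    neural_clause("passage_embedding", "find similar movies", "model123", k=5),
])
print(body)
```

A body like this could then be passed to the low-level client's search call. As I understand it, hybrid queries also expect a search pipeline with a normalization processor to be configured, which this sketch does not cover.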
Do you have any additional context?
The error I originally encountered appeared in this code.