Add expand_nested_docs Parameter support to NMSLIB engine #2331

Merged
heemin32 merged 1 commit into opensearch-project:main from the unified branch on Dec 16, 2024

heemin32
Collaborator

Description

Add support for the expand_nested_docs parameter in the nmslib engine. As nmslib does not support multi-vector functionality, this feature may have limited usefulness but poses no downside to including it.
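For readers unfamiliar with the parameter, here is a minimal sketch of a nested k-NN search request body with `expand_nested_docs` enabled. It is not taken from this PR: the request shape is assumed to follow the k-NN plugin's nested query form, and the helper `build_nested_knn_query` and the field names `nested_field` and `my_vector` are hypothetical.

```python
import json


def build_nested_knn_query(path, vector_field, vector, k, expand_nested_docs):
    """Build a nested k-NN search body (hypothetical helper, assumed request shape)."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "knn": {
                        # The vector field is addressed by its full nested path.
                        f"{path}.{vector_field}": {
                            "vector": vector,
                            "k": k,
                            # The parameter this PR enables for the nmslib engine.
                            "expand_nested_docs": expand_nested_docs,
                        }
                    }
                },
                # Ask for the matching nested documents of each parent.
                "inner_hits": {},
                "score_mode": "max",
            }
        }
    }


body = build_nested_knn_query("nested_field", "my_vector", [1.0, 1.0, 1.0], 2, True)
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a `_search` request against an index whose nested vector field uses the nmslib engine, assuming the mapping already exists.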

Related Issues

N/A

Check List

  • New functionality includes testing.
  • [] New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@navneet1v
Collaborator

> As nmslib does not support multi-vector functionality, this feature may have limited usefulness but poses no downside to including it.

Can you explain a bit more on this?

@heemin32
Collaborator Author

heemin32 commented Dec 12, 2024

> > As nmslib does not support multi-vector functionality, this feature may have limited usefulness but poses no downside to including it.
>
> Can you explain a bit more on this?

For nmslib, searches are performed at the nested document level, without deduplicating results by parent document. As a result, a parent document can already return multiple inner hits, as long as those nested documents fall within the top-K results. In this context, the demand for this functionality may be lower than for other engines.

For instance, consider an extreme scenario where there are two documents, each containing 10 nested documents. If a query is executed with k=5, it is possible that all five closest vectors come from just one of the two documents. In this case, expanding nested documents would add that document's five remaining nested documents, each with a score lower than the minimum score of the already retrieved nested documents.
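The scenario above can be simulated with a small sketch. This is an illustration only, not the plugin's implementation: the `NestedDoc` type, the helper functions, and the scores are made up, and higher scores are assumed to mean closer vectors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NestedDoc:
    parent_id: str
    nested_id: int
    score: float  # assumption: higher score means a closer vector


def top_k_nested(docs, k):
    """Top-k over nested docs with no per-parent deduplication."""
    return sorted(docs, key=lambda d: d.score, reverse=True)[:k]


def expand_nested(docs, hits):
    """Return every nested doc whose parent appears in the hits."""
    parents = {h.parent_id for h in hits}
    matching = [d for d in docs if d.parent_id in parents]
    return sorted(matching, key=lambda d: d.score, reverse=True)


# Two parent documents with 10 nested documents each.
# Every nested doc of parent "A" scores higher than any of parent "B".
docs = [NestedDoc("A", i, 1.0 - 0.01 * i) for i in range(10)]
docs += [NestedDoc("B", i, 0.5 - 0.01 * i) for i in range(10)]

hits = top_k_nested(docs, k=5)        # all five hits come from parent "A"
expanded = expand_nested(docs, hits)  # all ten nested docs of parent "A"
extra = [d for d in expanded if d not in hits]

print(len(hits), len(extra))
```

Here the five documents added by expansion all score below the lowest-scoring original hit, which is why expansion is expected to matter less for nmslib than for engines that deduplicate by parent.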

Collaborator

@navneet1v navneet1v left a comment

Please add/update the IT of expand nested docs variable for nmslib engine too.

@heemin32
Collaborator Author

> Please add/update the IT of expand nested docs variable for nmslib engine too.

Added.

@heemin32 heemin32 requested a review from navneet1v December 14, 2024 00:10
@heemin32 heemin32 force-pushed the unified branch 2 times, most recently from 8868d00 to f298037 Compare December 14, 2024 04:19
@heemin32 heemin32 requested a review from navneet1v December 14, 2024 04:23
@heemin32 heemin32 force-pushed the unified branch 2 times, most recently from 0e8326d to 1b8e8fb Compare December 14, 2024 06:48
@heemin32 heemin32 merged commit a5fb171 into opensearch-project:main Dec 16, 2024
30 of 31 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-2331-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a5fb171b065747a39e7aaae3de330c0fda0800ca
# Push it to GitHub
git push --set-upstream origin backport/backport-2331-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-2331-to-2.x.
