Fixing the bug when a segment has no vector field present for disk based vector search #2281

Merged: 1 commit, Nov 19, 2024

@navneet1v navneet1v commented Nov 19, 2024

Description

Fixes a bug where disk-based vector search crashes when a segment has no vector field present. The added check ensures the search no longer crashes when such segments exist.

What's the fix:

The change not only fixes the bug but also speeds up DiskANN queries in certain cases.

  1. During the rescore pass in disk-based vector search, only segments that have docs to be rescored are hit. Previously, every segment received a rescore call even when it had no docs to rescore. This speeds up the query and also fixes the bug.
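
The idea in item 1 can be illustrated with a small Python sketch. This is not the plugin's code: the names `group_by_segment`, `rescore`, `doc_bases`, and `rescore_segment` are hypothetical, and the sketch assumes each segment is identified by the first global doc id it owns. It only shows that grouping first-pass results by segment means segments without candidates never receive a rescore call.

```python
from bisect import bisect_right


def group_by_segment(doc_ids, doc_bases):
    """Group global doc ids by owning segment.

    doc_bases is a sorted list holding the first global doc id of each
    segment (hypothetical layout chosen for this sketch).
    Returns {segment_index: [segment-local doc ids]}.
    """
    grouped = {}
    for doc_id in doc_ids:
        # The owning segment is the last one whose base is <= doc_id.
        seg = bisect_right(doc_bases, doc_id) - 1
        grouped.setdefault(seg, []).append(doc_id - doc_bases[seg])
    return grouped


def rescore(first_pass_doc_ids, doc_bases, rescore_segment):
    """Call rescore_segment only for segments that have candidates."""
    results = {}
    grouped = group_by_segment(first_pass_doc_ids, doc_bases)
    for seg, local_ids in grouped.items():
        # Segments with nothing to rescore are absent from `grouped`,
        # so they are never touched here.
        results[seg] = rescore_segment(seg, local_ids)
    return results
```

With three segments starting at doc ids 0, 5, and 6 and first-pass hits `[1, 3, 6]`, only segments 0 and 2 are passed to `rescore_segment`; the middle segment is skipped entirely.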

Dev Testing

Create Index

PUT my-knn-index-61
{
	"settings": {
		"index": {
			"knn": true,
			"number_of_shards": 1,
			"number_of_replicas": 0,
			"refresh_interval": "1s"
		}
	},
	"mappings": {
		"properties": {
			"my_vector1": {
				"type": "knn_vector",
				"dimension": 8,
				"mode": "on_disk",
				"compression_level": "32x"
			}
		}
	}
}

Ingest 5 documents

PUT _bulk?refresh=true
{ "index": { "_index": "my-knn-index-61", "_id": "17" } }
{ "my_vector1": [-6.78, 5.34, -8.12, 6.78, -4.12, 7.89, -3.45, 8.34] }
{ "index": { "_index": "my-knn-index-61", "_id": "74" } }
{ "my_vector1": [7.34, -6.45, 5.12, -7.78, 6.89, -4.34, 8.12, -5.67] }
{ "index": { "_index": "my-knn-index-61", "_id": "5" } }
{ "my_vector1": [-5.78, 7.12, -6.45, 8.34, -4.12, 7.89, -6.78, 5.34] }
{ "index": { "_index": "my-knn-index-61", "_id": "17644" } }
{ "my_vector1": [6.45, -8.34, 5.67, -7.89, 3.12, -6.78, 8.45, -4.12] }
{ "index": { "_index": "my-knn-index-61", "_id": "177322" } }
{ "my_vector1": [-7.12, 6.78, -4.56, 8.34, -5.67, 7.12, -3.34, 6.45] }

Delete a document

DELETE my-knn-index-61/_doc/17

Segments

GET _cat/segments/my-knn-index-61
[
	{
		"index": "my-knn-index-61",
		"shard": "0",
		"prirep": "p",
		"ip": "127.0.0.1",
		"segment": "_0",
		"generation": "0",
		"docs.count": "5",
		"docs.deleted": "0",
		"size": "4.3kb",
		"size.memory": "0",
		"committed": "false",
		"searchable": "true",
		"version": "9.12.0",
		"compound": "true"
	},
	{
		"index": "my-knn-index-61",
		"shard": "0",
		"prirep": "p",
		"ip": "127.0.0.1",
		"segment": "_1",
		"generation": "1",
		"docs.count": "0",
		"docs.deleted": "1",
		"size": "3.1kb",
		"size.memory": "0",
		"committed": "false",
		"searchable": "true",
		"version": "9.12.0",
		"compound": "true"
	}
]
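
As a convenience when reading this output, the following Python sketch lists segments that report no live documents. The helper name `segments_without_live_docs` is hypothetical, the rows are a trimmed copy of the output above, and treating the zero-doc segment as the one without a vector field is an assumption rather than something the output states.

```python
def segments_without_live_docs(cat_segments):
    """Return the names of segments whose docs.count is zero."""
    # _cat/segments reports counts as strings, so convert before comparing.
    return [row["segment"] for row in cat_segments if int(row["docs.count"]) == 0]


# Trimmed copy of the two rows shown above.
rows = [
    {"segment": "_0", "docs.count": "5", "docs.deleted": "0"},
    {"segment": "_1", "docs.count": "0", "docs.deleted": "1"},
]

print(segments_without_live_docs(rows))
```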

Search again (the query that previously errored)

GET my-knn-index-61/_search
{
	"query":{
		"knn":{
			"my_vector1": {
				"vector": [1,1,1,1,1,1,1,1],
				"k": 10
			}
		}
	}
}

Response

{
	"took": 61,
	"timed_out": false,
	"_shards": {
		"total": 1,
		"successful": 1,
		"skipped": 0,
		"failed": 0
	},
	"hits": {
		"total": {
			"value": 5,
			"relation": "eq"
		},
		"max_score": 0.0031684334,
		"hits": [
			{
				"_index": "my-knn-index-61",
				"_id": "177",
				"_score": 0.0031684334,
				"_source": {
					"my_vector1": [
						-7.12,
						6.78,
						-4.56,
						8.34,
						-5.67,
						7.12,
						-3.34,
						6.45
					]
				}
			},
			{
				"_index": "my-knn-index-61",
				"_id": "173",
				"_score": 0.0029043476,
				"_source": {
					"my_vector1": [
						-6.78,
						5.34,
						-8.12,
						6.78,
						-4.12,
						7.89,
						-3.45,
						8.34
					]
				}
			},
			{
				"_index": "my-knn-index-61",
				"_id": "175",
				"_score": 0.002883079,
				"_source": {
					"my_vector1": [
						-5.78,
						7.12,
						-6.45,
						8.34,
						-4.12,
						7.89,
						-6.78,
						5.34
					]
				}
			},
			{
				"_index": "my-knn-index-61",
				"_id": "174",
				"_score": 0.002864083,
				"_source": {
					"my_vector1": [
						7.34,
						-6.45,
						5.12,
						-7.78,
						6.89,
						-4.34,
						8.12,
						-5.67
					]
				}
			},
			{
				"_index": "my-knn-index-61",
				"_id": "176",
				"_score": 0.0027358374,
				"_source": {
					"my_vector1": [
						6.45,
						-8.34,
						5.67,
						-7.89,
						4.12,
						-6.78,
						8.45,
						-4.12
					]
				}
			}
		]
	}
}

Related Issues

Ref: #2278

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@naveentatikonda naveentatikonda merged commit 2d1a408 into opensearch-project:main Nov 19, 2024
42 checks passed
@navneet1v navneet1v changed the title Fixing the bug when a segment has no vector field present for disk ba… Fixing the bug when a segment has no vector field present for disk based vector search Nov 19, 2024
@navneet1v

Since this problem was caused by fieldinfo being null, I will raise a separate PR to fix that so we don't face this issue in the future.

@navneet1v navneet1v added the backport 2.x and Bug Fixes (Changes to a system or product designed to handle a programming bug/glitch) labels Nov 20, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Nov 20, 2024
…sed vector search (#2281)

Signed-off-by: Navneet Verma <[email protected]>
(cherry picked from commit 2d1a408)
navneet1v added a commit that referenced this pull request Nov 20, 2024
…sed vector search (#2281) (#2282)

Signed-off-by: Navneet Verma <[email protected]>
(cherry picked from commit 2d1a408)

Co-authored-by: Navneet Verma <[email protected]>