Remote Vector Index Build Component — Object Store Upload/Download #2392
See #2391 for background information
Overview
Following up on the RFCs, this is the first part of the low-level design for the Vector Index Build Component. The Vector Index Build Component is a logical component that we further split into two subcomponents with their respective responsibilities:
This document contains the low-level design for [1] the Object Store I/O Component, covering how we can upload vectors to and download graph files from a remote object store, as well as how we can configure the object store. The low-level design for the remote vector service client is covered in a separate issue.
Alternatives Considered
The specific problem we are addressing in this design is [1] how to upload vectors to a remote object store from the vector engine and [2] how to download a graph file from a remote object store to the vector engine.
For discussion on high level architectural alternatives see: #2293
1. [Recommended] Integrate Repository Service with Vector Engine
This approach involves consuming the RepositoriesService in the k-NN plugin, which will then use the BlobContainer interface to read/write blobs to the remote repository.
Pros:
Cons:
2. Custom object store upload/download client
Instead of using the existing interfaces, we build our own custom blob interface for uploading vectors and downloading graph files. Additionally, we would need to implement all of the object store clients ourselves.
Pros:
Cons:
Repository Service Integration
Since the OpenSearch Plugin interface makes the RepositoriesService (the service responsible for maintaining and providing access to snapshot repositories) available to plugins, the k-NN plugin can consume this service and use it to read from and write to any supported remote object store.
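To illustrate the read/write flow this integration enables, below is a minimal sketch. `SimpleBlobContainer` is a stand-in that mirrors the shape of OpenSearch's `BlobContainer` (`writeBlob`/`readBlob`); the in-memory implementation and blob names are illustrative only, not the real repository wiring:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Stand-in mirroring BlobContainer's write/read methods (illustrative only).
interface SimpleBlobContainer {
    void writeBlob(String blobName, InputStream in, long blobSize, boolean failIfAlreadyExists) throws IOException;
    InputStream readBlob(String blobName) throws IOException;
}

// In-memory implementation so the sketch is runnable without a real object store.
public class InMemoryBlobContainer implements SimpleBlobContainer {
    private final Map<String, byte[]> blobs = new HashMap<>();

    @Override
    public void writeBlob(String blobName, InputStream in, long blobSize, boolean failIfAlreadyExists) throws IOException {
        if (failIfAlreadyExists && blobs.containsKey(blobName)) {
            throw new IOException("blob [" + blobName + "] already exists");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        blobs.put(blobName, out.toByteArray());
    }

    @Override
    public InputStream readBlob(String blobName) throws IOException {
        byte[] data = blobs.get(blobName);
        if (data == null) {
            throw new IOException("blob [" + blobName + "] not found");
        }
        return new ByteArrayInputStream(data);
    }
}
```

In the real integration, the container would come from the registered vector repository via the RepositoriesService rather than being constructed directly.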
High Level Class Diagram:
Repository Configuration, Creation, & Validation
In order to use the RepositoriesService, we need to configure and create our own vector repository. For comparison, at a high level remote store does this by exposing node attribute settings (see docs) that are used to create and register a repository on the boot up of OpenSearch nodes and on the formation of the OpenSearch cluster. The key difference between our use case and remote store is that the vector repository does not need to be registered before system indices are created, meaning the vector repository creation can technically happen post node startup.
More specifically, remote store does the following (see Remote Store Repository Creation Design):
For our use case, the problem is that the map in [3] is not extensible for plugins, as there is an explicit isEmpty check on it. Therefore, if we reused the logic from RemoteStoreNodeService#createAndVerifyRepositories to register our vector repository, it would interfere with remote store registering its own repositories.
Given this problem, the following are the possible ways we can configure, create, and validate a vector repository from the k-NN plugin.
Consume RepositoriesService in NativeEnginesKnnVectorsWriter
Since we want to perform blob upload/download during the merge/flush operations, the KnnVectorsWriter class needs to have a reference to the RepositoriesService in order to perform the upload/downloads. The following refactoring will be required:
Vector Input Stream Conversion
The BlobContainer#writeBlob and BlobContainer#readBlob methods both take the data to be written in the form of an InputStream, so we will need to implement logic to buffer KNNVectorValues into an InputStream. Depending on object upload performance analysis and benchmarking, this may require a follow-up deep dive to make this process more efficient.
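The buffering described above can be sketched as follows. This is a simplified, hypothetical version that materializes all vectors up front (a production version would stream in bounded chunks); it serializes each vector as little-endian IEEE-754 floats, similar to the raw float layout of Lucene's flat vector files:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: turn an iterable of float[] vectors into one InputStream
// suitable for BlobContainer#writeBlob. KNNVectorValues iteration is simulated
// with a plain Iterable here.
public class VectorStream {
    public static InputStream toInputStream(Iterable<float[]> vectors, int dimension) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Reusable per-vector buffer: dimension floats, little-endian.
        ByteBuffer buf = ByteBuffer.allocate(dimension * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (float[] v : vectors) {
            buf.clear();
            for (float f : v) {
                buf.putFloat(f);
            }
            out.write(buf.array(), 0, buf.position());
        }
        return new ByteArrayInputStream(out.toByteArray());
    }
}
```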
In the POC, where vectors are buffered one by one, the transfer of ~1.6M 768-dimensional vectors takes only ~1 minute to complete, so we can revisit the performance aspect here as needed.
See: 10M 768D dataset without source and recovery source(best time): GPU based Vector Index Build : POC with OpenSearch Vector Engine
Vector File Format
One of the key design choices will be how we format the vector file being written to the object store. Moreover, one of the key decisions from the RFC is we want the remote vector build service to be OpenSearch/Lucene version agnostic (see: link)
The following are possible ways we can format the uploaded vector file:
Lucene .vec format for reference
Blob Name & Blob Path
With the repository interface, a single bucket will be used for all of the vector blobs on a given domain. Therefore, we need to design the blob name and blob path to prevent key collisions that would result in the same vector blob being written concurrently. For snapshots (and remote store), only the primary shard data is uploaded to the repository, so the segments of a snapshot are uploaded to indices//. For more details see: OpenSearch Blog on snapshot structure. However, since we want to support both segment replication and document replication, we also have to account for both the primary and replica shard of a given index performing graph builds at the same time. In other words, adding to the file path is not sufficient for collision prevention.
For the blob name, the same shard may be performing flush/merge for multiple segments at the same time, so we can deduplicate on the segment name:
Segment names are never re-used by Lucene, so we do not have to worry about a future segment having the same name (ref: Lucene docs)
For the blob path, we have a few options to choose from:
Pros
1. This is the simplest solution and we do not have to worry about any replica deduplication logic
2. From a cost perspective, it does not make sense for users to use the remote vector build service when replicas are enabled anyway
Cons
1. Disallowing the remote vector build service when replicas are configured is not an intuitive user experience
Pros
1. This gracefully handles the replica key collision without randomly generating any new information
Cons
1. If the invariant of one replica per node ever changes, that will break this implementation. However, it is unlikely this would ever change.
2. In the case of a merge/flush operation failure and retry, there may be an edge case where the same segment name is re-created.
Pros
1. If multiple replicas per node are supported in the future, this will still work
2. Handles any edge cases where the same segment name may get recreated
Cons
1. The blob path/name becomes non-deterministic, which may make it more difficult to debug issues and handle retries; in particular, if we later move to an async graph build architecture, we would need to keep a mapping between the UUID and the shard.
The constructed graph file can reuse the same blobVectorFilePath + blobVectorName with a different file extension, as there is a 1:1 mapping between the uploaded vector blob and the downloaded graph file.
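Putting the naming pieces together, a hypothetical key scheme might look like the following. All path components, the primary/replica marker, and the file extensions are illustrative placeholders, not a decided format; the one grounded invariant is that the segment name (never recycled by Lucene) anchors the blob name, and the graph blob differs only by extension:

```java
// Hypothetical blob key scheme for vector upload and graph download.
public class BlobKey {
    // Path disambiguates index, shard, and which copy (primary vs replica)
    // is performing the build, so concurrent builds cannot collide.
    public static String vectorBlobPath(String indexUuid, int shardId, boolean primary, String nodeId) {
        return String.join("/", "vectors", indexUuid, Integer.toString(shardId),
                (primary ? "p" : "r") + "_" + nodeId);
    }

    // Blob name reuses the segment name; Lucene never re-uses segment names.
    public static String vectorBlobName(String segmentName) {
        return segmentName + ".knnvec";   // illustrative extension
    }

    // Same key as the vector blob, different extension (1:1 mapping).
    public static String graphBlobName(String segmentName) {
        return segmentName + ".knngraph"; // illustrative extension
    }
}
```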
Feature Controls
This section covers the implementation details of how we can [1] enable and disable the feature and [2] configure thresholds with which we decide whether or not to use the remote vector build GPU fleet or use the local CPU build path.
Feature Enablement:
Given the recommended solution in Repository Configuration, Creation, & Validation, we will not use any node attributes for this feature.
We also want to provide intelligent logic to automatically decide for the customer whether or not to use the remote GPU build feature. This will come in the form of some dynamic cluster and index settings for which we will provide smart default values based on benchmarking analysis.
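The routing decision described above could be sketched as follows. The setting names in the comments and the threshold semantics are assumptions for illustration; the actual defaults would come from the benchmarking analysis mentioned above:

```java
// Sketch of the remote-vs-local build decision. Thresholds stand in for
// hypothetical dynamic settings (e.g. knn.remote_index_build.min_vectors,
// knn.remote_index_build.min_size); names and defaults are illustrative.
public class RemoteBuildRouter {
    private final boolean featureEnabled;
    private final long minVectorCount;
    private final long minDataSizeBytes;

    public RemoteBuildRouter(boolean featureEnabled, long minVectorCount, long minDataSizeBytes) {
        this.featureEnabled = featureEnabled;
        this.minVectorCount = minVectorCount;
        this.minDataSizeBytes = minDataSizeBytes;
    }

    // Route to the remote GPU fleet only when the build is large enough
    // to justify the upload/download overhead; otherwise build locally on CPU.
    public boolean useRemoteBuild(long vectorCount, int dimension) {
        long dataSizeBytes = vectorCount * dimension * (long) Float.BYTES;
        return featureEnabled && vectorCount >= minVectorCount && dataSizeBytes >= minDataSizeBytes;
    }
}
```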
Metrics
This section will cover metrics specific to the vector blob upload and graph blob download. Other metrics related to triggering the vector build will be covered in Vector Index Build Component — Remote Vector Service Client. As we are dealing with only blob upload/download here, we can scope down the metrics to the following:
Today the k-NN stats API only supports cluster- and node-level stats, so we can gather these metrics at the cluster/node level and expose them via the k-NN stats API.
As a separate item, we should explore supporting index/shard-level k-NN stats, as it would be valuable to see which indices are using and benefiting the most from the remote vector build service.
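A minimal sketch of the node-level counters that could back these stats is shown below; the metric names are illustrative, not the final stat keys:

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical node-level upload/download counters for the k-NN stats API.
public class RemoteBuildStats {
    public final LongAdder uploadSuccessCount = new LongAdder();
    public final LongAdder uploadFailureCount = new LongAdder();
    public final LongAdder uploadedBytes = new LongAdder();
    public final LongAdder downloadSuccessCount = new LongAdder();
    public final LongAdder downloadFailureCount = new LongAdder();
    public final LongAdder downloadedBytes = new LongAdder();

    public void onUploadSuccess(long bytes) { uploadSuccessCount.increment(); uploadedBytes.add(bytes); }
    public void onUploadFailure() { uploadFailureCount.increment(); }
    public void onDownloadSuccess(long bytes) { downloadSuccessCount.increment(); downloadedBytes.add(bytes); }
    public void onDownloadFailure() { downloadFailureCount.increment(); }
}
```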
Failure Scenarios
Similar to Metrics, this section focuses specifically on the failure scenarios for blob upload/download. At a high level, we need to gracefully handle all failures and fall back to the CPU graph build path, as we cannot leave the segment without a graph file.
Since we are integrating with the existing BlobContainer interface, both retries and exceptions are already well defined by the interface:
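The fall-back contract described above can be sketched as follows; the Suppliers stand in for the real remote and local build paths, which are not modeled here:

```java
import java.util.function.Supplier;

// Sketch: any remote-path failure must degrade to the local CPU graph build
// so the segment never ends up without a graph file. The suppliers are
// illustrative stand-ins for the actual build paths.
public class GraphBuildFallback {
    public static <T> T buildWithFallback(Supplier<T> remoteBuild, Supplier<T> cpuBuild) {
        try {
            return remoteBuild.get();
        } catch (RuntimeException e) {
            // In the plugin we would log the remote failure here; the failure
            // must never propagate and fail the merge/flush itself.
            return cpuBuild.get();
        }
    }
}
```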
We will explore the additional failure scenarios related to vector graph build in the client design: Vector Index Build Component — Remote Vector Service Client
Performance Benchmarks
We will perform additional performance benchmarks related to blob upload/download, following up on the initial POC numbers referenced earlier, and based on benchmark results we can adjust the vector input stream conversion as needed.
End to end and remote vector build client benchmarking will be covered in a separate document.
Future Optimizations
Below are some future optimizations we can look into based on performance analysis: