[Meta] Remote Vector Index Build Component in OpenSearch Vector Engine #2391

jed326 opened this issue Jan 14, 2025 · 0 comments
Assignees: jed326
Labels: Features, indexing-improvements, Roadmap:Vector Database/GenAI


Description

This is a meta issue to track the work needed to design and implement the component described by the previous RFC:

Background

The following content is copied from previous RFCs for reference.

Components Definition

  1. Vector Index Build Component: This is a new component proposed as part of this design. It is responsible for building a vector index for a segment and then notifying the Vector Engine that the index is ready to download. The index build component can run on GPU machines, CPU machines, or any customer hardware.
  2. OpenSearch Vector Engine (k-NN plugin): This is the k-NN plugin, which provides vector-related capabilities in OpenSearch.
  3. Object Store/Intermediate Vector Store: This is an intermediate storage component that temporarily holds the vectors and the built index so the other components can exchange them. The store must have the right deletion policies in place to remove these vectors. It is a separate store, not the one that holds segments for the Remote Store feature in OpenSearch.
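
As a rough illustration of the "temporary storage with deletion policies" requirement on the intermediate store, here is a minimal sketch that assumes a simple time-to-live (TTL) policy. The class name `IntermediateVectorStore`, its methods, and the TTL policy itself are hypothetical; the RFC does not specify which deletion policy or storage API will be used.

```python
import time
from typing import Dict, Optional, Tuple


class IntermediateVectorStore:
    """Hypothetical in-memory stand-in for the intermediate object store."""

    def __init__(self, ttl_seconds: float) -> None:
        # Assumption: every object lives for the same fixed TTL.
        self.ttl_seconds = ttl_seconds
        # key -> (expiry timestamp, payload)
        self._objects: Dict[str, Tuple[float, bytes]] = {}

    def put(self, key: str, data: bytes, now: Optional[float] = None) -> None:
        """Store a payload (vectors or a built index) under a key."""
        now = time.time() if now is None else now
        self._objects[key] = (now + self.ttl_seconds, data)

    def get(self, key: str, now: Optional[float] = None) -> Optional[bytes]:
        """Return the payload, or None if it is missing or has expired."""
        now = time.time() if now is None else now
        entry = self._objects.get(key)
        if entry is None:
            return None
        expires_at, data = entry
        if expires_at <= now:
            del self._objects[key]
            return None
        return data

    def purge_expired(self, now: Optional[float] = None) -> int:
        """Delete every expired object and return how many were removed."""
        now = time.time() if now is None else now
        expired = [k for k, (exp, _) in self._objects.items() if exp <= now]
        for key in expired:
            del self._objects[key]
        return len(expired)
```

The `now` parameter exists only so the expiry behavior can be exercised deterministically. A real object store would more likely rely on its own lifecycle rules than on client-side purging.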

Assumptions

Below are some of the assumptions made while designing the integration of the Vector Engine with the remote Index Build Service.

  1. This design assumes that there is an IndexBuildService hosted at an endpoint where the Vector Engine can submit index creation requests.
  2. The Vector Engine has all the details needed to connect to the object store to stream vectors and download the index.
  3. Add what will be provided from customer
  4. Building an index via a remote endpoint will be supported for indices that use NativeKNNVectorsFormat and for old indices that use DocValuesFormat (the old index format, due to limitations on accessing different index/mapping-related attributes at the codec level).

Roles and Responsibilities

  1. Given the details of the object store, upload the vector data to the store and download the index once it is created.
  2. Given an endpoint for the Index Build Service, call the remote endpoint to create the vector index for each segment.
  3. Decide intelligently when to use the Index Build Service endpoint and when to build the index with local compute.
  4. Once the index is built, download it from the remote store and place it alongside the other segment files.
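
The four responsibilities above can be sketched as a single per-segment flow. This is only an illustration under stated assumptions: `InMemoryObjectStore`, `should_build_remotely`, `build_segment_index`, the JSON serialization, and the vector-count threshold are all hypothetical. The RFC does not define the decision logic or the service API, and the actual k-NN plugin is written in Java.

```python
import json
from typing import Callable, Dict, List

Vectors = List[List[float]]


class InMemoryObjectStore:
    """Hypothetical stand-in for the intermediate object store."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def upload(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def download(self, key: str) -> bytes:
        return self._objects[key]


def should_build_remotely(
    num_vectors: int, remote_available: bool, min_vectors_for_remote: int
) -> bool:
    # Assumption: a size threshold stands in for the "intelligent logic";
    # the real decision criteria are not specified in the source.
    return remote_available and num_vectors >= min_vectors_for_remote


def build_segment_index(
    segment: str,
    vectors: Vectors,
    store: InMemoryObjectStore,
    remote_build: Callable[[str], str],
    local_build: Callable[[Vectors], bytes],
    remote_available: bool = True,
    min_vectors_for_remote: int = 10_000,
) -> bytes:
    """Return index bytes for one segment, built remotely or locally."""
    # Responsibility 3: choose between the remote service and local compute.
    if not should_build_remotely(
        len(vectors), remote_available, min_vectors_for_remote
    ):
        return local_build(vectors)

    # Responsibility 1: upload the segment's vectors to the object store.
    vector_key = f"{segment}/vectors"
    store.upload(vector_key, json.dumps(vectors).encode("utf-8"))

    # Responsibility 2: ask the build service to create the index. It is
    # assumed to return the object-store key where it wrote the index.
    index_key = remote_build(vector_key)

    # Responsibility 4: download the index so it can be placed alongside
    # the other segment files.
    return store.download(index_key)
```

Passing `remote_build` and `local_build` as callables keeps the sketch independent of any particular client or native library. In practice the remote call would likely be asynchronous, with the Vector Engine waiting for a "ready to download" notification before fetching the index.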

Tasks

High Level Designs:

POC Implementations:

  • POC - Remote Object Store Upload/Download
  • POC - Remote Vector Build Service Client

Low Level Designs:

  • LLD - Bringing It All Together

Final Implementations (PR Tracking)

@jed326 added the Features, indexing-improvements, and Roadmap:Vector Database/GenAI labels Jan 14, 2025
@jed326 jed326 self-assigned this Jan 14, 2025