[FEA]: Kinetica vector DB service #2058

Open
2 tasks done
am-kinetica opened this issue Nov 19, 2024 · 9 comments
Labels
external (This issue was filed by someone outside of the Morpheus team), feature request (New feature or request)

Comments

@am-kinetica

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

High

Please provide a clear description of problem this feature solves

We at Kinetica would like to provide an implementation of VectorDBService that works with the Kinetica database. The idea is to let the write-to-vector-DB stage of a pipeline output its data to Kinetica, as it currently does to Milvus.

Describe your ideal solution

A new module similar to milvus_vector_db_service.py.

Additional context

This would allow the nv_ingest microservice to be configured for use with the Kinetica database.

Code of Conduct

  • I agree to follow this project's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request
@am-kinetica am-kinetica added the feature request New feature or request label Nov 19, 2024
@morpheus-bot-test morpheus-bot-test bot added Needs Triage Need team to review and classify external This issue was filed by someone outside of the Morpheus team labels Nov 19, 2024
@morpheus-bot-test

Hi @am-kinetica!

Thanks for submitting this issue - our team has been notified and we'll get back to you as soon as we can!
In the meantime, feel free to add any relevant information to this issue.

@efajardo-nv
Contributor

Thanks @am-kinetica. This sounds great. Looking forward to the pull request.

@efajardo-nv efajardo-nv removed the Needs Triage Need team to review and classify label Nov 19, 2024
@am-kinetica
Author

am-kinetica commented Nov 21, 2024

@efajardo-nv

class VectorDBServiceFactory:

    @typing.overload
    @classmethod
    def create_instance(
            cls, service_name: typing.Literal["milvus"], *args: typing.Any,
            **kwargs: dict[str,
                           typing.Any]) -> "morpheus_llm.service.vdb.milvus_vector_db_service.MilvusVectorDBService":
        pass

    @classmethod
    @handle_service_exceptions
    def create_instance(cls, service_name: str, *args: typing.Any, **kwargs: dict[str, typing.Any]):
        """
        Factory for creating instances of vector database service classes. This factory allows dynamically
        creating instances of vector database service classes based on the provided service name.
        Each service name corresponds to a specific implementation class.

        Parameters
        ----------
        service_name : str
            The name of the vector database service to create.
        *args : typing.Any
            Variable-length argument list to pass to the service constructor.
        **kwargs : dict[str, typing.Any]
            Arbitrary keyword arguments to pass to the service constructor.

        Returns
        -------
            An instance of the specified vector database service class.

        Raises
        ------
        ValueError
            If the specified service name is not found or does not correspond to a valid service class.
        """
        module_name = f"morpheus_llm.service.vdb.{service_name}_vector_db_service"
        module = importlib.import_module(module_name)
        class_name = f"{service_name.capitalize()}VectorDBService"
        class_ = getattr(module, class_name)
        instance = class_(*args, **kwargs)
        return instance

Why is the create_instance method annotated as returning a MilvusVectorDBService instead of a VectorDBService? Is there a particular reason?

@efajardo-nv
Contributor

@am-kinetica Thanks for catching that. That's correct. Should be VectorDBService but shouldn't affect anything since it's overloaded. We'll get that updated.

@am-kinetica
Author

> @am-kinetica Thanks for catching that. That's correct. Should be VectorDBService but shouldn't affect anything since it's overloaded. We'll get that updated.

Thanks, doesn't affect anything, just looks restrictive.

@efajardo-nv
Contributor

efajardo-nv commented Nov 21, 2024

@am-kinetica I was mistaken. The use of the typing.overload decorator here is actually to allow for more precise type checking when milvus is passed to create_instance (i.e. expected return type would be MilvusVectorDBService). Looks like we need one for FaissVectorDBService as well.
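
The overload pattern described above can be sketched roughly as follows. This is an illustration only, not the Morpheus source: the three service classes are empty stand-ins, the `_REGISTRY` dictionary is a hypothetical replacement for the dynamic import, and constructor arguments are omitted.

```python
import typing


class VectorDBService:
    """Stand-in base class for this sketch."""


class MilvusVectorDBService(VectorDBService):
    """Stand-in for the Milvus implementation."""


class FaissVectorDBService(VectorDBService):
    """Stand-in for the FAISS implementation."""


# Hypothetical lookup table; the real factory imports the module by name instead.
_REGISTRY = {"milvus": MilvusVectorDBService, "faiss": FaissVectorDBService}


class VectorDBServiceFactory:

    # Each overload only informs the type checker; it is never executed.
    @typing.overload
    @classmethod
    def create_instance(cls, service_name: typing.Literal["milvus"]) -> MilvusVectorDBService:
        ...

    @typing.overload
    @classmethod
    def create_instance(cls, service_name: typing.Literal["faiss"]) -> FaissVectorDBService:
        ...

    # Fallback overload for service names not known statically.
    @typing.overload
    @classmethod
    def create_instance(cls, service_name: str) -> VectorDBService:
        ...

    # The single runtime implementation, annotated with the general base type.
    @classmethod
    def create_instance(cls, service_name: str) -> VectorDBService:
        return _REGISTRY[service_name]()


milvus_service = VectorDBServiceFactory.create_instance("milvus")
faiss_service = VectorDBServiceFactory.create_instance("faiss")
```

With overloads like these, a type checker infers `MilvusVectorDBService` for the literal `"milvus"` and falls back to `VectorDBService` for an arbitrary string, while runtime behavior is determined only by the last, undecorated definition.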

@am-kinetica
Author

> @am-kinetica I was mistaken. The use of the typing.overload decorator here is actually to allow for more precise type checking when milvus is passed to create_instance (i.e. expected return type would be MilvusVectorDBService). Looks like we need one for FaissVectorDBService as well.

Alright. That makes sense, I am going to put one in for KineticaVectorDBService as well.
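
For reference, here is a small sketch of the names the factory code quoted earlier would derive for a given service name. The two format strings are copied from that snippet; the helper `resolve_names` and the `kinetica` module are hypothetical and do not exist in the repository as of this thread.

```python
def resolve_names(service_name: str) -> tuple[str, str]:
    """Return the (module, class) names built the same way create_instance builds them."""
    module_name = f"morpheus_llm.service.vdb.{service_name}_vector_db_service"
    # str.capitalize() upper-cases the first letter and lower-cases the rest.
    class_name = f"{service_name.capitalize()}VectorDBService"
    return module_name, class_name


kinetica_names = resolve_names("kinetica")
milvus_names = resolve_names("milvus")
print(kinetica_names)
print(milvus_names)
```

Under that convention, a service name of `kinetica` would need a module named `kinetica_vector_db_service` exposing a class named `KineticaVectorDBService`.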

@am-kinetica
Author

am-kinetica commented Nov 26, 2024

@efajardo-nv
@bsuryadevara

Could you please provide a sample JSON file that works as input to milvus_vector_db_service through the write_to_vector_db stage? I have not found any such example, so setting up the right input for the Milvus pipeline I am trying to build is becoming a challenge.

@am-kinetica
Author

am-kinetica commented Nov 26, 2024

@efajardo-nv
@bsuryadevara

I have tried creating a sample input from the Zilliz interface that looks like this:

(morpheus) root@300-303-u28-vm04-v100:/workspace/examples/sample_milvus_pipeline# cat test.json
{
  "collectionName": "test_collection",
  "data": [
    {
      "id": 81,
      "metadata": "vozltxssn7l",
      "vector": [
        0.2659727795719654,
        0.8355436908247349,
        0.18610434690032607
      ]
    },
    {
      "id": 82,
      "metadata": "vozltxssn7l",
      "vector": [
        0.2659727795719654,
        0.8355436908247349,
        0.18610434690032607
      ]
    }
  ]
}

This works perfectly on Zilliz, but in the pipeline it throws an error: "Unable to upload dataframe entries to vector database: 'id'".

Any help would be much appreciated.
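
One possible cause, which is an assumption and not confirmed anywhere in this thread: the trailing 'id' in the message resembles a missing-key error, which would be consistent with the rows being read from the top level of the file, where only the wrapper keys collectionName and data exist. The sketch below uses only the standard library (values shortened) to show the difference and to unwrap the records into one JSON object per line. Whether the pipeline's file source accepts that layout is also an assumption.

```python
import json

# Payload shaped like the test.json above (vector values shortened).
payload = json.loads("""
{
  "collectionName": "test_collection",
  "data": [
    {"id": 81, "metadata": "vozltxssn7l", "vector": [0.27, 0.84, 0.19]},
    {"id": 82, "metadata": "vozltxssn7l", "vector": [0.27, 0.84, 0.19]}
  ]
}
""")

# At the top level there is no "id" field, only the wrapper keys.
top_level_keys = sorted(payload)

# Unwrap the records so each row carries id / metadata / vector directly.
records = payload["data"]
record_keys = sorted(records[0])

# Serialize as JSON Lines: one record per line.
json_lines = "\n".join(json.dumps(record) for record in records)

print(top_level_keys)
print(record_keys)
print(json_lines)
```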

Image
