Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP on threadpool impl of query_namespaces #405

Closed

Conversation

jhamon
Copy link
Collaborator

@jhamon jhamon commented Oct 25, 2024

Problem

I need a non-async interface providing a similar functionality as the query_namespaces currently implemented in GRPCIndexAsyncio.

Solution

Background info:

  • The multiprocessing ThreadPool is currently setup during the instantiation of Index. The constructor takes a param for number of pool_threads with a default value of 1.
  • When calling query with optional kwarg, async_req=True, the underlying (generated from openapi spec) code will execute the request using pool.apply_async. The result of these calls is an ApplyResult[QueryResult]. To get the results, from these we call .get() on the ApplyResult.

New query_namespaces on Index:

  • Copy some basic edge case stuff from the asyncio implementation. E.g. must have namespaces arg.
  • For each namespace, call query with async_req=True
  • Gather the resulting ApplyResults into a results array
  • Pass these results into QueryResultsAggregator, recently implemented and tested in [Feat] asyncio for grpc data interactions #398
  • It's expected this will be slower than the Asyncio implementation because we can't begin building the combined results heap until all requests have completed
  • Added some retry logic for requests made with thread pools

Todo:

  • Performance testing

Usage

from pinecone import Pinecone
import random

pc = Pinecone(
    api_key=os.environ['PINECONE_API_KEY'],
)

index = pc.Index(
    host="https://indexhost/",
    pool_threads=10
)

query_vec = [random.random()] * dimension

combined_results = index.query_namespaces(
    vector=query_vec,
    namespaces=["ns1", "ns2", "ns3", "ns4"],
    include_values=True,
    include_metadata=True,
    filter={"publication_date": {"$eq":"Last3Months"}},
    top_k=100
)

Dev build for manual testing

  • pip install pinecone==6.0.0.dev3

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan

Describe specific steps for validating this change.

attempts = 0
while attempts < retries:
try:
return func(*args, **kwargs) # Attempt to call __call_api

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a python expert, but isn;t there some library that provides a decorator to retry functions? seems like this should be a solved problem

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nothing in the standard library afaik, and I'm very hesitant to add third-party dependencies because of the overall dependency hell situation in python.

# aggregate them correctly. So we'll temporarily set topK to 2 for the
# subqueries, and then we'll take the topK=1 results from the aggregated
# results.
overall_topk = top_k if top_k is not None else 10

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this magic number 10?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a default value. Could be anything, but the API requires a value to be passed.

@jhamon
Copy link
Collaborator Author

jhamon commented Nov 12, 2024

Pursuing this in a different branch #409

@jhamon jhamon closed this Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants