Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat] PineconeGrpcFuture implements concurrent.futures.Future #410

Merged
merged 8 commits into from
Nov 4, 2024

Conversation

jhamon
Copy link
Collaborator

@jhamon jhamon commented Oct 31, 2024

Problem

GRPCIndex has long had limited and poorly documented support for async operations via the futures interface of the grpc library. I've recently been trying to implement query_namespaces using these futures, and discovered that unfortunately the grpc futures implementation is not compatible with the concurrent.futures package in the standard library. This makes them pretty much useless for anything at all complicated because the grpc library doesn't provide any utils for synchronization or waiting.

Solution

A class called PineconeGrpcFuture was added in the past as a minimal wrapper around the future that is emitted by grpc. These futures objects are used to represent asynchronous computation, and allow you to regisiter callbacks with add_done_callback. This is similar to calling then() on a javascript promise.

The original purpose of our PineconeGrpcFuture wrapper class seems to have been to implement some basic (very basic) error mapping, but for this diff I decided to extend the class to implement the concurrent.futures.Future interface. This allows the instances of PineconeGrpcFuture to be used with concurrent.futures.as_completed and concurrent.futures.wait utilities, which makes them dramatically more ergonomic to deal with. Unfortunately the grpc future is not compatible with the concurrent.future package out of the box.

For the unit tests of PineconeGrpcFuture, I had to make heavy use of mocking because all the various grpc classes are tightly coupled and can't be simply setup without performing actual network calls. This doesn't give me huge confidence it's actually working as expected, so as a sanity check I added some additional integration test coverage for upsert, fetch, and delete using concurrent.futures.wait.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan

Added unit and integration tests

@jhamon jhamon marked this pull request as ready for review November 4, 2024 17:35
@@ -19,9 +19,27 @@ class MockUpsertDelegate:
def __init__(self, upsert_response: UpsertResponse):
self.response = upsert_response

def result(self, timeout):
def result(self, timeout=None):
Copy link

@aulorbe aulorbe Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just OOC, did you set timeout to None here b/c you automatically set it to 5s if it's None in future.py?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This MockUpsertDelegate is a standin for the grpc future returned from the grpc runner. That object has this interface: https://grpc.github.io/grpc/python/_modules/grpc.html#Future.result

I adjusted the signature to match the result args in the interface.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah thanks!

Copy link

@aulorbe aulorbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, super clean and modular. Thanks for letting me take a look :)

Copy link
Contributor

@austin-denoble austin-denoble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really clean approach, LGTM. Thanks for all of the testing coverage too, very thorough.

And of course thanks for breaking these up into smaller prs.

else:
return PineconeException("Unknown GRPC error")

# def __repr__(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove or uncomment? Assuming these were for debugging.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thanks for calling out. I'll clean these up.

namespace: Optional[str] = None,
async_req: Optional[bool] = False,
**kwargs,
) -> Union[FetchResponse, PineconeGrpcFuture]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my understanding... Union is a way to have an "Either" return type? Or is it stating more the future ultimately has a FetchResponse shape once the promise is resolved?

Other question... is this ultimately preferred in Python over having two functions? (e.g. fetch and fetch_async)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my understanding... Union is a way to have an "Either" return type? Or is it stating more the future ultimately has a FetchResponse shape once the promise is resolved?

Yes, Union[FetchResponse, PineconeGrpcFuture] means you will get back either a FetchResponse or PineconeGrpcFuture.

Other question... is this ultimately preferred in Python over having two functions? (e.g. fetch and fetch_async)?

I would strongly prefer to have methods with clear return types that don't require any casting or checking to know what you're getting, but if you look elsewhere in this class you'll see this is the general pattern used elsewhere in this class that has been present since before my time. It's something I would like to address in the future, but can't do it right now without creating a breaking change.

Eventually I would like to have something more like this:

# Sync interface, implemented for REST and GRPC
class SyncIndex(abc.ABC):
   def fetch(...) -> FetchResponse
class Index(SyncIndex):
   def fetch(...) -> FetchResponse
class GRPCIndex(SyncIndex):
   def fetch(...) -> FetchResponse

# Asyncio interface, implemented for REST and GRPC
class AsyncioIndex(abc.ABC):
   async def fetch(...) -> Awaitable[FetchResponse]
class AsyncioIndex(AsyncioIndex):
   async def fetch(...) -> Awaitable[FetchResponse]
class AsyncioIndexGRPC(AsyncioIndex):
   async def fetch(...) -> Awaitable[FetchResponse]


# Plus legacy async approaches, currently accessed with async_req=True kwarg
# These return types differ, so can't easily be grouped under one interface. Maybe with
# generics, which have some limited support in Python.

# Async, REST, using multiprocessing
class IndexAsync:
   def fetch(...) -> ApplyResult[FetchResponse]

# Async, GRPC, using futures
class GRPCIndexAsync:
   def fetch(...) -> PineconeGrpcFuture

@jhamon jhamon merged commit 463c30d into main Nov 4, 2024
85 checks passed
@jhamon jhamon deleted the jhamon/grpc-futures branch November 4, 2024 23:57
@haruska
Copy link
Contributor

haruska commented Nov 5, 2024

@jhamon thanks for the explanation. As an aside:

Just for my understanding... Union is a way to have an "Either" return type? Or is it stating more the future ultimately has a FetchResponse shape once the promise is resolved?

Yes, Union[FetchResponse, PineconeGrpcFuture] means you will get back either a FetchResponse or PineconeGrpcFuture.

The naming is a bit confusing. I would expect it to be some kind of set union. However, the entire concept of an "Either" response type as you mentioned is super odd.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants