RFC: Haystack OPEA Integration #222

Open: wants to merge 2 commits into main

gadmarkovits

This RFC is used to discuss an implementation of an OPEA integration for Haystack.

Signed-off-by: Gad Markovits <[email protected]>
@mkbhanda (Collaborator) left a comment

Thank you for the RFC @gadmarkovits. Some questions inline.


4. GenAIEval

The evaluation, benchmark, and scorecard suite for OPEA, targeting performance (throughput and latency), accuracy on popular evaluation harnesses, safety, and hallucination.
Collaborator

This feels like a dangling sentence. What, if anything, will be delivered as part of the integration here?

Author

You're correct, this shouldn't be here - removed.


2. OPEA Text Embedder

This component will receive text input and embed it using an OPEA embedding microservice.
Collaborator

What is the difference between the text embedder and the document embedder? If the text is long, might it need chunking too?

Author

They're very similar. The split mainly exists to conform with similar Haystack integrations and to allow embedding of both raw text and Document objects.
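
For illustration only, here is a minimal sketch of the text/document split discussed in this thread. It assumes the usual Haystack convention that a text embedder takes a single string and returns an `embedding`, while a document embedder takes a list of Document objects and returns them with their embeddings filled in. It is not the integration's actual code: the class names, the stand-in `Document` dataclass, and the injected `embed_fn` (a placeholder for a call to an OPEA embedding microservice) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a call to an OPEA embedding microservice:
# takes a batch of strings and returns one vector per string.
EmbedFn = Callable[[List[str]], List[List[float]]]


@dataclass
class Document:
    """Minimal stand-in for Haystack's Document (content plus optional embedding)."""
    content: str
    embedding: Optional[List[float]] = None


class SketchTextEmbedder:
    """Embeds one raw string, e.g. a user query at retrieval time."""

    def __init__(self, embed_fn: EmbedFn):
        self.embed_fn = embed_fn

    def run(self, text: str) -> Dict[str, List[float]]:
        # Wrap the string in a one-element batch and unwrap the single vector.
        return {"embedding": self.embed_fn([text])[0]}


class SketchDocumentEmbedder:
    """Embeds a list of Documents and stores each vector on its Document."""

    def __init__(self, embed_fn: EmbedFn):
        self.embed_fn = embed_fn

    def run(self, documents: List[Document]) -> Dict[str, List[Document]]:
        vectors = self.embed_fn([doc.content for doc in documents])
        for doc, vector in zip(documents, vectors):
            doc.embedding = vector
        return {"documents": documents}


def fake_embed(texts: List[str]) -> List[List[float]]:
    """Toy embedding for illustration only: [character count, word count]."""
    return [[float(len(t)), float(len(t.split()))] for t in texts]


query = SketchTextEmbedder(fake_embed).run("What is OPEA?")
docs = SketchDocumentEmbedder(fake_embed).run(
    [Document("OPEA is an open platform."), Document("Haystack builds pipelines.")]
)
```

Both classes share the same embedding call and differ only in input and output shape, which is why the two components can look nearly identical while still fitting different positions in a pipeline (indexing documents versus embedding a query).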


Following a discussion with Haystack's technical team, it was agreed that a ChatQnA example, using this OPEA integration, would be a good way to showcase its capabilities. To support this, several component wrappers need to be implemented in the first version of the integration (other wrappers will be added gradually):

1. OPEA Document Embedder
Collaborator

Any inclusions or exclusions with respect to document types (Word, PDF, PPT, images, ...)?

Author

Embedding documents that are not purely textual is beyond the scope of this integration. We can think about adding document parsers/preprocessors as additional wrappers to OPEA's dataprep components at a later stage.
