Updates to the SentenceWindowRetriever #8557

Open
sjrl opened this issue Nov 19, 2024 · 0 comments
Labels
P1 High priority, add to the next sprint

Contributor

sjrl commented Nov 19, 2024

Is your feature request related to a problem? Please describe.
We are finding that the SentenceWindowRetriever is a powerful RAG tool for some client projects, and we are testing it with some early prospects in particular. While using it, @JasperLS and I identified some changes that would make it easier to use.

Describe the solution you'd like

  • Ensure that the documents in context_documents are sorted by split_idx_start. We noticed that the documents are only sorted when merging them into one text blob in the merge_documents_text function.
  • Would it be possible to also output a list of merged documents, i.e. an output of type List[Document]? This would make it easier to use in a downstream prompt builder, since we typically also want to use the metadata of the resulting merged Document.
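
To illustrate both requests, here is a minimal sketch in plain Python. `Doc` is a stand-in for Haystack's Document, and `sort_window`, `merge_window`, and `merge_windows` are hypothetical helper names, not existing Haystack functions. It assumes every chunk carries `split_idx_start` in its meta and that the merged document should inherit the first chunk's metadata minus the split-specific keys; both are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Stand-in for a Haystack Document: text content plus a meta dict."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def sort_window(window: List[Doc]) -> List[Doc]:
    """Return the chunks of one window ordered by split_idx_start."""
    return sorted(window, key=lambda d: d.meta["split_idx_start"])


def merge_window(window: List[Doc]) -> Doc:
    """Merge one window into a single Doc, skipping overlapping text."""
    ordered = sort_window(window)
    text = ""
    last_end = 0
    for d in ordered:
        start = d.meta["split_idx_start"]
        # Skip characters already covered by the previous chunks.
        skip = max(last_end - start, 0)
        text += d.content[skip:]
        last_end = max(last_end, start + len(d.content))
    # Assumption: keep the first chunk's meta, minus split-specific keys.
    meta = {
        k: v
        for k, v in ordered[0].meta.items()
        if k not in ("split_id", "split_idx_start")
    }
    return Doc(content=text, meta=meta)


def merge_windows(context_documents: List[List[Doc]]) -> List[Doc]:
    """Turn a nested list of windows into a flat list of merged Docs."""
    return [merge_window(w) for w in context_documents if w]
```

With something like `merge_windows`, a downstream prompt builder could iterate over one flat list and still read each merged document's metadata.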

I realize that for the second request it is possible to work around this with a nested loop in a prompt, so I wouldn't say it's a hard requirement, but it would be more convenient than working with a List[List[Document]].
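
For reference, the nested-loop workaround looks roughly like the sketch below, written as plain Python rather than as a prompt template. `render_context` is a hypothetical helper and `file_name` is an assumed metadata key; the point is only that two levels of iteration are needed to reach each chunk and its metadata.

```python
from types import SimpleNamespace
from typing import Any, List


def render_context(context_documents: List[List[Any]]) -> str:
    """Build a prompt context string from nested windows of chunks."""
    lines = []
    # Outer loop: one window per retrieved document.
    for i, window in enumerate(context_documents, start=1):
        lines.append(f"Window {i}:")
        # Inner loop: the individual chunks inside that window.
        for doc in window:
            source = doc.meta.get("file_name", "unknown")
            lines.append(f"- [{source}] {doc.content}")
    return "\n".join(lines)


# Usage with simple stand-in objects that expose .content and .meta.
windows = [[
    SimpleNamespace(content="First chunk.", meta={"file_name": "a.txt"}),
    SimpleNamespace(content="Second chunk.", meta={}),
]]
prompt_context = render_context(windows)
```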

@julian-risch julian-risch added the P1 High priority, add to the next sprint label Nov 21, 2024