How to retrieve the source documents actually used for an answer in a generative RAG pipeline? #8441

fhamborg · 2024-10-07T10:44:38Z

In a basic generative RAG pipeline (cf. https://haystack.deepset.ai/tutorials/27_first_rag_pipeline), how does Haystack support retrieving the documents the LLM actually used to produce its answer? I noticed that adding a component such as an AnswerBuilder makes the pipeline return not only the LLM's answer but also the documents returned by the retriever. Without it, only the last component's result, i.e., the LLM's answer, is returned.

However, this yields all documents fed into the LLM, i.e., every document the retriever returned. How can I get only those the answer is actually built on? I noticed the reference_pattern parameter (cf. https://docs.haystack.deepset.ai/docs/answerbuilder), which sounds related, but I couldn't find any information on how to use it (if it is relevant at all).

Answered by davidsbatista

davidsbatista (Maintainer) · 2024-10-10T13:31:26Z

Hi @fhamborg, you need to ask for that in the prompt. Here is an example:

You are an expert in the subject matter.
You provide answers based on the following context.

Instructions:
- Answer the question truthfully using the information provided.
- If multiple documents contain relevant information, combine them to
form a comprehensive answer.
- Reference your sources clearly by number, for example, [1] for the
first source.
- Do not use information outside of the provided sources.
- If no relevant information is found, state that directly.

Documents:

{% for document in documents %}
    Document[{{ loop.index }}]:
    {{ document.content }}
{% endfor %}

Question: {{ question }}
Answer:

The two key points are: give each document a reference number in the prompt, and instruct the LLM to cite those numbers in its answer.
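Once the LLM cites sources as [1], [2], and so on, the cited documents can be recovered by parsing those markers out of the reply. The following is a minimal sketch in plain Python, not Haystack's own implementation: the `Doc` class and the `cited_documents` helper are hypothetical names introduced for illustration, and the regular expression assumes citations use the bracketed 1-based numbering from the prompt above. The AnswerBuilder's reference_pattern parameter mentioned in the question appears intended for this kind of parsing, but its exact behavior is not reproduced here.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (not a Haystack class)."""
    content: str


def cited_documents(reply: str, documents: list[Doc],
                    pattern: str = r"\[(\d+)\]") -> list[Doc]:
    """Return the documents whose 1-based number is cited in the reply.

    Documents are returned in order of first citation, without duplicates.
    """
    seen: set[int] = set()
    cited: list[Doc] = []
    for match in re.finditer(pattern, reply):
        number = int(match.group(1))
        # Skip repeated citations and numbers that match no provided document
        # (the LLM may hallucinate a reference such as [7]).
        if number in seen or not 1 <= number <= len(documents):
            continue
        seen.add(number)
        cited.append(documents[number - 1])  # the prompt numbers from 1
    return cited


# Example data, invented for illustration only.
docs = [
    Doc("Berlin is the capital of Germany."),
    Doc("Paris is the capital of France."),
    Doc("Rome is the capital of Italy."),
]
reply = "The capital of France is Paris [2]. Rome is in Italy [3][2]."
used = cited_documents(reply, docs)
print([d.content for d in used])
```

This only reports what the model claims to have used. A model can cite a document it did not rely on or omit one it did, so the result is best treated as an approximation.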

fhamborg (Author) · 2024-10-10T15:43:17Z

Thanks a lot :)
