How to get the id of the original splitted Document #3017

wilsonlimaneto · 2022-08-10T12:12:59Z

Hello! I have read the documentation, the API reference, and the Q&A, but I still could not find a way to split an original Document (the parent), save the resulting split Documents (each with a new id and its own embeddings) into Milvus, run a query, and get back both the split Documents and the id of their parent Document. My basic idea is to save smaller fragments of a long text into Milvus and later query for the whole Document. Thanks.

Answered by TuanaCelik

TuanaCelik · 2022-08-12T07:18:03Z

Hi @wilsonlimaneto - sorry it has taken a couple of days to respond. It sounds to me like you could make use of the PreProcessor. You could do one of two things: 1. use the split_length argument when you first preprocess the files to create Documents, or 2. since you already seem to have Documents, use the split function, which works on a single document.

As for their ids, you could use the meta field to store the parent ids, or you could use id_hash_keys to construct an id that also includes the parent id.

Here is the API reference for that: https://haystack.deepset.ai/reference/preprocessor
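
To make the meta-field idea concrete, here is a minimal sketch in plain Python. It does not use the Haystack API: `Doc`, `split_with_parent`, `parent_ids`, and the `parent_id` / `split_index` meta keys are all hypothetical names chosen for illustration. It assumes each fragment simply carries the parent's id in its metadata, and that the fragment id is hashed from the content plus that parent id, which is roughly the idea behind including extra fields in the id hash.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class Doc:
    """Hypothetical stand-in for a document: text plus free-form metadata."""
    content: str
    meta: dict = field(default_factory=dict)
    id: str = ""

    def __post_init__(self):
        if not self.id:
            # Hash the content together with the parent id and split index, so
            # identical text from different parents still gets distinct ids.
            key = "|".join([
                self.content,
                str(self.meta.get("parent_id", "")),
                str(self.meta.get("split_index", "")),
            ])
            self.id = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def split_with_parent(parent: Doc, split_length: int) -> List[Doc]:
    """Split by words and record the parent's id in every fragment's meta."""
    words = parent.content.split()
    chunks = []
    for start in range(0, len(words), split_length):
        meta = {
            **parent.meta,
            "parent_id": parent.id,
            "split_index": start // split_length,
        }
        text = " ".join(words[start:start + split_length])
        chunks.append(Doc(content=text, meta=meta))
    return chunks


def parent_ids(hits: List[Doc]) -> List[str]:
    """Return unique parent ids in the order their fragments were retrieved."""
    return list(dict.fromkeys(hit.meta["parent_id"] for hit in hits))


parent = Doc(content="one two three four five six seven", id="parent-1")
chunks = split_with_parent(parent, split_length=3)
```

After a query returns some fragments, `parent_ids(hits)` gives the ids of the original documents, which can then be looked up wherever the full texts are stored.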

Hope this helps. Let me know 😊

1 reply

wilsonlimaneto Aug 12, 2022
Author

Thank you so much. Actually, that is exactly what I ended up doing: using the meta field to keep track of the original Document.
