Components expecting List[...] input should use Iterable[...] instead #8494

Open

bendavis78 opened this issue Oct 26, 2024 · 1 comment

bendavis78 commented Oct 26, 2024

For example, let's say I have a generator that yields my documents:

def get_documents():
    # Lazily yield documents one at a time instead of collecting them into a list
    yield from some_other_generator()

If I try to pass this generator to the OpenAIDocumentEmbedder, I get an error saying the input must be a list, so I am forced to exhaust the Python generator and load all documents into memory at once.

Many components in the Haystack library require a List where an Iterable would do just fine. For example, OpenAIDocumentEmbedder could be updated to accept Python generators as well as lists, making the process more memory efficient.
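
As a workaround today (and as a sketch of what Iterable support could look like), the caller can re-batch a generator into fixed-size lists before handing them to the embedder. This is only a minimal sketch: the batched helper below is hypothetical (not part of Haystack), and embedder is assumed to be an already-constructed OpenAIDocumentEmbedder:

from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    # Yield lists of up to `size` items without materializing the whole input
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

# Feed the embedder fixed-size batches instead of one giant list
for batch in batched(get_documents(), 32):
    embedder.run(documents=batch)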

bendavis78 changed the title from "Components expecting List[...] input should use Iterable" to "Components expecting List[...] input should use Iterable[...] instead" on Oct 26, 2024
bendavis78 (Author) commented Nov 18, 2024

Here's another example of this issue I came across:

Because there are a large number of documents in my source data, I have to batch my runs, calling pipeline.run() once per batch. This is fine as long as the number of documents stays constant throughout the pipeline. (I would have liked to write a fetcher component that yields documents into the pipeline, but since fetching is the first step, I can easily pull it out into a separate function.)

Now, let's say we add a splitter to our pipeline. Ideally the splitter could yield each chunk, and the embedder could consume those chunks in batches of 32. However, the way components are currently designed makes this impossible.

Because of this, we now have two distinct batch sizes to consider: 1) the number of input documents prior to splitting, and 2) the number of post-split documents (chunks) sent in each request to the embedding model. This can result in uneven batches being sent to the embedder, potentially causing more requests than are actually needed. If components consumed iterables, the two could collapse into one, as sketched below.
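
Here is a rough sketch of what that could look like. It reuses the hypothetical batched helper and the get_documents generator from above, and assumes splitter.run(documents=[...]) returns its chunks under a "documents" key, as Haystack components conventionally do; none of this works with the current List-only signatures:

def iter_chunks(docs, splitter):
    # Lazily split each document and stream out the resulting chunks
    for doc in docs:
        yield from splitter.run(documents=[doc])["documents"]

# Every request to the embedding model now carries exactly 32 chunks
# (except possibly the last), regardless of how the inputs were batched.
for batch in batched(iter_chunks(get_documents(), splitter), 32):
    embedder.run(documents=batch)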
