feat(document-search): Option to choose between image and text embeddings for ImageElements #175

ludwiktrammer · 2024-11-06T11:33:15Z

Motivation

ImageElements always have an image and often also have text (added during processing). A single embedding can't include both visual and textual information, so currently (since #172) we try to create two embeddings (vector store entries) for each image element. That way we can retrieve an element using either of them.

We should add a way to decide which part of an ImageElement we want to create an embedding for: textual, visual, or both.

This decision should probably be made at the processing stage, because creating a textual representation of the image (with OCR / LLM) and then deciding not to use it during ingestion would be wasteful.

Implementation idea

Deciding whether to embed the textual representation

To support a flow where only the visual representation is embedded and the textual representation is not needed:

The description and ocr_extracted_text properties of ImageElement should be made optional
The return type of the get_key() method of Element should be made optional
ImageElement should return None from get_key() for images with no textual representation
DocumentSearch's ingest_elements method should skip creating a textual embedding for elements whose get_key() returns None
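A minimal sketch of the proposed behaviour, assuming a simplified stand-in for ImageElement (the `image_bytes` field and the `texts_to_embed` helper are hypothetical, and joining description and OCR text into one key is an assumption, not necessarily what ragbits does). It shows get_key() returning None when no textual representation exists, and ingestion skipping such elements for the textual embedding:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageElement:
    """Hypothetical, simplified stand-in for ragbits' ImageElement."""

    image_bytes: bytes
    description: Optional[str] = None  # optional, per the proposal
    ocr_extracted_text: Optional[str] = None  # optional, per the proposal

    def get_key(self) -> Optional[str]:
        # Assumption: the key joins whichever textual parts exist (empty ones are ignored).
        parts = [p for p in (self.description, self.ocr_extracted_text) if p]
        # None signals "no textual representation" to the ingestion step.
        return "\n".join(parts) if parts else None


def texts_to_embed(elements: list[ImageElement]) -> list[str]:
    """Hypothetical helper: collect keys for textual embeddings, skipping None keys."""
    return [key for el in elements if (key := el.get_key()) is not None]


elements = [
    ImageElement(b"...", description="A bar chart"),
    ImageElement(b"..."),  # e.g. produced by a processor that creates no text
]
print(texts_to_embed(elements))  # ['A bar chart']
```

In this sketch the choice is made entirely by the processor: if it never fills description or ocr_extracted_text, the element simply contributes no textual entry.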

The user would then choose whether to embed the textual representation of the image by picking the appropriate processor (one that creates the textual representation or one that doesn't).

Deciding whether to embed the visual representation

This part doesn't currently require any changes to our codebase: users can already decide whether they want a visual embedding by choosing an appropriate embedding model.

Supporting cases where both representations are to be embedded

Sometimes the configuration (described above) may indicate that both textual and visual representations should be embedded, which means creating two embeddings for a single Element. Currently we don't have good support for such a case. To enable it we should:

Add an embedding_type field to VectorStoreEntry (current options: text, image)
Change the VectorStoreEntry.id field so that it depends on both Element.id and VectorStoreEntry.embedding_type (so the UUID hashing should probably be moved here from Element.id?)
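A minimal sketch of the second point, assuming a simplified VectorStoreEntry (the `EmbeddingType` enum, the `element_id` string field, and the choice of `uuid.uuid5` with `NAMESPACE_OID` are illustrative assumptions, not the actual ragbits types). The entry id is a deterministic UUID derived from both the element id and the embedding type, so the textual and visual entries of one ImageElement get distinct but stable ids:

```python
import uuid
from dataclasses import dataclass
from enum import Enum


class EmbeddingType(str, Enum):
    """Hypothetical enum for the proposed embedding_type field."""

    TEXT = "text"
    IMAGE = "image"


@dataclass
class VectorStoreEntry:
    element_id: str  # stand-in for Element.id
    embedding_type: EmbeddingType
    vector: list[float]

    @property
    def id(self) -> uuid.UUID:
        # Hash element identity together with the embedding type, so two entries
        # for the same element never collide, yet re-ingestion yields the same id.
        return uuid.uuid5(uuid.NAMESPACE_OID, f"{self.element_id}:{self.embedding_type.value}")


text_entry = VectorStoreEntry("element-1", EmbeddingType.TEXT, [0.1, 0.2])
image_entry = VectorStoreEntry("element-1", EmbeddingType.IMAGE, [0.3, 0.4])
print(text_entry.id != image_entry.id)  # True
```

Using a name-based UUID (rather than a random one) keeps ids reproducible, which matters if re-ingesting a document is expected to overwrite its existing entries.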

Supporting cases where neither representation is to be embedded

Sometimes the configuration (described above) may indicate that neither representation should be embedded. This should generate a warning rather than an exception, because we still want to try to ingest the rest of the documents.
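A minimal sketch of the warn-and-continue behaviour, assuming a hypothetical `plan_embeddings` helper that receives the element's textual key (None when absent, as proposed above) and a flag saying whether the embedder supports images. It uses Python's standard `warnings` module; the real implementation may log differently:

```python
import warnings
from typing import Optional


def plan_embeddings(element_id: str, text_key: Optional[str], embed_images: bool) -> list[str]:
    """Hypothetical helper: decide which embedding types to create for one ImageElement."""
    planned = []
    if text_key is not None:
        planned.append("text")
    if embed_images:
        planned.append("image")
    if not planned:
        # Warn instead of raising, so ingestion can continue with the remaining documents.
        warnings.warn(f"Element {element_id} has no representation to embed; skipping it.")
    return planned


print(plan_embeddings("element-1", "A bar chart", True))  # ['text', 'image']
print(plan_embeddings("element-2", None, False))  # [] (and emits a warning)
```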


kdziedzic68 · 2024-11-26T11:03:00Z

If the element id already depends on element_type, isn't it enough to make the id of the vector store entry depend on it?

ludwiktrammer · 2024-11-27T09:09:47Z

> If the element id already depends on element_type, isn't it enough to make the id of the vector store entry depend on it?

No, because in both cases (an embedding based on the textual representation and an embedding based on the visual representation) the vector comes from the same ImageElement, so the element type will be the same.

ludwiktrammer added the "document search" (Changes to the document search package) and "feature" (New feature or request) labels Nov 6, 2024

ludwiktrammer added this to ragbits Nov 6, 2024

ludwiktrammer moved this to Backlog in ragbits Nov 6, 2024

ludwiktrammer self-assigned this Nov 6, 2024

ludwiktrammer mentioned this issue Nov 6, 2024

feat(document-search): Support for ingesting images #172

Merged

ludwiktrammer moved this from Backlog to Ready in ragbits Nov 18, 2024

ludwiktrammer removed their assignment Nov 18, 2024

mhordynski added this to the Ragbits 0.5 milestone Nov 18, 2024

mhordynski assigned kdziedzic68 Nov 22, 2024

kdziedzic68 linked a pull request Nov 26, 2024 that will close this issue

feat(document-search): Option to choose between image and text embeddings for ImageElements #205

Merged

kdziedzic68 closed this as completed in #205 Nov 29, 2024

github-project-automation bot moved this from In review to Done in ragbits Nov 29, 2024
