Motivation
An ImageElement always has an image and often also has text (added during processing). A single embedding cannot include both visual and textual information, so currently (since #172) we try to create two embeddings (vector store entries) for each image element. That way an image can be retrieved through either representation.
We should add a way to decide which part of an ImageElement we want to create an embedding for: textual, visual or both.
This decision should probably be made at the processing level, because creating a textual representation of the image (with OCR / LLM) and then deciding not to use it during ingestion would be wasteful.
Implementation idea
Deciding whether to embed the textual representation
To support a flow where only the visual representation is embedded and the textual representation is not needed:
The description and ocr_extracted_text properties of ImageElement should be made optional
The return type of the get_key() method of Element should be made optional
ImageElement should return None from get_key() for images with no textual representation
DocumentSearch's ingest_elements method should skip creating a textual embedding for elements whose get_key() returns None
The user would then decide whether to embed the textual representation of the image by choosing the appropriate processor (one that creates the textual representation or one that doesn't).
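The changes listed above could look roughly like the following sketch. The ImageElement class here is a simplified stand-in written only for illustration, not the actual class from the codebase; the image_bytes field, the way get_key() joins the two text fields, and the texts_to_embed helper are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageElement:
    """Simplified stand-in for the real ImageElement (illustration only)."""

    image_bytes: bytes  # hypothetical field name
    description: Optional[str] = None
    ocr_extracted_text: Optional[str] = None

    def get_key(self) -> Optional[str]:
        """Return the text to embed, or None if there is no textual representation."""
        # Assumption: the key is the non-empty text fields joined by a space.
        parts = [p for p in (self.description, self.ocr_extracted_text) if p]
        return " ".join(parts) if parts else None


def texts_to_embed(elements: list[ImageElement]) -> list[str]:
    """Hypothetical helper: collect keys, skipping elements whose key is None."""
    return [key for e in elements if (key := e.get_key()) is not None]
```

With this shape, a processor that never fills in description or ocr_extracted_text produces elements that are skipped for textual embedding without any extra configuration.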
Deciding whether to embed the visual representation
This part doesn't currently need any changes to our codebase - users can already decide whether they want a visual embedding by choosing an appropriate embedding model.
Supporting cases where both representations are to be embedded
Sometimes the configuration (described above) may indicate that both textual and visual representations should be embedded, which would mean creating two embeddings for a single Element. Currently we don't have good support for this case. To enable it we should:
Add an embedding_type field to VectorStoreEntry (current options: text, image)
Change the VectorStoreEntry.id field so that it depends on both Element.id and VectorStoreEntry.embedding_type (so the uuid hashing should probably be moved from Element.id to here?)
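One possible shape for these two changes is sketched below. This VectorStoreEntry is a simplified stand-in rather than the existing class; the EmbeddingType enum, the element_id and vector field names, and the use of uuid5 with NAMESPACE_OID are assumptions made for the example.

```python
import uuid
from dataclasses import dataclass
from enum import Enum


class EmbeddingType(str, Enum):
    """The two options named in the proposal."""

    TEXT = "text"
    IMAGE = "image"


@dataclass(frozen=True)
class VectorStoreEntry:
    """Simplified stand-in for the real VectorStoreEntry (illustration only)."""

    element_id: str  # hypothetical: the id of the source Element
    embedding_type: EmbeddingType
    vector: tuple[float, ...]

    @property
    def id(self) -> uuid.UUID:
        # uuid5 is deterministic, so the same (element_id, embedding_type)
        # pair always maps to the same entry id. The namespace is an assumption.
        name = f"{self.element_id}:{self.embedding_type.value}"
        return uuid.uuid5(uuid.NAMESPACE_OID, name)
```

Because the id is derived from both values, the textual and visual entries of one image element get distinct ids, while re-ingesting the same element overwrites its earlier entries instead of duplicating them.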
Supporting cases where neither representation is to be embedded
Sometimes the configuration (described above) may indicate that no representation should be embedded - this should generate a warning. It shouldn't raise an exception, because we still want to try to ingest the rest of the documents.
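The warn-and-continue behaviour could be sketched as follows. The plan_embeddings and ingest functions, the embed_images flag and the logger name are hypothetical and exist only to illustrate the control flow; they are not part of DocumentSearch.

```python
import logging
from typing import Optional

logger = logging.getLogger("ingest_sketch")  # hypothetical logger name


def plan_embeddings(element_id: str, key: Optional[str], embed_images: bool) -> list[str]:
    """Decide which embedding types to create for a single image element."""
    planned = []
    if key is not None:  # a textual representation exists
        planned.append("text")
    if embed_images:  # assumption: the chosen embedding model supports images
        planned.append("image")
    if not planned:
        # Warn instead of raising so that the remaining elements are still ingested.
        logger.warning("Element %s has no representation to embed; skipping", element_id)
    return planned


def ingest(elements: list[tuple[str, Optional[str]]], embed_images: bool) -> dict[str, list[str]]:
    """Plan embeddings for every (element_id, key) pair; never abort the batch."""
    return {eid: plan_embeddings(eid, key, embed_images) for eid, key in elements}
```

An element with nothing to embed ends up with an empty plan and a log entry, and the rest of the batch is processed as usual.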
If the element id already depends on element_type, isn't it enough to make the id of the vector store entry depend on it?
No, because in both cases (an embedding based on the visual representation and an embedding based on the textual representation) it's a vector based on ImageElement, so the element type will be the same.