-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Multimodal ChatMessage #7943
Conversation
@silvanocerza let me know if this was what you had in mind for this feature. |
Pull Request Test Coverage Report for Build 9719406674Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
Pull Request Test Coverage Report for Build 9719403728Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
Pull Request Test Coverage Report for Build 9724444663Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
Pull Request Test Coverage Report for Build 9724642783Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
I do not know how to fix all the mypy errors, it just messes around on typing when the same variable name is used in two different iterations of a for loop as two different types. The usage is safe by the way. |
mypy seems to be really sensitive to working with different types and using isInstance. I resolved my mypy errors by explicitly casting so that mypy is now confident of the type. This applies to attr-defined and union-attr. |
Hey all, just wondering if this PR is going ahead? This is needed badly as things move to multi-modality. |
@michaeltremeer heyy, it seems like everyone is on holiday, and I have no write access, so I can't merge it into main, but I expect that in the near future someone does. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi all, I've been using this fork and have some feedback:
- When running
ByteStream.from_base64_image
,mime_type
is hard_coded toimage_base64
. We should be keeping a valid mime_type here (e.g.image/jpeg
orimage/png
) so that the correct content type can be used later on. IdeallyByteStream.from_base64_image
should require amime_type
parameter so that it is set explicitly and correctly. - Because of the above issue, the current implementation of
ChatMessage.to_openai_format
builds the messages assuming that the base64 image is JPEG, and there is no easy way to change this. If any other image type is sent to the LLM, this will lead to an error. - When building messages in openai format, the
content
field of a message must be either astr
, a dict withtype=text
orlist
. This is not well-documented by OpenAI, but requests will fail if you send a message where context is a dictionary wheretype=image_url
. I have made suggestions to the code, and here is a screenshot showing what happens in each case.
No worries Carlos, I love your work in getting this PR done. As an aside, I've been getting acquainted with the library and while this appears to be better suited for multi-modal pipelines than griptape and others, it still seems like it's still quite text-centric and a little hard to work with when you want to weave text, image, audio, and even dataframe/JSON data together. I think your work here is a great start but I do wonder if some of the assumptions of many of the components make sense (e.g. that Documents are generally assumed be text data, along with a lack of tools for converting non-text Documents or Bytestream objects to chat messages or into prompt templates). It's definitely an area that could be prioritised to make things easier to extend the library in future. |
Co-authored-by: Michael Tremeer <[email protected]>
Co-authored-by: Michael Tremeer <[email protected]>
Co-authored-by: Michael Tremeer <[email protected]>
Co-authored-by: Michael Tremeer <[email protected]>
Thanks for your suggestions, I will try and implement the 'png' thing this evening. Anything you might need implemented, just say it and I will do my best. Regarding the low support for non text types, I completely agree with you, I am looking for a way to introduce object support. We could talk about how to add this support for more complex types, and I believe we can accomplish it. |
…at message on AnswerBuilder
Pull Request Test Coverage Report for Build 10074443031Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments on the changes since the last review
haystack/dataclasses/chat_message.py
Outdated
TEXT = "text" | ||
IMAGE_URL = "image_url" | ||
IMAGE_BASE64_JPG = "image_base64/jpg" | ||
IMAGE_BASE64_PNG = "image_base64/png" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an update here, I don't think the approach of a different Enum for every different subtype of image is a wise idea. This will end up with dozens of different types in the future. I think a better approach would be to keep some simple content types (e.g. TEXT, IMAGE_URL, IMAGE_BASE64), and then implement a resolver that can map a given ByteStream object into one of the handful of ContentTypes. This resolver can then be called by ChatMessage._parse_byte_stream_content
to return the correct ContentType (see next comment).
haystack/dataclasses/chat_message.py
Outdated
if content.mime_type is None: | ||
raise ValueError( | ||
"Unidentified ByteStream added as part of the content of the ChatMessage." | ||
"Populate thee 'mime_type' attribute with the identifier of the content type." | ||
) | ||
|
||
content_type = ContentType(content.mime_type) | ||
if not content_type.is_valid_byte_stream_type(): | ||
raise ValueError( | ||
f"The 'mime_type' attribute of the introduced content " | ||
f"has a not valid ContentType for a ByteStream" | ||
f"Value: {content_type}. Valid content types:" | ||
+ ", ".join([c.value for c in ContentType.valid_byte_stream_types()]) | ||
) | ||
|
||
return content, content_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would replace this code with a resolver that takes in the ByteStream object and returns a ContentType, else raises a ValueError in case the ByteStream object is invalid, e.g.
if mime_type.startswith("image/"):
return ContentType("IMAGE_BASE64")
if mime_type.startswith("audio/"):
return ContentType("AUDIO")
raise ValueError("...")
Then the to_openai_format
can have a single process for any type of image, dynamically filling the message dictionary with the mime_type
of the ByteStream
object. This avoids a separate process for every different kind of content, which is going to multiply in the future.
@michaeltremeer I thought about it and I do not know why I didn't implement it this way :) |
Hi guys, this is a very useful PR. Thanks for that. What is the status of it? Could this be reviewed and merged soon? |
Hi @jkondek1, thanks for the support, I do not know if they are going to implement it or not, I am willing to work with them to resolve all conflicts if they want. |
Any updates on if this PR could get approved + merged? This is super useful for us. Also curious if there are any other workarounds to use openai's or claude's image type/multimodal chat messages. |
Closing, this is not the direction we want to go with multi modal chat messages. If you still want to contribute this feature feel free to do following my suggestion from #7848 (comment). |
@silvanocerza Is there any workaround to passing in multimodal chat messages into a haystack generator? |
Hi Anish, [
{
"type": "text",
"text": "I want to know more about this image"
},
{
"type": "image_url",
"image_url":
{
"url": "data:image/jpeg;base64," + "base64_encoded_image"
}
}
] For OpenAI reference, see https://platform.openai.com/docs/guides/text-generation/building-prompts |
Related Issues
str
-only doesn't allow user to pass image #7848Proposed Changes:
As suggested, I added the capability for
ChatMessage
to storestr
,ByteStream
and a list of both, this way you can store any content type you might want to use on a chat environment, as images or text. For now we only support text, image urls and images in base 64 as this are the original requested types, more types could be implemented if needed. I also added serialization and updatedChatPromptBuilder
to this newChatMessage
.The
ByteStream
class was updated with a method to populatemime_type
more effectively.How did you test it?
I added unit tests for all new functionality added.
Notes for the reviewer
I believe this works fine, but maybe it will be hard to explain this new functionality in the docs.
Checklist
fix:
,feat:
,build:
,chore:
,ci:
,docs:
,style:
,refactor:
,perf:
,test:
. ✅