Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Multimodal ChatMessage #7943

Closed
wants to merge 20 commits into from

Conversation

CarlosFerLo
Copy link
Contributor

@CarlosFerLo CarlosFerLo commented Jun 26, 2024

Related Issues

Proposed Changes:

As suggested, I added the capability for ChatMessage to store str, ByteStream and a list of both, this way you can store any content type you might want to use on a chat environment, as images or text. For now we only support text, image urls and images in base 64 as this are the original requested types, more types could be implemented if needed. I also added serialization and updated ChatPromptBuilder to this new ChatMessage.
The ByteStream class was updated with a method to populate mime_type more effectively.

How did you test it?

I added unit tests for all new functionality added.

Notes for the reviewer

I believe this works fine, but maybe it will be hard to explain this new functionality in the docs.

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes✅
  • I added unit tests and updated the docstrings ✅
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:. ✅
  • I documented my code ❌
  • I ran pre-commit hooks and fixed any issue ✅

@github-actions github-actions bot added the type:documentation Improvements on the docs label Jun 28, 2024
@CarlosFerLo CarlosFerLo marked this pull request as ready for review June 28, 2024 22:55
@CarlosFerLo CarlosFerLo requested review from a team as code owners June 28, 2024 22:55
@CarlosFerLo CarlosFerLo requested review from dfokina and silvanocerza and removed request for a team June 28, 2024 22:55
@CarlosFerLo
Copy link
Contributor Author

@silvanocerza let me know if this was what you had in mind for this feature.

@coveralls
Copy link
Collaborator

coveralls commented Jun 28, 2024

Pull Request Test Coverage Report for Build 9719406674

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 54 unchanged lines in 5 files lost coverage.
  • Overall coverage decreased (-0.02%) to 89.936%

Files with Coverage Reduction New Missed Lines %
components/audio/whisper_local.py 5 92.19%
dataclasses/chat_message.py 6 95.71%
components/builders/chat_prompt_builder.py 12 88.07%
components/fetchers/link_content.py 12 78.49%
core/pipeline/pipeline.py 19 73.83%
Totals Coverage Status
Change from base Build 9678061193: -0.02%
Covered Lines: 6872
Relevant Lines: 7641

💛 - Coveralls

@coveralls
Copy link
Collaborator

coveralls commented Jun 28, 2024

Pull Request Test Coverage Report for Build 9719403728

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 54 unchanged lines in 5 files lost coverage.
  • Overall coverage decreased (-0.02%) to 89.936%

Files with Coverage Reduction New Missed Lines %
components/audio/whisper_local.py 5 92.19%
dataclasses/chat_message.py 6 95.71%
components/builders/chat_prompt_builder.py 12 88.07%
components/fetchers/link_content.py 12 78.49%
core/pipeline/pipeline.py 19 73.83%
Totals Coverage Status
Change from base Build 9678061193: -0.02%
Covered Lines: 6872
Relevant Lines: 7641

💛 - Coveralls

@coveralls
Copy link
Collaborator

coveralls commented Jun 29, 2024

Pull Request Test Coverage Report for Build 9724444663

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 54 unchanged lines in 5 files lost coverage.
  • Overall coverage decreased (-0.02%) to 89.936%

Files with Coverage Reduction New Missed Lines %
components/audio/whisper_local.py 5 92.19%
dataclasses/chat_message.py 6 95.71%
components/builders/chat_prompt_builder.py 12 88.07%
components/fetchers/link_content.py 12 78.49%
core/pipeline/pipeline.py 19 73.83%
Totals Coverage Status
Change from base Build 9678061193: -0.02%
Covered Lines: 6872
Relevant Lines: 7641

💛 - Coveralls

@coveralls
Copy link
Collaborator

coveralls commented Jun 29, 2024

Pull Request Test Coverage Report for Build 9724642783

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 54 unchanged lines in 5 files lost coverage.
  • Overall coverage decreased (-0.02%) to 89.936%

Files with Coverage Reduction New Missed Lines %
components/audio/whisper_local.py 5 92.19%
dataclasses/chat_message.py 6 95.71%
components/builders/chat_prompt_builder.py 12 88.07%
components/fetchers/link_content.py 12 78.49%
core/pipeline/pipeline.py 19 73.83%
Totals Coverage Status
Change from base Build 9678061193: -0.02%
Covered Lines: 6872
Relevant Lines: 7641

💛 - Coveralls

@CarlosFerLo
Copy link
Contributor Author

I do not know how to fix all the mypy errors, it just messes around on typing when the same variable name is used in two different iterations of a for loop as two different types. The usage is safe by the way.

@lbux
Copy link
Contributor

lbux commented Jul 2, 2024

mypy seems to be really sensitive to working with different types and using isInstance. I resolved my mypy errors by explicitly casting so that mypy is now confident of the type.

This applies to attr-defined and union-attr.

@michaeltremeer
Copy link

Hey all, just wondering if this PR is going ahead? This is needed badly as things move to multi-modality.

@CarlosFerLo
Copy link
Contributor Author

@michaeltremeer heyy, it seems like everyone is on holiday, and I have no write access, so I can't merge it into main, but I expect that in the near future someone does.

Copy link

@michaeltremeer michaeltremeer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi all, I've been using this fork and have some feedback:

  • When running ByteStream.from_base64_image, mime_type is hard_coded to image_base64. We should be keeping a valid mime_type here (e.g. image/jpeg or image/png) so that the correct content type can be used later on. Ideally ByteStream.from_base64_image should require a mime_type parameter so that it is set explicitly and correctly.
  • Because of the above issue, the current implementation of ChatMessage.to_openai_format builds the messages assuming that the base64 image is JPEG, and there is no easy way to change this. If any other image type is sent to the LLM, this will lead to an error.
  • When building messages in openai format, the content field of a message must be either a str, a dict with type=text or list. This is not well-documented by OpenAI, but requests will fail if you send a message where context is a dictionary where type=image_url. I have made suggestions to the code, and here is a screenshot showing what happens in each case.
Screenshot 2024-07-23 at 8 00 57 PM Screenshot 2024-07-23 at 8 01 03 PM

haystack/dataclasses/chat_message.py Outdated Show resolved Hide resolved
haystack/dataclasses/chat_message.py Outdated Show resolved Hide resolved
test/dataclasses/test_chat_message.py Outdated Show resolved Hide resolved
test/dataclasses/test_chat_message.py Outdated Show resolved Hide resolved
@michaeltremeer
Copy link

@michaeltremeer heyy, it seems like everyone is on holiday, and I have no write access, so I can't merge it into main, but I expect that in the near future someone does.

No worries Carlos, I love your work in getting this PR done. As an aside, I've been getting acquainted with the library and while this appears to be better suited for multi-modal pipelines than griptape and others, it still seems like it's still quite text-centric and a little hard to work with when you want to weave text, image, audio, and even dataframe/JSON data together. I think your work here is a great start but I do wonder if some of the assumptions of many of the components make sense (e.g. that Documents are generally assumed be text data, along with a lack of tools for converting non-text Documents or Bytestream objects to chat messages or into prompt templates). It's definitely an area that could be prioritised to make things easier to extend the library in future.

@CarlosFerLo
Copy link
Contributor Author

@michaeltremeer

Thanks for your suggestions, I will try and implement the 'png' thing this evening. Anything you might need implemented, just say it and I will do my best.

Regarding the low support for non text types, I completely agree with you, I am looking for a way to introduce object support. We could talk about how to add this support for more complex types, and I believe we can accomplish it.

@coveralls
Copy link
Collaborator

coveralls commented Jul 23, 2024

Pull Request Test Coverage Report for Build 10074443031

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 19 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.04%) to 90.084%

Files with Coverage Reduction New Missed Lines %
components/builders/answer_builder.py 1 98.31%
dataclasses/chat_message.py 6 95.8%
components/builders/chat_prompt_builder.py 12 88.07%
Totals Coverage Status
Change from base Build 10041545511: -0.04%
Covered Lines: 6995
Relevant Lines: 7765

💛 - Coveralls

Copy link

@michaeltremeer michaeltremeer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments on the changes since the last review

TEXT = "text"
IMAGE_URL = "image_url"
IMAGE_BASE64_JPG = "image_base64/jpg"
IMAGE_BASE64_PNG = "image_base64/png"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an update here, I don't think the approach of a different Enum for every different subtype of image is a wise idea. This will end up with dozens of different types in the future. I think a better approach would be to keep some simple content types (e.g. TEXT, IMAGE_URL, IMAGE_BASE64), and then implement a resolver that can map a given ByteStream object into one of the handful of ContentTypes. This resolver can then be called by ChatMessage._parse_byte_stream_content to return the correct ContentType (see next comment).

Comment on lines 116 to 131
if content.mime_type is None:
raise ValueError(
"Unidentified ByteStream added as part of the content of the ChatMessage."
"Populate thee 'mime_type' attribute with the identifier of the content type."
)

content_type = ContentType(content.mime_type)
if not content_type.is_valid_byte_stream_type():
raise ValueError(
f"The 'mime_type' attribute of the introduced content "
f"has a not valid ContentType for a ByteStream"
f"Value: {content_type}. Valid content types:"
+ ", ".join([c.value for c in ContentType.valid_byte_stream_types()])
)

return content, content_type

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would replace this code with a resolver that takes in the ByteStream object and returns a ContentType, else raises a ValueError in case the ByteStream object is invalid, e.g.

if mime_type.startswith("image/"):
    return ContentType("IMAGE_BASE64")
if mime_type.startswith("audio/"):
    return ContentType("AUDIO")
raise ValueError("...")

Then the to_openai_format can have a single process for any type of image, dynamically filling the message dictionary with the mime_type of the ByteStream object. This avoids a separate process for every different kind of content, which is going to multiply in the future.

@CarlosFerLo
Copy link
Contributor Author

@michaeltremeer I thought about it and I do not know why I didn't implement it this way :)
I have added all the file encoding processing directly on the ChatMessage code as it is short and simple and abstracting it seems to be an overkill. But if you want I con do so, it is simple.

@jkondek1
Copy link

Hi guys, this is a very useful PR. Thanks for that. What is the status of it? Could this be reviewed and merged soon?

@CarlosFerLo
Copy link
Contributor Author

Hi @jkondek1, thanks for the support, I do not know if they are going to implement it or not, I am willing to work with them to resolve all conflicts if they want.

@anishpdoshi
Copy link

Any updates on if this PR could get approved + merged? This is super useful for us.

Also curious if there are any other workarounds to use openai's or claude's image type/multimodal chat messages.

@silvanocerza
Copy link
Contributor

Closing, this is not the direction we want to go with multi modal chat messages.

If you still want to contribute this feature feel free to do following my suggestion from #7848 (comment).

@anishpdoshi
Copy link

@silvanocerza Is there any workaround to passing in multimodal chat messages into a haystack generator?

@jkondek1
Copy link

jkondek1 commented Oct 4, 2024

@silvanocerza Is there any workaround to passing in multimodal chat messages into a haystack generator?

Hi Anish,
I am not sure if it helps you, but we needed to pass an image to the generator.generate() method and found out that you can put the whole "content" list (as it would appear if you were using f.e. openai client) to the method.

 [
        {
            "type": "text",
            "text": "I want to know more about this image"
        },
        {
            "type": "image_url",
            "image_url":
                {
                    "url": "data:image/jpeg;base64," + "base64_encoded_image"
                }
        }
 ]

For OpenAI reference, see https://platform.openai.com/docs/guides/text-generation/building-prompts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic:tests type:documentation Improvements on the docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ChatMessage content being str-only doesn't allow user to pass image
7 participants