feat: Multimodal ChatMessage #7943

CarlosFerLo · 2024-06-26T22:16:13Z

Related Issues

fixes ChatMessage content being str-only doesn't allow user to pass image #7848

Proposed Changes:

As suggested, I added the capability for ChatMessage to store a str, a ByteStream, or a list of both, so you can store any content type you might want to use in a chat environment, such as images or text. For now we only support text, image URLs, and base64-encoded images, as these are the originally requested types; more types could be implemented if needed. I also added serialization and updated ChatPromptBuilder to work with this new ChatMessage.
The ByteStream class was updated with a method to populate mime_type more effectively.

How did you test it?

I added unit tests for all new functionality added.

Notes for the reviewer

I believe this works fine, but maybe it will be hard to explain this new functionality in the docs.

Checklist

I have read the contributors guidelines and the code of conduct ✅
I have updated the related issue with new insights and changes✅
I added unit tests and updated the docstrings ✅
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:. ✅
I documented my code ❌
I ran pre-commit hooks and fixed any issue ✅


CarlosFerLo · 2024-06-28T22:57:07Z

@silvanocerza let me know if this was what you had in mind for this feature.

coveralls · 2024-06-28T23:01:45Z

Pull Request Test Coverage Report for Build 9719406674

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

0 of 0 changed or added relevant lines in 0 files are covered.
54 unchanged lines in 5 files lost coverage.
Overall coverage decreased (-0.02%) to 89.936%

Files with Coverage Reduction	New Missed Lines	%
components/audio/whisper_local.py	5	92.19%
dataclasses/chat_message.py	6	95.71%
components/builders/chat_prompt_builder.py	12	88.07%
components/fetchers/link_content.py	12	78.49%
core/pipeline/pipeline.py	19	73.83%

Totals
Change from base Build 9678061193:	-0.02%
Covered Lines:	6872
Relevant Lines:	7641

💛 - Coveralls

coveralls · 2024-06-28T23:02:27Z

Pull Request Test Coverage Report for Build 9719403728


Details

0 of 0 changed or added relevant lines in 0 files are covered.
54 unchanged lines in 5 files lost coverage.
Overall coverage decreased (-0.02%) to 89.936%

Files with Coverage Reduction	New Missed Lines	%
components/audio/whisper_local.py	5	92.19%
dataclasses/chat_message.py	6	95.71%
components/builders/chat_prompt_builder.py	12	88.07%
components/fetchers/link_content.py	12	78.49%
core/pipeline/pipeline.py	19	73.83%

Totals
Change from base Build 9678061193:	-0.02%
Covered Lines:	6872
Relevant Lines:	7641

💛 - Coveralls

coveralls · 2024-06-29T13:24:02Z

Pull Request Test Coverage Report for Build 9724444663


Details

0 of 0 changed or added relevant lines in 0 files are covered.
54 unchanged lines in 5 files lost coverage.
Overall coverage decreased (-0.02%) to 89.936%

Files with Coverage Reduction	New Missed Lines	%
components/audio/whisper_local.py	5	92.19%
dataclasses/chat_message.py	6	95.71%
components/builders/chat_prompt_builder.py	12	88.07%
components/fetchers/link_content.py	12	78.49%
core/pipeline/pipeline.py	19	73.83%

Totals
Change from base Build 9678061193:	-0.02%
Covered Lines:	6872
Relevant Lines:	7641

💛 - Coveralls

coveralls · 2024-06-29T14:04:28Z

Pull Request Test Coverage Report for Build 9724642783


Details

0 of 0 changed or added relevant lines in 0 files are covered.
54 unchanged lines in 5 files lost coverage.
Overall coverage decreased (-0.02%) to 89.936%

Files with Coverage Reduction	New Missed Lines	%
components/audio/whisper_local.py	5	92.19%
dataclasses/chat_message.py	6	95.71%
components/builders/chat_prompt_builder.py	12	88.07%
components/fetchers/link_content.py	12	78.49%
core/pipeline/pipeline.py	19	73.83%

Totals
Change from base Build 9678061193:	-0.02%
Covered Lines:	6872
Relevant Lines:	7641

💛 - Coveralls

CarlosFerLo · 2024-06-29T20:39:45Z

I do not know how to fix all the mypy errors. mypy gets confused about typing when the same variable name is used with two different types in two different iterations of a for loop. The usage is safe, by the way.

lbux · 2024-07-02T03:13:12Z

mypy seems to be really sensitive when a variable can hold different types and isinstance is used. I resolved my mypy errors by casting explicitly, so that mypy is confident of the type.

This applies to attr-defined and union-attr.
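A minimal sketch of the casting approach described above, using a stand-in class rather than Haystack's own types. `FakeByteStream`, `Content`, and `describe` are hypothetical names introduced only for illustration; the point is that a fresh, explicitly typed variable avoids the union-attr complaints that come from reusing one loop variable for two types.

```python
from dataclasses import dataclass
from typing import List, Optional, Union, cast


@dataclass
class FakeByteStream:
    """Hypothetical stand-in for a byte-stream class; not Haystack's ByteStream."""

    data: bytes
    mime_type: Optional[str] = None


# A content part is either plain text or a byte stream.
Content = Union[str, FakeByteStream]


def describe(parts: List[Content]) -> List[str]:
    """Return a short label for each part of a mixed content list."""
    labels: List[str] = []
    for part in parts:
        if isinstance(part, str):
            # Inside this branch mypy narrows `part` to str.
            labels.append(f"text:{part}")
        else:
            # An explicit cast states the type where inference is not enough.
            stream = cast(FakeByteStream, part)
            labels.append(f"bytes:{stream.mime_type}")
    return labels
```

Note that `cast` has no runtime effect; it only informs the type checker, so it should be used where the type is already guaranteed by the surrounding logic.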

michaeltremeer · 2024-07-23T07:03:39Z

Hey all, just wondering if this PR is going ahead? This is needed badly as things move to multi-modality.

CarlosFerLo · 2024-07-23T10:14:49Z

@michaeltremeer heyy, it seems like everyone is on holiday, and I have no write access, so I can't merge it into main, but I expect that in the near future someone does.

michaeltremeer

Hi all, I've been using this fork and have some feedback:

When running ByteStream.from_base64_image, mime_type is hard-coded to image_base64. We should keep a valid mime_type here (e.g. image/jpeg or image/png) so that the correct content type can be used later on. Ideally, ByteStream.from_base64_image should require a mime_type parameter so that it is set explicitly and correctly.
Because of the above issue, the current implementation of ChatMessage.to_openai_format builds the messages assuming that the base64 image is a JPEG, and there is no easy way to change this. If any other image type is sent to the LLM, this will lead to an error.
When building messages in OpenAI format, the content field of a message must be either a str, a dict with type=text, or a list. This is not well documented by OpenAI, but requests will fail if you send a message where content is a dictionary with type=image_url. I have made suggestions to the code, and here is a screenshot showing what happens in each case.
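The last two points can be sketched as a small helper. This is an illustration under stated assumptions, not the code from this PR: `image_message` is a hypothetical function, and the dictionary layout follows the OpenAI-style content list quoted later in this thread. The image part is wrapped in a list, and the MIME type is passed in rather than assumed to be JPEG.

```python
from typing import Any, Dict


def image_message(role: str, base64_data: str, mime_type: str) -> Dict[str, Any]:
    """Build a chat message dict whose content is a list with one image part.

    Hypothetical helper: the caller supplies the real MIME type instead of
    the function assuming image/jpeg.
    """
    if not mime_type.startswith("image/"):
        raise ValueError(f"Expected an image MIME type, got {mime_type!r}")
    return {
        "role": role,
        # The image part sits inside a list, not directly as a dict.
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{base64_data}"},
            }
        ],
    }
```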

haystack/dataclasses/chat_message.py

test/dataclasses/test_chat_message.py

michaeltremeer · 2024-07-23T11:31:46Z

@michaeltremeer heyy, it seems like everyone is on holiday, and I have no write access, so I can't merge it into main, but I expect that in the near future someone does.

No worries Carlos, I love your work in getting this PR done. As an aside, I've been getting acquainted with the library, and while it appears to be better suited for multi-modal pipelines than griptape and others, it still seems quite text-centric and a little hard to work with when you want to weave text, image, audio, and even dataframe/JSON data together. I think your work here is a great start, but I do wonder whether some of the assumptions of many of the components make sense (e.g. that Documents are generally assumed to be text data, along with a lack of tools for converting non-text Documents or ByteStream objects to chat messages or into prompt templates). It's definitely an area that could be prioritised to make the library easier to extend in future.


CarlosFerLo · 2024-07-23T12:24:16Z

@michaeltremeer

Thanks for your suggestions, I will try and implement the 'png' thing this evening. Anything you might need implemented, just say it and I will do my best.

Regarding the low support for non text types, I completely agree with you, I am looking for a way to introduce object support. We could talk about how to add this support for more complex types, and I believe we can accomplish it.


coveralls · 2024-07-23T21:55:08Z

Pull Request Test Coverage Report for Build 10074443031

Details

0 of 0 changed or added relevant lines in 0 files are covered.
19 unchanged lines in 3 files lost coverage.
Overall coverage decreased (-0.04%) to 90.084%

Files with Coverage Reduction	New Missed Lines	%
components/builders/answer_builder.py	1	98.31%
dataclasses/chat_message.py	6	95.8%
components/builders/chat_prompt_builder.py	12	88.07%

Totals
Change from base Build 10041545511:	-0.04%
Covered Lines:	6995
Relevant Lines:	7765

💛 - Coveralls

michaeltremeer

See comments on the changes since the last review

michaeltremeer · 2024-07-24T01:42:35Z

haystack/dataclasses/chat_message.py

+    TEXT = "text"
+    IMAGE_URL = "image_url"
+    IMAGE_BASE64_JPG = "image_base64/jpg"
+    IMAGE_BASE64_PNG = "image_base64/png"


Just an update here, I don't think the approach of a different Enum for every different subtype of image is a wise idea. This will end up with dozens of different types in the future. I think a better approach would be to keep some simple content types (e.g. TEXT, IMAGE_URL, IMAGE_BASE64), and then implement a resolver that can map a given ByteStream object into one of the handful of ContentTypes. This resolver can then be called by ChatMessage._parse_byte_stream_content to return the correct ContentType (see next comment).

michaeltremeer · 2024-07-24T01:52:01Z

haystack/dataclasses/chat_message.py

+        if content.mime_type is None:
+            raise ValueError(
+                "Unidentified ByteStream added as part of the content of the ChatMessage."
+                "Populate thee 'mime_type' attribute with the identifier of the content type."
+            )
+
+        content_type = ContentType(content.mime_type)
+        if not content_type.is_valid_byte_stream_type():
+            raise ValueError(
+                f"The 'mime_type' attribute of the introduced content "
+                f"has a not valid ContentType for a ByteStream"
+                f"Value: {content_type}. Valid content types:"
+                + ", ".join([c.value for c in ContentType.valid_byte_stream_types()])
+            )
+
+        return content, content_type


I would replace this code with a resolver that takes in the ByteStream object and returns a ContentType, else raises a ValueError in case the ByteStream object is invalid, e.g.

    if mime_type.startswith("image/"):
        return ContentType("IMAGE_BASE64")
    if mime_type.startswith("audio/"):
        return ContentType("AUDIO")
    raise ValueError("...")

Then the to_openai_format can have a single process for any type of image, dynamically filling the message dictionary with the mime_type of the ByteStream object. This avoids a separate process for every different kind of content, which is going to multiply in the future.
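A self-contained sketch of such a resolver, assuming the simplified set of content types proposed in the previous comment. The `AUDIO` member and the function name `resolve_content_type` are assumptions for illustration and are not part of the PR. The sketch returns enum members by attribute access, because calling an Enum class looks a member up by value, not by name.

```python
from enum import Enum
from typing import Optional


class ContentType(Enum):
    """Simplified content types; AUDIO is an assumed, illustrative member."""

    TEXT = "text"
    IMAGE_URL = "image_url"
    IMAGE_BASE64 = "image_base64"
    AUDIO = "audio"


def resolve_content_type(mime_type: Optional[str]) -> ContentType:
    """Map a byte stream's MIME type onto one of a handful of content types."""
    if mime_type is None:
        raise ValueError("The byte stream has no 'mime_type'; set it before use.")
    if mime_type.startswith("image/"):
        # Any image subtype (jpeg, png, ...) resolves to the same content type.
        return ContentType.IMAGE_BASE64
    if mime_type.startswith("audio/"):
        return ContentType.AUDIO
    raise ValueError(f"Unsupported MIME type for a byte stream: {mime_type!r}")
```

With this shape, adding a new image format needs no new enum member; the concrete MIME type stays on the byte stream and can be filled into the outgoing message when it is formatted.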

CarlosFerLo · 2024-07-24T09:54:09Z

@michaeltremeer I thought about it and I do not know why I didn't implement it this way :)
I have added all the file encoding processing directly in the ChatMessage code, as it is short and simple and abstracting it seems like overkill. But if you want, I can do so; it is simple.

jkondek1 · 2024-08-30T05:47:27Z

Hi guys, this is a very useful PR. Thanks for that. What is the status of it? Could this be reviewed and merged soon?

CarlosFerLo · 2024-09-04T11:43:06Z

Hi @jkondek1, thanks for the support. I do not know if they are going to implement it or not, but I am willing to work with them to resolve all conflicts if they want.

anishpdoshi · 2024-09-16T22:02:16Z

Any updates on if this PR could get approved + merged? This is super useful for us.

Also curious if there are any other workarounds to use openai's or claude's image type/multimodal chat messages.

silvanocerza · 2024-09-24T14:20:36Z

Closing, this is not the direction we want to go with multi modal chat messages.

If you still want to contribute this feature, feel free to do so, following my suggestion from #7848 (comment).

anishpdoshi · 2024-09-27T20:46:55Z

@silvanocerza Is there any workaround to passing in multimodal chat messages into a haystack generator?

jkondek1 · 2024-10-04T15:12:41Z

@silvanocerza Is there any workaround to passing in multimodal chat messages into a haystack generator?

Hi Anish,
I am not sure if this helps you, but we needed to pass an image to the generator.generate() method and found out that you can pass the whole "content" list (as it would appear if you were using, e.g., the OpenAI client) to the method.

 [
        {
            "type": "text",
            "text": "I want to know more about this image"
        },
        {
            "type": "image_url",
            "image_url":
                {
                    "url": "data:image/jpeg;base64," + "base64_encoded_image"
                }
        }
 ]

For OpenAI reference, see https://platform.openai.com/docs/guides/text-generation/building-prompts
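For readers starting from raw image bytes, the list above could be produced along these lines. This is only a sketch: `build_content` is a hypothetical helper, the default MIME type mirrors the JPEG example above, and how the resulting list is handed to a Haystack generator is as reported in this comment and is not shown here.

```python
import base64
from typing import Any, Dict, List


def build_content(
    prompt: str, image_bytes: bytes, mime_type: str = "image/jpeg"
) -> List[Dict[str, Any]]:
    """Return an OpenAI-style content list with one text part and one image part."""
    # Base64-encode the raw bytes and embed them in a data URL.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"type": "text", "text": prompt},
        {
            "type": "image_url",
            "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
        },
    ]
```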

first implementation of the improved chat message

c35a8cc

github-actions bot added the topic:tests label Jun 26, 2024

CarlosFerLo added 6 commits June 29, 2024 00:07

Serialization and basic functionality of the new kind of ChatMessage …

5dd95d2

…content

add reno

46d9c3f

add ByteStream.from_base64_image method

42876a4

update to_openai_format method on ChatMessage

336d145

solve syntax error

238d159

solve syntax error

1ee2b56

github-actions bot added the type:documentation Improvements on the docs label Jun 28, 2024

CarlosFerLo marked this pull request as ready for review June 28, 2024 22:55

CarlosFerLo requested review from a team as code owners June 28, 2024 22:55

CarlosFerLo requested review from dfokina and silvanocerza and removed request for a team June 28, 2024 22:55

add typing to help pass mypy tests

2368c68

add typing to help pass mypy tests

64106f4

lbux mentioned this pull request Jul 22, 2024

Issue of ollama supporting the function call feature deepset-ai/haystack-core-integrations#913

Closed

michaeltremeer reviewed Jul 23, 2024


CarlosFerLo and others added 4 commits July 23, 2024 14:16

Wrap image_url dict in a list for OpenAI formatting

5a5ecdc

Co-authored-by: Michael Tremeer <[email protected]>

Wrap image_url in base 64 on a list

0cf62a3

Co-authored-by: Michael Tremeer <[email protected]>

Update test/dataclasses/test_chat_message.py

e164f34

Co-authored-by: Michael Tremeer <[email protected]>

Update test/dataclasses/test_chat_message.py

9458882

Co-authored-by: Michael Tremeer <[email protected]>

CarlosFerLo added 5 commits July 23, 2024 22:02

Merge branch 'main' into issue-7848b

0a8184b

update bytestream to recieve png images

503de10

add jpg and png support on ChatMessage

2a8a707

remove the internal functioning key of the metadata extracted from ch…

d64c9f1

…at message on AnswerBuilder

adapt tests on 'test_chat_prompt_builder.py' to the new jpg and png d…

ee22610

…istinctions

try and fix some mypy errors

e5a154c

michaeltremeer reviewed Jul 24, 2024


update to simplify ContentType

3c2a913

silvanocerza closed this Sep 24, 2024
