
OVModelForVisionCausalLM #883

Merged · 21 commits · Oct 7, 2024
Conversation

eaidova (Collaborator) commented Aug 29, 2024

What does this PR do?

Enables conversion and inference for multimodal LLMs such as LLaVA, LLaVA-Next, Falcon-VL, Pixtral, and InternVL.
Example of usage:

from PIL import Image
import requests
from optimum.intel.openvino import OVModelForVisualCausalLM
from transformers import AutoProcessor

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
model = OVModelForVisualCausalLM.from_pretrained(model_id)
image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"
processor = AutoProcessor.from_pretrained(model_id)

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What are these?"},
            {"type": "image"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(images=raw_image, text=prompt, return_tensors="pt")

output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(processor.decode(output[0], skip_special_tokens=True))

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@eaidova eaidova changed the title Ea/llava model OVModelForVisionCausalLM Aug 29, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@eaidova eaidova marked this pull request as ready for review September 17, 2024 19:29
eaidova (Collaborator, Author) commented Sep 18, 2024

@echarlaix could you please take a look? Thanks!

P.S. I'm still working on extending model coverage, but I think it makes sense to start reviewing the general API now

@eaidova eaidova force-pushed the ea/llava_model branch 6 times, most recently from c35d1f2 to e0da998 Compare September 26, 2024 12:32
**kwargs,
):
"""
Export a vanilla Transformers model into an ONNX model using `transformers.onnx.export_onnx`.
Collaborator:

Do we really export to ONNX? The docstring should be revised accordingly.

eaidova (Collaborator, Author):

I accidentally copied that from https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/openvino/modeling_base.py#L534
I see the same docstring in other OpenVINO model classes (seq2seq, stable diffusion, etc.). @echarlaix, maybe it is time to revise it in all models too?

echarlaix (Collaborator) left a comment:

Looking good, thanks a lot @eaidova

optimum/exporters/openvino/stateful.py (outdated, resolved)
Comment on lines +808 to +812
if model_type == "internvl-chat" and preprocessors is not None:
    model.config.img_context_token_id = preprocessors[0].convert_tokens_to_ids("<IMG_CONTEXT>")

if hasattr(model, "image_newline"):
    model.config.image_newline = model.image_newline.tolist()
Collaborator:

Why is this needed?

eaidova (Collaborator, Author):

This is a trainable parameter filled with random values during model weight initialization. We cannot capture it during model export because it is not part of any submodel's inference forward pass, so to transfer it from PyTorch to OpenVINO I store it in the config. If you have a better suggestion, I can try to implement it.
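The mechanism discussed here can be sketched in isolation. The values below are hypothetical stand-ins for the real tokenizer vocabulary and the `image_newline` weights; the point is only that values which are not part of any exported submodel's forward pass can survive the JSON round trip through `config.json`:

```python
import json

# Hypothetical stand-ins: a tiny "vocabulary" in place of the real tokenizer,
# and a short list in place of model.image_newline.tolist().
vocab = {"<IMG_CONTEXT>": 92546}
image_newline = [0.12, -0.05, 0.33]

# At export time, values that cannot be captured in the exported graphs
# are stashed in the model config instead.
config = {
    "model_type": "internvl-chat",
    # tokenizer.convert_tokens_to_ids("<IMG_CONTEXT>") in the real code
    "img_context_token_id": vocab["<IMG_CONTEXT>"],
    # model.image_newline.tolist() in the real code
    "image_newline": image_newline,
}

# config.json serialization round trip: both values come back intact and
# can be turned into a tensor again at inference time.
restored = json.loads(json.dumps(config))
assert restored["img_context_token_id"] == 92546
assert restored["image_newline"] == image_newline
```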

optimum/exporters/openvino/convert.py (resolved)
optimum/exporters/openvino/model_patcher.py (resolved)
optimum/intel/utils/modeling_utils.py (outdated, resolved)
optimum/exporters/openvino/model_configs.py (outdated, resolved)
optimum/exporters/openvino/model_configs.py (outdated, resolved)
optimum/intel/openvino/modeling_visual_language.py (outdated, resolved)
optimum/intel/openvino/modeling_visual_language.py (outdated, resolved)
@eaidova eaidova requested a review from echarlaix October 2, 2024 07:32
echarlaix (Collaborator) left a comment:

Huge work, thanks @eaidova. Left a couple of minor comments.

optimum/intel/openvino/modeling_visual_language.py (outdated, resolved)
optimum/intel/openvino/modeling_visual_language.py (outdated, resolved)
optimum/intel/openvino/modeling_visual_language.py (outdated, resolved)
optimum/intel/openvino/modeling_visual_language.py (outdated, resolved)
@eaidova eaidova force-pushed the ea/llava_model branch 2 times, most recently from 1164b36 to a97f962 Compare October 3, 2024 05:43
echarlaix (Collaborator) commented:

> P.S. I'm still working on extending model coverage, but I think it makes sense to start reviewing the general API now

Very nice, let's merge this PR and we can always extend support in follow-up PRs. Does that work for you, @eaidova?

eaidova (Collaborator, Author) commented Oct 4, 2024:

> Very nice, let's merge this PR and we can always extend support in follow-up PRs. Does that work for you, @eaidova?

@echarlaix, yes, I agree. Could you please merge?

@echarlaix echarlaix merged commit 749a1d6 into huggingface:main Oct 7, 2024
17 checks passed