Replace export_MiniCPM-V-2_6.py #957
Closed
Wovchena
wants to merge
33
commits into
openvinotoolkit:master
from
Wovchena:replace-export_MiniCPM-V-2_6.py
Commits
6151231 Hide VLM files and API
7d94e1a Remove unused concatenate_mid_dim
eeb818d Initialize m_image_id in constructor similar to the reset of the fields
20a6954 Retrigger
0737db2 Move to visual_language
0bddfba Correct py_vlm_pipeline.cpp include
1b2da2d fix
7f0ef7a Move vision_encoder, pipeline.hpp
457024c Replace export_MiniCPM-V-2_6.py
d11f18d Downgrade optimum
a82fe79 Everywhere python -m pip install -U optimum<1.23 --no-dependencies
6d37b64 Remove duplicates
b8fd628 Fix dtype
b5bad1f Merge branch 'master' into replace-export_MiniCPM-V-2_6.py
7bdce55 fix merge
ff4f4be delete src/cpp/src/visual_language/vlm_pipeline.cpp
4112edf fix conversion in test
c4573b8 dont print in test
8c67805 skip
24015da cleanup
8410b22 Put torchvision back
1fea50f update tests requirements
65644db Merge branch 'master' into replace-export_MiniCPM-V-2_6.py
d1448ef remove wwb req
67e60ac wwb reqs
f67ce00 req
e2ac30e int8
e084e79 xfail
d868e0f Merge branch 'master' into replace-export_MiniCPM-V-2_6.py
509fb2f Move common model parts
7513752 Merge branch 'master' into replace-export_MiniCPM-V-2_6.py
3bf2381 Merge branch 'master' into replace-export_MiniCPM-V-2_6.py
db8fdc9 Increase timeout
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion
src/cpp/src/processor_config.cpp → .../src/visual_language/processor_config.cpp
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,8 @@ | ||
// Copyright (C) 2023-2024 Intel Corporation | ||
// SPDX-License-Identifier: Apache-2.0 | ||
|
||
- #include <openvino/genai/vision_encoder.hpp> | ||
- #include "clip.hpp" | ||
+ #include "vision_encoder.hpp" | ||
+ #include "visual_language/clip.hpp" | ||
#include "utils.hpp" | ||
|
||
using namespace ov::genai; | ||
|
@@ -300,8 +300,8 @@ EncodedImage llava_image_embed_make_with_bytes_slice(clip_ctx& ctx_clip, const o | |
ov::Tensor input_tensor{ov::element::f32, {1, 3, size_t(resized_preprocessed.ny), size_t(resized_preprocessed.nx)}, (void*)(resized_preprocessed.buf.data())}; | ||
ov::Tensor pixel_values = preprocess_for_encoder(input_tensor, patch_size); | ||
encoder.set_tensor("pixel_values", pixel_values); | ||
- ov::Tensor patch_attention_mask{ov::element::boolean, {pixel_values.get_shape().at(0), 1, resized_source_size.height * resized_source_size.width}}; | ||
- std::fill_n(patch_attention_mask.data<bool>(), patch_attention_mask.get_size(), true); | ||
+ ov::Tensor patch_attention_mask{ov::element::f32, {pixel_values.get_shape().at(0), 1, resized_source_size.height * resized_source_size.width}}; | ||
+ std::fill_n(patch_attention_mask.data<float>(), patch_attention_mask.get_size(), 1.0f); | ||
encoder.set_tensor("patch_attention_mask", patch_attention_mask); | ||
ov::Tensor position_ids = prepare_vis_position_ids(pixel_values, patch_attention_mask, {resized_source_size}, ctx_clip.patch_size, ctx_clip.image_size / ctx_clip.patch_size); | ||
encoder.set_tensor("position_ids", position_ids); | ||
|
@@ -432,7 +432,7 @@ ov::Tensor preprocess_image_llava(const ov::Tensor& image, const ProcessorConfig | |
VisionEncoder::VisionEncoder(const std::filesystem::path& model_dir, const VLMModelType model_type, const std::string& device, const ov::AnyMap device_config, ov::Core core) : | ||
model_type(model_type) { | ||
if (model_type == VLMModelType::MINICPM) { | ||
- m_vision_encoder = core.compile_model(model_dir / "image_encoder.xml", device, device_config).create_infer_request(); | ||
+ m_vision_encoder = core.compile_model(model_dir / "openvino_vision_embeddings_model.xml", device, device_config).create_infer_request(); | ||
} else if (model_type == VLMModelType::LLAVA) { | ||
// Vision embeddings model is merged with multi modal projector at model export stage by optimum-intel | ||
m_vision_encoder = core.compile_model(model_dir / "openvino_vision_embeddings_model.xml", device, device_config).create_infer_request(); | ||
Review comment: This if statement can also be omitted after switching to optimum-cli export for minicpm (as in #957 (comment)).
Reply: Done
||
|
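The review comment above observes that, once MiniCPM is exported the same way as LLaVA, both branches load the same file, so the branch on the model type becomes redundant. A minimal sketch of that simplification follows. It is not the code from this PR: `vision_encoder_path` is a hypothetical helper, the enum is a stand-in for the `VLMModelType` seen in the diff, and the sketch assumes both model types are exported under the file name `openvino_vision_embeddings_model.xml`.

```cpp
#include <filesystem>

// Stand-in for the VLMModelType enum referenced in the diff (assumption:
// only the two values visible in the diff matter here).
enum class VLMModelType { MINICPM, LLAVA };

// Hypothetical helper: returns the vision encoder file for a model directory.
// Assumption: every supported model type uses the same exported file name,
// so the model type is accepted but deliberately ignored.
std::filesystem::path vision_encoder_path(const std::filesystem::path& model_dir,
                                          VLMModelType /*model_type*/) {
    return model_dir / "openvino_vision_embeddings_model.xml";
}
```

With a helper like this, the constructor would need only a single `core.compile_model(...)` call; a branch would be reintroduced only if some model type required a different file.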
2 changes: 1 addition & 1 deletion
src/cpp/src/vlm_config.cpp → src/cpp/src/visual_language/vlm_config.cpp
File renamed without changes.
Review comment: Should it be moved to samples/requirements.txt?
Reply: It conflicts, so the only option is to add it to README.md. But I want to see whether it fixes itself before the release.