
Enable LLaVa-1.5 in VLM Pipeline #917

Merged
merged 20 commits into from
Oct 11, 2024

Conversation

yatarkan
Contributor

@yatarkan yatarkan commented Oct 4, 2024

Ticket: CVS-153333

@yatarkan yatarkan marked this pull request as ready for review October 8, 2024 17:27
@ilya-lavrenov ilya-lavrenov added this to the 2024.5 milestone Oct 8, 2024
@Wovchena
Collaborator

Wovchena commented Oct 9, 2024

Please resolve the conflict.

/// @brief A model for image encoding.
ov::InferRequest m_encoder;
/// @brief A config to follow.
ProcessorConfig m_processor_config;

// LLaVa specific members
ov::InferRequest m_vision_embeddings;
Contributor

Should we define a base class for VisionEncoder and create inherited classes specifically for MiniCPM and LLaVa?

In this case, we wouldn't need to put all the fields for different models into a common class. Similarly, the processor config could be a model-specific class.
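
A minimal sketch of what such a split could look like, assuming hypothetical class names (VisionEncoderBase, VisionEncoderMiniCPM, VisionEncoderLLaVa) that are not part of the actual openvino.genai API:

#include <openvino/openvino.hpp>  // provides ov::InferRequest and ov::Tensor

// Placeholder for the existing preprocessing config from the repository headers.
struct ProcessorConfig {};

// Common interface; each backend keeps only the members it actually needs.
class VisionEncoderBase {
public:
    virtual ~VisionEncoderBase() = default;
    // Encode a single image into embeddings consumable by the LLM part.
    virtual ov::Tensor encode(const ov::Tensor& image) = 0;
protected:
    ov::InferRequest m_encoder;
};

class VisionEncoderMiniCPM : public VisionEncoderBase {
public:
    ov::Tensor encode(const ov::Tensor& image) override {
        // Would apply m_processor_config preprocessing and run m_encoder.
        return {};
    }
private:
    ProcessorConfig m_processor_config;  // MiniCPM-specific preprocessing
};

class VisionEncoderLLaVa : public VisionEncoderBase {
public:
    ov::Tensor encode(const ov::Tensor& image) override {
        // Would run m_vision_embeddings on the encoded image features.
        return {};
    }
private:
    ov::InferRequest m_vision_embeddings;  // LLaVa-specific embeddings model
};

With this layout, each backend carries only its own members instead of accumulating every model's fields in a single class.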

}
}

ov::Tensor VLMPipeline::get_inputs_embeds_minicpm(const std::string& prompt, const std::vector<ov::Tensor>& images) {
Contributor

Can we create a dedicated class responsible for inputs embedding? Such a class would also be used in the continuous batching implementation.

In this case, VLMPipeline would compute input embeddings without knowing anything model- or pipeline-specific and then perform inference of the LLM part in auto-regressive mode.
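
A minimal sketch of such an interface, mirroring the get_inputs_embeds_minicpm signature above; the InputsEmbedder name and layout are assumptions, not the actual openvino.genai API:

#include <openvino/openvino.hpp>
#include <string>
#include <vector>

// Hypothetical interface that hides model-specific embedding logic from VLMPipeline.
class InputsEmbedder {
public:
    virtual ~InputsEmbedder() = default;
    // Combine a text prompt and images into one embeddings tensor that the
    // LLM part consumes auto-regressively; MiniCPM, LLaVa, etc. would each
    // provide their own implementation behind this interface.
    virtual ov::Tensor get_inputs_embeds(const std::string& prompt,
                                         const std::vector<ov::Tensor>& images) = 0;
};

VLMPipeline would then hold an InputsEmbedder, obtain the embeddings tensor from it, and run the LLM part auto-regressively without any model-specific branching.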

@yatarkan yatarkan requested a review from Wovchena October 9, 2024 11:18
Contributor

@ilya-lavrenov ilya-lavrenov left a comment

Agreed to split the VisionEncoder implementation into model-specific backends and to introduce VisionTextInputsEmbedder as separate tasks after LLaVa-NeXT / InternVL enablement in GenAI.

@Wovchena Wovchena enabled auto-merge October 10, 2024 04:50
@Wovchena Wovchena added this pull request to the merge queue Oct 10, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Oct 10, 2024
@Wovchena
Collaborator

There's a conflict again

@Wovchena Wovchena enabled auto-merge October 10, 2024 14:35
@ilya-lavrenov
Contributor

Please update the list of supported models: https://github.com/openvinotoolkit/openvino.genai/blob/master/src/docs/SUPPORTED_MODELS.md#visual-language-models

src/docs/SUPPORTED_MODELS.md (outdated, resolved)
src/cpp/include/openvino/genai/vision_encoder.hpp (outdated, resolved)
@Wovchena Wovchena enabled auto-merge October 11, 2024 11:48
@andrei-kochin andrei-kochin merged commit dbb1f7c into openvinotoolkit:master Oct 11, 2024
46 checks passed
@ilya-lavrenov ilya-lavrenov added the labels "category: visual language" (Visual language pipeline), "dependencies" (Pull requests that update a dependency file), and "category: samples" (GenAI samples) Oct 15, 2024
Labels
category: samples (GenAI samples), category: visual language (Visual language pipeline), dependencies (Pull requests that update a dependency file)
Projects
None yet
Development


4 participants