Enable LLaVa-1.5 in VLM Pipeline #917
Conversation
…for llava and minicpm (branch updated from 42d275f to 790981f)
Please resolve the conflict.
```cpp
/// @brief A model for image encoding.
ov::InferRequest m_encoder;
/// @brief A config to follow.
ProcessorConfig m_processor_config;

// LLaVa specific members
ov::InferRequest m_vision_embeddings;
```
Should we define a base class for VisionEncoder and create inherited classes for MiniCPM and LLaVa specifically? In that case we wouldn't need to put the fields for different models into one common class, as in the sketch below. Similarly, the processor config could be a model-specific class.
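A hedged sketch of what that split could look like. The class names, the `encode` signature, and the `ProcessorConfig` stub are illustrative assumptions, not the actual openvino.genai API; `ov::InferRequest`, `ov::Tensor`, and the member names are taken from the diff above.

```cpp
#include <openvino/openvino.hpp>

// Stand-in for the PR's real ProcessorConfig (defined in the PR sources).
struct ProcessorConfig {};

// Hypothetical base class: each backend owns only its own state.
class VisionEncoderBase {
public:
    virtual ~VisionEncoderBase() = default;
    // Turn a raw image tensor into vision embeddings for the LLM.
    virtual ov::Tensor encode(const ov::Tensor& image) = 0;
};

// MiniCPM-specific backend keeps its encoder and its own config.
class VisionEncoderMiniCPM : public VisionEncoderBase {
public:
    ov::Tensor encode(const ov::Tensor& image) override;
private:
    ov::InferRequest m_encoder;
    ProcessorConfig m_processor_config;
};

// LLaVa-specific backend only needs its vision embeddings model.
class VisionEncoderLLaVa : public VisionEncoderBase {
public:
    ov::Tensor encode(const ov::Tensor& image) override;
private:
    ov::InferRequest m_vision_embeddings;
};
```

With this split, VLMPipeline would hold a single `VisionEncoderBase` pointer and stay unaware of which model backend is behind it.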
```cpp
    }
}

ov::Tensor VLMPipeline::get_inputs_embeds_minicpm(const std::string& prompt, const std::vector<ov::Tensor>& images) {
```
Can we create a dedicated class responsible for inputs embedding (see the sketch after this comment)? Such a class would also be used in the continuous batching implementation. VLMPipeline would then compute input embeddings without knowing anything model- or pipeline-specific, and perform inference of only the LLM part in auto-regressive mode.
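One possible shape for such a component, named after the VisionTextInputsEmbedder mentioned below. The interface and method signature are assumptions for illustration, not a merged design; the signature mirrors `get_inputs_embeds_minicpm` from the diff above.

```cpp
#include <string>
#include <vector>
#include <openvino/openvino.hpp>

// Hypothetical interface: VLMPipeline (and later continuous batching)
// would ask this class for ready-to-use input embeddings and then run
// only the auto-regressive LLM part on the result.
class VisionTextInputsEmbedder {
public:
    virtual ~VisionTextInputsEmbedder() = default;
    // Merge the tokenized prompt with image embeddings into a single
    // embeddings tensor for the language model.
    virtual ov::Tensor get_inputs_embeds(
        const std::string& prompt,
        const std::vector<ov::Tensor>& images) = 0;
};
```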
Agreed to split the VisionEncoder implementation into model-specific backends and to introduce VisionTextInputsEmbedder as separate tasks after LLaVA-NeXT / InternVL enablement in GenAI.
There's a conflict again.
Please update the list of supported models: https://github.com/openvinotoolkit/openvino.genai/blob/master/src/docs/SUPPORTED_MODELS.md#visual-language-models
Ticket: CVS-153333