[Doc] Move guide for multimodal model and other improvements (vllm-pr…
DarkLight1337 authored Jul 6, 2024
1 parent 175c43e commit 9389380
Showing 8 changed files with 61 additions and 67 deletions.
6 changes: 3 additions & 3 deletions docs/source/dev/input_processing/model_inputs_index.rst
@@ -5,10 +5,10 @@ Input Processing

.. currentmodule:: vllm.inputs

vLLM provides a mechanism for defining input processors for each model so that the inputs are processed
in :class:`~vllm.LLMEngine` before they are passed to model executors.
Each model can override parts of vLLM's :ref:`input processing pipeline <input_processing_pipeline>` via
:data:`~vllm.inputs.INPUT_REGISTRY` and :data:`~vllm.multimodal.MULTIMODAL_REGISTRY`.

Currently, this mechanism is only utilized in :ref:`multi-modal models <multi_modality>` for preprocessing multi-modal input
Currently, this mechanism is only utilized in :ref:`multi-modal <multi_modality>` models for preprocessing multi-modal input
data in addition to input prompt, but it can be extended to text-only language models when needed.

Guides
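For reference, a minimal sketch of how a model hooks into the input-processor mechanism described in this file, assuming the decorator-based registration exposed by ``INPUT_REGISTRY`` around this release; the model class and processor function names are illustrative, not part of this commit:

from torch import nn

from vllm.inputs import INPUT_REGISTRY, InputContext, LLMInputs


def input_processor_for_my_model(ctx: InputContext, llm_inputs: LLMInputs) -> LLMInputs:
    # Runs inside the LLMEngine before the inputs reach the model executor;
    # a real processor would rewrite the prompt/token IDs here (for example,
    # to expand image placeholder tokens) based on ctx.model_config.
    return llm_inputs


@INPUT_REGISTRY.register_input_processor(input_processor_for_my_model)
class MyModelForCausalLM(nn.Module):  # hypothetical model class
    ...
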
20 changes: 6 additions & 14 deletions docs/source/dev/multimodal/multimodal_index.rst
@@ -7,25 +7,17 @@ Multi-Modality

vLLM provides experimental support for multi-modal models through the :mod:`vllm.multimodal` package.

:class:`vllm.inputs.PromptStrictInputs` accepts an additional attribute ``multi_modal_data``
which allows you to pass in multi-modal input alongside text and token prompts.
Multi-modal input can be passed alongside text and token prompts to :ref:`supported models <supported_vlms>`
via the ``multi_modal_data`` field in :class:`vllm.inputs.PromptStrictInputs`.

.. note::
``multi_modal_data`` can accept keys and values beyond the builtin ones, as long as a customized plugin is registered through
:class:`vllm.multimodal.MULTIMODAL_REGISTRY`.
the :class:`~vllm.multimodal.MULTIMODAL_REGISTRY`.

By default, vLLM models do not support multi-modal inputs. To enable multi-modal support for a model, please follow :ref:`the guide for adding a new multimodal model. <adding_a_new_multimodal_model>`.
To implement a new multi-modal model in vLLM, please follow :ref:`this guide <enabling_multimodal_inputs>`.


# TODO: Add more instructions on how to do that once embeddings is in.

Guides
++++++

.. toctree::
:maxdepth: 1

adding_multimodal_model
..
TODO: Add more instructions on how to add new plugins once embeddings is in.
Module Contents
+++++++++++++++
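For reference, a minimal usage sketch of the ``multi_modal_data`` field described in this file, assuming a supported vision-language checkpoint (``llava-hf/llava-1.5-7b-hf`` here) and the built-in ``image`` modality; the prompt format is illustrative and, depending on the vLLM version, extra image-related engine arguments may be required:

from PIL import Image

from vllm import LLM

llm = LLM(model="llava-hf/llava-1.5-7b-hf")
image = Image.open("example.jpg")  # any PIL image

outputs = llm.generate({
    "prompt": "USER: <image>\nWhat is in this picture?\nASSISTANT:",
    "multi_modal_data": {"image": image},
})
print(outputs[0].outputs[0].text)
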
2 changes: 2 additions & 0 deletions docs/source/index.rst
@@ -92,6 +92,7 @@ Documentation

models/supported_models
models/adding_model
models/enabling_multimodal_inputs
models/engine_args
models/lora
models/vlm
@@ -116,6 +117,7 @@ Documentation
automatic_prefix_caching/details

.. toctree::
:maxdepth: 2
:caption: Developer Documentation

dev/sampling_params
38 changes: 21 additions & 17 deletions docs/source/models/adding_model.rst
@@ -10,6 +10,10 @@ This document provides a high-level guide on integrating a `HuggingFace Transfor
The process is considerably straightforward if the model shares a similar architecture with an existing model in vLLM.
However, for models that include new operators (e.g., a new attention mechanism), the process can be a bit more complex.

.. note::
By default, vLLM models do not support multi-modal inputs. To enable multi-modal support,
please follow :ref:`this guide <enabling_multimodal_inputs>` after implementing the model here.

.. tip::
If you are encountering issues while integrating your model into vLLM, feel free to open an issue on our `GitHub <https://github.com/vllm-project/vllm/issues>`_ repository.
We will be happy to help you out!
@@ -44,23 +48,23 @@ Next, you need to rewrite the :meth:`~torch.nn.Module.forward` method of your mo

.. code-block:: diff
def forward(
self,
input_ids: torch.Tensor,
- attention_mask: Optional[torch.Tensor] = None,
- position_ids: Optional[torch.LongTensor] = None,
- past_key_values: Optional[List[torch.FloatTensor]] = None,
- inputs_embeds: Optional[torch.FloatTensor] = None,
- labels: Optional[torch.LongTensor] = None,
- use_cache: Optional[bool] = None,
- output_attentions: Optional[bool] = None,
- output_hidden_states: Optional[bool] = None,
- return_dict: Optional[bool] = None,
-) -> Union[Tuple, CausalLMOutputWithPast]:
+ positions: torch.Tensor,
+ kv_caches: List[torch.Tensor],
+ attn_metadata: AttentionMetadata,
+) -> Optional[SamplerOutput]:
def forward(
self,
input_ids: torch.Tensor,
- attention_mask: Optional[torch.Tensor] = None,
- position_ids: Optional[torch.LongTensor] = None,
- past_key_values: Optional[List[torch.FloatTensor]] = None,
- inputs_embeds: Optional[torch.FloatTensor] = None,
- labels: Optional[torch.LongTensor] = None,
- use_cache: Optional[bool] = None,
- output_attentions: Optional[bool] = None,
- output_hidden_states: Optional[bool] = None,
- return_dict: Optional[bool] = None,
- ) -> Union[Tuple, CausalLMOutputWithPast]:
+ positions: torch.Tensor,
+ kv_caches: List[torch.Tensor],
+ attn_metadata: AttentionMetadata,
+ ) -> Optional[SamplerOutput]:
1. Update the code by considering that :code:`input_ids` and :code:`positions` are now flattened tensors.
2. Replace the attention operation with either :code:`PagedAttention`, :code:`PagedAttentionWithRoPE`, or :code:`PagedAttentionWithALiBi` depending on the model's architecture.
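For reference, a small illustration of point 1 above (flattened tensors), using made-up token IDs:

import torch

# Two prompts of lengths 3 and 2. HuggingFace-style models receive a padded
# [batch_size, seq_len] layout plus an attention mask:
hf_input_ids = torch.tensor([[11, 12, 13],
                             [21, 22,  0]])
# vLLM instead passes all tokens concatenated into a single dimension,
# with per-sequence positions flattened the same way:
flat_input_ids = torch.tensor([11, 12, 13, 21, 22])
flat_positions = torch.tensor([0, 1, 2, 0, 1])
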
docs/source/dev/multimodal/adding_multimodal_model.rst → docs/source/models/enabling_multimodal_inputs.rst
@@ -1,26 +1,21 @@
.. _adding_a_new_multimodal_model:
.. _enabling_multimodal_inputs:

Adding a New Multimodal Model
=============================
Enabling Multimodal Inputs
==========================

This document provides a high-level guide on integrating a :ref:`multi-modal model <multi_modality>` into vLLM.
This document walks you through the steps to extend a vLLM model so that it accepts :ref:`multi-modal <multi_modality>` inputs.

.. note::
The complexity of adding a new model depends heavily on the model's architecture.
The process is considerably straightforward if the model shares a similar architecture with an existing model in vLLM.
However, for models that include new operators (e.g., a new attention mechanism), the process can be a bit more complex.

.. tip::
If you are encountering issues while integrating your model into vLLM, feel free to open an issue on our `GitHub <https://github.com/vllm-project/vllm/issues>`_ repository.
We will be happy to help you out!
.. seealso::
:ref:`adding_a_new_model`


1. Set up the base vLLM model
1. Update the base vLLM model
-----------------------------

As usual, follow :ref:`these steps <adding_a_new_model>` to implement the model in vLLM, but note the following:
It is assumed that you have already implemented the model in vLLM according to :ref:`these steps <adding_a_new_model>`.
Further update the model as follows:

- You should additionally implement the :class:`~vllm.model_executor.models.interfaces.SupportsVision` interface.
- Implement the :class:`~vllm.model_executor.models.interfaces.SupportsVision` interface.

.. code-block:: diff
@@ -33,19 +28,19 @@ As usual, follow :ref:`these steps <adding_a_new_model>` to implement the model
The model class does not have to be named :code:`*ForCausalLM`.
Check out `the HuggingFace Transformers documentation <https://huggingface.co/docs/transformers/model_doc/auto#multimodal>`__ for some examples.

- While implementing the :meth:`~torch.nn.Module.forward` method, reserve a keyword parameter
- If you haven't already done so, reserve a keyword parameter in :meth:`~torch.nn.Module.forward`
for each input tensor that corresponds to a multi-modal input, as shown in the following example:

.. code-block:: diff
def forward(
self,
input_ids: torch.Tensor,
positions: torch.Tensor,
kv_caches: List[torch.Tensor],
attn_metadata: AttentionMetadata,
+ pixel_values: torch.Tensor,
) -> SamplerOutput:
def forward(
self,
input_ids: torch.Tensor,
positions: torch.Tensor,
kv_caches: List[torch.Tensor],
attn_metadata: AttentionMetadata,
+ pixel_values: torch.Tensor,
) -> SamplerOutput:
2. Register input mappers
@@ -68,8 +63,8 @@ A default mapper is available for each modality in the core vLLM library. This i
:ref:`input_processing_pipeline`


3. Register maximum number of multimodal tokens
----------------------------------------------------------
3. Register maximum number of multi-modal tokens
------------------------------------------------

For each modality type that the model accepts as input, calculate the maximum possible number of tokens
and register it via :meth:`INPUT_REGISTRY.register_dummy_data <vllm.inputs.registry.InputRegistry.register_max_multimodal_tokens>`.
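For reference, a minimal sketch of how steps 1–3 of this guide typically come together on the model class, following the decorator pattern used by vLLM's image models around this release; the class name, token budget, and helper function are illustrative assumptions:

from torch import nn

from vllm.inputs import InputContext
from vllm.model_executor.models.interfaces import SupportsVision
from vllm.multimodal import MULTIMODAL_REGISTRY


def get_max_my_vlm_image_tokens(ctx: InputContext) -> int:
    # Illustrative upper bound on image feature tokens (e.g. a 24x24 patch grid).
    return 576


@MULTIMODAL_REGISTRY.register_image_input_mapper()                            # step 2: use the default image mapper
@MULTIMODAL_REGISTRY.register_max_image_tokens(get_max_my_vlm_image_tokens)   # step 3: register the token budget
class MyVLMForCausalLM(nn.Module, SupportsVision):                            # step 1: implement SupportsVision
    ...
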
2 changes: 1 addition & 1 deletion docs/source/models/supported_models.rst
Expand Up @@ -192,7 +192,7 @@ Vision Language Models
-

If your model uses one of the above model architectures, you can seamlessly run your model with vLLM.
Otherwise, please refer to :ref:`Adding a New Model <adding_a_new_model>` and :ref:`Adding a New Multimodal Model <adding_a_new_multimodal_model>`
Otherwise, please refer to :ref:`Adding a New Model <adding_a_new_model>` and :ref:`Enabling Multimodal Inputs <enabling_multimodal_inputs>`
for instructions on how to implement support for your model.
Alternatively, you can raise an issue on our `GitHub <https://github.com/vllm-project/vllm/issues>`_ project.

2 changes: 1 addition & 1 deletion vllm/inputs/registry.py
@@ -141,7 +141,7 @@ def dummy_data_for_profiling(self, model_config: "ModelConfig",
The model is identified by ``model_config``.
See also:
:ref:`adding_a_new_multimodal_model`
:ref:`enabling_multimodal_inputs`
"""
# Avoid circular import
from vllm.model_executor.model_loader import get_model_architecture
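For reference, a minimal sketch of a dummy-data factory of the kind ``dummy_data_for_profiling`` dispatches to, assuming the ``(InputContext, seq_len) -> (SequenceData, multi-modal data)`` factory signature used around this release; the function and model names are illustrative:

from torch import nn

from vllm.inputs import INPUT_REGISTRY, InputContext
from vllm.sequence import SequenceData


def dummy_data_for_my_model(ctx: InputContext, seq_len: int):
    # Profiling-only input: seq_len filler tokens and no multi-modal data.
    return SequenceData([0] * seq_len), None


@INPUT_REGISTRY.register_dummy_data(dummy_data_for_my_model)
class MyModelForCausalLM(nn.Module):  # hypothetical model class
    ...
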
11 changes: 6 additions & 5 deletions vllm/multimodal/base.py
@@ -162,8 +162,8 @@ def register_input_mapper(
If `None` is provided, then the default input mapper is used instead.
See also:
:ref:`input_processing_pipeline`
:ref:`adding_a_new_multimodal_model`
- :ref:`input_processing_pipeline`
- :ref:`enabling_multimodal_inputs`
"""

def wrapper(model_cls: N) -> N:
@@ -192,7 +192,8 @@ def map_input(self, model_config: ModelConfig,
TypeError: If the data type is not supported.
See also:
:ref:`adding_a_new_multimodal_model`
- :ref:`input_processing_pipeline`
- :ref:`enabling_multimodal_inputs`
"""
# Avoid circular import
from vllm.model_executor.model_loader import get_model_architecture
@@ -230,7 +231,7 @@ def register_max_multimodal_tokens(
If `None` is provided, then the default calculation is used instead.
See also:
:ref:`adding_a_new_multimodal_model`
:ref:`enabling_multimodal_inputs`
"""

def wrapper(model_cls: N) -> N:
@@ -260,7 +261,7 @@ def get_max_multimodal_tokens(self, model_config: ModelConfig) -> int:
The model is identified by ``model_config``.
See also:
:ref:`adding_a_new_multimodal_model`
:ref:`enabling_multimodal_inputs`
"""
# Avoid circular import
from vllm.model_executor.model_loader import get_model_architecture
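For reference, a minimal sketch of a custom input mapper of the kind these docstrings refer to, assuming the ``MultiModalInputs`` container and the ``(InputContext, data)`` mapper signature used by the image plugin around this release; the class name and tensor shape are illustrative:

import torch
from torch import nn

from vllm.inputs import InputContext
from vllm.multimodal import MULTIMODAL_REGISTRY
from vllm.multimodal.base import MultiModalInputs


def my_image_mapper(ctx: InputContext, data: object) -> MultiModalInputs:
    # Map the raw multi-modal object to the keyword-argument tensors consumed
    # by the model's forward(); the preprocessing here is a placeholder.
    pixel_values = torch.zeros(1, 3, 336, 336)
    return MultiModalInputs({"pixel_values": pixel_values})


@MULTIMODAL_REGISTRY.register_image_input_mapper(my_image_mapper)
class MyVLMForCausalLM(nn.Module):  # hypothetical model class
    ...
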
