From 5d3a629894eb1e34c7eb9243cabe22e88ee9320f Mon Sep 17 00:00:00 2001
From: DarkLight1337
Date: Mon, 2 Dec 2024 16:44:26 +0000
Subject: [PATCH] Update

Signed-off-by: DarkLight1337
---
 docs/source/usage/compatibility_matrix.rst | 6 +++---
 docs/source/usage/pooling_models.rst       | 8 ++++++--
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/docs/source/usage/compatibility_matrix.rst b/docs/source/usage/compatibility_matrix.rst
index a93632ff36fb8..79ca27fb694eb 100644
--- a/docs/source/usage/compatibility_matrix.rst
+++ b/docs/source/usage/compatibility_matrix.rst
@@ -39,7 +39,7 @@ Feature x Feature
      - :abbr:`prmpt adptr (Prompt Adapter)`
      - :ref:`SD `
      - CUDA graph
-     - :abbr:`emd (Embedding Models)`
+     - :abbr:`pooling (Pooling Models)`
      - :abbr:`enc-dec (Encoder-Decoder Models)`
      - :abbr:`logP (Logprobs)`
      - :abbr:`prmpt logP (Prompt Logprobs)`
@@ -151,7 +151,7 @@ Feature x Feature
      -
      -
      -
-   * - :abbr:`emd (Embedding Models)`
+   * - :abbr:`pooling (Pooling Models)`
      - ✗
      - ✗
      - ✗
@@ -386,7 +386,7 @@ Feature x Hardware
      - ✅
      - ✗
      - ✅
-   * - :abbr:`emd (Embedding Models)`
+   * - :abbr:`pooling (Pooling Models)`
      - ✅
      - ✅
      - ✅
diff --git a/docs/source/usage/pooling_models.rst b/docs/source/usage/pooling_models.rst
index a2554d1b0eada..01b4e5fa5e353 100644
--- a/docs/source/usage/pooling_models.rst
+++ b/docs/source/usage/pooling_models.rst
@@ -3,7 +3,7 @@
 Using Pooling Models
 ====================
 
-vLLM provides second-class support for pooling models, including embedding, reranking and reward models.
+vLLM also supports pooling models, including embedding, reranking and reward models.
 
 In vLLM, pooling models implement the :class:`~vllm.model_executor.models.VllmModelForPooling` interface.
 These models use a :class:`~vllm.model_executor.layers.Pooler` to aggregate the final hidden states of the input
@@ -11,7 +11,11 @@ before returning them.
 
 Technically, any :ref:`generative model ` in vLLM can be converted into a pooling model
 by aggregating and returning the hidden states directly, skipping the generation step.
-Nevertheless, you should use those that are specifically trained as pooling models.
+Nevertheless, to get the best results, you should use pooling models that are specifically trained as such.
+
+We currently support pooling models primarily as a matter of convenience.
+As shown in the :code:`Compatibility Matrix `, most vLLM features are not applicable to
+pooling models as they only work on the generation or decode stage, so performance may not improve as much.
 
 Offline Inference
 -----------------
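
For context, below is a minimal sketch (not part of this patch) of the offline-inference flow that the updated "Offline Inference" section of ``pooling_models.rst`` goes on to describe. The model name, the ``task="embedding"`` argument, and the output field names are illustrative assumptions and may differ between vLLM versions.

.. code-block:: python

    from vllm import LLM

    # Load a model that was trained for embedding. The task argument asks vLLM
    # to run it as a pooling model rather than a generative one (the accepted
    # value may vary across vLLM versions).
    llm = LLM(model="intfloat/e5-mistral-7b-instruct", task="embedding")

    # encode() skips the generation step and returns the pooled hidden states,
    # one output per input prompt.
    outputs = llm.encode(["Hello, my name is", "The capital of France is"])

    for output in outputs:
        # Assumed output layout: a vector of floats per prompt.
        embedding = output.outputs.embedding
        print(len(embedding))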