Update
Signed-off-by: DarkLight1337 <[email protected]>
DarkLight1337 committed Dec 2, 2024
1 parent aef7899 commit 5d3a629
Showing 2 changed files with 9 additions and 5 deletions.
6 changes: 3 additions & 3 deletions docs/source/usage/compatibility_matrix.rst
@@ -39,7 +39,7 @@ Feature x Feature
- :abbr:`prmpt adptr (Prompt Adapter)`
- :ref:`SD <spec_decode>`
- CUDA graph
- - :abbr:`emd (Embedding Models)`
+ - :abbr:`pooling (Pooling Models)`
- :abbr:`enc-dec (Encoder-Decoder Models)`
- :abbr:`logP (Logprobs)`
- :abbr:`prmpt logP (Prompt Logprobs)`
@@ -151,7 +151,7 @@ Feature x Feature
-
-
-
- * - :abbr:`emd (Embedding Models)`
+ * - :abbr:`pooling (Pooling Models)`
- ✗
- ✗
- ✗
@@ -386,7 +386,7 @@ Feature x Hardware
- ✅
- ✗
- ✅
- * - :abbr:`emd (Embedding Models)`
+ * - :abbr:`pooling (Pooling Models)`
- ✅
- ✅
- ✅
8 changes: 6 additions & 2 deletions docs/source/usage/pooling_models.rst
@@ -3,15 +3,19 @@
Using Pooling Models
====================

- vLLM provides second-class support for pooling models, including embedding, reranking and reward models.
+ vLLM also supports pooling models, including embedding, reranking and reward models.

In vLLM, pooling models implement the :class:`~vllm.model_executor.models.VllmModelForPooling` interface.
These models use a :class:`~vllm.model_executor.layers.Pooler` to aggregate the final hidden states of the input
before returning them.

Technically, any :ref:`generative model <generative_models>` in vLLM can be converted into a pooling model
by aggregating and returning the hidden states directly, skipping the generation step.
- Nevertheless, you should use those that are specifically trained as pooling models.
+ Nevertheless, to get the best results, you should use pooling models that are specifically trained as such.

+ We currently support pooling models primarily as a matter of convenience.
+ As shown in the :ref:`Compatibility Matrix <compatibility_matrix>`, most vLLM features are not applicable to
+ pooling models because those features only work on the generation or decode stage, so performance may not improve as much.

Offline Inference
-----------------
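
For context on the mechanism described in the added text: a Pooler aggregates the per-token hidden states of the input into a single vector before it is returned. The snippet below is a minimal, illustrative sketch of one common aggregation strategy (masked mean pooling). It is not vLLM's actual :class:`~vllm.model_executor.layers.Pooler` implementation; the function name, tensor names, and shapes are assumptions made for illustration only.

    # Illustrative sketch only -- not vLLM's Pooler implementation.
    # Assumed shapes: hidden_states is (batch, seq_len, hidden_dim);
    # attention_mask is (batch, seq_len), 1 for real tokens and 0 for padding.
    import torch


    def mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        """Aggregate per-token hidden states into one vector per sequence."""
        mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)  # (batch, seq_len, 1)
        summed = (hidden_states * mask).sum(dim=1)                   # (batch, hidden_dim)
        counts = mask.sum(dim=1).clamp(min=1)                        # (batch, 1)
        return summed / counts

This is only meant to make "aggregating the final hidden states" concrete; vLLM's poolers support other strategies as well, such as taking the last token's hidden state.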
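The hunk ends at the "Offline Inference" heading, whose body is not shown in this diff. As a rough usage sketch of offline inference with an embedding (pooling) model, one might write something like the following. The model name, the task="embed" argument, and the output field are assumptions based on vLLM's pooling API and may not match the example in the documentation page at this exact revision.

    # Hedged sketch of offline inference with a pooling (embedding) model.
    # Assumptions: the model name, task="embed", and output.outputs.embedding
    # follow vLLM's pooling API; consult the rendered docs for the exact example.
    from vllm import LLM

    llm = LLM(model="intfloat/e5-mistral-7b-instruct", task="embed")

    outputs = llm.embed(["Hello, my name is"])
    for output in outputs:
        embedding = output.outputs.embedding  # list of floats: the pooled hidden state
        print(f"Embedding vector has {len(embedding)} dimensions")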
