Update

Signed-off-by: DarkLight1337 <[email protected]>
vllm-project · Dec 2, 2024 · 49fd24c
1 parent aef7899
commit 49fd24c

Showing 2 changed files with 8 additions and 4 deletions.
diff --git a/docs/source/usage/compatibility_matrix.rst b/docs/source/usage/compatibility_matrix.rst
@@ -39,7 +39,7 @@ Feature x Feature
      - :abbr:`prmpt adptr (Prompt Adapter)`
      - :ref:`SD <spec_decode>`
      - CUDA graph
-     - :abbr:`emd (Embedding Models)`
+     - :abbr:`pooling (Pooling Models)`
      - :abbr:`enc-dec (Encoder-Decoder Models)`
      - :abbr:`logP (Logprobs)`
      - :abbr:`prmpt logP (Prompt Logprobs)`
@@ -151,7 +151,7 @@ Feature x Feature
      - 
      - 
      - 
-   * - :abbr:`emd (Embedding Models)`
+   * - :abbr:`pooling (Pooling Models)`
      - ✗
      - ✗
      - ✗ 
@@ -386,7 +386,7 @@ Feature x Hardware
      - ✅
      - ✗
      - ✅
-   * - :abbr:`emd (Embedding Models)`
+   * - :abbr:`pooling (Pooling Models)`
      - ✅
      - ✅
      - ✅

diff --git a/docs/source/usage/pooling_models.rst b/docs/source/usage/pooling_models.rst
@@ -3,7 +3,7 @@
 Using Pooling Models
 ====================
 
-vLLM provides second-class support for pooling models, including embedding, reranking and reward models.
+vLLM also supports pooling models, including embedding, reranking and reward models.
 
 In vLLM, pooling models implement the :class:`~vllm.model_executor.models.VllmModelForPooling` interface.
 These models use a :class:`~vllm.model_executor.layers.Pooler` to aggregate the final hidden states of the input
@@ -13,6 +13,10 @@ Technically, any :ref:`generative model <generative_models>` in vLLM can be conv
 by aggregating and returning the hidden states directly, skipping the generation step.
 Nevertheless, you should use those that are specifically trained as pooling models.
 
+We currently support pooling models primarily as a matter of convenience.
+As shown in the :ref:`Compatibility Matrix <compatibility_matrix>`, most of vLLM's optimizations are not applicable to
+pooling models as they only work on the generation or decode stage.
+
 Offline Inference
 -----------------