
Commit a0b769c

Reword
Signed-off-by: DarkLight1337 <[email protected]>
DarkLight1337 committed Nov 25, 2024
1 parent 21df44f commit a0b769c
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/source/models/supported_models.rst
@@ -367,7 +367,7 @@ Text Embedding
Unlike base Qwen2, :code:`Alibaba-NLP/gte-Qwen2-7B-instruct` uses bi-directional attention.
You can set `--hf-overrides '{"is_causal": false}'` to change the attention mask accordingly.

- On the other hand, its 1.5B variant (:code:`Alibaba-NLP/gte-Qwen2-1.5B-instruct`) uses decoder-only attention
+ On the other hand, its 1.5B variant (:code:`Alibaba-NLP/gte-Qwen2-1.5B-instruct`) uses causal attention
despite being described otherwise on its model card.

Reward Modeling
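For context, a minimal offline sketch of applying the same override, assuming a vLLM build where LLM forwards the hf_overrides engine argument (the Python-side counterpart of the --hf-overrides flag quoted above) and accepts task="embedding":

from vllm import LLM

# Load the 7B GTE model for embedding, overriding the HF config so that
# bi-directional attention is used (hf_overrides mirrors --hf-overrides).
model = LLM(
    model="Alibaba-NLP/gte-Qwen2-7B-instruct",
    task="embedding",
    hf_overrides={"is_causal": False},
)

# Each output carries the pooled embedding vector for its prompt.
(output,) = model.encode("What is the capital of France?")
print(len(output.outputs.embedding))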
2 changes: 1 addition & 1 deletion vllm/model_executor/models/qwen2.py
@@ -216,7 +216,7 @@ def __init__(
        self.post_attention_layernorm = RMSNorm(config.hidden_size,
                                                eps=config.rms_norm_eps)

-        # NOTE: By default, Qwen2 is a decoder-only model.
+        # By default, Qwen2 uses causal attention as it is a decoder-only model.
        # You can override the HF config with `is_causal=False` to enable
        # bidirectional attention, which is used in some embedding models
        # (e.g. Alibaba-NLP/gte-Qwen2-7B-instruct)
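To illustrate the comment above, a hedged sketch of how the flag could be consumed when constructing the attention layer; AttentionType comes from vllm.attention, but this helper and its wiring are an assumption, not code from this file:

from vllm.attention import AttentionType

def resolve_attn_type(config) -> str:
    # Assumed wiring: honor `is_causal` from the (possibly overridden) HF
    # config, defaulting to causal attention for the decoder-only model.
    if getattr(config, "is_causal", True):
        return AttentionType.DECODER      # causal mask
    return AttentionType.ENCODER_ONLY     # bi-directional attention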
