Various fixes
Signed-off-by: DarkLight1337 <[email protected]>
DarkLight1337 committed Dec 2, 2024
1 parent 291ae79 commit aef7899
Showing 2 changed files with 8 additions and 9 deletions.
8 changes: 4 additions & 4 deletions docs/source/usage/generative_models.rst
@@ -15,14 +15,14 @@ Offline Inference
The :class:`~vllm.LLM` class provides various methods for offline inference.
See :ref:`Engine Arguments <engine_args>` for a list of options when initializing the model.

-For generative models, the only supported `task` option is `"generate"`.
+For generative models, the only supported :code:`task` option is :code:`"generate"`.
Usually, the task is automatically inferred so you don't have to specify this.

``LLM.generate``
^^^^^^^^^^^^^^^^

The :class:`~vllm.LLM.generate` method is available to all generative models in vLLM.
-It is similar to `transformers.GenerationMixin.generate <https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationMixin.generate>`__,
+It is similar to `its counterpart in HF Transformers <https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationMixin.generate>`__,
except that tokenization and detokenization are also performed automatically.

.. code-block:: python
@@ -72,7 +72,7 @@ For example, to search using 5 beams and output at most 16 tokens:
^^^^^^^^^^^^

The :class:`~vllm.LLM.chat` method implements chat functionality on top of :class:`~vllm.LLM.generate`.
-In particular, it accepts input similar to `OpenAI Chat Completions API <https://platform.openai.com/docs/api-reference/chat>__
+In particular, it accepts input similar to `OpenAI Chat Completions API <https://platform.openai.com/docs/api-reference/chat>`__
and automatically applies the model's `chat template <https://huggingface.co/docs/transformers/en/chat_templating>`__ to format the prompt.

.. important::
@@ -126,7 +126,7 @@ you can explicitly pass a chat template:
Online Inference
----------------

-Our `OpenAI Compatible Server <../serving/openai_compatible_server>` can be used for online inference.
+Our `OpenAI Compatible Server <../serving/openai_compatible_server>`__ can be used for online inference.
Please click on the above link for more details on how to launch the server.

Completions API
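The :code:`LLM.chat` behavior described in the generative_models diff above (accept OpenAI-style messages, apply the model's chat template, then delegate to :code:`LLM.generate`) can be sketched in plain Python. This is a toy illustration only: the tag format below is invented for the example and is not any real model's chat template, and real templates come from the model's tokenizer configuration.

```python
# Toy sketch (not vLLM's actual implementation): render OpenAI-style
# messages into a single prompt string, the way a chat template would,
# before handing the prompt to a generate() call.

def apply_toy_chat_template(messages):
    """Render a list of {"role", "content"} dicts into one prompt string."""
    parts = []
    for message in messages:
        # Hypothetical tag format, for illustration only.
        parts.append(f"<|{message['role']}|>\n{message['content']}")
    # Append a generation prompt so the model continues as the assistant.
    parts.append("<|assistant|>\n")
    return "\n".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

prompt = apply_toy_chat_template(messages)
print(prompt)
```

With a real model, :code:`LLM.chat` performs this formatting automatically using the template shipped with the model, which is why the docs above note that a custom template can be passed explicitly when the model does not provide one.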
9 changes: 4 additions & 5 deletions docs/source/usage/pooling_models.rst
@@ -19,14 +19,13 @@ Offline Inference
The :class:`~vllm.LLM` class provides various methods for offline inference.
See :ref:`Engine Arguments <engine_args>` for a list of options when initializing the model.

-For pooling models, we support the following `task` options:
+For pooling models, we support the following :code:`task` options:

- Embedding (:code:`"embed"` / :code:`"embedding"`)
-- Classification (:code:`"classify"`/ :code:`"score"`)
-- Reranking models fall under this category.
+- Classification (:code:`"classify"`/ :code:`"score"`) -- reranking models fall under this category.
- Reward Modeling (:code:`"reward"`)

-The task type determines the default :class:`~vllm.model_executor.layers.Pooler` that is used:
+The selected task determines the default :class:`~vllm.model_executor.layers.Pooler` that is used:

- Embedding: Extract only the hidden states corresponding to the last token, and apply normalization.
- Classification: Extract only the hidden states corresponding to the last token, and apply softmax.
@@ -75,7 +75,7 @@ You can use `these tests <https://github.com/vllm-project/vllm/blob/main/tests/m
Online Inference
----------------

-Our `OpenAI Compatible Server <../serving/openai_compatible_server>` can be used for online inference.
+Our `OpenAI Compatible Server <../serving/openai_compatible_server>`__ can be used for online inference.
Please click on the above link for more details on how to launch the server.

Embeddings API
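The default pooler behaviors described in the pooling_models diff above (embedding: last-token hidden state plus normalization; classification: last-token hidden state plus softmax) can be sketched with plain Python lists. Real poolers operate on tensors inside the model; this is a simplified standalone illustration of the math only.

```python
import math

def last_token_pool(hidden_states):
    """Keep only the hidden state of the final token."""
    return hidden_states[-1]

def l2_normalize(vector):
    """Scale a vector to unit length, as the embedding pooler does."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def softmax(vector):
    """Convert scores to probabilities, as the classification pooler does."""
    shifted = [x - max(vector) for x in vector]  # for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# One hidden state per token; 3-dimensional for illustration.
hidden_states = [
    [0.1, 0.2, 0.3],  # token 0
    [1.0, 2.0, 2.0],  # token 1 (last token -- the one that is pooled)
]

embedding = l2_normalize(last_token_pool(hidden_states))
class_probs = softmax(last_token_pool(hidden_states))

print(embedding)    # a unit-length vector
print(class_probs)  # probabilities summing to 1
```

The point of the diff's rewording is that the *task* selects which of these poolers is used by default, which is why choosing :code:`"embed"` versus :code:`"classify"` changes the output shape and meaning even for the same underlying model.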
