Various fixes
Signed-off-by: DarkLight1337 <[email protected]>
DarkLight1337 committed Dec 2, 2024
1 parent 291ae79 commit aef7899
Showing 2 changed files with 8 additions and 9 deletions.
8 changes: 4 additions & 4 deletions docs/source/usage/generative_models.rst
@@ -15,14 +15,14 @@ Offline Inference
The :class:`~vllm.LLM` class provides various methods for offline inference.
See :ref:`Engine Arguments <engine_args>` for a list of options when initializing the model.

-For generative models, the only supported `task` option is `"generate"`.
+For generative models, the only supported :code:`task` option is :code:`"generate"`.
Usually, the task is automatically inferred so you don't have to specify this.

``LLM.generate``
^^^^^^^^^^^^^^^^

The :class:`~vllm.LLM.generate` method is available to all generative models in vLLM.
-It is similar to `transformers.GenerationMixin.generate <https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationMixin.generate>`__,
+It is similar to `its counterpart in HF Transformers <https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationMixin.generate>`__,
except that tokenization and detokenization are also performed automatically.

.. code-block:: python
@@ -72,7 +72,7 @@ For example, to search using 5 beams and output at most 16 tokens:
^^^^^^^^^^^^

The :class:`~vllm.LLM.chat` method implements chat functionality on top of :class:`~vllm.LLM.generate`.
-In particular, it accepts input similar to `OpenAI Chat Completions API <https://platform.openai.com/docs/api-reference/chat>__
+In particular, it accepts input similar to `OpenAI Chat Completions API <https://platform.openai.com/docs/api-reference/chat>`__
and automatically applies the model's `chat template <https://huggingface.co/docs/transformers/en/chat_templating>`__ to format the prompt.

.. important::
@@ -126,7 +126,7 @@ you can explicitly pass a chat template:
Online Inference
----------------

-Our `OpenAI Compatible Server <../serving/openai_compatible_server>` can be used for online inference.
+Our `OpenAI Compatible Server <../serving/openai_compatible_server>`__ can be used for online inference.
Please click on the above link for more details on how to launch the server.

Completions API
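The :code:`LLM.chat` behavior described in the generative_models diff above (accept OpenAI-style messages, apply the model's chat template, then delegate to :code:`LLM.generate`) can be sketched in plain Python. This is a toy illustration only: the tag format below is invented for the example and is not any real model's chat template, and real templates come from the model's tokenizer configuration.

```python
# Toy sketch (not vLLM's actual implementation): render OpenAI-style
# messages into a single prompt string, the way a chat template would,
# before handing the prompt to a generate() call.

def apply_toy_chat_template(messages):
    """Render a list of {"role", "content"} dicts into one prompt string."""
    parts = []
    for message in messages:
        # Hypothetical tag format, for illustration only.
        parts.append(f"<|{message['role']}|>\n{message['content']}")
    # Append a generation prompt so the model continues as the assistant.
    parts.append("<|assistant|>\n")
    return "\n".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

prompt = apply_toy_chat_template(messages)
print(prompt)
```

With a real model, :code:`LLM.chat` performs this formatting automatically using the template shipped with the model, which is why the docs above note that a custom template can be passed explicitly when the model does not provide one.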
9 changes: 4 additions & 5 deletions docs/source/usage/pooling_models.rst
@@ -19,14 +19,13 @@ Offline Inference
The :class:`~vllm.LLM` class provides various methods for offline inference.
See :ref:`Engine Arguments <engine_args>` for a list of options when initializing the model.

-For pooling models, we support the following `task` options:
+For pooling models, we support the following :code:`task` options:

- Embedding (:code:`"embed"` / :code:`"embedding"`)
-- Classification (:code:`"classify"`/ :code:`"score"`)
-- Reranking models fall under this category.
+- Classification (:code:`"classify"`/ :code:`"score"`) -- reranking models fall under this category.
- Reward Modeling (:code:`"reward"`)

-The task type determines the default :class:`~vllm.model_executor.layers.Pooler` that is used:
+The selected task determines the default :class:`~vllm.model_executor.layers.Pooler` that is used:

- Embedding: Extract only the hidden states corresponding to the last token, and apply normalization.
- Classification: Extract only the hidden states corresponding to the last token, and apply softmax.
@@ -75,7 +75,7 @@ You can use `these tests <https://github.com/vllm-project/vllm/blob/main/tests/m
Online Inference
----------------

-Our `OpenAI Compatible Server <../serving/openai_compatible_server>` can be used for online inference.
+Our `OpenAI Compatible Server <../serving/openai_compatible_server>`__ can be used for online inference.
Please click on the above link for more details on how to launch the server.

Embeddings API
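The default pooler behaviors described in the pooling_models diff above (embedding: last-token hidden state plus normalization; classification: last-token hidden state plus softmax) can be sketched with plain Python lists. Real poolers operate on tensors inside the model; this is a simplified standalone illustration of the math only.

```python
import math

def last_token_pool(hidden_states):
    """Keep only the hidden state of the final token."""
    return hidden_states[-1]

def l2_normalize(vector):
    """Scale a vector to unit length, as the embedding pooler does."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def softmax(vector):
    """Convert scores to probabilities, as the classification pooler does."""
    shifted = [x - max(vector) for x in vector]  # for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# One hidden state per token; 3-dimensional for illustration.
hidden_states = [
    [0.1, 0.2, 0.3],  # token 0
    [1.0, 2.0, 2.0],  # token 1 (last token -- the one that is pooled)
]

embedding = l2_normalize(last_token_pool(hidden_states))
class_probs = softmax(last_token_pool(hidden_states))

print(embedding)    # a unit-length vector
print(class_probs)  # probabilities summing to 1
```

The point of the diff's rewording is that the *task* selects which of these poolers is used by default, which is why choosing :code:`"embed"` versus :code:`"classify"` changes the output shape and meaning even for the same underlying model.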
