[Doc] Create a new "Usage" section (vllm-project#10827)
Signed-off-by: DarkLight1337 <[email protected]>
DarkLight1337 authored Dec 5, 2024
1 parent 8d370e9 commit aa39a8e
Showing 25 changed files with 218 additions and 125 deletions.
5 changes: 1 addition & 4 deletions docs/source/design/multimodal/multimodal_index.rst
@@ -7,17 +7,14 @@ Multi-Modality

vLLM provides experimental support for multi-modal models through the :mod:`vllm.multimodal` package.

Multi-modal inputs can be passed alongside text and token prompts to :ref:`supported models <supported_vlms>`
Multi-modal inputs can be passed alongside text and token prompts to :ref:`supported models <supported_mm_models>`
via the ``multi_modal_data`` field in :class:`vllm.inputs.PromptType`.
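
For example, a minimal offline-inference sketch (assuming a vision-language checkpoint such as
``llava-hf/llava-1.5-7b-hf`` and a local ``example.jpg``; adjust the prompt format to your model):

.. code-block:: python

    from vllm import LLM
    from PIL import Image

    llm = LLM(model="llava-hf/llava-1.5-7b-hf")

    # Pass the image through the ``multi_modal_data`` field of the prompt.
    image = Image.open("example.jpg")
    outputs = llm.generate({
        "prompt": "USER: <image>\nWhat is shown in this image?\nASSISTANT:",
        "multi_modal_data": {"image": image},
    })

    for o in outputs:
        print(o.outputs[0].text)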

Currently, vLLM only has built-in support for image data. You can extend vLLM to process additional modalities
by following :ref:`this guide <adding_multimodal_plugin>`.

Looking to add your own multi-modal model? Please follow the instructions listed :ref:`here <enabling_multimodal_inputs>`.

..
TODO: Add usage of --limit-mm-per-prompt when multi-image input is officially supported
Guides
++++++

25 changes: 15 additions & 10 deletions docs/source/index.rst
@@ -85,12 +85,8 @@ Documentation
serving/deploying_with_nginx
serving/distributed_serving
serving/metrics
serving/env_vars
serving/usage_stats
serving/integrations
serving/tensorizer
serving/compatibility_matrix
serving/faq

.. toctree::
:maxdepth: 1
Expand All @@ -99,12 +95,21 @@ Documentation
models/supported_models
models/adding_model
models/enabling_multimodal_inputs
models/engine_args
models/lora
models/vlm
models/structured_outputs
models/spec_decode
models/performance

.. toctree::
:maxdepth: 1
:caption: Usage

usage/lora
usage/multimodal_inputs
usage/structured_outputs
usage/spec_decode
usage/compatibility_matrix
usage/performance
usage/faq
usage/engine_args
usage/env_vars
usage/usage_stats

.. toctree::
:maxdepth: 1
2 changes: 1 addition & 1 deletion docs/source/models/enabling_multimodal_inputs.rst
@@ -3,7 +3,7 @@
Enabling Multimodal Inputs
==========================

This document walks you through the steps to extend a vLLM model so that it accepts :ref:`multi-modal <multi_modality>` inputs.
This document walks you through the steps to extend a vLLM model so that it accepts :ref:`multi-modal inputs <multimodal_inputs>`.

.. seealso::
:ref:`adding_a_new_model`
19 changes: 17 additions & 2 deletions docs/source/models/supported_models.rst
@@ -471,6 +471,8 @@ Sentence Pair Scoring
.. note::
These models are supported in both offline and online inference via Score API.

.. _supported_mm_models:

Multimodal Language Models
^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -489,8 +491,6 @@ On the other hand, modalities separated by :code:`/` are mutually exclusive.

- e.g.: :code:`T / I` means that the model supports text-only and image-only inputs, but not text-with-image inputs.

.. _supported_vlms:

Text Generation
---------------

@@ -646,6 +646,21 @@ Text Generation
| :sup:`E` Pre-computed embeddings can be inputted for this modality.
| :sup:`+` Multiple items can be inputted per text prompt for this modality.
.. important::
    To enable multiple multi-modal items per text prompt, you have to set :code:`limit_mm_per_prompt` (offline inference)
    or :code:`--limit-mm-per-prompt` (online inference). For example, to enable passing up to 4 images per text prompt:

    .. code-block:: python

        llm = LLM(
            model="Qwen/Qwen2-VL-7B-Instruct",
            limit_mm_per_prompt={"image": 4},
        )

    .. code-block:: bash

        vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4

.. note::
    vLLM currently only supports adding LoRA to the language backbone of multimodal models.

4 changes: 2 additions & 2 deletions docs/source/serving/openai_compatible_server.md
@@ -32,7 +32,7 @@ We currently support the following OpenAI APIs:
- [Completions API](https://platform.openai.com/docs/api-reference/completions)
- *Note: `suffix` parameter is not supported.*
- [Chat Completions API](https://platform.openai.com/docs/api-reference/chat)
- [Vision](https://platform.openai.com/docs/guides/vision)-related parameters are supported; see [Using VLMs](../models/vlm.rst).
- [Vision](https://platform.openai.com/docs/guides/vision)-related parameters are supported; see [Multimodal Inputs](../usage/multimodal_inputs.rst).
- *Note: `image_url.detail` parameter is not supported.*
- We also support `audio_url` content type for audio files.
- Refer to [vllm.entrypoints.chat_utils](https://github.com/vllm-project/vllm/tree/main/vllm/entrypoints/chat_utils.py) for the exact schema.
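
For illustration, a minimal multimodal chat request; this is a sketch assuming a server already started with a vision model (e.g. `vllm serve Qwen/Qwen2-VL-7B-Instruct`) on the default port, and a placeholder image URL:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```
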
@@ -41,7 +41,7 @@ We currently support the following OpenAI APIs:
- [Embeddings API](https://platform.openai.com/docs/api-reference/embeddings)
- Instead of `inputs`, you can pass in a list of `messages` (same schema as Chat Completions API),
which will be treated as a single prompt to the model according to its chat template.
- This enables multi-modal inputs to be passed to embedding models, see [Using VLMs](../models/vlm.rst).
- This enables multi-modal inputs to be passed to embedding models, see [this page](../usage/multimodal_inputs.rst) for details.
- *Note: You should run `vllm serve` with `--task embedding` to ensure that the model is being run in embedding mode.*
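
As a sketch, an embedding request against a server started with, say, `vllm serve intfloat/e5-mistral-7b-instruct --task embedding` (the model name here is illustrative) looks like a regular OpenAI embeddings call:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

result = client.embeddings.create(
    model="intfloat/e5-mistral-7b-instruct",
    input=["vLLM is an inference engine for large language models."],
)
print(len(result.data[0].embedding))  # dimensionality of the returned vector
```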

## Score API for Cross Encoder Models
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 2 additions & 0 deletions docs/source/serving/faq.rst → docs/source/usage/faq.rst
@@ -1,3 +1,5 @@
.. _faq:

Frequently Asked Questions
===========================

4 changes: 2 additions & 2 deletions docs/source/models/lora.rst → docs/source/usage/lora.rst
@@ -1,7 +1,7 @@
.. _lora:

Using LoRA adapters
===================
LoRA Adapters
=============

This document shows you how to use `LoRA adapters <https://arxiv.org/abs/2106.09685>`_ with vLLM on top of a base model.
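
As a rough sketch (the base model, adapter path, and prompt below are placeholders):

.. code-block:: python

    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    # Enable LoRA support when constructing the engine.
    llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

    sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

    # Attach a specific adapter to this batch of requests.
    outputs = llm.generate(
        ["Write a SQL query that lists all users."],
        sampling_params,
        lora_request=LoRARequest("sql_adapter", 1, "/path/to/sql-lora"),
    )
    print(outputs[0].outputs[0].text)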
