Merge branch 'main' into tms/v1_tp
tlrmchlsmth committed Dec 9, 2024
2 parents edab869 + 25b79d9 commit e0aea84
Showing 36 changed files with 791 additions and 540 deletions.
26 changes: 25 additions & 1 deletion docs/source/models/supported_models.rst
@@ -495,7 +495,7 @@ Text Generation
---------------

.. list-table::
:widths: 25 25 15 25 5 5
:widths: 25 25 15 20 5 5 5
:header-rows: 1

* - Architecture
@@ -504,144 +504,168 @@ Text Generation
- Example HF Models
- :ref:`LoRA <lora>`
- :ref:`PP <distributed_serving>`
- V1
* - :code:`AriaForConditionalGeneration`
- Aria
- T + I
- :code:`rhymes-ai/Aria`
-
- ✅︎
-
* - :code:`Blip2ForConditionalGeneration`
- BLIP-2
- T + I\ :sup:`E`
- :code:`Salesforce/blip2-opt-2.7b`, :code:`Salesforce/blip2-opt-6.7b`, etc.
-
- ✅︎
-
* - :code:`ChameleonForConditionalGeneration`
- Chameleon
- T + I
- :code:`facebook/chameleon-7b` etc.
-
- ✅︎
-
* - :code:`FuyuForCausalLM`
- Fuyu
- T + I
- :code:`adept/fuyu-8b` etc.
-
- ✅︎
-
* - :code:`ChatGLMModel`
- GLM-4V
- T + I
- :code:`THUDM/glm-4v-9b` etc.
- ✅︎
- ✅︎
-
* - :code:`H2OVLChatModel`
- H2OVL
- T + I\ :sup:`E+`
- :code:`h2oai/h2ovl-mississippi-800m`, :code:`h2oai/h2ovl-mississippi-2b`, etc.
-
- ✅︎
-
* - :code:`Idefics3ForConditionalGeneration`
- Idefics3
- T + I
- :code:`HuggingFaceM4/Idefics3-8B-Llama3` etc.
- ✅︎
-
-
* - :code:`InternVLChatModel`
- InternVL 2.5, Mono-InternVL, InternVL 2.0
- T + I\ :sup:`E+`
- :code:`OpenGVLab/InternVL2_5-4B`, :code:`OpenGVLab/Mono-InternVL-2B`, :code:`OpenGVLab/InternVL2-4B`, etc.
-
- ✅︎
- ✅︎
* - :code:`LlavaForConditionalGeneration`
- LLaVA-1.5
- T + I\ :sup:`E+`
- :code:`llava-hf/llava-1.5-7b-hf`, :code:`TIGER-Lab/Mantis-8B-siglip-llama3` (see note), etc.
-
- ✅︎
- ✅︎
* - :code:`LlavaNextForConditionalGeneration`
- LLaVA-NeXT
- T + I\ :sup:`E+`
- :code:`llava-hf/llava-v1.6-mistral-7b-hf`, :code:`llava-hf/llava-v1.6-vicuna-7b-hf`, etc.
-
- ✅︎
-
* - :code:`LlavaNextVideoForConditionalGeneration`
- LLaVA-NeXT-Video
- T + V
- :code:`llava-hf/LLaVA-NeXT-Video-7B-hf`, etc.
-
- ✅︎
-
* - :code:`LlavaOnevisionForConditionalGeneration`
- LLaVA-Onevision
- T + I\ :sup:`+` + V\ :sup:`+`
- :code:`llava-hf/llava-onevision-qwen2-7b-ov-hf`, :code:`llava-hf/llava-onevision-qwen2-0.5b-ov-hf`, etc.
-
- ✅︎
-
* - :code:`MiniCPMV`
- MiniCPM-V
- T + I\ :sup:`E+`
- :code:`openbmb/MiniCPM-V-2` (see note), :code:`openbmb/MiniCPM-Llama3-V-2_5`, :code:`openbmb/MiniCPM-V-2_6`, etc.
- ✅︎
- ✅︎
-
* - :code:`MllamaForConditionalGeneration`
- Llama 3.2
- T + I\ :sup:`+`
- :code:`meta-llama/Llama-3.2-90B-Vision-Instruct`, :code:`meta-llama/Llama-3.2-11B-Vision`, etc.
-
-
-
* - :code:`MolmoForCausalLM`
- Molmo
- T + I
- :code:`allenai/Molmo-7B-D-0924`, :code:`allenai/Molmo-72B-0924`, etc.
-
- ✅︎
- ✅︎
* - :code:`NVLM_D_Model`
- NVLM-D 1.0
- T + I\ :sup:`E+`
- :code:`nvidia/NVLM-D-72B`, etc.
-
- ✅︎
- ✅︎
* - :code:`PaliGemmaForConditionalGeneration`
- PaliGemma
- T + I\ :sup:`E`
- :code:`google/paligemma-3b-pt-224`, :code:`google/paligemma-3b-mix-224`, etc.
-
- ✅︎
-
* - :code:`Phi3VForCausalLM`
- Phi-3-Vision, Phi-3.5-Vision
- T + I\ :sup:`E+`
- :code:`microsoft/Phi-3-vision-128k-instruct`, :code:`microsoft/Phi-3.5-vision-instruct` etc.
-
- ✅︎
- ✅︎
* - :code:`PixtralForConditionalGeneration`
- Pixtral
- T + I\ :sup:`+`
- :code:`mistralai/Pixtral-12B-2409`, :code:`mistral-community/pixtral-12b` etc.
-
- ✅︎
- ✅︎
* - :code:`QWenLMHeadModel`
- Qwen-VL
- T + I\ :sup:`E+`
- :code:`Qwen/Qwen-VL`, :code:`Qwen/Qwen-VL-Chat`, etc.
- ✅︎
- ✅︎
-
* - :code:`Qwen2AudioForConditionalGeneration`
- Qwen2-Audio
- T + A\ :sup:`+`
- :code:`Qwen/Qwen2-Audio-7B-Instruct`
-
- ✅︎
-
* - :code:`Qwen2VLForConditionalGeneration`
- Qwen2-VL
- T + I\ :sup:`E+` + V\ :sup:`E+`
- :code:`Qwen/Qwen2-VL-2B-Instruct`, :code:`Qwen/Qwen2-VL-7B-Instruct`, :code:`Qwen/Qwen2-VL-72B-Instruct`, etc.
- ✅︎
- ✅︎
-
* - :code:`UltravoxModel`
- Ultravox
- T + A\ :sup:`E+`
- :code:`fixie-ai/ultravox-v0_3`
-
- ✅︎
-

| :sup:`E` Pre-computed embeddings can be provided as input for this modality.
| :sup:`+` Multiple items can be provided per text prompt for this modality.
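As a usage illustration for the table above — a minimal sketch, not part of this commit; the model name, prompt template, and image path are placeholder assumptions — text-plus-image inference with vLLM's offline API looks like:

    from PIL import Image
    from vllm import LLM

    # Placeholder model; any "T + I" entry from the table works similarly.
    llm = LLM(model="llava-hf/llava-1.5-7b-hf")

    image = Image.open("example.jpg")  # placeholder image path
    outputs = llm.generate({
        "prompt": "USER: <image>\nWhat is in this picture? ASSISTANT:",
        "multi_modal_data": {"image": image},
    })
    print(outputs[0].outputs[0].text)

Models marked :sup:`+` accept multiple such items per prompt, and models marked :sup:`E` also accept pre-computed embeddings in place of raw inputs.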
2 changes: 1 addition & 1 deletion requirements-common.txt
@@ -19,7 +19,7 @@ prometheus-fastapi-instrumentator >= 7.0.0
tiktoken >= 0.6.0 # Required for DBRX tokenizer
lm-format-enforcer >= 0.10.9, < 0.11
outlines >= 0.0.43, < 0.1
xgrammar >= 0.1.5; platform_machine == "x86_64"
xgrammar >= 0.1.6; platform_machine == "x86_64"
typing_extensions >= 4.10
filelock >= 3.16.1 # need to contain https://github.com/tox-dev/filelock/pull/317
partial-json-parser # used for parsing partial JSON outputs
8 changes: 4 additions & 4 deletions tests/engine/test_arg_utils.py
@@ -50,12 +50,12 @@ def test_compilation_config():
args = parser.parse_args(["-O=3"])
assert args.compilation_config.level == 3

# set to json
args = parser.parse_args(["--compilation-config", '{"level": 3}'])
# set to string form of a dict
args = parser.parse_args(["--compilation-config", "{'level': 3}"])
assert args.compilation_config.level == 3

# set to json
args = parser.parse_args(['--compilation-config={"level": 3}'])
# set to string form of a dict
args = parser.parse_args(["--compilation-config={'level': 3}"])
assert args.compilation_config.level == 3


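The updated test reflects that --compilation-config accepts the string form of a Python dict (single quotes) rather than only strict JSON. A minimal sketch of how such a value can be parsed, assuming a JSON-first fallback strategy — illustrative only, not the actual vLLM implementation:

    import ast
    import json

    def parse_dict_arg(value: str) -> dict:
        # Try strict JSON first, then fall back to a Python dict literal,
        # so both '{"level": 3}' and "{'level': 3}" parse to the same dict.
        try:
            return json.loads(value)
        except json.JSONDecodeError:
            return ast.literal_eval(value)

    assert parse_dict_arg('{"level": 3}') == {"level": 3}
    assert parse_dict_arg("{'level': 3}") == {"level": 3}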
6 changes: 6 additions & 0 deletions vllm/compilation/backends.py
@@ -265,7 +265,13 @@ def configure_post_pass(self):

def __call__(self, graph: fx.GraphModule, example_inputs) -> Callable:

# when Dynamo calls the backend, the bytecode
# transform and analysis are done
compilation_counter.num_graphs_seen += 1
from .monitor import torch_compile_start_time
dynamo_time = time.time() - torch_compile_start_time
logger.info("Dynamo bytecode transform time: %.2f s", dynamo_time)
self.compilation_configs.compilation_time += dynamo_time

# we control the compilation process; each instance can only be
# called once
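The new lines measure how long Dynamo's bytecode transform took: a module-level timestamp is recorded when torch.compile begins, and by the time Dynamo invokes the backend the transform and analysis are complete, so the delta approximates the Dynamo cost. The same pattern in isolation — a standalone sketch with simplified names, not the vLLM classes:

    import time
    import torch

    compile_start_time = 0.0

    def timing_backend(graph_module, example_inputs):
        # Dynamo calls the backend only after bytecode transform and
        # analysis are done, so the elapsed time approximates that cost.
        elapsed = time.time() - compile_start_time
        print(f"Dynamo bytecode transform time: {elapsed:.2f} s")
        return graph_module.forward  # run the captured graph unchanged

    def f(x):
        return torch.relu(x) + 1

    compiled = torch.compile(f, backend=timing_backend)
    compile_start_time = time.time()
    compiled(torch.randn(4))  # compilation is triggered by the first call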
6 changes: 3 additions & 3 deletions vllm/compilation/decorators.py
@@ -145,6 +145,7 @@ def _support_torch_compile(

def __init__(self, *, vllm_config: VllmConfig, prefix: str = '', **kwargs):
old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
self.vllm_config = vllm_config
# for CompilationLevel.DYNAMO_AS_IS, the upper-level model runner
# will handle the compilation, so we don't need to do anything here.
self.do_not_compile = \
@@ -157,9 +158,6 @@ def __init__(self, *, vllm_config: VllmConfig, prefix: str = '', **kwargs):
TorchCompileWrapperWithCustomDispatcher.__init__(
self, compilation_level=vllm_config.compilation_config.level)

if vllm_config.compilation_config.level == CompilationLevel.PIECEWISE:
start_monitoring_torch_compile(vllm_config.compilation_config)

cls.__init__ = __init__

def __call__(self, *args, **kwargs):
@@ -186,6 +184,8 @@ def __call__(self, *args, **kwargs):
raise ValueError(
"Unsupported dynamic dimensions"
f" {dims} for argument {k} with type {type(arg)}.")
# this is the starting point of the `torch.compile` process
start_monitoring_torch_compile(self.vllm_config.compilation_config)

# if we don't use custom dispatcher, we can directly call the
# compiled function and let torch.compile handle the dispatching,
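Two things change in this file: the config is stashed as self.vllm_config so it is reachable later, and start_monitoring_torch_compile moves from __init__ to __call__, so the timer starts when torch.compile actually begins rather than at model construction. The deferral pattern in isolation — a sketch with made-up names, guarding on first call as an assumption:

    import time

    class LazyCompileMonitor:
        """Starts the compile timer on the first call, not at construction."""

        def __init__(self, fn, config):
            self.fn = fn
            self.config = config  # stashed, like self.vllm_config
            self.start_time = None

        def __call__(self, *args, **kwargs):
            if self.start_time is None:
                # This point, not __init__, marks the start of compilation.
                self.start_time = time.time()
            return self.fn(*args, **kwargs)

    wrapped = LazyCompileMonitor(lambda x: x + 1, config={"level": 3})
    wrapped(41)  # the timer starts here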
9 changes: 7 additions & 2 deletions vllm/compilation/monitor.py
@@ -1,14 +1,19 @@
import time

from vllm.config import CompilationConfig, CompilationLevel
from vllm.logger import init_logger

logger = init_logger(__name__)

torch_compile_start_time: float = 0.0


def start_monitoring_torch_compile(compilation_config: CompilationConfig):
pass
global torch_compile_start_time
torch_compile_start_time = time.time()


def end_monitoring_torch_compile(compilation_config: CompilationConfig):
if compilation_config.level == CompilationLevel.PIECEWISE:
logger.info("graph compilation takes %.2f s in total",
logger.info("torch.compile takes %.2f s in total",
compilation_config.compilation_time)
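
Taken together with the backends.py change, the monitor is now just a module-global start timestamp plus a running total on the config: start_monitoring_torch_compile stamps the start, elapsed chunks are added to compilation_config.compilation_time as phases finish, and end_monitoring_torch_compile logs the total. A condensed sketch of that flow, using simplified stand-in types rather than the vLLM ones:

    import time

    torch_compile_start_time: float = 0.0

    class Config:
        compilation_time: float = 0.0

    def start_monitoring(config: Config) -> None:
        global torch_compile_start_time
        torch_compile_start_time = time.time()

    def record_phase_done(config: Config) -> None:
        # Mirrors the backends.py hunk: add the elapsed time for a phase.
        config.compilation_time += time.time() - torch_compile_start_time

    def end_monitoring(config: Config) -> None:
        print(f"torch.compile takes {config.compilation_time:.2f} s in total")

    cfg = Config()
    start_monitoring(cfg)
    time.sleep(0.05)  # stand-in for Dynamo's bytecode transform
    record_phase_done(cfg)
    end_monitoring(cfg)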