Commit f78fa1c

reidliu41 authored and committed
[doc] add CLI doc (vllm-project#18871)
Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>
Signed-off-by: amit <[email protected]>
1 parent d1785df commit f78fa1c

File tree

2 files changed: +182 −0 lines changed

docs/.nav.yml

Lines changed: 3 additions & 0 deletions
@@ -12,6 +12,7 @@ nav:
   - User Guide: usage/README.md
   - Developer Guide: contributing/README.md
   - API Reference: api/README.md
+  - CLI Reference: cli/README.md
   - Timeline:
     - Roadmap: https://roadmap.vllm.ai
     - Releases: https://github.com/vllm-project/vllm/releases
@@ -56,6 +57,8 @@ nav:
     - Contents:
       - glob: api/vllm/*
        preserve_directory_names: true
+  - CLI Reference:
+    - Summary: cli/README.md
   - Community:
     - community/*
     - Blog: https://blog.vllm.ai

docs/cli/README.md

Lines changed: 179 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,179 @@
# vLLM CLI Guide

The `vllm` command-line tool is used to run and manage vLLM models. You can start by viewing the help message with:

```
vllm --help
```

Available Commands:

```
vllm {chat,complete,serve,bench,collect-env,run-batch}
```

## Table of Contents

- [serve](#serve)
- [chat](#chat)
- [complete](#complete)
- [bench](#bench)
  - [latency](#latency)
  - [serve](#serve-1)
  - [throughput](#throughput)
- [collect-env](#collect-env)
- [run-batch](#run-batch)
- [More Help](#more-help)

## serve

Start the vLLM OpenAI-compatible API server.

Examples:

```bash
# Start with a model
vllm serve meta-llama/Llama-2-7b-hf

# Specify the port
vllm serve meta-llama/Llama-2-7b-hf --port 8100

# Check with --help for more options
# To list all groups
vllm serve --help=listgroup

# To view an argument group
vllm serve --help=ModelConfig

# To view a single argument
vllm serve --help=max-num-seqs

# To search by keyword
vllm serve --help=max
```

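Once the server is up, any OpenAI-compatible client can talk to it. As a minimal sketch, assuming the first example above is serving on vLLM's default port 8000:

```bash
# Query the chat completions endpoint of the server started above
# (localhost and port 8000 are assumed defaults).
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-2-7b-hf",
        "messages": [{"role": "user", "content": "Hello!"}]
    }'
```
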
## chat

Generate chat completions via the running API server.

Examples:

```bash
# Directly connect to the localhost API without arguments
vllm chat

# Specify the API URL
vllm chat --url http://{vllm-serve-host}:{vllm-serve-port}/v1

# Quick chat with a single prompt
vllm chat --quick "hi"
```

## complete

Generate text completions based on the given prompt via the running API server.

Examples:

```bash
# Directly connect to the localhost API without arguments
vllm complete

# Specify the API URL
vllm complete --url http://{vllm-serve-host}:{vllm-serve-port}/v1

# Quick complete with a single prompt
vllm complete --quick "The future of AI is"
```

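The same request can also be sent to the server's `/v1/completions` endpoint directly, for example with curl. A sketch, assuming the default `http://localhost:8000/v1` server and the model from the serve examples:

```bash
# Raw completions request equivalent to the quick example above
# (host, port, and model are assumptions).
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-2-7b-hf",
        "prompt": "The future of AI is",
        "max_tokens": 16
    }'
```
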
## bench

Run benchmark tests for latency, online serving throughput, and offline inference throughput.

Available Commands:

```bash
vllm bench {latency,serve,throughput}
```

### latency

Benchmark the latency of a single batch of requests.

Example:

```bash
vllm bench latency \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --input-len 32 \
    --output-len 1 \
    --enforce-eager \
    --load-format dummy
```

### serve

Benchmark the online serving throughput.

Example:

```bash
vllm bench serve \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --host server-host \
    --port server-port \
    --random-input-len 32 \
    --random-output-len 4 \
    --num-prompts 5
```

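`vllm bench serve` measures a server that is already running, so a complete run typically uses two terminals. A rough sketch, where the model, host, and port are assumptions chosen to match the examples above:

```bash
# Terminal 1: start the server to be benchmarked.
vllm serve meta-llama/Llama-3.2-1B-Instruct --port 8000

# Terminal 2: point the benchmark at that server.
vllm bench serve \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --host localhost \
    --port 8000 \
    --random-input-len 32 \
    --random-output-len 4 \
    --num-prompts 5
```
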
### throughput

Benchmark offline inference throughput.

Example:

```bash
vllm bench throughput \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --input-len 32 \
    --output-len 1 \
    --enforce-eager \
    --load-format dummy
```

## collect-env

Start collecting environment information.

```bash
vllm collect-env
```

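The report is printed to stdout, so it can be redirected into a file, for example to attach to a bug report (the file name below is just an example):

```bash
# Save the environment report for sharing in an issue.
vllm collect-env > vllm-env.txt
```
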
## run-batch

Run batch prompts and write results to a file.

Examples:

```bash
# Run with a local file
vllm run-batch \
    -i offline_inference/openai_batch/openai_example_batch.jsonl \
    -o results.jsonl \
    --model meta-llama/Meta-Llama-3-8B-Instruct

# Use a remote file
vllm run-batch \
    -i https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai_batch/openai_example_batch.jsonl \
    -o results.jsonl \
    --model meta-llama/Meta-Llama-3-8B-Instruct
```

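Each line of the input file is a single request in the OpenAI-style batch format. As a sketch of what a minimal local input file might look like (the file name, prompt, and exact fields shown here are illustrative assumptions):

```bash
# Create a one-request batch file and run it
# (the JSON fields follow the OpenAI-style batch request format).
cat <<'EOF' > my_batch.jsonl
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "Hello world!"}]}}
EOF

vllm run-batch -i my_batch.jsonl -o results.jsonl --model meta-llama/Meta-Llama-3-8B-Instruct
```
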
## More Help

For detailed options of any subcommand, use:

```bash
vllm <subcommand> --help
```
