Commit a6685d6

feat(quickstart): scaffold 10-minute demo workflow
Signed-off-by: samzong <[email protected]>
1 parent 8165a9c commit a6685d6

8 files changed: +735 −0 lines changed
examples/quickstart/README.md

Lines changed: 101 additions & 0 deletions
# Semantic Router Quickstart

This quickstart walks through the minimal set of commands needed to prove that
the semantic router can classify incoming chat requests, route them through
Envoy, and receive OpenAI-compatible completions. The flow is optimized for
local laptops and uses a lightweight mock backend by default, so the entire
loop finishes in a few minutes.

## Prerequisites

- A Python environment with the project’s dependencies installed and the virtualenv activated.
- `make`, `curl`, `go`, `cargo`, `rustc`, and `python3` in `PATH`.
- All commands below are run from the repository root.
## Step-by-Step Runbook

0. **Download router support models**

   These assets (ModernBERT classifiers, LoRA adapters, embeddings, etc.) are
   required before the router can start.

   ```bash
   make download-models
   ```

1. **Start the OpenAI-compatible backend**

   The router expects at least one endpoint that serves `/v1/chat/completions`.
   You can point to a real vLLM deployment, but the fastest option is the
   bundled mock server:

   ```bash
   pip install -r tools/mock-vllm/requirements.txt
   python -m uvicorn tools.mock_vllm.app:app --host 0.0.0.0 --port 8000
   ```

   Leave this process running; it provides instant canned responses for
   `openai/gpt-oss-20b`.
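   Before moving on, you can confirm the backend answers on its own. A minimal
   check, assuming the mock accepts a standard OpenAI-style request body (the
   reply text is whatever canned output the mock returns):

   ```bash
   # Direct request to the mock backend on port 8000, bypassing Envoy/router.
   curl -s http://127.0.0.1:8000/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{
       "model": "openai/gpt-oss-20b",
       "messages": [{"role": "user", "content": "ping"}]
     }'
   ```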
2. **Launch Envoy**

   In a separate terminal, bring up the Envoy sidecar that listens on
   `http://127.0.0.1:8801/v1/*` and forwards traffic to the router’s gRPC
   ExtProc server.

   ```bash
   make run-envoy
   ```
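   Until the router (next step) is running, requests through this listener may
   be rejected, but you can already confirm the port is open. A small sketch;
   the exact status code before the router is up depends on Envoy's route and
   ExtProc failure settings:

   ```bash
   # Print only the HTTP status code of a probe through the Envoy listener.
   curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8801/v1/models
   ```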
3. **Start the router with the quickstart config**

   In another terminal, run the quickstart bootstrap. Point the health probe at
   the router’s local HTTP API (port 8080) so the script does not wait on the
   Envoy endpoint.

   ```bash
   QUICKSTART_HEALTH_URL=http://127.0.0.1:8080/health \
   ./examples/quickstart/quickstart.sh --skip-download --skip-build
   ```

   Keep this process alive; Ctrl+C will stop the router.
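   With all three processes up, a single request through Envoy exercises the
   full path. A minimal smoke test, assuming the mock backend from step 1 is
   still serving (the `sk-test` key mirrors the evaluator invocation below):

   ```bash
   # Envoy (8801) -> router ExtProc -> backend (8000); expect a canned reply.
   curl -s http://127.0.0.1:8801/v1/chat/completions \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer sk-test" \
     -d '{
       "model": "openai/gpt-oss-20b",
       "messages": [{"role": "user", "content": "What is 2 + 2?"}]
     }'
   ```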
4. **Run the quick evaluation**

   With Envoy, the router, and the mock backend running, execute the benchmark
   to send a small batch of MMLU questions through the routing pipeline.

   ```bash
   OPENAI_API_KEY="sk-test" \
   ./examples/quickstart/quick-eval.sh \
     --mode router \
     --samples 5 \
     --vllm-endpoint ""
   ```

   - `--mode router` restricts the run to router-transparent requests.
   - `--vllm-endpoint ""` disables direct vLLM comparisons.
5. **Inspect the results**

   The evaluator writes all artifacts under
   `examples/quickstart/results/<timestamp>/`:

   - `raw/` – individual JSON summaries per dataset/model combination.
   - `quickstart-summary.csv` – tabular metrics (accuracy, tokens, latency).
   - `quickstart-report.md` – Markdown report suitable for sharing.

   You can re-run the evaluator with different flags (e.g., `--samples 10`,
   `--dataset arc`) and the outputs will land in fresh timestamped folders, as
   shown below.
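   For example, a second pass over the ARC dataset, assuming the flags combine
   as documented above:

   ```bash
   # Writes into a fresh examples/quickstart/results/<timestamp>/ folder.
   OPENAI_API_KEY="sk-test" \
   ./examples/quickstart/quick-eval.sh \
     --mode router \
     --samples 10 \
     --dataset arc \
     --vllm-endpoint ""
   ```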
## Switching to a Real vLLM Backend

If you prefer to exercise a real language model:

1. Replace step 1 with a real vLLM launch (or any OpenAI-compatible server).
2. Update `examples/quickstart/config-quickstart.yaml` so the `vllm_endpoints`
   block points to that service (IP, port, and model name), as sketched below.
3. Re-run steps 2–4. No other changes to the quickstart scripts are needed.
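For illustration, a hypothetical vLLM host at `192.168.1.50:8000` serving a
placeholder model named `your-org/your-model` might be wired in like this
(both values are examples, not defaults):

```yaml
vllm_endpoints:
  - name: "local-vllm"
    address: "192.168.1.50"    # IP of your vLLM host (placeholder)
    port: 8000
    models:
      - "your-org/your-model"  # must match the model name vLLM serves
    weight: 1
```

Note that `model_config`, `categories`, and `default_model` in the same file
also reference the model name, so update those entries to match.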
Keep the mock server documented for quick demos; swap to full vLLM when you
want latency/quality signals from the actual model.
examples/quickstart/config-quickstart.yaml

Lines changed: 90 additions & 0 deletions
# Quickstart configuration tuned for a single-node developer setup.
# Keeps routing options minimal while remaining compatible with the default assets
# shipped by `make download-models`.

bert_model:
  model_id: sentence-transformers/all-MiniLM-L12-v2
  threshold: 0.6
  use_cpu: true

semantic_cache:
  enabled: false
  backend_type: "memory"

prompt_guard:
  enabled: false
  use_modernbert: true
  model_id: "models/jailbreak_classifier_modernbert-base_model"
  threshold: 0.7
  use_cpu: true
  jailbreak_mapping_path: "models/jailbreak_classifier_modernbert-base_model/jailbreak_type_mapping.json"

classifier:
  category_model:
    model_id: "models/category_classifier_modernbert-base_model"
    threshold: 0.6
    use_cpu: true
    use_modernbert: true
    category_mapping_path: "models/category_classifier_modernbert-base_model/category_mapping.json"
  pii_model:
    model_id: "models/pii_classifier_modernbert-base_presidio_token_model"
    threshold: 0.7
    use_cpu: true
    pii_mapping_path: "models/pii_classifier_modernbert-base_presidio_token_model/pii_type_mapping.json"

vllm_endpoints:
  - name: "local-vllm"
    address: "127.0.0.1"
    port: 8000
    models:
      - "openai/gpt-oss-20b"
    weight: 1

model_config:
  "openai/gpt-oss-20b":
    preferred_endpoints: ["local-vllm"]
    reasoning_family: "gpt-oss"
    pii_policy:
      allow_by_default: true

categories:
  - name: general
    system_prompt: "You are a helpful and knowledgeable assistant. Provide concise, accurate answers."
    model_scores:
      - model: openai/gpt-oss-20b
        score: 0.7
        use_reasoning: false

  - name: reasoning
    system_prompt: "You explain your reasoning with clear numbered steps before giving a final answer."
    model_scores:
      - model: openai/gpt-oss-20b
        score: 0.6
        use_reasoning: true

  - name: safety
    system_prompt: "You prioritize safe completions and refuse harmful requests."
    model_scores:
      - model: openai/gpt-oss-20b
        score: 0.5
        use_reasoning: false

default_model: openai/gpt-oss-20b

reasoning_families:
  gpt-oss:
    type: "chat_template_kwargs"
    parameter: "thinking"

api:
  batch_classification:
    metrics:
      enabled: false

# Tool auto-selection is available but disabled for quickstart.
tools:
  enabled: false
  top_k: 3
  similarity_threshold: 0.2
  tools_db_path: "config/tools_db.json"
  fallback_to_empty: true
