Run-time structured generation benchmarks #549

lapp0 · 2024-01-17T15:40:01Z

Initialization benchmarks are introduced in #542

We should extend these benchmarks to also measure performance at inference time.

Goal

Outlines shouldn't be a bottleneck for most inference workloads. A reasonable goal can be set based on this reference figure:

1,200 output tok/s on H100 w/ Llama 2 70B on Optimum-Nvidia

Benchmarks will help us achieve and maintain that goal.
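To make the target concrete, here is a rough back-of-the-envelope sketch of the per-token time budget implied by the 1,200 tok/s figure. It treats that number as a single-stream rate, which is an assumption (if it is aggregate batch throughput, the per-sequence budget is larger). The 10% overhead share is also purely illustrative, not a number from this issue.

```python
# Reference throughput quoted above (output tokens per second).
REFERENCE_TOKENS_PER_SECOND = 1_200

# Hypothetical share of each decoding step that structured generation
# is allowed to consume. This value is an assumption for illustration.
ASSUMED_OVERHEAD_FRACTION = 0.10


def per_token_budget_ms(tokens_per_second: float) -> float:
    """Wall-clock time available per generated token, in milliseconds."""
    return 1_000.0 / tokens_per_second


def overhead_budget_ms(tokens_per_second: float, fraction: float) -> float:
    """Portion of the per-token budget reserved for guided-generation work."""
    return per_token_budget_ms(tokens_per_second) * fraction


if __name__ == "__main__":
    step_ms = per_token_budget_ms(REFERENCE_TOKENS_PER_SECOND)
    guide_ms = overhead_budget_ms(REFERENCE_TOKENS_PER_SECOND, ASSUMED_OVERHEAD_FRACTION)
    print(f"per-token budget: {step_ms:.3f} ms")
    print(f"guided-generation budget at 10%: {guide_ms * 1_000:.1f} us")
```

Under these assumptions each token has roughly 0.83 ms in total, so any per-step guidance cost approaching a millisecond would make Outlines the bottleneck.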

What must be benchmarked

End-to-end CFG ~~and Regex~~ guided generation
Guided generation with Ray IPC logits processing in vLLM (related: VLLM tensor-parallel and RegexLogitsProcessor #524)

Proposed method

It's annoying to need a GPU to run tests. We shouldn't do actual inference in performance benchmarks.

1. Create a mock inference engine
1. Add a simple benchmark to ensure the unguided mock inference engine takes a negligible amount of time
1. Add guided benchmarks that show the true throughput of Outlines
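The steps above could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not a proposal for the final design: `MockInferenceEngine`, `toy_guide`, and `tokens_per_second` are hypothetical names, and none of them are Outlines or vLLM APIs. A real benchmark would plug an actual Outlines logits processor in place of `toy_guide`.

```python
import time
from typing import Callable, List, Optional

# Hypothetical types for illustration only.
Logits = List[float]
LogitsProcessor = Callable[[List[int], Logits], Logits]


class MockInferenceEngine:
    """Stands in for a model: returns constant logits, so no GPU is needed."""

    def __init__(self, vocab_size: int) -> None:
        self.vocab_size = vocab_size
        self._logits: Logits = [0.0] * vocab_size

    def generate(
        self, max_tokens: int, processor: Optional[LogitsProcessor] = None
    ) -> List[int]:
        token_ids: List[int] = []
        for _ in range(max_tokens):
            logits = self._logits
            if processor is not None:
                # The guided path: the processor masks disallowed tokens.
                logits = processor(token_ids, logits)
            # Greedy pick: index of the largest logit (first one on ties).
            token_ids.append(max(range(self.vocab_size), key=logits.__getitem__))
        return token_ids


def toy_guide(token_ids: List[int], logits: Logits) -> Logits:
    """Toy constraint: exactly one token id is allowed at each step."""
    allowed = (len(token_ids) + 1) % len(logits)
    masked = [float("-inf")] * len(logits)
    masked[allowed] = logits[allowed]
    return masked


def tokens_per_second(
    engine: MockInferenceEngine,
    max_tokens: int,
    processor: Optional[LogitsProcessor] = None,
) -> float:
    """Time one generation run and return its throughput."""
    start = time.perf_counter()
    engine.generate(max_tokens, processor)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return max_tokens / max(elapsed, 1e-12)


if __name__ == "__main__":
    engine = MockInferenceEngine(vocab_size=1_000)
    print(f"unguided: {tokens_per_second(engine, 200):,.0f} tok/s")
    print(f"guided:   {tokens_per_second(engine, 200, toy_guide):,.0f} tok/s")
```

Comparing the two numbers isolates the cost of the guidance step, since the mock engine itself does almost no work. In practice a benchmark harness would repeat each measurement several times instead of relying on a single timed run.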

rlouf · 2024-01-18T22:56:44Z

At this point I think that it would only make sense to benchmark the CFG-guided generation. Regex-guided generation is only a dictionary call at each step, so there really isn't anything we could do that would move the needle.
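As a simplified illustration of what "a dictionary call at each step" can look like, consider the toy index below. The layout and names (`STATE_TO_TRANSITIONS`, `allowed_tokens`, `next_state`) are assumptions made for this sketch and are not taken from the Outlines codebase.

```python
from typing import Dict, List

# Hypothetical precomputed index: FSM state -> {allowed token id -> next state}.
# Toy regex "ab+" over a three-token vocabulary {0: "a", 1: "b", 2: "c"}.
STATE_TO_TRANSITIONS: Dict[int, Dict[int, int]] = {
    0: {0: 1},  # start: only "a" is allowed
    1: {1: 2},  # after "a": only "b" is allowed
    2: {1: 2},  # after "ab": further "b" tokens are allowed
}


def allowed_tokens(state: int) -> List[int]:
    """One dictionary lookup yields the token ids permitted in this state."""
    return list(STATE_TO_TRANSITIONS[state])


def next_state(state: int, token_id: int) -> int:
    """A second lookup advances the FSM after a token is sampled."""
    return STATE_TO_TRANSITIONS[state][token_id]


if __name__ == "__main__":
    state = 0
    for token_id in (0, 1, 1):  # the string "abb"
        assert token_id in allowed_tokens(state)
        state = next_state(state, token_id)
    print("final state:", state)
```

Because the index is built once up front, the per-step cost in this sketch is a constant-time lookup, which is why the run-time work would matter less than the initialization already benchmarked.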

lapp0 added the enhancement label Jan 17, 2024

lapp0 mentioned this issue Jan 18, 2024

Fix incompleteness of regex and cfg guided generation #544

Merged

lapp0 mentioned this issue Jan 27, 2024

Add More Grammars to outlines.grammars, Benchmark and Verify Their Integrity #587

Closed

15 tasks

brandonwillard assigned kc611 May 17, 2024

brandonwillard changed the title ~~Performance Test Inference~~ Run-time structured generation benchmarks May 27, 2024

kc611 linked a pull request May 28, 2024 that will close this issue

Added benchmarks for larger regex fsm and runtime benchmarks for the same #925

Open

brandonwillard linked a pull request May 28, 2024 that will close this issue

Added benchmarks for larger regex fsm and runtime benchmarks for the same #925

Open
