Batch InfeRence Runtime
A simplified, local-only release of the toolchain used by Ai2 to perform large-scale inference with LLMs and VLMs.
The runtime orchestrates inference jobs based on a user-provided YAML config file.
It leverages:
- Ray for concurrency management
- vLLM as the inference backend
It consumes JSONL work files and outputs result files to a chosen destination.
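To give a sense of the data flow, here is a minimal sketch of the work-file to result-file loop this implies, written against vLLM's offline `LLM.chat` API. The model name, file paths, and the `completion` field are illustrative assumptions only; the actual runner adds Ray-based concurrency, batching, and configuration handling that this sketch omits.

```python
# Illustrative sketch only, not the project's implementation: read one JSONL
# work file, run each row's chat through vLLM's offline API, and write a
# JSONL result file. Model name, paths, and the "completion" field are made up.
import json

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")  # example model, not a project default
params = SamplingParams(temperature=0.0, max_tokens=256)

with open("work/part-000.jsonl") as src, open("results/part-000.jsonl", "w") as dst:
    for line in src:
        row = json.loads(line)
        # LLM.chat accepts a single conversation (a list of role/content dicts)
        # and returns a list of RequestOutput objects.
        outputs = llm.chat(row["chat_messages"], params)
        row["completion"] = outputs[0].outputs[0].text
        dst.write(json.dumps(row) + "\n")
```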
Set up a local environment:

```bash
cd <project_root>
python3 -m venv venv
source venv/bin/activate
pip install .[batch_inference,vllm]
```
Records you want to run inference over must be partitioned into one or more JSONL files in a flat directory. Each row should have the following structure:

```json
{"chat_messages": [{"role": "user", "content": "asdf"}]}
```
Additional fields may be provided in each row (e.g. IDs, metadata) and will be preserved in the output.
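As an illustration, a small script along these lines produces a valid work file; the `work/` directory, the `id` field, and the prompts are all placeholders.

```python
# Write a minimal JSONL work file; the "work/" directory and "id" field are examples.
import json
import os

records = [
    {"id": 0, "chat_messages": [{"role": "user", "content": "What is 2 + 2?"}]},
    {"id": 1, "chat_messages": [{"role": "user", "content": "Summarize the plot of Hamlet."}]},
]

os.makedirs("work", exist_ok=True)
with open("work/part-000.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```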
Author a configuration file for your job; an example file can be found here:
https://github.com/allenai/birr/blob/main/configs/inference/example.yaml
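Before launching a job, a quick way to confirm your file is at least well-formed YAML is a check like the one below. The config path is a placeholder, the actual schema is defined by the example file linked above, and this assumes PyYAML is available in the environment.

```python
# Sanity-check that the config parses as YAML; this does not validate the schema.
import yaml  # assumes PyYAML is installed in the venv

with open("configs/inference/my_job.yaml") as f:  # placeholder path
    config = yaml.safe_load(f)

print(sorted(config.keys()))
```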
Then launch the runner:

```bash
# in activated venv
python src/birr/batch_inference/runner.py --config-file <path_to_config_file>
```
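Once the run completes, the result files can be inspected like any other data files. The snippet below assumes results are also written as JSONL (which the field-preservation behavior above suggests) and that they live under a `results/` directory; both are assumptions, and the exact fields present depend on your config.

```python
# Peek at the first few result rows; the path and JSONL format are assumptions.
import json

with open("results/part-000.jsonl") as f:
    for _, line in zip(range(3), f):
        row = json.loads(line)
        print(sorted(row.keys()))
```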