LMDeploy offers functionality such as model quantization, offline batch inference, and online serving. Each can be accomplished with just a few lines of code or commands.
Install lmdeploy with pip (Python 3.8+) or from source:
pip install lmdeploy
The default prebuilt package is compiled on CUDA 11.8. If you require CUDA 12+, install lmdeploy as follows:
export LMDEPLOY_VERSION=0.2.0
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl
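To quickly verify the installation, inspect the installed package with pip:

pip show lmdeploy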
For offline batch inference, create a pipeline from a model name or path and pass it a batch of prompts:

import lmdeploy
pipe = lmdeploy.pipeline("internlm/internlm-chat-7b")
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)
For more information on inference pipeline parameters, please refer to here.
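The pipeline also accepts per-call generation settings. Below is a minimal sketch assuming this release exposes a GenerationConfig class and a gen_config parameter on the pipeline call; treat the exact field names as illustrative and check the pipeline guide for your version:

from lmdeploy import pipeline, GenerationConfig

pipe = pipeline("internlm/internlm-chat-7b")
# Sampling knobs passed per call; names assumed from lmdeploy's generation config
gen_config = GenerationConfig(max_new_tokens=256, top_p=0.8, temperature=0.7)
response = pipe(["Hi, pls intro yourself"], gen_config=gen_config)
print(response)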
LMDeploy's api_server enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup:
lmdeploy serve api_server internlm/internlm-chat-7b
The default port of api_server is 23333. After the server is launched, you can communicate with it from the terminal through api_client:
lmdeploy serve api_client http://0.0.0.0:23333
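Because the RESTful APIs are OpenAI-compatible, the running server can also be called from Python. A minimal sketch using the openai client (v1+), assuming the server above is listening on port 23333 and registers the model under its HuggingFace name:

from openai import OpenAI

# The local api_server ignores the key, but the client requires a value.
client = OpenAI(api_key="none", base_url="http://0.0.0.0:23333/v1")
response = client.chat.completions.create(
    model="internlm/internlm-chat-7b",  # assumed id; query /v1/models for the actual name
    messages=[{"role": "user", "content": "Hi, pls intro yourself"}],
)
print(response.choices[0].message.content)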
You can explore and try out the api_server APIs through the Swagger UI at http://0.0.0.0:23333, or read the API specification here.
LMDeploy supports quantization methods such as 4-bit weight-only quantization (AWQ) and INT8 KV cache quantization. Please visit the detailed guides for each.
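As a sketch of what the workflow looks like, the command below performs 4-bit AWQ weight quantization through the lmdeploy lite subcommand; treat the exact flags as assumptions and follow the guides for authoritative usage:

# quantize weights to 4-bit with AWQ (flags assumed; see the quantization guides)
lmdeploy lite auto_awq internlm/internlm-chat-7b --work-dir ./internlm-chat-7b-4bit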
The LMDeploy CLI offers utilities that let users try out LLM features conveniently, for example chatting with a model directly in the terminal:
lmdeploy chat turbomind internlm/internlm-chat-7b
LMDeploy uses Gradio to build the online demo:
# install dependencies
pip install lmdeploy[serve]
# launch gradio server
lmdeploy serve gradio internlm/internlm-chat-7b
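If you need the demo reachable on a specific address and port, the serve command accepts server options; a sketch assuming the --server-name and --server-port flags:

lmdeploy serve gradio internlm/internlm-chat-7b --server-name 0.0.0.0 --server-port 6006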