AutoAWQ量化模型部署指南

方法1（推荐）

1. 直接下载量化好的模型

通过Git克隆已量化的模型仓库：

git clone https://www.modelscope.cn/models/linglingdan/MiniCPM-V_2_6_awq_int4

2. 下载编译笔者fork的autoawq

安装AutoAWQ的一个分支，该分支已提交PR，等待官方合并：

git clone https://github.com/LDLINGLINGLING/AutoAWQ.git
cd AutoAWQ
pip install -e .

3. 以上模型可以直接使用vllm进行推理，

方法2（自行量化，对训练后模型进行量化推荐这种方法）

1. 下载非模型

Hugging Face

通过Git克隆模型仓库，并确保已安装git-lfs：

git clone https://huggingface.co/openbmb/MiniCPM-V-2_6

ModelScope

也可以通过ModelScope克隆模型仓库：

git clone https://www.modelscope.cn/models/openbmb/minicpm-v-2_6

2. 下载安装我的autoawq的分支

安装AutoAWQ的一个分支，该分支已提交PR，等待官方合并：

git clone https://github.com/LDLINGLINGLING/AutoAWQ.git
cd AutoAWQ
pip install -e .

3. 开始量化

修改量化脚本参数

修改AutoAWQ/examples/minicpmv2.6_quantize.py中的参数：

 parser.add_argument('--model-path', type=str, default="/root/ld/ld_model_pretrained/Minicpmv2_6",
                        help='Path to the model directory.')
parser.add_argument('--quant-path', type=str, default="/root/ld/ld_model_pretrained/Minicpmv2_6_awq_new",
                        help='Path to save the quantized model.')
# 修改以上模型地址和量化后保存地址

运行量化脚本

运行量化脚本(需要访问huggingface)：

cd  AutoAWQ/examples
python minicpmv2.6_quantize.py

量化完成后，在quant_path下将会得到您的AWQ量化模型。

显存占用

量化过程中显存占用仅为7.3GB。

以上模型可以直接使用vllm进行推理，

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

awq.md

awq.md

AutoAWQ量化模型部署指南

方法1（推荐）

1. 直接下载量化好的模型

2. 下载编译笔者fork的autoawq

3. 以上模型可以直接使用vllm进行推理，

方法2（自行量化，对训练后模型进行量化推荐这种方法）

1. 下载非模型

Hugging Face

ModelScope

2. 下载安装我的autoawq的分支

3. 开始量化

修改量化脚本参数

运行量化脚本

显存占用

Files

awq.md

Latest commit

History

awq.md

File metadata and controls

AutoAWQ量化模型部署指南

方法1（推荐）

1. 直接下载量化好的模型

2. 下载编译笔者fork的autoawq

3. 以上模型可以直接使用vllm进行推理，

方法2（自行量化，对训练后模型进行量化推荐这种方法）

1. 下载非模型

Hugging Face

ModelScope

2. 下载安装我的autoawq的分支

3. 开始量化

修改量化脚本参数

运行量化脚本

显存占用