Commit eb19434

Update README.md
1 parent 953f76c commit eb19434

1 file changed (+1, -0 lines)

README.md

Lines changed: 1 addition & 0 deletions
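@@ -200,6 +200,7 @@ pip install triton
 
 #### Model Quantization
 
+**Currently, only single-GPU deployment of quantized models is supported.**
 In scenarios with limited GPU memory, using the quantized version of the model can significantly reduce inference cost. We implement quantized inference with the [GPTQ](https://github.com/IST-DASLab/gptq) algorithm and the OpenAI [triton](https://github.com/openai/triton) backend introduced in [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa):
 
 ~~~python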
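The README states that quantization "significantly" reduces inference cost in memory-constrained settings but does not give numbers. The following is a rough, weight-only estimate of that saving. It assumes a hypothetical 7B-parameter model and 4-bit GPTQ weights; neither figure comes from the README, and `weight_memory_gib` is a hypothetical helper for illustration only.

~~~python
# Rough weight-only memory estimate: parameters x bits / 8 bytes.
# Assumptions (not from the README): a hypothetical 7B-parameter model and
# 4-bit GPTQ weights. Real GPTQ checkpoints also store per-group scales and
# zero points, and inference needs activations and a KV cache on top of this.


def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Return the approximate size of the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 1024**3


if __name__ == "__main__":
    num_params = 7_000_000_000  # hypothetical model size
    fp16 = weight_memory_gib(num_params, 16)
    int4 = weight_memory_gib(num_params, 4)
    print(f"fp16 weights: {fp16:.2f} GiB")
    print(f"4-bit weights: {int4:.2f} GiB ({fp16 / int4:.0f}x smaller)")
~~~

Under these assumptions the weights shrink by a factor of 16 / 4 = 4. The real saving is somewhat smaller once quantization metadata and runtime buffers are counted.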