2 parents e92dae6 + eb19434, commit 5a2acc9
README.md
@@ -200,6 +200,7 @@ pip install triton
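
#### Model Quantization

+**Quantized models currently support single-GPU deployment only.**

When GPU memory is limited, running a quantized version of the model can significantly reduce inference cost. We implement quantized inference using the [GPTQ](https://github.com/IST-DASLab/gptq) algorithm and the OpenAI [triton](https://github.com/openai/triton) backend introduced in [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa):
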
~~~python