[WIP] GPTQModel select quant linear with pack #2138

LRL-ModelCloud · 2024-12-25T04:46:20Z

No description provided.

Qubitium · 2024-12-25T05:55:17Z

This PR addresses an issue where GPTQModel cannot auto-select the fastest quant linear (Marlin) because it has no information about whether the model is being loaded as a pre-trained model (for quantization) or via from_quantized. Adding a pack=True requirement to hf_select_quant_linear for pretrained quantization solves this. Without this PR, users must manually set backend="marlin" to use Marlin; with it, Marlin is auto-selected in the default auto mode when it is compatible.
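To illustrate why a `pack` flag can unblock auto-selection, here is a minimal, hypothetical sketch of a kernel selector. It is not GPTQModel's implementation: `QuantLinearCandidate`, `select_quant_linear`, the candidate names, and their capability flags are all invented for illustration. The assumption is that some fast kernels cannot pack freshly quantized weights, so they are only eligible when `pack=False` (loading an already-quantized model), while `pack=True` (quantizing a pre-trained model) restricts the choice to packable kernels.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QuantLinearCandidate:
    # Hypothetical kernel descriptor; not a real GPTQModel class.
    name: str
    speed_rank: int      # lower value = faster kernel (illustrative only)
    supports_pack: bool  # assumed: can this kernel pack newly quantized weights?
    compatible: bool     # assumed: supported on the current model/device config


def select_quant_linear(candidates: List[QuantLinearCandidate], pack: bool) -> QuantLinearCandidate:
    """Return the fastest compatible kernel; if pack=True, require packing support."""
    eligible = [
        c for c in candidates
        if c.compatible and (c.supports_pack or not pack)
    ]
    if not eligible:
        raise ValueError("no compatible quant linear for this configuration")
    # Pick the kernel with the best (lowest) speed rank among eligible ones.
    return min(eligible, key=lambda c: c.speed_rank)


# Hypothetical candidates: a Marlin-like fast kernel assumed unable to pack,
# and a slower fallback that can pack.
candidates = [
    QuantLinearCandidate("marlin_like", speed_rank=0, supports_pack=False, compatible=True),
    QuantLinearCandidate("packable_fallback", speed_rank=1, supports_pack=True, compatible=True),
]

print(select_quant_linear(candidates, pack=False).name)  # loading a quantized model
print(select_quant_linear(candidates, pack=True).name)   # quantizing a pre-trained model
```

Under these assumptions, knowing the value of `pack` lets the selector keep the fast kernel in the running for the from_quantized path instead of excluding it everywhere.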

LRL-ModelCloud added 2 commits December 25, 2024 12:07

select quant_linear with pack

c762c14

up GPTQMODEL_MINIMUM_VERSION

2cf0637
