Transformer 4.46.1 compat #24

Open · Qubitium opened this issue Nov 4, 2024 · 5 comments

Comments

@Qubitium commented Nov 4, 2024

@HandH1998 Is there a plan to bring the llama/qwen2.5 modeling code up to date with the latest transformers 4.46.1? In testing I found that the modeling code is out of sync, and QQQ will only run with transformers pinned to 4.38.
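
As a stopgap, a minimal version-guard sketch, assuming the modified modeling files only track the transformers 4.38 API (the check itself is illustrative, not code from this repo):

```python
# Illustrative version guard -- an assumption, not code from the QQQ repo.
# The modified llama/qwen2 modeling files track the transformers 4.38 API,
# so fail fast if a newer, incompatible release is installed.
from packaging import version
import transformers

PINNED = "4.38"  # assumed supported minor version, per the report above

installed = version.parse(transformers.__version__)
if (installed.major, installed.minor) != tuple(int(x) for x in PINNED.split(".")):
    raise RuntimeError(
        f"QQQ's modeling code targets transformers {PINNED}.x, "
        f"but {transformers.__version__} is installed. "
        f"Pin it with: pip install 'transformers=={PINNED}.*'"
    )
```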

@HandH1998 (Owner)

It is a pity that I have no time to support this, but I think you can try to do it yourself, as it is not that complex.

@Qubitium (Author) commented Nov 8, 2024

@HandH1998 Understood. Second question: will the vLLM QQQ kernel be maintained by you or someone associated with QQQ, or will that kernel also be left to the open-source community?

@HandH1998 (Owner) commented Nov 8, 2024

The vLLM QQQ kernel is now maintained by the vLLM team. The open-source community can also modify it for their own use; they only need to keep the copyright statement and cite our paper.

@Qubitium (Author) commented Nov 9, 2024

@HandH1998 I will be doing some testing next week. If QQQ quantization quality is stable and inference performance is good, I will ask my team to integrate QQQ into GPTQModel via QuantizeConfig.format=QQQ. Full citation will be added, including in any files we cherry-pick over.
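
For reference, a rough sketch of what the user-facing call could look like; the import paths, FORMAT enum, and method names here are assumptions for illustration, not confirmed GPTQModel API (only the QuantizeConfig.format=QQQ switch comes from the plan above):

```python
# Hypothetical sketch only -- import paths, the FORMAT enum, and method
# names are assumptions, not confirmed GPTQModel API. The grounded detail
# is selecting QQQ through QuantizeConfig's format field.
from gptqmodel import GPTQModel, QuantizeConfig  # assumed imports
from gptqmodel.quantization import FORMAT        # assumed enum location

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    format=FORMAT.QQQ,  # route quantization/packing through the QQQ format
)

# Placeholder calibration set; a real run would use representative text.
calibration_data = ["Example calibration sample for quantization."]

model = GPTQModel.load("Qwen/Qwen2.5-7B", quant_config)  # assumed loader
model.quantize(calibration_data)
model.save("Qwen2.5-7B-QQQ")
```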

@HandH1998 (Owner)

That is great! If you have any questions, feel free to chat with me.
