Transformer 4.46.1 compat #24

Open · Qubitium opened this issue Nov 4, 2024 · 5 comments

Comments

@Qubitium commented Nov 4, 2024

@HandH1998 Is there a plan to bring the llama/qwen2.5 modeling code up to date with the latest transformers 4.46.1? In testing I found that the modeling code is out of sync, and QQQ will only run with transformers pinned to 4.38.
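
As a stopgap, a minimal version-guard sketch, assuming the modified modeling files only track the transformers 4.38 API (the check itself is illustrative, not code from this repo):

```python
# Illustrative version guard -- an assumption, not code from the QQQ repo.
# The modified llama/qwen2 modeling files track the transformers 4.38 API,
# so fail fast if a newer, incompatible release is installed.
from packaging import version
import transformers

PINNED = "4.38"  # assumed supported minor version, per the report above

installed = version.parse(transformers.__version__)
if (installed.major, installed.minor) != tuple(int(x) for x in PINNED.split(".")):
    raise RuntimeError(
        f"QQQ's modeling code targets transformers {PINNED}.x, "
        f"but {transformers.__version__} is installed. "
        f"Pin it with: pip install 'transformers=={PINNED}.*'"
    )
```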

@HandH1998 (Owner)

It is a pity that I have no time to support this, but I think you can try to do it yourself, as it is not that complex.

@Qubitium (Author) commented Nov 8, 2024

@HandH1998 Understood. Second question: will the vLLM QQQ kernel be maintained by you or someone associated with QQQ, or will that kernel also be left to the open-source community?

@HandH1998 (Owner) commented Nov 8, 2024

The vLLM QQQ kernel is now maintained by the vLLM team. The open-source community can also modify it for their own use; they only need to keep the copyright statement and cite our paper.

@Qubitium (Author) commented Nov 9, 2024

@HandH1998 I will be doing some testing next week. If QQQ quantization quality is stable and inference performance is good, I will ask my team to integrate QQQ into GPTQModel via QuantizeConfig.format=QQQ. Full citation will be added, including in any files we cherry-pick over.
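
For reference, a rough sketch of what the user-facing call could look like; the import paths, FORMAT enum, and method names here are assumptions for illustration, not confirmed GPTQModel API (only the QuantizeConfig.format=QQQ switch comes from the plan above):

```python
# Hypothetical sketch only -- import paths, the FORMAT enum, and method
# names are assumptions, not confirmed GPTQModel API. The grounded detail
# is selecting QQQ through QuantizeConfig's format field.
from gptqmodel import GPTQModel, QuantizeConfig  # assumed imports
from gptqmodel.quantization import FORMAT        # assumed enum location

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    format=FORMAT.QQQ,  # route quantization/packing through the QQQ format
)

# Placeholder calibration set; a real run would use representative text.
calibration_data = ["Example calibration sample for quantization."]

model = GPTQModel.load("Qwen/Qwen2.5-7B", quant_config)  # assumed loader
model.quantize(calibration_data)
model.save("Qwen2.5-7B-QQQ")
```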

@HandH1998 (Owner)

That is great! If you have any questions, feel free to chat with me.
