
Installation requirements #89

Open · arthurv opened this issue Sep 13, 2024 · 4 comments

Comments

arthurv commented Sep 13, 2024

Hi,

I tried to install ktransformers on a clean install of Linux Mint 22 (based on Ubuntu 24.04), and there are a few things that I had to add:

```
pip install numpy
pip install cpufeature
pip install flash_attn
conda install -c conda-forge libstdcxx-ng
```

Please update the pip dependencies.

Are there any plans to increase the number of quants supported for Deepseek-Coder-V2-Instruct-0724?

Azure-Tang (Contributor) commented

Okay, we will update these packages in our next release.

Regarding quantizations, we now support multiple quantization methods (Qx_K and IQ4_XS). What else would you like?

arthurv (Author) commented Sep 15, 2024

I have a system with 192GB DRAM and 48GB VRAM (2x 3090). Would it be able to handle 128k context with these specs? Would it be able to handle Q5_K_M or Q6_K_M?

Also, I can only set max_new_tokens in local_chat, not in the ktransformers server, and I can't set the total context size anywhere.

devprimed commented

Having this issue as well: not being able to set --max_new_tokens in the container breaks downstream projects that require longer output lengths.

Azure-Tang (Contributor) commented

> I have a system with 192GB DRAM and 48GB VRAM (2x 3090). Would it be able to handle 128k context with these specs? Would it be able to handle Q5_K_M or Q6_K_M?
>
> Also, I can only set max_new_tokens in local_chat, not in the ktransformers server, and I can't set the total context size anywhere.

Yes, we support Q5_K_M and Q6_K_M.

As for setting max_new_tokens in the container: sorry for the inconvenience, this is not yet supported. If you're building from source, you can modify the max_new_tokens parameter in ktransformers/server/backend/args.py. We will include this update in the next Docker release.
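
For anyone building from source in the meantime, here is a minimal sketch of the kind of edit described above. Only the file path and the max_new_tokens name come from this thread; the class name, the other field, and the default values are assumptions for illustration, so adapt it to whatever the real file defines:

```python
# Hypothetical excerpt of ktransformers/server/backend/args.py.
# Only the path and the max_new_tokens name come from the maintainer's
# comment; the class and remaining fields are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ConfigArgs:
    # Assumed field, shown only for context.
    model_path: str = ""

    # Default cap on generated tokens per request. Raising it lets the
    # server produce longer outputs, bounded by your available memory.
    max_new_tokens: int = 4096  # bump from the shipped (smaller) default
```

After editing, reinstall from source (e.g. pip install -e .) so the server picks up the new default.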
