Why vllm does not support Chinese input #246
If you use chinese-alpaca/llama, remember that their tokenizers are different from the original ones. However, vllm/engine/tokenizer_utils.py forces the use of the original llama tokenizer hf-internal-testing/llama-tokenizer, and this produces the error. You should be able to pass use_fast=False to avoid this behavior, but currently that is not possible.
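As a rough illustration of the mismatch (a minimal sketch; the checkpoint path and sample text below are placeholders, not taken from this issue), loading the model's own tokenizer with use_fast=False gives different token IDs than the hard-coded hf-internal-testing/llama-tokenizer:

```python
from transformers import AutoTokenizer

# Tokenizer that vLLM currently hard-codes in tokenizer_utils.py
default_tok = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")

# Tokenizer that actually matches a chinese-alpaca/llama checkpoint
# (the path is a placeholder; use_fast=False loads the slow SentencePiece tokenizer)
model_tok = AutoTokenizer.from_pretrained(
    "path/to/chinese-alpaca-checkpoint", use_fast=False
)

text = "今天天气很好"  # "The weather is nice today"
print(default_tok.encode(text))  # IDs from the original llama vocab
print(model_tok.encode(text))    # IDs from the extended Chinese vocab; they differ
```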
Thanks, I'll try again.
Hi, did you succeed by passing use_fast=False?
Hi, I haven't tried it yet; I'm waiting on the vLLM Development Roadmap #244.
To repro, start the server:

```
VLLM_SKIP_WARMUP=true python -m vllm.entrypoints.openai.api_server
```

Send a request (this works fine):

```
curl -v http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "facebook/opt-125m","prompt": "The future of AI is ","max_tokens": 100,"temperature": 0}'
```

If the request has a seed, it fails:

```
curl -v http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "facebook/opt-125m","prompt": "The future of AI is ","max_tokens": 100,"temperature": 0, "seed" : 37}'
```

The failure happens here: [vllm-fork/vllm/model_executor/sampling_metadata.py at habana_main · HabanaAI/vllm-fork](https://github.com/HabanaAI/vllm-fork/blob/habana_main/vllm/model_executor/sampling_metadata.py#L220)

```
if sampling_params.seed is not None:
    seq_group_metadata.state.generator = torch.Generator(
        device=device).manual_seed(sampling_params.seed)
```

`RuntimeError: Device type HPU is not supported for torch.Generator() api.`

This PR fixes the above issue by using htrandom: [Intel Gaudi PyTorch Python API (habana_frameworks.torch) — Gaudi Documentation 1.17.1 documentation](https://docs.habana.ai/en/latest/PyTorch/Reference/Python_Packages.html?highlight=htrandom#random-number-generator-apis)
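As a rough sketch of the device-guarding idea only (this is not the actual PR change, which uses Habana's htrandom APIs linked above), one could fall back to a CPU generator on devices where `torch.Generator` is unsupported:

```python
import torch

def make_seeded_generator(seed: int, device: str) -> torch.Generator:
    """Create a seeded generator, falling back to CPU when torch.Generator
    does not support the device type (e.g. HPU). Illustrative only; the
    actual fix in the PR uses htrandom instead of a CPU fallback."""
    try:
        return torch.Generator(device=device).manual_seed(seed)
    except RuntimeError:
        # e.g. "Device type HPU is not supported for torch.Generator() api."
        return torch.Generator(device="cpu").manual_seed(seed)
```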
* Fix kernel cache miss and add RDNA configs
  - added Navi configurations (Related PR: ROCm/triton#640)
  - resolved cache miss issue during flash attention calls by fixing max_seqlen_q/k to 0
* Remove Navi autotune configs for triton FP8 support
There is a decode error with Chinese input; it occurs in token.decode.
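One plausible way this kind of decode error shows up (a minimal sketch; the tokenizer and sample text are examples, not taken from the issue): llama-style tokenizers can split a single Chinese character across several byte-level tokens, so decoding token IDs one at a time can produce "�" replacement characters, while decoding the whole sequence at once stays intact.

```python
from transformers import AutoTokenizer

# Example tokenizer; any llama-style tokenizer that splits Chinese characters
# into multiple byte-level tokens shows the same effect.
tok = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")

ids = tok.encode("今天天气很好", add_special_tokens=False)

# Decoding one token ID at a time can cut a multi-byte UTF-8 character in half,
# which is the kind of garbled output reported for Chinese input.
print([tok.decode([i]) for i in ids])

# Decoding the full sequence at once keeps the bytes together.
print(tok.decode(ids))
```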