
Why vllm does not support Chinese input #246

Closed
929359291 opened this issue Jun 26, 2023 · 4 comments · Fixed by #284

Comments

@929359291

There is a decode error with Chinese input; the failure happens in token.decode.

@shifan3

shifan3 commented Jun 26, 2023

If you use chinese-alpaca/llama, remember that their tokenizers are different from the original ones. However, vllm/engine/tokenizer_utils.py forces the use of the original LLaMA tokenizer hf-internal-testing/llama-tokenizer, and this produces the error. vLLM should allow you to pass use_fast=False to avoid this behavior, but currently that is not possible.
Until this is fixed, as a temporary workaround you can simply replace the tokenizer:

```python
from transformers import AutoTokenizer

llm.llm_engine.tokenizer = AutoTokenizer.from_pretrained(
    'YOUR CHINESE ALPACA/LLAMA TOKENIZER', use_fast=False)
```
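
A minimal end-to-end sketch of that workaround, assuming a local Chinese-Alpaca/LLaMA checkpoint at the hypothetical path `path/to/chinese-alpaca` (substitute your own model/tokenizer directory):

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Hypothetical local path; point this at your Chinese-Alpaca/LLaMA model.
model_path = "path/to/chinese-alpaca"

llm = LLM(model=model_path)

# Replace the tokenizer vLLM loaded (hf-internal-testing/llama-tokenizer)
# with the slow tokenizer that ships with the Chinese model.
llm.llm_engine.tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

outputs = llm.generate(["你好，请介绍一下你自己。"],
                       SamplingParams(temperature=0.8, max_tokens=128))
print(outputs[0].outputs[0].text)
```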

@929359291
Author

> ...as a temporary workaround, you can simply replace the tokenizer by `llm.llm_engine.tokenizer = AutoTokenizer.from_pretrained('YOUR CHINESE ALPACA/LLAMA TOKENIZER', use_fast=False)`

Thanks, I'll try again.

@luoyangen

> Thanks, I'll try again.

Hi, did you succeed by passing use_fast=False?
I tried but got "RuntimeError: CUDA error: device-side assert triggered".

@929359291
Author

> Hi, did you succeed by passing use_fast=False? I tried but got "RuntimeError: CUDA error: device-side assert triggered".

Hi, I have not tried it yet; I am waiting on the vLLM Development Roadmap #244.

@WoosukKwon linked a pull request Jun 28, 2023 that will close this issue
jikunshang pushed a commit to jikunshang/vllm that referenced this issue Oct 31, 2024
To repro:

start server:
`VLLM_SKIP_WARMUP=true python -m vllm.entrypoints.openai.api_server`

send a request (this works fine):
```
curl -v http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "facebook/opt-125m", "prompt": "The future of AI is ", "max_tokens": 100, "temperature": 0}'
```

If the request has a seed, it fails:
```
curl -v http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "facebook/opt-125m", "prompt": "The future of AI is ", "max_tokens": 100, "temperature": 0, "seed": 37}'
```

Failure happens here:

[vllm-fork/vllm/model_executor/sampling_metadata.py at habana_main · HabanaAI/vllm-fork](https://github.com/HabanaAI/vllm-fork/blob/habana_main/vllm/model_executor/sampling_metadata.py#L220)

```python
if sampling_params.seed is not None:
    seq_group_metadata.state.generator = torch.Generator(
        device=device).manual_seed(sampling_params.seed)
```
 

`RuntimeError: Device type HPU is not supported for torch.Generator() api.`
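
For context, the error can be reproduced outside the server loop; a minimal sketch, assuming a Gaudi host where the Habana PyTorch bridge (`habana_frameworks.torch`) is installed:

```python
import torch
import habana_frameworks.torch  # assumed present on a Gaudi host; loads the HPU backend

# torch.Generator only accepts CPU/CUDA-style device types, so creating a
# per-request generator on HPU raises:
#   RuntimeError: Device type HPU is not supported for torch.Generator() api.
generator = torch.Generator(device="hpu").manual_seed(37)
```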

This PR fixes the above issue by using htrandom: [Intel Gaudi PyTorch Python API (habana_frameworks.torch) — Gaudi Documentation 1.17.1](https://docs.habana.ai/en/latest/PyTorch/Reference/Python_Packages.html?highlight=htrandom#random-number-generator-apis)
billishyahao pushed a commit to billishyahao/vllm that referenced this issue Dec 31, 2024
* Fix kernel cache miss and add RDNA configs

- added Navi configurations (Related PR: ROCm/triton#640)
- resolved cache miss issue during flash attention calls by fixing max_seqlen_q/k to 0

* Remove Navi autotune configs for triton FP8 support