[Bug]: "--tokenizer-mode", "mistral" not compatible with openai API tool use tests #9059

Closed
1 task done
sydnash opened this issue Oct 4, 2024 · 9 comments · Fixed by #9951 or #10333
Labels
bug Something isn't working

Comments

@sydnash (Contributor) commented Oct 4, 2024

Your current environment

The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.4.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.31

Python version: 3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.13.0-44-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.4.131
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA A40
GPU 1: NVIDIA A40
GPU 2: NVIDIA A40
GPU 3: NVIDIA A40
GPU 4: NVIDIA A40
GPU 5: NVIDIA A40
GPU 6: NVIDIA A40
GPU 7: NVIDIA A40

Nvidia driver version: 550.54.15
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          16
On-line CPU(s) list:             0-15
Thread(s) per core:              1
Core(s) per socket:              16
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7302P 16-Core Processor
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         1500.000
CPU max MHz:                     3000.0000
CPU min MHz:                     1500.0000
BogoMIPS:                        6000.00
Virtualization:                  AMD-V
L1d cache:                       512 KiB
L1i cache:                       512 KiB
L2 cache:                        8 MiB
L3 cache:                        128 MiB
NUMA node0 CPU(s):               0-15
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es

Versions of relevant libraries:
[pip3] mypy==1.11.1
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu11==11.11.3.6
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu11==11.8.87
[pip3] nvidia-cuda-nvrtc-cu11==11.8.89
[pip3] nvidia-cuda-runtime-cu11==11.8.89
[pip3] nvidia-cudnn-cu11==9.1.0.70
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu11==10.9.0.58
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu11==10.3.0.86
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu11==11.4.1.48
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu11==11.7.5.86
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu11==2.20.5
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.6.68
[pip3] nvidia-nvtx-cu11==11.8.86
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pyzmq==26.2.0
[pip3] sentence-transformers==3.0.1
[pip3] torch==2.4.0+cu118
[pip3] torchaudio==2.4.0+cu118
[pip3] torchvision==0.19.0+cu118
[pip3] transformers==4.45.1
[pip3] transformers-stream-generator==0.0.5
[pip3] triton==3.0.0
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-cublas-cu11        11.11.3.6                pypi_0    pypi
[conda] nvidia-cublas-cu12        12.1.3.1                 pypi_0    pypi
[conda] nvidia-cuda-cupti-cu11    11.8.87                  pypi_0    pypi
[conda] nvidia-cuda-nvrtc-cu11    11.8.89                  pypi_0    pypi
[conda] nvidia-cuda-runtime-cu11  11.8.89                  pypi_0    pypi
[conda] nvidia-cudnn-cu11         9.1.0.70                 pypi_0    pypi
[conda] nvidia-cudnn-cu12         9.1.0.70                 pypi_0    pypi
[conda] nvidia-cufft-cu11         10.9.0.58                pypi_0    pypi
[conda] nvidia-cufft-cu12         11.0.2.54                pypi_0    pypi
[conda] nvidia-curand-cu11        10.3.0.86                pypi_0    pypi
[conda] nvidia-curand-cu12        10.3.2.106               pypi_0    pypi
[conda] nvidia-cusolver-cu11      11.4.1.48                pypi_0    pypi
[conda] nvidia-cusolver-cu12      11.4.5.107               pypi_0    pypi
[conda] nvidia-cusparse-cu11      11.7.5.86                pypi_0    pypi
[conda] nvidia-cusparse-cu12      12.1.0.106               pypi_0    pypi
[conda] nvidia-ml-py              12.560.30                pypi_0    pypi
[conda] nvidia-nccl-cu11          2.20.5                   pypi_0    pypi
[conda] nvidia-nccl-cu12          2.20.5                   pypi_0    pypi
[conda] nvidia-nvjitlink-cu12     12.6.68                  pypi_0    pypi
[conda] nvidia-nvtx-cu11          11.8.86                  pypi_0    pypi
[conda] nvidia-nvtx-cu12          12.1.105                 pypi_0    pypi
[conda] pyzmq                     26.2.0                   pypi_0    pypi
[conda] sentence-transformers     3.0.1                    pypi_0    pypi
[conda] torch                     2.4.0+cu118              pypi_0    pypi
[conda] torchaudio                2.4.0+cu118              pypi_0    pypi
[conda] torchvision               0.19.0+cu118             pypi_0    pypi
[conda] transformers              4.45.1                   pypi_0    pypi
[conda] transformers-stream-generator 0.0.5                    pypi_0    pypi
[conda] triton                    3.0.0                    pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.1.dev2827+g106909c
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	NIC0	NIC1	NIC2	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	PIX	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	0-15	0		N/A
GPU1	PIX	 X 	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	0-15	0		N/A
GPU2	SYS	SYS	 X 	PIX	SYS	SYS	SYS	SYS	PHB	SYS	SYS	0-15	0		N/A
GPU3	SYS	SYS	PIX	 X 	SYS	SYS	SYS	SYS	PHB	SYS	SYS	0-15	0		N/A
GPU4	SYS	SYS	SYS	SYS	 X 	PIX	SYS	SYS	SYS	PHB	PHB	0-15	0		N/A
GPU5	SYS	SYS	SYS	SYS	PIX	 X 	SYS	SYS	SYS	PHB	PHB	0-15	0		N/A
GPU6	SYS	SYS	SYS	SYS	SYS	SYS	 X 	PIX	SYS	SYS	SYS	0-15	0		N/A
GPU7	SYS	SYS	SYS	SYS	SYS	SYS	PIX	 X 	SYS	SYS	SYS	0-15	0		N/A
NIC0	SYS	SYS	PHB	PHB	SYS	SYS	SYS	SYS	 X 	SYS	SYS				
NIC1	SYS	SYS	SYS	SYS	PHB	PHB	SYS	SYS	SYS	 X 	PIX				
NIC2	SYS	SYS	SYS	SYS	PHB	PHB	SYS	SYS	SYS	PIX	 X 				

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1
  NIC2: mlx5_2

Model Input Dumps

No response

🐛 Describe the bug

The tests in tests/tool_use for the mistral model FAIL when "--tokenizer-mode", "mistral" is added to the arguments of vllm serve.

How to reproduce this bug:

  1. Add the CLI arguments to the test data in tests/tool_use/utils.py.
  2. Change the tests/tool_use/conftest.py to only test the mistral model.

tests/tool_use/conftest.py:

# for each server config, download the model and return the config
@pytest.fixture(scope="session", params=["mistral"])
def server_config(request):
    config = CONFIGS[request.param]
    # download model and tokenizer using transformers
    snapshot_download(config["model"])
    yield CONFIGS[request.param]

tests/tool_use/utils.py:

 "mistral": {
        "model":
        "mistralai/Mistral-7B-Instruct-v0.3",
        "arguments": [
            "--tool-call-parser", "mistral", "--chat-template",
            str(VLLM_PATH / "examples/tool_chat_template_mistral.jinja"),
            "--ignore-patterns=\"consolidated.safetensors\"",
            "--tokenizer-mode", "mistral"
        ],
        "system_prompt":
        "You are a helpful assistant with access to tools. If a tool"
        " that you have would be helpful to answer a user query, "
        "call the tool. Otherwise, answer the user's query directly "
        "without calling a tool. DO NOT CALL A TOOL THAT IS IRRELEVANT "
        "to the user's question - just respond to it normally."
    }

run the tests in the tests directory:

pytest -v -s tool_use/

The tests will fail.

@patrickvonplaten

I believe this is not just a bug in MistralTokenizer, but rather an incompatibility between the OpenAI API implementation and MistralTokenizer's implementation.
Based on my current testing, there are the following four issues:

  1. MistralTokenizer has no vocab attribute; the tool parser should use the get_vocab() method instead, and should not overwrite self.model_tokenizer with self.model_tokenizer.tokenizer (see the sketch after this list). The error:
  File "/LocalRun/jun.dai/code/github/sydnash/vllm/vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py", line 48, in __init__
    self.bot_token_id = self.model_tokenizer.vocab[self.bot_token]
TypeError: 'method' object is not subscriptable

  2. The messages parameter of apply_mistral_chat_template needs to change from request.messages to conversation, otherwise this error occurs:
ERROR 09-29 17:36:44 serving_chat.py:153]   File "/LocalRun/jun.dai/conda/envs/vllm_env/lib/python3.10/site-packages/mistral_common/protocol/instruct/validator.py", line 147, in _validate_assistant_message
ERROR 09-29 17:36:44 serving_chat.py:153]     raise InvalidAssistantMessageException(
ERROR 09-29 17:36:44 serving_chat.py:153] mistral_common.exceptions.InvalidAssistantMessageException: Assistant message must have either content or tool_calls, but not both.
  3. The tool call id must be changed to match the ^[a-zA-Z0-9]{9}$ regex, or the tool call id validation must be relaxed, otherwise this error occurs:
ERROR 09-29 17:29:25 serving_chat.py:153]   File "/LocalRun/jun.dai/conda/envs/vllm_env/lib/python3.10/site-packages/mistral_common/protocol/instruct/validator.py", line 310, in _validate_tool_call
ERROR 09-29 17:29:25 serving_chat.py:153]     raise InvalidFunctionCallException(
ERROR 09-29 17:29:25 serving_chat.py:153] mistral_common.exceptions.InvalidFunctionCallException: Tool call id was chatcmpl-tool-03e6481b146e408e9523d9c956696295 but must be a-z, A-Z, 0-9, with a length of 9.
  4. Even after making the above changes, the model cannot generate a correct tool call message, because the chat template does not work with the MistralTokenizer. The output looks like:
[{"name": "get_current_weather", "arguments": {"city": "Dallas", "state": "TX", "unit": "fahrenheit"}}, {"name": "get_current_weather", "arguments": {"city": "Orlando", "state": "FL", "unit": "fahrenheit"}}]

The output message has no [TOOL_CALLS] token, which is what identifies it as a tool call message.
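
For issue 1, a minimal sketch (my own illustration, not the actual vLLM patch) of the direction described above: look the [TOOL_CALLS] token id up through get_vocab() and keep the tokenizer wrapper intact. The class name is a simplified stand-in for vLLM's parser.

class SimplifiedMistralToolParser:
    def __init__(self, tokenizer):
        # Keep the MistralTokenizer wrapper instead of replacing it with the
        # inner tokenizer, and use get_vocab() rather than a `.vocab` attribute.
        self.model_tokenizer = tokenizer
        self.bot_token = "[TOOL_CALLS]"
        vocab = tokenizer.get_vocab()
        self.bot_token_id = vocab.get(self.bot_token)
        if self.bot_token_id is None:
            raise RuntimeError(
                f"{self.bot_token} was not found in the tokenizer vocabulary")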

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
sydnash added the bug (Something isn't working) label on Oct 4, 2024
@patrickvonplaten (Contributor) commented:

Hey @sydnash,

Did you try to use --tokenizer_format mistral for tool use as shown here: https://github.com/vllm-project/vllm/blob/main/examples/offline_chat_with_tools.py

Mistral's tokenizer cannot work with:

            "--tool-call-parser", "mistral", "--chat-template",
            str(VLLM_PATH / "examples/tool_chat_template_mistral.jinja"),

and that's expected actually

@sydnash (Contributor, Author) commented Oct 8, 2024

Hey @sydnash,

Did you try to use --tokenizer_format mistral for tool use as shown here: https://github.com/vllm-project/vllm/blob/main/examples/offline_chat_with_tools.py

Mistral's tokenizer cannot work with:

            "--tool-call-parser", "mistral", "--chat-template",
            str(VLLM_PATH / "examples/tool_chat_template_mistral.jinja"),

and that's expected actually

I modified it like this, but the problem still remains the same.

"arguments": [
            "--tool-call-parser", "mistral", "--chat-template",
            str(VLLM_PATH / "examples/tool_chat_template_mistral.jinja"),
            "--ignore-patterns=\"consolidated.safetensors\"",
            "--tokenizer-mode", "mistral", "--load-format", "mistral", "--config-format", "mistral"
        ],

In the example here https://github.com/vllm-project/vllm/blob/main/examples/offline_chat_with_tools.py you use a random nine-character tool call id, but the tool call ids generated by the OpenAI API server are not nine-character IDs consisting only of numbers and letters.
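
To make the constraint concrete, here is a small self-contained check (my own illustration, not vLLM code) of the id format that mistral_common enforces versus the OpenAI-style ids the server generates:

import random
import re
import string

# mistral_common validates tool call ids against ^[a-zA-Z0-9]{9}$
MISTRAL_TOOL_CALL_ID_RE = re.compile(r"^[a-zA-Z0-9]{9}$")

def random_mistral_tool_call_id() -> str:
    # nine random alphanumeric characters, as in the offline example
    return "".join(random.choices(string.ascii_letters + string.digits, k=9))

assert MISTRAL_TOOL_CALL_ID_RE.match(random_mistral_tool_call_id())
# an OpenAI-style id produced by the server does not match:
assert not MISTRAL_TOOL_CALL_ID_RE.match(
    "chatcmpl-tool-03e6481b146e408e9523d9c956696295")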

@patrickvonplaten (Contributor) commented:

you use a random tool call id with nine character

yes that's required when using mistral_common

"arguments": [
"--tool-call-parser", "mistral", "--chat-template",
str(VLLM_PATH / "examples/tool_chat_template_mistral.jinja"),
"--ignore-patterns="consolidated.safetensors"",
"--tokenizer-mode", "mistral", "--load-format", "mistral", "--config-format", "mistral"
],

As said above, mistral_common works with neither chat templates nor a tool call parser => can you just do:

"arguments": [
            "--ignore-patterns=\"consolidated.safetensors\"",
            "--tokenizer-mode", "mistral", "--load-format", "mistral", "--config-format", "mistral"
        ],

jinja templates are not used in mistral_common

@sydnash (Contributor, Author) commented Oct 9, 2024

jinja templates are not used in mistral_common

Yes, I know this.
This issue is entirely about the MistralTokenizer being incompatible with the OpenAI API implementation in vLLM: we cannot use this tokenizer when working with tool calls through the OpenAI API.

Tool calling for the Mistral model through the OpenAI API currently works without the MistralTokenizer, but I'm unsure whether this will continue to be the case, because of the warning that appears at the start of vllm serve:

FutureWarning: It is strongly recommended to run Mistral models with --tokenizer_mode "mistral" to ensure correct encoding and decoding.

However, using --tokenizer_mode "mistral" leads to the issues mentioned above, which could be confusing for users.
Additionally, we need to decide whether the current OpenAI API implementation should be compatible with the MistralTokenizer.

@patrickvonplaten (Contributor) commented:

Can you give me a command that shows what doesn't work?

@sydnash (Contributor, Author) commented Oct 10, 2024

Can you give me a command that shows what doesn't work?

How to reproduce this bug:

  1. Change the tests/tool_use/conftest.py to only test the mistral model.
  2. Add the CLI arguments to the test data in tests/tool_use/utils.py.

tests/tool_use/conftest.py:

# for each server config, download the model and return the config
@pytest.fixture(scope="session", params=["mistral"])
def server_config(request):
    config = CONFIGS[request.param]
    # download model and tokenizer using transformers
    snapshot_download(config["model"])
    yield CONFIGS[request.param]

tests/tool_use/utils.py:

 "mistral": {
        "model":
        "mistralai/Mistral-7B-Instruct-v0.3",
        "arguments": [
            "--tool-call-parser", "mistral", "--chat-template",
            str(VLLM_PATH / "examples/tool_chat_template_mistral.jinja"),
            "--ignore-patterns=\"consolidated.safetensors\"",
            "--tokenizer-mode", "mistral"
        ],
        "system_prompt":
        "You are a helpful assistant with access to tools. If a tool"
        " that you have would be helpful to answer a user query, "
        "call the tool. Otherwise, answer the user's query directly "
        "without calling a tool. DO NOT CALL A TOOL THAT IS IRRELEVANT "
        "to the user's question - just respond to it normally."
    }

run the tests in the tests directory:

pytest -v -s tool_use/

The tests will fail.

I'm not sure whether this information is enough. If you do not want the tool parser and chat template, you can remove them from step two.

@gcalmettes (Contributor) commented:

The message parameter of apply_mistral_chat_template needs to change from request.messages to conversation due to this error:

ERROR 09-29 17:36:44 serving_chat.py:153]   File "/LocalRun/jun.dai/conda/envs/vllm_env/lib/python3.10/site-packages/mistral_common/protocol/instruct/validator.py", line 147, in _validate_assistant_message
ERROR 09-29 17:36:44 serving_chat.py:153]     raise InvalidAssistantMessageException(
ERROR 09-29 17:36:44 serving_chat.py:153] mistral_common.exceptions.InvalidAssistantMessageException: Assistant message must have either content or tool_calls, but not both.

The problem actually seems to originate from a current bug in the Pydantic library where attributes declared as iterables are replaced in the instances by a pydantic-core ValidatorIterator instance. It does not come from an incompatibility between the MistralTokenizer and the OpenAI API implementation in vLLM.

When the MistralTokenizer is used, no chat template is applied, and the request messages are sent directly to be processed by mistral-common. As a result, the tool_calls field, which is defined as Iterable in the assistant message request object definition (both in the vLLM CustomChatCompletionMessageParam and the official OpenAI ChatCompletionAssistantMessageParam), is not consumed and is sent as a pydantic ValidatorIterator:

{'role': 'assistant', 'content': None, 'tool_calls': ValidatorIterator(index=0, schema=Some(DefinitionRef(DefinitionRefValidator { definition: "typed-dict" })))}

The tool_calls field is then processed as an empty list by mistral-common, and the validation check (here) fails as both sides evaluate to false:

AssistantMessage(role='assistant', content=None, tool_calls=[], prefix=False)

The bug is not seen when chat templates are used, since the tool_calls iterator is consumed in the template when looping over each tool_call.

The bug is known on the Pydantic side, and it particularly affects the tool_calls field for LLM-based workloads using the OpenAI client (see this issue for example).
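
As a sketch of a possible workaround (not the actual fix referenced below), the lazy iterator just needs to be materialized before the messages reach mistral-common's validation; the helper name here is hypothetical:

def materialize_tool_calls(messages: list[dict]) -> list[dict]:
    # Convert any lazily-validated tool_calls iterator into a plain list so it
    # is not silently dropped before mistral-common validates the message.
    for msg in messages:
        tool_calls = msg.get("tool_calls")
        if tool_calls is not None and not isinstance(tool_calls, list):
            msg["tool_calls"] = list(tool_calls)
    return messages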

I have opened #9951 to fix this part

@gcalmettes (Contributor) commented:

@patrickvonplaten regarding the compatibility of the --tool-call-parser mistral argument when the MistralTokenizer is used, it seems that right now we can't automatically parse the tool calls into an OpenAI ChatCompletionResponse, because the [TOOL_CALLS] special token is stripped from the output when the convert_tokens_to_string function is called.

Not filtering out the tool call special token when tokens are converted to a string lets the mistral tool call parser correctly pick up the tool calls output by the model, since the parser checks for the presence of this token to decide whether tool calls are present and need to be parsed.

    def convert_tokens_to_string(self, tokens: List[str]) -> str:
        if isinstance(self.tokenizer, Tekkenizer):
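            # keep the [TOOL_CALLS] special token so the tool parser can still detect tool calls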
            tokens = [
                t for t in tokens
                if t is SpecialTokens.tool_calls
                or t not in self.tokenizer._all_special_tokens
            ]
         ...

Would this be an acceptable change?

The implication for the offline_chat_with_tools.py example would be that the [TOOL_CALLS] special token needs to be removed before parsing the tool call object, but at the same time the presence (or absence) of this token in the model output is what indicates whether the response contains a tool call.
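
A rough sketch of what that caller-side handling could look like (illustrative only; the output string below is made up):

import json

output = ('[TOOL_CALLS] [{"name": "get_current_weather", '
          '"arguments": {"city": "Dallas", "state": "TX", "unit": "fahrenheit"}}]')
if output.strip().startswith("[TOOL_CALLS]"):
    # the token marks a tool call; strip it before parsing the JSON payload
    payload = output.strip().removeprefix("[TOOL_CALLS]").strip()
    tool_calls = json.loads(payload)
    print(tool_calls[0]["name"])  # -> get_current_weather
else:
    print("no tool call in the model response")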

What do you think?

@gcalmettes (Contributor) commented Nov 2, 2024

@sydnash with the following changes:

  • the fix of the pydantic bug in this PR
  • outputting the [TOOL_CALLS] token as proposed in the comment above
  • setting the tool call ids to length 9 in the test fixtures

the tests are passing:

tool_use/test_tool_calls.py::test_tool_call_and_choice PASSED
tool_use/test_tool_calls.py::test_tool_call_with_results PASSED
tool_use/test_parallel_tool_calls.py::test_parallel_tool_calls[mistral] PASSED
tool_use/test_parallel_tool_calls.py::test_parallel_tool_calls_with_results[mistral] PASSED
