
Does trtllm-serve support toolparser and guided-decoding? Any plan? #2624

Open

dwq370 opened this issue Dec 25, 2024 · 2 comments

dwq370 commented Dec 25, 2024

No description provided.

dwq370 commented Dec 26, 2024

I launched trtllm-serve with the TensorRT-LLM v0.16.0 container:

trtllm-serve /models/hf_models/Qwen2.5-7B-Instruct --tokenizer /models/hf_models/Qwen2.5-7B-Instruct --max_batch_size 128 --max_num_tokens 32768 --max_seq_len 32768 --kv_cache_free_gpu_memory_fraction 0.95 --host 0.0.0.0 --port 3000

Then I sent a request like this:

{
    "model": "Qwen2.5-7B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": "描述一下熊猫"
        }
    ],
    "max_tokens": 800,
    "temperature": 0.1,
    "stream": false,
    "guided_json": {
        "properties": {
            "name": {
                "title": "名字",
                "type": "string"
            },
            "description": {
                "title": "描述",
                "type": "string"
            },
            "type": {
                "title": "Type",
                "type": "string"
            }
        },
        "required": [
            "name",
            "description",
            "type"
        ],
        "title": "Cate",
        "type": "object"
    }
}
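
For reference, the body above was POSTed to the OpenAI-compatible chat completions endpoint exposed by trtllm-serve, roughly like this (request.json is just a placeholder file holding the JSON above):

curl -s http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json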

The server returned an error response like:

{
    "object": "error",
    "message": "Encountered an error when fetching new request: [TensorRT-LLM][ERROR] Assertion failed: Request is specified with GuidedDecodingParams, but GuidedDecoder is not setup. Please provide a valid GuidedDecodingConfig to setup GuidedDecoder. (/home/jenkins/agent/workspace/LLM/helpers/Build-x86_64/llm/cpp/tensorrt_llm/executor/executorImpl.cpp:1512)\n1       0x73e963fba95f tensorrt_llm::common::throwRuntimeError(char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 95\n2       0x73e96402a106 /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x8cd106) [0x73e96402a106]\n3       0x73e96650fefb tensorrt_llm::executor::Executor::Impl::executionLoop() + 811\n4       0x73e9216dd970 /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm_nvrtc_wrapper.so(+0x32e7970) [0x73e9216dd970]\n5       0x73ec34b4fa94 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9ca94) [0x73ec34b4fa94]\n6       0x73ec34bdca34 __clone + 68",
    "type": "BadRequestError",
    "param": null,
    "code": 400
}

How do I use guided_json in TensorRT-LLM? Are there plans to support guided decoding?
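
Based on the error message, here is a minimal sketch of what enabling guided decoding might look like on a release that supports it (v0.16.0 apparently does not, hence the assertion above). This assumes trtllm-serve accepts an --extra_llm_api_options YAML file and that the underlying LLM API exposes a guided_decoding_backend option; both are assumptions to verify against the installed version.

# extra_options.yaml (hypothetical file name)
# Select the grammar backend used to constrain generation to the JSON schema.
guided_decoding_backend: xgrammar

trtllm-serve /models/hf_models/Qwen2.5-7B-Instruct \
  --tokenizer /models/hf_models/Qwen2.5-7B-Instruct \
  --max_batch_size 128 --max_num_tokens 32768 --max_seq_len 32768 \
  --kv_cache_free_gpu_memory_fraction 0.95 \
  --host 0.0.0.0 --port 3000 \
  --extra_llm_api_options extra_options.yaml

With a backend configured this way, the guided_json field in the request above should be applied instead of tripping the GuidedDecodingConfig assertion.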

nv-guomingz (Collaborator) commented

@LinPoly would you please take a look at this question?
