feat: support guided decoding for vllm async engine #2391

wxiwnd · 2024-10-03T06:17:44Z

Supports guided decoding for the vLLM async engine.
Waiting for a new vLLM release; a vLLM version bump will be needed.

#1562
vllm-project/vllm#8252

qinxuye · 2024-10-11T05:01:49Z

Which version is required?

wxiwnd · 2024-10-11T05:16:24Z

The next vLLM release after 0.6.2. I'm waiting for vLLM to publish it.

qinxuye · 2024-10-17T09:37:46Z

vLLM has released v0.6.3. Is this PR ready?

wxiwnd · 2024-10-17T17:49:21Z

I will test it.

wxiwnd · 2024-10-22T09:57:53Z

It works on my machine now. This PR also implements the response_format parameter, for example:

curl --location --request POST 'http://ip:9997/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen1.5-32b-chat-int4",
    "messages": [
        {
            "role": "user",
            "content": "give me a recipe in json format"
        }
    ],
    "temperature": 0,
    "max_tokens": 1000,
    "stream": true,
    "response_format": {"type": "json_object"}
}'
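For reference, the same request can be assembled from Python using only the standard library. This is a sketch, not code from the PR: the helper name build_chat_request is hypothetical, and http://ip:9997 is the placeholder address from the command above. It builds the request without sending it; the commented-out lines show how it might be sent to a running server, assuming an OpenAI-style streamed response.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a streaming chat request that asks for JSON output."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "max_tokens": 1000,
        "stream": True,
        # OpenAI-style structured-output switch handled by this PR
        "response_format": {"type": "json_object"},
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("http://ip:9997", "qwen1.5-32b-chat-int4", "give me a recipe in json format")

# To send it to a running server (assumes an OpenAI-style streamed response,
# one "data: ..." line per chunk):
# with urllib.request.urlopen(req) as resp:
#     for line in resp:
#         print(line.decode("utf-8").rstrip())
```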

qinxuye · 2024-10-22T14:31:50Z

Can you confirm that no exception is raised when an older vLLM version is installed?

wxiwnd · 2024-10-26T08:58:28Z

It now works even when the installed vLLM version is older than 0.6.3. In that case, all guided decoding parameters are ignored.
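The fallback described above could look roughly like this. It is a minimal sketch under stated assumptions: the installed version string would come from something like vllm.__version__, the guided_* key names are the ones vLLM's OpenAI-compatible server accepts, and the helper names are hypothetical rather than the actual code in xinference/_compat.py.

```python
import re

# Assumed guided decoding request keys (as accepted by vLLM's OpenAI-compatible
# server); which of them Xinference forwards is an assumption.
GUIDED_PARAM_KEYS = {
    "guided_json",
    "guided_regex",
    "guided_choice",
    "guided_grammar",
    "guided_decoding_backend",
    "guided_whitespace_pattern",
}
MIN_GUIDED_VLLM = (0, 6, 3)


def parse_release(version: str) -> tuple:
    """Return the leading numeric release, e.g. '0.6.3.post1' -> (0, 6, 3).

    Pre-release/post-release suffixes are ignored in this sketch.
    """
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def drop_unsupported_guided_params(params: dict, vllm_version: str) -> dict:
    """Silently drop guided decoding options on vLLM releases older than 0.6.3."""
    if parse_release(vllm_version) >= MIN_GUIDED_VLLM:
        return dict(params)
    return {k: v for k, v in params.items() if k not in GUIDED_PARAM_KEYS}
```

Dropping the keys instead of raising matches the behaviour described in the comment: requests still succeed on older vLLM, just without the output constraint.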

wxiwnd · 2024-11-24T18:36:40Z

This feature has been tested on my machine and appears to be functioning properly. @qinxuye

XprobeBot added the feature label Oct 3, 2024

XprobeBot added this to the v0.15 milestone Oct 3, 2024

wxiwnd marked this pull request as draft October 3, 2024 06:18

wxiwnd marked this pull request as ready for review October 5, 2024 07:55

wxiwnd force-pushed the feat/guided_generation branch 2 times, most recently from 2968700 to cd0812a on October 15, 2024 10:35

wxiwnd force-pushed the feat/guided_generation branch 7 times, most recently from 4d9e044 to 852c86c on October 22, 2024 09:30

wxiwnd force-pushed the feat/guided_generation branch 3 times, most recently from 823887f to df849b1 on October 26, 2024 08:35

qinxuye reviewed Oct 30, 2024

Review comment on xinference/_compat.py (outdated, resolved)

XprobeBot modified the milestones: v0.15, v0.16 Oct 30, 2024

wxiwnd force-pushed the feat/guided_generation branch 4 times, most recently from b8025ea to eb816c1 on November 5, 2024 17:49

wxiwnd force-pushed the feat/guided_generation branch 2 times, most recently from d1d41bf to 9d13391 on November 5, 2024 18:00

qinxuye reviewed Nov 19, 2024

Review comment on xinference/_compat.py (outdated, resolved)

Review comment on xinference/api/restful_api.py (outdated, resolved)

wxiwnd added 3 commits November 22, 2024 15:39

feat: support guided decoding for vllm async engine

dbfcc20

Signed-off-by: wxiwnd <[email protected]>

feat: support response_format

b59fe78

Signed-off-by: wxiwnd <[email protected]>

change(restful-api): add extract_guided_params()

60e3e3e

Signed-off-by: wxiwnd <[email protected]>
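The PR does not show the body of extract_guided_params(), so the following is only a guess at its role: a hypothetical helper that splits guided decoding options out of a chat request body and translates response_format {"type": "json_object"} into a guided_json_object flag. The flag name is an assumption modelled on how vLLM handles that option, and the key list is illustrative; the real helper (later refactored to a pydantic model) may differ.

```python
from typing import Any, Dict, Tuple

# Hypothetical: request keys treated as guided decoding options.
GUIDED_KEYS = (
    "guided_json",
    "guided_regex",
    "guided_choice",
    "guided_grammar",
    "guided_decoding_backend",
    "guided_whitespace_pattern",
)


def extract_guided_params(body: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a request body into (remaining_body, guided_params) without mutating it."""
    remaining = dict(body)
    guided = {k: remaining.pop(k) for k in GUIDED_KEYS if k in remaining}

    response_format = remaining.pop("response_format", None)
    if response_format is not None:
        fmt_type = response_format.get("type")
        if fmt_type == "json_object":
            # Constrain output to any syntactically valid JSON object.
            guided["guided_json_object"] = True
        elif fmt_type != "text":
            # This sketch only handles the formats discussed in the PR.
            raise ValueError(f"unsupported response_format type: {fmt_type!r}")
    return remaining, guided
```

Usage: extract_guided_params({"model": "m", "response_format": {"type": "json_object"}}) returns ({"model": "m"}, {"guided_json_object": True}).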

wxiwnd force-pushed the feat/guided_generation branch from 9d13391 to 60e3e3e on November 22, 2024 07:47

wxiwnd marked this pull request as draft November 22, 2024 08:32

wxiwnd added 2 commits November 22, 2024 17:08

revert: revert to 9d13391

71e9f5a

Signed-off-by: wxiwnd <[email protected]>

refactor: use pydantic model

dbf5216

Signed-off-by: wxiwnd <[email protected]>

wxiwnd marked this pull request as ready for review November 24, 2024 18:33

wxiwnd requested a review from qinxuye November 24, 2024 18:36

XprobeBot modified the milestones: v0.16, v1.x Nov 25, 2024
