feat: support guided decoding for vllm async engine #2391

Open
wants to merge 5 commits into
base: main

wxiwnd
Contributor

@wxiwnd wxiwnd commented Oct 3, 2024

Support guided decoding for the vllm async engine.
This is waiting on a vllm release; a version bump is needed.

#1562
vllm-project/vllm#8252

@XprobeBot XprobeBot added this to the v0.15 milestone Oct 3, 2024
@wxiwnd wxiwnd marked this pull request as draft October 3, 2024 06:18
@wxiwnd wxiwnd marked this pull request as ready for review October 5, 2024 07:55
@qinxuye
Contributor

qinxuye commented Oct 11, 2024

Which version is required?

@wxiwnd
Contributor Author

wxiwnd commented Oct 11, 2024

Which version is required?

The next version after 0.6.2; we are waiting for vllm to release a new version.

@wxiwnd wxiwnd force-pushed the feat/guided_generation branch 2 times, most recently from 2968700 to cd0812a Compare October 15, 2024 10:35
@qinxuye
Contributor

qinxuye commented Oct 17, 2024

vllm has released v0.6.3; is this PR ready to work?

@wxiwnd
Contributor Author

wxiwnd commented Oct 17, 2024 via email

@wxiwnd wxiwnd force-pushed the feat/guided_generation branch 7 times, most recently from 4d9e044 to 852c86c Compare October 22, 2024 09:30
@wxiwnd
Contributor Author

wxiwnd commented Oct 22, 2024

vllm has released v0.6.3; is this PR ready to work?

It works on my machine now.
This PR also implements the response_format call, for example:

curl --location --request POST 'http://ip:9997/v1/chat/completions' \
--header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen1.5-32b-chat-int4",
    "messages": [
        {
            "role": "user",
            "content": "give me a recipe in json format"
        }
    ],
    "temperature": 0,
    "max_tokens": 1000,
    "stream": true,
    "response_format": {"type": "json_object"}
}'
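
For anyone who prefers to test from Python, the same request can be sketched with only the standard library. This is a rough sketch, not part of the PR: the helper names `build_chat_request` and `parse_sse_line` are hypothetical, and the stream parsing assumes the server emits OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`).

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build the same POST request as the curl call above (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "max_tokens": 1000,
        "stream": True,
        # Ask the server to constrain the output to a JSON object.
        "response_format": {"type": "json_object"},
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_sse_line(line):
    """Decode one streamed line; return None for blank lines and the [DONE] marker.

    Assumes OpenAI-style server-sent events, which is an assumption here.
    """
    text = line.decode("utf-8").strip()
    if not text.startswith("data:"):
        return None
    body = text[len("data:"):].strip()
    if body == "[DONE]":
        return None
    return json.loads(body)
```

To use it, pass the request to `urllib.request.urlopen`, iterate over the response line by line, and feed each line to `parse_sse_line`, skipping the `None` results.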

@qinxuye
Contributor

qinxuye commented Oct 22, 2024

Can you confirm that no exception is raised when an older vllm version is installed?

@wxiwnd wxiwnd force-pushed the feat/guided_generation branch 3 times, most recently from 823887f to df849b1 Compare October 26, 2024 08:35
@wxiwnd
Contributor Author

wxiwnd commented Oct 26, 2024

It now works properly even if the vllm version is below 0.6.3.
All guided decoding parameters are ignored if the vllm version is under 0.6.3.
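
The fallback described above can be illustrated roughly as follows. This is a minimal sketch, not the code in this PR: the function names and the `GUIDED_KEYS` list are assumptions chosen for illustration, and the version is passed in as a string rather than read from the installed package.

```python
import re

# Assumed names of guided decoding parameters; the PR may handle a different set.
GUIDED_KEYS = ("guided_json", "guided_regex", "guided_choice", "guided_grammar")
MIN_GUIDED_VERSION = (0, 6, 3)


def parse_version(version):
    """Return the leading numeric components, e.g. '0.6.3.post1' -> (0, 6, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def drop_unsupported_guided_params(params, vllm_version):
    """Return a copy of params, without guided decoding keys on old versions."""
    if parse_version(vllm_version) >= MIN_GUIDED_VERSION:
        return dict(params)
    # Older versions: silently ignore guided decoding instead of raising.
    return {k: v for k, v in params.items() if k not in GUIDED_KEYS}
```

Ignoring the parameters keeps existing deployments on older vllm versions working, at the cost that the output is not constrained there.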

xinference/_compat.py (outdated, resolved)
@XprobeBot XprobeBot modified the milestones: v0.15, v0.16 Oct 30, 2024
@wxiwnd wxiwnd force-pushed the feat/guided_generation branch 4 times, most recently from b8025ea to eb816c1 Compare November 5, 2024 17:49
@wxiwnd wxiwnd force-pushed the feat/guided_generation branch 2 times, most recently from d1d41bf to 9d13391 Compare November 5, 2024 18:00
xinference/_compat.py (outdated, resolved)
xinference/api/restful_api.py (outdated, resolved)
@wxiwnd wxiwnd marked this pull request as draft November 22, 2024 08:32
@wxiwnd wxiwnd marked this pull request as ready for review November 24, 2024 18:33
@wxiwnd
Contributor Author

wxiwnd commented Nov 24, 2024

This feature has been tested on my machine and appears to be functioning properly. @qinxuye

@wxiwnd wxiwnd requested a review from qinxuye November 24, 2024 18:36
@XprobeBot XprobeBot modified the milestones: v0.16, v1.x Nov 25, 2024