Keep track of state in RegexLogitsProcessor using input_ids #628
Conversation
Testing the PR branch with my experiments.
Request: {
"prompt": "You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.\n\nYou are a helpful AI assistant. You give concise answers. If you do not know something, then say so.\n### Instruction:\nWrite down the first 10 prime numbers as a comma separated list, starting with 2.\n\n### Response:\n",
"n": 1,
"best_of": 1,
"presence_penalty": 0.0,
"frequency_penalty": 0.0,
"repetition_penalty": 1.0,
"temperature": 0.0,
"top_p": 1.0,
"top_k": -1,
"min_p": 0.0,
"use_beam_search": false,
"length_penalty": 1.0,
"early_stopping": false,
"stop": [],
"stop_token_ids": [],
"include_stop_str_in_output": false,
"ignore_eos": false,
"max_tokens": 50,
"logprobs": null,
"prompt_logprobs": null,
"skip_special_tokens": true,
"spaces_between_special_tokens": true,
"regex": "\\d+(\\s*,\\s*\\d+)*\\s*"
}

It should generate the first 10 primes as a comma-separated list. It does not crash; it just stops after a comma where it should not, based on the regex.
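As a side note, the regex used in the request above can be checked in isolation. This is a quick illustration (not part of the PR); `conforms` is a hypothetical helper name:

```python
# Quick sanity check of the regex from the request above.
# re.fullmatch is used because the whole generated text must
# conform to the pattern, not just a substring of it.
import re

PATTERN = re.compile(r"\d+(\s*,\s*\d+)*\s*")

def conforms(text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return PATTERN.fullmatch(text) is not None

print(conforms("2, 3, 5, 7, 11, 13, 17, 19, 23, 29"))  # True
print(conforms("2, 3,"))  # False: trailing comma with no digits after it
```

Note that "2, 3," is still a valid prefix of the pattern, so during guided generation the FSM would sit in a non-final state there; stopping at that point is exactly the misbehaviour described above.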
This request crashes the server:

{
"prompt": "You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.\n\nYou are a helpful AI assistant. You give concise answers. If you do not know something, then say so.\n### Instruction:\nWrite a JSON describing a random fruit. It must conform to the following JSON schema: {\"properties\": {\"kind\": {\"title\": \"Kind\", \"type\": \"string\"}, \"color\": {\"title\": \"Color\", \"type\": \"string\"}, \"count\": {\"title\": \"Count\", \"type\": \"integer\"}, \"weight\": {\"title\": \"Weight\", \"type\": \"number\"}, \"sweet\": {\"title\": \"Sweet\", \"type\": \"boolean\"}}, \"required\": [\"kind\", \"color\", \"count\", \"weight\", \"sweet\"], \"title\": \"Fruit\", \"type\": \"object\"}\n\n### Response:\n",
"n": 5,
"best_of": 5,
"presence_penalty": 0.0,
"frequency_penalty": 0.0,
"repetition_penalty": 1.0,
"temperature": 1.0,
"top_p": 1.0,
"top_k": -1,
"min_p": 0.0,
"use_beam_search": false,
"length_penalty": 1.0,
"early_stopping": false,
"stop": [],
"stop_token_ids": [],
"include_stop_str_in_output": false,
"ignore_eos": false,
"max_tokens": 200,
"logprobs": null,
"prompt_logprobs": null,
"skip_special_tokens": true,
"spaces_between_special_tokens": true,
"schema": {
"properties": {
"kind": {
"title": "Kind",
"type": "string"
},
"color": {
"title": "Color",
"type": "string"
},
"count": {
"title": "Count",
"type": "integer"
},
"weight": {
"title": "Weight",
"type": "number"
},
"sweet": {
"title": "Sweet",
"type": "boolean"
}
},
"required": [
"kind",
"color",
"count",
"weight",
"sweet"
],
"title": "Fruit",
"type": "object"
}
}

Server side traceback:
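As a minimal, stdlib-only sketch (not part of the PR), the required-keys-and-primitive-types portion of the schema above can be checked like this; a full validator such as the `jsonschema` package would cover the complete spec:

```python
import json

# Maps JSON Schema primitive type names to Python types.
# "number" accepts int as well, mirroring JSON Schema semantics.
TYPE_MAP = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}

def check_fruit(doc: str, schema: dict) -> bool:
    """Check required keys and primitive types against the schema."""
    obj = json.loads(doc)
    if not isinstance(obj, dict):
        return False
    for key in schema["required"]:
        if key not in obj:
            return False
    for key, spec in schema["properties"].items():
        if key in obj:
            # bool is a subclass of int in Python, so reject it
            # explicitly for integer/number fields.
            if isinstance(obj[key], bool) and spec["type"] != "boolean":
                return False
            if not isinstance(obj[key], TYPE_MAP[spec["type"]]):
                return False
    return True

schema = {
    "properties": {
        "kind": {"type": "string"},
        "color": {"type": "string"},
        "count": {"type": "integer"},
        "weight": {"type": "number"},
        "sweet": {"type": "boolean"},
    },
    "required": ["kind", "color", "count", "weight", "sweet"],
}

good = '{"kind": "apple", "color": "red", "count": 3, "weight": 0.2, "sweet": true}'
bad = '{"kind": "apple", "color": "red"}'
print(check_fruit(good, schema))  # True
print(check_fruit(bad, schema))   # False: missing required keys
```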
Summary:

Model:

python -O -u -m outlines.serve.vllm \
    --model=TheBloke/deepseek-coder-33B-instruct-AWQ \
    --quantization=awq \
    --dtype=float16 \
    --host=0.0.0.0 \
    --port=8000 \
    --max-model-len=16384 \
    --max-num-seqs=16 \
    --tensor-parallel-size=2 \
    --swap-space=8 \
    --gpu-memory-utilization=0.95 \
    --enforce-eager \
    --disable-log-requests
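For reference, a request body like the ones quoted above can be assembled as follows. This is an illustration only: the payload mirrors the shape of the requests shown in this thread, and `build_request` is a hypothetical helper, not a documented API:

```python
import json

def build_request(prompt: str, regex: str, max_tokens: int = 50) -> str:
    """Build a guided-generation request body shaped like the
    requests quoted earlier in this thread."""
    payload = {
        "prompt": prompt,
        "n": 1,
        "temperature": 0.0,
        "max_tokens": max_tokens,
        "regex": regex,
    }
    return json.dumps(payload)

body = build_request(
    "Write down the first 10 prime numbers as a comma separated list, "
    "starting with 2.",
    r"\d+(\s*,\s*\d+)*\s*",
)
print(body)
# The body could then be POSTed to the serving endpoint, e.g. with
# urllib.request.urlopen(...) against the host/port configured above.
```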
Thanks so much @viktor-ferenczi, will try to resolve as part of this PR (aside from #605, which you already have a PR for and which seems out of scope).

Edit: Marking #605 as fixed in the main post since it passes the smoke test.
Force-pushed from e8260a6 to 049c059
@viktor-ferenczi I pushed a small change, it appears to be working for your test requests (example output in detail section of original post). Could you confirm whether I missed something?
Sure, I'll switch to your branch and start using it. Let's see whether it works as expected.
Your branch works way better. There is an unrelated issue where the model does not stop generating content on reaching the regex's final state, but this issue is unrelated to your branch and happens the same way without it.
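The "does not stop at the regex's final state" behaviour mentioned here would normally be handled by masking everything except EOS once the FSM reaches a final state. A toy sketch of that idea (plain lists instead of tensors; all names and the EOS id are hypothetical):

```python
import math

EOS_TOKEN_ID = 2  # assumed EOS id for this toy vocabulary

def mask_logits(logits, allowed_token_ids):
    """Set every logit outside allowed_token_ids to -inf."""
    allowed = set(allowed_token_ids)
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

def apply_fsm_mask(logits, state, final_states, allowed_by_state):
    # Once the FSM is in a final state, only EOS may be sampled,
    # so generation stops instead of drifting past the pattern.
    if state in final_states:
        return mask_logits(logits, [EOS_TOKEN_ID])
    return mask_logits(logits, allowed_by_state[state])

logits = [0.1, 0.5, 0.2, 0.9]
masked = apply_fsm_mask(logits, state=3, final_states={3}, allowed_by_state={})
print(masked)  # only index 2 (EOS) keeps its logit
```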
Thank you, this is going to solve a few problems for downstream libraries. I'm not sure how LoRAX integrated Outlines, but we should make sure this doesn't break their code before releasing.
As an aside, I was thinking it'd be wise to highlight who is using Outlines: https://github.com/outlines-dev/outlines/network/dependents
For outlines/vllm, the FSM-sequence correspondence was previously broken, resulting in FSM states being mixed between sequences and corrupting the output. To alleviate this, we had _patched_apply_logits_processor, which passes a stable sequence ID to the logits processor. In this PR we eliminate _patched_apply_logits_processor and cache FSM state based on the sequence's input IDs.

Continuation of #539, but much simpler because the vllm upgrade fixed a lot of the issues being addressed there.

Related discussions:
- RegexLogitsProcessor is incorrect #624

Fixes:
- states_to_token_maps #605

Already fixed:
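The input-ids-based caching described above can be sketched as follows. This is an illustration of the approach (keying FSM state on the tuple of token ids generated so far, so no per-sequence ID has to be threaded through vLLM), not the PR's actual code; the FSM and its transition function are toy stand-ins:

```python
class ToyRegexLogitsProcessor:
    """Illustrative sketch: track one FSM state per sequence by
    keying on the tuple of token ids generated so far."""

    def __init__(self, fsm):
        self.fsm = fsm                       # toy FSM with .next_state(state, token)
        self.states = {(): fsm.start_state}  # prefix tuple -> FSM state

    def state_for(self, input_ids):
        key = tuple(input_ids)
        if key not in self.states:
            # Advance from the cached state of the parent prefix,
            # computed recursively down to the empty prefix.
            prev = self.state_for(input_ids[:-1])
            self.states[key] = self.fsm.next_state(prev, input_ids[-1])
        return self.states[key]

class ToyFSM:
    start_state = 0
    def next_state(self, state, token):
        return state + token  # stand-in transition function

proc = ToyRegexLogitsProcessor(ToyFSM())
print(proc.state_for([1, 2, 3]))  # 6 with this toy transition
```

Because each sequence's prefix of generated ids is unique to that sequence, two parallel sequences can no longer read or clobber each other's FSM state, which is the corruption described above.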
@viktor-ferenczi can you please confirm whether this branch fixes either #610 or #605?
Smoke tests

- basic parallel: passed
- never ending regex: passed (python3 -m outlines.serve.serve --model="microsoft/phi-2")
- sometimes ending early regex: passed (python3 -m outlines.serve.serve --model="microsoft/phi-2")
- Viktor's regex: passed (python3 -m outlines.serve.serve --model="microsoft/phi-2")
- Viktor's schema: passed (python3 -m outlines.serve.serve --model="microsoft/phi-2")