[Bugfix] Fix CFGGuide and use outlines for grammars that can't convert to GBNF #11389
Conversation
This PR mostly ports vllm-project#11389 to the design introduced by #358 and makes the custom caching code a little more robust. Currently there are two problems with guided decoding:

- `mask[list(allowed_tokens)] = 0` crashes because `allowed_tokens` can contain tensors. This is an easy fix.
- The value type of `self._fsm_state` was changed from `int` to a union of `int` and `outlines.state.CFGState`, which can cause `self._cached_get_mask_tensor(state_id, scores.size(-1), scores.device)` to crash, since `outlines.state.CFGState` is not hashable. This PR changes the caching mechanism so that when a function argument is not hashable, its `id()` is used as the cache key instead. This may cause some cache misses, but that is better than crashing, as it does right now.

Neither issue exists upstream; both stem from code introduced in #358. I've also added guided decoding tests to the CI suite.
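The id-fallback caching described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: `robust_key` and `cached_by_robust_key` are hypothetical names, and the real code keys a mask-tensor cache rather than a generic memoizer.

```python
import functools


def robust_key(arg):
    """Use the argument itself as the cache key when it is hashable;
    otherwise fall back to id(arg). id()-based keys can produce
    spurious cache misses (distinct equal objects get distinct keys),
    but they never raise TypeError the way hash() on an unhashable
    object (e.g. outlines.caching.CFGState-like values) would."""
    try:
        hash(arg)
        return arg
    except TypeError:
        return id(arg)


def cached_by_robust_key(fn):
    """Minimal memoizer over positional args using robust_key."""
    cache = {}

    @functools.wraps(fn)
    def wrapper(*args):
        key = tuple(robust_key(a) for a in args)
        if key not in cache:
            cache[key] = fn(*args)
        return cache[key]

    return wrapper
```

The trade-off is deliberate: identity keys only hit the cache while the same object is alive and reused, but that degraded hit rate is strictly preferable to a `TypeError` on every lookup.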
…t to GBNF (vllm-project#11389) Signed-off-by: mgoin <[email protected]> Signed-off-by: Linkun Chen <[email protected]>
@afeldman-nm found this issue while trying the offline and online structured output examples, e.g. `python examples/offline_inference_structured_outputs.py`. We were failing to convert the SQL Lark grammar into GBNF format for xgrammar, and we had that check+conversion in the wrong place to fall back to another backend. It turns out that our outlines backend also regressed on this example with the upgrade to 0.1.11 (it works fine with `outlines==0.0.46`). This PR makes xgrammar fall back to outlines in this case, and gets outlines back to functional for it.
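The fallback logic can be sketched as below. This is a hedged illustration, not vLLM's actual code: `convert_lark_to_gbnf` is a stand-in stub (the real conversion lives in vLLM's xgrammar integration), and the unsupported-construct check is invented for the example.

```python
def convert_lark_to_gbnf(grammar: str) -> str:
    """Hypothetical converter stub. For illustration it rejects any
    grammar containing regex terminals (/.../), standing in for Lark
    features that have no GBNF equivalent; real conversion would
    rewrite rule syntax instead of returning the grammar unchanged."""
    if "/" in grammar:
        raise ValueError("grammar uses constructs unsupported by GBNF")
    return grammar


def choose_backend(grammar: str):
    """Prefer xgrammar when the Lark grammar converts to GBNF;
    otherwise fall back to outlines, whose CFGGuide can consume
    the Lark grammar directly."""
    try:
        return ("xgrammar", convert_lark_to_gbnf(grammar))
    except ValueError:
        return ("outlines", grammar)
```

The key point of the fix is *where* this check happens: the conversion must be attempted (and allowed to fail) at backend-selection time, so a failure routes the request to outlines instead of erroring mid-request.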