Keep track of state in RegexLogitsProcessor using input_ids #628
Conversation
Testing the PR branch with my experiments.
Request: {
"prompt": "You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.\n\nYou are a helpful AI assistant. You give concise answers. If you do not know something, then say so.\n### Instruction:\nWrite down the first 10 prime numbers as a comma separated list, starting with 2.\n\n### Response:\n",
"n": 1,
"best_of": 1,
"presence_penalty": 0.0,
"frequency_penalty": 0.0,
"repetition_penalty": 1.0,
"temperature": 0.0,
"top_p": 1.0,
"top_k": -1,
"min_p": 0.0,
"use_beam_search": false,
"length_penalty": 1.0,
"early_stopping": false,
"stop": [],
"stop_token_ids": [],
"include_stop_str_in_output": false,
"ignore_eos": false,
"max_tokens": 50,
"logprobs": null,
"prompt_logprobs": null,
"skip_special_tokens": true,
"spaces_between_special_tokens": true,
"regex": "\\d+(\\s*,\\s*\\d+)*\\s*"
}

It should generate the first 10 primes as a comma-separated list. It does not crash; it just stops after a comma where it should not, based on the regex.
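As a side note, the regex used in the request above can be checked in isolation. This is a quick illustration (not part of the PR); `conforms` is a hypothetical helper name:

```python
# Quick sanity check of the regex from the request above.
# re.fullmatch is used because the whole generated text must
# conform to the pattern, not just a substring of it.
import re

PATTERN = re.compile(r"\d+(\s*,\s*\d+)*\s*")

def conforms(text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return PATTERN.fullmatch(text) is not None

print(conforms("2, 3, 5, 7, 11, 13, 17, 19, 23, 29"))  # True
print(conforms("2, 3,"))  # False: trailing comma with no digits after it
```

Note that "2, 3," is still a valid prefix of the pattern, so during guided generation the FSM would sit in a non-final state there; stopping at that point is exactly the misbehaviour described above.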
This request crashes the server:

{
"prompt": "You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.\n\nYou are a helpful AI assistant. You give concise answers. If you do not know something, then say so.\n### Instruction:\nWrite a JSON describing a random fruit. It must conform to the following JSON schema: {\"properties\": {\"kind\": {\"title\": \"Kind\", \"type\": \"string\"}, \"color\": {\"title\": \"Color\", \"type\": \"string\"}, \"count\": {\"title\": \"Count\", \"type\": \"integer\"}, \"weight\": {\"title\": \"Weight\", \"type\": \"number\"}, \"sweet\": {\"title\": \"Sweet\", \"type\": \"boolean\"}}, \"required\": [\"kind\", \"color\", \"count\", \"weight\", \"sweet\"], \"title\": \"Fruit\", \"type\": \"object\"}\n\n### Response:\n",
"n": 5,
"best_of": 5,
"presence_penalty": 0.0,
"frequency_penalty": 0.0,
"repetition_penalty": 1.0,
"temperature": 1.0,
"top_p": 1.0,
"top_k": -1,
"min_p": 0.0,
"use_beam_search": false,
"length_penalty": 1.0,
"early_stopping": false,
"stop": [],
"stop_token_ids": [],
"include_stop_str_in_output": false,
"ignore_eos": false,
"max_tokens": 200,
"logprobs": null,
"prompt_logprobs": null,
"skip_special_tokens": true,
"spaces_between_special_tokens": true,
"schema": {
"properties": {
"kind": {
"title": "Kind",
"type": "string"
},
"color": {
"title": "Color",
"type": "string"
},
"count": {
"title": "Count",
"type": "integer"
},
"weight": {
"title": "Weight",
"type": "number"
},
"sweet": {
"title": "Sweet",
"type": "boolean"
}
},
"required": [
"kind",
"color",
"count",
"weight",
"sweet"
],
"title": "Fruit",
"type": "object"
}
}

Server side traceback:
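As a minimal, stdlib-only sketch (not part of the PR), the required-keys-and-primitive-types portion of the schema above can be checked like this; a full validator such as the `jsonschema` package would cover the complete spec:

```python
import json

# Maps JSON Schema primitive type names to Python types.
# "number" accepts int as well, mirroring JSON Schema semantics.
TYPE_MAP = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}

def check_fruit(doc: str, schema: dict) -> bool:
    """Check required keys and primitive types against the schema."""
    obj = json.loads(doc)
    if not isinstance(obj, dict):
        return False
    for key in schema["required"]:
        if key not in obj:
            return False
    for key, spec in schema["properties"].items():
        if key in obj:
            # bool is a subclass of int in Python, so reject it
            # explicitly for integer/number fields.
            if isinstance(obj[key], bool) and spec["type"] != "boolean":
                return False
            if not isinstance(obj[key], TYPE_MAP[spec["type"]]):
                return False
    return True

schema = {
    "properties": {
        "kind": {"type": "string"},
        "color": {"type": "string"},
        "count": {"type": "integer"},
        "weight": {"type": "number"},
        "sweet": {"type": "boolean"},
    },
    "required": ["kind", "color", "count", "weight", "sweet"],
}

good = '{"kind": "apple", "color": "red", "count": 3, "weight": 0.2, "sweet": true}'
bad = '{"kind": "apple", "color": "red"}'
print(check_fruit(good, schema))  # True
print(check_fruit(bad, schema))   # False: missing required keys
```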
Summary:

Model:

python -O -u -m outlines.serve.vllm \
    --model=TheBloke/deepseek-coder-33B-instruct-AWQ \
    --quantization=awq \
    --dtype=float16 \
    --host=0.0.0.0 \
    --port=8000 \
    --max-model-len=16384 \
    --max-num-seqs=16 \
    --tensor-parallel-size=2 \
    --swap-space=8 \
    --gpu-memory-utilization=0.95 \
    --enforce-eager \
    --disable-log-requests
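For reference, a request body like the ones quoted above can be assembled as follows. This is an illustration only: the payload mirrors the shape of the requests shown in this thread, and `build_request` is a hypothetical helper, not a documented API:

```python
import json

def build_request(prompt: str, regex: str, max_tokens: int = 50) -> str:
    """Build a guided-generation request body shaped like the
    requests quoted earlier in this thread."""
    payload = {
        "prompt": prompt,
        "n": 1,
        "temperature": 0.0,
        "max_tokens": max_tokens,
        "regex": regex,
    }
    return json.dumps(payload)

body = build_request(
    "Write down the first 10 prime numbers as a comma separated list, "
    "starting with 2.",
    r"\d+(\s*,\s*\d+)*\s*",
)
print(body)
# The body could then be POSTed to the serving endpoint, e.g. with
# urllib.request.urlopen(...) against the host/port configured above.
```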
Thanks so much @viktor-ferenczi, will try to resolve as part of this PR (aside from #605, which you already have a PR for and which seems out of scope).

Edit: Marking #605 as fixed in the main post since it passes the smoke test.
Force-pushed from e8260a6 to 049c059
@viktor-ferenczi I pushed a small change, it appears to be working for your test requests (example output in detail section of original post). Could you confirm whether I missed something?
Sure, I'll switch to your branch and start using it. Let's see whether it works as expected.
Your branch works way better. There is an unrelated issue where the model does not stop generating content on reaching the regex's final state, but this issue is unrelated to your branch and happens the same way without it.
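The "does not stop at the regex's final state" behaviour mentioned here would normally be handled by masking everything except EOS once the FSM reaches a final state. A toy sketch of that idea (plain lists instead of tensors; all names and the EOS id are hypothetical):

```python
import math

EOS_TOKEN_ID = 2  # assumed EOS id for this toy vocabulary

def mask_logits(logits, allowed_token_ids):
    """Set every logit outside allowed_token_ids to -inf."""
    allowed = set(allowed_token_ids)
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

def apply_fsm_mask(logits, state, final_states, allowed_by_state):
    # Once the FSM is in a final state, only EOS may be sampled,
    # so generation stops instead of drifting past the pattern.
    if state in final_states:
        return mask_logits(logits, [EOS_TOKEN_ID])
    return mask_logits(logits, allowed_by_state[state])

logits = [0.1, 0.5, 0.2, 0.9]
masked = apply_fsm_mask(logits, state=3, final_states={3}, allowed_by_state={})
print(masked)  # only index 2 (EOS) keeps its logit
```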
Thank you, this is going to solve a few problems for downstream libraries. I'm not sure how LoRAX integrated Outlines, but we should make sure this doesn't break their code before releasing.
As an aside, I was thinking it'd be wise to highlight who is using Outlines: https://github.com/outlines-dev/outlines/network/dependents
For outlines/vllm, the FSM-sequence correspondence was previously broken, resulting in FSM states being mixed between sequences and corrupting the output. To alleviate this, we had _patched_apply_logits_processor, which passes a stable sequence ID to the logits processor. In this PR we eliminate _patched_apply_logits_processor and cache FSM state based on the sequence's input IDs.

Continuation of #539, but much simpler because the vllm upgrade fixed a lot of the issues being addressed there.

Related discussions:
- RegexLogitsProcessor is incorrect #624

Fixes:
- states_to_token_maps #605

Already fixed:
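The input-ids-based caching described above can be sketched as follows. This is an illustration of the approach (keying FSM state on the tuple of token ids generated so far, so no per-sequence ID has to be threaded through vLLM), not the PR's actual code; the FSM and its transition function are toy stand-ins:

```python
class ToyRegexLogitsProcessor:
    """Illustrative sketch: track one FSM state per sequence by
    keying on the tuple of token ids generated so far."""

    def __init__(self, fsm):
        self.fsm = fsm                       # toy FSM with .next_state(state, token)
        self.states = {(): fsm.start_state}  # prefix tuple -> FSM state

    def state_for(self, input_ids):
        key = tuple(input_ids)
        if key not in self.states:
            # Advance from the cached state of the parent prefix,
            # computed recursively down to the empty prefix.
            prev = self.state_for(input_ids[:-1])
            self.states[key] = self.fsm.next_state(prev, input_ids[-1])
        return self.states[key]

class ToyFSM:
    start_state = 0
    def next_state(self, state, token):
        return state + token  # stand-in transition function

proc = ToyRegexLogitsProcessor(ToyFSM())
print(proc.state_for([1, 2, 3]))  # 6 with this toy transition
```

Because each sequence's prefix of generated ids is unique to that sequence, two parallel sequences can no longer read or clobber each other's FSM state, which is the corruption described above.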
@viktor-ferenczi can you please confirm whether this branch fixes either #610 or #605?
Smoke tests

- basic parallel: passed
- never ending regex: passed (python3 -m outlines.serve.serve --model="microsoft/phi-2")
- sometimes ending early regex: passed (python3 -m outlines.serve.serve --model="microsoft/phi-2")
- Viktor's regex: passed (python3 -m outlines.serve.serve --model="microsoft/phi-2")
- Viktor's schema: passed (python3 -m outlines.serve.serve --model="microsoft/phi-2")