
Update CFGGuide to use outlines.fsm.parsing. Enable generate.cfg #1067

Merged — 3 commits merged into dottxt-ai:main on Aug 31, 2024

Conversation

@lapp0 (Collaborator) commented Jul 25, 2024

Rendered Docs

Fixes:

Changes

CFGGuide

  • Created a stateless CFGGuide based on @brandonwillard's implementation in examples/parsing.py (usage sketched below)
  • Updated outlines.fsm.parsing to handle some edge cases
    • Implemented accepts() and feed_eof() for termination checking.
    • Bug fix: previously, tokens that exceeded the bounds of the terminal but had no matching subsequent terminal candidate were still marked as valid.
  • Deleted CFGFSM and its tests
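
A minimal usage sketch of the stateless interface (method names follow this PR; the constructor signature and tokenizer wrapper are assumptions):

from outlines.fsm.guide import CFGGuide

# `grammar_str` is a Lark grammar and `tokenizer` an outlines tokenizer
# wrapper; both are assumed to exist in scope.
guide = CFGGuide(cfg_string=grammar_str, tokenizer=tokenizer)

state = guide.initial_state                       # stateless: the caller holds the state
instruction = guide.get_next_instruction(state)   # which tokens are valid next
state = guide.get_next_state(state, token_id=42)  # advance with a sampled token
if guide.must_terminate_state(state):
    print("only EOS is legal from here")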

Grammars

  • Fixed ESCAPED_STRING in json.lark and common.lark

Integrations

  • Implemented outlines.generate.cfg(...) via SequenceGeneratorAdapter (usage example below)
  • Implemented outlines.processors.CFGLogitsProcessor
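
A hedged usage sketch (the model choice is a placeholder, and grammars.arithmetic is assumed to expose the bundled arithmetic grammar):

import outlines
from outlines import grammars

model = outlines.models.transformers("gpt2")
generator = outlines.generate.cfg(model, grammars.arithmetic)
result = generator("Write an arithmetic expression: ")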

Testing

tests/fsm/test_cfg_guide.py

  • test_cfg_next_token: assert that, given a sequence of previously generated tokens, the expected next tokens in the vocabulary are allowed.
  • test_cfg_grammar_sample: given a sample that is valid under a grammar, compute token_ids = tokenizer.encode(sample) and assert that token_ids can be produced by CFGGuide. This allows a new test to be created simply by adding an example to tests/cfg_samples/ (a sketch of the pattern follows this list).
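
A sketch of that sample-driven pattern (the helper below is illustrative, not the actual test code):

def assert_sample_is_producible(guide, tokenizer, sample):
    token_ids = tokenizer.encode(sample)
    state = guide.initial_state
    for token_id in token_ids:
        instruction = guide.get_next_instruction(state)
        # every encoded token must be permitted at its step
        assert token_id in set(instruction.tokens)
        state = guide.get_next_state(state, token_id)
    # after consuming the whole sample, termination must be legal
    assert guide.can_terminate_state(state)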

Test outlines.generate.cfg via tests/generate/test_generate.py

Update tests/fsm/test_guide.py to test for CFGGuide.must_terminate_state() and CFGGuide.can_terminate_state()

Benchmarks

benchmarks/bench_cfg_guide.py: measures CFGGuide construction time, token run time, and token run peak memory
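
The benchmark follows asv conventions (time_* / peakmem_* methods); the sketch below shows its shape, with build_guide, sample_token_ids, and run_through_tokens as hypothetical helpers:

class CFGGuideBenchmark:
    params = ["json", "arithmetic"]
    param_names = ["grammar"]

    def setup(self, grammar):
        self.guide = build_guide(grammar)               # hypothetical helper
        self.token_ids = sample_token_ids(grammar)      # hypothetical helper

    def time_cfg_guide_setup(self, grammar):
        build_guide(grammar)                            # construction time

    def time_cfg_guide_run(self, grammar):
        run_through_tokens(self.guide, self.token_ids)  # token run time

    def peakmem_cfg_guide_run(self, grammar):
        run_through_tokens(self.guide, self.token_ids)  # peak memory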

Analysis

Using the gpt2 tokenizer: regardless of generation length (10, 40, or 100 tokens), it takes ~1.2 seconds to generate each token.

Unsurprisingly, get_next_instruction dominates, accounting for over 99.99% of the runtime. This is intuitive: get_next_state applies the same operation, but to the single sampled token rather than once for each of gpt2's 50,257 tokens.
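
In pseudocode (a sketch only; parse_from_state echoes the profile below, and the vocabulary wiring is illustrative):

from copy import copy

def get_next_instruction(parser_state, vocabulary):
    # trial-parse every vocabulary entry: ~50,257 state copies and lex attempts
    valid_token_ids = []
    for token_id, token_str in vocabulary.items():
        candidate = copy(parser_state)               # the copying cost below
        try:
            parse_from_state(candidate, token_str)   # the lexing cost below
            valid_token_ids.append(token_id)
        except Exception:
            pass
    return valid_token_ids

def get_next_state(parser_state, token_str):
    # identical work, but for the single sampled token only
    parse_from_state(parser_state, token_str)
    return parser_state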

Breakdown:

  • lexing takes ~58% of the time (no copying involved in lexing)
  • copying takes ~26% of the time
  • everything else takes ~16% of the time

cProfile:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  140.176  140.176 {built-in method builtins.exec}
        1    0.000    0.000  140.176  140.176 <string>:1(<module>)
        1    0.003    0.003  140.176  140.176 /home/andrew/p/outlines/profile_cfg.py:15(profile_guide_run)
       40    3.736    0.093  140.159    3.504 /home/andrew/p/outlines/outlines/fsm/guide.py:324(get_next_instruction)
  1758994    0.785    0.000   92.318    0.000 /home/andrew/p/outlines/outlines/fsm/parsing.py:140(parse_from_state)
  1758994    2.055    0.000   91.533    0.000 /home/andrew/p/outlines/outlines/fsm/parsing.py:482(parse_from_state)
  2917115    2.304    0.000   81.840    0.000 /home/andrew/p/outlines/outlines/fsm/parsing.py:630(lex)
  2916020    8.630    0.000   79.111    0.000 /home/andrew/p/outlines/outlines/fsm/parsing.py:696(next_token)
11708177/2913032   11.354    0.000   39.208    0.000 /nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/copy.py:66(copy)
  1759029    2.429    0.000   36.344    0.000 /home/andrew/p/outlines/outlines/fsm/parsing.py:453(__copy__)
  2329598    1.455    0.000   33.410    0.000 /home/andrew/p/outlines/outlines/fsm/parsing.py:693(match)
  2329598    8.935    0.000   31.644    0.000 /home/andrew/p/outlines/outlines/fsm/parsing.py:562(match)
  1759029    1.509    0.000   24.687    0.000 /home/andrew/p/outlines/outlines/fsm/parsing.py:145(__copy__)
  1555742    2.393    0.000   15.974    0.000 /home/andrew/p/outlines/outlines/fsm/parsing.py:545(get_terminals_info)
  2312124    1.152    0.000   14.318    0.000 /home/andrew/p/outlines/.myenv/lib/python3.11/site-packages/lark/lexer.py:202(__new__)
  3111484    7.513    0.000   13.219    0.000 /home/andrew/p/outlines/outlines/fsm/regex.py:619(get_sub_fsms_from_seq)
  2312124    1.477    0.000   13.166    0.000 /home/andrew/p/outlines/.myenv/lib/python3.11/site-packages/lark/lexer.py:213(_future_new)
  8160509    1.637    0.000   12.493    0.000 {built-in method __new__ of type object at 0x7fee64db5340}
  1759029    1.347    0.000   12.287    0.000 /home/andrew/p/outlines/.myenv/lib/python3.11/site-packages/lark/lexer.py:427(__copy__)
2309943/1154085    2.419    0.000   10.857    0.000 /nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/dataclasses.py:233(wrapper)
  3518058    5.254    0.000    8.725    0.000 /nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/copy.py:259(_reconstruct)
  2329614    1.714    0.000    8.284    0.000 /nix/store/mrp9s742bpjwv7lb3rv3ikv8qx72nj0d-python3.11-numba-0.59.1/lib/python3.11/site-packages/numba/core/dispatcher.py:724(typeof_pyval)
  1158121    1.975    0.000    7.087    0.000 /home/andrew/p/outlines/outlines/fsm/parsing.py:362(feed_token)
  2329598    5.019    0.000    6.705    0.000 /home/andrew/p/outlines/outlines/fsm/regex.py:465(walk_fsm)
  2329882    1.531    0.000    6.388    0.000 /nix/store/mrp9s742bpjwv7lb3rv3ikv8qx72nj0d-python3.11-numba-0.59.1/lib/python3.11/site-packages/numba/core/typing/typeof.py:27(typeof)
  2329598    5.240    0.000    6.380    0.000 /home/andrew/p/outlines/outlines/fsm/regex.py:694(get_token_transition_keys)
  3111484    3.793    0.000    5.577    0.000 /home/andrew/p/outlines/outlines/fsm/regex.py:646(<genexpr>)
  1759029    2.521    0.000    5.514    0.000 /home/andrew/p/outlines/outlines/models/transformers.py:96(convert_token_to_string)
  1759029    2.159    0.000    4.904    0.000 /nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/copy.py:128(deepcopy)
14580583/11126799    1.989    0.000    4.371    0.000 {built-in method builtins.isinstance}
2330433/2329882    1.406    0.000    3.828    0.000 /nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/functools.py:904(wrapper)
  1759029    1.323    0.000    2.993    0.000 /nix/store/m7bq08w4hkvyby4s2w04pv6jjh4jk13l-python3.11-transformers-4.41.0/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py:618(convert_tokens_to_string)
   602706    1.057    0.000    2.707    0.000 /home/andrew/p/outlines/.myenv/lib/python3.11/site-packages/lark/exceptions.py:179(__init__)
 27668151    2.677    0.000    2.678    0.000 {method 'get' of 'dict' objects}
 12318040    2.217    0.000    2.217    0.000 {built-in method builtins.getattr}
  1726892    0.754    0.000    2.142    0.000 /nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/typing.py:1327(__instancecheck__)
  3518058    2.115    0.000    2.115    0.000 {method '__reduce_ex__' of 'object' objects}
  2330433    1.183    0.000    2.066    0.000 /nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/functools.py:818(dispatch)
  1154003    0.682    0.000    2.058    0.000 /home/andrew/p/outlines/.myenv/lib/python3.11/site-packages/lark/lexer.py:252(new_borrow_pos)
  3518058    1.236    0.000    1.626    0.000 /nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/copyreg.py:104(__newobj__)
   602706    1.090    0.000    1.584    0.000 /home/andrew/p/outlines/.myenv/lib/python3.11/site-packages/lark/exceptions.py:55(get_context)
  1759029    0.974    0.000    1.471    0.000 /home/andrew/p/outlines/outlines/fsm/parsing.py:349(__init__)
  1759029    1.451    0.000    1.451    0.000 {method 'decode' of 'tokenizers.decoders.Decoder' objects}
  1759029    1.195    0.000    1.435    0.000 /nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/copy.py:243(_keep_alive)
  1726892    1.068    0.000    1.395    0.000 /home/andrew/p/outlines/.myenv/lib/python3.11/site-packages/lark/lexer.py:292(feed)
  1726892    0.946    0.000    1.388    0.000 /nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/typing.py:1602(__subclasscheck__)
  1154085    0.353    0.000    1.280    0.000 /home/andrew/p/outlines/outlines/fsm/parsing.py:867(get_contextual_lexer)
17163298/17163296    1.272    0.000    1.272    0.000 {built-in method builtins.len}
  4659212    1.141    0.000    1.141    0.000 /nix/store/mrp9s742bpjwv7lb3rv3ikv8qx72nj0d-python3.11-numba-0.59.1/lib/python3.11/site-packages/numba/core/serialize.py:30(_numba_unpickle)

TODO

@lapp0 (Collaborator, Author) commented Jul 25, 2024

Please provide comments on these issues. I will create the new issues once they're refined / approved.

New Issues: Direct CFGGuide Fixes

We should create a milestone for CFGGuide. Here are some necessary performance and correctness improvements.

Ensure parser allows ambiguous terminals

The grammar ?start: /ab*/ /bc?/ with a generation of abbbb doesn't allow a next token of c; it requires bc.
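
A repro sketch against lark itself; using Earley with the dynamic_complete lexer (which tries all match lengths) is my assumption about how to exhibit the ambiguity:

from lark import Lark

grammar = r"?start: /ab*/ /bc?/"
parser = Lark(grammar, parser="earley", lexer="dynamic_complete")
parser.parse("abbbbc")  # succeeds: lexes as "abbb" + "bc"
# The guide's greedy lexer commits the whole prefix "abbbb" to /ab*/,
# so it then rejects "c" and only allows "bc".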

Allow skipping rule

The grammar ?start: "a" ("b" | "c")* "d" with a generation of a doesn't allow a next token of d.
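
The expected behavior, checked against lark directly:

from lark import Lark

grammar = r'?start: "a" ("b" | "c")* "d"'
parser = Lark(grammar, parser="lalr")
parser.parse("ad")  # succeeds: the starred group may match zero times
# Yet after generating "a", the guide does not offer "d" as a next token.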

(Both cases above are instances of generation being incorrectly over-constrained.)

Improve performance

  • Add benchmarks for the first 10 tokens generated, and for the last 10 of 100.
  • Improve performance generally (see profile in "Benchmarks" section of this issue)

New Issues: Other Enabled Improvements

Clean Up Dead Code

Remove

  • StopAtEosFSM
  • RegexFSM
  • Consider whether StopAtEOSGuide is useful anywhere

Ensure token decode correctness in RegexGuide as well

https://github.com/outlines-dev/outlines/pull/1067/files#r1693983229

Already Existing Issues

This PR enables the completion of the following issues:

Allow CFG in outlines.serve

#780

We currently only allow json and regex in serve: https://github.com/outlines-dev/outlines/blob/5d97ee1/outlines/serve/serve.py#L69-L70

Introduce SQL and Lark Grammars

SQL and Lark grammars and tests are already implemented in #587

Context-sensitive features such as Python's tree parser

Currently Python's TreeIndenter isn't supported: #592

Fix models.llamacpp tokenizer

The llamacpp tokenizer currently has a different interface from all the other tokenizers, which prevents CFG from working properly: #936

Remove Guide.is_final_state

#885

is_final_state is ambiguous; we should remove it in a separate PR.

@lapp0 marked this pull request as ready for review July 25, 2024 21:12
@@ -614,6 +652,8 @@ def __init__(self, conf: "LexerConf", states, always_accept=()):
lexer_conf.terminals = [
terminals_by_name[n] for n in accepts if n in terminals_by_name
]
if not lexer_conf.terminals:
continue
@lapp0 (Collaborator, Author) commented:

Note: bug fix for the case where lexer_conf.terminals is empty (which happens when EOS is the only legal next terminal).

token_history=lexer_state.last_token and [lexer_state.last_token],
state=parser_state,
terminals_by_name=self.root_lexer.terminals,
)
@lapp0 (Collaborator, Author) commented:

Note: Fixes the following tests in test_cfg_guide.py::test_cfg_next_token

  • Multiple Valid Continuations
  • Token is Substring of Another Token
  • Recursive Patterns

@brandonwillard added the enhancement, structured generation, correctness, and grammar labels on Jul 25, 2024
@brandonwillard (Member) commented:

We need a more general and persistent warning that explains that the CFG implementation is (and has always been) experimental community-contributed code. We may even need to clarify that it does not reflect the approach described in our technical report—aside from its use of incremental/partial parsing.

@lapp0 (Collaborator, Author) commented Jul 26, 2024

Thanks, I'll update docs/reference/cfg.md with a new section, and have the warning link there.

@lapp0 force-pushed the cfg-beta branch 4 times, most recently from 19129e5 to 23a310e, July 26, 2024 20:00
@lapp0 marked this pull request as draft July 27, 2024 01:09
@lapp0 force-pushed the cfg-beta branch 3 times, most recently from a5f529d to 823559a, July 27, 2024 07:36
@lapp0 marked this pull request as ready for review July 27, 2024 07:43
@lapp0 force-pushed the cfg-beta branch 2 times, most recently from db36739 to 5922b3b, July 27, 2024 16:17
@lapp0 (Collaborator, Author) commented Jul 27, 2024

I have added rejection sampling.

It checks each token for acceptance, starting with the highest logprob, and completes once one sample is accepted. This is effectively greedy sampling. The behavior is documented in docs/reference/cfg.md.

It is used by default in outlines.processors / outlines.generate.cfg.
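
A minimal sketch of the loop (token_is_accepted is a hypothetical stand-in for the guide's acceptance check):

import torch

def rejection_sample(logits, guide, state):
    # walk candidates from highest to lowest logprob; the first accepted
    # token is returned, which makes this effectively greedy sampling
    for token_id in torch.argsort(logits, descending=True).tolist():
        if token_is_accepted(guide, state, token_id):  # hypothetical check
            return token_id
    raise RuntimeError("no vocabulary token is valid under the grammar")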

Benchmarks

bench_cfg_guide.CFGGuideBenchmark.time_cfg_guide_run

   grammar       time
   ------------  ------------
   json          227±100ms
   arithmetic    1.54±0.04s

bench_cfg_guide.CFGGuideBenchmark.time_cfg_guide_run_rejection_sampling

   grammar       time
   ------------  ------------
   json          44.3±20ms
   arithmetic    75.4±6ms

Benchmarks aren't a strong indicator though; rejection sampling's performance improvement depends entirely on the fraction of tokens that are valid under the grammar's production rules at each state used in sampling.

Comment on lines +448 to +456
# normalize
if state.prev_token is None:
    new_token_str = self.tokenizer.decode([token_id])[0]
else:
    prev_token_str = self.tokenizer.decode([[state.prev_token]])[0]
    combined_token_str = self.tokenizer.decode([[state.prev_token, token_id]])[0]
    new_token_str = combined_token_str[len(prev_token_str) :]
@lapp0 (Collaborator, Author) commented Jul 27, 2024:

Note: this token normalization step, which determines a token's decoded value given the previous token, is necessary for the correctness of RegexFSM as well. It should be a separate issue.
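
In isolation, the normalization amounts to something like this standalone helper (a sketch assuming a HuggingFace-style decode; the names are mine):

def decoded_suffix(tokenizer, prev_token_id, token_id):
    """Return the text token_id contributes when it follows prev_token_id."""
    if prev_token_id is None:
        return tokenizer.decode([token_id])
    prev_str = tokenizer.decode([prev_token_id])
    combined_str = tokenizer.decode([prev_token_id, token_id])
    # decoding jointly lets the tokenizer resolve merges/whitespace correctly
    return combined_str[len(prev_str):]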

@miftahmoha (Contributor) commented Jul 27, 2024

As I signaled here on 28 April: #788 (comment).

I've been working on a parser recently which I would say is around 95% complete. It's built from scratch and dependency-free (excluding tests), with its own internals for guiding a CFG; it's designed solely for that purpose.

@rlouf @brandonwillard @lapp0 Should we have a look at it when it's finished? Should we discuss it in the Discord server?

The Discord link on GitHub is broken, BTW; it would be nice if someone could fix it.

@lapp0 (Collaborator, Author) commented Jul 28, 2024

> As I signaled here on 28 April: #788 (comment).
>
> I've been working on a parser recently which I would say is around 95% complete. It's built from scratch and dependency-free (excluding tests), with its own internals for guiding a CFG; it's designed solely for that purpose.
>
> @rlouf @brandonwillard @lapp0 Should we have a look at it when it's finished? Should we discuss it in the Discord server?
>
> The Discord link on GitHub is broken, BTW; it would be nice if someone could fix it.

Very interesting. Yes, we can discuss here or on Discord. Here's a temporary invite while we sort out the invite situation: https://discord.gg/H7pEAMPZ

Edit: I tried all three Discord links on GitHub and they all work for me. Which one is broken?

@miftahmoha (Contributor) commented:

@lapp0 It seems the problem was on my end; I was referring to the link in the "Join us" section, and it does work. Thanks.

@brandonwillard merged commit 72377db into dottxt-ai:main on Aug 31, 2024
5 of 7 checks passed