
Slow generation that depends on the max_tokens param when a Lark grammar is used #1210

Answered by lapp0
plutasnyy asked this question in Q&A

Hi @plutasnyy

Outlines CFG is in beta, and has some performance and correctness bugs.

All terminals are converted to FSMs before generation; see https://github.com/dottxt-ai/outlines/blob/main/outlines/fsm/parsing.py#L552C9-L558
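
As a minimal sketch of what that terminal-to-FSM conversion means (illustrative, not the linked code itself): Outlines depends on interegular, which compiles a regex pattern into a finite-state machine. The NUMBER pattern and the `accepts` call below are assumptions for illustration.

```python
# Illustrative only: compile a hypothetical NUMBER terminal's regex into an FSM
# with interegular, the library Outlines uses for this step.
import interegular

number_pattern = r"[0-9]+"  # hypothetical terminal pattern, not taken from parsing.py
fsm = interegular.parse_pattern(number_pattern).to_fsm()

# `accepts` is assumed from interegular's greenery-derived FSM API.
print(fsm.accepts("123"))   # expected True: the string matches the terminal
print(fsm.accepts("12a"))   # expected False: rejected by the FSM
```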

Per the profiling in the CFG beta PR ("Benchmarks" section), the slowness is due to using the partial parser to check each token's legality.
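
A rough way to see the effect, assuming the pre-1.0 Outlines API (`outlines.models.transformers`, `outlines.generate.cfg`) and a toy Lark grammar; the model name, grammar, and timing loop are all illustrative.

```python
# Sketch: time CFG-constrained generation at increasing max_tokens.
# Because the partial parser is consulted for every generated token,
# the per-token overhead makes total time grow with max_tokens.
import time

import outlines

arithmetic_grammar = r"""
start: NUMBER ("+" NUMBER)*
NUMBER: /[0-9]+/
"""

model = outlines.models.transformers("gpt2")  # any HF model; "gpt2" is just an example
generator = outlines.generate.cfg(model, arithmetic_grammar)

for max_tokens in (16, 64, 256):
    start = time.perf_counter()
    generator("1+1=", max_tokens=max_tokens)
    print(f"max_tokens={max_tokens}: {time.perf_counter() - start:.1f}s")
```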
