Improve the documentation for structured generation
Showing 13 changed files with 258 additions and 130 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
::: outlines.fsm.guide |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,63 @@ | ||
# Grammar-structured generation | ||
|
||
You can pass any context-free grammar in the EBNF format and Outlines will generate an output that conforms to this grammar: | ||
|
||
```python | ||
from outlines import models, generate | ||
|
||
arithmetic_grammar = """ | ||
?start: expression | ||
?expression: term (("+" | "-") term)* | ||
?term: factor (("*" | "/") factor)* | ||
?factor: NUMBER | ||
| "-" factor | ||
| "(" expression ")" | ||
%import common.NUMBER | ||
""" | ||
|
||
model = models.transformers("WizardLM/WizardMath-7B-V1.1") | ||
generator = generate.cfg(model, arithmetic_grammar) | ||
sequence = generator( | ||
"Alice had 4 apples and Bob ate 2. " | ||
+ "Write an expression for Alice's apples:" | ||
) | ||
|
||
print(sequence) | ||
# (8-2) | ||
``` | ||
|
||
!!! Note "Performance" | ||
|
||
The implementation of grammar-structured generation in Outlines is very naive. This does not reflect the performance of [.txt](https://dottxt.co)'s product, where we made grammar-structured generation as fast as regex-structured generation. | ||
|
||
|
||
## Ready-to-use grammars | ||
|
||
Outlines contains a (small) library of grammars that can be imported and used directly. We can rewrite the previous example as: | ||
|
||
```python | ||
import outlines | ||
from outlines import models, generate | ||
|
||
arithmetic_grammar = outlines.grammars.arithmetic | ||
|
||
model = models.transformers("WizardLM/WizardMath-7B-V1.1") | ||
generator = generate.cfg(model, arithmetic_grammar) | ||
sequence = generator( | ||
"Alice had 4 apples and Bob ate 2. " | ||
+ "Write an expression for Alice's apples:" | ||
) | ||
|
||
print(sequence) | ||
# (8-2) | ||
``` | ||
|
||
The following grammars are currently available: | ||
|
||
- Arithmetic grammar via `outlines.grammars.arithmetic` | ||
- JSON grammar via `outlines.grammars.json` | ||
|
||
If you would like more grammars to be added to the repository, please open an [issue](https://github.com/outlines-dev/outlines/issues) or a [pull request](https://github.com/outlines-dev/outlines/pulls). |
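
To make concrete what conforming to the arithmetic grammar means, here is a minimal sketch of a recursive-descent recognizer for the same rules, written with the standard library only. It is an illustration, not how Outlines processes grammars. The `accepts` helper and the simplified `NUMBER` pattern are assumptions: the pattern covers digits with an optional fractional part, which is narrower than Lark's `common.NUMBER`. Like the grammar above, the sketch does not skip whitespace.

```python
import re

# Simplified stand-in for Lark's common.NUMBER (assumption: digits with an
# optional fractional part; exponents and forms like "1." are not handled).
NUMBER = re.compile(r"\d+(\.\d+)?")


def accepts(text: str) -> bool:
    """Return True if `text` can be derived from the arithmetic grammar."""
    pos = 0

    def peek() -> str:
        # Current character, or "" once the input is exhausted.
        return text[pos] if pos < len(text) else ""

    def factor() -> bool:
        nonlocal pos
        if peek() == "-":  # factor: "-" factor
            pos += 1
            return factor()
        if peek() == "(":  # factor: "(" expression ")"
            pos += 1
            if not expression() or peek() != ")":
                return False
            pos += 1
            return True
        match = NUMBER.match(text, pos)  # factor: NUMBER
        if match is None:
            return False
        pos = match.end()
        return True

    def term() -> bool:
        nonlocal pos
        if not factor():
            return False
        while peek() in ("*", "/"):  # term: factor (("*" | "/") factor)*
            pos += 1
            if not factor():
                return False
        return True

    def expression() -> bool:
        nonlocal pos
        if not term():
            return False
        while peek() in ("+", "-"):  # expression: term (("+" | "-") term)*
            pos += 1
            if not term():
                return False
        return True

    # The whole input must be consumed for the string to be accepted.
    return expression() and pos == len(text)
```

For instance, `accepts("(8-2)")` is true, while `accepts("4-")` and `accepts("4 - 2")` are false: the first is incomplete and the second contains whitespace, which the grammar does not allow.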
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,14 +1,16 @@ | ||
# Multiple choices | ||
|
||
Choice between different options | ||
In some cases we know the output is to be chosen between different options. We can restrict the completion’s output to these choices using the is_in keyword argument: | ||
Outlines allows you to make sure the generated text is chosen from a fixed set of options: | ||
|
||
```python | ||
import outlines.models as models | ||
from outlines import models, generate | ||
|
||
model = models.transformers("mistralai/Mistral-7B-v0.1") | ||
generator = generate.choice(model, ["skirt", "dress", "pen", "jacket"]) | ||
answer = generator("Pick the odd word out: skirt, dress, pen, jacket") | ||
|
||
complete = models.openai("gpt-3.5-turbo") | ||
answer = complete( | ||
"Pick the odd word out: skirt, dress, pen, jacket", | ||
is_in=["skirt", "dress", "pen", "jacket"] | ||
) | ||
``` | ||
|
||
!!! Note "Performance" | ||
|
||
`generate.choice` computes an index that helps Outlines guide generation. This can take some time, but only needs to be done once. If you want to generate from the same list of choices several times, make sure that you only call `generate.choice` once. |
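
One way to picture the constraint is as a regular expression that alternates between the options. The sketch below uses only the standard `re` module and is an illustration, not necessarily how Outlines implements `generate.choice`. The `is_valid_choice` helper is a hypothetical name. It also mirrors the advice above: the pattern is compiled once and then reused for every check.

```python
import re

choices = ["skirt", "dress", "pen", "jacket"]

# Build the alternation once; re.escape guards against options that
# contain characters with a special meaning in regular expressions.
choice_pattern = re.compile("|".join(re.escape(choice) for choice in choices))


def is_valid_choice(answer: str) -> bool:
    """Return True if `answer` is exactly one of the allowed options."""
    return choice_pattern.fullmatch(answer) is not None
```

With this pattern `is_valid_choice("pen")` is true, while `is_valid_choice("pencil")` is false because the whole answer has to match one option.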
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,83 +1,37 @@ | ||
# Custom FSM Operations | ||
|
||
```RegexFSM.from_interegular_fsm``` leverages the flexibility of ```interegular.FSM``` to use the available operations in ```interegular```. | ||
Outlines is fast because it compiles regular expressions into an index ahead of inference. To do so we use the equivalence between regular expressions and Finite State Machines (FSMs), and the library [interegular](https://github.com/MegaIng/interegular) to perform the translation. | ||
|
||
## Examples | ||
Alternatively, one can pass an FSM built using `interegular` directly to structure the generation. | ||
|
||
### ```difference``` | ||
## Example | ||
|
||
Returns an FSM which recognises only the strings recognised by the first FSM in the list, but none of the others. | ||
### Using the `difference` operation | ||
|
||
In the following example we build an FSM that recognizes only the strings matched by the first regular expression but not by the second. In particular, it prevents the words "pink" and "elephant" from being generated: | ||
|
||
```python | ||
import interegular | ||
from outlines import models, generate | ||
|
||
|
||
list_of_strings_pattern = r"""\["[^"\s]*"(?:,"[^"\s]*")*\]""" | ||
pink_elephant_pattern = """.*(pink|elephant).*""" | ||
|
||
list_of_strings_fsm = interegular.parse_pattern(list_of_strings_pattern).to_fsm() | ||
pink_elephant_fsm = interegular.parse_pattern(pink_elephant_pattern).to_fsm() | ||
|
||
list_of_strings_fsm.accepts('["a","pink","elephant"]') | ||
# True | ||
|
||
difference_fsm = list_of_strings_fsm - pink_elephant_fsm | ||
|
||
difference_fsm.accepts('["a","pink","elephant"]') | ||
# False | ||
difference_fsm.accepts('["a","blue","donkey"]') | ||
# True | ||
``` | ||
|
||
### ```union``` | ||
|
||
Returns a finite state machine which accepts any sequence of symbols that is accepted by either self or other. | ||
|
||
```python | ||
list_of_strings_pattern = """\["[^"\s]*"(?:,"[^"\s]*")*\]""" | ||
tuple_of_strings_pattern = """\("[^"\s]*"(?:,"[^"\s]*")*\)""" | ||
|
||
list_of_strings_fsm = interegular.parse_pattern(list_of_strings_pattern).to_fsm() | ||
tuple_of_strings_fsm = interegular.parse_pattern(tuple_of_strings_pattern).to_fsm() | ||
|
||
list_of_strings_fsm.accepts('("a","pink","elephant")') | ||
# False | ||
|
||
union_fsm = list_of_strings_fsm|tuple_of_strings_fsm | ||
|
||
union_fsm.accepts('["a","pink","elephant"]') | ||
# True | ||
union_fsm.accepts('("a","blue","donkey")') | ||
# True | ||
model = models.transformers("mistralai/Mistral-7B-Instruct-v0.2") | ||
generator = generate.fsm(model, difference_fsm) | ||
response = generator("Don't talk about pink elephants") | ||
``` | ||
|
||
### ```intersection``` | ||
|
||
Returns an FSM which accepts any sequence of symbols that is accepted by both of the original FSMs. | ||
|
||
```python | ||
list_of_strings_pattern = """\["[^"\s]*"(?:,"[^"\s]*")*\]""" | ||
pink_elephant_pattern = """.*(pink|elephant).*""" | ||
|
||
list_of_strings_fsm = interegular.parse_pattern(list_of_strings_pattern).to_fsm() | ||
pink_elephant_fsm = interegular.parse_pattern(pink_elephant_pattern).to_fsm() | ||
|
||
list_of_strings_fsm.accepts('["a","blue","donkey"]') | ||
# True | ||
|
||
intersection_fsm = list_of_strings_fsm & pink_elephant_fsm | ||
|
||
intersection_fsm.accepts('["a","pink","elephant"]') | ||
# True | ||
intersection_fsm.accepts('["a","blue","donkey"]') | ||
# False | ||
``` | ||
|
||
_There are more operations available, we refer to https://github.com/MegaIng/interegular/blob/master/interegular/fsm.py._ | ||
|
||
# Loading Custom FSM | ||
|
||
```python | ||
import outlines | ||
|
||
generator = outlines.generate.fsm(model, custom_fsm) | ||
|
||
response = generator(prompt) | ||
``` | ||
To see the other operations available, consult [interegular's documentation](https://github.com/MegaIng/interegular/blob/master/interegular/fsm.py). |
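
The semantics of the `difference` operation can be checked without `interegular`: a string belongs to the difference when it fully matches the first pattern and does not fully match the second. The sketch below expresses that with the standard `re` module. The `in_difference` helper is a hypothetical name used for illustration, and this is not how `interegular` computes the resulting FSM.

```python
import re

# Same two patterns as in the example above, written as raw strings.
list_of_strings = re.compile(r'\["[^"\s]*"(?:,"[^"\s]*")*\]')
pink_elephant = re.compile(r".*(pink|elephant).*")


def in_difference(text: str) -> bool:
    """True if `text` matches the list pattern but not the pink/elephant one."""
    return (
        list_of_strings.fullmatch(text) is not None
        and pink_elephant.fullmatch(text) is None
    )
```

Here `in_difference('["a","blue","donkey"]')` is true and `in_difference('["a","pink","elephant"]')` is false, which agrees with the `accepts` calls on `difference_fsm`.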
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# JSON mode | ||
|
||
Outlines can guarantee that the LLM will generate valid JSON, using [Grammar-structured generation](cfg.md): | ||
|
||
```python | ||
import outlines | ||
from outlines import models, generate | ||
|
||
json_grammar = outlines.grammars.json | ||
|
||
model = models.transformers("mistralai/Mistral-7B-v0.1") | ||
generator = generate.cfg(model, json_grammar) | ||
sequence = generator("Generate valid JSON") | ||
``` | ||
|
||
!!! Note "JSON that follows a schema" | ||
|
||
If you want to guarantee that the generated JSON follows a given schema, consult [this section](json.md) instead. |
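
If you want to double-check a generated sequence, a minimal sketch is to parse it with Python's standard `json` module. The `is_valid_json` helper is a hypothetical name, and Python's parser may differ from the grammar shipped with Outlines on edge cases, so treat this as a sanity check rather than a definition of the grammar.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON with Python's standard parser."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

For example, `is_valid_json('{"name": "Alice", "age": 30}')` is true, while a truncated string such as `'{"name": "Alice",'` is rejected.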
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,27 @@ | ||
# Regular expressions | ||
|
||
Outlines can guarantee that the text generated by the LLM will match a regular expression: | ||
|
||
```python | ||
from outlines import models, generate | ||
|
||
model = models.transformers("mistralai/Mistral-7B-Instruct-v0.2") | ||
|
||
generator = generate.regex( | ||
model, | ||
r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)", | ||
) | ||
|
||
prompt = "What is the IP address of the Google DNS servers? " | ||
answer = generator(prompt, max_tokens=30) | ||
|
||
print(answer) | ||
# What is the IP address of the Google DNS servers? | ||
# 2.2.6.1 | ||
``` | ||
|
||
If you use `generate.regex` mainly to restrict the type of the answer, take a look at [type-structured generation](types.md) instead. | ||
|
||
!!! Note "Performance" | ||
|
||
`generate.regex` computes an index that helps Outlines guide generation. This can take some time, but only needs to be done once. If you want to generate several times using the same regular expression make sure that you only call `generate.regex` once. |
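
The regular expression itself can be tried out with the standard `re` module before handing it to `generate.regex`. The sketch below is only an illustration and does not involve Outlines; `is_ip_address` is a hypothetical helper name. It follows the same compile-once, reuse-many-times habit recommended above.

```python
import re

# Same pattern as in the example above, compiled once and reused.
IP_PATTERN = re.compile(
    r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
)


def is_ip_address(text: str) -> bool:
    """Return True if the whole string matches the IPv4 pattern."""
    return IP_PATTERN.fullmatch(text) is not None
```

The pattern accepts `"8.8.8.8"` and `"2.2.6.1"`, and rejects `"256.1.1.1"` (octet out of range) and `"1.2.3"` (only three octets). Note that the pattern constrains the format only: the generated address is well formed, not necessarily correct.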