Fix Infinite Repetition in JSON Schemas Using Integer and String #1154

Draft: wants to merge 1 commit into `main`.

Changes to `docs/reference/generation/json.md`: 9 additions, 2 deletions.

```python
print(result)
# User(name="John", last_name="Doe", id=11)
```

!!! Note "JSON and unlimited patterns"

By default, Outlines prevents the model from generating JSON with syntactic newlines, tabs, or multiple spaces (that is, whitespace between JSON tokens). In addition, strings are limited to 256 characters by default, and integers are bounded between -1e19 and 1e19. Small models tend to fall into an infinite repetition loop when JSON generation is left unconstrained. To allow the model to generate multiple tabs, newlines, and spaces, set the whitespace pattern as follows:

```python
generator = generate.json(model, User, whitespace_pattern=r"[\n\t ]*")
```
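To see why the whitespace pattern matters, the sketch below builds a tiny regex for the object `{"name": "<string>"}` with a pluggable whitespace pattern. It is a hypothetical illustration using only Python's `re` module, not the regex Outlines actually derives from a schema: the default `[ ]?` allows at most one space per slot, while `[\n\t ]*` accepts any run of whitespace, including the kind of runaway newlines a looping model might emit.

```python
import re

# Hypothetical sketch: a tiny regex for {"name": "<string>"} parameterised
# by a whitespace pattern. Not the regex Outlines builds from a schema.
def object_regex(ws: str) -> str:
    string = r'"[^"\\]*"'  # simplified JSON string (no escape sequences)
    return r"\{" + ws + r'"name"' + ws + ":" + ws + string + ws + r"\}"

compact = object_regex(r"[ ]?")      # default: at most one space per slot
relaxed = object_regex(r"[\n\t ]*")  # any run of newlines, tabs, and spaces

one_line = '{"name": "John"}'
pretty = '{\n  "name": "John"\n}'
runaway = '{"name": "John"' + "\n" * 1000 + "}"

print(bool(re.fullmatch(compact, one_line)))  # True
print(bool(re.fullmatch(compact, pretty)))    # False
print(bool(re.fullmatch(relaxed, pretty)))    # True
print(bool(re.fullmatch(relaxed, runaway)))   # True: nothing caps the whitespace
```

With the relaxed pattern, the constraint never forces the model to stop emitting whitespace, which is the repetition loop described above.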

Alternatively, you can remove all implicit constraints on JSON generation (whitespace, integers, and strings) with `safe_subset=False`:

```python
generator = generate.json(model, User, safe_subset=False)
```
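The string and integer limits can be pictured as bounded regexes that `safe_subset=False` would swap for unbounded ones. This is a minimal sketch using Python's `re`; the pattern names are hypothetical, the string pattern ignores escape sequences, and it assumes the integer bound is exclusive (at most 19 digits, so |n| < 10**19). The exact regexes Outlines generates may differ.

```python
import re

# Hypothetical bounded vs. unbounded value patterns (not Outlines' own).
BOUNDED_STRING = r'"[^"\\]{0,256}"'          # at most 256 characters
UNBOUNDED_STRING = r'"[^"\\]*"'
BOUNDED_INTEGER = r"-?(0|[1-9][0-9]{0,18})"  # at most 19 digits
UNBOUNDED_INTEGER = r"-?(0|[1-9][0-9]*)"

def matches(pattern: str, text: str) -> bool:
    return re.fullmatch(pattern, text) is not None

long_string = '"' + "a" * 300 + '"'
big_int = "1" + "0" * 19  # 10**19, just past the bound

print(matches(BOUNDED_STRING, long_string))       # False
print(matches(UNBOUNDED_STRING, long_string))     # True
print(matches(BOUNDED_INTEGER, big_int))          # False
print(matches(UNBOUNDED_INTEGER, big_int))        # True
print(matches(BOUNDED_INTEGER, str(10**19 - 1)))  # True
```

Because every bounded pattern has a maximum length, generation under it must eventually close the string or finish the number; the unbounded versions leave no such guarantee.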


!!! Note "Performance"

`generate.json` computes an index that helps Outlines guide generation. Building it can take some time, but it only needs to be done once per schema: if you generate several times with the same schema, call `generate.json` once and reuse the resulting generator.
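The build-once, reuse-many pattern looks like this in miniature. `build_generator` is a hypothetical stand-in for `generate.json` that "compiles" a schema into a regex and returns a matcher instead of a text generator; the counter only makes the cost of rebuilding visible.

```python
import re

BUILD_COUNT = 0

def build_generator(schema_regex: str):
    # Hypothetical stand-in for generate.json: the expensive step is
    # compiling the schema into an index (here, just a regex).
    global BUILD_COUNT
    BUILD_COUNT += 1
    index = re.compile(schema_regex)
    return lambda text: index.fullmatch(text) is not None

schema = r'\{"id": -?(0|[1-9][0-9]{0,18})\}'
outputs = ['{"id": 11}', '{"id": -3}', '{"id": 42}']

# Wasteful: rebuilds the index for every call.
slow = [build_generator(schema)(o) for o in outputs]

# Recommended: build once, then reuse for every generation.
generator = build_generator(schema)
fast = [generator(o) for o in outputs]

print(BUILD_COUNT)  # 4: three rebuilds in the slow loop, one reused build
```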