Invalid generate.json Output with models.llamacpp #973

Closed
lapp0 opened this issue Jun 15, 2024 · 0 comments · Fixed by #996
Labels: bug, JSON, llama.cpp (Related to the `llama.cpp` integration), structured generation (Linked to structured generation)

lapp0 (Contributor) commented Jun 15, 2024

Related:

Describe the issue as clearly as possible:

models.llamacpp used with generate.json can produce output that does not validate against the pydantic schema.

Steps/code to reproduce the bug:

# Reproducer from "Robert Roland Roger" on Discord:

import outlines
from pydantic import BaseModel, StringConstraints
from typing import Annotated
import llama_cpp

model = outlines.models.llamacpp(
    "TheBloke/zephyr-7B-beta-GGUF", 
    "zephyr-7b-beta.Q6_K.gguf",
    tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
)

class Character(BaseModel):
    name: Annotated[str, StringConstraints(max_length=10)]
    #age: int = Field(..., ge=18, lt=99)
    age: int
    #strength: int = Field(..., ge=1, lt=100)
    strength: int

generator = outlines.generate.json(model, Character)

prompt = """
<|system|>
You always output valid JSON.
<|user|>
Generate a new character in valid json. Use the following fields:
name, age (between 18 and 99), armor (between 1 and 100) and strength.
<|assistant|>
"""
character = generator(prompt)

Expected result:

Output that pydantic recognizes as valid against the schema.

Error message:

Expecting ',' delimiter: line 1 column 28 (char 27) [type=value_error.jsondecode, input_value='{ "name": "Elena", "age": 2', input_type=str]
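
For reference, a minimal sketch (outside outlines, using pydantic v2's `model_validate_json`) showing that the truncated string from the error above fails validation; the exact error wording depends on the pydantic version and parse path:

```python
# Minimal sketch: the truncated output is not valid JSON, so pydantic rejects it
# before any field-level constraints (e.g. max_length on name) are checked.
from typing import Annotated
from pydantic import BaseModel, StringConstraints, ValidationError

class Character(BaseModel):
    name: Annotated[str, StringConstraints(max_length=10)]
    age: int
    strength: int

try:
    Character.model_validate_json('{ "name": "Elena", "age": 2')
except ValidationError as exc:
    print(exc)  # reports truncated/invalid JSON, similar to the error above
```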

Outlines/Python version information:

Version information

```
outlines==0.0.44
llama-cpp-python==0.2.78
```

Another reproducer, using models.transformers_multimodal:

import outlines
import torch

model = outlines.models.transformers_multimodal(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    device="cuda",
    model_kwargs=dict(torch_dtype=torch.bfloat16)
)

from pydantic import BaseModel
from typing import List, Optional

class ImageData(BaseModel):
    caption: str
    tags: List[str]
    location: Optional[str]
    objects: List[str]

image_data_generator = outlines.generate.json(model, ImageData)
image_data_generator(
    "<image> detailed JSON metadata:",
    "https://upload.wikimedia.org/wikipedia/commons/e/ea/FCAB_EMD_GT22CU-3_San_Pedro_-_Ascotan.jpg"
)
@lapp0 lapp0 added the bug label Jun 15, 2024
@lapp0 lapp0 changed the title from "Invalid JSON Generation in models.llamacpp" to "Invalid generate.json Output with models.llamacpp" Jun 15, 2024
@rlouf rlouf assigned rlouf and lapp0 and unassigned rlouf Jun 18, 2024
@rlouf rlouf moved this to Todo in Improve Outlines Jun 18, 2024
@lapp0 lapp0 added the structured generation, JSON, and llama.cpp labels Jun 19, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in Improve Outlines Jun 24, 2024