
LlamaCpp doesn't work with generate.fsm for custom FSMs #965

Open
Radu1999 opened this issue Jun 13, 2024 · 1 comment · May be fixed by #997
Labels
bug · llama.cpp (Related to the `llama.cpp` integration) · structured generation (Linked to structured generation)

Comments

@Radu1999

Describe the issue as clearly as possible:

The custom FSM example from the documentation doesn't work with LlamaCpp; it fails with

logits, kv_cache = model(token_ids, attention_masks, kv_cache)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'LlamaCpp' object is not callable

Steps/code to reproduce the bug:

from transformers import AutoTokenizer
from outlines import models, generate
from outlines.models.transformers import TransformerTokenizer
from llama_cpp import Llama
import interegular
import torch

if __name__ == "__main__":
    # Create model
    llm = Llama("./models/Mistral-7B-Instruct-v0.2/mistral-7b-instruct-v0.2.Q5_K_M.gguf")
    model = models.LlamaCpp(llm)
    model.tokenizer = TransformerTokenizer(AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2', use_fast=True))
    model.device = 'cpu'

    # Create fsm
    list_of_strings_pattern = """\["[^"\s]*"(?:,"[^"\s]*")*\]"""
    pink_elephant_pattern = """.*(pink|elephant).*"""

    list_of_strings_fsm = interegular.parse_pattern(list_of_strings_pattern).to_fsm()
    pink_elephant_fsm = interegular.parse_pattern(pink_elephant_pattern).to_fsm()

    difference_fsm = list_of_strings_fsm - pink_elephant_fsm

    generator = generate.fsm(model, difference_fsm)
    rng = torch.Generator(device="cpu")
    rng.manual_seed(789005)

    response = generator("[INST] Don't talk about pink elephants [/INST]")
    print(response)

Expected result:

I'd expect it to work :)

Error message:

No response

Outlines/Python version information:

latest

Context for the issue:

No response

@Radu1999 added the bug label Jun 13, 2024
@brandonwillard added the enhancement, bug, structured generation, and llama.cpp labels and removed the bug and enhancement labels Jun 13, 2024
@lapp0 linked a pull request Jun 21, 2024 that will close this issue
@lapp0 (Collaborator) commented Jun 21, 2024

It will be a bit before this is merged into main, but you can try it early with

pip install --upgrade git+https://github.com/lapp0/outlines@fix-llamacpp-fsm

Works on my end, please let me know if you run into any issues!

rlouf pushed a commit that referenced this issue Jun 30, 2024
….py (#998)

A lot of these fixes were intended for #966; however, that's blocked until there's a new `transformers` release.

These improvements are general to all models and will enable PRs resolving #806 and #965.

# Structure of `OutlinesLogitsProcessor`

The goal is to create a base class which allows a logits processor to be implemented once and used with any `outlines.models` inference library.

To accomplish this, we must normalize the input array. It must have a
consistent type (`torch.Tensor`) and consistent dimensionality (2). We
can normalize both of these simply, and without any copy operations.

`mlx.core.array`, `numpy.array`, and `torch.Tensor` all support [Python's array standard `__dlpack__`](https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack__.html). This standard allows for casting between array types without copying.
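
For illustration only (not code from this PR), a zero-copy round trip between NumPy and torch via `__dlpack__` could look like the sketch below; the shapes and variable names are made up:

import numpy as np
import torch

# Hypothetical logits produced by a NumPy-based inference library (batch of 1, vocab of 32000).
np_logits = np.random.rand(1, 32000).astype(np.float32)

# Zero-copy view of the NumPy array as a torch.Tensor via the __dlpack__ protocol.
torch_logits = torch.from_dlpack(np_logits)

# ... apply torch-based logits processing here, e.g. mask out token id 0 ...
torch_logits[:, 0] = float("-inf")

# Zero-copy view back as a NumPy array for the calling library.
np_out = np.from_dlpack(torch_logits)
assert np_out.shape == (1, 32000)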

`torch.Tensor` is the only input type which cannot always be cast to any
other type because torch tensors may live in GPU memory. Therefore, we
cast all arrays to `torch.Tensor`, implement logits processors using
torch methods, and convert back to the original array type in
`OutlinesLogitsProcessor`. See the docstring of
`OutlinesLogitsProcessor.__call__()` for more details.
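
As a rough, hypothetical sketch of that pattern (an assumption-level illustration, not the actual `OutlinesLogitsProcessor` code), the wrapper's `__call__` might normalize its inputs like this:

import torch

class ExampleLogitsProcessor:
    """Sketch only: class and method names here are illustrative assumptions."""

    def process_logits(self, input_ids, logits: torch.Tensor) -> torch.Tensor:
        # Subclasses would implement their biasing logic with torch ops on a 2D tensor.
        return logits

    def __call__(self, input_ids, logits):
        # Normalize to a torch.Tensor; torch.as_tensor avoids a copy for torch/NumPy inputs.
        torch_logits = torch.as_tensor(logits)

        # Normalize to two dimensions: (batch, vocab_size).
        was_1d = torch_logits.ndim == 1
        if was_1d:
            torch_logits = torch_logits.unsqueeze(0)
            input_ids = [list(input_ids)]

        processed = self.process_logits(input_ids, torch_logits)

        # Undo the added batch dimension; a full implementation would also
        # convert back to the caller's original array type here.
        return processed.squeeze(0) if was_1d else processed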

# Detailed Changes
- Rename `BaseLogitsProcessor` to `OutlinesLogitsProcessor`
- Ensure `OutlinesLogitsProcessor.process_logits()` is always passed a
2D batch request with `torch.Tensor` logits and `List` input_ids. Also
clean up code to be more readable in `OutlinesLogitsProcessor.__call__()`
- Ensure `FSMLogitsProcessor` allows unstable sequence ordering (beam search in transformers and vLLM changes the order of sequences); see the sketch after this list
- Update `tests/generate/test_generate.py` to cover more permutations of
  - regex / text 
  - batch / single
  - greedy / multinomial / beam search
  - `stream()` / `generate()`
- Ensure performance stability with different array libraries through
`benchmark_processors.py`
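
For the unstable-ordering bullet above, one plausible approach (a sketch under assumptions, not necessarily what this commit implements) is to key FSM state by each sequence's generated-token prefix rather than by its batch position, so reordered beams still find their own state:

class PrefixKeyedStates:
    """Hypothetical helper: track FSM state per generated-token prefix, not per batch index."""

    def __init__(self, initial_state: int = 0):
        # The empty prefix maps to the FSM's initial state.
        self._states: dict[tuple, int] = {(): initial_state}

    def get(self, prefix: tuple) -> int:
        # Beams may be reordered between steps; looking up by prefix rather than
        # by batch position still returns the right state for each sequence.
        return self._states[prefix]

    def advance(self, prefix: tuple, token_id: int, next_state: int) -> None:
        # Record the state reached after appending token_id to this prefix.
        self._states[prefix + (token_id,)] = next_state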
Projects
Status: Todo

3 participants