Instructor streaming structured output with llama-cpp #127

ahuang11 · 2024-02-13T04:58:48Z

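```python
import llama_cpp
import instructor
import panel as pn
from llama_cpp.llama_speculative import LlamaPromptLookupDecoding
from pydantic import BaseModel
from huggingface_hub import hf_hub_download

pn.extension()


class Translations(BaseModel):
    chinese: str
    french: str
    spanish: str


# Download the quantized OpenHermes 2.5 (Mistral 7B) GGUF weights from the Hugging Face Hub
model_path = hf_hub_download(
    "TheBloke/OpenHermes-2.5-Mistral-7B-GGUF",
    "openhermes-2.5-mistral-7b.Q4_K_M.gguf",
)
llama = llama_cpp.Llama(
    model_path=model_path,
    n_gpu_layers=-1,  # offload all layers to the GPU when one is available
    chat_format="chatml",
    n_ctx=2048,
    # Speculative decoding: draft tokens are looked up in the existing context
    draft_model=LlamaPromptLookupDecoding(num_pred_tokens=2),
    logits_all=True,
    verbose=False,
)
# Patch the OpenAI-compatible chat completion so it accepts `response_model`
create = instructor.patch(
    create=llama.create_chat_completion_openai_v1,
    mode=instructor.Mode.JSON_SCHEMA,  # send the Pydantic model's JSON schema to constrain the output
)
message = {"role": "user", "content": "Teach me how to say `Hello` in three languages!"}
# Partial[...] makes every field optional, so incomplete objects can be yielded while streaming
extraction_stream = create(
    response_model=instructor.Partial[Translations],
    messages=[message],
    stream=True,
)
json_pane = pn.pane.JSON()
display(json_pane)  # `display` is provided by Jupyter/IPython
for extraction in extraction_stream:
    json_pane.object = extraction.model_dump()  # re-render with the latest partial result
```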
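
The `draft_model=LlamaPromptLookupDecoding(num_pred_tokens=2)` argument enables speculative decoding by prompt lookup: instead of running a second, smaller draft model, candidate tokens are copied from what followed an earlier occurrence of the most recent n-gram in the context, and the main model then verifies them in one pass, keeping the longest prefix it agrees with. Below is a simplified, stdlib-only sketch of just the drafting step on integer token ids. The function name `draft_tokens` and its `max_ngram_size` default are assumptions for illustration; llama-cpp-python's own implementation differs in details.

```python
from typing import List, Sequence


def draft_tokens(
    input_ids: Sequence[int], max_ngram_size: int = 2, num_pred_tokens: int = 2
) -> List[int]:
    """Hypothetical sketch: propose draft tokens by copying what followed an
    earlier occurrence of the trailing n-gram in the context."""
    n = len(input_ids)
    for ngram_size in range(max_ngram_size, 0, -1):  # prefer longer matches first
        if ngram_size >= n:
            continue
        tail = list(input_ids[n - ngram_size:])
        # Scan earlier windows (excluding the tail itself) for the same n-gram
        for start in range(n - ngram_size):
            if list(input_ids[start:start + ngram_size]) == tail:
                follow = start + ngram_size
                return list(input_ids[follow:follow + num_pred_tokens])
    return []  # no match: fall back to ordinary one-token-at-a-time decoding


prompt_ids = [5, 6, 7, 8, 9, 5, 6]
print(draft_tokens(prompt_ids))  # → [7, 8]
```

Prompt lookup tends to pay off when the output repeats spans already present in the context; when nothing matches, no drafts are proposed and generation proceeds normally.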

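Streaming with `instructor.Partial[Translations]` yields a sequence of increasingly complete objects, each of which the loop pushes into the JSON pane. A rough stdlib-only sketch of the idea, assuming a flat object whose fields are all strings: accumulate the streamed text, try to close the truncated JSON with a few candidate suffixes, and emit a snapshot in which every declared field is present (`None` until it arrives). `parse_partial`, `stream_snapshots`, and the suffix heuristic are hypothetical; instructor's own partial parsing works differently.

```python
import json
from typing import Iterable, Iterator, Optional

# Suffixes tried, in order, to close a truncated flat JSON object (hypothetical heuristic)
_CLOSERS = ("", "}", '"}', "null}", ": null}", '": null}')


def parse_partial(buffer: str) -> Optional[dict]:
    """Best-effort parse of a truncated flat JSON object; None if no suffix works."""
    text = buffer.rstrip()
    if text.endswith(","):
        # Drop a trailing comma so the object can be closed
        # (heuristic: this also trims a comma at the end of a truncated string value)
        text = text[:-1]
    for closer in _CLOSERS:
        try:
            value = json.loads(text + closer)
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    return None


def stream_snapshots(chunks: Iterable[str], fields: tuple[str, ...]) -> Iterator[dict]:
    """Yield one snapshot per chunk that can be parsed, like a Partial model."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parsed = parse_partial(buffer)
        if parsed is None:
            continue
        # Every declared field is present; missing or half-streamed keys map to None
        yield {name: parsed.get(name) for name in fields}


FIELDS = ("chinese", "french", "spanish")
full = json.dumps({"chinese": "你好", "french": "Bonjour", "spanish": "Hola"}, ensure_ascii=False)
chunks = [full[i:i + 7] for i in range(0, len(full), 7)]  # simulate token-sized chunks
snapshots = list(stream_snapshots(chunks, FIELDS))
for snapshot in snapshots:
    print(snapshot)  # in the Panel app, this is where `json_pane.object` would be updated
```

Yielding a full set of keys on every step keeps the rendered JSON layout stable while values fill in.
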
ahuang11 · 2024-02-15T05:51:42Z

Equivalent in funcchain (this version returns the finished result rather than streaming partial objects):

```python
import panel as pn
from pydantic import BaseModel
from huggingface_hub import hf_hub_download
from funcchain.model.patches.llamacpp import ChatLlamaCpp
from funcchain import settings, chain

pn.extension()


class Translations(BaseModel):
    chinese: str
    french: str
    spanish: str


def create_translations(text: str) -> Translations:
    """
    Translate the given text into three languages.
    """
    return chain()


model_path = hf_hub_download(
    "TheBloke/OpenHermes-2.5-Mistral-7B-GGUF",
    "openhermes-2.5-mistral-7b.Q4_K_M.gguf",
)
llama = ChatLlamaCpp(
    model_path=model_path,
    n_gpu_layers=-1,
    model_kwargs=dict(chat_format="chatml"),
    n_ctx=2048,
    verbose=False,
)
settings.llm = llama  # tell funcchain which model `chain()` should call

translations = create_translations("Teach me how to say `Hello` in three languages!")
json_pane = pn.pane.JSON(translations.model_dump())
display(json_pane)  # `display` is provided by Jupyter/IPython
```
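
As I understand funcchain, `chain()` builds the prompt from the calling function's docstring and arguments, then parses the model's reply into the annotated return type. The toy below imitates that flow with the standard library only: a dataclass stands in for the Pydantic model and a fixed JSON string stands in for the LLM call. `build_prompt` and `parse_output` are hypothetical helpers, not funcchain's API or internals.

```python
import inspect
import json
from dataclasses import dataclass, fields
from typing import Any, Callable, get_type_hints


@dataclass
class Translations:  # stand-in for the Pydantic model
    chinese: str
    french: str
    spanish: str


def create_translations(text: str) -> Translations:
    """
    Translate the given text into three languages.
    """
    ...  # body unused: only the docstring and annotations are read


def build_prompt(func: Callable[..., Any], **kwargs: Any) -> str:
    """Hypothetical: docstring = instruction, arguments = inputs, return type = output spec."""
    instruction = inspect.getdoc(func) or ""
    bound = inspect.signature(func).bind(**kwargs)
    inputs = "\n".join(f"{name}: {value}" for name, value in bound.arguments.items())
    keys = ", ".join(f.name for f in fields(get_type_hints(func)["return"]))
    return f"{instruction}\n\n{inputs}\n\nReply with a JSON object with keys: {keys}"


def parse_output(raw: str, func: Callable[..., Any]) -> Any:
    """Hypothetical: load the model's JSON reply into the declared return type."""
    return_type = get_type_hints(func)["return"]
    return return_type(**json.loads(raw))


prompt = build_prompt(create_translations, text="Hello")
print(prompt)
# A fixed reply stands in for the model's output
result = parse_output('{"chinese": "你好", "french": "Bonjour", "spanish": "Hola"}', create_translations)
print(result)
```

Keeping the instruction in the docstring and the schema in the return annotation is what lets the funcchain version stay so short.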

MarcSkovMadsen · 2024-02-15T06:23:05Z

Would we be able to deploy the local model to Hugging Face?

It would be nice to have live versions of the apps. And not just source code.

ahuang11 · 2024-02-15T06:24:43Z

Potentially? I'm not sure whether Hugging Face's CPUs and memory are strong enough to run local llama models. Once Panel 1.4.0 is released, I'd like to rework many of these examples to reflect best practices.