diff --git a/docs/cookbook/chain_of_thought.md b/docs/cookbook/chain_of_thought.md
index d320feb8d..cc079a7ff 100644
--- a/docs/cookbook/chain_of_thought.md
+++ b/docs/cookbook/chain_of_thought.md
@@ -7,13 +7,13 @@ In this guide, we use [outlines](https://outlines-dev.github.io/outlines/) to ap

 We use [llama.cpp](https://github.com/ggerganov/llama.cpp) using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) library. Outlines supports llama-cpp-python, but we need to install it ourselves:

-```shell
+```bash
 pip install llama-cpp-python
 ```

 We pull a quantized GGUF model, in this guide we pull [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/):

-```shell
+```bash
 wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
 ```

diff --git a/docs/cookbook/dating_profiles.md b/docs/cookbook/dating_profiles.md
index f839b65fe..d0fb9b576 100644
--- a/docs/cookbook/dating_profiles.md
+++ b/docs/cookbook/dating_profiles.md
@@ -170,7 +170,7 @@ parsed_profile = DatingProfile.model_validate_json(profile)

 Here are a couple of results:

-```
+```json
 {
     "bio": """I'm an ambitious lawyer with a casual and fashionable style. I love
     games and sports, but my true passion is preparing refreshing cocktails at
@@ -199,7 +199,7 @@ Here are a couple of results:
 }
 ```

-```
+```json
 {
     "bio": """I’m a sexy lawyer with time on my hands. I love to game and
     play ping pong, but the real reason you should swipe to the right
diff --git a/docs/cookbook/index.md b/docs/cookbook/index.md
index 58e84ae96..a844ce240 100644
--- a/docs/cookbook/index.md
+++ b/docs/cookbook/index.md
@@ -1,5 +1,7 @@
 # Examples

+This part of the documentation provides a few cookbooks that you can browse to get acquainted with the library and get some inspiration about what you could do with structured generation. Remember that you can easily change the model that is being used!
+
 - [Classification](classification.md): Classify customer requests.
 - [Named Entity Extraction](extraction.md): Extract information from pizza orders.
 - [Dating Profile](dating_profiles.md): Build dating profiles from descriptions using prompt templating and JSON-structured generation.
diff --git a/docs/cookbook/knowledge_graph_extraction.md b/docs/cookbook/knowledge_graph_extraction.md
index c7e347dd4..c4c1dc75c 100644
--- a/docs/cookbook/knowledge_graph_extraction.md
+++ b/docs/cookbook/knowledge_graph_extraction.md
@@ -4,13 +4,13 @@ In this guide, we use [outlines](https://outlines-dev.github.io/outlines/) to ex

 We will use [llama.cpp](https://github.com/ggerganov/llama.cpp) using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) library. Outlines supports llama-cpp-python, but we need to install it ourselves:

-```shell
+```bash
 pip install llama-cpp-python
 ```

 We pull a quantized GGUF model, in this guide we pull [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/):

-```shell
+```bash
 wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
 ```

diff --git a/docs/cookbook/qa-with-citations.md b/docs/cookbook/qa-with-citations.md
index cb39befe9..c2111617f 100644
--- a/docs/cookbook/qa-with-citations.md
+++ b/docs/cookbook/qa-with-citations.md
@@ -4,13 +4,13 @@ This tutorial is adapted from the [instructor-ollama notebook](https://github.co

 We will use [llama.cpp](https://github.com/ggerganov/llama.cpp) using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) library. Outlines supports llama-cpp-python, but we need to install it ourselves:

-```shell
+```bash
 pip install llama-cpp-python
 ```

 We pull a quantized GGUF model [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/):

-```shell
+```bash
 wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
 ```

diff --git a/docs/cookbook/react_agent.md b/docs/cookbook/react_agent.md
index 74930b70b..15fb964a0 100644
--- a/docs/cookbook/react_agent.md
+++ b/docs/cookbook/react_agent.md
@@ -8,13 +8,13 @@ Additionally, we give the LLM the possibility of using a scratchpad described in

 We use [llama.cpp](https://github.com/ggerganov/llama.cpp) using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) library. Outlines supports llama-cpp-python, but we need to install it ourselves:

-```shell
+```bash
 pip install llama-cpp-python
 ```

 We pull a quantized GGUF model, in this guide we pull [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/):

-```shell
+```bash
 wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
 ```

@@ -55,9 +55,8 @@ def wikipedia(q):
         "srsearch": q,
         "format": "json"
     }).json()["query"]["search"][0]["snippet"]
-```

-```python
+
 def calculate(numexp):
     return eval(numexp)
 ```
diff --git a/docs/reference/generation/generation.md b/docs/reference/generation/generation.md
index 88b963c72..0c090f8a7 100644
--- a/docs/reference/generation/generation.md
+++ b/docs/reference/generation/generation.md
@@ -208,3 +208,9 @@ result = generator("What is 2+2?")
 print(result)
 # 4
 ```
+
+
+[jsonschema]: https://json-schema.org/learn/getting-started-step-by-step
+[pydantic]: https://docs.pydantic.dev/latest
+[cfg]: https://en.wikipedia.org/wiki/Context-free_grammar
+[ebnf]: https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
diff --git a/docs/reference/generation/structured_generation_explanation.md b/docs/reference/generation/structured_generation_explanation.md
index 0dedf060b..aa27a7a85 100644
--- a/docs/reference/generation/structured_generation_explanation.md
+++ b/docs/reference/generation/structured_generation_explanation.md
@@ -1,8 +1,4 @@
----
-title: Structured Generation Explanation
----
-
-# Structured Generation Explanation
+# How does Outlines work?

 Language models generate text token by token, using the previous token sequence as input and sampled logits as output. This document explains the structured generation process, where only legal tokens are considered for the next step based on a predefined automata, e.g. a regex-defined [finite-state machine](https://en.wikipedia.org/wiki/Finite-state_machine) (FSM) or [Lark](https://lark-parser.readthedocs.io/en/stable/) grammar.`

diff --git a/docs/reference/models/models.md b/docs/reference/models/models.md
index dadfd34ad..34b5be4cf 100644
--- a/docs/reference/models/models.md
+++ b/docs/reference/models/models.md
@@ -4,60 +4,20 @@ title: Models

 # Models

-Outlines supports generation using a number of inference engines (`outlines.models`)
-
-Loading a model using outlines follows a similar interface between inference engines.
+Outlines supports generation using a number of inference engines (`outlines.models`). Loading a model using outlines follows a similar interface between inference engines:

 ```python
 import outlines
-```
-
-## [Transformers](./transformers.md)
-
-```python
-model = outlines.models.transformers("microsoft/Phi-3-mini-128k-instruct", model_kwargs={})
-```
-
-For additional arguments and use of other Huggingface Transformers model types see [Outlines' Transformers documentation](./transformers.md).
-
-## [Transformers Vision](./transformers_vision.md)
-
-```python
+model = outlines.models.transformers("microsoft/Phi-3-mini-128k-instruct")
 model = outlines.models.transformers_vision("llava-hf/llava-v1.6-mistral-7b-hf")
-```
-
-For examples of generation and other details, see [Outlines' Transformers Vision documentation](./transformers_vision.md).
-
-## [vLLM](./vllm.md)
-
-```python
 model = outlines.models.vllm("microsoft/Phi-3-mini-128k-instruct")
-```
-
-## [llama.cpp](./llamacpp.md)
-
-```python
-model = outlines.models.llamacpp("microsoft/Phi-3-mini-4k-instruct-gguf", "Phi-3-mini-4k-instruct-q4.gguf")
-```
-
-Additional llama.cpp parameters can be found in the [Outlines' llama.cpp documentation](./llamacpp.md).
-
-## [ExLlamaV2](./exllamav2.md)
-
-```python
+model = outlines.models.llamacpp(
+    "microsoft/Phi-3-mini-4k-instruct-gguf", "Phi-3-mini-4k-instruct-q4.gguf"
+)
 model = outlines.models.exllamav2("bartowski/Phi-3-mini-128k-instruct-exl2")
-```
-
-## [MLXLM](./mlxlmx.md)
-
-```python
 model = outlines.models.mlxlm("mlx-community/Phi-3-mini-4k-instruct-4bit")
-```

-## [OpenAI](./openai.md)
-
-```python
 model = outlines.models.openai(
     "gpt-4o-mini",
     api_key=os.environ["OPENAI_API_KEY"]
@@ -66,7 +26,7 @@ model = outlines.models.openai(

 # Feature Matrix

-| | Transformers | Transformers Vision | vLLM | llama.cpp | ExLlamaV2 | MLXLM | OpenAI* |
+| | [Transformers](transformers.md) | [Transformers Vision](transformers_vision.md) | [vLLM](vllm.md) | [llama.cpp](llamacpp.md) | [ExLlamaV2](exllamav2.md) | [MLXLM](mlxlm.md) | [OpenAI](openai.md)* |
 |-------------------|--------------|---------------------|------|-----------|-----------|-------|---------|
 | **Device** | | | | | | | |
 | Cuda | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | N/A |
diff --git a/docs/reference/models/transformers.md b/docs/reference/models/transformers.md
index 15eabb682..2a13e28ec 100644
--- a/docs/reference/models/transformers.md
+++ b/docs/reference/models/transformers.md
@@ -33,14 +33,15 @@ model = models.Transformers(llm, tokenizer)
 # Using Logits Processors

 There are two ways to use Outlines Structured Generation with HuggingFace Transformers:
-- 1) Use Outlines generation wrapper, `outlines.models.transformers`
-- 2) Use `OutlinesLogitsProcessor` with `transformers.AutoModelForCausalLM`
+
+1. Use Outlines generation wrapper, `outlines.models.transformers`
+2. Use `OutlinesLogitsProcessor` with `transformers.AutoModelForCausalLM`

 Outlines supports a myriad of logits processors for structured generation. In these example, we will use the `RegexLogitsProcessor` which guarantees generated text matches the specified pattern.

-## Example: `outlines.models.transformers`
+## Using `outlines.models.transformers`

-```
+```python
 import outlines

 time_regex_pattern = r"(0?[1-9]|1[0-2]):[0-5]\d\s?(am|pm)?"
@@ -53,9 +54,9 @@ print(output)
 # 2:30 pm
 ```

-## Example: Direct `transformers` library use
+## Using models initialized via the `transformers` library

-```
+```python
 import outlines
 import transformers

@@ -117,8 +118,9 @@ model = outlines.models.transformers(
 )
 ```

-Further Reading:
-- https://huggingface.co/docs/transformers/en/model_doc/mamba
+
+
+Read [`transformers`'s documentation](https://huggingface.co/docs/transformers/en/model_doc/mamba) for more information.

 ### Encoder-Decoder Models

@@ -144,8 +146,3 @@ model_bart = models.transformers(
     model_class=AutoModelForSeq2SeqLM,
 )
 ```
-
-
-### Multi-Modal Models
-
-/Coming soon/
diff --git a/docs/stylesheets/extra.css b/docs/stylesheets/extra.css
index 4078215af..c4539ab80 100644
--- a/docs/stylesheets/extra.css
+++ b/docs/stylesheets/extra.css
@@ -96,6 +96,14 @@
   background: #FFFFFF ! important
 }

+.language-text {
+  background: #FFFFFF ! important
+}
+
+.language-json {
+  background: #FFFFFF ! important
+}
+
 h1.title {
   color: #FFFFFF;
   margin: 0px 0px 5px;
diff --git a/docs/welcome.md b/docs/welcome.md
index 4c327c020..a7800f7ad 100644
--- a/docs/welcome.md
+++ b/docs/welcome.md
@@ -6,7 +6,7 @@ Outlines〰 is a Python library that allows you to use Large Language Model in a

 ## What models do you support?

-We support [Openai](reference/models/openai.md), but the true power of Outlines〰 is unleashed with Open Source models available via the [transformers](reference/models/transformers.md), [llama.cpp](reference/models/llamacpp.md), [exllama2](reference/models/exllamav2.md) and [mamba_ssm](reference/models/mamba.md) libraries. If you want to build and maintain an integration with another library, [get in touch][discord].
+We support [Openai](reference/models/openai.md), but the true power of Outlines〰 is unleashed with Open Source models available via the [transformers](reference/models/transformers.md), [llama.cpp](reference/models/llamacpp.md), [exllama2](reference/models/exllamav2.md), [mlx-lm](reference/models/mlxlm.md) and [vllm](reference/models/vllm.md) models. If you want to build and maintain an integration with another library, [get in touch][discord].

 ## What are the main features?

@@ -17,7 +17,7 @@ We support [Openai](reference/models/openai.md), but the true power of Outlines

     No more invalid JSON outputs, 100% guaranteed

-    [:octicons-arrow-right-24: Generate JSON](reference/json.md)
+    [:octicons-arrow-right-24: Generate JSON](reference/generation/json.md)

 -   :material-keyboard-outline:{ .lg .middle } __JSON mode for vLLM__

@@ -34,7 +34,7 @@ We support [Openai](reference/models/openai.md), but the true power of Outlines

     Generate text that parses correctly 100% of the time

-    [:octicons-arrow-right-24: Guide LLMs](reference/regex.md)
+    [:octicons-arrow-right-24: Guide LLMs](reference/generation/regex.md)

 -   :material-chat-processing-outline:{ .lg .middle } __Powerful Prompt Templating__

diff --git a/mkdocs.yml b/mkdocs.yml
index d24ca9a63..afc56528b 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -121,24 +121,24 @@ nav:
   - Docs:
     - reference/index.md
     - Generation:
-      - Generation Overview: reference/generation/generation.md
+      - Overview: reference/generation/generation.md
       - Text: reference/text.md
       - Samplers: reference/samplers.md
       - Structured generation:
+        - How does it work?: reference/generation/structured_generation_explanation.md
         - Classification: reference/generation/choices.md
         - Regex: reference/generation/regex.md
         - Type constraints: reference/generation/format.md
         - JSON (function calling): reference/generation/json.md
         - Grammar: reference/generation/cfg.md
         - Custom FSM operations: reference/generation/custom_fsm_ops.md
-        - Structured Generation Technical Explanation: reference/generation/structured_generation_explanation.md
       - Utilities:
         - Serve with vLLM: reference/serve/vllm.md
-        - Custom types: reference/types.md
+        - Custom types: reference/generation/types.md
         - Prompt templating: reference/prompting.md
         - Outlines functions: reference/functions.md
     - Models:
-      - Models Overview: reference/models/models.md
+      - Overview: reference/models/models.md
       - Open source:
         - Transformers: reference/models/transformers.md
         - Transformers Vision: reference/models/transformers_vision.md