diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
index 10879c78..30d5915c 100644
--- a/.github/workflows/tests.yml
+++ b/.github/workflows/tests.yml
@@ -35,7 +35,7 @@ jobs:
pip install .[test]
- name: Run tests
run: |
- pytest --cov=outlines
+ pytest --cov=outlines_core
- name: Upload coverage data
uses: actions/upload-artifact@v3
with:
diff --git a/Dockerfile b/Dockerfile
deleted file mode 100644
index c6e5f067..00000000
--- a/Dockerfile
+++ /dev/null
@@ -1,17 +0,0 @@
-FROM python:3.10
-
-WORKDIR /outlines
-
-RUN pip install --upgrade pip
-
-# Copy necessary build components
-COPY pyproject.toml .
-COPY outlines ./outlines
-
-# Install outlines and outlines[serve]
-# .git required by setuptools-scm
-RUN --mount=source=.git,target=.git,type=bind \
- pip install --no-cache-dir .[serve]
-
-# https://outlines-dev.github.io/outlines/reference/vllm/
-ENTRYPOINT ["python3", "-m", "outlines.serve.serve"]
diff --git a/README.md b/README.md
index 364e7724..769c87c1 100644
--- a/README.md
+++ b/README.md
@@ -1,367 +1 @@
-
-
-# Outlines 〰️
-
-
-
-[![.txt Twitter][dottxt-twitter-badge]][dottxt-twitter]
-[![Outlines Twitter][outlines-twitter-badge]][outlines-twitter]
-
-[![Contributors][contributors-badge]][contributors]
-[![Downloads][downloads-badge]][pypistats]
-[![Discord][discord-badge]][discord]
-
-
-*Robust (structured) text generation.*
-
-Made with ❤👷️ by the team at [.txt](https://dottxt.co).
-
-
-
-
-``` bash
-pip install outlines
-```
-
-First time here? Go to our [setup guide](https://outlines-dev.github.io/outlines/welcome)
-
-## Features
-
-- [x] 🤖 [Multiple model integrations](https://outlines-dev.github.io/outlines/installation): OpenAI, transformers, llama.cpp, exllama2, mamba
-- [x] 🖍️ Simple and powerful prompting primitives based on the [Jinja templating engine](https://jinja.palletsprojects.com/)
-- [x] 🚄 [Multiple choices](#multiple-choices), [type constraints](#type-constraint) and dynamic stopping
-- [x] ⚡ Fast [regex-structured generation](#efficient-regex-structured-generation)
-- [x] 🔥 Fast [JSON generation](#efficient-json-generation-following-a-pydantic-model) following a JSON schema or a Pydantic model
-- [x] 📝 [Grammar-structured generation](#using-context-free-grammars-to-guide-generation)
-- [x] 🐍 Interleave completions with loops, conditionals, and custom Python functions
-- [x] 💾 Caching of generations
-- [x] 🗂️ Batch inference
-- [x] 🎲 Sample with the greedy, multinomial and beam search algorithms (and more to come!)
-- [x] 🚀 [Serve with vLLM](https://outlines-dev.github.io/outlines/reference/serve/vllm), with official Docker image, [`outlinesdev/outlines`](https://hub.docker.com/r/outlinesdev/outlines)!
-
-
-Outlines 〰 has new releases and features coming every week. Make sure to ⭐ star and 👀 watch this repository, and follow [@dottxtai][dottxt-twitter] to stay up to date!
-
-## Why should I use structured generation?
-
-* It doesn't add any overhead during inference (cost-free)
-* It allows Open Source models to beat closed source models ([Mistral](https://x.com/dottxtai/status/1797692104023363765), [GPT-4](https://x.com/dottxtai/status/1798443290913853770))
-* [It speeds up inference](http://blog.dottxt.co/coalescence.html)
-* [It improves the performance of base models (GSM8K)](http://blog.dottxt.co/performance-gsm8k.html)
-* [It improves the performance of finetuned models (CoNLL)](https://predibase.com/blog/lorax-outlines-better-json-extraction-with-structured-generation-and-lora)
-* [It improves model efficiency (less examples needed)](https://huggingface.co/blog/evaluation-structured-outputs)
-
-## .txt company
-
-
-
-
-
-We started a company to keep pushing the boundaries of structured generation. Learn more about [.txt](https://twitter.com/dottxtai), and [give our .json API a try](https://h1xbpbfsf0w.typeform.com/to/ZgBCvJHF) if you need a hosted solution ✨
-
-## Structured generation
-
-The first step towards reliability of systems that include large language models
-is to ensure that there is a well-defined interface between their output and
-user-defined code. **Outlines** provides ways to control the generation of
-language models to make their output more predictable.
-
-### Multiple choices
-
-You can reduce the completion to a choice between multiple possibilities:
-
-``` python
-import outlines
-
-model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
-
-prompt = """You are a sentiment-labelling assistant.
-Is the following review positive or negative?
-
-Review: This restaurant is just awesome!
-"""
-
-generator = outlines.generate.choice(model, ["Positive", "Negative"])
-answer = generator(prompt)
-```
-
-### Type constraint
-
-You can instruct the model to only return integers or floats:
-
-
-``` python
-import outlines
-
-model = outlines.models.transformers("WizardLM/WizardMath-7B-V1.1")
-
-prompt = "result of 9 + 9 = 18result of 1 + 2 = "
-answer = outlines.generate.format(model, int)(prompt)
-print(answer)
-# 3
-
-prompt = "sqrt(2)="
-generator = outlines.generate.format(model, float)
-answer = generator(prompt, max_tokens=10)
-print(answer)
-# 1.41421356
-```
-
-### Efficient regex-structured generation
-
-Outlines also comes with fast regex-structured generation. In fact, the `choice` and
-`format` functions above all use regex-structured generation under the
-hood:
-
-``` python
-import outlines
-
-model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
-
-prompt = "What is the IP address of the Google DNS servers? "
-
-generator = outlines.generate.text(model)
-unstructured = generator(prompt, max_tokens=30)
-
-generator = outlines.generate.regex(
- model,
- r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)",
-)
-structured = generator(prompt, max_tokens=30)
-
-print(unstructured)
-# What is the IP address of the Google DNS servers?
-#
-# Passive DNS servers are at DNS servers that are private.
-# In other words, both IP servers are private. The database
-# does not contain Chelsea Manning
-
-print(structured)
-# What is the IP address of the Google DNS servers?
-# 2.2.6.1
-```
-
-Unlike other libraries, regex-structured generation in Outlines is almost as fast
-as non-structured generation.
-
-### Efficient JSON generation following a Pydantic model
-
-Outlines 〰 allows you to guide the generation process so the output is *guaranteed* to follow a [JSON schema](https://json-schema.org/) or [Pydantic model](https://docs.pydantic.dev/latest/):
-
-```python
-from enum import Enum
-from pydantic import BaseModel, constr
-
-import outlines
-import torch
-
-
-class Weapon(str, Enum):
- sword = "sword"
- axe = "axe"
- mace = "mace"
- spear = "spear"
- bow = "bow"
- crossbow = "crossbow"
-
-
-class Armor(str, Enum):
- leather = "leather"
- chainmail = "chainmail"
- plate = "plate"
-
-
-class Character(BaseModel):
- name: constr(max_length=10)
- age: int
- armor: Armor
- weapon: Weapon
- strength: int
-
-
-model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
-
-# Construct structured sequence generator
-generator = outlines.generate.json(model, Character)
-
-# Draw a sample
-seed = 789001
-
-character = generator("Give me a character description", seed=seed)
-
-print(repr(character))
-# Character(name='Anderson', age=28, armor=, weapon=, strength=8)
-
-character = generator("Give me an interesting character description", rng=rng)
-
-print(repr(character))
-# Character(name='Vivian Thr', age=44, armor=, weapon=, strength=125)
-```
-
-The method works with union types, optional types, arrays, nested schemas, etc. Some field constraints are [not supported yet](https://github.com/outlines-dev/outlines/issues/215), but everything else should work.
-
-### Efficient JSON generation following a JSON Schema
-
-Sometimes you just want to be able to pass a JSON Schema instead of a Pydantic model. We've got you covered:
-
-``` python
-import outlines
-
-schema = '''{
- "title": "Character",
- "type": "object",
- "properties": {
- "name": {
- "title": "Name",
- "maxLength": 10,
- "type": "string"
- },
- "age": {
- "title": "Age",
- "type": "integer"
- },
- "armor": {"$ref": "#/definitions/Armor"},
- "weapon": {"$ref": "#/definitions/Weapon"},
- "strength": {
- "title": "Strength",
- "type": "integer"
- }
- },
- "required": ["name", "age", "armor", "weapon", "strength"],
- "definitions": {
- "Armor": {
- "title": "Armor",
- "description": "An enumeration.",
- "enum": ["leather", "chainmail", "plate"],
- "type": "string"
- },
- "Weapon": {
- "title": "Weapon",
- "description": "An enumeration.",
- "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
- "type": "string"
- }
- }
-}'''
-
-model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
-generator = outlines.generate.json(model, schema)
-character = generator("Give me a character description")
-```
-
-### Using context-free grammars to guide generation
-
-Formal grammars rule the world, and Outlines makes them rule LLMs too. You can pass any context-free grammar in the EBNF format and Outlines will generate an output that is valid according to this grammar:
-
-``` python
-import outlines
-
-arithmetic_grammar = """
- ?start: expression
-
- ?expression: term (("+" | "-") term)*
-
- ?term: factor (("*" | "/") factor)*
-
- ?factor: NUMBER
- | "-" factor
- | "(" expression ")"
-
- %import common.NUMBER
-"""
-
-model = outlines.models.transformers("WizardLM/WizardMath-7B-V1.1")
-generator = outlines.generate.cfg(model, arithmetic_grammar)
-sequence = generator("Alice had 4 apples and Bob ate 2. Write an expression for Alice's apples:")
-
-print(sequence)
-# (8-2)
-```
-
-This was a very simple grammar, and you can use `outlines.generate.cfg` to generate syntactically valid Python, SQL, and much more than this. Any kind of structured text, really. All you have to do is search for "X EBNF grammar" on the web, and take a look at the [Outlines `grammars` module](https://github.com/outlines-dev/outlines/tree/main/outlines/grammars).
-
-### Open functions
-
-Outlines can infer the structure of the output from the signature of a function. The result is a dictionary, and can be passed directly to the function using the usual dictionary expansion syntax `**`:
-
-```python
-import outlines
-
-
-def add(a: int, b: int):
- return a + b
-
-model = outlines.models.transformers("WizardLM/WizardMath-7B-V1.1")
-generator = outlines.generate.json(model, add)
-result = generator("Return json with two integers named a and b respectively. a is odd and b even.")
-
-print(add(**result))
-# 3
-```
-
-A great advantage of passing functions directly to specify the structure is that the structure of the LLM's output will change with the function's definition. No need to change the code in several places!
-
-## Prompting
-
-Building prompts can get messy. **Outlines** makes it easier to write and manage
-prompts by encapsulating templates inside "template functions".
-
-These functions make it possible to neatly separate the prompt logic from the
-general program logic; they can be imported from other modules and libraries.
-
-Template functions require no superfluous abstraction; they use the Jinja2
-templating engine to help build complex prompts in a concise manner:
-
-``` python
-import outlines
-
-examples = [
- ("The food was disgusting", "Negative"),
- ("We had a fantastic night", "Positive"),
- ("Recommended", "Positive"),
- ("The waiter was rude", "Negative")
-]
-
-@outlines.prompt
-def labelling(to_label, examples):
- """You are a sentiment-labelling assistant.
-
- {% for example in examples %}
- {{ example[0] }} // {{ example[1] }}
- {% endfor %}
- {{ to_label }} //
- """
-
-model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
-prompt = labelling("Just awesome", examples)
-answer = outlines.generate.text(model)(prompt, max_tokens=100)
-```
-
-## Join us
-
-- 💡 **Have an idea?** Come chat with us on [Discord][discord]
-- 🔨 **Want to contribute?** Consult our [contribution guide](https://outlines-dev.github.io/outlines/community/contribute/).
-- 🐞 **Found a bug?** Open an [issue](https://github.com/outlines-dev/outlines/issues)
-
-
-## Cite Outlines
-
-```
-@article{willard2023efficient,
- title={Efficient Guided Generation for LLMs},
- author={Willard, Brandon T and Louf, R{\'e}mi},
- journal={arXiv preprint arXiv:2307.09702},
- year={2023}
-}
-```
-
-[contributors]: https://github.com/outlines-dev/outlines/graphs/contributors
-[contributors-badge]: https://img.shields.io/github/contributors/outlines-dev/outlines?style=flat-square&logo=github&logoColor=white&color=ECEFF4
-[dottxt-twitter]: https://twitter.com/dottxtai
-[outlines-twitter]: https://twitter.com/OutlinesOSS
-[discord]: https://discord.gg/R9DSu34mGd
-[discord-badge]: https://img.shields.io/discord/1182316225284554793?color=81A1C1&logo=discord&logoColor=white&style=flat-square
-[downloads-badge]: https://img.shields.io/pypi/dm/outlines?color=89AC6B&logo=python&logoColor=white&style=flat-square
-[pypistats]: https://pypistats.org/packages/outlines
-[dottxt-twitter-badge]: https://img.shields.io/twitter/follow/dottxtai?style=social
-[outlines-twitter-badge]: https://img.shields.io/twitter/follow/OutlinesOSS?style=social
+# Outlines-core
diff --git a/benchmarks/asv.conf.json b/benchmarks/asv.conf.json
index f57db9a0..3dc3f620 100644
--- a/benchmarks/asv.conf.json
+++ b/benchmarks/asv.conf.json
@@ -1,7 +1,7 @@
{
"version": 1,
- "project": "Outlines",
- "project_url": "https://outlines-dev.github.io/outlines/",
+ "project": "Outlines-core",
+ "project_url": "https://outlines-dev.github.io/outlines-core/",
"repo": "..",
"branches": [
"HEAD"
@@ -11,7 +11,7 @@
"PIP_NO_BUILD_ISOLATION=false python -mpip wheel --no-deps --no-index -w {build_cache_dir} {build_dir}",
],
"environment_type": "virtualenv",
- "show_commit_url": "https://github.com/outlines-dev/outlines/commit/",
+ "show_commit_url": "https://github.com/outlines-dev/outlines-core/commit/",
"benchmark_dir": ".",
"env_dir": "env",
"results_dir": "results",
diff --git a/benchmarks/bench_json_schema.py b/benchmarks/bench_json_schema.py
index 8d1ceeb2..c85982c7 100644
--- a/benchmarks/bench_json_schema.py
+++ b/benchmarks/bench_json_schema.py
@@ -1,6 +1,6 @@
-from outlines.caching import cache_disabled
-from outlines.fsm.guide import RegexGuide
-from outlines.fsm.json_schema import build_regex_from_schema
+from outlines_core.caching import cache_disabled
+from outlines_core.fsm.guide import RegexGuide
+from outlines_core.fsm.json_schema import build_regex_from_schema
from .common import ensure_numba_compiled, setup_tokenizer # noqa: E402
diff --git a/benchmarks/bench_numba_compile.py b/benchmarks/bench_numba_compile.py
index 2713707e..35edc953 100644
--- a/benchmarks/bench_numba_compile.py
+++ b/benchmarks/bench_numba_compile.py
@@ -3,8 +3,8 @@
import interegular
import numba
-from outlines.caching import cache_disabled
-from outlines.fsm import regex
+from outlines_core.caching import cache_disabled
+from outlines_core.fsm import regex
from .common import setup_tokenizer
diff --git a/benchmarks/bench_regex_guide.py b/benchmarks/bench_regex_guide.py
index 099f94df..eeb1f983 100644
--- a/benchmarks/bench_regex_guide.py
+++ b/benchmarks/bench_regex_guide.py
@@ -1,5 +1,5 @@
-from outlines.caching import cache_disabled
-from outlines.fsm.guide import RegexGuide
+from outlines_core.caching import cache_disabled
+from outlines_core.fsm.guide import RegexGuide
from .common import ensure_numba_compiled, setup_tokenizer
diff --git a/benchmarks/common.py b/benchmarks/common.py
index 7d999ea9..db25593d 100644
--- a/benchmarks/common.py
+++ b/benchmarks/common.py
@@ -1,7 +1,7 @@
from transformers import AutoTokenizer
-from outlines.fsm.guide import RegexGuide
-from outlines.models.transformers import TransformerTokenizer
+from outlines_core.fsm.guide import RegexGuide
+from outlines_core.models.transformers import TransformerTokenizer
def setup_tokenizer():
diff --git a/outlines/fsm/parsing.py b/outlines/fsm/parsing.py
deleted file mode 100644
index e4fa7b76..00000000
--- a/outlines/fsm/parsing.py
+++ /dev/null
@@ -1,870 +0,0 @@
-from copy import copy, deepcopy
-from dataclasses import dataclass
-from functools import lru_cache
-from typing import Any, Dict, FrozenSet, Iterator, Optional, Set, Tuple, Union
-
-import interegular
-from interegular.fsm import FSM
-from interegular.patterns import Unsupported
-from lark import Lark, Token
-from lark.common import LexerConf, ParserConf
-from lark.exceptions import LexError, UnexpectedInput
-from lark.indenter import Indenter
-from lark.lexer import (
- BasicLexer,
- ContextualLexer,
- LexerState,
- LexerThread,
- Scanner,
- UnexpectedCharacters,
- UnexpectedToken,
- _create_unless,
-)
-from lark.parser_frontends import (
- ParsingFrontend,
- PostLexConnector,
- _validate_frontend_args,
-)
-from lark.parsers.lalr_analysis import (
- Action,
- IntParseTable,
- LALR_Analyzer,
- ParseTable,
- Shift,
-)
-from lark.parsers.lalr_interactive_parser import InteractiveParser
-from lark.parsers.lalr_parser import LALR_Parser, ParseConf, ParserState, _Parser
-
-from outlines.fsm.regex import (
- fsm_union,
- get_sub_fsms_from_seq,
- get_token_transition_keys,
- make_deterministic_fsm,
- walk_fsm,
-)
-
-PartialParseState = Tuple[str, int]
-ParseStateType = Union[int, FrozenSet]
-
-
-@dataclass
-class PartialTerminalInfo:
- priority: int
- terminal_name: str
- can_transition: bool
- is_final: bool
-
-
-@dataclass
-class PartialTokensInfo:
- fsm_state_seq: Tuple[int, ...]
- is_not_finished: bool
- terminals_and_info: Tuple[PartialTerminalInfo, ...]
- final_terminals_and_info: Tuple[PartialTerminalInfo, ...]
-
-
-class PartialParserConf(ParserConf):
- __serialize_fields__ = (
- "rules",
- "start",
- "parser_type",
- "deterministic",
- "use_value_stack",
- )
-
- def __init__(self, rules, callbacks, start, deterministic, use_value_stack):
- super().__init__(rules, callbacks, start)
- self.deterministic = deterministic
- self.use_value_stack = use_value_stack
-
-
-class PartialLark(Lark):
- __serialize_fields__ = (
- "parser",
- "rules",
- "options",
- "deterministic",
- "use_value_stack",
- )
-
- def __init__(self, grammar, **options):
- # TODO: Could've extended `LarkOptions`, but all these extensions are
- # already way too much (and brittle). This library really needs a
- # complete refactoring.
- self.deterministic = options.pop("deterministic", False)
- self.use_value_stack = options.pop("use_value_stack", False)
- options["regex"] = True
- super().__init__(grammar, **options)
- assert self.options.parser == "lalr"
-
- def _build_lexer(self, dont_ignore: bool = False) -> "PartialBasicLexer":
- lexer_conf = self.lexer_conf
- if dont_ignore:
- from copy import copy
-
- lexer_conf = copy(lexer_conf)
- lexer_conf.ignore = ()
-
- return PartialBasicLexer(lexer_conf)
-
- def _build_parser(self) -> "PartialParsingFrontend":
- self._prepare_callbacks()
- _validate_frontend_args(self.options.parser, self.options.lexer)
- parser_conf = PartialParserConf(
- self.rules,
- self._callbacks,
- self.options.start,
- self.deterministic,
- self.use_value_stack,
- )
-
- # This is `_construct_parsing_frontend` expanded/inlined
- parser_type = self.options.parser
- lexer_type = self.options.lexer
- lexer_conf = self.lexer_conf
-
- assert isinstance(lexer_conf, LexerConf)
- assert isinstance(parser_conf, ParserConf)
- parser_conf.parser_type = parser_type
- self.lexer_conf.lexer_type = lexer_type
- return PartialParsingFrontend(lexer_conf, parser_conf, self.options)
-
- def __repr__(self):
- return "{}(open({!r}), parser={!r}, lexer={!r}, ...)".format(
- type(self).__name__,
- self.source_path,
- self.options.parser,
- self.options.lexer,
- )
-
- def parse_from_state(self, parse_state: "PartialParseState", is_end=False):
- return self.parser.parser.parser.parse_from_state(parse_state, is_end=is_end)
-
-
-class PartialLexerThread(LexerThread):
- def __copy__(self):
- return type(self)(copy(self.lexer), copy(self.state))
-
- def __repr__(self):
- return f"{type(self).__name__}(lexer={self.lexer!r}, state={self.state!r})"
-
-
-class PartialPostLexConnector(PostLexConnector):
- def __copy__(self):
- return type(self)(self.lexer, copy(self.postlexer))
-
- def __repr__(self):
- return (
- f"{type(self).__name__}(lexer={self.lexer!r}, postlexer={self.postlexer!r})"
- )
-
-
-class PartialParsingFrontend(ParsingFrontend):
- def __init__(self, lexer_conf, parser_conf, options, parser=None):
- assert parser_conf.parser_type == "lalr"
-
- options._plugins["LALR_Parser"] = PartialLALRParser
- options._plugins["BasicLexer"] = PartialBasicLexer
- options._plugins["ContextualLexer"] = PartialContextualLexer
- options._plugins["LexerThread"] = PartialLexerThread
-
- super().__init__(lexer_conf, parser_conf, options, parser=parser)
-
- if lexer_conf.postlex:
- self.lexer = PartialPostLexConnector(self.lexer.lexer, lexer_conf.postlex)
-
- self._termset_fsm_info = None
- self._symbols_to_states: Optional[
- Dict[str, Set[Tuple[ParseStateType, Action]]]
- ] = None
- self._reverse_shifts: Optional[
- Dict[ParseStateType, Dict[str, Set[ParseStateType]]]
- ] = None
- # self._state_transition_map: Optional[
- # Dict[Tuple[ParseStateType, str], Set[ParseStateType]]
- # ] = None
-
- def _compute_maps(
- self,
- ):
- """Compute state transition and symbols-to-states maps."""
- self._reverse_shifts = {}
- self._symbols_to_states = {}
-
- parse_table = self.parser.parser.parse_table
-
- for from_state, symbols_to_ops in parse_table.states.items():
- for symbol, op in symbols_to_ops.items():
- if op[0] == Shift:
- symbols_to_from_states = self._reverse_shifts.setdefault(op[1], {})
- symbols_to_from_states.setdefault(symbol, set()).add(from_state)
- self._symbols_to_states.setdefault(symbol, set()).add((from_state, op))
-
- # # TODO: This approach is very wasteful.
- # context_lexer = get_contextual_lexer(self)
- # self._state_transition_map = {}
- #
- # for from_state, transitions in parse_table.states.items():
- # for symbol, action in transitions.items():
- # # TODO: Filter non-terminals
- # if symbol not in context_lexer.root_lexer.terminals_by_name:
- # continue
- #
- # if action[0] is Shift:
- # self._state_transition_map.setdefault(
- # (from_state, symbol), set()
- # ).add(action[1])
- # continue
- #
- # antecedent_state_seqs = parse_to_terminal(self, [(from_state,)], symbol)
- #
- # for antecedent_state_seq in antecedent_state_seqs:
- # antecedent_state = antecedent_state_seq[-1]
- # self._state_transition_map.setdefault(
- # (from_state, symbol), set()
- # ).add(antecedent_state)
-
- def _compute_termset_fsm_info(self):
- """Collect and return information about terminal symbol sets and their FSMs.
-
- Terminal symbol sets (or "termsets") are ordered sequences of terminal
- symbols that are used by each parser state. Associated with each is a
- collection of FSMs for each terminal and a single parse state FSM that is
- the union of each terminal's FSM.
-
- This constructs a list of tuples containing the termset, the set of
- parse states that use the termsets, parse state FSMs, and information
- mapping the components of the parse state FSMs to their terminal symbol
- FSMs.
-
- """
- context_lexer = get_contextual_lexer(self)
- termsets_to_fsms = {}
- termsets_to_parse_states: Dict[Tuple[str, ...], Set[ParseStateType]] = {}
- for parse_state, lexer in context_lexer.lexers.items():
- scanner = lexer.scanner
- key = tuple(term.name for term in scanner.terminals)
- termsets_to_fsms[key] = (scanner.fsm, scanner.fsms_to_trans_finals)
- termsets_to_parse_states.setdefault(key, set()).add(parse_state)
-
- self._termset_fsm_info = [
- (
- termset,
- frozenset(termsets_to_parse_states[termset]),
- fsm,
- fsms_to_trans_finals,
- )
- for termset, (fsm, fsms_to_trans_finals) in termsets_to_fsms.items()
- ]
-
- @property
- def termset_fsm_info(self):
- if self._termset_fsm_info is None:
- self._compute_termset_fsm_info()
- return self._termset_fsm_info
-
- @property
- def symbols_to_states(self):
- if self._symbols_to_states is None:
- self._compute_maps()
- return self._symbols_to_states
-
- @property
- def reverse_shifts(self):
- if self._reverse_shifts is None:
- self._compute_maps()
- return self._reverse_shifts
-
- # @property
- # def state_transition_map(self):
- # if self._state_transition_map is None:
- # self._compute_maps()
- # return self._state_transition_map
-
-
-class PartialLALRParser(LALR_Parser):
- def __init__(self, parser_conf, debug=False, strict=False):
- analysis = LALR_Analyzer(
- parser_conf, debug=debug if not parser_conf.deterministic else True
- )
- analysis.compute_lalr()
- callbacks = parser_conf.callbacks
-
- self.parser_conf = parser_conf
- self._parse_table = analysis.parse_table
-
- if parser_conf.deterministic:
- old_to_new = {}
-
- def to_tuple(v):
- new = old_to_new.get(v)
- if new is None:
- new = tuple(sorted(v, key=lambda y: str(y)))
- old_to_new[v] = new
- return new
-
- enum = sorted(
- self._parse_table.states.keys(),
- key=lambda x: str(sorted(x, key=lambda y: str(y))),
- )
-
- new_states = {}
- for s in enum:
- transitions = {
- term: op if op[0] is not Shift else (op[0], to_tuple(op[1]))
- for term, op in self._parse_table.states[s].items()
- }
- new_states[to_tuple(s)] = transitions
-
- self._parse_table = type(self._parse_table)(
- new_states,
- {k: to_tuple(v) for k, v in self._parse_table.start_states.items()},
- {k: to_tuple(v) for k, v in self._parse_table.end_states.items()},
- )
-
- if not debug:
- self._parse_table = IntParseTable.from_ParseTable(self._parse_table)
- self.states_to_rulesets = dict(
- zip(self._parse_table.states.keys(), new_states.keys())
- )
-
- self.parser = PartialParser(
- self._parse_table,
- callbacks,
- debug,
- use_value_stack=parser_conf.use_value_stack,
- )
-
- @classmethod
- def deserialize(cls, data, memo, callbacks, debug=False):
- inst = cls.__new__(cls)
- inst._parse_table = ParseTable.deserialize(data, memo)
- inst.parser = PartialParser(inst._parse_table, callbacks, debug)
- return inst
-
-
-class PartialParserState(ParserState):
- __slots__ = "use_value_stack"
-
- def __init__(
- self,
- parse_conf,
- lexer,
- state_stack=None,
- value_stack=None,
- use_value_stack=False,
- ):
- super().__init__(
- parse_conf, lexer, state_stack=state_stack, value_stack=value_stack
- )
- self.use_value_stack = use_value_stack
-
- def feed_token(self, token, is_end=False):
- if token.type == "partial":
- # If none of the potential terminals can transition, we need to know now
- current_state = self.state_stack[-1]
- current_lexer = get_contextual_lexer(self.lexer).lexers[current_state]
-
- # We have to feed the token and determine whether or not at least
- # one terminal is consistent with the stack; otherwise, we'll miss
- # invalid REDUCE cases.
- # TODO: We should track separate parses conditional on possible
- # token/symbol types, then we can coherently reuse the following
- # results instead of recomputing it later.
- can_transition = False
- for terminal_info in token.value.terminals_and_info:
- if terminal_info.terminal_name not in current_lexer.ignore_types:
- test_token = Token.new_borrow_pos(
- terminal_info.terminal_name, "", token
- )
-
- stack = copy(self.state_stack)
- try:
- self.feed_token_no_stack(test_token, is_end=is_end)
- can_transition = True
- break
- except UnexpectedToken:
- continue
- finally:
- self.state_stack = stack
- else:
- can_transition = True
-
- if not can_transition:
- expected = {
- s
- for s in self.parse_conf.states[current_state].keys()
- if s.isupper()
- }
- raise UnexpectedToken(
- token, expected, state=self, interactive_parser=None
- )
-
- elif self.use_value_stack:
- super().feed_token(token, is_end=is_end)
- else:
- self.feed_token_no_stack(token, is_end=is_end)
-
- def feed_token_no_stack(self, token, is_end=False):
- """
- This is a copy of `ParserState.feed_token` with all the value stack
- steps removed. Since we're not exactly parsing in order to obtain a
- CST or anything similar, we can avoid the growing expense of tracking
- the parse tree.
- """
- state_stack = self.state_stack
- states = self.parse_conf.states
- end_state = self.parse_conf.end_state
-
- while True:
- state = state_stack[-1]
- try:
- action, arg = states[state][token.type]
- except KeyError:
- expected = {s for s in states[state].keys() if s.isupper()}
- raise UnexpectedToken(
- token, expected, state=self, interactive_parser=None
- )
-
- assert arg != end_state
-
- if action is Shift:
- # shift once and return
- assert not is_end
- state_stack.append(arg)
- return
- else:
- # reduce+shift as many times as necessary
- rule = arg
- size = len(rule.expansion)
- if size:
- del state_stack[-size:]
-
- _action, new_state = states[state_stack[-1]][rule.origin.name]
- assert _action is Shift
- state_stack.append(new_state)
-
- if is_end and state_stack[-1] == end_state:
- return
-
- def __copy__(self):
- return type(self)(
- self.parse_conf,
- copy(self.lexer),
- copy(self.state_stack),
- deepcopy(self.value_stack),
- use_value_stack=self.use_value_stack,
- )
-
- def __repr__(self):
- return f"{type(self).__name__}(lexer={self.lexer!r}, state_stack={self.state_stack!r})"
-
-
-class PartialParser(_Parser):
- def __init__(self, parse_table, callbacks, debug=False, use_value_stack=False):
- super().__init__(parse_table, callbacks, debug=debug)
- self.use_value_stack = use_value_stack
-
- def parse(
- self, lexer, start, value_stack=None, state_stack=None, start_interactive=False
- ):
- parse_conf = ParseConf(self.parse_table, self.callbacks, start)
- parser_state = PartialParserState(
- parse_conf, copy(lexer), state_stack, value_stack, self.use_value_stack
- )
- if start_interactive:
- return InteractiveParser(self, parser_state, parser_state.lexer)
- return self.parse_from_state(parser_state)
-
- def parse_from_state(self, state, last_token=None, is_end=False):
- try:
- token = last_token
- for token in state.lexer.lex(state):
- state.feed_token(token)
-
- if is_end and (not token or token.type != "partial"):
- end_token = (
- Token.new_borrow_pos("$END", "", token)
- if token
- else Token("$END", "", 0, 1, 1)
- )
- state.feed_token(end_token, True)
-
- return state
- except UnexpectedInput as e:
- try:
- e.interactive_parser = InteractiveParser(self, state, state.lexer)
- except NameError:
- pass
- raise e
- except Exception:
- if self.debug:
- print("")
- print("STATE STACK DUMP")
- print("----------------")
- for i, s in enumerate(state.state_stack):
- print("%d)" % i, s)
- print("")
-
- raise
-
-
-class PartialScanner(Scanner):
- @classmethod
- @lru_cache
- def construct_terminal_fsm(cls, terminal):
- # TODO: This should really be done at the lexer/parser level so that
- # the lifetime of these objects is tied to the parser itself.
- regex_str = terminal.pattern.to_regexp()
- pattern = interegular.parse_pattern(regex_str)
- fsm, _ = make_deterministic_fsm(pattern.to_fsm().reduce())
- return fsm, pattern.prefix_postfix
-
- def __init__(self, terminals, g_regex_flags, re_, use_bytes, match_whole=False):
- self.terminals = terminals
- self.g_regex_flags = g_regex_flags
- self.use_bytes = use_bytes
- self.match_whole = match_whole
- self.allowed_types = {t.name for t in self.terminals}
- self._mres = None
-
- fsms = []
- for t in self.terminals:
- fsm, prefix_postfix = self.construct_terminal_fsm(t)
-
- # TODO FIXME: We don't support this right now.
- assert prefix_postfix == (0, 0)
-
- fsms.append(fsm)
-
- self.fsm, self.fsms_to_trans_finals = fsm_union(fsms)
-
- def get_terminals_info(
- self, fsm_state_seq
- ) -> Tuple[Tuple[PartialTerminalInfo, ...], Tuple[PartialTerminalInfo, ...]]:
- """Get the possible terminal symbols for an FSM state sequence."""
- terminals_and_info: Tuple[PartialTerminalInfo, ...] = ()
- final_terminals_and_info: Tuple[PartialTerminalInfo, ...] = ()
- for i, (fsm_id, fsm_reads_more, in_final) in enumerate(
- get_sub_fsms_from_seq(fsm_state_seq, self.fsms_to_trans_finals)
- ):
- terminal_name = self.terminals[fsm_id].name
- info = PartialTerminalInfo(i, terminal_name, fsm_reads_more, in_final)
- terminals_and_info += (info,)
- if in_final:
- final_terminals_and_info += (info,)
-
- return terminals_and_info, final_terminals_and_info
-
- def match(self, text, pos, last_fsm_state_seq: Optional[Tuple[int, ...]] = None):
- """Determine an FSM match over `text` starting at `pos` and continuing `last_fsm_state_seq`."""
-
- start_pos = pos
-
- if last_fsm_state_seq:
- assert len(last_fsm_state_seq) > 1
- start_pos += len(last_fsm_state_seq) - 1
- start_state = last_fsm_state_seq[-1]
- else:
- start_state = self.fsm.initial
-
- text_part = text[start_pos:]
-
- text_transitions = get_token_transition_keys(
- self.fsm.fsm_info.alphabet_symbol_mapping,
- self.fsm.fsm_info.alphabet_anything_value,
- text_part,
- )
-
- state_seq = walk_fsm(
- self.fsm,
- text_transitions,
- start_state,
- full_match=self.match_whole,
- )
-
- if not state_seq:
- return None
-
- if last_fsm_state_seq:
- res = last_fsm_state_seq + tuple(state_seq)
- else:
- res = (start_state,) + tuple(state_seq)
-
- return res
-
-
-class PartialContextualLexer(ContextualLexer):
- def __init__(self, conf: "LexerConf", states, always_accept=()):
- terminals = list(conf.terminals)
- terminals_by_name = conf.terminals_by_name
-
- trad_conf = copy(conf)
- trad_conf.terminals = terminals
-
- lexer_by_symbols: Dict = {}
- self.lexers = {}
- for state, accepts in states.items():
- key = frozenset(accepts)
- try:
- lexer = lexer_by_symbols[key]
- except KeyError:
- accepts = set(accepts) | set(conf.ignore) | set(always_accept)
- lexer_conf = copy(trad_conf)
- lexer_conf.terminals = [
- terminals_by_name[n] for n in accepts if n in terminals_by_name
- ]
- lexer = PartialBasicLexer(lexer_conf)
- lexer_by_symbols[key] = lexer
-
- self.lexers[state] = lexer
-
- assert trad_conf.terminals is terminals
- self.root_lexer = PartialBasicLexer(trad_conf)
-
- def lex(self, lexer_state: LexerState, parser_state: Any) -> Iterator[Token]:
- try:
- while True:
- lexer = self.lexers[parser_state.position]
- yield lexer.next_token(lexer_state, parser_state)
- except EOFError:
- pass
-
-
-class PartialBasicLexer(BasicLexer):
- def __init__(self, conf: "LexerConf"):
- super().__init__(conf)
- # Eagerly construct the scanner
- self._build_scanner()
-
- def _build_scanner(self):
- # This seems incredibly convoluted: `lark` creates callback-triggered
- # nested scanners for regex-defined terminals that overlap with
- # string-defined terminals when both types of terminals have the same
- # priority. Unless I'm missing something important, why not simply
- # reorder the terminals so that the string-defined ones come before the
- # regex-defined ones?
- terminals, self.callback = _create_unless(
- self.terminals, self.g_regex_flags, self.re, self.use_bytes
- )
-
- # We can't let people arbitrarily mess with the scanning process.
- assert not self.user_callbacks
- # for type_, f in self.user_callbacks.items():
- # if type_ in self.callback:
- # # Already a callback there, probably UnlessCallback
- # self.callback[type_] = CallChain(
- # self.callback[type_], f, lambda t: t.type == type_
- # )
- # else:
- # self.callback[type_] = f
-
- # We used the "callback" results to reorder the terminals (see the
- # comments above).
- for terminal_name, callback in self.callback.items():
- terminal = self.terminals_by_name[terminal_name]
- for sub_terminal in callback.scanner.terminals:
- self.terminals.remove(sub_terminal)
- idx = self.terminals.index(terminal)
- self.terminals.insert(idx, sub_terminal)
-
- self._scanner = PartialScanner(
- self.terminals, self.g_regex_flags, self.re, self.use_bytes
- )
-
- def match(self, text, pos, last_fsm_state_seq=None):
- return self.scanner.match(text, pos, last_fsm_state_seq)
-
- def next_token(self, lex_state: LexerState, parser_state: Any = None) -> Token:
- last_token = lex_state.last_token
-
- last_fsm_state_seq = None
- if last_token and last_token.type == "partial":
- # Continue from last partial lexer state
- last_fsm_state_seq = last_token.value.fsm_state_seq
-
- line_ctr = lex_state.line_ctr
- end_pos = line_ctr.char_pos + (
- len(last_fsm_state_seq) - 1 if last_fsm_state_seq else 0
- )
- while end_pos < len(lex_state.text):
- res = self.match(lex_state.text, line_ctr.char_pos, last_fsm_state_seq)
-
- if not res:
- if (
- not last_fsm_state_seq
- or last_fsm_state_seq[-1] not in self.scanner.fsm.finals
- ):
- allowed = self.scanner.allowed_types - self.ignore_types
- if not allowed:
- allowed = {""}
- raise UnexpectedCharacters(
- lex_state.text,
- line_ctr.char_pos,
- line_ctr.line,
- line_ctr.column,
- allowed=allowed,
- token_history=lex_state.last_token and [lex_state.last_token],
- state=parser_state,
- terminals_by_name=self.terminals_by_name,
- )
-
- # The partial match might be complete now
- fsm_state_seq = last_token.value.fsm_state_seq
- terminals_and_info = last_token.value.terminals_and_info
- final_terminals_and_info = last_token.value.final_terminals_and_info
- else:
- fsm_state_seq = res
- (
- terminals_and_info,
- final_terminals_and_info,
- ) = self.scanner.get_terminals_info(fsm_state_seq)
-
- priority_terminal_info = (
- final_terminals_and_info[0]
- if final_terminals_and_info
- else terminals_and_info[0]
- )
-
- is_not_finished = (
- not priority_terminal_info.is_final
- or priority_terminal_info.can_transition
- or len(terminals_and_info) > 1
- )
-
- start_pos = line_ctr.char_pos
- end_pos = start_pos + len(fsm_state_seq) - 1
-
- if end_pos >= len(lex_state.text) and is_not_finished:
- type_name = "partial"
- token_value = PartialTokensInfo(
- fsm_state_seq,
- is_not_finished,
- terminals_and_info,
- final_terminals_and_info,
- )
- # Don't update the line counter states until we've finished
- value = ""
- else:
- type_name = priority_terminal_info.terminal_name
- # The token value should contain all partial scan parts in this
- # case
- value = token_value = lex_state.text[start_pos:end_pos]
-
- assert isinstance(self.callback, Dict)
-
- if type_name not in self.ignore_types:
- t = Token(
- type_name,
- token_value,
- line_ctr.char_pos,
- line_ctr.line,
- line_ctr.column,
- )
-
- line_ctr.feed(value, type_name in self.newline_types)
-
- t.end_line = line_ctr.line
- t.end_column = line_ctr.column
- t.end_pos = line_ctr.char_pos
- if t.type in self.callback:
- t = self.callback[t.type](t)
- if not isinstance(t, Token):
- raise LexError(
- "Callbacks must return a token (returned %r)" % t
- )
- lex_state.last_token = t
- return t
-
- if type_name in self.callback:
- t2 = Token(
- type_name, value, line_ctr.char_pos, line_ctr.line, line_ctr.column
- )
- self.callback[type_name](t2)
-
- line_ctr.feed(value, type_name in self.newline_types)
-
- last_fsm_state_seq = None
-
- raise EOFError(self)
-
-
-class PartialIndenter(Indenter):
- """An `Indenter` that doesn't reset its state every time `process` is called."""
-
- def process(self, stream):
- return self._process(stream)
-
- def _process(self, stream):
- for token in stream:
- # These were previously *after* the `yield`, but that makes the
- # state tracking unnecessarily convoluted.
- if token.type in self.OPEN_PAREN_types:
- self.paren_level += 1
- elif token.type in self.CLOSE_PAREN_types:
- self.paren_level -= 1
- if self.paren_level < 0:
- raise UnexpectedToken(token, [])
-
- if token.type == self.NL_type:
- yield from self.handle_NL(token)
- else:
- yield token
-
- # TODO: What do we want to do here?
- # while len(self.indent_level) > 1:
- # self.indent_level.pop()
- # yield Token(self.DEDENT_type, "")
-
- def accepts_token_type(self, token_type):
- if token_type in self.CLOSE_PAREN_types and self.paren_level - 1 < 0:
- return False
-
- # TODO:
- # if token_type == self.NL_type and self.paren_level == 0:
- # ...
- # return False
-
- return True
-
- def __copy__(self):
- res = type(self)()
- res.paren_level = self.paren_level
- res.indent_level = copy(self.indent_level)
- return res
-
- def __repr__(self):
- return f"{type(self).__name__}(paren_level={self.paren_level!r}, indent_level={self.indent_level!r})"
-
-
-class PartialPythonIndenter(PartialIndenter):
- NL_type = "_NEWLINE"
- OPEN_PAREN_types = ["LPAR", "LSQB", "LBRACE"]
- CLOSE_PAREN_types = ["RPAR", "RSQB", "RBRACE"]
- INDENT_type = "_INDENT"
- DEDENT_type = "_DEDENT"
- tab_len = 8
-
-
-def get_contextual_lexer(x: Union[PartialLexerThread, PartialParsingFrontend]):
- if isinstance(x.lexer, ContextualLexer):
- return x.lexer
- else:
- return x.lexer.lexer
-
-
-def terminals_to_fsms(lp: PartialLark) -> Dict[str, FSM]:
- """Construct a ``dict`` mapping terminal symbol names to their finite state machines."""
-
- symbol_names_and_fsms = {}
- for terminal in lp.terminals:
- pattern = interegular.parse_pattern(terminal.pattern.to_regexp())
- # TODO: Use `pyparser.terminals[0].pattern.flags`?
- try:
- fsm, _ = make_deterministic_fsm(pattern.to_fsm().reduce())
- except Unsupported:
- fsm = None
-
- symbol_names_and_fsms[terminal.name] = fsm
-
- return symbol_names_and_fsms
diff --git a/pyproject.toml b/pyproject.toml
index 24c9cdd5..fdaa0500 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -3,9 +3,9 @@ requires = ["setuptools>=45", "setuptools_scm[toml]>=6.2"]
build-backend = "setuptools.build_meta"
[project]
-name = "outlines"
+name = "outlines_core"
authors= [{name = "Outlines Developers"}]
-description = "Probabilistic Generative Model Programming"
+description = "Structured Text Generation in Rust"
requires-python = ">=3.8"
license = {text = "Apache-2.0"}
keywords=[
@@ -25,9 +25,6 @@ classifiers = [
]
dependencies = [
"interegular",
- "jinja2",
- "lark",
- "nest_asyncio",
"numpy<2.0.0",
"cloudpickle",
"diskcache",
@@ -35,12 +32,9 @@ dependencies = [
"numba",
"referencing",
"jsonschema",
- "requests",
"tqdm",
"datasets",
"typing_extensions",
- "pycountry",
- "pyairports",
]
dynamic = ["version"]
@@ -55,36 +49,30 @@ test = [
"diff-cover",
"accelerate",
"beartype<0.16.0",
- "responses",
"huggingface_hub",
"torch",
"transformers",
"pillow",
]
-serve = [
- "vllm>=0.3.0",
- "uvicorn",
- "fastapi",
- "pydantic>=2.0",
-]
[project.urls]
-homepage = "https://github.com/outlines-dev/outlines"
-documentation = "https://outlines-dev.github.io/outlines/"
-repository = "https://github.com/outlines-dev/outlines"
+homepage = "https://github.com/outlines-dev/outlines-core"
+documentation = "https://outlines-dev.github.io/outlines-core/"
+repository = "https://github.com/outlines-dev/outlines-core/"
[project.readme]
file="README.md"
content-type = "text/markdown"
[tool.setuptools]
-packages = ["outlines"]
+packages = ["outlines_core"]
+package-dir = {"" = "src"}
[tool.setuptools.package-data]
"outlines" = ["py.typed"]
[tool.setuptools_scm]
-write_to = "outlines/_version.py"
+write_to = "src/outlines_core/_version.py"
[tool.pytest.ini_options]
testpaths = ["tests"]
@@ -104,10 +92,7 @@ enable_incomplete_feature = ["Unpack"]
[[tool.mypy.overrides]]
module = [
- "exllamav2.*",
- "jinja2",
"jsonschema.*",
- "nest_asyncio",
"numpy.*",
"cloudpickle.*",
"diskcache.*",
@@ -117,20 +102,15 @@ module = [
"torch.*",
"transformers.*",
"huggingface_hub",
- "lark.*",
"interegular.*",
"datasets.*",
"numba.*",
- "requests.*",
- "responses.*",
- "uvicorn.*",
- "fastapi.*",
]
ignore_missing_imports = true
[tool.coverage.run]
omit = [
- "outlines/_version.py",
+ "src/outlines_core/_version.py",
"tests/*",
]
branch = true
diff --git a/outlines/__init__.py b/src/outlines_core/__init__.py
similarity index 55%
rename from outlines/__init__.py
rename to src/outlines_core/__init__.py
index f8d13c66..1a322f2b 100644
--- a/outlines/__init__.py
+++ b/src/outlines_core/__init__.py
@@ -1,10 +1,9 @@
"""Outlines is a Generative Model Programming Framework."""
-import outlines.models
-from outlines.caching import clear_cache, disable_cache, get_cache
+import outlines_core.models
+from outlines_core.caching import clear_cache, disable_cache, get_cache
__all__ = [
"clear_cache",
"disable_cache",
"get_cache",
- "vectorize",
]
diff --git a/outlines/caching.py b/src/outlines_core/caching.py
similarity index 95%
rename from outlines/caching.py
rename to src/outlines_core/caching.py
index 6fdda621..92a08415 100644
--- a/outlines/caching.py
+++ b/src/outlines_core/caching.py
@@ -48,10 +48,12 @@ def get_cache():
environment variable.
"""
- from outlines._version import __version__ as outlines_version # type: ignore
+ from outlines_core._version import (
+ __version__ as outlines_core_version, # type: ignore
+ )
home_dir = os.path.expanduser("~")
- cache_dir = os.environ.get("OUTLINES_CACHE_DIR", f"{home_dir}/.cache/outlines")
+ cache_dir = os.environ.get("OUTLINES_CACHE_DIR", f"{home_dir}/.cache/outlines-core")
memory = Cache(
cache_dir,
eviction_policy="none",
@@ -60,9 +62,9 @@ def get_cache():
)
# ensure if version upgrade occurs, old cache is pruned
- if outlines_version != memory.get("__version__"):
+ if outlines_core_version != memory.get("__version__"):
memory.clear()
- memory["__version__"] = outlines_version
+ memory["__version__"] = outlines_core_version
return memory
diff --git a/outlines/fsm/__init__.py b/src/outlines_core/fsm/__init__.py
similarity index 100%
rename from outlines/fsm/__init__.py
rename to src/outlines_core/fsm/__init__.py
diff --git a/outlines/fsm/fsm.py b/src/outlines_core/fsm/fsm.py
similarity index 92%
rename from outlines/fsm/fsm.py
rename to src/outlines_core/fsm/fsm.py
index bfcf55c0..4daf3c86 100644
--- a/outlines/fsm/fsm.py
+++ b/src/outlines_core/fsm/fsm.py
@@ -1,10 +1,10 @@
import warnings
from typing import TYPE_CHECKING, Iterable, NewType, Optional
-from outlines.fsm.guide import RegexGuide, StopAtEOSGuide
+from outlines_core.fsm.guide import RegexGuide, StopAtEOSGuide
if TYPE_CHECKING:
- from outlines.models.tokenizer import Tokenizer
+ from outlines_core.models.tokenizer import Tokenizer
FSMState = NewType("FSMState", int)
diff --git a/outlines/fsm/guide.py b/src/outlines_core/fsm/guide.py
similarity index 98%
rename from outlines/fsm/guide.py
rename to src/outlines_core/fsm/guide.py
index c846c441..3773505d 100644
--- a/outlines/fsm/guide.py
+++ b/src/outlines_core/fsm/guide.py
@@ -14,15 +14,15 @@
import interegular
import torch
-from outlines.caching import cache
-from outlines.fsm.regex import (
+from outlines_core.caching import cache
+from outlines_core.fsm.regex import (
create_fsm_index_tokenizer,
make_byte_level_fsm,
make_deterministic_fsm,
)
if TYPE_CHECKING:
- from outlines.models.tokenizer import Tokenizer
+ from outlines_core.models.tokenizer import Tokenizer
@dataclass(frozen=True)
diff --git a/outlines/fsm/json_schema.py b/src/outlines_core/fsm/json_schema.py
similarity index 100%
rename from outlines/fsm/json_schema.py
rename to src/outlines_core/fsm/json_schema.py
diff --git a/outlines/fsm/regex.py b/src/outlines_core/fsm/regex.py
similarity index 99%
rename from outlines/fsm/regex.py
rename to src/outlines_core/fsm/regex.py
index 8cfd81ea..3c06790a 100644
--- a/outlines/fsm/regex.py
+++ b/src/outlines_core/fsm/regex.py
@@ -29,7 +29,7 @@
from tqdm import tqdm
if TYPE_CHECKING:
- from outlines.models.tokenizer import Tokenizer
+ from outlines_core.models.tokenizer import Tokenizer
class BetterAlphabet(Alphabet):
diff --git a/outlines/fsm/types.py b/src/outlines_core/fsm/types.py
similarity index 100%
rename from outlines/fsm/types.py
rename to src/outlines_core/fsm/types.py
diff --git a/outlines/integrations/utils.py b/src/outlines_core/integrations/utils.py
similarity index 100%
rename from outlines/integrations/utils.py
rename to src/outlines_core/integrations/utils.py
diff --git a/outlines/models/__init__.py b/src/outlines_core/models/__init__.py
similarity index 100%
rename from outlines/models/__init__.py
rename to src/outlines_core/models/__init__.py
diff --git a/outlines/models/tokenizer.py b/src/outlines_core/models/tokenizer.py
similarity index 100%
rename from outlines/models/tokenizer.py
rename to src/outlines_core/models/tokenizer.py
diff --git a/outlines/models/transformers.py b/src/outlines_core/models/transformers.py
similarity index 99%
rename from outlines/models/transformers.py
rename to src/outlines_core/models/transformers.py
index 10f8f248..e219d8a4 100644
--- a/outlines/models/transformers.py
+++ b/src/outlines_core/models/transformers.py
@@ -4,7 +4,7 @@
from datasets.fingerprint import Hasher
-from outlines.models.tokenizer import Tokenizer
+from outlines_core.models.tokenizer import Tokenizer
if TYPE_CHECKING:
import torch
diff --git a/outlines/py.typed b/src/outlines_core/py.typed
similarity index 100%
rename from outlines/py.typed
rename to src/outlines_core/py.typed
diff --git a/tests/fsm/test_fsm.py b/tests/fsm/test_fsm.py
index 94166fd9..bb074b0b 100644
--- a/tests/fsm/test_fsm.py
+++ b/tests/fsm/test_fsm.py
@@ -1,6 +1,6 @@
import pytest
-from outlines.fsm.fsm import RegexFSM, StopAtEosFSM
+from outlines_core.fsm.fsm import RegexFSM, StopAtEosFSM
def assert_expected_tensor_ids(tensor, ids):
diff --git a/tests/fsm/test_guide.py b/tests/fsm/test_guide.py
index 9a66bc04..c48b1ad9 100644
--- a/tests/fsm/test_guide.py
+++ b/tests/fsm/test_guide.py
@@ -1,6 +1,6 @@
import pytest
-from outlines.fsm.guide import Generate, RegexGuide, StopAtEOSGuide, Write
+from outlines_core.fsm.guide import Generate, RegexGuide, StopAtEOSGuide, Write
def assert_expected_tensor_ids(tensor, ids):
diff --git a/tests/fsm/test_json_schema.py b/tests/fsm/test_json_schema.py
index 21571da8..12b26912 100644
--- a/tests/fsm/test_json_schema.py
+++ b/tests/fsm/test_json_schema.py
@@ -6,7 +6,7 @@
import pytest
from pydantic import BaseModel, Field, constr
-from outlines.fsm.json_schema import (
+from outlines_core.fsm.json_schema import (
BOOLEAN,
DATE,
DATE_TIME,
diff --git a/tests/fsm/test_regex.py b/tests/fsm/test_regex.py
index fa72ad0d..ef424156 100644
--- a/tests/fsm/test_regex.py
+++ b/tests/fsm/test_regex.py
@@ -4,7 +4,7 @@
import pytest
from transformers import AutoTokenizer
-from outlines.fsm.regex import (
+from outlines_core.fsm.regex import (
_walk_fsm,
create_fsm_index_end_to_end,
create_fsm_index_tokenizer,
@@ -18,8 +18,8 @@
reduced_vocabulary,
walk_fsm,
)
-from outlines.integrations.utils import adapt_tokenizer
-from outlines.models.transformers import TransformerTokenizer
+from outlines_core.integrations.utils import adapt_tokenizer
+from outlines_core.models.transformers import TransformerTokenizer
def identity(s):
@@ -534,7 +534,7 @@ def test_json_index_performance():
from line_profiler import LineProfiler # type: ignore [import]
from pydantic import BaseModel, constr
- import outlines
+ import outlines_core
class Weapon(str, Enum):
sword = "sword"
@@ -558,16 +558,16 @@ class Character(BaseModel):
# TODO: Add support for conint
strength: int # conint(int, ge=0, le=100)
- model = outlines.models.transformers("gpt2", device="cuda")
+ model = outlines_core.models.transformers("gpt2", device="cuda")
json_schema = json.dumps(Character.model_json_schema())
def build_regex():
- regex_str = outlines.index.json_schema.build_regex_from_object(json_schema)
- outlines.generate.regex(model, regex_str)
+ regex_str = outlines_core.index.json_schema.build_regex_from_object(json_schema)
+ outlines_core.generate.regex(model, regex_str)
profiler = LineProfiler(create_fsm_index_end_to_end)
profiler.add_function(create_fsm_index_tokenizer)
- profiler.add_function(outlines.index.index.RegexFSM.__init__)
+ profiler.add_function(outlines_core.index.index.RegexFSM.__init__)
profiler.runctx(
"build_regex()",
@@ -665,7 +665,7 @@ def convert_token_to_string(self, token):
def test_numba_leading_null_byte_UnicodeCharSeq_remains_broken():
"""Assert numba UnicodeCharSeq w/ leading \x00 is still broken"""
# EXPLANATION:
- # https://github.com/outlines-dev/outlines/pull/930#issuecomment-2143535968
+ # https://github.com/outlines-dev/outlines/pull/930#issuecomment-2143535968
# from https://github.com/numba/numba/issues/9542
d = numba.typed.typeddict.Dict.empty(numba.types.UnicodeCharSeq(1), numba.int64)
@@ -683,7 +683,7 @@ def test_numba_leading_null_byte_UnicodeCharSeq_remains_broken():
def test_numba_leading_null_byte_unicode_type_sane(input_key):
"""Assert numba unicode_type w/ leading \x00 is working"""
# EXPLANATION:
- # https://github.com/outlines-dev/outlines/pull/930#issuecomment-2143535968
+ # https://github.com/outlines-dev/outlines/pull/930#issuecomment-2143535968
# from https://github.com/numba/numba/issues/9542
d = numba.typed.typeddict.Dict.empty(numba.types.unicode_type, numba.int64)
diff --git a/tests/fsm/test_types.py b/tests/fsm/test_types.py
index d5450434..2102db92 100644
--- a/tests/fsm/test_types.py
+++ b/tests/fsm/test_types.py
@@ -2,7 +2,7 @@
import pytest
-from outlines.fsm.types import (
+from outlines_core.fsm.types import (
BOOLEAN,
DATE,
DATETIME,
diff --git a/tests/models/test_tokenizer.py b/tests/models/test_tokenizer.py
index 831f7fe3..95e9cc8f 100644
--- a/tests/models/test_tokenizer.py
+++ b/tests/models/test_tokenizer.py
@@ -1,6 +1,6 @@
import pytest
-from outlines.models.tokenizer import Tokenizer
+from outlines_core.models.tokenizer import Tokenizer
def test_tokenizer():
diff --git a/tests/models/test_transformers.py b/tests/models/test_transformers.py
index 1404d287..8ac8d466 100644
--- a/tests/models/test_transformers.py
+++ b/tests/models/test_transformers.py
@@ -3,7 +3,7 @@
from transformers import AutoTokenizer
from transformers.models.gpt2 import GPT2TokenizerFast
-from outlines.models.transformers import TransformerTokenizer, transformers
+from outlines_core.models.transformers import TransformerTokenizer, transformers
TEST_MODEL = "hf-internal-testing/tiny-random-GPTJForCausalLM"
diff --git a/tests/test_cache.py b/tests/test_cache.py
index eb4ec406..766d97ad 100644
--- a/tests/test_cache.py
+++ b/tests/test_cache.py
@@ -32,20 +32,20 @@ def test_cache(refresh_environment):
"""Initialize a temporary cache and delete it after the test has run."""
with tempfile.TemporaryDirectory() as tempdir:
os.environ["OUTLINES_CACHE_DIR"] = tempdir
- import outlines
+ import outlines_core
- memory = outlines.get_cache()
+ memory = outlines_core.get_cache()
assert memory.directory == tempdir
- yield outlines.caching.cache()
+ yield outlines_core.caching.cache()
memory.clear()
def test_get_cache(test_cache):
- import outlines
+ import outlines_core
- memory = outlines.get_cache()
+ memory = outlines_core.get_cache()
assert isinstance(memory, diskcache.Cache)
# If the cache is enabled then the size
@@ -70,9 +70,9 @@ def f(x):
def test_disable_cache(test_cache):
"""Make sure that we can disable the cache."""
- import outlines
+ import outlines_core
- outlines.disable_cache()
+ outlines_core.disable_cache()
# If the cache is disabled then the size
# of `store` should increase every time
@@ -92,7 +92,7 @@ def f(x):
def test_clear_cache(test_cache):
"""Make sure that we can clear the cache."""
- import outlines
+ import outlines_core
store = list()
@@ -110,7 +110,7 @@ def f(x):
# The size of `store` should increase if we call `f`
# after clearing the cache.
- outlines.clear_cache()
+ outlines_core.clear_cache()
f(1)
assert len(store) == store_size + 1
@@ -118,14 +118,14 @@ def f(x):
def test_version_upgrade_cache_invalidate(test_cache, mocker):
"""Ensure we can change the signature of a cached function if we upgrade the version"""
- import outlines.caching
+ import outlines_core.caching
def simulate_restart_outlines():
# clearing in-memory lru_cache which returns the diskcache in
# order to simulate a reload, we're not clearing the diskcache itself
- outlines.caching.get_cache.cache_clear()
+ outlines_core.caching.get_cache.cache_clear()
- mocker.patch("outlines._version.__version__", new="0.0.0")
+ mocker.patch("outlines_core._version.__version__", new="0.0.0")
simulate_restart_outlines()
# initialize cache with signature of Tuple-of-3
@@ -148,7 +148,7 @@ def foo():
a, b = foo()
# "restart" outlines WITH version upgrade
- mocker.patch("outlines._version.__version__", new="0.0.1")
+ mocker.patch("outlines_core._version.__version__", new="0.0.1")
simulate_restart_outlines()
# change signature to Tuple-of-2
@@ -163,7 +163,7 @@ def foo():
def test_cache_disabled_decorator(test_cache):
"""Ensure cache can be disabled in a local scope"""
- from outlines.caching import cache_disabled
+ from outlines_core.caching import cache_disabled
mock = unittest.mock.MagicMock()