diff --git a/.github/workflows/llm-docs.yml b/.github/workflows/llm-docs.yml
index 7ae396e7f7..c49671ef68 100644
--- a/.github/workflows/llm-docs.yml
+++ b/.github/workflows/llm-docs.yml
@@ -13,13 +13,11 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v3
- - name: Set up Node.js
- uses: actions/setup-node@v3
- with:
- node-version: '22'
-
- name: Compile llms.txt
- run: npx --yes sitefetch https://cog.run -o docs/llms.txt --concurrency 10
+ run: |
+ # Concatenate all the markdown docs, minus everything after "## Contributors" in README.md, and write to docs/llms.txt
+ (sed '/## Contributors/q' README.md; for file in docs/*.md; do echo -e "\n\n\n\n\n---\n\n\n\n\n"; cat "$file"; done) > docs/llms.txt
+
- name: Check for changes
run: |
diff --git a/docs/llms.txt b/docs/llms.txt
index afca7b4647..f7f8a1b5c4 100644
--- a/docs/llms.txt
+++ b/docs/llms.txt
@@ -1,295 +1,277 @@
-
- Cog: Containers for machine learning#
- https://cog.run
- [](https://github.com/replicate/cog/edit/main/docs/README.md "Edit this page")
+# Cog: Containers for machine learning
Cog is an open-source tool that lets you package machine learning models in a standard, production-ready container.
You can deploy your packaged model to your own infrastructure, or to [Replicate](https://replicate.com/).
-Highlights[#](#highlights "Permanent link")
--------------------------------------------
-
-* π¦ **Docker containers without the pain.** Writing your own `Dockerfile` can be a bewildering process. With Cog, you define your environment with a [simple configuration file](#how-it-works) and it generates a Docker image with all the best practices: Nvidia base images, efficient caching of dependencies, installing specific Python versions, sensible environment variable defaults, and so on.
-
-* π€¬οΈ **No more CUDA hell.** Cog knows which CUDA/cuDNN/PyTorch/Tensorflow/Python combos are compatible and will set it all up correctly for you.
-
-* β **Define the inputs and outputs for your model with standard Python.** Then, Cog generates an OpenAPI schema and validates the inputs and outputs with Pydantic.
-
-* π **Automatic HTTP prediction server**: Your model's types are used to dynamically generate a RESTful HTTP API using [FastAPI](https://fastapi.tiangolo.com/).
-
-* π₯ **Automatic queue worker.** Long-running deep learning models or batch processing is best architected with a queue. Cog models do this out of the box. Redis is currently supported, with more in the pipeline.
-
-* βοΈ **Cloud storage.** Files can be read and written directly to Amazon S3 and Google Cloud Storage. (Coming soon.)
-
-* π **Ready for production.** Deploy your model anywhere that Docker images run. Your own infrastructure, or [Replicate](https://replicate.com/).
-
-
-How it works[#](#how-it-works "Permanent link")
------------------------------------------------
+## Highlights
-Define the Docker environment your model runs in with `cog.yaml`:
-
-`build: gpu: true system_packages: - "libgl1-mesa-glx" - "libglib2.0-0" python_version: "3.12" python_packages: - "torch==2.3" predict: "predict.py:Predictor"`
+- 📦 **Docker containers without the pain.** Writing your own `Dockerfile` can be a bewildering process. With Cog, you define your environment with a [simple configuration file](#how-it-works) and it generates a Docker image with all the best practices: Nvidia base images, efficient caching of dependencies, installing specific Python versions, sensible environment variable defaults, and so on.
-Define how predictions are run on your model with `predict.py`:
+- 🤬 **No more CUDA hell.** Cog knows which CUDA/cuDNN/PyTorch/Tensorflow/Python combos are compatible and will set it all up correctly for you.
-`from cog import BasePredictor, Input, Path import torch class Predictor(BasePredictor): def setup(self): """Load the model into memory to make running multiple predictions efficient""" self.model = torch.load("./weights.pth") # The arguments and types the model takes as input def predict(self, image: Path = Input(description="Grayscale input image") ) -> Path: """Run a single prediction on the model""" processed_image = preprocess(image) output = self.model(processed_image) return postprocess(output)`
+- ✅ **Define the inputs and outputs for your model with standard Python.** Then, Cog generates an OpenAPI schema and validates the inputs and outputs with Pydantic.
-Now, you can run predictions on this model:
+- 🚀 **Automatic HTTP prediction server**: Your model's types are used to dynamically generate a RESTful HTTP API using [FastAPI](https://fastapi.tiangolo.com/).
-`$ cog predict -i image=@input.jpg --> Building Docker image... --> Running Prediction... --> Output written to output.jpg`
+- 🥞 **Automatic queue worker.** Long-running deep learning models or batch processing is best architected with a queue. Cog models do this out of the box. Redis is currently supported, with more in the pipeline.
-Or, build a Docker image for deployment:
+- ☁️ **Cloud storage.** Files can be read and written directly to Amazon S3 and Google Cloud Storage. (Coming soon.)
-`$ cog build -t my-colorization-model --> Building Docker image... --> Built my-colorization-model:latest $ docker run -d -p 5000:5000 --gpus all my-colorization-model $ curl http://localhost:5000/predictions -X POST \ -H 'Content-Type: application/json' \ -d '{"input": {"image": "https://.../input.jpg"}}'`
+- 🚢 **Ready for production.** Deploy your model anywhere that Docker images run. Your own infrastructure, or [Replicate](https://replicate.com).
-Or, combine build and run via the `serve` command:
+## How it works
-`$ cog serve -p 8080 $ curl http://localhost:8080/predictions -X POST \ -H 'Content-Type: application/json' \ -d '{"input": {"image": "https://.../input.jpg"}}'`
-
-Why are we building this?[#](#why-are-we-building-this "Permanent link")
-------------------------------------------------------------------------
-
-It's really hard for researchers to ship machine learning models to production.
-
-Part of the solution is Docker, but it is so complex to get it to work: Dockerfiles, pre-/post-processing, Flask servers, CUDA versions. More often than not the researcher has to sit down with an engineer to get the damn thing deployed.
-
-[Andreas](https://github.com/andreasjansson) and [Ben](https://github.com/bfirsh) created Cog. Andreas used to work at Spotify, where he built tools for building and deploying ML models with Docker. Ben worked at Docker, where he created [Docker Compose](https://github.com/docker/compose).
-
-We realized that, in addition to Spotify, other companies were also using Docker to build and deploy machine learning models. [Uber](https://eng.uber.com/michelangelo-pyml/) and others have built similar systems. So, we're making an open source version so other people can do this too.
-
-Hit us up if you're interested in using it or want to collaborate with us. [We're on Discord](https://discord.gg/replicate) or email us at [\[emailΒ protected\]](https://cog.run/cdn-cgi/l/email-protection#d4a0b1b5b994a6b1a4b8bdb7b5a0b1fab7bbb9).
-
-Prerequisites[#](#prerequisites "Permanent link")
--------------------------------------------------
-
-* **macOS, Linux or Windows 11**. Cog works on macOS, Linux and Windows 11 with [WSL 2](https://cog.run/wsl2/wsl2/)
-* **Docker**. Cog uses Docker to create a container for your model. You'll need to [install Docker](https://docs.docker.com/get-docker/) before you can run Cog. If you install Docker Engine instead of Docker Desktop, you will need to [install Buildx](https://docs.docker.com/build/architecture/#buildx) as well.
-
-Install[#](#install "Permanent link")
--------------------------------------
-
-If you're using macOS, you can install Cog using Homebrew:
-
-You can also download and install the latest release using our [install script](https://cog.run/install):
-
-`# fish shell sh (curl -fsSL https://cog.run/install.sh | psub) # bash, zsh, and other shells sh <(curl -fsSL https://cog.run/install.sh) # download with wget and run in a separate command wget -qO- https://cog.run/install.sh sh ./install.sh`
-
-You can manually install the latest release of Cog directly from GitHub by running the following commands in a terminal:
-
-`sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)" sudo chmod +x /usr/local/bin/cog`
-
-Alternatively, you can build Cog from source and install it with these commands:
-
-Or if you are on docker:
-
-`RUN sh -c "INSTALL_DIR=\"/usr/local/bin\" SUDO=\"\" $(curl -fsSL https://cog.run/install.sh)"`
-
-Upgrade[#](#upgrade "Permanent link")
--------------------------------------
-
-If you're using macOS and you previously installed Cog with Homebrew, run the following:
-
-Otherwise, you can upgrade to the latest version by running the same commands you used to install it.
-
-Next steps[#](#next-steps "Permanent link")
--------------------------------------------
-
-* [Get started with an example model](https://cog.run/getting-started/)
-* [Get started with your own model](https://cog.run/getting-started-own-model/)
-* [Using Cog with notebooks](https://cog.run/notebooks/)
-* [Using Cog with Windows 11](https://cog.run/wsl2/wsl2/)
-* [Take a look at some examples of using Cog](https://github.com/replicate/cog-examples)
-* [Deploy models with Cog](https://cog.run/deploy/)
-* [`cog.yaml` reference](https://cog.run/yaml/) to learn how to define your model's environment
-* [Prediction interface reference](https://cog.run/python/) to learn how the `Predictor` interface works
-* [Training interface reference](https://cog.run/training/) to learn how to add a fine-tuning API to your model
-* [HTTP API reference](https://cog.run/http/) to learn how to use the HTTP API that models serve
+Define the Docker environment your model runs in with `cog.yaml`:
-Need help?[#](#need-help "Permanent link")
-------------------------------------------
+```yaml
+build:
+ gpu: true
+ system_packages:
+ - "libgl1-mesa-glx"
+ - "libglib2.0-0"
+ python_version: "3.12"
+ python_packages:
+ - "torch==2.3"
+predict: "predict.py:Predictor"
+```
-[Join us in #cog on Discord.](https://discord.gg/replicate)
-
-Contributors β¨[#](#contributors "Permanent link")
--------------------------------------------------
+Define how predictions are run on your model with `predict.py`:
-Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/en/emoji-key)):
+```python
+from cog import BasePredictor, Input, Path
+import torch
+
+class Predictor(BasePredictor):
+ def setup(self):
+ """Load the model into memory to make running multiple predictions efficient"""
+ self.model = torch.load("./weights.pth")
+
+ # The arguments and types the model takes as input
+ def predict(self,
+ image: Path = Input(description="Grayscale input image")
+ ) -> Path:
+ """Run a single prediction on the model"""
+ processed_image = preprocess(image)
+ output = self.model(processed_image)
+ return postprocess(output)
+```
-
+Now, you can run predictions on this model:
-This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!
-
+```console
+$ cog predict -i image=@input.jpg
+--> Building Docker image...
+--> Running Prediction...
+--> Output written to output.jpg
+```
-
- Getting Started - Cog
- https://cog.run/getting-started/
- [](https://github.com/replicate/cog/edit/main/docs/getting-started.md "Edit this page")
+Or, build a Docker image for deployment:
-This guide will walk you through what you can do with Cog by using an example model.
+```console
+$ cog build -t my-colorization-model
+--> Building Docker image...
+--> Built my-colorization-model:latest
-Prerequisites[#](#prerequisites "Permanent link")
--------------------------------------------------
+$ docker run -d -p 5000:5000 --gpus all my-colorization-model
-* **macOS or Linux**. Cog works on macOS and Linux, but does not currently support Windows.
-* **Docker**. Cog uses Docker to create a container for your model. You'll need to [install Docker](https://docs.docker.com/get-docker/) before you can run Cog.
+$ curl http://localhost:5000/predictions -X POST \
+ -H 'Content-Type: application/json' \
+ -d '{"input": {"image": "https://.../input.jpg"}}'
+```
-Install Cog[#](#install-cog "Permanent link")
----------------------------------------------
+Or, combine build and run via the `serve` command:
-First, install Cog:
+```console
+$ cog serve -p 8080
-``sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m` sudo chmod +x /usr/local/bin/cog``
+$ curl http://localhost:8080/predictions -X POST \
+ -H 'Content-Type: application/json' \
+ -d '{"input": {"image": "https://.../input.jpg"}}'
+```
-Create a project[#](#create-a-project "Permanent link")
--------------------------------------------------------
+
-The first thing you need to do is create a file called `cog.yaml`:
+## Why are we building this?
-`build: python_version: "3.11"`
+It's really hard for researchers to ship machine learning models to production.
-Then, you can run any command inside this environment. For example, enter
+Part of the solution is Docker, but it is so complex to get it to work: Dockerfiles, pre-/post-processing, Flask servers, CUDA versions. More often than not the researcher has to sit down with an engineer to get the damn thing deployed.
-and you'll get an interactive Python shell:
+[Andreas](https://github.com/andreasjansson) and [Ben](https://github.com/bfirsh) created Cog. Andreas used to work at Spotify, where he built tools for building and deploying ML models with Docker. Ben worked at Docker, where he created [Docker Compose](https://github.com/docker/compose).
-`β Building Docker image from cog.yaml... Successfully built 8f54020c8981 Running 'python' in Docker with the current directory mounted as a volume... βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ Python 3.11.1 (main, Jan 27 2023, 10:52:46) [GCC 9.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>>`
+We realized that, in addition to Spotify, other companies were also using Docker to build and deploy machine learning models. [Uber](https://eng.uber.com/michelangelo-pyml/) and others have built similar systems. So, we're making an open source version so other people can do this too.
-(Hit Ctrl-D to exit the Python shell.)
+Hit us up if you're interested in using it or want to collaborate with us. [We're on Discord](https://discord.gg/replicate) or email us at [team@replicate.com](mailto:team@replicate.com).
-Inside this Docker environment you can do anything βΒ run a Jupyter notebook, your training script, your evaluation script, and so on.
+## Prerequisites
-Run predictions on a model[#](#run-predictions-on-a-model "Permanent link")
----------------------------------------------------------------------------
+- **macOS, Linux or Windows 11**. Cog works on macOS, Linux and Windows 11 with [WSL 2](docs/wsl2/wsl2.md)
+- **Docker**. Cog uses Docker to create a container for your model. You'll need to [install Docker](https://docs.docker.com/get-docker/) before you can run Cog. If you install Docker Engine instead of Docker Desktop, you will need to [install Buildx](https://docs.docker.com/build/architecture/#buildx) as well.
-Let's pretend we've trained a model. With Cog, we can define how to run predictions on it in a standard way, so other people can easily run predictions on it without having to hunt around for a prediction script.
+## Install
-First, run this to get some pre-trained model weights:
+If you're using macOS, you can install Cog using Homebrew:
-`WEIGHTS_URL=https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels.h5 curl -O $WEIGHTS_URL`
+```console
+brew install cog
+```
-Then, we need to write some code to describe how predictions are run on the model.
+You can also download and install the latest release using our
+[install script](https://cog.run/install):
-Save this to `predict.py`:
+```sh
+# fish shell
+sh (curl -fsSL https://cog.run/install.sh | psub)
-`from typing import Any from cog import BasePredictor, Input, Path from tensorflow.keras.applications.resnet50 import ResNet50 from tensorflow.keras.preprocessing import image as keras_image from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions import numpy as np class Predictor(BasePredictor): def setup(self): """Load the model into memory to make running multiple predictions efficient""" self.model = ResNet50(weights='resnet50_weights_tf_dim_ordering_tf_kernels.h5') # Define the arguments and types the model takes as input def predict(self, image: Path = Input(description="Image to classify")) -> Any: """Run a single prediction on the model""" # Preprocess the image img = keras_image.load_img(image, target_size=(224, 224)) x = keras_image.img_to_array(img) x = np.expand_dims(x, axis=0) x = preprocess_input(x) # Run the prediction preds = self.model.predict(x) # Return the top 3 predictions return decode_predictions(preds, top=3)[0]`
+# bash, zsh, and other shells
+sh <(curl -fsSL https://cog.run/install.sh)
-We also need to point Cog at this, and tell it what Python dependencies to install. Update `cog.yaml` to look like this:
+# download with wget and run in a separate command
+wget -qO- https://cog.run/install.sh
+sh ./install.sh
+```
-`build: python_version: "3.11" python_packages: - pillow==9.5.0 - tensorflow==2.12.0 predict: "predict.py:Predictor"`
+You can manually install the latest release of Cog directly from GitHub
+by running the following commands in a terminal:
-Let's grab an image to test the model with:
+```console
+sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
+sudo chmod +x /usr/local/bin/cog
+```
-`IMAGE_URL=https://gist.githubusercontent.com/bfirsh/3c2115692682ae260932a67d93fd94a8/raw/56b19f53f7643bb6c0b822c410c366c3a6244de2/mystery.jpg curl $IMAGE_URL > input.jpg`
+Alternatively, you can build Cog from source and install it with these commands:
-Now, let's run the model using Cog:
+```console
+make
+sudo make install
+```
-`cog predict -i image=@input.jpg`
+Or if you are on docker:
-If you see the following output
+```
+RUN sh -c "INSTALL_DIR=\"/usr/local/bin\" SUDO=\"\" $(curl -fsSL https://cog.run/install.sh)"
+```
-`[ [ "n02123159", "tiger_cat", 0.4874822497367859 ], [ "n02123045", "tabby", 0.23169134557247162 ], [ "n02124075", "Egyptian_cat", 0.09728282690048218 ] ]`
+## Upgrade
-then it worked!
+If you're using macOS and you previously installed Cog with Homebrew, run the following:
-Note: The first time you run `cog predict`, the build process will be triggered to generate a Docker container that can run your model. The next time you run `cog predict` the pre-built container will be used.
+```console
+brew upgrade cog
+```
-Build an image[#](#build-an-image "Permanent link")
----------------------------------------------------
+Otherwise, you can upgrade to the latest version by running the same commands you used to install it.
-We can bake your model's code, the trained weights, and the Docker environment into a Docker image. This image serves predictions with an HTTP server, and can be deployed to anywhere that Docker runs to serve real-time predictions.
+## Next steps
-`cog build -t resnet # Building Docker image... # Built resnet:latest`
+- [Get started with an example model](docs/getting-started.md)
+- [Get started with your own model](docs/getting-started-own-model.md)
+- [Using Cog with notebooks](docs/notebooks.md)
+- [Using Cog with Windows 11](docs/wsl2/wsl2.md)
+- [Take a look at some examples of using Cog](https://github.com/replicate/cog-examples)
+- [Deploy models with Cog](docs/deploy.md)
+- [`cog.yaml` reference](docs/yaml.md) to learn how to define your model's environment
+- [Prediction interface reference](docs/python.md) to learn how the `Predictor` interface works
+- [Training interface reference](docs/training.md) to learn how to add a fine-tuning API to your model
+- [HTTP API reference](docs/http.md) to learn how to use the HTTP API that models serve
-Once you've built the image, you can optionally view the generated dockerfile to get a sense of what Cog is doing under the hood:
+## Need help?
-You can run this image with `cog predict` by passing the filename as an argument:
+[Join us in #cog on Discord.](https://discord.gg/replicate)
-`cog predict resnet -i image=@input.jpg`
+## Contributors ✨
-Or, you can run it with Docker directly, and it'll serve an HTTP server:
-`docker run -d --rm -p 5000:5000 resnet`
-We can send inputs directly with `curl`:
-`curl http://localhost:5000/predictions -X POST \ -H 'Content-Type: application/json' \ -d '{"input": {"image": "https://gist.githubusercontent.com/bfirsh/3c2115692682ae260932a67d93fd94a8/raw/56b19f53f7643bb6c0b822c410c366c3a6244de2/mystery.jpg"}}'`
-As a shorthand, you can add the Docker image's name as an extra line in `cog.yaml`:
+---
-`image: "r8.im/replicate/resnet"`
-Once you've done this, you can use `cog push` to build and push the image to a Docker registry:
-`cog push # Building r8.im/replicate/resnet... # Pushing r8.im/replicate/resnet... # Pushed!`
-The Docker image is now accessible to anyone or any system that has access to this Docker registry.
-> **Note** Model repos often contain large data files, like weights and checkpoints. If you put these files in their own subdirectory and run `cog build` with the `--separate-weights` flag, Cog will copy these files into a separate Docker layer, which reduces the time needed to rebuild after making changes to code.
->
-> `# β Yes . βββ checkpoints/ β βββ weights.ckpt βββ predict.py βββ cog.yaml # β No . βββ weights.ckpt # <- Don't put weights in root directory βββ predict.py βββ cog.yaml # β No . βββ checkpoints/ β βββ weights.ckpt β βββ load_weights.py # <- Don't put code in weights directory βββ predict.py βββ cog.yaml`
+# Deploy models with Cog
-Next steps[#](#next-steps "Permanent link")
--------------------------------------------
+Cog containers are Docker containers that serve an HTTP server
+for running predictions on your model.
+You can deploy them anywhere that Docker containers run.
-Those are the basics! Next, you might want to take a look at:
+This guide assumes you have a model packaged with Cog.
+If you don't, [follow our getting started guide](getting-started-own-model.md),
+or use [an example model](https://github.com/replicate/cog-examples).
-* [A guide to help you set up your own model on Cog.](https://cog.run/getting-started-own-model/)
-* [A guide explaining how to deploy a model.](https://cog.run/deploy/)
-* [Reference for `cog.yaml`](https://cog.run/yaml/)
-* [Reference for the Python library](https://cog.run/python/)
-
+## Getting started
-
- Deploy your model - Cog
- https://cog.run/deploy/
- [](https://github.com/replicate/cog/edit/main/docs/deploy.md "Edit this page")
-
-Deploy models with Cog[#](#deploy-models-with-cog "Permanent link")
--------------------------------------------------------------------
-
-Cog containers are Docker containers that serve an HTTP server for running predictions on your model. You can deploy them anywhere that Docker containers run.
+First, build your model:
-This guide assumes you have a model packaged with Cog. If you don't, [follow our getting started guide](https://cog.run/getting-started-own-model/), or use [an example model](https://github.com/replicate/cog-examples).
+```console
+cog build -t my-model
+```
-Getting started[#](#getting-started "Permanent link")
------------------------------------------------------
+Then, start the Docker container:
-First, build your model:
+```shell
+# If your model uses a CPU:
+docker run -d -p 5001:5000 my-model
-Then, start the Docker container:
+# If your model uses a GPU:
+docker run -d -p 5001:5000 --gpus all my-model
-`# If your model uses a CPU: docker run -d -p 5001:5000 my-model # If your model uses a GPU: docker run -d -p 5001:5000 --gpus all my-model # If you're on an M1 Mac: docker run -d -p 5001:5000 --platform=linux/amd64 my-model`
+# If you're on an M1 Mac:
+docker run -d -p 5001:5000 --platform=linux/amd64 my-model
+```
The server is now running locally on port 5001.
-To view the OpenAPI schema, open [localhost:5001/openapi.json](http://localhost:5001/openapi.json) in your browser or use cURL to make a request:
+To view the OpenAPI schema,
+open [localhost:5001/openapi.json](http://localhost:5001/openapi.json)
+in your browser
+or use cURL to make a request:
-`curl http://localhost:5001/openapi.json`
+```console
+curl http://localhost:5001/openapi.json
+```
To stop the server, run:
-To run a prediction on the model, call the `/predictions` endpoint, passing input in the format expected by your model:
+```console
+docker kill my-model
+```
-`curl http://localhost:5001/predictions -X POST \ --header "Content-Type: application/json" \ --data '{"input": {"image": "https://.../input.jpg"}}'`
+To run a prediction on the model,
+call the `/predictions` endpoint,
+passing input in the format expected by your model:
-For more details about the HTTP API, see the [HTTP API reference documentation](https://cog.run/http/).
+```console
+curl http://localhost:5001/predictions -X POST \
+ --header "Content-Type: application/json" \
+ --data '{"input": {"image": "https://.../input.jpg"}}'
+```
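+
+The same request can be made from any HTTP client. As a rough sketch, here is the equivalent call from Python using the third-party `requests` library (an assumption; Cog does not install it for you), with the same placeholder image URL as the curl example above:
+
+```python
+import requests
+
+# POST the model inputs to the running container (same payload as the curl example)
+resp = requests.post(
+    "http://localhost:5001/predictions",
+    json={"input": {"image": "https://.../input.jpg"}},
+)
+resp.raise_for_status()
+
+# The response body contains the prediction result under "output"
+print(resp.json()["output"])
+```
+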
-Options[#](#options "Permanent link")
--------------------------------------
+For more details about the HTTP API,
+see the [HTTP API reference documentation](http.md).
+
+## Options
Cog Docker images have `python -m cog.server.http` set as the default command, which gets overridden if you pass a command to `docker run`. When you use command-line options, you need to pass in the full command before the options.
-### `--threads`[#](#-threads "Permanent link")
+### `--threads`
This controls how many threads are used by Cog, which determines how many requests Cog serves in parallel. If your model uses a CPU, this is the number of CPUs on your machine. If your model uses a GPU, this is 1, because typically a GPU can only be used by one process.
@@ -297,1112 +279,1653 @@ You might need to adjust this if you want to control how much memory your model
For example:
-`docker run -d -p 5000:5000 my-model python -m cog.server.http --threads=10`
+ docker run -d -p 5000:5000 my-model python -m cog.server.http --threads=10
-`--host`[#](#-host "Permanent link")
-------------------------------------
+## `--host`
-By default, Cog serves to `0.0.0.0`. You can override this using the `--host` option.
+By default, Cog serves to `0.0.0.0`.
+You can override this using the `--host` option.
-For example, to serve Cog on an IPv6 address, run:
+For example,
+to serve Cog on an IPv6 address, run:
-`docker run -d -p 5000:5000 my-model python -m cog.server.http --host="::"`
-
+ docker run -d -p 5000:5000 my-model python -m cog.server.http --host="::"
-
- Training API - Cog
- https://cog.run/training/
- [](https://github.com/replicate/cog/edit/main/docs/training.md "Edit this page")
-Training interface reference[#](#training-interface-reference "Permanent link")
--------------------------------------------------------------------------------
-> \[!NOTE\]
-> The training API is still experimental, and is subject to change.
-Cog's training API allows you to define a fine-tuning interface for an existing Cog model, so users of the model can bring their own training data to create derivative fune-tuned models. Real-world examples of this API in use include [fine-tuning SDXL with images](https://replicate.com/blog/fine-tune-sdxl) or [fine-tuning Llama 2 with structured text](https://replicate.com/blog/fine-tune-llama-2).
-How it works[#](#how-it-works "Permanent link")
------------------------------------------------
+---
-If you've used Cog before, you've probably seen the [Predictor](https://cog.run/python/) class, which defines the interface for creating predictions against your model. Cog's training API works similarly: You define a Python function that describes the inputs and outputs of the training process. The inputs are things like training data, epochs, batch size, seed, etc. The output is typically a file with the fine-tuned weights.
-`cog.yaml`:
-`build: python_version: "3.10" train: "train.py:train"`
-`train.py`:
-`from cog import BasePredictor, File import io def train(param: str) -> File: return io.StringIO("hello " + param)`
+# Environment variables
-Then you can run it like this:
-
-`$ cog train -i param=train ... $ cat weights hello train`
-
-`Input(**kwargs)`[#](#inputkwargs "Permanent link")
----------------------------------------------------
-
-Use Cog's `Input()` function to define each of the parameters in your `train()` function:
+This guide lists the environment variables that change how Cog functions.
-`from cog import Input, Path def train( train_data: Path = Input(description="HTTPS URL of a file containing training data"), learning_rate: float = Input(description="learning rate, for learning!", default=1e-4, ge=0), seed: int = Input(description="random seed to use for training", default=None) ) -> str: return "hello, weights"`
+### `COG_NO_UPDATE_CHECK`
-The `Input()` function takes these keyword arguments:
+By default, Cog automatically checks for updates
+and notifies you if there is a new version available.
-* `description`: A description of what to pass to this input for users of the model.
-* `default`: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set to `None`, the input is optional.
-* `ge`: For `int` or `float` types, the value must be greater than or equal to this number.
-* `le`: For `int` or `float` types, the value must be less than or equal to this number.
-* `min_length`: For `str` types, the minimum length of the string.
-* `max_length`: For `str` types, the maximum length of the string.
-* `regex`: For `str` types, the string must match this regular expression.
-* `choices`: For `str` or `int` types, a list of possible values for this input.
+To disable this behavior,
+set the `COG_NO_UPDATE_CHECK` environment variable to any value.
-Each parameter of the `train()` function must be annotated with a type like `str`, `int`, `float`, `bool`, etc. See [Input and output types](https://cog.run/python/#input-and-output-types) for the full list of supported types.
+```console
+$ COG_NO_UPDATE_CHECK=1 cog build # runs without automatic update check
+```
-Using the `Input` function provides better documentation and validation constraints to the users of your model, but it is not strictly required. You can also specify default values for your parameters using plain Python, or omit default assignment entirely:
-`def predict(self, training_data: str = "foo bar", # this is valid iterations: int # also valid ) -> str: # ...`
-Training Output[#](#training-output "Permanent link")
------------------------------------------------------
-Training output is typically a binary weights file. To return a custom output object or a complex object with multiple values, define a `TrainingOutput` object with multiple fields to return from your `train()` function, and specify it as the return type for the train function using Python's `->` return type annotation:
-`from cog import BaseModel, Input, Path class TrainingOutput(BaseModel): weights: Path def train( train_data: Path = Input(description="HTTPS URL of a file containing training data"), learning_rate: float = Input(description="learning rate, for learning!", default=1e-4, ge=0), seed: int = Input(description="random seed to use for training", default=42) ) -> TrainingOutput: weights_file = generate_weights("...") return TrainingOutput(weights=Path(weights_file))`
+---
-Testing[#](#testing "Permanent link")
--------------------------------------
-If you are doing development of a Cog model like Llama or SDXL, you can test that the fine-tuned code path works before pushing by specifying a `COG_WEIGHTS` environment variable when running `predict`:
-`cog predict -e COG_WEIGHTS=https://replicate.delivery/pbxt/xyz/weights.tar -i prompt="a photo of TOK"`
-
-
- Using your own model - Cog
- https://cog.run/getting-started-own-model/
- [](https://github.com/replicate/cog/edit/main/docs/getting-started-own-model.md "Edit this page")
-Getting started with your own model[#](#getting-started-with-your-own-model "Permanent link")
----------------------------------------------------------------------------------------------
+# Getting started with your own model
-This guide will show you how to put your own machine learning model in a Docker image using Cog. If you haven't got a model to try out, you'll want to follow the [main getting started guide](https://cog.run/getting-started/).
+This guide will show you how to put your own machine learning model in a Docker image using Cog. If you haven't got a model to try out, you'll want to follow the [main getting started guide](getting-started.md).
-Prerequisites[#](#prerequisites "Permanent link")
--------------------------------------------------
+## Prerequisites
-* **macOS or Linux**. Cog works on macOS and Linux, but does not currently support Windows.
-* **Docker**. Cog uses Docker to create a container for your model. You'll need to [install Docker](https://docs.docker.com/get-docker/) before you can run Cog.
+- **macOS or Linux**. Cog works on macOS and Linux, but does not currently support Windows.
+- **Docker**. Cog uses Docker to create a container for your model. You'll need to [install Docker](https://docs.docker.com/get-docker/) before you can run Cog.
-Initialization[#](#initialization "Permanent link")
----------------------------------------------------
+## Initialization
First, install Cog if you haven't already:
-``sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m` sudo chmod +x /usr/local/bin/cog``
+```sh
+sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
+sudo chmod +x /usr/local/bin/cog
+```
To configure your project for use with Cog, you'll need to add two files:
-* [`cog.yaml`](https://cog.run/yaml/) defines system requirements, Python package dependencies, etc
-* [`predict.py`](https://cog.run/python/) describes the prediction interface for your model
+- [`cog.yaml`](yaml.md) defines system requirements, Python package dependencies, etc
+- [`predict.py`](python.md) describes the prediction interface for your model
Use the `cog init` command to generate these files in your project:
-`$ cd path/to/your/model $ cog init`
+```sh
+$ cd path/to/your/model
+$ cog init
+```
-Define the Docker environment[#](#define-the-docker-environment "Permanent link")
----------------------------------------------------------------------------------
+## Define the Docker environment
The `cog.yaml` file defines all the different things that need to be installed for your model to run. You can think of it as a simple way of defining a Docker image.
For example:
-`build: python_version: "3.11" python_packages: - "torch==2.0.1"`
+```yaml
+build:
+ python_version: "3.11"
+ python_packages:
+ - "torch==2.0.1"
+```
This will generate a Docker image with Python 3.11 and PyTorch 2 installed, for both CPU and GPU, with the correct version of CUDA, and various other sensible best-practices.
To run a command inside this environment, prefix it with `cog run`:
-`$ cog run python β Building Docker image from cog.yaml... Successfully built 8f54020c8981 Running 'python' in Docker with the current directory mounted as a volume... ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ Python 3.11.1 (main, Jan 27 2023, 10:52:46) [GCC 9.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>>`
+```
+$ cog run python
+✅ Building Docker image from cog.yaml... Successfully built 8f54020c8981
+Running 'python' in Docker with the current directory mounted as a volume...
+────────────────────────────────────────────────────────────────────────────────────────
+
+Python 3.11.1 (main, Jan 27 2023, 10:52:46)
+[GCC 9.3.0] on linux
+Type "help", "copyright", "credits" or "license" for more information.
+>>>
+```
This is handy for ensuring a consistent environment for development or training.
-With `cog.yaml`, you can also install system packages and other things. [Take a look at the full reference to see what else you can do.](https://cog.run/yaml/)
+With `cog.yaml`, you can also install system packages and other things. [Take a look at the full reference to see what else you can do.](yaml.md)
-Define how to run predictions[#](#define-how-to-run-predictions "Permanent link")
----------------------------------------------------------------------------------
+## Define how to run predictions
The next step is to update `predict.py` to define the interface for running predictions on your model. The `predict.py` generated by `cog init` looks something like this:
-`from cog import BasePredictor, Path, Input import torch class Predictor(BasePredictor): def setup(self): """Load the model into memory to make running multiple predictions efficient""" self.net = torch.load("weights.pth") def predict(self, image: Path = Input(description="Image to enlarge"), scale: float = Input(description="Factor to scale image by", default=1.5) ) -> Path: """Run a single prediction on the model""" # ... pre-processing ... output = self.net(input) # ... post-processing ... return output`
+```python
+from cog import BasePredictor, Path, Input
+import torch
+
+class Predictor(BasePredictor):
+ def setup(self):
+ """Load the model into memory to make running multiple predictions efficient"""
+ self.net = torch.load("weights.pth")
+
+ def predict(self,
+ image: Path = Input(description="Image to enlarge"),
+ scale: float = Input(description="Factor to scale image by", default=1.5)
+ ) -> Path:
+ """Run a single prediction on the model"""
+ # ... pre-processing ...
+ output = self.net(input)
+ # ... post-processing ...
+ return output
+```
Edit your `predict.py` file and fill in the functions with your own model's setup and prediction code. You might need to import parts of your model from another file.
You also need to define the inputs to your model as arguments to the `predict()` function, as demonstrated above. For each argument, you need to annotate with a type. The supported types are:
-* `str`: a string
-* `int`: an integer
-* `float`: a floating point number
-* `bool`: a boolean
-* `cog.File`: a file-like object representing a file
-* `cog.Path`: a path to a file on disk
+- `str`: a string
+- `int`: an integer
+- `float`: a floating point number
+- `bool`: a boolean
+- `cog.File`: a file-like object representing a file
+- `cog.Path`: a path to a file on disk
You can provide more information about the input with the `Input()` function, as shown above. It takes these basic arguments:
-* `description`: A description of what to pass to this input for users of the model
-* `default`: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set to `None`, the input is optional.
-* `ge`: For `int` or `float` types, the value should be greater than or equal to this number.
-* `le`: For `int` or `float` types, the value should be less than or equal to this number.
-* `choices`: For `str` or `int` types, a list of possible values for this input.
+- `description`: A description of what to pass to this input for users of the model
+- `default`: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set to `None`, the input is optional.
+- `ge`: For `int` or `float` types, the value should be greater than or equal to this number.
+- `le`: For `int` or `float` types, the value should be less than or equal to this number.
+- `choices`: For `str` or `int` types, a list of possible values for this input.
-There are some more advanced options you can pass, too. For more details, [take a look at the prediction interface documentation](https://cog.run/python/).
+There are some more advanced options you can pass, too. For more details, [take a look at the prediction interface documentation](python.md).
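+
+To make these options concrete, here is a minimal sketch of a `predict()` signature that combines several of the types and `Input()` arguments listed above; the parameter names, defaults, and choices are illustrative, not part of any real model:
+
+```python
+from cog import BasePredictor, Input, Path
+
+
+class Predictor(BasePredictor):
+    def predict(
+        self,
+        image: Path = Input(description="Image to process"),
+        scale: float = Input(description="Scale factor", default=1.5, ge=1.0, le=4.0),
+        mode: str = Input(description="Processing mode", default="fast", choices=["fast", "best"]),
+        seed: int = Input(description="Random seed; optional because the default is None", default=None),
+    ) -> Path:
+        """Inputs are validated against the Input() constraints before this method runs"""
+        # Illustrative body: write something to disk and return it as the output file
+        out = Path("output.txt")
+        out.write_text(f"{image} {scale} {mode} {seed}")
+        return out
+```
+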
Next, add the line `predict: "predict.py:Predictor"` to your `cog.yaml`, so it looks something like this:
-`build: python_version: "3.11" python_packages: - "torch==2.0.1" predict: "predict.py:Predictor"`
+```yaml
+build:
+ python_version: "3.11"
+ python_packages:
+ - "torch==2.0.1"
+predict: "predict.py:Predictor"
+```
That's it! To test this works, try running a prediction on the model:
-`$ cog predict -i [[emailΒ protected]](https://cog.run/cdn-cgi/l/email-protection) β Building Docker image from cog.yaml... Successfully built 664ef88bc1f4 β Model running in Docker image 664ef88bc1f4 Written output to output.png`
+```
+$ cog predict -i image=@input.jpg
+✅ Building Docker image from cog.yaml... Successfully built 664ef88bc1f4
+✅ Model running in Docker image 664ef88bc1f4
+
+Written output to output.png
+```
To pass more inputs to the model, you can add more `-i` options:
+```
+$ cog predict -i image=@image.jpg -i scale=2.0
+```
+
In this case it is just a number, not a file, so you don't need the `@` prefix.
-Using GPUs[#](#using-gpus "Permanent link")
--------------------------------------------
+## Using GPUs
To use GPUs with Cog, add the `gpu: true` option to the `build` section of your `cog.yaml`:
+```yaml
+build:
+ gpu: true
+ ...
+```
+
Cog will use the [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) base image and automatically figure out what versions of CUDA and cuDNN to use based on the version of Python, PyTorch, and Tensorflow that you are using.
-For more details, [see the `gpu` section of the `cog.yaml` reference](https://cog.run/yaml/#gpu).
+For more details, [see the `gpu` section of the `cog.yaml` reference](yaml.md#gpu).
-Next steps[#](#next-steps "Permanent link")
--------------------------------------------
+## Next steps
Next, you might want to take a look at:
-* [A guide explaining how to deploy a model.](https://cog.run/deploy/)
-* [The reference for `cog.yaml`](https://cog.run/yaml/)
-* [The reference for the Python library](https://cog.run/python/)
-
+- [A guide explaining how to deploy a model.](deploy.md)
+- [The reference for `cog.yaml`](yaml.md)
+- [The reference for the Python library](python.md)
-
- YAML spec - Cog
- https://cog.run/yaml/
- [](https://github.com/replicate/cog/edit/main/docs/yaml.md "Edit this page")
-`cog.yaml` reference[#](#cogyaml-reference "Permanent link")
-------------------------------------------------------------
-`cog.yaml` defines how to build a Docker image and how to run predictions on your model inside that image.
-It has three keys: [`build`](#build), [`image`](#image), and [`predict`](#predict). It looks a bit like this:
-`build: python_version: "3.11" python_packages: - pytorch==2.0.1 system_packages: - "ffmpeg" - "git" predict: "predict.py:Predictor"`
+---
-Tip: Run [`cog init`](https://cog.run/getting-started-own-model/#initialization) to generate an annotated `cog.yaml` file that can be used as a starting point for setting up your model.
-`build`[#](#build "Permanent link")
------------------------------------
-This stanza describes how to build the Docker image your model runs in. It contains various options within it:
-### `cuda`[#](#cuda "Permanent link")
-Cog automatically picks the correct version of CUDA to install, but this lets you override it for whatever reason by specifying the minor (`11.8`) or patch (`11.8.0`) version of CUDA to use.
+# Getting started
-For example:
+This guide will walk you through what you can do with Cog by using an example model.
-### `gpu`[#](#gpu "Permanent link")
+> [!TIP]
+> Using a language model to help you write the code for your new Cog model?
+>
+> Feed it [https://cog.run/llms.txt](https://cog.run/llms.txt), which has all of Cog's documentation bundled into a single file. To learn more about this format, check out [llmstxt.org](https://llmstxt.org).
-Enable GPUs for this model. When enabled, the [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) base image will be used, and Cog will automatically figure out what versions of CUDA and cuDNN to use based on the version of Python, PyTorch, and Tensorflow that you are using.
+## Prerequisites
-For example:
+- **macOS or Linux**. Cog works on macOS and Linux, but does not currently support Windows.
+- **Docker**. Cog uses Docker to create a container for your model. You'll need to [install Docker](https://docs.docker.com/get-docker/) before you can run Cog.
-When you use `cog run` or `cog predict`, Cog will automatically pass the `--gpus=all` flag to Docker. When you run a Docker image built with Cog, you'll need to pass this option to `docker run`.
+## Install Cog
-### `python_packages`[#](#python_packages "Permanent link")
+First, install Cog:
-A list of Python packages to install from the PyPi package index, in the format `package==version`. For example:
+```bash
+sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
+sudo chmod +x /usr/local/bin/cog
-`build: python_packages: - pillow==8.3.1 - tensorflow==2.5.0`
+```
-To install Git-hosted Python packages, add `git` to the `system_packages` list, then use the `git+https://` syntax to specify the package name. For example:
+## Create a project
-`build: system_packages: - "git" python_packages: - "git+https://github.com/huggingface/transformers"`
+Let's make a directory to work in:
-You can also pin Python package installations to a specific git commit:
+```bash
+mkdir cog-quickstart
+cd cog-quickstart
-`build: system_packages: - "git" python_packages: - "git+https://github.com/huggingface/transformers@2d1602a"`
+```
-Note that you can use a shortened prefix of the 40-character git commit SHA, but you must use at least six characters, like `2d1602a` above.
+## Run commands
-### `python_requirements`[#](#python_requirements "Permanent link")
+The simplest thing you can do with Cog is run a command inside a Docker environment.
-A pip requirements file specifying the Python packages to install. For example:
+The first thing you need to do is create a file called `cog.yaml`:
-`build: python_requirements: requirements.txt`
+```yaml
+build:
+ python_version: "3.11"
+```
-Your `cog.yaml` file can set either `python_packages` or `python_requirements`, but not both. Use `python_requirements` when you need to configure options like `--extra-index-url` or `--trusted-host` to fetch Python package dependencies.
+Then, you can run any command inside this environment. For example, enter
-### `python_version`[#](#python_version "Permanent link")
+```bash
+cog run python
-The minor (`3.11`) or patch (`3.11.1`) version of Python to use. For example:
+```
-`build: python_version: "3.11.1"`
+and you'll get an interactive Python shell:
-Cog supports all active branches of Python: 3.8, 3.9, 3.10, 3.11, 3.12, 3.13. If you don't define a version, Cog will use the latest version of Python 3.12 or a version of Python that is compatible with the versions of PyTorch or TensorFlow you specify.
+```none
+✅ Building Docker image from cog.yaml... Successfully built 8f54020c8981
+Running 'python' in Docker with the current directory mounted as a volume...
+───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
-Note that these are the versions supported **in the Docker container**, not your host machine. You can run any version(s) of Python you wish on your host machine.
+Python 3.11.1 (main, Jan 27 2023, 10:52:46)
+[GCC 9.3.0] on linux
+Type "help", "copyright", "credits" or "license" for more information.
+>>>
+```
-### `run`[#](#run "Permanent link")
+(Hit Ctrl-D to exit the Python shell.)
-A list of setup commands to run in the environmentΒ after your system packages and Python packages have been installed. If you're familiar with Docker, it's like a `RUN` instruction in your `Dockerfile`.
+Inside this Docker environment you can do anything: run a Jupyter notebook, your training script, your evaluation script, and so on.
-For example:
+## Run predictions on a model
-`build: run: - curl -L https://github.com/cowsay-org/cowsay/archive/refs/tags/v3.7.0.tar.gz | tar -xzf - - cd cowsay-3.7.0 && make install`
+Let's pretend we've trained a model. With Cog, we can define how to run predictions on it in a standard way, so other people can easily run predictions on it without having to hunt around for a prediction script.
-Your code is _not_ available to commands in `run`. This is so we can build your image efficiently when running locally.
+First, run this to get some pre-trained model weights:
-Each command in `run` can be either a string or a dictionary in the following format:
+```bash
+WEIGHTS_URL=https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels.h5
+curl -O $WEIGHTS_URL
-`build: run: - command: pip install mounts: - type: secret id: pip target: /etc/pip.conf`
+```
-You can use secret mounts to securely pass credentials to setup commands, without baking them into the image. For more information, see [Dockerfile reference](https://docs.docker.com/engine/reference/builder/#run---mounttypesecret).
+Then, we need to write some code to describe how predictions are run on the model.
-### `system_packages`[#](#system_packages "Permanent link")
+Save this to `predict.py`:
-A list of Ubuntu APT packages to install. For example:
+```python
+from typing import Any
+from cog import BasePredictor, Input, Path
+from tensorflow.keras.applications.resnet50 import ResNet50
+from tensorflow.keras.preprocessing import image as keras_image
+from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
+import numpy as np
+
+
+class Predictor(BasePredictor):
+ def setup(self):
+ """Load the model into memory to make running multiple predictions efficient"""
+ self.model = ResNet50(weights='resnet50_weights_tf_dim_ordering_tf_kernels.h5')
+
+ # Define the arguments and types the model takes as input
+ def predict(self, image: Path = Input(description="Image to classify")) -> Any:
+ """Run a single prediction on the model"""
+ # Preprocess the image
+ img = keras_image.load_img(image, target_size=(224, 224))
+ x = keras_image.img_to_array(img)
+ x = np.expand_dims(x, axis=0)
+ x = preprocess_input(x)
+ # Run the prediction
+ preds = self.model.predict(x)
+ # Return the top 3 predictions
+ return decode_predictions(preds, top=3)[0]
+```
-`build: system_packages: - "ffmpeg" - "libavcodec-dev"`
+We also need to point Cog at this, and tell it what Python dependencies to install. Update `cog.yaml` to look like this:
-`image`[#](#image "Permanent link")
------------------------------------
+```yaml
+build:
+ python_version: "3.11"
+ python_packages:
+ - pillow==9.5.0
+ - tensorflow==2.12.0
+predict: "predict.py:Predictor"
+```
-The name given to built Docker images. If you want to push to a registry, this should also include the registry name.
+Let's grab an image to test the model with:
-For example:
+```bash
+IMAGE_URL=https://gist.githubusercontent.com/bfirsh/3c2115692682ae260932a67d93fd94a8/raw/56b19f53f7643bb6c0b822c410c366c3a6244de2/mystery.jpg
+curl $IMAGE_URL > input.jpg
-`image: "r8.im/your-username/your-model"`
+```
-r8.im is Replicate's registry, but this can be any Docker registry.
+Now, let's run the model using Cog:
-If you don't set this, then a name will be generated from the directory name.
+```bash
+cog predict -i image=@input.jpg
-If you set this, then you can run `cog push` without specifying the model name.
+```
-If you specify an image name argument when pushing (like `cog push your-username/custom-model-name`), the argument will be used and the value of `image` in cog.yaml will be ignored.
+If you see the following output
-`predict`[#](#predict "Permanent link")
----------------------------------------
+```
+[
+ [
+ "n02123159",
+ "tiger_cat",
+ 0.4874822497367859
+ ],
+ [
+ "n02123045",
+ "tabby",
+ 0.23169134557247162
+ ],
+ [
+ "n02124075",
+ "Egyptian_cat",
+ 0.09728282690048218
+ ]
+]
+```
-The pointer to the `Predictor` object in your code, which defines how predictions are run on your model.
+then it worked!
-For example:
+Note: The first time you run `cog predict`, the build process will be triggered to generate a Docker container that can run your model. The next time you run `cog predict` the pre-built container will be used.
-`predict: "predict.py:Predictor"`
+## Build an image
-See [the Python API documentation for more information](https://cog.run/python/).
-
+We can bake your model's code, the trained weights, and the Docker environment into a Docker image. This image serves predictions with an HTTP server, and can be deployed to anywhere that Docker runs to serve real-time predictions.
-
- Prediction API - Cog
- https://cog.run/python/
- [](https://github.com/replicate/cog/edit/main/docs/python.md "Edit this page")
+```bash
+cog build -t resnet
+# Building Docker image...
+# Built resnet:latest
-Prediction interface reference[#](#prediction-interface-reference "Permanent link")
------------------------------------------------------------------------------------
+```
-This document defines the API of the `cog` Python module, which is used to define the interface for running predictions on your model.
+Once you've built the image, you can optionally view the generated dockerfile to get a sense of what Cog is doing under the hood:
-Tip: Run [`cog init`](https://cog.run/getting-started-own-model/#initialization) to generate an annotated `predict.py` file that can be used as a starting point for setting up your model.
-
-Contents[#](#contents "Permanent link")
----------------------------------------
-
-* [Contents](#contents)
-* [`BasePredictor`](#basepredictor)
-* [`Predictor.setup()`](#predictorsetup)
-* [`Predictor.predict(**kwargs)`](#predictorpredictkwargs)
- * [Streaming output](#streaming-output)
-* [`Input(**kwargs)`](#inputkwargs)
-* [Output](#output)
-* [Returning an object](#returning-an-object)
-* [Returning a list](#returning-a-list)
-* [Optional properties](#optional-properties)
-* [Input and output types](#input-and-output-types)
-* [`File()`](#file)
-* [`Path()`](#path)
-* [`Secret`](#secret)
-* [`List`](#list)
-
-`BasePredictor`[#](#basepredictor "Permanent link")
----------------------------------------------------
+```bash
+cog debug
+```
-You define how Cog runs predictions on your model by defining a class that inherits from `BasePredictor`. It looks something like this:
+You can run this image with `cog predict` by passing the filename as an argument:
-`from cog import BasePredictor, Path, Input import torch class Predictor(BasePredictor): def setup(self): """Load the model into memory to make running multiple predictions efficient""" self.model = torch.load("weights.pth") def predict(self, image: Path = Input(description="Image to enlarge"), scale: float = Input(description="Factor to scale image by", default=1.5) ) -> Path: """Run a single prediction on the model""" # ... pre-processing ... output = self.model(image) # ... post-processing ... return output`
+```bash
+cog predict resnet -i image=@input.jpg
-Your Predictor class should define two methods: `setup()` and `predict()`.
+```
-### `Predictor.setup()`[#](#predictorsetup "Permanent link")
+Or, you can run it with Docker directly, and it'll serve an HTTP server:
-Prepare the model so multiple predictions run efficiently.
+```bash
+docker run -d --rm -p 5000:5000 resnet
-Use this _optional_ method to include any expensive one-off operations in here like loading trained models, instantiate data transformations, etc.
+```
-Many models use this method to download their weights (e.g. using [`pget`](https://github.com/replicate/pget)). This has some advantages:
+We can send inputs directly with `curl`:
-* Smaller image sizes
-* Faster build times
-* Faster pushes and inference on [Replicate](https://replicate.com/)
+```bash
+curl http://localhost:5000/predictions -X POST \
+ -H 'Content-Type: application/json' \
+ -d '{"input": {"image": "https://gist.githubusercontent.com/bfirsh/3c2115692682ae260932a67d93fd94a8/raw/56b19f53f7643bb6c0b822c410c366c3a6244de2/mystery.jpg"}}'
-However, this may also significantly increase your `setup()` time.
+```
-As an alternative, some choose to store their weights directly in the image. You can simply leave your weights in the directory alongside your `cog.yaml` and ensure they are not excluded in your `.dockerignore` file.
+As a shorthand, you can add the Docker image's name as an extra line in `cog.yaml`:
-While this will increase your image size and build time, it offers other advantages:
+```yaml
+image: "r8.im/replicate/resnet"
+```
-* Faster `setup()` time
-* Ensures idempotency and reduces your model's reliance on external systems
-* Preserves reproducibility as your model will be self-contained in the image
+Once you've done this, you can use `cog push` to build and push the image to a Docker registry:
-> When using this method, you should use the `--separate-weights` flag on `cog build` to store weights in a [separate layer](https://github.com/replicate/cog/blob/12ac02091d93beebebed037f38a0c99cd8749806/docs/getting-started.md?plain=1#L219).
+```bash
+cog push
+# Building r8.im/replicate/resnet...
+# Pushing r8.im/replicate/resnet...
+# Pushed!
+```
-### `Predictor.predict(**kwargs)`[#](#predictorpredictkwargs "Permanent link")
+The Docker image is now accessible to anyone or any system that has access to this Docker registry.
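+
+For example, a machine with access to the registry could pull and run the pushed image directly (a sketch, reusing the image name from the example above):
+
+```bash
+# Pull (if needed) and run the pushed image, serving the HTTP API on port 5000
+docker run -d --rm -p 5000:5000 r8.im/replicate/resnet
+```
+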
-Run a single prediction.
+> **Note**
+> Model repos often contain large data files, like weights and checkpoints. If you put these files in their own subdirectory and run `cog build` with the `--separate-weights` flag, Cog will copy these files into a separate Docker layer, which reduces the time needed to rebuild after making changes to code.
+>
+> ```shell
+> # β Yes
+> .
+> βββ checkpoints/
+> β βββ weights.ckpt
+> βββ predict.py
+> βββ cog.yaml
+>
+> # β No
+> .
+> βββ weights.ckpt # <- Don't put weights in root directory
+> βββ predict.py
+> βββ cog.yaml
+>
+> # β No
+> .
+> βββ checkpoints/
+> β βββ weights.ckpt
+> β βββ load_weights.py # <- Don't put code in weights directory
+> βββ predict.py
+> βββ cog.yaml
+> ```
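+
+For example, continuing the resnet example above, the build command would look something like this:
+
+```bash
+cog build -t resnet --separate-weights
+```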
+
+## Next steps
-This _required_ method is where you call the model that was loaded during `setup()`, but you may also want to add pre- and post-processing code here.
+Those are the basics! Next, you might want to take a look at:
-The `predict()` method takes an arbitrary list of named arguments, where each argument name must correspond to an [`Input()`](#inputkwargs) annotation.
+- [A guide to help you set up your own model on Cog.](getting-started-own-model.md)
+- [A guide explaining how to deploy a model.](deploy.md)
+- [Reference for `cog.yaml`](yaml.md)
+- [Reference for the Python library](python.md)
-`predict()` can return strings, numbers, [`cog.Path`](#path) objects representing files on disk, or lists or dicts of those types. You can also define a custom [`Output()`](#outputbasemodel) for more complex return types.
-#### Streaming output[#](#streaming-output "Permanent link")
-Cog models can stream output as the `predict()` method is running. For example, a language model can output tokens as they're being generated and an image generation model can output a images they are being generated.
-To support streaming output in your Cog model, add `from typing import Iterator` to your predict.py file. The `typing` package is a part of Python's standard library so it doesn't need to be installed. Then add a return type annotation to the `predict()` method in the form `-> Iterator[]` where `` can be one of `str`, `int`, `float`, `bool`, `cog.File`, or `cog.Path`.
-`from cog import BasePredictor, Path from typing import Iterator class Predictor(BasePredictor): def predict(self) -> Iterator[Path]: done = False while not done: output_path, done = do_stuff() yield Path(output_path)`
+---
-If you're streaming text output, you can use `ConcatenateIterator` to hint that the output should be concatenated together into a single string. This is useful on Replicate to display the output as a string instead of a list of strings.
-`from cog import BasePredictor, Path, ConcatenateIterator class Predictor(BasePredictor): def predict(self) -> ConcatenateIterator[str]: tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"] for token in tokens: yield token + " "`
-`Input(**kwargs)`[#](#inputkwargs "Permanent link")
----------------------------------------------------
-Use cog's `Input()` function to define each of the parameters in your `predict()` method:
-`class Predictor(BasePredictor): def predict(self, image: Path = Input(description="Image to enlarge"), scale: float = Input(description="Factor to scale image by", default=1.5, ge=1.0, le=10.0) ) -> Path:`
+# HTTP API
-The `Input()` function takes these keyword arguments:
+> [!TIP]
+> For information about how to run the HTTP server,
+> see [our documentation on deploying models](deploy.md).
-* `description`: A description of what to pass to this input for users of the model.
-* `default`: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set to `None`, the input is optional.
-* `ge`: For `int` or `float` types, the value must be greater than or equal to this number.
-* `le`: For `int` or `float` types, the value must be less than or equal to this number.
-* `min_length`: For `str` types, the minimum length of the string.
-* `max_length`: For `str` types, the maximum length of the string.
-* `regex`: For `str` types, the string must match this regular expression.
-* `choices`: For `str` or `int` types, a list of possible values for this input.
+When you run a Docker image built by Cog,
+it serves an HTTP API for making predictions.
-Each parameter of the `predict()` method must be annotated with a type like `str`, `int`, `float`, `bool`, etc. See [Input and output types](#input-and-output-types) for the full list of supported types.
+The server supports both synchronous and asynchronous prediction creation:
-Using the `Input` function provides better documentation and validation constraints to the users of your model, but it is not strictly required. You can also specify default values for your parameters using plain Python, or omit default assignment entirely:
+- **Synchronous**:
+ The server waits until the prediction is completed
+ and responds with the result.
+- **Asynchronous**:
+ The server immediately returns a response
+ and processes the prediction in the background.
+
+The client can create a prediction asynchronously
+by setting the `Prefer: respond-async` header in their request.
+When provided, the server responds immediately after starting the prediction
+with `202 Accepted` status and a prediction object in status `processing`.
+
+> [!NOTE]
+> The only supported way to receive updates on the status of predictions
+> started asynchronously is using [webhooks](#webhooks).
+> Polling for prediction status is not currently supported.
+
+You can also use certain server endpoints to create predictions idempotently,
+so that if a client calls such an endpoint more than once with the same ID
+(for example, due to a network interruption)
+while the prediction is still running,
+no new prediction is created.
+Instead, the client receives a `202 Accepted` response
+with the initial state of the prediction.
+
+---
-`class Predictor(BasePredictor): def predict(self, prompt: str = "default prompt", # this is valid iterations: int # also valid ) -> str: # ...`
+Here's a summary of the prediction creation endpoints:
-Output[#](#output "Permanent link")
------------------------------------
+| Endpoint                           | Header                  | Behavior                     |
+| ---------------------------------- | ----------------------- | ---------------------------- |
+| `POST /predictions`                | -                       | Synchronous, non-idempotent  |
+| `POST /predictions`                | `Prefer: respond-async` | Asynchronous, non-idempotent |
+| `PUT /predictions/<prediction_id>` | -                       | Synchronous, idempotent      |
+| `PUT /predictions/<prediction_id>` | `Prefer: respond-async` | Asynchronous, idempotent     |
-Cog predictors can return a simple data type like a string, number, float, or boolean. Use Python's `-> ` syntax to annotate the return type.
+Choose the endpoint that best fits your needs:
-Here's an example of a predictor that returns a string:
+- Use synchronous endpoints when you want to wait for the prediction result.
+- Use asynchronous endpoints when you want to start a prediction
+ and receive updates via webhooks.
+- Use idempotent endpoints when you need to safely retry requests
+ without creating duplicate predictions.
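+
+For example, to start a prediction asynchronously with `curl` (assuming the server is running locally on port 5000, as in the earlier example; the webhook URL is illustrative):
+
+```bash
+curl http://localhost:5000/predictions -X POST \
+  -H 'Content-Type: application/json' \
+  -H 'Prefer: respond-async' \
+  -d '{"input": {"prompt": "A picture of an onion with sunglasses"}, "webhook": "https://example.com/webhook/prediction"}'
+```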
+
+## Webhooks
+
+You can provide a `webhook` parameter in the client request body
+when creating a prediction.
+
+```http
+POST /predictions HTTP/1.1
+Content-Type: application/json; charset=utf-8
+Prefer: respond-async
+
+{
+ "input": {"prompt": "A picture of an onion with sunglasses"},
+ "webhook": "https://example.com/webhook/prediction"
+}
+```
+
+The server makes requests to the provided URL
+with the current state of the prediction object in the request body
+at the following times:
+
+- `start`:
+ Once, when the prediction starts
+ (`status` is `starting`).
+- `output`:
+ Each time a predict function generates an output
+ (either once using `return` or multiple times using `yield`)
+- `logs`:
+ Each time the predict function writes to `stdout`
+- `completed`:
+ Once, when the prediction reaches a terminal state
+ (`status` is `succeeded`, `canceled`, or `failed`)
+
+Webhook requests for `start` and `completed` event types
+are sent immediately.
+Webhook requests for `output` and `logs` event types
+are sent at most once every 500ms.
+This interval is not configurable.
+
+By default, the server sends requests for all event types.
+Clients can specify which events trigger webhook requests
+with the `webhook_events_filter` parameter in the prediction request body.
+For example,
+the following request specifies that webhooks are sent by the server
+only at the start and end of the prediction:
+
+```http
+POST /predictions HTTP/1.1
+Content-Type: application/json; charset=utf-8
+Prefer: respond-async
+
+{
+ "input": {"prompt": "A picture of an onion with sunglasses"},
+ "webhook": "https://example.com/webhook/prediction",
+ "webhook_events_filter": ["start", "completed"]
+}
+```
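+
+As an illustration (not part of Cog itself), a webhook receiver only needs to accept these requests and read the prediction object from the body. Here's a minimal sketch using Python's standard library, assuming webhook deliveries arrive as HTTP `POST` requests with a JSON body; the port is also an assumption:
+
+```python
+import json
+from http.server import BaseHTTPRequestHandler, HTTPServer
+
+class WebhookHandler(BaseHTTPRequestHandler):
+    def do_POST(self):
+        # Read the prediction object sent by the Cog server and log its status.
+        length = int(self.headers.get("Content-Length", 0))
+        prediction = json.loads(self.rfile.read(length) or b"{}")
+        print(prediction.get("id"), prediction.get("status"))
+        self.send_response(200)
+        self.end_headers()
+
+if __name__ == "__main__":
+    HTTPServer(("", 8080), WebhookHandler).serve_forever()
+```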
+
+## Generating unique prediction IDs
+
+Endpoints for creating and canceling a prediction idempotently
+accept a `prediction_id` parameter in their path.
+The server can run only one prediction at a time.
+The client must ensure that the running prediction is complete
+before creating a new one with a different ID.
+
+Clients are responsible for providing unique prediction IDs.
+We recommend generating a UUIDv4 or [UUIDv7](https://uuid7.com),
+base32-encoding that value,
+and removing the padding characters (`=`).
+This produces a random identifier that is 26 ASCII characters long.
+
+```python
+>>> from uuid import uuid4
+>>> from base64 import b32encode
+>>> b32encode(uuid4().bytes).decode('utf-8').lower().rstrip('=')
+'wjx3whax6rf4vphkegkhcvpv6a'
+```
+
+## File uploads
+
+A model's `predict` function can produce file output by yielding or returning
+a `cog.Path` or `cog.File` value.
+
+By default,
+files are returned as a base64-encoded
+[data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs).
+
+```http
+POST /predictions HTTP/1.1
+Content-Type: application/json; charset=utf-8
+
+{
+ "input": {"prompt": "A picture of an onion with sunglasses"},
+}
+```
+
+```http
+HTTP/1.1 200 OK
+Content-Type: application/json
+
+{
+ "status": "succeeded",
+ "output": "data:image/png;base64,..."
+}
+```
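+
+If a client needs to write such an output to disk, decoding the data URL is straightforward. A minimal client-side sketch (the helper name is illustrative):
+
+```python
+import base64
+
+def save_data_url(data_url: str, path: str) -> None:
+    # Split "data:image/png;base64,<payload>" into its header and payload,
+    # then decode the base64 payload and write the bytes to a file.
+    _header, encoded = data_url.split(",", 1)
+    with open(path, "wb") as f:
+        f.write(base64.b64decode(encoded))
+```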
+
+When creating a prediction synchronously,
+the client can configure a base URL to upload output files to instead
+by setting the `output_file_prefix` parameter in the request body:
+
+```http
+POST /predictions HTTP/1.1
+Content-Type: application/json; charset=utf-8
+
+{
+ "input": {"prompt": "A picture of an onion with sunglasses"},
+ "output_file_prefix": "https://example.com/upload",
+}
+```
+
+When the model produces a file output,
+the server sends the following request to upload the file to the configured URL:
+
+```http
+PUT /upload HTTP/1.1
+Host: example.com
+Content-Type: multipart/form-data; boundary=boundary
+
+--boundary
+Content-Disposition: form-data; name="file"; filename="image.png"
+Content-Type: image/png
+
+
+--boundary--
+```
-`from cog import BasePredictor class Predictor(BasePredictor): def predict(self) -> str: return "hello"`
+If the upload succeeds, the server responds with the uploaded file's URL as the output:
-### Returning an object[#](#returning-an-object "Permanent link")
+```http
+HTTP/1.1 200 OK
+Content-Type: application/json
-To return a complex object with multiple values, define an `Output` object with multiple fields to return from your `predict()` method:
+{
+ "status": "succeeded",
+ "output": "http://example.com/upload/image.png"
+}
+```
-`from cog import BasePredictor, BaseModel, File class Output(BaseModel): file: File text: str class Predictor(BasePredictor): def predict(self) -> Output: return Output(text="hello", file=io.StringIO("hello"))`
+If the upload fails, the server responds with an error.
-Each of the output object's properties must be one of the supported output types. For the full list, see [Input and output types](#input-and-output-types). Also, make sure to name the output class as `Output` and nothing else.
+> [!IMPORTANT]
+> File uploads for predictions created asynchronously
+> require `--upload-url` to be specified when starting the HTTP server.
-### Returning a list[#](#returning-a-list "Permanent link")
+
-The `predict()` method can return a list of any of the supported output types. Here's an example that outputs multiple files:
+## Endpoints
-`from cog import BasePredictor, Path class Predictor(BasePredictor): def predict(self) -> list[Path]: predictions = ["foo", "bar", "baz"] output = [] for i, prediction in enumerate(predictions): out_path = Path(f"/tmp/out-{i}.txt") with out_path.open("w") as f: f.write(prediction) output.append(out_path) return output`
+### `GET /openapi.json`
-Files are named in the format `output..`, e.g. `output.0.txt`, `output.1.txt`, and `output.2.txt` from the example above.
+The [OpenAPI](https://swagger.io/specification/) specification of the API,
+which is derived from the input and output types specified in your model's
+[Predictor](python.md) and [Training](training.md) objects.
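+
+For example, with the server running locally on port 5000 (an assumption carried over from the earlier examples):
+
+```bash
+curl http://localhost:5000/openapi.json
+```
+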
-### Optional properties[#](#optional-properties "Permanent link")
+### `POST /predictions`
-To conditionally omit properties from the Output object, define them using `typing.Optional`:
+Makes a single prediction.
-`from cog import BaseModel, BasePredictor, Path from typing import Optional class Output(BaseModel): score: Optional[float] file: Optional[Path] class Predictor(BasePredictor): def predict(self) -> Output: if condition: return Output(score=1.5) else: return Output(file=io.StringIO("hello"))`
+The request body is a JSON object with the following fields:
-Input and output types[#](#input-and-output-types "Permanent link")
--------------------------------------------------------------------
+- `input`:
+ A JSON object with the same keys as the
+ [arguments to the `predict()` function](python.md).
+ Any `File` or `Path` inputs are passed as URLs.
-Each parameter of the `predict()` method must be annotated with a type. The method's return type must also be annotated. The supported types are:
+The response body is a JSON object with the following fields:
-* `str`: a string
-* `int`: an integer
-* `float`: a floating point number
-* `bool`: a boolean
-* [`cog.File`](#file): a file-like object representing a file
-* [`cog.Path`](#path): a path to a file on disk
-* [`cog.Secret`](#secret): a string containing sensitive information
+- `status`: Either `succeeded` or `failed`.
+- `output`: The return value of the `predict()` function.
+- `error`: If `status` is `failed`, the error message.
+
+```http
+POST /predictions HTTP/1.1
+Content-Type: application/json; charset=utf-8
+
+{
+ "input": {
+ "image": "https://example.com/image.jpg",
+ "text": "Hello world!"
+ }
+}
+```
+
+```http
+HTTP/1.1 200 OK
+Content-Type: application/json
+
+{
+ "status": "succeeded",
+ "output": "data:image/png;base64,..."
+}
+```
+
+If the client sets the `Prefer: respond-async` header in their request,
+the server responds immediately after starting the prediction
+with `202 Accepted` status and a prediction object in status `processing`.
+
+```http
+POST /predictions HTTP/1.1
+Content-Type: application/json; charset=utf-8
+Prefer: respond-async
+
+{
+ "input": {"prompt": "A picture of an onion with sunglasses"}
+}
+```
+
+```http
+HTTP/1.1 202 Accepted
+Content-Type: application/json
+
+{
+ "status": "starting",
+}
+```
+
+### `PUT /predictions/<prediction_id>`
+
+Makes a single prediction.
+This is the idempotent version of the `POST /predictions` endpoint.
+
+```http
+PUT /predictions/wjx3whax6rf4vphkegkhcvpv6a HTTP/1.1
+Content-Type: application/json; charset=utf-8
+
+{
+ "input": {"prompt": "A picture of an onion with sunglasses"}
+}
+```
+
+```http
+HTTP/1.1 200 OK
+Content-Type: application/json
+
+{
+ "status": "succeeded",
+ "output": "data:image/png;base64,..."
+}
+```
+
+If the client sets the `Prefer: respond-async` header in their request,
+the server responds immediately after starting the prediction
+with `202 Accepted` status and a prediction object in status `processing`.
+
+```http
+PUT /predictions/wjx3whax6rf4vphkegkhcvpv6a HTTP/1.1
+Content-Type: application/json; charset=utf-8
+Prefer: respond-async
+
+{
+ "input": {"prompt": "A picture of an onion with sunglasses"}
+}
+```
+
+```http
+HTTP/1.1 202 Accepted
+Content-Type: application/json
+
+{
+ "id": "wjx3whax6rf4vphkegkhcvpv6a",
+ "status": "starting"
+}
+```
+
+### `POST /predictions/<prediction_id>/cancel`
+
+A client can cancel an asynchronous prediction by making a
+`POST /predictions/<prediction_id>/cancel` request
+using the prediction `id` provided when the prediction was created.
+
+For example,
+if the client creates a prediction by sending the request:
+
+```http
+POST /predictions HTTP/1.1
+Content-Type: application/json; charset=utf-8
+Prefer: respond-async
+
+{
+ "id": "abcd1234",
+ "input": {"prompt": "A picture of an onion with sunglasses"},
+}
+```
-`File()`[#](#file "Permanent link")
------------------------------------
+The client can cancel the prediction by sending the request:
-> \[!WARNING\]
-> `cog.File` is deprecated and will be removed in a future version of Cog. Use [`cog.Path`](#path) instead.
+```http
+POST /predictions/abcd1234/cancel HTTP/1.1
+```
-The `cog.File` object is used to get files in and out of models. It represents a _file handle_.
+A prediction cannot be canceled if it was created synchronously
+(that is, without the `Prefer: respond-async` header)
+or if it was created without a provided `id`.
-For models that return a `cog.File` object, the prediction output returned by Cog's built-in HTTP server will be a URL.
+If a prediction exists with the provided `id`,
+the server responds with status `200 OK`.
+Otherwise, the server responds with status `404 Not Found`.
-`from cog import BasePredictor, File, Input, Path from PIL import Image class Predictor(BasePredictor): def predict(self, source_image: File = Input(description="Image to enlarge")) -> File: pillow_img = Image.open(source_image) upscaled_image = do_some_processing(pillow_img) return File(upscaled_image)`
+When a prediction is canceled,
+Cog raises `cog.server.exceptions.CancelationException`
+in the model's `predict` function.
+This exception may be caught by the model to perform necessary cleanup.
+The cleanup should be brief, ideally completing within a few seconds.
+After cleanup, the exception must be re-raised using a bare `raise` statement.
+Failure to re-raise the exception may result in the termination of the container.
-`Path()`[#](#path "Permanent link")
------------------------------------
+```python
+from cog import Path
+from cog.server.exceptions import CancelationException
-The `cog.Path` object is used to get files in and out of models. It represents a _path to a file on disk_.
+def predict(image: Path) -> Path:
+ try:
+ return process(image)
+    except CancelationException:
+        # Perform any brief cleanup, then re-raise with a bare `raise`.
+        cleanup()
+        raise
+```
-`cog.Path` is a subclass of Python's [`pathlib.Path`](https://docs.python.org/3/library/pathlib.html#basic-use) and can be used as a drop-in replacement.
-For models that return a `cog.Path` object, the prediction output returned by Cog's built-in HTTP server will be a URL.
-This example takes an input file, resizes it, and returns the resized image:
-``import tempfile from cog import BasePredictor, Input, Path class Predictor(BasePredictor): def predict(self, image: Path = Input(description="Image to enlarge")) -> Path: upscaled_image = do_some_processing(image) # To output `cog.Path` objects the file needs to exist, so create a temporary file first. # This file will automatically be deleted by Cog after it has been returned. output_path = Path(tempfile.mkdtemp()) / "upscaled.png" upscaled_image.save(output_path) return Path(output_path)``
-`Secret`[#](#secret "Permanent link")
--------------------------------------
+---
-The `cog.Secret` type is used to signify that an input holds sensitive information, like a password or API token.
-`cog.Secret` is a subclass of Pydantic's [`SecretStr`](https://docs.pydantic.dev/latest/api/types/#pydantic.types.SecretStr). Its default string representation redacts its contents to prevent accidental disclure. You can access its contents with the `get_secret_value()` method.
-`from cog import BasePredictor, Secret class Predictor(BasePredictor): def predict(self, api_token: Secret) -> None: # Prints '**********' print(api_token) # Use get_secret_value method to see the secret's content. print(api_token.get_secret_value())`
-A predictor's `Secret` inputs are represented in OpenAPI with the following schema:
-`{ "type": "string", "format": "password", "x-cog-secret": true, }`
+# Notebooks
-Models uploaded to Replicate treat secret inputs differently throughout its system. When you create a prediction on Replicate, any value passed to a `Secret` input is redacted after being sent to the model.
+Cog plays nicely with Jupyter notebooks.
-> \[!WARNING\]
-> Passing secret values to untrusted models can result in unintended disclosure, exfiltration, or misuse of sensitive data.
+## Install the jupyterlab Python package
-`List`[#](#list "Permanent link")
----------------------------------
+First, add `jupyterlab` to the `python_packages` array in your [`cog.yaml`](yaml.md) file:
-The List type is also supported in inputs. It can hold any supported type.
+```yaml
+build:
+ python_packages:
+ - "jupyterlab==3.3.4"
+```
-Example for **List\[Path\]**:
-`class Predictor(BasePredictor): def predict(self, paths: list[Path]) -> str: output_parts = [] # Use a list to collect file contents for path in paths: with open(path) as f: output_parts.append(f.read()) return "".join(output_parts)`
+## Run a notebook
-The corresponding cog command:
+Cog can run notebooks in the environment you've defined in `cog.yaml` with the following command:
-`$ echo test1 > 1.txt $ echo test2 > 2.txt $ cog predict -i paths=@1.txt -i paths=@2.txt Running prediction... test1 test2`
+```sh
+cog run -p 8888 jupyter lab --allow-root --ip=0.0.0.0
+```
-\- Note the repeated inputs with the same name "paths" which constitute the list
-
+## Use notebook code in your predictor
-
- Environment variables - Cog
- https://cog.run/environment/
- [](https://github.com/replicate/cog/edit/main/docs/environment.md "Edit this page")
+You can also import a notebook into your Cog [Predictor](python.md) file.
-This guide lists the environment variables that change how Cog functions.
+First, export your notebook to a Python file:
-### `COG_NO_UPDATE_CHECK`[#](#cog_no_update_check "Permanent link")
+```sh
+jupyter nbconvert --to script my_notebook.ipynb # creates my_notebook.py
+```
-By default, Cog automatically checks for updates and notifies you if there is a new version available.
+Then import the exported Python script into your `predict.py` file. Any functions or variables defined in your notebook will be available to your predictor:
-To disable this behavior, set the `COG_NO_UPDATE_CHECK` environment variable to any value.
+```python
+from cog import BasePredictor, Input
-`$ COG_NO_UPDATE_CHECK=1 cog build # runs without automatic update check`
-
+import my_notebook
-
- Notebooks - Cog
- https://cog.run/notebooks/
- [](https://github.com/replicate/cog/edit/main/docs/notebooks.md "Edit this page")
+class Predictor(BasePredictor):
+ def predict(self, prompt: str = Input(description="string prompt")) -> str:
+ output = my_notebook.do_stuff(prompt)
+ return output
+```
-Cog plays nicely with Jupyter notebooks.
-Install the jupyterlab Python package[#](#install-the-jupyterlab-python-package "Permanent link")
--------------------------------------------------------------------------------------------------
-First, add `jupyterlab` to the `python_packages` array in your [`cog.yaml`](https://cog.run/yaml/) file:
-`build: python_packages: - "jupyterlab==3.3.4"`
-Run a notebook[#](#run-a-notebook "Permanent link")
----------------------------------------------------
+---
-Cog can run notebooks in the environment you've defined in `cog.yaml` with the following command:
-`cog run -p 8888 jupyter lab --allow-root --ip=0.0.0.0`
-Use notebook code in your predictor[#](#use-notebook-code-in-your-predictor "Permanent link")
----------------------------------------------------------------------------------------------
-You can also import a notebook into your Cog [Predictor](https://cog.run/python/) file.
-First, export your notebook to a Python file:
+# Private package registry
-`jupyter nbconvert --to script my_notebook.ipynb # creates my_notebook.py`
+This guide describes how to build a Docker image with Cog that fetches Python packages from a private registry during setup.
-Then import the exported Python script into your `predict.py` file. Any functions or variables defined in your notebook will be available to your predictor:
+## `pip.conf`
+
+In a directory outside your Cog project, create a `pip.conf` file with an `index-url` set to the registry's URL with embedded credentials.
-`from cog import BasePredictor, Input import my_notebook class Predictor(BasePredictor): def predict(self, prompt: str = Input(description="string prompt")) -> str: output = my_notebook.do_stuff(prompt) return output`
-
+```conf
+[global]
+index-url = https://username:password@my-private-registry.com
+```
-
- Private registry - Cog
- https://cog.run/private-package-registry/
- [](https://github.com/replicate/cog/edit/main/docs/private-package-registry.md "Edit this page")
+> **Warning**
+> Be careful not to commit secrets in Git or include them in Docker images. If your Cog project contains any sensitive files, make sure they're listed in `.gitignore` and `.dockerignore`.
-Private package registry[#](#private-package-registry "Permanent link")
------------------------------------------------------------------------
+## `cog.yaml`
-This guide describes how to build a Docker image with Cog that fetches Python packages from a private registry during setup.
+In your project's [`cog.yaml`](yaml.md) file, add a setup command to run `pip install` with a secret configuration file mounted to `/etc/pip.conf`.
-`pip.conf`[#](#pipconf "Permanent link")
-----------------------------------------
+```yaml
+build:
+ run:
+ - command: pip install
+ mounts:
+ - type: secret
+ id: pip
+ target: /etc/pip.conf
+```
-In a directory outside your Cog project, create a `pip.conf` file with an `index-url` set to the registry's URL with embedded credentials.
+## Build
-> **Warning** Be careful not to commit secrets in Git or include them in Docker images. If your Cog project contains any sensitive files, make sure they're listed in `.gitignore` and `.dockerignore`.
+When building or pushing your model with Cog, pass the `--secret` option with an `id` matching the one specified in `cog.yaml`, along with a path to your local `pip.conf` file.
-`cog.yaml`[#](#cogyaml "Permanent link")
-----------------------------------------
+```console
+$ cog build --secret id=pip,source=/path/to/pip.conf
+```
-In your project's [`cog.yaml`](https://cog.run/yaml/) file, add a setup command to run `pip install` with a secret configuration file mounted to `/etc/pip.conf`.
+Using a secret mount allows the private registry credentials to be securely passed to the `pip install` setup command, without baking them into the Docker image.
-`build: run: - command: pip install mounts: - type: secret id: pip target: /etc/pip.conf`
+> **Warning**
+> If you run `cog build` or `cog push` and then change the contents of a secret source file, the cached version of the file will be used on subsequent builds, ignoring any changes you made. To update the contents of the target secret file, either change the `id` value in `cog.yaml` and the `--secret` option, or pass the `--no-cache` option to bypass the cache entirely.
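+
+For example, to force the updated secret to be picked up on the next build (a sketch; the path is illustrative):
+
+```console
+$ cog build --secret id=pip,source=/path/to/pip.conf --no-cache
+```
+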
-Build[#](#build "Permanent link")
----------------------------------
-When building or pushing your model with Cog, pass the `--secret` option with an `id` matching the one specified in `cog.yaml`, along with a path to your local `pip.conf` file.
-`$ cog build --secret id=pip,source=/path/to/pip.conf`
-Using a secret mount allows the private registry credentials to be securely passed to the `pip install` setup command, without baking them into the Docker image.
-> **Warning** If you run `cog build` or `cog push` and then change the contents of a secret source file, the cached version of the file will be used on subsequent builds, ignoring any changes you made. To update the contents of the target secret file, either change the `id` value in `cog.yaml` and the `--secret` option, or pass the `--no-cache` option to bypass the cache entirely.
-
+---
-
- Email Protection | Cloudflare
- https://cog.run/cdn-cgi/l/email-protection#d4a0b1b5b994a6b1a4b8bdb7b5a0b1fab7bbb9
- Please enable cookies.
-You are unable to access this email address cog.run
----------------------------------------------------
-The website from which you got to this page is protected by Cloudflare. Email addresses on that page have been hidden in order to keep them from being accessed by malicious bots. **You must enable Javascript in your browser in order to decode the e-mail address**.
-If you have a website and are interested in protecting it in a similar way, you can [sign up for Cloudflare](https://www.cloudflare.com/sign-up?utm_source=email_protection).
-* [How does Cloudflare protect email addresses on website from spammers?](https://support.cloudflare.com/hc/en-us/articles/200170016-What-is-Email-Address-Obfuscation-)
-* [Can I sign up for Cloudflare?](https://support.cloudflare.com/hc/en-us/categories/200275218-Getting-Started)
+# Prediction interface reference
-Cloudflare Ray ID: **9065855db901ceb9** β’ Your IP: 192.184.142.173 β’ Performance & security by [Cloudflare](https://www.cloudflare.com/5xx-error-landing)
-
+This document defines the API of the `cog` Python module, which is used to define the interface for running predictions on your model.
-
- HTTP API - Cog
- https://cog.run/http/
- [](https://github.com/replicate/cog/edit/main/docs/http.md "Edit this page")
+> [!TIP]
+> Run [`cog init`](getting-started-own-model.md#initialization) to generate an annotated `predict.py` file that can be used as a starting point for setting up your model.
+
+> [!TIP]
+> Using a language model to help you write the code for your new Cog model?
+>
+> Feed it [https://cog.run/llms.txt](https://cog.run/llms.txt), which has all of Cog's documentation bundled into a single file. To learn more about this format, check out [llmstxt.org](https://llmstxt.org).
+
+## Contents
+
+- [Contents](#contents)
+- [`BasePredictor`](#basepredictor)
+ - [`Predictor.setup()`](#predictorsetup)
+ - [`Predictor.predict(**kwargs)`](#predictorpredictkwargs)
+ - [Streaming output](#streaming-output)
+- [`Input(**kwargs)`](#inputkwargs)
+- [Output](#output)
+ - [Returning an object](#returning-an-object)
+ - [Returning a list](#returning-a-list)
+ - [Optional properties](#optional-properties)
+- [Input and output types](#input-and-output-types)
+- [`File()`](#file)
+- [`Path()`](#path)
+- [`Secret`](#secret)
+- [`List`](#list)
+
+## `BasePredictor`
-> \[!TIP\] For information about how to run the HTTP server, see [our documentation to deploying models](https://cog.run/deploy/).
+You define how Cog runs predictions on your model by defining a class that inherits from `BasePredictor`. It looks something like this:
-When you run a Docker image built by Cog, it serves an HTTP API for making predictions.
+```python
+from cog import BasePredictor, Path, Input
+import torch
+
+class Predictor(BasePredictor):
+ def setup(self):
+ """Load the model into memory to make running multiple predictions efficient"""
+ self.model = torch.load("weights.pth")
+
+ def predict(self,
+ image: Path = Input(description="Image to enlarge"),
+ scale: float = Input(description="Factor to scale image by", default=1.5)
+ ) -> Path:
+ """Run a single prediction on the model"""
+ # ... pre-processing ...
+ output = self.model(image)
+ # ... post-processing ...
+ return output
+```
-The server supports both synchronous and asynchronous prediction creation:
+Your Predictor class should define two methods: `setup()` and `predict()`.
-* **Synchronous**: The server waits until the prediction is completed and responds with the result.
-* **Asynchronous**: The server immediately returns a response and processes the prediction in the background.
+### `Predictor.setup()`
-The client can create a prediction asynchronously by setting the `Prefer: respond-async` header in their request. When provided, the server responds immediately after starting the prediction with `202 Accepted` status and a prediction object in status `processing`.
+Prepare the model so multiple predictions run efficiently.
-> \[!NOTE\] The only supported way to receive updates on the status of predictions started asynchronously is using [webhooks](#webhooks). Polling for prediction status is not currently supported.
+Use this _optional_ method for any expensive one-off operations, like loading trained models, instantiating data transformations, etc.
-You can also use certain server endpoints to create predictions idempotently, such that if a client calls this endpoint more than once with the same ID (for example, due to a network interruption) while the prediction is still running, no new prediction is created. Instead, the client receives a `202 Accepted` response with the initial state of the prediction.
+Many models use this method to download their weights (e.g. using [`pget`](https://github.com/replicate/pget)). This has some advantages:
-* * *
+- Smaller image sizes
+- Faster build times
+- Faster pushes and inference on [Replicate](https://replicate.com)
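+
+For illustration, a `setup()` method along these lines might download the weights before loading them (a sketch; the weights URL, destination path, and exact `pget` invocation are assumptions, not part of Cog):
+
+```python
+import subprocess
+import torch
+from cog import BasePredictor
+
+WEIGHTS_URL = "https://example.com/weights.pth"  # hypothetical URL
+WEIGHTS_PATH = "weights.pth"                     # hypothetical local path
+
+class Predictor(BasePredictor):
+    def setup(self):
+        # Download the weights once at startup, then load them into memory.
+        subprocess.run(["pget", WEIGHTS_URL, WEIGHTS_PATH], check=True)
+        self.model = torch.load(WEIGHTS_PATH)
+```
+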
-Here's a summary of the prediction creation endpoints:
+However, this may also significantly increase your `setup()` time.
-| Endpoint | Header | Behavior |
-| --- | --- | --- |
-| `POST /predictions` | \- | Synchronous, non-idempotent |
-| `POST /predictions` | `Prefer: respond-async` | Asynchronous, non-idempotent |
-| `PUT /predictions/` | \- | Synchronous, idempotent |
-| `PUT /predictions/` | `Prefer: respond-async` | Asynchronous, idempotent |
+As an alternative, some choose to store their weights directly in the image. You can simply leave your weights in the directory alongside your `cog.yaml` and ensure they are not excluded in your `.dockerignore` file.
-Choose the endpoint that best fits your needs:
+While this will increase your image size and build time, it offers other advantages:
+
+- Faster `setup()` time
+- Ensures idempotency and reduces your model's reliance on external systems
+- Preserves reproducibility as your model will be self-contained in the image
-* Use synchronous endpoints when you want to wait for the prediction result.
-* Use asynchronous endpoints when you want to start a prediction and receive updates via webhooks.
-* Use idempotent endpoints when you need to safely retry requests without creating duplicate predictions.
+> When using this method, you should use the `--separate-weights` flag on `cog build` to store weights in a [separate layer](https://github.com/replicate/cog/blob/12ac02091d93beebebed037f38a0c99cd8749806/docs/getting-started.md?plain=1#L219).
-Webhooks[#](#webhooks "Permanent link")
----------------------------------------
+### `Predictor.predict(**kwargs)`
-You can provide a `webhook` parameter in the client request body when creating a prediction.
+Run a single prediction.
-`POST /predictions HTTP/1.1 Content-Type: application/json; charset=utf-8 Prefer: respond-async { "input": {"prompt": "A picture of an onion with sunglasses"}, "webhook": "https://example.com/webhook/prediction" }`
+This _required_ method is where you call the model that was loaded during `setup()`, but you may also want to add pre- and post-processing code here.
-The server makes requests to the provided URL with the current state of the prediction object in the request body at the following times.
+The `predict()` method takes an arbitrary list of named arguments, where each argument name must correspond to an [`Input()`](#inputkwargs) annotation.
-* `start`: Once, when the prediction starts (`status` is `starting`).
-* `output`: Each time a predict function generates an output (either once using `return` or multiple times using `yield`)
-* `logs`: Each time the predict function writes to `stdout`
-* `completed`: Once, when the prediction reaches a terminal state (`status` is `succeeded`, `canceled`, or `failed`)
+`predict()` can return strings, numbers, [`cog.Path`](#path) objects representing files on disk, or lists or dicts of those types. You can also define a custom [`Output()`](#output) for more complex return types.
-Webhook requests for `start` and `completed` event types are sent immediately. Webhook requests for `output` and `logs` event types are sent at most once every 500ms. This interval is not configurable.
+#### Streaming output
-By default, the server sends requests for all event types. Clients can specify which events trigger webhook requests with the `webhook_events_filter` parameter in the prediction request body. For example, the following request specifies that webhooks are sent by the server only at the start and end of the prediction:
+Cog models can stream output as the `predict()` method is running. For example, a language model can output tokens as they're being generated, and an image generation model can output images as they're being generated.
-`POST /predictions HTTP/1.1 Content-Type: application/json; charset=utf-8 Prefer: respond-async { "input": {"prompt": "A picture of an onion with sunglasses"}, "webhook": "https://example.com/webhook/prediction", "webhook_events_filter": ["start", "completed"] }`
+To support streaming output in your Cog model, add `from typing import Iterator` to your `predict.py` file. The `typing` package is a part of Python's standard library so it doesn't need to be installed. Then add a return type annotation to the `predict()` method in the form `-> Iterator[<type>]`, where `<type>` can be one of `str`, `int`, `float`, `bool`, `cog.File`, or `cog.Path`.
-Generating unique prediction IDs[#](#generating-unique-prediction-ids "Permanent link")
----------------------------------------------------------------------------------------
+```py
+from cog import BasePredictor, Path
+from typing import Iterator
-Endpoints for creating and canceling a prediction idempotently accept a `prediction_id` parameter in their path. The server can run only one prediction at a time. The client must ensure that running prediction is complete before creating a new one with a different ID.
+class Predictor(BasePredictor):
+ def predict(self) -> Iterator[Path]:
+ done = False
+ while not done:
+ output_path, done = do_stuff()
+ yield Path(output_path)
+```
-Clients are responsible for providing unique prediction IDs. We recommend generating a UUIDv4 or [UUIDv7](https://uuid7.com/), base32-encoding that value, and removing padding characters (`==`). This produces a random identifier that is 26 ASCII characters long.
+If you're streaming text output, you can use `ConcatenateIterator` to hint that the output should be concatenated together into a single string. This is useful on Replicate to display the output as a string instead of a list of strings.
-`>> from uuid import uuid4 >> from base64 import b32encode >> b32encode(uuid4().bytes).decode('utf-8').lower().rstrip('=') 'wjx3whax6rf4vphkegkhcvpv6a'`
+```py
+from cog import BasePredictor, Path, ConcatenateIterator
-File uploads[#](#file-uploads "Permanent link")
------------------------------------------------
+class Predictor(BasePredictor):
+ def predict(self) -> ConcatenateIterator[str]:
+ tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
+ for token in tokens:
+ yield token + " "
+```
-A model's `predict` function can produce file output by yielding or returning a `cog.Path` or `cog.File` value.
+## `Input(**kwargs)`
-By default, files are returned as a base64-encoded [data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs).
+Use cog's `Input()` function to define each of the parameters in your `predict()` method:
-`POST /predictions HTTP/1.1 Content-Type: application/json; charset=utf-8 { "input": {"prompt": "A picture of an onion with sunglasses"}, }`
+```py
+class Predictor(BasePredictor):
+ def predict(self,
+ image: Path = Input(description="Image to enlarge"),
+ scale: float = Input(description="Factor to scale image by", default=1.5, ge=1.0, le=10.0)
+ ) -> Path:
+```
-`HTTP/1.1 200 OK Content-Type: application/json { "status": "succeeded", "output": "data:image/png;base64,..." }`
+The `Input()` function takes these keyword arguments:
-When creating a prediction synchronously, the client can configure a base URL to upload output files to instead by setting the `output_file_prefix` parameter in the request body:
+- `description`: A description of what to pass to this input for users of the model.
+- `default`: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set to `None`, the input is optional.
+- `ge`: For `int` or `float` types, the value must be greater than or equal to this number.
+- `le`: For `int` or `float` types, the value must be less than or equal to this number.
+- `min_length`: For `str` types, the minimum length of the string.
+- `max_length`: For `str` types, the maximum length of the string.
+- `regex`: For `str` types, the string must match this regular expression.
+- `choices`: For `str` or `int` types, a list of possible values for this input.
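+
+For example, here's a sketch of an input restricted to a fixed set of values using `choices` (the parameter name and values are illustrative):
+
+```py
+from cog import BasePredictor, Input
+
+class Predictor(BasePredictor):
+    def predict(self,
+        style: str = Input(description="Output style", default="photo", choices=["photo", "sketch", "painting"]),
+    ) -> str:
+        return f"style={style}"
+```
+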
-`POST /predictions HTTP/1.1 Content-Type: application/json; charset=utf-8 { "input": {"prompt": "A picture of an onion with sunglasses"}, "output_file_prefix": "https://example.com/upload", }`
+Each parameter of the `predict()` method must be annotated with a type like `str`, `int`, `float`, `bool`, etc. See [Input and output types](#input-and-output-types) for the full list of supported types.
-When the model produces a file output, the server sends the following request to upload the file to the configured URL:
+Using the `Input` function provides better documentation and validation constraints to the users of your model, but it is not strictly required. You can also specify default values for your parameters using plain Python, or omit default assignment entirely:
-`PUT /upload HTTP/1.1 Host: example.com Content-Type: multipart/form-data --boundary Content-Disposition: form-data; name="file"; filename="image.png" Content-Type: image/png --boundary--`
+```py
+class Predictor(BasePredictor):
+ def predict(self,
+        iterations: int,                # no default assignment, which is valid
+        prompt: str = "default prompt", # a plain Python default, also valid
+ ) -> str:
+ # ...
+```
-If the upload succeeds, the server responds with output:
+## Output
-`HTTP/1.1 200 OK Content-Type: application/json { "status": "succeeded", "output": "http://example.com/upload/image.png" }`
+Cog predictors can return a simple data type like a string, number, float, or boolean. Use Python's `-> <type>` syntax to annotate the return type.
-If the upload fails, the server responds with an error.
+Here's an example of a predictor that returns a string:
-> \[!IMPORTANT\]
-> File uploads for predictions created asynchronously require `--upload-url` to be specified when starting the HTTP server.
+```py
+from cog import BasePredictor
-Endpoints[#](#endpoints "Permanent link")
------------------------------------------
+class Predictor(BasePredictor):
+ def predict(self) -> str:
+ return "hello"
+```
-### `GET /openapi.json`[#](#get-openapijson "Permanent link")
+### Returning an object
-The [OpenAPI](https://swagger.io/specification/) specification of the API, which is derived from the input and output types specified in your model's [Predictor](https://cog.run/python/) and [Training](https://cog.run/training/) objects.
+To return a complex object with multiple values, define an `Output` object with multiple fields to return from your `predict()` method:
-### `POST /predictions`[#](#post-predictions "Permanent link")
+```py
+import io
+
+from cog import BasePredictor, BaseModel, File
-Makes a single prediction.
+class Output(BaseModel):
+ file: File
+ text: str
-The request body is a JSON object with the following fields:
+class Predictor(BasePredictor):
+ def predict(self) -> Output:
+ return Output(text="hello", file=io.StringIO("hello"))
+```
-* `input`: A JSON object with the same keys as the [arguments to the `predict()` function](https://cog.run/python/). Any `File` or `Path` inputs are passed as URLs.
+Each of the output object's properties must be one of the supported output types. For the full list, see [Input and output types](#input-and-output-types). Also, make sure to name the output class as `Output` and nothing else.
-The response body is a JSON object with the following fields:
+### Returning a list
-* `status`: Either `succeeded` or `failed`.
-* `output`: The return value of the `predict()` function.
-* `error`: If `status` is `failed`, the error message.
+The `predict()` method can return a list of any of the supported output types. Here's an example that outputs multiple files:
-`POST /predictions HTTP/1.1 Content-Type: application/json; charset=utf-8 { "input": { "image": "https://example.com/image.jpg", "text": "Hello world!" } }`
+```py
+from cog import BasePredictor, Path
+
+class Predictor(BasePredictor):
+ def predict(self) -> list[Path]:
+ predictions = ["foo", "bar", "baz"]
+ output = []
+ for i, prediction in enumerate(predictions):
+ out_path = Path(f"/tmp/out-{i}.txt")
+ with out_path.open("w") as f:
+ f.write(prediction)
+ output.append(out_path)
+ return output
+```
-`HTTP/1.1 200 OK Content-Type: application/json { "status": "succeeded", "output": "data:image/png;base64,..." }`
+Files are named in the format `output.<index>.<extension>`, e.g. `output.0.txt`, `output.1.txt`, and `output.2.txt` from the example above.
-If the client sets the `Prefer: respond-async` header in their request, the server responds immediately after starting the prediction with `202 Accepted` status and a prediction object in status `processing`.
+### Optional properties
-`POST /predictions HTTP/1.1 Content-Type: application/json; charset=utf-8 Prefer: respond-async { "input": {"prompt": "A picture of an onion with sunglasses"} }`
+To conditionally omit properties from the Output object, define them using `typing.Optional`:
-`HTTP/1.1 202 Accepted Content-Type: application/json { "status": "starting", }`
+```py
+import io
+
+from cog import BaseModel, BasePredictor, Path
+from typing import Optional
-### `PUT /predictions/`[#](#put-predictionsprediction_id "Permanent link")
+class Output(BaseModel):
+ score: Optional[float]
+ file: Optional[Path]
-Make a single prediction. This is the idempotent version of the `POST /predictions` endpoint.
+class Predictor(BasePredictor):
+ def predict(self) -> Output:
+ if condition:
+ return Output(score=1.5)
+ else:
+ return Output(file=io.StringIO("hello"))
+```
-`PUT /predictions/wjx3whax6rf4vphkegkhcvpv6a HTTP/1.1 Content-Type: application/json; charset=utf-8 { "input": {"prompt": "A picture of an onion with sunglasses"} }`
+## Input and output types
-`HTTP/1.1 200 OK Content-Type: application/json { "status": "succeeded", "output": "data:image/png;base64,..." }`
+Each parameter of the `predict()` method must be annotated with a type. The method's return type must also be annotated. The supported types are:
-If the client sets the `Prefer: respond-async` header in their request, the server responds immediately after starting the prediction with `202 Accepted` status and a prediction object in status `processing`.
+- `str`: a string
+- `int`: an integer
+- `float`: a floating point number
+- `bool`: a boolean
+- [`cog.File`](#file): a file-like object representing a file
+- [`cog.Path`](#path): a path to a file on disk
+- [`cog.Secret`](#secret): a string containing sensitive information
-`PUT /predictions/wjx3whax6rf4vphkegkhcvpv6a HTTP/1.1 Content-Type: application/json; charset=utf-8 Prefer: respond-async { "input": {"prompt": "A picture of an onion with sunglasses"} }`
+## `File()`
-`HTTP/1.1 202 Accepted Content-Type: application/json { "id": "wjx3whax6rf4vphkegkhcvpv6a", "status": "starting" }`
+> [!WARNING]
+> `cog.File` is deprecated and will be removed in a future version of Cog. Use [`cog.Path`](#path) instead.
-### `POST /predictions//cancel`[#](#post-predictionsprediction_idcancel "Permanent link")
+The `cog.File` object is used to get files in and out of models. It represents a _file handle_.
-A client can cancel an asynchronous prediction by making a `POST /predictions//cancel` request using the prediction `id` provided when the prediction was created.
+For models that return a `cog.File` object, the prediction output returned by Cog's built-in HTTP server will be a URL.
-For example, if the client creates a prediction by sending the request:
+```python
+from cog import BasePredictor, File, Input, Path
+from PIL import Image
-`POST /predictions HTTP/1.1 Content-Type: application/json; charset=utf-8 Prefer: respond-async { "id": "abcd1234", "input": {"prompt": "A picture of an onion with sunglasses"}, }`
+class Predictor(BasePredictor):
+ def predict(self, source_image: File = Input(description="Image to enlarge")) -> File:
+ pillow_img = Image.open(source_image)
+ upscaled_image = do_some_processing(pillow_img)
+ return File(upscaled_image)
+```
-The client can cancel the prediction by sending the request:
+## `Path()`
-`POST /predictions/abcd1234/cancel HTTP/1.1`
+The `cog.Path` object is used to get files in and out of models. It represents a _path to a file on disk_.
-A prediction cannot be canceled if it's created synchronously, without the `Prefer: respond-async` header, or created without a provided `id`.
+`cog.Path` is a subclass of Python's [`pathlib.Path`](https://docs.python.org/3/library/pathlib.html#basic-use) and can be used as a drop-in replacement.
-If a prediction exists with the provided `id`, the server responds with status `200 OK`. Otherwise, the server responds with status `404 Not Found`.
+For models that return a `cog.Path` object, the prediction output returned by Cog's built-in HTTP server will be a URL.
-When a prediction is canceled, Cog raises `cog.server.exceptions.CancelationException` in the model's `predict` function. This exception may be caught by the model to perform necessary cleanup. The cleanup should be brief, ideally completing within a few seconds. After cleanup, the exception must be re-raised using a bare raise statement. Failure to re-raise the exception may result in the termination of the container.
+This example takes an input file, resizes it, and returns the resized image:
-`from cog import Path from cog.server.exceptions import CancelationException def predict(image: Path) -> Path: try: return process(image) except CancelationException as e: cleanup() raise e`
-
+```python
+import tempfile
+from cog import BasePredictor, Input, Path
-
- Windows - Cog
- https://cog.run/wsl2/wsl2/
- [](https://github.com/replicate/cog/edit/main/docs/wsl2/wsl2.md "Edit this page")
+class Predictor(BasePredictor):
+ def predict(self, image: Path = Input(description="Image to enlarge")) -> Path:
+ upscaled_image = do_some_processing(image)
-Using `cog` on Windows 11 with WSL 2[#](#using-cog-on-windows-11-with-wsl-2 "Permanent link")
----------------------------------------------------------------------------------------------
+ # To output `cog.Path` objects the file needs to exist, so create a temporary file first.
+ # This file will automatically be deleted by Cog after it has been returned.
+ output_path = Path(tempfile.mkdtemp()) / "upscaled.png"
+ upscaled_image.save(output_path)
+ return Path(output_path)
+```
-* [0\. Prerequisites](#0-prerequisites)
-* [1\. Install the GPU driver](#1-install-the-gpu-driver)
-* [2\. Unlocking features](#2-unlocking-features)
-* [2.1. Unlock WSL2](#21-unlock-wsl2)
-* [2.2. Unlock virtualization](#22-unlock-virtualization)
-* [2.3. Reboot](#23-reboot)
-* [3\. Update MS Linux kernel](#3-update-ms-linux-kernel)
-* [4\. Configure WSL 2](#4-configure-wsl-2)
-* [5\. Configure CUDA WSL-Ubuntu Toolkit](#5-configure-cuda-wsl-ubuntu-toolkit)
-* [6\. Install Docker](#6-install-docker)
-* [7\. Install `cog` and pull an image](#7-install-cog-and-pull-an-image)
-* [8\. Run a model in WSL 2](#8-run-a-model-in-wsl-2)
-* [9\. References](#9-references)
+## `Secret`
-Running cog on Windows is now possible thanks to WSL 2. Follow this guide to enable WSL 2 and GPU passthrough on Windows 11.
+The `cog.Secret` type is used to signify that an input holds sensitive information,
+like a password or API token.
-**Windows 10 is not officially supported, as you need to be on an insider build in order to use GPU passthrough.**
+`cog.Secret` is a subclass of Pydantic's [`SecretStr`](https://docs.pydantic.dev/latest/api/types/#pydantic.types.SecretStr).
+Its default string representation redacts its contents to prevent accidental disclosure.
+You can access its contents with the `get_secret_value()` method.
-0\. Prerequisites[#](#0-prerequisites "Permanent link")
--------------------------------------------------------
+```python
+from cog import BasePredictor, Secret
-Before beginning installation, make sure you have:
-* Windows 11.
-* NVIDIA GPU.
-* RTX 2000/3000 series
-* Kesler/Tesla/Volta/Ampere series
-* Other configurations are not guaranteed to work.
+class Predictor(BasePredictor):
+ def predict(self, api_token: Secret) -> None:
+ # Prints '**********'
+ print(api_token)
-1\. Install the GPU driver[#](#1-install-the-gpu-driver "Permanent link")
--------------------------------------------------------------------------
+ # Use get_secret_value method to see the secret's content.
+ print(api_token.get_secret_value())
+```
-Per NVIDIA, the first order of business is to install the latest Game Ready drivers for you NVIDIA GPU.
+A predictor's `Secret` inputs are represented in OpenAPI with the following schema:
-[https://www.nvidia.com/download/index.aspx](https://www.nvidia.com/download/index.aspx)
+```json
+{
+ "type": "string",
+ "format": "password",
+ "x-cog-secret": true,
+}
+```
-I have an NVIDIA RTX 2070 Super, so filled out the form as such:
+Replicate treats secret inputs differently throughout its system.
+When you create a prediction on Replicate,
+any value passed to a `Secret` input is redacted after being sent to the model.
-Click "search", and follow the dialogue to download and install the driver.
+> [!WARNING]
+> Passing secret values to untrusted models can result in
+> unintended disclosure, exfiltration, or misuse of sensitive data.
-Restart your computer once the driver has finished installation.
+## `List`
-2\. Unlocking features[#](#2-unlocking-features "Permanent link")
------------------------------------------------------------------
+The List type is also supported in inputs. It can hold any supported type.
-Open Windows Terminal as an administrator.
+Example for **List[Path]**:
+```py
+from cog import BasePredictor, Path
+
+class Predictor(BasePredictor):
+ def predict(self, paths: list[Path]) -> str:
+ output_parts = [] # Use a list to collect file contents
+ for path in paths:
+ with open(path) as f:
+ output_parts.append(f.read())
+ return "".join(output_parts)
+```
+The corresponding cog command:
+```console
+$ echo test1 > 1.txt
+$ echo test2 > 2.txt
+$ cog predict -i paths=@1.txt -i paths=@2.txt
+Running prediction...
+test1
-* Use start to search for "Terminal"
-* Right click -> Run as administrator...
+test2
+```
+- Note the repeated inputs with the same name `paths`, which together constitute the list.
-Run the following powershell command to enable the Windows Subsystem for Linux and Virtual Machine Platform capabilities.
-### 2.1. Unlock WSL2[#](#21-unlock-wsl2 "Permanent link")
-`dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart`
-If you see an error about permissions, make sure the terminal you are using is run as an administrator and that you have an account with administrator-level privileges.
-### 2.2. Unlock virtualization[#](#22-unlock-virtualization "Permanent link")
+---
-`dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart`
-If this command fails, make sure to [enable virtualization capabilities](https://docs.microsoft.com/en-us/windows/wsl/troubleshooting#error-0x80370102-the-virtual-machine-could-not-be-started-because-a-required-feature-is-not-installed) in your computer's BIOS/UEFI. A successful output will print `The operation completed successfully.`
-### 2.3. Reboot[#](#23-reboot "Permanent link")
-Before moving forward, make sure you reboot your computer so that Windows 11 will have WSL2 and virtualization available to it.
-3\. Update MS Linux kernel[#](#3-update-ms-linux-kernel "Permanent link")
--------------------------------------------------------------------------
+# Redis queue API
-Download and run the [WSL2 Linux kernel update package for x64 machines](https://wslstorestorage.blob.core.windows.net/wslblob/wsl_update_x64.msi) msi installer. When prompted for elevated permissions, click 'yes' to approve the installation.
+> **Note:** The Redis queue API is no longer supported and has been removed from Cog.
-To ensure you are using the correct WSL kernel, `open Windows Terminal as an administrator` and enter:
-This will return a complicated string such as:
-`Linux version 5.10.102.1-microsoft-standard-WSL2 (oe-user@oe-host) (x86_64-msft-linux-gcc (GCC) 9.3.0, GNU ld (GNU Binutils) 2.34.0.20200220)`
-The version we are interested in is `Linux version 5.10.102.1`. At this point, you should have updated your kernel to be at least `Linux version 5.10.43.3`.
-If you can't get the correct kernel version to show:
+---
-Open `Settings` β `Windows Update` β `Advanced options` and ensure `Receive updates for other Microsoft products` is enabled. Then go to `Windows Update` again and click `Check for updates`.
-4\. Configure WSL 2[#](#4-configure-wsl-2 "Permanent link")
------------------------------------------------------------
-First, configure Windows to use the virtualization-based version of WSL (version 2) by default. In a Windows Terminal with administrator privileges, type the following:
-`wsl --set-default-version 2`
-Now, you will need to go to the Microsoft Store and [Download Ubuntu 18.04](https://www.microsoft.com/store/apps/9N9TNGVNDL3Q)
+# Training interface reference
-Launch the "Ubuntu" app available in your Start Menu. Linux will require its own user account and password, which you will need to enter now:
+> [!NOTE]
+> The training API is still experimental, and is subject to change.
-By default, a shimmed version of the CUDA tooling is provided by your Windows GPU drivers.
+Cog's training API allows you to define a fine-tuning interface for an existing Cog model, so users of the model can bring their own training data to create derivative fine-tuned models. Real-world examples of this API in use include [fine-tuning SDXL with images](https://replicate.com/blog/fine-tune-sdxl) and [fine-tuning Llama 2 with structured text](https://replicate.com/blog/fine-tune-llama-2).
-Important: you should _never_ use instructions for installing CUDA-toolkit in a generic linux fashion. in WSL 2, you _always_ want to use the provided `CUDA Toolkit using WSL-Ubuntu Package`.
+## How it works
-First, open PowerShell or Windows Command Prompt in administrator mode by right-clicking and selecting "Run as administrator". Then enter the following command:
+If you've used Cog before, you've probably seen the [Predictor](./python.md) class, which defines the interface for creating predictions against your model. Cog's training API works similarly: You define a Python function that describes the inputs and outputs of the training process. The inputs are things like training data, epochs, batch size, seed, etc. The output is typically a file with the fine-tuned weights.
-This should drop you into your running linux VM. Now you can run the following bash commands to install the correct version of cuda-toolkit for WSL-Ubuntu. Note that the version of CUDA used below may not be the version of CUDA your GPU supports.
+`cog.yaml`:
-`sudo apt-key del 7fa2af80 # if this line fails, you may remove it. wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600 wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda-repo-wsl-ubuntu-11-7-local_11.7.0-1_amd64.deb sudo dpkg -i cuda-repo-wsl-ubuntu-11-7-local_11.7.0-1_amd64.deb sudo apt-get update sudo apt-get -y install cuda-toolkit-11-7`
+```yaml
+build:
+ python_version: "3.10"
+train: "train.py:train"
+```
-6\. Install Docker[#](#6-install-docker "Permanent link")
----------------------------------------------------------
+`train.py`:
-Download and install [Docker Desktop for Windows](https://desktop.docker.com/win/main/amd64/Docker%20Desktop%20Installer.exe). It has WSL 2 support built in by default.
+```python
+from cog import File
+import io
-Once installed, run `Docker Desktop`, you can ignore the first-run tutorial. Go to **Settings β General** and ensure **Use the WSL 2 based engine** has a checkmark next to it. Click **Apply & Restart**.
+def train(param: str) -> File:
+ return io.StringIO("hello " + param)
+```
-Reboot your computer one more time.
+Then you can run it like this:
-7\. Install `cog` and pull an image[#](#7-install-cog-and-pull-an-image "Permanent link")
------------------------------------------------------------------------------------------
+```
+$ cog train -i param=train
+...
-Open Windows Terminal and enter your WSL 2 VM:
+$ cat weights
+hello train
+```
-Download and install `cog` inside the VM:
+## `Input(**kwargs)`
-``sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m` sudo chmod +x /usr/local/bin/cog``
+Use Cog's `Input()` function to define each of the parameters in your `train()` function:
+
+```py
+from cog import Input, Path
+
+def train(
+ train_data: Path = Input(description="HTTPS URL of a file containing training data"),
+ learning_rate: float = Input(description="learning rate, for learning!", default=1e-4, ge=0),
+ seed: int = Input(description="random seed to use for training", default=None)
+) -> str:
+ return "hello, weights"
+```
+
+The `Input()` function takes these keyword arguments:
-Make sure it's available by typing:
+- `description`: A description of what to pass to this input for users of the model.
+- `default`: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set to `None`, the input is optional.
+- `ge`: For `int` or `float` types, the value must be greater than or equal to this number.
+- `le`: For `int` or `float` types, the value must be less than or equal to this number.
+- `min_length`: For `str` types, the minimum length of the string.
+- `max_length`: For `str` types, the maximum length of the string.
+- `regex`: For `str` types, the string must match this regular expression.
+- `choices`: For `str` or `int` types, a list of possible values for this input.
-`which cog # should output /usr/local/bin/cog cog --version # should output the cog version number.`
+Each parameter of the `train()` function must be annotated with a type like `str`, `int`, `float`, `bool`, etc. See [Input and output types](./python.md#input-and-output-types) for the full list of supported types.
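+
+For illustration, here is a sketch that combines several of these arguments; `optimizer`, `epochs`, and `run_name` are invented parameter names, not part of any real model:
+
+```py
+from cog import Input, Path
+
+def train(
+    train_data: Path = Input(description="HTTPS URL of a file containing training data"),
+    optimizer: str = Input(description="which optimizer to use", default="adam", choices=["adam", "sgd"]),
+    epochs: int = Input(description="number of training epochs", default=10, ge=1, le=100),
+    run_name: str = Input(description="short label for this run", default="finetune", min_length=1, max_length=64),
+) -> str:
+    return "hello, weights"
+```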
-8\. Run a model in WSL 2[#](#8-run-a-model-in-wsl-2 "Permanent link")
----------------------------------------------------------------------
+Using the `Input` function provides better documentation and validation constraints to the users of your model, but it is not strictly required. You can also specify default values for your parameters using plain Python, or omit default assignment entirely:
+
+```py
+def train(
+    iterations: int,                 # no default assignment: this input is required
+    training_data: str = "foo bar",  # a plain Python default: also valid
+) -> str:
+    # ...
+```
-Finally, make sure it works. Let's try running `afiaka87/glid-3-xl` locally:
+## Training Output
-`cog predict 'r8.im/afiaka87/glid-3-xl' -i prompt="a fresh avocado floating in the water" -o prediction.json`
+Training output is typically a binary weights file. To return a custom output object, or several values at once, define a `TrainingOutput` class with the fields you need, return an instance of it from your `train()` function, and declare it as the function's return type using Python's `->` annotation:
-While your prediction is running, you can use `Task Manager` to keep an eye on GPU memory consumption:
+```python
+from cog import BaseModel, Input, Path
-This model just barely manages to fit under 8 GB of VRAM.
+class TrainingOutput(BaseModel):
+ weights: Path
-Notice that output is returned as JSON for this model as it has a complex return type. You will want to convert the base64 string in the json array to an image.
+def train(
+ train_data: Path = Input(description="HTTPS URL of a file containing training data"),
+ learning_rate: float = Input(description="learning rate, for learning!", default=1e-4, ge=0),
+ seed: int = Input(description="random seed to use for training", default=42)
+) -> TrainingOutput:
+    weights_file = generate_weights("...")  # placeholder for your actual training logic
+ return TrainingOutput(weights=Path(weights_file))
+```
-`jq` can help with this:
+## Testing
+
+If you are developing a Cog model like Llama or SDXL, you can test that the fine-tuned code path works before pushing it by specifying a `COG_WEIGHTS` environment variable when running `predict`:
-The following bash uses `jq` to grab the first element in our prediction array and converts it from a base64 string to a `png` file.
+```console
+cog predict -e COG_WEIGHTS=https://replicate.delivery/pbxt/xyz/weights.tar -i prompt="a photo of TOK"
+```
-`jq -cs '.[0][0][0]' prediction.json | cut --delimiter "," --field 2 | base64 --ignore-garbage --decode > prediction.png`
-When using WSL 2, you can access Windows binaries with the `.exe` extension. This lets you open photos easily within linux.
-`explorer.exe prediction.png`
-9\. References[#](#9-references "Permanent link")
--------------------------------------------------
-* [https://docs.nvidia.com/cuda/wsl-user-guide/index.html](https://docs.nvidia.com/cuda/wsl-user-guide/index.html)
-* [https://developer.nvidia.com/cuda-downloads?target\_os=Linux&target\_arch=x86\_64&Distribution=WSL-Ubuntu&target\_version=2.0](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0)
-* [https://www.docker.com/blog/wsl-2-gpu-support-for-docker-desktop-on-nvidia-gpus/](https://www.docker.com/blog/wsl-2-gpu-support-for-docker-desktop-on-nvidia-gpus/)
-* [https://docs.microsoft.com/en-us/windows/wsl/install-manual#step-4---download-the-linux-kernel-update-package](https://docs.microsoft.com/en-us/windows/wsl/install-manual#step-4---download-the-linux-kernel-update-package)
-* [https://github.com/replicate/cog](https://github.com/replicate/cog)
-
+---
-
- Contributing - Cog
- https://cog.run/CONTRIBUTING/
- [](https://github.com/replicate/cog/edit/main/docs/CONTRIBUTING.md "Edit this page")
-Contributing guide[#](#contributing-guide "Permanent link")
------------------------------------------------------------
-Making a contribution[#](#making-a-contribution "Permanent link")
------------------------------------------------------------------
-### Signing your work[#](#signing-your-work "Permanent link")
-Each commit you contribute to Cog must be signed off (not to be confused with **[signing](https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work)**). It certifies that you wrote the patch, or have the right to contribute it. It is called the [Developer Certificate of Origin](https://developercertificate.org/) and was originally developed for the Linux kernel.
+# `cog.yaml` reference
-If you can certify the following:
+`cog.yaml` defines how to build a Docker image and how to run predictions on your model inside that image.
-`By making a contribution to this project, I certify that: (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it. (d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved.`
+It has three keys: [`build`](#build), [`image`](#image), and [`predict`](#predict). It looks a bit like this:
-Then add this line to each of your Git commit messages, with your name and email:
+```yaml
+build:
+ python_version: "3.11"
+ python_packages:
+    - torch==2.0.1
+ system_packages:
+ - "ffmpeg"
+ - "git"
+predict: "predict.py:Predictor"
+```
-### How to sign off your commits[#](#how-to-sign-off-your-commits "Permanent link")
+Tip: Run [`cog init`](getting-started-own-model.md#initialization) to generate an annotated `cog.yaml` file that can be used as a starting point for setting up your model.
-If you're using the `git` CLI, you can sign a commit by passing the `-s` option: `git commit -s -m "Reticulate splines"`
+## `build`
-You can also create a git hook which will sign off all your commits automatically. Using hooks also allows you to sign off commits when using non-command-line tools like GitHub Desktop or VS Code.
+This stanza describes how to build the Docker image your model runs in. It supports the following options:
-First, create the hook file and make it executable:
+
-`cd your/checkout/of/cog touch .git/hooks/prepare-commit-msg chmod +x .git/hooks/prepare-commit-msg`
+### `cuda`
-Then paste the following into the file:
+Cog automatically picks the correct version of CUDA to install, but this option lets you override it by specifying the minor (`11.8`) or patch (`11.8.0`) version of CUDA to use.
-`#!/bin/sh NAME=$(git config user.name) EMAIL=$(git config user.email) if [ -z "$NAME" ]; then echo "empty git config user.name" exit 1 fi if [ -z "$EMAIL" ]; then echo "empty git config user.email" exit 1 fi git interpret-trailers --if-exists doNothing --trailer \ "Signed-off-by: $NAME <$EMAIL>" \ --in-place "$1"`
+For example:
-### Acknowledging contributions[#](#acknowledging-contributions "Permanent link")
+```yaml
+build:
+ cuda: "11.8"
+```
-We welcome contributions from everyone, and consider all forms of contribution equally valuable. This includes code, bug reports, feature requests, and documentation. We use [All Contributors](https://allcontributors.org/) to maintain a list of all the people who have contributed to Cog.
+### `gpu`
-To acknowledge a contribution, add a comment to an issue or pull request in the following format:
+Enable GPUs for this model. When enabled, the [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) base image will be used, and Cog will automatically figure out what versions of CUDA and cuDNN to use based on the version of Python, PyTorch, and Tensorflow that you are using.
-`@allcontributors please add @username for doc,code,ideas`
+For example:
-A bot will automatically open a pull request to add the contributor to the project README.
+```yaml
+build:
+ gpu: true
+```
-Common contribution types include: `doc`, `code`, `bug`, and `ideas`. See the full list at [allcontributors.org/docs/en/emoji-key](https://allcontributors.org/docs/en/emoji-key)
+When you use `cog run` or `cog predict`, Cog will automatically pass the `--gpus=all` flag to Docker. When you run a Docker image built with Cog, you'll need to pass this option to `docker run`.
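+
+For example, running a Cog-built image directly with Docker might look something like this (the image name and port mapping are placeholders):
+
+```console
+docker run -d -p 5000:5000 --gpus all your-image-name
+```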
-Development environment[#](#development-environment "Permanent link")
----------------------------------------------------------------------
+### `python_packages`
-You'll need to [install Go 1.21](https://golang.org/doc/install). If you're using a newer Mac with an M1 chip, be sure to download the `darwin-arm64` installer package. Alternatively you can run `brew install go` which will automatically detect and use the appropriate installer for your system architecture.
+A list of Python packages to install from the PyPI package index, in the format `package==version`. For example:
-Install the Python dependencies:
+```yaml
+build:
+ python_packages:
+ - pillow==8.3.1
+ - tensorflow==2.5.0
+```
-`python -m pip install '.[dev]'`
+To install Git-hosted Python packages, add `git` to the `system_packages` list, then use the `git+https://` syntax to specify the package name. For example:
-Once you have Go installed, run:
+```yaml
+build:
+ system_packages:
+ - "git"
+ python_packages:
+ - "git+https://github.com/huggingface/transformers"
+```
-`make install PREFIX=$(go env GOPATH)`
+You can also pin Python package installations to a specific git commit:
-This installs the `cog` binary to `$GOPATH/bin/cog`.
+```yaml
+build:
+ system_packages:
+ - "git"
+ python_packages:
+ - "git+https://github.com/huggingface/transformers@2d1602a"
+```
-To run the tests:
+Note that you can use a shortened prefix of the 40-character git commit SHA, but you must use at least six characters, like `2d1602a` above.
-The project is formatted by goimports. To format the source code, run:
+### `python_requirements`
-If you encounter any errors, see the troubleshooting section below?
+A pip requirements file specifying the Python packages to install. For example:
-Project structure[#](#project-structure "Permanent link")
----------------------------------------------------------
+```yaml
+build:
+ python_requirements: requirements.txt
+```
-As much as possible, this is attempting to follow the [Standard Go Project Layout](https://github.com/golang-standards/project-layout).
+Your `cog.yaml` file can set either `python_packages` or `python_requirements`, but not both. Use `python_requirements` when you need to configure options like `--extra-index-url` or `--trusted-host` to fetch Python package dependencies.
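+
+As a rough sketch, a `requirements.txt` that needs an extra index might look like this (the index URL and pinned versions are examples, not recommendations):
+
+```
+--extra-index-url https://download.pytorch.org/whl/cu118
+torch==2.0.1
+pillow==8.3.1
+```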
-* `cmd/` - The root `cog` command.
-* `pkg/cli/` - CLI commands.
-* `pkg/config` - Everything `cog.yaml` related.
-* `pkg/docker/` - Low-level interface for Docker commands.
-* `pkg/dockerfile/` - Creates Dockerfiles.
-* `pkg/image/` - Creates and manipulates Cog Docker images.
-* `pkg/predict/` - Runs predictions on models.
-* `pkg/util/` - Various packages that aren't part of Cog. They could reasonably be separate re-usable projects.
-* `python/` - The Cog Python library.
-* `test-integration/` - High-level integration tests for Cog.
+### `python_version`
-Concepts[#](#concepts "Permanent link")
----------------------------------------
+The minor (`3.11`) or patch (`3.11.1`) version of Python to use. For example:
-There are a few concepts used throughout Cog that might be helpful to understand.
+```yaml
+build:
+ python_version: "3.11.1"
+```
-* **Config**: The `cog.yaml` file.
-* **Image**: Represents a built Docker image that serves the Cog API, containing a **model**.
-* **Input**: Input from a **prediction**, as key/value JSON object.
-* **Model**: A user's machine learning model, consisting of code and weights.
-* **Output**: Output from a **prediction**, as arbitrarily complex JSON object.
-* **Prediction**: A single run of the model, that takes **input** and produces **output**.
-* **Predictor**: Defines how Cog runs **predictions** on a **model**.
+Cog supports all active branches of Python: 3.8, 3.9, 3.10, 3.11, 3.12, 3.13. If you don't define a version, Cog will use the latest version of Python 3.12 or a version of Python that is compatible with the versions of PyTorch or TensorFlow you specify.
-Running tests[#](#running-tests "Permanent link")
--------------------------------------------------
+Note that these are the versions supported **in the Docker container**, not your host machine. You can run any version(s) of Python you wish on your host machine.
-To run the entire test suite:
+### `run`
-To run just the Golang tests:
+A list of setup commands to run in the environment after your system packages and Python packages have been installed. If you're familiar with Docker, it's like a `RUN` instruction in your `Dockerfile`.
-To run just the Python tests:
+For example:
-To stand up a server for one of the integration tests:
+```yaml
+build:
+ run:
+ - curl -L https://github.com/cowsay-org/cowsay/archive/refs/tags/v3.7.0.tar.gz | tar -xzf -
+ - cd cowsay-3.7.0 && make install
+```
-`make install pip install -r requirements-dev.txt make test cd test-integration/test_integration/fixtures/file-project cog build docker run -p 5001:5000 --init --platform=linux/amd64 cog-file-project`
+Your code is _not_ available to commands in `run`. This is so we can build your image efficiently when running locally.
-Then visit [localhost:5001](http://localhost:5001/) in your browser.
+Each command in `run` can be either a string or a dictionary in the following format:
-Running the docs server[#](#running-the-docs-server "Permanent link")
----------------------------------------------------------------------
+```yaml
+build:
+ run:
+ - command: pip install
+ mounts:
+ - type: secret
+ id: pip
+ target: /etc/pip.conf
+```
-To run the docs website server locally:
+You can use secret mounts to securely pass credentials to setup commands, without baking them into the image. For more information, see [Dockerfile reference](https://docs.docker.com/engine/reference/builder/#run---mounttypesecret).
-Publishing a release[#](#publishing-a-release "Permanent link")
----------------------------------------------------------------
+### `system_packages`
-This project has a [GitHub Actions workflow](https://github.com/replicate/cog/blob/39cfc5c44ab81832886c9139ee130296f1585b28/.github/workflows/ci.yaml#L107) that uses [goreleaser](https://goreleaser.com/quick-start/#quick-start) to facilitate the process of publishing new releases. The release process is triggered by manually creating and pushing a new annotated git tag.
+A list of Ubuntu APT packages to install. For example:
-> Deciding what the annotated git tag should be requires some interpretation. Cog generally follows [SemVer 2.0.0](https://semver.org/spec/v2.0.0.html), and since the major version is `0`, the rules get [a bit more loose](https://semver.org/spec/v2.0.0.html#spec-item-4). Broadly speaking, the rules for when to increment the patch version still hold, but backward-incompatible changes **will not** require incrementing the major version. In this way, the minor version may be incremented whether the changes are additive or subtractive. This all changes once the major version is incremented to `1`.
+```yaml
+build:
+ system_packages:
+ - "ffmpeg"
+ - "libavcodec-dev"
+```
-To publish a new release `v0.13.12` referencing commit `fabdadbead`, for example, one would run the following in one's local checkout of cog:
+## `image`
-`git tag --sign --annotate --message 'Release v0.13.12' v0.13.12 fabdadbead git push origin v0.13.12`
+The name given to built Docker images. If you want to push to a registry, this should also include the registry name.
-Then visit [github.com/replicate/cog/actions](https://github.com/replicate/cog/actions) to monitor the release process.
+For example:
-### Publishing a prerelease[#](#publishing-a-prerelease "Permanent link")
+```yaml
+image: "r8.im/your-username/your-model"
+```
-Prereleases are a useful way to give testers a way to try out new versions of Cog without affecting the documented `latest` download URL which people normally use to install Cog.
+r8.im is Replicate's registry, but this can be any Docker registry.
-To publish a prerelease version, append a [SemVer prerelease identifer](https://semver.org/#spec-item-9) like `-alpha` or `-beta` to the git tag name. Goreleaser will detect this and mark it as a prerelease in GitHub Releases.
+If you don't set this, then a name will be generated from the directory name.
-`git checkout some-prerelease-branch git fetch --all --tags git tag -a v0.1.0-alpha -m "Prerelease v0.1.0" git push --tags`
+If you set this, then you can run `cog push` without specifying the model name.
-Troubleshooting[#](#troubleshooting "Permanent link")
------------------------------------------------------
+If you specify an image name argument when pushing (like `cog push your-username/custom-model-name`), the argument will be used and the value of `image` in cog.yaml will be ignored.
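+
+For example, with `image` set in `cog.yaml`, pushing might look like this:
+
+```console
+# Uses the image name from cog.yaml
+cog push
+
+# Or override it with an explicit name
+cog push your-username/custom-model-name
+```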
-### `cog command not found`[#](#cog-command-not-found "Permanent link")
+## `predict`
-The compiled `cog` binary will be installed in `$GOPATH/bin/cog`, e.g. `~/go/bin/cog`. Make sure that Golang's bin directory is present on your system PATH by adding it to your shell config (`.bashrc`, `.zshrc`, etc):
+The pointer to the `Predictor` object in your code, which defines how predictions are run on your model.
-`export PATH=~/go/bin:$PATH`
+For example:
-* * *
+```yaml
+predict: "predict.py:Predictor"
+```
-Still having trouble? Please [open an issue](https://github.com/replicate/cog/issues) on GitHub.
-
\ No newline at end of file
+See [the Python API documentation](python.md) for more information.