Inference is the process of using a trained language model to generate predictions or responses. While inference might seem straightforward, deploying models efficiently at scale requires careful attention to factors such as performance, cost, and reliability. Large Language Models (LLMs) present unique challenges due to their size and computational requirements.
We'll explore both simple and production-ready approaches using the `transformers` library and `text-generation-inference`, two popular frameworks for LLM inference. For production deployments, we'll focus on Text Generation Inference (TGI), which provides optimized serving capabilities.
LLM inference can be categorized into two main approaches: simple pipeline-based inference for development and testing, and optimized serving solutions for production deployments. We'll cover both approaches, starting with the simpler pipeline approach and moving to production-ready solutions.
Learn how to use the Hugging Face Transformers pipeline for basic inference. We'll cover setting up pipelines, configuring generation parameters, and best practices for local development. The pipeline approach is perfect for prototyping and small-scale applications. Start learning.
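As a quick illustration, here is a minimal sketch of pipeline-based generation. The model name, device settings, and generation parameters are assumptions for the example rather than fixed requirements; any causal language model you have access to will work.

```python
from transformers import pipeline

# Load a text-generation pipeline with a small instruction-tuned model
# (the model name here is an assumption for this sketch; substitute your own).
generator = pipeline(
    "text-generation",
    model="HuggingFaceTB/SmolLM2-1.7B-Instruct",
    device_map="auto",  # requires accelerate; remove to run on CPU
)

# Generation parameters control output length and sampling behaviour.
outputs = generator(
    "Explain model inference in one sentence.",
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(outputs[0]["generated_text"])
```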
Learn how to deploy models for production using Text Generation Inference. We'll explore optimized serving techniques, batching strategies, and monitoring solutions. TGI provides production-ready features like health checks, metrics, and Docker deployment options. Start learning.
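To give a sense of what querying a TGI deployment looks like, below is a minimal sketch using the `huggingface_hub` client. It assumes a TGI server is already running (for example, started from the official TGI Docker image) and listening locally on port 8080; the address and parameters are placeholders for illustration.

```python
from huggingface_hub import InferenceClient

# Point the client at a locally running TGI server
# (the URL and port are assumptions for this sketch).
client = InferenceClient("http://localhost:8080")

# TGI exposes a text-generation endpoint with familiar sampling parameters.
response = client.text_generation(
    "Explain model inference in one sentence.",
    max_new_tokens=100,
    temperature=0.7,
)
print(response)
```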
| Title | Description | Exercise | Link | Colab |
|-------|-------------|----------|------|-------|
| Pipeline Inference | Basic inference with transformers pipeline | 🐢 Set up a basic pipeline<br>🐕 Configure generation parameters<br>🦁 Create a simple web server | Link | Colab |
| TGI Deployment | Production deployment with TGI | 🐢 Deploy a model with TGI<br>🐕 Configure performance optimizations<br>🦁 Set up monitoring and scaling | Link | Colab |