TorchServe is a performant, flexible, and easy-to-use tool for serving PyTorch eager mode and TorchScripted models.
- Serving Quick Start - Basic server usage tutorial
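
  A minimal start/stop sketch (assuming a local model_store directory that already contains a densenet161.mar archive; both names are placeholders):

  ```bash
  # Start TorchServe and load one model archive from the model store
  torchserve --start --ncs --model-store model_store --models densenet161.mar

  # Stop the server when finished
  torchserve --stop
  ```
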
- Model Archive Quick Start - Tutorial that shows you how to package a model archive file.
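
  For instance, a sketch of packaging a TorchScripted model with torch-model-archiver (the file and model names are placeholders; eager mode models additionally need --model-file):

  ```bash
  torch-model-archiver --model-name densenet161 \
    --version 1.0 \
    --serialized-file densenet161.pt \
    --handler image_classifier \
    --export-path model_store
  ```
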
- Installation - Installation procedures
- Model loading - How to load a model in TorchServe
- Serving Models - Explains how to use TorchServe
- REST API - Specification of the TorchServe API endpoints
- gRPC API - TorchServe supports gRPC APIs for both inference and management calls
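
  A sketch of calling the gRPC inference API with the sample client from the TorchServe repository (assumes a source checkout with the Python gRPC stubs already generated, and a registered densenet161 model):

  ```bash
  python ts_scripts/torchserve_grpc_client.py infer densenet161 examples/image_classifier/kitten.jpg
  ```
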
- Packaging Model Archive - Explains how to package a model archive file using model-archiver
- Inference API - How to check the health of a deployed model and get inferences
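
  For example (assuming the default inference port 8080 and a registered model named densenet161):

  ```bash
  # Health check
  curl http://localhost:8080/ping

  # Get a prediction from the deployed model
  curl http://localhost:8080/predictions/densenet161 -T kitten.jpg
  ```
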
- Management API - How to manage and scale models
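
  A few representative calls (assuming the default management port 8081; the model and archive names are placeholders):

  ```bash
  # List registered models
  curl http://localhost:8081/models

  # Register a model archive from the model store
  curl -X POST "http://localhost:8081/models?url=densenet161.mar"

  # Scale the workers serving a model
  curl -X PUT "http://localhost:8081/models/densenet161?min_worker=3"

  # Unregister the model
  curl -X DELETE http://localhost:8081/models/densenet161
  ```
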
- Logging - How to configure logging
- Metrics - How to configure metrics
- Prometheus and Grafana metrics - How to configure the metrics API with Prometheus-formatted metrics and visualize them in a Grafana dashboard
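
  For example, Prometheus-formatted metrics can be scraped from the metrics endpoint (assuming the default metrics port 8082):

  ```bash
  curl http://127.0.0.1:8082/metrics
  ```
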
- Captum Explanations - Built-in support for Captum explanations for both text and images
- Batch inference with TorchServe - How to create and serve a model with batch inference in TorchServe
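
  A sketch of enabling batching when registering a model via the management API (batch_size and max_batch_delay control how requests are aggregated; the archive name is a placeholder):

  ```bash
  curl -X POST "http://localhost:8081/models?url=resnet152.mar&batch_size=8&max_batch_delay=50&initial_workers=1"
  ```
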
- Workflows - How to create workflows to compose PyTorch models and Python functions in sequential and parallel pipelines
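
  A sketch of packaging and registering a workflow (assuming a workflow.yaml spec and a workflow_handler.py already written; both names are placeholders):

  ```bash
  # Package the workflow spec and handler into a .war archive
  torch-workflow-archiver --workflow-name my_workflow --spec-file workflow.yaml --handler workflow_handler.py --export-path model_store

  # Register the workflow via the management API
  curl -X POST "http://localhost:8081/workflows?url=my_workflow.war"
  ```
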
- Image Classifier - This handler takes an image and returns the name of the object in that image
- Text Classifier - This handler takes a text (string) as input and returns the classification based on the model vocabulary
- Object Detector - This handler takes an image and returns a list of detected classes and their bounding boxes
- Image Segmenter - This handler takes an image and returns output of shape [CL, H, W], where CL is the number of classes, H is the height, and W is the width
- HuggingFace Language Model - This handler takes an input sentence and can return sequence classifications, token classifications or Q&A answers
- Multi Modal Framework - Build and deploy a classifier that combines text, audio and video input data
- Dual Translation Workflow - Example of a workflow that composes two translation models
- Model Zoo - List of pre-trained model archives ready to be served for inference with TorchServe.
- Examples - Many examples of how to package and deploy models with TorchServe
- Workflow Examples - Examples of how to compose models in a workflow with TorchServe
- Advanced configuration - Describes advanced TorchServe configurations.
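
  For instance, a minimal config.properties sketch (these property names come from the standard configuration; the values are illustrative):

  ```bash
  cat > config.properties <<'EOF'
  inference_address=http://0.0.0.0:8080
  management_address=http://0.0.0.0:8081
  default_workers_per_model=2
  EOF

  torchserve --start --ts-config config.properties --model-store model_store
  ```
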
- A/B test models - A/B test your models for regressions before shipping them to production
- Custom Service - Describes how to develop custom inference services.
- Encrypted model serving - S3 server-side model encryption via KMS
- Snapshot serialization - Serialize model artifacts to AWS DynamoDB
- Benchmarking and Profiling - Use JMeter or Apache Bench to benchmark your models and TorchServe itself
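
  For example, a quick Apache Bench run against a deployed image classifier (assuming a registered densenet161 model and a local kitten.jpg):

  ```bash
  ab -k -n 1000 -c 10 -T 'image/jpeg' -p kitten.jpg http://127.0.0.1:8080/predictions/densenet161
  ```
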
- TorchServe on Kubernetes - Demonstrates a TorchServe deployment on Kubernetes using a Helm chart, supported on both Azure Kubernetes Service and Google Kubernetes Engine
- mlflow-torchserve - Deploy MLflow pipeline models to TorchServe
- Kubeflow Pipelines - Deploy TorchServe with Kubeflow Pipelines and Google Vertex AI Managed Pipelines
- NVIDIA MPS - Use NVIDIA MPS to optimize multi-worker deployment on a single GPU
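
  A minimal sketch using the standard NVIDIA MPS control commands (requires a supported GPU and driver):

  ```bash
  # Start the MPS control daemon, then launch TorchServe as usual
  nvidia-cuda-mps-control -d
  torchserve --start --model-store model_store

  # Shut the daemon down when finished
  echo quit | nvidia-cuda-mps-control
  ```
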