
Triton + RAPIDS Example

Triton

Triton Inference Server simplifies the deployment of AI models at scale in production. It lets teams deploy trained models from any framework (TensorFlow, NVIDIA® TensorRT, PyTorch, ONNX Runtime, or custom) from local storage or a cloud platform onto any GPU- or CPU-based infrastructure (cloud, data center, or edge).

Check out the Triton documentation for more details.

Using RAPIDS and Triton together

We use Triton's Python backend, which allows you to serve Python "models" that can execute arbitrary Python (and therefore RAPIDS) code.

Here we showcase a simple example of using RAPIDS + PyTorch with Triton.
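
A Python-backend model is a `model.py` file that defines a `TritonPythonModel` class. Below is a minimal sketch of that interface; the tensor names `INPUT0`/`OUTPUT0` are placeholders for illustration, not the names used by the models in this repository.

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Load vocabularies, model weights, etc. here.
        pass

    def execute(self, requests):
        responses = []
        for request in requests:
            # Pull an input tensor out of the request and convert it to NumPy.
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()

            # ... run arbitrary Python / RAPIDS / PyTorch code on in0 ...
            out0 = in0  # identity, for illustration only

            responses.append(pb_utils.InferenceResponse(
                output_tensors=[pb_utils.Tensor("OUTPUT0", out0)]))
        return responses

    def finalize(self):
        pass
```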

Build

build.sh creates a Triton + RAPIDS Docker container that you can use to deploy your RAPIDS code with Triton.

bash build.sh

Model

  1. Tokenization of strings into numerical vectors using cuDF's SubwordTokenizer (sketched below).

  2. Sentiment prediction using a PyTorch model.

  3. The ensemble model configuration that ties these steps together is in models/end_to_end_model/config.pbtxt.
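
As a rough illustration of the tokenization step, the snippet below shows how cuDF's `SubwordTokenizer` is typically called. The vocabulary hash file name, sequence length, and example strings are assumptions for illustration, not values taken from this repository.

```python
import cudf
from cudf.core.subword_tokenizer import SubwordTokenizer

strings = cudf.Series(["this film was great", "this film was terrible"])

# 'vocab_hash.txt' is assumed to be a hashed vocabulary file generated
# from the BERT vocabulary with cuDF's hash_vocab utility.
tokenizer = SubwordTokenizer("vocab_hash.txt", do_lower_case=True)

tokens = tokenizer(
    strings,
    max_length=128,
    max_num_rows=len(strings),
    padding="max_length",
    return_tensors="pt",   # return PyTorch tensors that stay on the GPU
    truncation=True,
)
# tokens["input_ids"] and tokens["attention_mask"] can be fed straight
# into the PyTorch sentiment model.
```

Because the tokenizer output already lives on the GPU as PyTorch tensors, the ensemble can pass data between cuDF and PyTorch without a host copy.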

Serving

The Triton Inference Server is started using start_server.sh.

bash start_server.sh
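
Once the container is up, you can verify that the server and the ensemble model are ready. This check assumes Triton's default HTTP port (8000); the model name `end_to_end_model` matches the model directory mentioned above.

```python
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Both checks should print True once start_server.sh has finished loading the models.
print(client.is_server_ready())
print(client.is_model_ready("end_to_end_model"))
```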

Client Code

The client logic to interact with the served Triton model is present in example_client.ipynb.
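
The sketch below shows the general shape of such a request using the `tritonclient` HTTP API. The input and output tensor names (`TEXT`, `SENTIMENT`) are assumptions for illustration; see the notebook and config.pbtxt for the actual names.

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Triton expects string inputs as a NumPy object array with the BYTES datatype.
text = np.array([b"the acting was superb"], dtype=np.object_)
inp = httpclient.InferInput("TEXT", text.shape, "BYTES")
inp.set_data_from_numpy(text)

out = httpclient.InferRequestedOutput("SENTIMENT")
result = client.infer(model_name="end_to_end_model", inputs=[inp], outputs=[out])
print(result.as_numpy("SENTIMENT"))
```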