This demo showcases the use of onnxruntime-rs on BERT with a GPU on CUDA 11, served by actix-web and tokenized with Hugging Face tokenizers.
- Linux x86_64
- NVIDIA GPU with CUDA 11 (CUDA 10 is untested)
- Rust (obviously)
- git lfs for the models
export ORT_USE_CUDA=1
git lfs install
cargo build --release
cargo run --release
or set the library path and run the compiled binary directly:
export LD_LIBRARY_PATH=path/to/onnxruntime-linux-x64-gpu-1.8.0/lib:${LD_LIBRARY_PATH}
./target/release/onnx-server
curl http://localhost:8080/\?data=Hello+World
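The same request can also be issued from Rust using only the standard library. This is a minimal sketch and not part of this repository: it assumes the server listens on `localhost:8080` and reads the input text from the `data` query parameter, as the `curl` command suggests. The helper names `encode_query_value`, `build_request` and `get_prediction` are hypothetical.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Encode a query value: spaces become `+`, RFC 3986 unreserved
/// characters are kept, every other byte is percent-encoded.
fn encode_query_value(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(byte as char)
            }
            b' ' => out.push('+'),
            other => out.push_str(&format!("%{:02X}", other)),
        }
    }
    out
}

/// Build the raw HTTP/1.1 GET request for the given text.
/// The `/?data=` path mirrors the curl example (assumed endpoint shape).
fn build_request(host: &str, text: &str) -> String {
    format!(
        "GET /?data={} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n",
        encode_query_value(text),
        host
    )
}

/// Send the request and return the raw response (status line, headers, body).
fn get_prediction(addr: &str, text: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(build_request(addr, text).as_bytes())?;
    let mut response = String::new();
    // `Connection: close` makes the server close the socket, ending the read.
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

With the server running, `get_prediction("localhost:8080", "Hello World")` should return the same response that `curl` prints, preceded by the HTTP status line and headers.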
To compare with a standard Python server using FastAPI, I've added the code for the same server in `src/python_alternative.py`.
pip install -r requirements.txt
cd src
uvicorn python_alternative:app --reload --workers 1
curl http://localhost:8000/\?data=Hello+World
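To put numbers on the comparison, one option is to time the same request against both servers. The sketch below is an assumption about how such a measurement could be done, not the benchmark used by this project. `time_calls`, `mean_ms` and `percentile_ms` are hypothetical helpers, and the closure passed to `time_calls` would wrap whatever client call you use against port 8080 (Rust) or 8000 (Python).

```rust
use std::time::{Duration, Instant};

/// Run `call` `n` times and record how long each invocation takes.
fn time_calls<F: FnMut()>(n: usize, mut call: F) -> Vec<Duration> {
    let mut samples = Vec::with_capacity(n);
    for _ in 0..n {
        let start = Instant::now();
        call();
        samples.push(start.elapsed());
    }
    samples
}

/// Mean of the samples in milliseconds; `None` when there are no samples.
fn mean_ms(samples: &[Duration]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let total: Duration = samples.iter().sum();
    Some(total.as_secs_f64() * 1000.0 / samples.len() as f64)
}

/// Nearest-rank percentile (0 < p <= 100) in milliseconds;
/// `None` when there are no samples.
fn percentile_ms(samples: &[Duration], p: f64) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Nearest-rank: the smallest rank covering at least p percent of samples.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    let index = rank.clamp(1, sorted.len()) - 1;
    Some(sorted[index].as_secs_f64() * 1000.0)
}
```

For a fair comparison, send a few warm-up requests first so that model loading and CUDA initialization are not counted, then report both the mean and a tail percentile such as `percentile_ms(&samples, 99.0)`.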
The training pipeline is in another repo: https://github.com/haixuanTao/bert-onnx-rs-pipeline