ONNXRunTime provides unofficial Julia bindings for onnxruntime. It exposes both a low level interface that mirrors the official C-API and a high level interface.
Contributions are welcome.
The high level API works as follows:
julia> import ONNXRunTime as ORT
julia> path = ORT.testdatapath("increment2x3.onnx"); # path to a toy model
julia> model = ORT.load_inference(path);
julia> input = Dict("input" => randn(Float32,2,3))
Dict{String, Matrix{Float32}} with 1 entry:
"input" => [1.68127 1.18192 -0.474021; -1.13518 1.02199 2.75168]
julia> model(input)
Dict{String, Matrix{Float32}} with 1 entry:
"output" => [2.68127 2.18192 0.525979; -0.135185 2.02199 3.75168]
For GPU usage, the CUDA and cuDNN packages are required, and the CUDA runtime needs to be set to 12.0 or a later 12.x version. To set this up, do
pkg> add CUDA cuDNN
julia> import CUDA
julia> CUDA.set_runtime_version!(v"12.0")
After restarting Julia (a restart is required for the new runtime version to take effect), GPU inference is simply
julia> import CUDA, cuDNN
julia> ORT.load_inference(path, execution_provider=:cuda)
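A CUDA-backed model is then called just like in the CPU example above. This is a sketch; it assumes, as in the CPU case, that plain Float32 arrays are passed as input:

julia> model = ORT.load_inference(path, execution_provider=:cuda);

julia> model(Dict("input" => randn(Float32, 2, 3)));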
CUDA provider options can be specified:
julia> ORT.load_inference(path, execution_provider=:cuda,
provider_options=(;cudnn_conv_algo_search=:HEURISTIC))
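Other CUDA execution provider options of onnxruntime can presumably be passed the same way. In the sketch below, device_id is such an option; that it is accepted through provider_options is an assumption, not something stated in this document:

julia> ORT.load_inference(path, execution_provider=:cuda,
           provider_options=(;device_id=0, cudnn_conv_algo_search=:EXHAUSTIVE));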
Memory allocated by a model is eventually released automatically after the model goes out of scope and the model object is deleted by the garbage collector. It can also be released immediately with release(model).
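For example:

julia> model = ORT.load_inference(path);

julia> model(Dict("input" => randn(Float32, 2, 3)));

julia> ORT.release(model)   # free the native session resources right away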
The low level API mirrors the official C-API. The above example looks like this:
using ONNXRunTime.CAPI
using ONNXRunTime: testdatapath

# Set up the API, environment, session options and session.
api = GetApi();
env = CreateEnv(api, name="myenv");
so = CreateSessionOptions(api);
path = testdatapath("increment2x3.onnx");
session = CreateSession(api, env, path, so);
# Wrap the input array in a tensor backed by CPU memory.
mem = CreateCpuMemoryInfo(api);
input_array = randn(Float32, 2,3)
input_tensor = CreateTensorWithDataAsOrtValue(api, mem, vec(input_array), size(input_array));
# Run the session and extract the data of the single output tensor.
run_options = CreateRunOptions(api);
input_names = ["input"];
output_names = ["output"];
inputs = [input_tensor];
outputs = Run(api, session, run_options, input_names, inputs, output_names);
output_tensor = only(outputs);
output_array = GetTensorMutableData(api, output_tensor);
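As a quick sanity check: the toy model adds one to every element, so the flattened output should match the input plus one (this assumes the output data comes back in the same memory order it was passed in):

@assert vec(output_array) ≈ vec(input_array) .+ 1f0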
Alternatives to this package include:
- Use the onnxruntime python bindings via PyCall.jl.
- ONNX.jl
- ONNXNaiveNASflux.jl
- ONNXLowLevel.jl cannot run inference, but can be used to investigate, create, or manipulate ONNX files.
Breaking changes:
- Support for CUDA.jl has changed from version 3 to versions 4 and 5.
- Support for Julia versions older than 1.9 is dropped. The reason for this is to switch the conditional GPU support from being based on the Requires package to being a package extension. As a consequence, the ONNXRunTime GPU support can now be precompiled and the supported CUDA.jl versions can be properly controlled via the [compat] section.
For GPU tests using ONNXRunTime, the tests naturally must depend on and import CUDA and cuDNN. Additionally, a supported CUDA runtime version needs to be used, which can be somewhat tricky to set up for the tests; a minimal test sketch is given at the end of this section.
First some background. What CUDA.set_runtime_version!(v"12.0") effectively does is to

- Add a LocalPreferences.toml file containing

  [CUDA_Runtime_jll]
  version = "12.0"

- In Project.toml, add

  [extras]
  CUDA_Runtime_jll = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"

A quick way to check which runtime is actually selected is sketched right after this list.
If your test environment is defined by a test target in the top Project.toml, you need to

- Add a LocalPreferences.toml in your top directory with the same contents as above.
- Add CUDA_Runtime_jll to the extras section of Project.toml.
- Add CUDA_Runtime_jll to the test target of Project.toml.
If your test environment is defined by a Project.toml in the test directory, you instead need to

- Add a test/LocalPreferences.toml file with the same contents as above.
- Add CUDA_Runtime_jll to the extras section of test/Project.toml.
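Putting it together, a minimal GPU test file could look like the following sketch. The file name, the test set name, and the exact checks are illustrative assumptions, not part of ONNXRunTime:

# test/runtests.jl (sketch)
using Test
import CUDA, cuDNN
import ONNXRunTime as ORT

@testset "CUDA execution provider" begin
    if CUDA.functional()
        path = ORT.testdatapath("increment2x3.onnx")
        model = ORT.load_inference(path, execution_provider=:cuda)
        out = model(Dict("input" => ones(Float32, 2, 3)))
        # The toy model adds one to each element.
        @test Array(out["output"]) ≈ fill(2.0f0, 2, 3)
        ORT.release(model)
    else
        @info "CUDA not functional, skipping GPU tests"
    end
end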