DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning on Windows. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers.
When used standalone, the DirectML API is a low-level DirectX 12 library and is suitable for high-performance, low-latency applications such as frameworks, games, and other real-time applications. The seamless interoperability of DirectML with Direct3D 12, together with its low overhead and conformance across hardware, makes DirectML ideal for accelerating machine learning when high performance is desired and the reliability and predictability of results across hardware are critical.
The DirectML Execution Provider is an optional component of ONNX Runtime that uses DirectML to accelerate inference of ONNX models. The DirectML execution provider is capable of greatly improving evaluation time of models using commodity GPU hardware, without sacrificing broad hardware support or requiring vendor-specific extensions to be installed.
The DirectML Execution Provider is currently in preview.
The DirectML execution provider requires a DirectX 12 capable device. Almost all commercially available graphics cards released in the last several years support DirectX 12. Examples of compatible hardware include:
- NVIDIA Kepler (GTX 600 series) and above
- AMD GCN 1st Gen (Radeon HD 7000 series) and above
- Intel Haswell (4th-gen core) HD Integrated Graphics and above
DirectML is compatible with Windows 10, version 1709 (10.0.16299; RS3, "Fall Creators Update") and newer.
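The minimum OS requirement above can be checked with a simple build-number comparison. The following is a minimal sketch, assuming the caller has already obtained the OS version as a "major.minor.build" string; the names kMinDirectMLBuild and MeetsDirectMLMinimum are hypothetical and are not part of DirectML or onnxruntime.

```cpp
#include <cassert>
#include <cstdio>

// Minimum Windows 10 build stated above (version 1709, 10.0.16299).
constexpr unsigned kMinDirectMLBuild = 16299;

// Hypothetical helper: parses "major.minor.build" and reports whether the
// version is at least 10.0.16299. Strings that do not match that shape
// are treated as unsupported.
inline bool MeetsDirectMLMinimum(const char* version) {
    assert(version != nullptr);
    unsigned major = 0, minor = 0, build = 0;
    if (std::sscanf(version, "%u.%u.%u", &major, &minor, &build) != 3) {
        return false;
    }
    if (major != 10) {
        return major > 10;
    }
    return minor > 0 || build >= kMinDirectMLBuild;
}
```

How the version string is obtained is left to the application; the sketch only covers the comparison.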
For general information about building onnxruntime, see BUILD.md.
Requirements for building the DirectML execution provider:
- Visual Studio 2017 toolchain (see cmake configuration instructions)
- The Windows 10 SDK (10.0.18362.0) for Windows 10, version 1903 (or newer)
To build onnxruntime with the DML EP included, supply the --use_dml parameter to build.bat, e.g.:
build.bat --config RelWithDebInfo --build_shared_lib --parallel --use_dml
The DirectML execution provider supports building for both x64 (default) and x86 architectures.
Note that building onnxruntime with the DirectML execution provider enabled causes the DirectML redistributable package to be automatically downloaded as part of the build. This package contains a pre-release version of DirectML, and its use is governed by a license whose text may be found as part of the NuGet package.
When using the C API with a DML-enabled build of onnxruntime (see Building from source), the DirectML execution provider can be enabled using one of the two factory functions included in include/onnxruntime/core/providers/dml/dml_provider_factory.h.
Creates a DirectML Execution Provider which executes on the hardware adapter with the given device_id, also known as the adapter index. The device ID corresponds to the enumeration order of hardware adapters as given by IDXGIFactory::EnumAdapters. A device_id of 0 always corresponds to the default adapter, which is typically the primary display GPU installed on the system. A negative device_id is invalid.
OrtStatus* OrtSessionOptionsAppendExecutionProvider_DML(
_In_ OrtSessionOptions* options,
int device_id
);
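Because device_id is just an index into the adapter enumeration order, an application that wants a particular GPU has to map its choice to that index. The following is a minimal sketch, assuming the caller has already collected the adapter descriptions in enumeration order (for example from DXGI_ADAPTER_DESC::Description); PickDeviceId is a hypothetical helper, not part of onnxruntime or DXGI.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the device_id (adapter index) of the first
// adapter whose description contains `needle`, or 0 (the default adapter)
// when nothing matches. `descriptions` must be non-empty and listed in
// adapter enumeration order, so that index i corresponds to device_id i.
inline int PickDeviceId(const std::vector<std::wstring>& descriptions,
                        const std::wstring& needle) {
    assert(!descriptions.empty());
    for (std::size_t i = 0; i < descriptions.size(); ++i) {
        if (descriptions[i].find(needle) != std::wstring::npos) {
            return static_cast<int>(i);
        }
    }
    return 0;  // fall back to the default adapter
}
```

The returned value is never negative, so it can be passed as the device_id argument of the function above.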
Creates a DirectML Execution Provider using the given DirectML device, and which executes work on the supplied D3D12 command queue. The DirectML device and D3D12 command queue must have the same parent ID3D12Device, or an error will be returned. The D3D12 command queue must be of type DIRECT or COMPUTE (see D3D12_COMMAND_LIST_TYPE). If this function succeeds, the inference session once created will maintain a strong reference on both the dml_device and cmd_queue objects.
OrtStatus* OrtSessionOptionsAppendExecutionProviderEx_DML(
_In_ OrtSessionOptions* options,
_In_ IDMLDevice* dml_device,
_In_ ID3D12CommandQueue* cmd_queue
);
See Also
DMLCreateDevice function
ID3D12Device::CreateCommandQueue method
Direct3D 12 programming guide
The DirectML execution provider currently supports ONNX opset 9 (ONNX v1.4). Evaluating models which require a higher opset version is not supported, and may produce unexpected results.
The DirectML execution provider does not support the use of memory pattern optimizations or parallel execution in onnxruntime. When supplying session options during InferenceSession creation, these options must be disabled or an error will be returned.
If using the onnxruntime C API, you must call the DisableMemPattern and SetSessionExecutionMode functions to set the options required by the DirectML execution provider.
See onnxruntime\include\onnxruntime\core\session\onnxruntime_c_api.h.
OrtStatus*(ORT_API_CALL* DisableMemPattern)(_Inout_ OrtSessionOptions* options)NO_EXCEPTION;
OrtStatus*(ORT_API_CALL* SetSessionExecutionMode)(_Inout_ OrtSessionOptions* options, ExecutionMode execution_mode)NO_EXCEPTION;
If creating the onnxruntime InferenceSession object directly, you must set the appropriate fields on the onnxruntime::SessionOptions struct. Specifically, execution_mode must be set to ExecutionMode::ORT_SEQUENTIAL, and enable_mem_pattern must be false.
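The two required settings can be sketched as follows. This is an illustration only: the ExecutionMode enum and SessionOptions struct below are hypothetical stand-ins that mirror just the two fields named above so the sketch compiles on its own, and their default values are assumptions. In real code, the onnxruntime definitions would be used instead.

```cpp
#include <cassert>

// Stand-in for the onnxruntime enum (assumption: only these two values matter here).
enum ExecutionMode { ORT_SEQUENTIAL, ORT_PARALLEL };

// Stand-in for onnxruntime::SessionOptions, reduced to the two relevant fields.
struct SessionOptions {
    ExecutionMode execution_mode = ORT_SEQUENTIAL;  // assumed default
    bool enable_mem_pattern = true;                 // assumed default
};

// Hypothetical check: true when the options satisfy both DirectML requirements.
inline bool IsDmlCompatible(const SessionOptions& options) {
    return options.execution_mode == ExecutionMode::ORT_SEQUENTIAL &&
           !options.enable_mem_pattern;
}

// Hypothetical helper: applies the settings the DirectML execution provider requires.
inline void ConfigureForDml(SessionOptions& options) {
    options.execution_mode = ExecutionMode::ORT_SEQUENTIAL;  // no parallel execution
    options.enable_mem_pattern = false;                      // no memory pattern optimization
    assert(IsDmlCompatible(options));
}
```

Calling a helper like ConfigureForDml before creating the session keeps both requirements in one place rather than scattered across call sites.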
Additionally, as the DirectML execution provider does not support parallel execution, it does not support multi-threaded calls to Run on the same inference session. That is, if an inference session uses the DirectML execution provider, only one thread may call Run at a time. Multiple threads are permitted to call Run simultaneously if they operate on different inference session objects.
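One way an application might enforce the one-thread-at-a-time rule is to route every Run call on a given session through a mutex. The following is a minimal sketch, not something onnxruntime provides: SerializedSession is a hypothetical wrapper that assumes only that the wrapped type has a Run method, and FakeSession is a stand-in included purely so the sketch is self-contained.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical wrapper: serializes calls to Run on one session object.
// All threads must call Run through the same wrapper for this to help.
template <class Session>
class SerializedSession {
public:
    explicit SerializedSession(Session& session) : session_(session) {}

    template <class... Args>
    auto Run(Args&&... args)
        -> decltype(std::declval<Session&>().Run(std::forward<Args>(args)...)) {
        std::lock_guard<std::mutex> lock(mutex_);  // one caller at a time
        return session_.Run(std::forward<Args>(args)...);
    }

private:
    Session& session_;
    std::mutex mutex_;
};

// Stand-in session used only for illustration; not an onnxruntime type.
struct FakeSession {
    int calls = 0;
    int Run(int x) { ++calls; return 2 * x; }
};

// Usage example with the stand-in session.
inline void Example() {
    FakeSession session;
    SerializedSession<FakeSession> guarded(session);
    assert(guarded.Run(21) == 42);
}
```

Each session gets its own wrapper and therefore its own mutex, so threads working on different sessions do not block each other, which matches the rule described above.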
A complete sample of onnxruntime using the DirectML execution provider can be found under samples/c_cxx/fns_candy_style_transfer.