Torch-7 FFI binding and C warpper for Intel MKLDNN library, and MKLDNN library is designed by Intel to accelerate Deep Neural Network(DNN) computation on CPU, in particular Intel® Xeon processors (HSW, BDW, Xeon Phi), which is competitive to cuDNN library.
Modules are API compatible with their nn
equivalents. Fully unit-tested against nn
implementations.
Conversion between nn
and mklnn
is available through mklnn.convert
function.
- Install torch with this instructions
- MKLML library auto-download and setting(see this link)
- Install mkltorch (luarocks install mkltorch)
- Install mklnn (luarocks install mklnn)
Convnet Benchmark performance from this link
- distro: The Out-Of-Box Torch is installed from distro with openblas
- distro+mklnn: mklml version
- distro+cudnn: cudnn version
Inference | distro | distro+mklnn | distro+cudnn |
---|---|---|---|
alexnet | :-----------------: | :---------------: | :---------------: |
overfeat | :-----------------: | :---------------: | :---------------: |
vgg_a | :-----------------: | :---------------: | :---------------: |
googlenet | :-----------------: | :---------------: | :---------------: |
require 'mklnn' -- will automatically require mkltorch
The following OP are supported in this package:
-- All inputs have to be 3D or 4D(batch-mode)
mklnn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, [dW = 1], [dH = 1], [padW = 0], [padH = 0], [groups = 1])
mklnn.SpatialMaxPooling(kW, kH, dW, dH, padW, padH)
mklnn.SpatialAveragePooling(kW, kH, dW, dH, padW, padH)
mklnn.SpatialBatchNormalization(nFeature, eps, momentum, affine)
mklnn.SpatialCrossMapLRN(size, alpha, beta, k)
mklnn.Concat(dimension)
mklnn.ReLU([inplace=false])
-- Two layout conversion op
mklnn.U2I() -- convert the user layout(default NCHW) to internal layout(required by MKLDNN library)
mklnn.I2U() -- convert the internel layout to user layout
-- Op in plan, and this list will increase
mklnn.SpatialFullConvolution()
Conversion is done by mklnn.convert
function which takes a network and backend arguments('mkl' or 'nn') and goes over
network modules recursively substituting equivalents.
require 'nn'
require 'mklnn'
net = nn.Sequential()
net:add(nn.SpatialConvolution(3,96,11,11,4,4))
net:add(nn.ReLU())
mklnet = mklnn.convert(net, 'mkl')
print(mklnet)
will result in:
nn.Sequential {
[input -> (1) -> (2) -> (3) -> (4) -> output]
(1): mklnn.U2I
(2): mklnn.SpatialConvolution(3 -> 96, 11x11, 4,4)
(3): mklnn.ReLU
(4): mklnn.I2U
}
Get another demo from this link to perform an Convnet benchmark test.