This document describes which routines are supported in CLBlast. For other information about CLBlast, see the main README.
Full API documentation is available in a separate API documentation file.
The different data-types supported by the library are:
- S: Single-precision 32-bit floating-point (`float`).
- D: Double-precision 64-bit floating-point (`double`).
- C: Complex single-precision 2x32-bit floating-point (`std::complex<float>`).
- Z: Complex double-precision 2x64-bit floating-point (`std::complex<double>`).
- H: Half-precision 16-bit floating-point (`cl_half`). See section 'Half precision' below for more information.
CLBlast supports almost all the Netlib BLAS routines plus a couple of extra non-BLAS routines. The supported BLAS routines are marked with '✔' in the following tables. Routines marked with '-' do not exist: they are not part of BLAS at all.
Level-1 | S | D | C | Z | H |
---|---|---|---|---|---|
xSWAP | ✔ | ✔ | ✔ | ✔ | ✔ |
xSCAL | ✔ | ✔ | ✔ | ✔ | ✔ |
xCOPY | ✔ | ✔ | ✔ | ✔ | ✔ |
xAXPY | ✔ | ✔ | ✔ | ✔ | ✔ |
xDOT | ✔ | ✔ | - | - | ✔ |
xDOTU | - | - | ✔ | ✔ | - |
xDOTC | - | - | ✔ | ✔ | - |
xNRM2 | ✔ | ✔ | ✔ | ✔ | ✔ |
xASUM | ✔ | ✔ | ✔ | ✔ | ✔ |
IxAMAX | ✔ | ✔ | ✔ | ✔ | ✔ |
Level-2 | S | D | C | Z | H |
---|---|---|---|---|---|
xGEMV | ✔ | ✔ | ✔ | ✔ | ✔ |
xGBMV | ✔ | ✔ | ✔ | ✔ | ✔ |
xHEMV | - | - | ✔ | ✔ | - |
xHBMV | - | - | ✔ | ✔ | - |
xHPMV | - | - | ✔ | ✔ | - |
xSYMV | ✔ | ✔ | - | - | ✔ |
xSBMV | ✔ | ✔ | - | - | ✔ |
xSPMV | ✔ | ✔ | - | - | ✔ |
xTRMV | ✔ | ✔ | ✔ | ✔ | ✔ |
xTBMV | ✔ | ✔ | ✔ | ✔ | ✔ |
xTPMV | ✔ | ✔ | ✔ | ✔ | ✔ |
xGER | ✔ | ✔ | - | - | ✔ |
xGERU | - | - | ✔ | ✔ | - |
xGERC | - | - | ✔ | ✔ | - |
xHER | - | - | ✔ | ✔ | - |
xHPR | - | - | ✔ | ✔ | - |
xHER2 | - | - | ✔ | ✔ | - |
xHPR2 | - | - | ✔ | ✔ | - |
xSYR | ✔ | ✔ | - | - | ✔ |
xSPR | ✔ | ✔ | - | - | ✔ |
xSYR2 | ✔ | ✔ | - | - | ✔ |
xSPR2 | ✔ | ✔ | - | - | ✔ |
xTRSV | ✔ | ✔ | ✔ | ✔ | |
Level-3 | S | D | C | Z | H |
---|---|---|---|---|---|
xGEMM | ✔ | ✔ | ✔ | ✔ | ✔ |
xSYMM | ✔ | ✔ | ✔ | ✔ | ✔ |
xHEMM | - | - | ✔ | ✔ | - |
xSYRK | ✔ | ✔ | ✔ | ✔ | ✔ |
xHERK | - | - | ✔ | ✔ | - |
xSYR2K | ✔ | ✔ | ✔ | ✔ | ✔ |
xHER2K | - | - | ✔ | ✔ | - |
xTRMM | ✔ | ✔ | ✔ | ✔ | ✔ |
xTRSM | ✔ | ✔ | ✔ | ✔ | |
Batched versions of some BLAS routines are also available. They process multiple smaller computations in a single call for better performance:
Batched | S | D | C | Z | H |
---|---|---|---|---|---|
xAXPYBATCHED | ✔ | ✔ | ✔ | ✔ | ✔ |
xGEMMBATCHED | ✔ | ✔ | ✔ | ✔ | ✔ |
xGEMMSTRIDEDBATCHED | ✔ | ✔ | ✔ | ✔ | ✔ |
In addition, some extra non-BLAS routines are also supported by CLBlast, classified as level-X. They are experimental and should be used with care:
Level-X | S | D | C | Z | H |
---|---|---|---|---|---|
xSUM | ✔ | ✔ | ✔ | ✔ | ✔ |
IxAMIN | ✔ | ✔ | ✔ | ✔ | ✔ |
IxMAX | ✔ | ✔ | ✔ | ✔ | ✔ |
IxMIN | ✔ | ✔ | ✔ | ✔ | ✔ |
xHAD | ✔ | ✔ | ✔ | ✔ | ✔ |
xOMATCOPY | ✔ | ✔ | ✔ | ✔ | ✔ |
xIM2COL | ✔ | ✔ | ✔ | ✔ | ✔ |
Some less commonly used BLAS routines are not yet supported by CLBlast. They are xROTG, xROTMG, xROT, xROTM, xTBSV, and xTPSV.
The half-precision fp16 format is a 16-bit floating-point data-type. Some OpenCL devices support the `cl_khr_fp16` extension, which reduces storage and bandwidth requirements by a factor of 2 compared to single-precision floating-point. If the hardware also accelerates arithmetic on half-precision data-types, this can greatly improve the compute performance of, for example, level-3 routines such as GEMM. Devices that can benefit from this include Intel GPUs, ARM Mali GPUs, and NVIDIA's latest Pascal GPUs, among others. Half-precision is of particular interest to the deep-learning community, in which convolutional neural networks can be processed much faster at a minor loss of accuracy.
Since there is no half-precision data-type in C or C++, OpenCL provides the `cl_half` type for use on the host. Unfortunately, this type is internally a 16-bit integer, so computations on the host using this data-type should be avoided. For convenience, CLBlast provides the `clblast_half.h` header (C99 and C++ compatible), which defines the `half` type as a short-hand for `cl_half` along with the following basic functions:
- `half FloatToHalf(const float value)`: Converts a 32-bit floating-point value to a 16-bit floating-point value.
- `float HalfToFloat(const half value)`: Converts a 16-bit floating-point value to a 32-bit floating-point value.
The samples/haxpy.c example shows how to use these convenience functions when calling the half-precision BLAS routine HAXPY.