The OpenCL C 1.2 language provides an expressive variant of the C language with which to program heterogeneous architectures. There is already a significant body of code in the wild written in the OpenCL C language - both in open source and proprietary software. This document explains how the OpenCL C language is mapped onto an implementation of the Vulkan standard for high-performance graphics and compute.
The following subjects are covered:
- Which SPIR-V features are used.
- How the Vulkan API makes use of the Vulkan variant of SPIR-V produced.
- How OpenCL C language constructs are mapped down onto Vulkan's variant of SPIR-V.
- Restrictions on the OpenCL C language as is to be consumed by a Vulkan implementation.
The SPIR-V as produced from the OpenCL C language can make use of the following additional extensions:
- SPV_KHR_variable_pointers - to enable the support of more expressive pointers that the OpenCL C language can make use of.
The SPIR-V as produced from the OpenCL C language can make use of the following capabilities:
Shader
as we are targeting the OpenCL C language at a Vulkan implementation.- If pointers are used such that the operations in the OpenCL C require the use
of the SPV_KHR_variable_pointers extension:
- From the SPV_KHR_variable_pointers extension,
VariablePointer
.
- From the SPV_KHR_variable_pointers extension,
A Vulkan implementation that is to consume the SPIR-V produced from the OpenCL C language must conform to the following the rules:
- If the short/ushort types are used in the OpenCL C:
- The
shaderInt16
field ofVkPhysicalDeviceFeatures
must be set to true.
- The
- If images are used in the OpenCL C:
- The
shaderStorageImageReadWithoutFormat
field ofVkPhysicalDeviceFeatures
must be set to true. - The
shaderStorageImageWriteWithoutFormat
field ofVkPhysicalDeviceFeatures
must be set to true.
- The
- If pointers are used such that the operations in the OpenCL C require the use
of the SPV_KHR_variable_pointers extension:
- A call to
vkCreateDevice()
where theppEnabledExtensionNames
field ofVkDeviceCreateInfo
contains the extension string "VK_KHR_variable_pointers" must succeed.
- A call to
OpenCL C kernel argument types are mapped to Vulkan descriptor types in the following way:
- If the argument to the kernel is a read only image, the matching Vulkan
descriptor set type is
VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE
. - If the argument to the kernel is a write only image, the matching Vulkan
descriptor set type is
VK_DESCRIPTOR_TYPE_STORAGE_IMAGE
. - If the argument to the kernel is a sampler, the matching Vulkan
descriptor set type is
VK_DESCRIPTOR_TYPE_SAMPLER
. - If the argument to the kernel is a constant or global pointer type, the
matching Vulkan descriptor set type is
VK_DESCRIPTOR_TYPE_STORAGE_BUFFER
. - If the argument to the kernel is a plain-old-data type, the matching Vulkan
descriptor set type is
VK_DESCRIPTOR_TYPE_STORAGE_BUFFER
.
Note: If -cluster-pod-kernel-args
is used, then all plain-old-data kernel
arguments are collected into a single structure to be passed in to the compute
shader as a single storage buffer resource.
Some OpenCL C language features that are not natively expressible in Vulkan's variant of SPIR-V, require a subtle mapping to how Vulkan SPIR-V represents the corresponding functionality.
An additional preprocessor macro VULKAN
is set, to allow developers to guard
OpenCL C functionality based on whether the Vulkan API is being targeted or not.
This value is set to 100, to match Vulkan version 1.0.
OpenCL C language kernels take the form:
void kernel foo(global int* a, global float* b, uint c);
SPIR-V tracks OpenCL C language kernels using OpEntryPoint
opcodes that denote
the entry-points where an API interacts with a compute kernel.
Vulkan's variant of SPIR-V requires that the entry-points be void
return
functions, and that they take no arguments.
To pass data into Vulkan SPIR-V shaders, OpVariable
s are declared outside of
the functions, and decorated with DescriptorSet
and Binding
decorations, to
denote that the shaders can interact with their data.
The default way to map an OpenCL C language kernel to a Vulkan SPIR-V compute shader is as follows:
- Each kernel is assigned a corresponding descriptor set, such that the first kernel has descriptor set 0, and each subsequent kernel is an increment of 1 from the previous.
- Each argument within each kernel is assigned a binding in that kernel's
corresponding
DescriptorSet
such that theBinding
is equal to the position of the argument in the kernel's argument list, where the first argument for the kernel has aBinding
of 0. - If the argument to the kernel is a
global
orconstant
pointer, it is placed into a SPIR-VOpTypeStruct
that is decorated withBufferBlock
, and anOpVariable
of this structure type is created and decorated with the correspondingDescriptorSet
andBinding
, using theUniform
storage class. - If the argument to the kernel is a plain-old-data type, it is placed into a
SPIR-V
OpTypeStruct
that is decorated withBufferBlock
, and anOpVariable
of this structure type is created and decorated with the correspondingDescriptorSet
andBinding
, using theUniform
storage class. - If the argument to the kernel is an image or sampler, an
OpVariable
of theOpTypeImage
orOpTypeSampler
type is created and decorated with the correspondingDescriptorSet
andBinding
, using theUniformConstant
storage class.
The compiler can tell you what descriptor set and bindings are used for kernel
arguemnts. Use option -descriptormap
to name a file that should contain
the mapping information.
Example:
clspv foo.cl -descriptormap=foomap.csv
The descriptor map is a text file with comma-separated values.
Consider this example:
// First kernel in the translation unit, and no sampler map is used.
void kernel foo(global int* a, float f, global float* b, uint c) {...}
It generates the following descriptor map:
kernel,foo,arg,a,argOrdinal,0,descriptorSet,0,binding,0,offset,0
kernel,foo,arg,f,argOrdinal,1,descriptorSet,0,binding,1,offset,0
kernel,foo,arg,b,argOrdinal,2,descriptorSet,0,binding,2,offset,0
kernel,foo,arg,c,argOrdinal,3,descriptorSet,0,binding,3,offset,0
For kernel arguments, the fields are:
kernel
to indicate a kernel argument- kernel name
arg
to indicate a kernel argument- argument name
argOrdinal
to indicate a kernel argument ordinal position field- the argument's 0-based position in the kernel's parameter list
descriptorSet
- the DescriptorSet value
binding
- the Binding value
offset
- The byte offset inside the storage buffer where you should write the argument value. This will always be zero, unless you cluster plain-old-data kernel arguments. (See below.)
If a sampler map is used, then samplers use descriptor set 0 and kernel descriptor
set numbers start at 1. For example, if the sampler map file is mysamplermap
containing:
CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST,
CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_LINEAR
Then compiling with:
clspv foo.cl -samplermap=mysamplermap -descriptormap=mydescriptormap
Then mydescriptormap
will contain:
sampler,18,samplerExpr,"CLK_ADDRESS_CLAMP_TO_EDGE|CLK_FILTER_NEAREST|CLK_NORMALIZED_COORDS_FALSE",descriptorSet,0,binding,0
sampler,35,samplerExpr,"CLK_ADDRESS_CLAMP_TO_EDGE|CLK_FILTER_LINEAR|CLK_NORMALIZED_COORDS_TRUE",descriptorSet,0,binding,1
kernel,foo,arg,a,argOrdinal,0,descriptorSet,1,binding,0,offset,0
kernel,foo,arg,f,argOrdinal,1,descriptorSet,1,binding,1,offset,0
kernel,foo,arg,b,argOrdinal,2,descriptorSet,1,binding,2,offset,0
kernel,foo,arg,c,argOrdinal,3,descriptorSet,1,binding,3,offset,0
Descriptors can be scarce. So the compiler also has an option
-cluster-pod-kernel-args
which can be used to reduce the number of descriptors.
When the option is used:
- All plain-old-data (POD) kernel arguments are collected into a single struct and passed into the compute shader via a single storage buffer resource.
- The binding numbers are assigned as previously, except:
- Binding numbers for non-POD arguments are assigned as if there were no POD arguments.
- The binding number for the struct containing the POD arguments is one more than the highest non-POD argument.
For example:
// First kernel in the translation unit, and no sampler map is used.
void kernel foo(global int* a, float f, global float* b, uint c);
In the default case, the bindings are:
a
is mapped to a storage buffer with descriptor set 0, binding 0f
is mapped to a storage buffer with descriptor set 0, binding 1b
is mapped to a storage buffer with descriptor set 0, binding 2c
is mapped to a storage buffer with descriptor set 0, binding 3
If -cluster-pod-kernel-args
is used:
a
is mapped to a storage buffer with descriptor set 0, binding 0b
is mapped to a storage buffer with descriptor set 0, binding 1f
andc
are POD arguments, so they are mapped to the first and second members of a struct, and that struct is mapped to a storage buffer with descriptor set 0 and binding 2
That is, compiling as follows:
clspv foo.cl -cluster-pod-kernel-args -descriptormap=myclusteredmap
will produce the following in myclusteredmap
:
kernel,foo,arg,a,argOrdinal,0,descriptorSet,0,binding,0,offset,0
kernel,foo,arg,b,argOrdinal,2,descriptorSet,0,binding,1,offset,0
kernel,foo,arg,f,argOrdinal,1,descriptorSet,0,binding,2,offset,0
kernel,foo,arg,c,argOrdinal,3,descriptorSet,0,binding,2,offset,4
If foo
were the second kernel in the translation unit, then its arguments
would use descriptor set 1.
Compiling with the same sampler map from before:
clspv foo.cl -cluster-pod-kernel-args -descriptormap=myclusteredmap -samplermap=mysamplermap
produces the following descriptor map:
sampler,18,samplerExpr,"CLK_ADDRESS_CLAMP_TO_EDGE|CLK_FILTER_NEAREST|CLK_NORMALIZED_COORDS_FALSE",descriptorSet,0,binding,0
sampler,35,samplerExpr,"CLK_ADDRESS_CLAMP_TO_EDGE|CLK_FILTER_LINEAR|CLK_NORMALIZED_COORDS_TRUE",descriptorSet,0,binding,1
kernel,foo,arg,a,argOrdinal,0,descriptorSet,1,binding,0,offset,0
kernel,foo,arg,b,argOrdinal,2,descriptorSet,1,binding,1,offset,0
kernel,foo,arg,f,argOrdinal,1,descriptorSet,1,binding,2,offset,0
kernel,foo,arg,c,argOrdinal,3,descriptorSet,1,binding,2,offset,4
TODO(dneto): Give an example using images.
The OpenCL C language allows the work-group size to be set just before executing
the kernel on the device, at clEnqueueNDRangeKernel()
time.
Vulkan requires that the work group size be specified no later than when the
VkPipeline
is created, which in OpenCL terms corresponds to when the
cl_kernel
is created.
To allow for the maximum flexibility to developers who are used to being able to
specify the work group size in the host API and not in the device-side kernel
language, we use specialization constants to allow for setting the work group
size at VkPipeline
creation time.
The work-group size, as exposed in the Vulkan SPIR-V produced by compiling OpenCL C source files, is set such that:
- The x dimension of the work-group size is stored in a specialization
constant that is decorated with the
SpecId
of 0, whose value defaults to 1. - The y dimension of the work-group size is stored in a specialization
constant that is decorated with the
SpecId
of 1, whose value defaults to 1. - The z dimension of the work-group size is stored in a specialization
constant that is decorated with the
SpecId
of 2, whose value defaults to 1.
The __attribute__((reqd_work_group_size(X, Y, Z)))
kernel attribute forces the
SPIR-V generated for that kernel to ignore any attempt to specialize the
work-group size via the specialization constants approach explained above.
The following list of attributes are ignored, and thus have no functional impact on the produced SPIR-V:
__attribute__((work_group_size_hint(X, Y, Z)))
__attribute__((packed))
__attribute__ ((endian(host)))
__attribute__ ((endian(device)))
__attribute__((vec_type_hint(<typen>)))
Signed integer types are mapped down onto their unsigned equivalents in SPIR-V as produced from OpenCL C.
Signed integer modulus (%
) operations, where either argument to the modulus is
a negative integer, will result in an undefined result.
OpenCL C language built-in functions are mapped, where possible, onto their GLSL
4.5 built-in equivalents.
For example, the OpenCL C language built-in function tan()
is mapped onto
GLSL's built-in function tan()
.
The OpenCL C built-in sign()
function does not differentiate between a signed
and unsigned 0.0 input value, nor does it return 0.0 if the input value is a
NaN.
The OpenCL C built-in mad24()
and mul24()
functions do not perform their
operations using 24-bit integers. Instead, they use 32-bit integers, and thus
have no performance-improving characteristics over normal 32-bit integer
arithmetic.
The OpenCL C work-item functions map to Vulkan SPIR-V as follows:
get_work_dim()
will always return 3.get_global_size()
is implemented by multiplying the result fromget_local_size()
by the result fromget_num_groups()
.get_global_id()
is mapped to a SPIR-V variable decorated withGlobalInvocationId
.get_local_size()
is mapped to a SPIR-V variable decorated withWorkgroupSize
.get_local_id()
is mapped to a SPIR-V variable decorated withLocalInvocationId
.get_num_groups()
is mapped to a SPIR-V variable decorated withNumWorkgroups
.get_group_id()
is mapped to a SPIR-V variable decorated withWorkgroupId
.get_global_offset()
will always return 0.
Some OpenCL C language features that have no expressible equivalents in Vulkan's variant of SPIR-V are restricted.
OpenCL C language kernels must not be called from other kernels.
Pointer types in the local
address space must not be used as kernel
arguments.
Pointers of type half
must not be used as kernel arguments.
Booleans are an abstract type - they have no known compile-time size.
Using a boolean type as the argument to the sizeof()
operator will result in
an undefined value.
The boolean type must not be used to form global
, or constant
variables,
nor be used within a struct or union type in the global
, or constant
address
spaces.
The char
, char2
, char3
, uchar
, uchar2
, and uchar3
types
must not be used.
The double
, double2
, double3
, double4
, long
, long2
, long3
,
long4
, ulong
, ulong2
, ulong3
, and ulong4
types must not be used.
The image2d_array_t
, image1d_t
, image1d_buffer_t
, and image1d_array_t
types must not be used.
Any sampler_t
's must be passed in via a kernel argument, or the sampler
must be in the sampler map (see the -samplemap command line argument).
The event_t
type must not be used.
Pointers are an abstract type - they have no known compile-time size.
Using a pointer type as the argument to the sizeof()
operator will result in
an undefined value.
Pointer-to-integer casts must not be used.
Integer-to-pointer casts must not be used.
Pointers in the local
address space must not be used as kernel arguments.
Pointers must not be compared for equality or inequality.
Vectors of 8 and 16 elements must not be used.
Recursively defined struct types must not be used.
Since pointers have no known compile-time size, the pointer-sized types
size_t
, ptrdiff_t
, uintptr_t
, and intptr_t
do not represent types that
are the same size as a pointer.
Instead, those types are mapped to 32-bit integer types.
For any OpenCL C language built-in functions that are mapped onto their GLSL 4.5 built-in equivalents, the precision requirements of the OpenCL C language built-ins are not necessarily honoured.
The atomic_xchg()
built-in function that takes a floating-point argument
must not be used.
The step()
, and smoothstep()
built-in functions must not be used.
The convert_<type>_rte()
, convert_<type>_rtz()
, convert_<type>_rtp()
,
convert_<type>_rtn()
, convert_<type>_sat()
, convert_<type>_sat_rte()
,
convert_<type>_sat_rtz()
, convert_<type>_sat_rtp()
, and
convert_<type>_sat_rtn()
built-in functions must not be used.
The acospi()
, asinpi()
, atanpi()
, atan2pi()
, cbrt()
, copysign()
,
cospi()
, erf()
, erfc()
, expm1()
, fdim()
, fmod()
, hypot()
,
ilogb()
, lgamma()
, lgamma_r()
, log1p()
, logb()
, maxmag()
,
minmag()
, nan()
, nextafter()
, pown()
, remainder()
, remquo()
,
rint()
, rootn()
, sincos()
, sinpi()
, tanpi()
, and tgamma()
built-in
functions must not be used.
The abs_diff()
, add_sat()
, hadd()
, mad_hi()
, mad_sat()
, mul_hi()
,
rhadd()
, rotate()
, sub_sat()
and upsample()
built-in functions
must not be used.
The islessgreater()
, isfinite()
, isnormal()
, isordered()
,
isunordered()
, bitselect()
, and select()
built-in functions must not
be used.
The vload<size>()
, vstore<size>()
, vstore_half_rtp()
, vstore_half_rtn()
,
vstore_half<size>_rtp()
, vstore_half<size>_rtn()
, vstorea_half<size>_rtp()
, and vstorea_half<size>_rtn()
built-in functions must not be used.
The vload_half()
, vload_half<size>()
, vstore_half()
, vstore_half_rte()
,
vstore_half_rtz()
, vstore_half<size>()
, vstore_half<size>_rte()
,
vstore_half<size>_rtz()
, vloada_half<size>()
, vstorea_half<size>()
,
vstorea_half<size>_rte()
, and vstorea_half<size>_rtz()
built-in functions
are only allowed to use the global
and constant
address spaces.
The vstore_half_rte()
, vstore_half_rtz()
, vstore_half<size>_rte()
,
vstore_half<size>_rtz()
, vstorea_half<size>_rte()
, and
vstorea_half<size>_rtz()
built-in functions are not guaranteed to round the
result correctly if the destination address was not declared as a half* on the
kernel entry point.
The async_work_group_copy()
, async_work_group_strided_copy()
,
wait_group_events()
, and prefetch()
built-in functions must not be used.
The shuffle()
and shuffle2()
built-in functions must not be used.
The printf()
built-in function must not be used.
The get_image_channel_data_type()
, get_image_channel_order()
,
read_imagei()
, read_imageui()
, write_imagei()
and write_imageui()
built-in functions must not be used.
The versions of the read_imagef()
built-in functions that use integer vector
types to specify which coordinate to sample must not be used.