Skip to content

Running Kernels in Isolation

James Price edited this page Jul 11, 2015 · 7 revisions

As well as providing an interface via the OpenCL 1.2 runtime API, Oclgrind also has a lightweight method of running a single kernel in isolation from any OpenCL host code. This interface is provided via the oclgrind-kernel command, which takes as input a simulator file describing how to run the kernel. Most of the flags that are accepted by oclgrind are also accepted by oclgrind-kernel (use --help to see a full list of available options). By default, oclgrind-kernel will not output anything, other than any errors it encounters when running the kernel. Individual memory objects can be flagged for outputting after kernel execution, or the entire contents of global memory can be dumped with the -g flag.

The format of the simulator file should follow this structure:

kernel_file
kernel_name
global size
local size

argument 0 definition

argument 1 definition

...

argument N definition
  • kernel_file: This is the filename containing the kernel you wish to run in either OpenCL C (.cl) or SPIR (.bc) form.
  • kernel_name: This is the name of the kernel you wish to run.
  • global size: This is a tuple specifying the global size (NDRange) you wish to run.
  • local size: This is a tuple specifying the local size (work-group size) you wish to use.

Comments can be supplied using the # character, which will cause everything on the line following it to be ignored.

Argument Definitions

All argument definitions begin with an argument header, which is enclosed in angle brackets (<>) and must be contained on a single line. An argument header consists of one or more whitespace delimited tokens that describe how the argument data should be generated, which are described below.

  • size=N - Specifies that the argument data should be N bytes (N is always interpreted as a decimal integer).

  • hex - Specifies that argument data is interpreted as hexadecimal values.

  • null - Passes a null-pointer for a buffer argument. Not valid with any other argument descriptor.

  • noinit - Leave the contents of a buffer uninitialised.

  • fill=value - Generates the argument data and fills it with the given value.

  • range=start:inc:end - Generates the argument data with a values ranging from start to end, in increments of inc. The number of values in this range must exactly match the argument size (with respect to the argument data type).

  • dump - Outputs the contents of the memory object when the kernel execution is complete.

  • type - Overrides the type used when interpreting argument data and parameters to the fill and range generators. Can be one of char,uchar,short,ushort,int,uint,long,ulong,float,double.

The size parameter is required for all arguments, unless null is used. For private memory arguments passed by-value, the size parameter must match the size of the argument data-type.

For global memory buffers and arguments passed by value, the argument data must be specified following the argument header, unless noinit or the fill and range data generators are used. The argument data consists of a set of whitespace delimited values and interpreted using the type of the kernel argument, unless overridden to a different type in the argument header. The number of values specified must exactly match the argument size (with respect to the argument data type).

For local memory argument, the size parameter is the only valid parameter, and no argument data should follow the argument header.

There is currently no way to define image and sampler arguments.

Vector Addition Example

Given the following integer vector addition kernel:

kernel void vecadd(global int *a, global int *b, global int *c, int size)
{
  int i = get_global_id(0);
  if (i < size)
  {
    c[i] = a[i] + b[i];
  }
}

We can write a simulator file with the following contents:

vecadd.cl # File containing OpenCL program (can be OpenCL C or SPIR)
vecadd    # Name of kernel to run
8 1 1     # NDRange
4 1 1     # Work-group size

# First argument 'global int *a'
<size=32 range=0:1:7>

# Second argument 'global int *b'
<size=32 range=8:1:15>

# Third argument 'global int *c'
<size=32 fill=0 dump>

# Fourth argument 'int size'
<size=4>
8

Here we've initialised the first buffer (a) to contain the values 0-7, the second buffer (b) to contain the values 8-15, and the output buffer (c) to all zeros. The scalar size argument is set to 8, and we've specified that the contents of c should be output when the kernel has finished.

When we run this with oclgrind-kernel, we get the following output:

$ oclgrind-kernel vecadd.sim 

Argument 'c': 32 bytes
  c[0] = 8
  c[1] = 10
  c[2] = 12
  c[3] = 14
  c[4] = 16
  c[5] = 18
  c[6] = 20
  c[7] = 22

Here we can see the contents of the c argument, and can verify that the buffer contains the correct result.