Skip to content

Running Kernels in Isolation

James Price edited this page Apr 22, 2014 · 7 revisions

As well as providing an interface via the OpenCL 1.2 runtime API, Oclgrind also has a lightweight method of running a single kernel in isolation from any OpenCL host code. This interface is provided via the oclgrind-kernel command, which takes as input a simulator file describing how to run the kernel. Most of the flags that are accepted by oclgrind are also accepted by oclgrind-kernel (use --help to see a full list of available options). By default, oclgrind-kernel will not output anything, other than any errors it encounters when running the kernel. To dump the contents of global memory after the kernel has finished, use the -g flag.

The format of the simulator file should follow this structure:

kernel_file
kernel_name
global size
local size

argument 0 definition

argument 1 definition

...

argument N definition
  • kernel_file: This is the filename containing the kernel you wish to run in either OpenCL C (.cl) or SPIR (.bc) form.
  • kernel_name: This is the name of the kernel you wish to run.
  • global size: This is a tuple specifying the global size (NDRange) you wish to run.
  • local size: This is a tuple specifying the local size (work-group size) you wish to use.

Comments can be supplied using the # character, which will cause everything on the line following it to be ignored.

Argument Definitions

All argument definitions begin with an argument header, which is enclosed in angle brackets (<>) and must be contained on a single line. An argument header consists of one or more whitespace delimited tokens that describe how the argument data should be generated, which are described below.

  • size=N - Specifies that the argument data should be N bytes (N is always interpreted as a decimal integer).

  • hex - Specifies that argument data is interpreted as hexadecimal values.

  • fill=value - Generates the argument data and fills it with the given value.

  • range=start:inc:end - Generates the argument data with a values ranging from start to end, in increments of inc. The number of values in this range must exactly match the argument size (with respect to the argument data type).

  • type - Overrides the type used when interpreting argument data and parameters to the fill and range generators. Can be one of char,uchar,short,ushort,int,uint,long,ulong,float,double.

The size parameter is required for all arguments. For private memory arguments passed by-value, the size parameter must match the size of the argument data-type.

For private and global memory arguments, the argument data must be specified following the argument header, unless the fill or range data generators are used. The argument data consists of a set of whitespace delimited values and interpreted using the type of the kernel argument, unless overridden to a different type in the argument header. The number of values specified must exactly match the argument size (with respect to the argument data type).

For local memory argument, the size parameter is the only valid parameter, and no argument data should follow the argument header.

There is currently no way to define image and sampler arguments.

Vector Addition Example

Given the following integer vector addition kernel:

kernel void vecadd(global int *a, global int *b, global int *c, int size)
{
  int i = get_global_id(0);
  if (i < size)
  {
    c[i] = a[i] + b[i];
  }
}

We can write a simulator file with the following contents:

vecadd.cl
vecadd
8 1 1
4 1 1

<size=32 range=0:1:7>
<size=32 range=8:1:15>
<size=32 fill=0>

<size=4>
8

Here we've initialised the first buffer (a) to contain the values 0-7, the second buffer (b) to contain the values 8-15, and the output buffer (c) to all zeros. The scalar size argument is set to 8.

When we run this with oclgrind-kernel, we get the following output:

$ oclgrind-kernel -g vecadd.sim
Global Memory:

   1000000000000: 00 00 00 00
   1000000000004: 01 00 00 00
   1000000000008: 02 00 00 00
   100000000000C: 03 00 00 00
   1000000000010: 04 00 00 00
   1000000000014: 05 00 00 00
   1000000000018: 06 00 00 00
   100000000001C: 07 00 00 00
   2000000000000: 08 00 00 00
   2000000000004: 09 00 00 00
   2000000000008: 0A 00 00 00
   200000000000C: 0B 00 00 00
   2000000000010: 0C 00 00 00
   2000000000014: 0D 00 00 00
   2000000000018: 0E 00 00 00
   200000000001C: 0F 00 00 00
   3000000000000: 08 00 00 00
   3000000000004: 0A 00 00 00
   3000000000008: 0C 00 00 00
   300000000000C: 0E 00 00 00
   3000000000010: 10 00 00 00
   3000000000014: 12 00 00 00
   3000000000018: 14 00 00 00
   300000000001C: 16 00 00 00

Here we can see each of the three buffers (notice that different buffers can be identified by the most significant byte of the address), and can verify that the output buffer contains the correct result.

Clone this wiki locally