
Parallel Blocks


In this section we explain how to verify parallel algorithms by creating parallel blocks in PVL. First, we give an example of a simple method that uses a parallel block. Then, we discuss a more complex example with a barrier inside the parallel block. A barrier is used to synchronize the threads inside a parallel block.

Simple Parallel Blocks

Parallel Block without Barrier

This example shows a simple method that adds two arrays in parallel and stores the result in another array:

1 context_everywhere a != null && b != null && c != null;
2 context_everywhere a.length == size && b.length == size && c.length == size;
3 context (\forall* int i; i >= 0 &&  i < size; Perm(a[i], 1\2));
4 context (\forall* int i; i >= 0 &&  i < size; Perm(b[i], 1\2));
5 context (\forall* int i; i >= 0 &&  i < size; Perm(c[i], 1));
6 ensures (\forall int i; i >= 0 &&  i < size; c[i] == a[i] + b[i]);
7 void Add(int[] a, int[] b, int[] c, int size){
8
9    par threads (int tid = 0 .. size)
10    context Perm(a[tid], 1\2) ** Perm(b[tid], 1\2) ** Perm(c[tid], 1);
11    ensures c[tid] == a[tid] + b[tid];
12   {
13      c[tid] = a[tid] + b[tid];
14   }
15 }

In this method there is a parallel block (lines 9-14) named "threads". The keyword "par" is used to define a parallel block, followed by an arbitrary name for that block. Moreover, we define the number of threads in the parallel block, as well as a name for the thread identifier. In this example, we have "size" threads in the range from 0 to "size-1", and "tid" is used as the thread identifier to refer to each thread.

In addition to the specification of the method (lines 1-6), we add thread-level specifications to the parallel block (lines 10-11). The precondition of the method states that we have read permission over all locations in arrays "a" and "b" and write permission over all locations in array "c" (lines 3-5). In the parallel block, we specify that each thread ("tid") has read permission to its own location in arrays "a" and "b" and write permission to its own location in array "c" (line 10). After termination of the parallel block, as postcondition, (1) we have the same permissions back (line 10) and (2) each location in array "c" holds the sum of the two corresponding locations in arrays "a" and "b" (line 11). From the postcondition of the parallel block, we can derive the postcondition of the method, using universal quantifiers over all locations in the arrays (lines 3-6).
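
To illustrate how a caller can rely on this contract, below is a minimal caller sketch (not part of the original example). The method name "useAdd" is hypothetical, and we assume here that PVL's "new int[size]" allocation yields an array of length "size" with write permission on all of its elements.

requires size >= 0;
void useAdd(int size){
   // fresh arrays: we assume "new" provides write permission on every element
   int[] a = new int[size];
   int[] b = new int[size];
   int[] c = new int[size];
   // the call consumes the permissions required by Add's precondition and
   // returns them afterwards, together with Add's functional postcondition
   Add(a, b, c, size);
   assert (\forall int i; i >= 0 && i < size; c[i] == a[i] + b[i]);
}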

Parallel Block with Barrier

Next we show an example of a parallel block which uses a barrier to synchronize threads:

1  context_everywhere array != null && array.length == size;
2  requires (\forall* int i; i >= 0 &&  i < size; Perm(array[i], 1\2));
3  ensures (\forall* int i; i >= 0 &&  i < size; Perm(array[i], 1));
4  ensures (\forall int i; i >= 0 &&  i < size; (i != size-1 ==> array[i] == \old(array[i+1])) && 
5                                               (i == size-1 ==> array[i] == \old(array[0])) );
6  void leftRotation(int[] array, int size){
7
8    par threads (int tid = 0 .. size)
9     requires tid != size-1 ==> Perm(array[tid+1], 1\2);
10    requires tid == size-1 ==> Perm(array[0], 1\2);
11    ensures Perm(array[tid], 1);
12    ensures tid != size-1 ==> array[tid] == \old(array[tid+1]);
13    ensures tid == size-1 ==> array[tid] == \old(array[0]);
14   {
15      int temp;
16      if(tid != size-1){
17          temp = array[tid+1];
18      }else{
19          temp = array[0];
20      }
21
22      barrier(threads)
23      {
24        requires tid != size-1 ==> Perm(array[tid+1], 1\2);
25        requires tid == size-1 ==> Perm(array[0], 1\2);
26        ensures Perm(array[tid], 1);
27      }
28
29      array[tid] = temp;
30   }
31 }

This example illustrates a method named "leftRotation" that rotates the elements of an array to the left. In this example, we again have "size" threads in the range from 0 to "size-1", and "tid" is used as the thread identifier. Inside the parallel block, each thread ("tid") stores the value of its right neighbor in a temporary variable ("temp"), except thread "size-1", which stores the first element of the array (lines 15-20). Then each thread synchronizes at the barrier (line 22). A barrier is defined in PVL with the keyword "barrier", which takes the name of the enclosing parallel block as its argument ("threads" in this example). After the barrier, each thread writes the value it read into its own location at index "tid" in the array (line 29).

To verify this method in VerCors, we annotate the barrier, in addition to the method and the parallel block. As precondition of the method, we have read permission over all locations in the array (line 2). At the beginning of the parallel block, each thread reads from its right neighbor, except thread "size-1", which reads from location 0 (lines 16-20). Therefore, we specify the corresponding read permissions as precondition of the parallel block (lines 9-10). Since after the barrier each thread ("tid") writes into its own location at index "tid", we change the permissions at the barrier such that each thread obtains write permission to its own location (lines 24-26). When a thread reaches the barrier, it has to fulfill the barrier preconditions, and afterwards it may assume the barrier postconditions. Moreover, the barrier postconditions must follow from the barrier preconditions.

As postcondition of the parallel block, (1) each thread has write permission to its own location (line 11), which follows from the postcondition of the barrier, and (2) the elements have indeed been shifted to the left (lines 12-13). From the postcondition of the parallel block, we can establish the postcondition of the method (lines 3-5).

Complicated Parallel Blocks

Nested Parallel Blocks

In the previous examples, we defined a parallel block of threads to work on one-dimensional arrays. It is also possible to define two-dimensional thread layouts to work on two-dimensional arrays. As an example we define a two-dimensional thread layout to transpose a matrix in parallel:

1  context_everywhere inp != null && out != null;
2  context_everywhere inp.length == size && out.length == size; 
3  context_everywhere (\forall int i; 0 <= i && i < size; inp[i].length == size && out[i].length == size);
4  context (\forall* int i; 0 <= i && i < size; 
5           (\forall* int j; 0 <= j && j < size; Perm(inp[i][j], read)));
6  context (\forall* int i; 0 <= i && i < size; 
7           (\forall* int j; 0 <= j && j < size; Perm(out[i][j], write)));
8  ensures (\forall int i; 0 <= i && i < size; 
9           (\forall int j; 0 <= j && j < size; out[i][j] == inp[j][i]));
10 void transpose(int[][] inp, int[][] out, int size){
11
12   par threadX (int tidX = 0 .. size)
13    context (\forall* int i; i >= 0 && i < size; Perm(inp[i][tidX], read));
14    context (\forall* int i; i >= 0 && i < size; Perm(out[tidX][i], write));
15    ensures (\forall int i; i >= 0 && i < size; out[tidX][i] == inp[i][tidX]);
16   {
17     par threadY (int tidY = 0 .. size)
18      context Perm(inp[tidY][tidX], read);
19      context Perm(out[tidX][tidY], write);
20      ensures out[tidX][tidY] == inp[tidY][tidX];
21     {
22       out[tidX][tidY] = inp[tidY][tidX];
23     }
24   }
25 }

As we can see, nested parallel blocks allow us to define thread layouts in multiple dimensions. We can simplify the above example syntactically into a single parallel block:

1  context_everywhere inp != null && out != null;
2  context_everywhere inp.length == size && out.length == size; 
3  context_everywhere (\forall int i; 0 <= i && i < size; inp[i].length == size && out[i].length == size);
4  context (\forall* int i; 0 <= i && i < size; 
5           (\forall* int j; 0 <= j && j < size; Perm(inp[i][j], read)));
6  context (\forall* int i; 0 <= i && i < size; 
7           (\forall* int j; 0 <= j && j < size; Perm(out[i][j], write)));
8  ensures (\forall int i; 0 <= i && i < size; 
9           (\forall int j; 0 <= j && j < size; out[i][j] == inp[j][i]));
10 void transpose(int[][] inp, int[][] out, int size){
11
12   par threadXY (int tidX = 0 .. size, int tidY = 0 .. size)
13    context Perm(inp[tidY][tidX], read);
14    context Perm(out[tidX][tidY], write);
15    ensures out[tidX][tidY] == inp[tidY][tidX];
16   {
17     out[tidX][tidY] = inp[tidY][tidX]; 
18   } 
19 }

Simultaneous Parallel Blocks

There may be scenarios in which we have multiple parallel blocks whose threads work on disjoint memory locations. In this case, we can define multiple parallel blocks that run simultaneously. This means that the threads in each parallel block run independently of the threads in the other parallel blocks. Below is an example of such a scenario:

1  context_everywhere a != null && b != null && c != null;
2  context_everywhere a.length == size && b.length == size && c.length == size; 
3  context (\forall* int i; i >= 0 &&  i < size; Perm(a[i], write));
4  context (\forall* int i; i >= 0 &&  i < size; Perm(b[i], write));
5  context (\forall* int i; i >= 0 &&  i < size; Perm(c[i], write));
6  ensures (\forall int i; i >= 0 &&  i < size; a[i] == \old(a[i]) + 1 && b[i] == \old(b[i]) + 1 && 
7                                               c[i] == \old(c[i]) + 1);
8  void simPar(int[] a, int[] b, int[] c, int size){
9
10   par thread1 (int tid1 = 0 .. size)
11    context Perm(a[tid1], write);
12    ensures a[tid1] == \old(a[tid1]) + 1;
13   {
14     a[tid1] = a[tid1] + 1; 
15   } and
16   par thread2 (int tid2 = 0 .. size)
17    context Perm(b[tid2], write);
18    ensures b[tid2] == \old(b[tid2]) + 1;
19   {
20     b[tid2] = b[tid2] + 1;
21   } and
22   par thread3 (int tid3 = 0 .. size)
23    context Perm(c[tid3], write);
24    ensures c[tid3] == \old(c[tid3]) + 1;
25   {
26     c[tid3] = c[tid3] + 1;
27   }
28 }

As we can see, we use the keyword "and" between the parallel blocks (lines 15 and 21) to define simultaneous parallel blocks. This construct can also be used when we want to run blocks of instructions simultaneously without spawning a range of threads. In that case, we do not declare thread identifiers in the parallel blocks, but the par blocks still run simultaneously. Below is an example of this situation, where "a", "b" and "c" are objects that contain a field "val":

1  context Perm(a.val, write) ** Perm(b.val, write) ** Perm(c.val, write);
2  ensures a.val == \old(a.val) + 1 && b.val == \old(b.val) + 1 && c.val == \old(c.val) + 1;                                            
3  void IncPar(Object a, Object b, Object c){
4
5    par 
6     context Perm(a.val, write);
7     ensures a.val == \old(a.val) + 1;
8    {
9      a.val = a.val + 1; 
10   } and
11   par 
12    context Perm(b.val, write);
13    ensures b.val == \old(b.val) + 1;
14   {
15     b.val = b.val + 1;
16   } and
17   par 
18    context Perm(c.val, write);
19    ensures c.val == \old(c.val) + 1;
20   {
21     c.val = c.val + 1;
22   }
23 }

In the above example, we increase the "val" field of each object by one in parallel. As we can see, the parallel blocks have no names and no thread identifiers in this case.
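
Note that this example assumes that the objects "a", "b" and "c" are instances of a class that declares the field "val". A minimal class declaration for such a sketch could look as follows (the class name "Object" is chosen only to match the parameter types above; it is not a built-in PVL type):

class Object {
   int val;
}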
