
GPUhack example compilation

Peter Willendrup edited this page Nov 13, 2020 · 14 revisions

Compile an instrument for GPU execution

Nota bene: Since the instructions below were written, compilation has also become possible via

mcrun -c --openacc

The rest of the page remains relevant for debugging purposes.

Generate the code

hpclogin2(username) $ cp $MCSTAS/examples/BNL_H8.instr .
hpclogin2(username) $ mcstas BNL_H8.instr
Warning: 'V_sample' is an obsolete component (not maintained).
CFLAGS=

(Optionally use mcsplit.py to separate libs and generated code)

hpclogin2(username) $ ls
BNL_H8.c  BNL_H8.instr	
hpclogin2(username) $ mcsplit.py BNL_H8.c
hpclogin2(username) $ ls
BNL_H8.c  BNL_H8.instr	_BNL_H8.c  _mccode-r.c	_mccode-r.h  _mccode_main.c  _mcstas-r.c  _mcstas-r.h

--> _BNL_H8.c works "the same" as BNL_H8.c,
-->  ...by including the other .c/.h snippets

Compile using pgcc, targeting GPU

hpclogin2(username) $ pgcc -ta=tesla,managed -Minfo=accel -DOPENACC BNL_H8.c -o BNL_H8_gpu.out

Meaning of the various switches:

  • -ta=tesla: Sets the target architecture to "tesla", i.e. a graphics card. The multicore target is also available for CPU use
  • cc70: "Compute capability 7.0". Not used in the command above, since it need not be set explicitly
  • managed: Uses "CUDA managed memory". We should try to eliminate this requirement
  • -Minfo=accel: Prints extra information on the acceleration achieved
  • -DOPENACC: Enables the special McStas defines that support the PGI compiler / GPU
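
As an illustration of what a -D switch such as -DOPENACC does, the minimal sketch below guards an OpenACC pragma behind the macro, so that the same source builds with or without GPU support. The names GPU_ROUTINE, square and built_for_gpu are hypothetical and are not taken from the McStas runtime; the generated code may organise its guards differently.

```c
/* Hypothetical guard: expands to an OpenACC pragma only when the
   source is compiled with -DOPENACC, otherwise to nothing. */
#ifdef OPENACC
#define GPU_ROUTINE _Pragma("acc routine seq")
#else
#define GPU_ROUTINE
#endif

/* With -DOPENACC and an OpenACC compiler, this function is also
   built as a sequential device routine. */
GPU_ROUTINE
double square(double x)
{
    return x * x;
}

/* Reports which variant was compiled: 1 with -DOPENACC, 0 without. */
int built_for_gpu(void)
{
#ifdef OPENACC
    return 1;
#else
    return 0;
#endif
}
```

Compiled without -DOPENACC, the macro expands to nothing and any C compiler treats the file as ordinary C, which makes it easy to compare CPU and GPU builds of the same code.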

Output is (currently, as of 20200113):

PGC-W-0221-Redefinition of symbol MC_RAND_MAX (BNL_H8.c: 495)
PGC-W-0118-Function Table_File_List_store does not contain a return statement (BNL_H8.c: 5986)
noprintf:
   1143, Generating acc routine seq
         Generating Tesla code
str_comp:
   1148, Generating acc routine seq
         Generating Tesla code
str_len:
   1158, Generating acc routine seq
         Generating Tesla code
mcget_ncount:
   3299, Generating acc routine seq
         Generating Tesla code
mcget_run_num:
   3306, Generating acc routine seq
         Generating Tesla code
coords_set:
   3536, Generating acc routine seq
         Generating Tesla code
coords_get:
   3548, Generating acc routine seq
         Generating Tesla code
coords_add:
   3558, Generating acc routine seq
         Generating Tesla code
coords_sub:
   3571, Generating acc routine seq
         Generating Tesla code
coords_neg:
   3584, Generating acc routine seq
         Generating Tesla code
coords_scale:
   3595, Generating acc routine seq
         Generating Tesla code
coords_sp:
   3606, Generating acc routine seq
         Generating Tesla code
coords_xp:
   3615, Generating acc routine seq
         Generating Tesla code
coords_len:
   3626, Generating acc routine seq
         Generating Tesla code
coords_mirror:
   3632, Generating acc routine seq
         Generating Tesla code
coords_print:
   3650, Generating acc routine seq
         Generating Tesla code
rot_set_rotation:
   3698, Generating acc routine seq
         Generating Tesla code
rot_test_identity:
   3734, Generating acc routine seq
         Generating Tesla code
rot_mul:
   3746, Generating acc routine seq
         Generating Tesla code
rot_copy:
   3764, Generating acc routine seq
         Generating Tesla code
rot_transpose:
   3776, Generating acc routine seq
         Generating Tesla code
rot_apply:
   3793, Generating acc routine seq
         Generating Tesla code
vec_prod_func:
   3823, Generating acc routine seq
         Generating Tesla code
scalar_prod:
   3835, Generating acc routine seq
         Generating Tesla code
norm_func:
   3840, Generating acc routine seq
         Generating Tesla code
mccoordschange:
   3950, Generating acc routine seq
         Generating Tesla code
mccoordschange_polarisation:
   3982, Generating acc routine seq
         Generating Tesla code
normal_vec:
   4000, Generating acc routine seq
         Generating Tesla code
solve_2nd_order:
   4060, Generating acc routine seq
         Generating Tesla code
_randvec_target_circle:
   4109, Generating acc routine seq
         Generating Tesla code
_randvec_target_rect_angular:
   4176, Generating acc routine seq
         Generating Tesla code
_randvec_target_rect_real:
   4254, Generating acc routine seq
         Generating Tesla code
kiss_srandom:
   4527, Generating acc routine seq
         Generating Tesla code
kiss_random:
   4538, Generating acc routine seq
         Generating Tesla code
_hash:
   4565, Generating acc routine seq
         Generating Tesla code
fast_kiss:
   4574, Generating acc routine seq
         Generating Tesla code
_randnorm2:
   4643, Generating acc routine seq
         Generating Tesla code
_gaussian_double:
   4653, Generating acc routine seq
         Generating Tesla code
_randtriangle:
   4667, Generating acc routine seq
         Generating Tesla code
_uniform_double:
   4674, Generating acc routine seq
         Generating Tesla code
_rand01:
   4683, Generating acc routine seq
         Generating Tesla code
_randpm1:
   4692, Generating acc routine seq
         Generating Tesla code
_rand0max:
   4701, Generating acc routine seq
         Generating Tesla code
_randminmax:
   4709, Generating acc routine seq
         Generating Tesla code
mcsetstate:
   5295, Generating acc routine seq
         Generating Tesla code
mcgetstate:
   5325, Generating acc routine seq
         Generating Tesla code
mcgenstate:
   5347, Generating acc routine seq
         Generating Tesla code
mcrestore_neutron:
   5386, Generating acc routine seq
         Generating Tesla code
inside_rectangle:
   5411, Generating acc routine seq
         Generating Tesla code
box_intersect:
   5429, Generating acc routine seq
         Generating Tesla code
cylinder_intersect:
   5545, Generating acc routine seq
         Generating Tesla code
sphere_intersect:
   5601, Generating acc routine seq
         Generating Tesla code
plane_intersect:
   5625, Generating acc routine seq
         Generating Tesla code
Table_Index:
   6527, Generating acc routine seq
         Generating Tesla code
Table_Value:
   6596, Generating acc routine seq
         Generating Tesla code
Table_Value2d:
   6676, Generating acc routine seq
         Generating Tesla code
Table_Interp1d:
   7111, Generating acc routine seq
         Generating Tesla code
Table_Interp1d_nearest:
   7128, Generating acc routine seq
         Generating Tesla code
Table_Interp2d:
   7147, Generating acc routine seq
         Generating Tesla code
StdReflecFunc:
   7244, Generating acc routine seq
         Generating Tesla code
TableReflecFunc:
   7300, Generating acc routine seq
         Generating Tesla code
init:
  10107, Generating update device(_Origin_var)
  10108, Generating update device(_Source_var)
  10109, Generating update device(_D0_Source_var)
  10110, Generating update device(_SC1_var)
  10111, Generating update device(_D1_SC1_Out_var)
  10112, Generating update device(_As1_var)
  10113, Generating update device(_As2_var)
  10114, Generating update device(_As3_var)
  10115, Generating update device(_As4_var)
  10116, Generating update device(_D2_A4_var)
  10117, Generating update device(_Mono_Cradle_var)
  10118, Generating update device(_PG1Xtal_var)
  10119, Generating update device(_Mono_Out_var)
  10120, Generating update device(_D4_SC2_In_var)
  10121, Generating update device(_SC2_var)
  10122, Generating update device(_D5_SC2_Out_var)
  10123, Generating update device(_Sample_Cradle_var)
  10124, Generating update device(_Sample_Out_var)
  10125, Generating update device(_Sample_var)
  10126, Generating update device(_D7_SC3_In_var)
  10127, Generating update device(_SC3_var)
  10128, Generating update device(_D8_SC3_Out_var)
  10129, Generating update device(_Ana_Cradle_var)
  10130, Generating update device(_PG2Xtal_var)
  10131, Generating update device(_Ana_Out_var)
  10132, Generating update device(_D10_SC4_In_var)
  10133, Generating update device(_SC4_var)
  10134, Generating update device(_He3H_var)
  10138, Generating update device(_instrument_var)
class_Progress_bar_trace:
  10175, Generating acc routine seq
         Generating Tesla code
class_Source_simple_trace:
  10243, Generating acc routine seq
         Generating Tesla code
class_PSD_monitor_trace:
  10326, Generating acc routine seq
         Generating Tesla code
class_Guide_trace:
  10384, Generating acc routine seq
         Generating Tesla code
class_Slit_trace:
  10523, Generating acc routine seq
         Generating Tesla code
class_Monochromator_flat_trace:
  10552, Generating acc routine seq
         Generating Tesla code
class_V_sample_trace:
  10723, Generating acc routine seq
         Generating Tesla code
raytrace:
  10933, Generating acc routine seq
         Generating Tesla code
raytrace_all:
  11475, Generating Tesla code
      11476, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11475, Local memory used for particleN
raytrace_all_funnel:
  11520, Generating Tesla code
      11521, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11520, Generating implicit copyout(particles[:innerloop])
  11534, Generating Tesla code
      11535, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11534, Local memory used for _particle_save
         Generating implicit copy(.I0001,.I0004)
  11554, Generating Tesla code
      11555, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11554, Local memory used for _particle_save
         Generating implicit copy(.I0001,.I0004)
  11574, Generating Tesla code
      11575, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11574, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11594, Generating Tesla code
      11595, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11594, Local memory used for _particle_save
         Generating implicit copy(.I0001,.I0004)
  11614, Generating Tesla code
      11615, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11614, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11634, Generating Tesla code
      11635, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11634, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11654, Generating Tesla code
      11655, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11654, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11674, Generating Tesla code
      11675, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11674, Local memory used for _particle_save
         Generating implicit copy(.I0001,.I0004)
  11694, Generating Tesla code
      11695, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11694, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11714, Generating Tesla code
      11715, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11714, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11734, Generating Tesla code
      11735, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11734, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11751, Generating Tesla code
      11752, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11751, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11771, Generating Tesla code
      11772, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11771, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11788, Generating Tesla code
      11789, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11788, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11808, Generating Tesla code
      11809, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11808, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11828, Generating Tesla code
      11829, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11828, Local memory used for _particle_save
         Generating implicit copy(.I0001,.I0004)
  11848, Generating Tesla code
      11849, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11848, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11865, Generating Tesla code
      11866, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11865, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11882, Generating Tesla code
      11883, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11882, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11902, Generating Tesla code
      11903, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11902, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11922, Generating Tesla code
      11923, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11922, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11942, Generating Tesla code
      11943, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11942, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11962, Generating Tesla code
      11963, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11962, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11979, Generating Tesla code
      11980, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11979, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  11999, Generating Tesla code
      12000, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  11999, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  12016, Generating Tesla code
      12017, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  12016, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  12036, Generating Tesla code
      12037, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  12036, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  12056, Generating Tesla code
      12057, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
  12056, Local memory used for _particle_save
         Generating implicit copy(.I0004,.I0001)
  12074, Generating Tesla code
      12075, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
finally:
  12319, Generating update self(_Origin_var)
  12320, Generating update self(_Source_var)
  12321, Generating update self(_D0_Source_var)
  12322, Generating update self(_SC1_var)
  12323, Generating update self(_D1_SC1_Out_var)
  12324, Generating update self(_As1_var)
  12325, Generating update self(_As2_var)
  12326, Generating update self(_As3_var)
  12327, Generating update self(_As4_var)
  12328, Generating update self(_D2_A4_var)
  12329, Generating update self(_Mono_Cradle_var)
  12330, Generating update self(_PG1Xtal_var)
  12331, Generating update self(_Mono_Out_var)
  12332, Generating update self(_D4_SC2_In_var)
  12333, Generating update self(_SC2_var)
  12334, Generating update self(_D5_SC2_Out_var)
  12335, Generating update self(_Sample_Cradle_var)
  12336, Generating update self(_Sample_Out_var)
  12337, Generating update self(_Sample_var)
  12338, Generating update self(_D7_SC3_In_var)
  12339, Generating update self(_SC3_var)
  12340, Generating update self(_D8_SC3_Out_var)
  12341, Generating update self(_Ana_Cradle_var)
  12342, Generating update self(_PG2Xtal_var)
  12343, Generating update self(_Ana_Out_var)
  12344, Generating update self(_D10_SC4_In_var)
  12345, Generating update self(_SC4_var)
  12346, Generating update self(_He3H_var)
  12348, Generating update self(_instrument_var)
PGC/x86-64 Linux 19.4-0: compilation completed with warnings
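
To relate the messages above to source constructs, here is a minimal sketch, assuming a much simpler particle than the real one. It is not the McStas generated code: particle_t, propagate and trace_all are hypothetical names. A helper marked "routine seq" is the kind of function reported as "Generating acc routine seq", and a parallel loop is the kind of construct reported with "#pragma acc loop gang, vector(128)", where the schedule appears to be chosen by the compiler rather than written in the source. A per-iteration struct whose address is passed to a routine is a plausible origin of the "Local memory used for ..." lines.

```c
#include <stddef.h>

/* Hypothetical particle state; the real McStas struct has more fields. */
typedef struct {
    double x;   /* position */
    double v;   /* velocity */
} particle_t;

/* Helper called from inside the parallel loop. With OpenACC enabled,
   "routine seq" requests a sequential device version of it. */
#pragma acc routine seq
static void propagate(particle_t *p, double dt)
{
    p->x += p->v * dt;
}

/* One independent particle per iteration; returns the summed positions.
   Compilers without OpenACC support ignore the pragmas and run serially. */
double trace_all(size_t n, double dt)
{
    double sum = 0.0;
    #pragma acc parallel loop reduction(+:sum)
    for (size_t i = 0; i < n; i++) {
        particle_t p = { 0.0, (double)i };  /* private per-iteration copy */
        propagate(&p, dt);
        sum += p.x;
    }
    return sum;
}
```

Building this file with the pgcc switches shown earlier on this page should produce -Minfo=accel messages of the same kind for propagate and trace_all, although the exact wording depends on the compiler version.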

Next topic: GPUhack example run

(Optional compile using mpicc, targeting CPU)

Please compile and run such binaries via e.g. the generic interactive nodes, accessible via

linuxsh

Compile:

host(username) $ mpicc -DUSE_MPI -lmpi -lm -DRNG_ALG=2 BNL_H8.c -o BNL_H8_mpi.out

Run:

host(username) $ mpirun -np 10 ./BNL_H8_mpi.out -dtest_mpi_10 lambda=2.36