This simple code demonstrates autovectorization for a vector addition routine.
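At its core the program does something like the following (a minimal sketch for orientation; the actual source in this repository may differ in names and details):

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s N\n", argv[0]);
            return 1;
        }
        long n = atol(argv[1]);                /* vector length from the command line */
        double *a = malloc(n * sizeof(double));
        double *b = malloc(n * sizeof(double));
        double *c = malloc(n * sizeof(double));
        for (long i = 0; i < n; i++) {         /* the loop the compiler should vectorize */
            a[i] = i * 1.0;
            b[i] = i * 2.0;
            c[i] = a[i] + b[i];
        }
        printf("c[%ld] = %f\n", n - 1, c[n - 1]);  /* keep the result live */
        free(a); free(b); free(c);
        return 0;
    }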
Ensure that the Intel compiler is available in your environment:
setpkgs -a intel_cluster_studio_compiler
To build, run:
make
To delete the executable:
make clean
The program accepts one command-line argument, the length of the vector. For example:
./vec_add_icc 1000
A bash script is included that times execution and passes in a vector length of 1000000000 by default. Just type:
bash run.sh
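The script is presumably little more than a wrapper around time (a sketch, not necessarily the exact contents of run.sh):

    #!/bin/bash
    # Time the vector addition; default length is 1000000000.
    N=${1:-1000000000}
    time ./vec_add_icc "$N"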
-
Build the binary with and without vectorization enabled. Building with level-three optimization will ensure autovectorization is enabled. Verify that the for loop is vectorized by reading the vectorization report generated by the compiler. Time the execution of both versions of the binary for a vector length of 1000000000.
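For example, assuming the source file is named vec_add.c (a guess; adjust to match the repository), the vectorized build might look like one of these, depending on the compiler version:

    icc -O3 -vec-report=2 vec_add.c -o vec_add_icc                            # older icc: report printed to the console
    icc -O3 -qopt-report=2 -qopt-report-phase=vec vec_add.c -o vec_add_icc    # newer icc: report written to a .optrpt file
    time ./vec_add_icc 1000000000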
-
Compare the walltime with versions of the binary that are built without vectorization enabled. Using optimization level zero will ensure that vectorization is disabled.
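The unvectorized build is the same command at optimization level zero (again assuming the file names above):

    icc -O0 vec_add.c -o vec_add_icc_novec
    time ./vec_add_icc_novec 1000000000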
-
Convert the for loop to a while loop. Does it still vectorize?
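That is, rewrite the kernel along these lines (a sketch, relative to the example source above):

    long i = 0;
    while (i < n) {          /* same trip count, expressed as a while loop */
        a[i] = i * 1.0;
        b[i] = i * 2.0;
        c[i] = a[i] + b[i];
        i++;
    }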
-
Move the actual vector addition step (c = a + b) into a second for loop. With level-three optimization enabled, how does the autovectorization behavior compare (read the report carefully)? How does the walltime compare?
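In other words, split the kernel roughly as follows (a sketch, relative to the example source above):

    for (long i = 0; i < n; i++) {   /* first loop: initialization only */
        a[i] = i * 1.0;
        b[i] = i * 2.0;
    }
    for (long i = 0; i < n; i++)     /* second loop: the addition on its own */
        c[i] = a[i] + b[i];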
-
Revert to the original code. Now start the loop at i=1 (ignore the first element for the vector addition) and add this line to the end of the loop body:
c[i] = c[i-1] + 82.3;
What changes? Why?
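The modified loop would look something like this (a sketch, consistent with the example source above):

    for (long i = 1; i < n; i++) {   /* skip the first element */
        a[i] = i * 1.0;
        b[i] = i * 2.0;
        c[i] = a[i] + b[i];
        c[i] = c[i-1] + 82.3;        /* each iteration reads the previous one's result: a loop-carried dependence */
    }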