Note: the routine conj_grad includes three implementations of the sparse matrix-vector multiply. The default implementation is not loop unrolled; the two alternates are unrolled to a depth of 2 and to a depth of 8. Please experiment with these to find the fastest for your particular architecture. If reporting timing results, any of these three may be used without penalty.

Performance examples: On the sp2-66MHz-WN on 16 nodes, the non-unrolled version of the multiply is slightly (perhaps 5%) faster than the unrolled-by-2 version. On the Cray T3D the reverse is true: the unrolled-by-2 version is some 10% faster. The unrolled-by-8 version is significantly faster on the Cray T3D, making the code 1.5 times faster overall.
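For readers unfamiliar with the three variants, the following is a minimal C sketch of what non-unrolled, unrolled-by-2, and unrolled-by-8 sparse matrix-vector multiplies can look like. It is not the code from conj_grad: it assumes a compressed-sparse-row layout, and the function names and the array names (rowstr, colidx, a, p, q) are chosen here for illustration only. The small self-check uses a made-up test pattern, not benchmark data.

```c
#include <assert.h>

/* Non-unrolled q = A*p for a CSR matrix with n rows (illustrative layout). */
void spmv_plain(int n, const int *rowstr, const int *colidx,
                const double *a, const double *p, double *q)
{
    for (int j = 0; j < n; j++) {
        double sum = 0.0;
        for (int k = rowstr[j]; k < rowstr[j + 1]; k++)
            sum += a[k] * p[colidx[k]];
        q[j] = sum;
    }
}

/* Inner loop unrolled to depth 2, with two independent partial sums. */
void spmv_unroll2(int n, const int *rowstr, const int *colidx,
                  const double *a, const double *p, double *q)
{
    for (int j = 0; j < n; j++) {
        int k = rowstr[j], hi = rowstr[j + 1];
        double sum1 = 0.0, sum2 = 0.0;
        if ((hi - k) % 2 == 1) {          /* peel one term if the count is odd */
            sum1 = a[k] * p[colidx[k]];
            k++;
        }
        for (; k < hi; k += 2) {
            sum1 += a[k]     * p[colidx[k]];
            sum2 += a[k + 1] * p[colidx[k + 1]];
        }
        q[j] = sum1 + sum2;
    }
}

/* Inner loop unrolled to depth 8; leftover terms are handled first. */
void spmv_unroll8(int n, const int *rowstr, const int *colidx,
                  const double *a, const double *p, double *q)
{
    for (int j = 0; j < n; j++) {
        int k = rowstr[j], hi = rowstr[j + 1];
        int rem = (hi - k) % 8;
        double sum = 0.0;
        for (int r = 0; r < rem; r++, k++)   /* remainder loop */
            sum += a[k] * p[colidx[k]];
        for (; k < hi; k += 8) {
            sum += a[k]     * p[colidx[k]]
                 + a[k + 1] * p[colidx[k + 1]]
                 + a[k + 2] * p[colidx[k + 2]]
                 + a[k + 3] * p[colidx[k + 3]]
                 + a[k + 4] * p[colidx[k + 4]]
                 + a[k + 5] * p[colidx[k + 5]]
                 + a[k + 6] * p[colidx[k + 6]]
                 + a[k + 7] * p[colidx[k + 7]];
        }
        q[j] = sum;
    }
}

/* Self-check on a tiny hypothetical pattern: rows with 10, 3, and 0 entries. */
void spmv_self_check(void)
{
    const int rowstr[4] = {0, 10, 13, 13};
    const int colidx[13] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 4, 7};
    const double a[13] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 3, 4};
    const double p[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    double q0[3], q2[3], q8[3];

    spmv_plain(3, rowstr, colidx, a, p, q0);
    spmv_unroll2(3, rowstr, colidx, a, p, q2);
    spmv_unroll8(3, rowstr, colidx, a, p, q8);

    /* Values are small integers, so the sums are exact in double precision. */
    for (int j = 0; j < 3; j++) {
        assert(q0[j] == q2[j]);
        assert(q0[j] == q8[j]);
    }
}
```

Calling spmv_self_check() from a small main confirms that the three variants agree on this pattern. With non-integer data the unrolled versions may differ from the plain one in the last bits, because they add the products in a different order. Which variant runs fastest depends on the compiler and the machine, which is why timing all three is worthwhile.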