Benchmark of fgemv for Givaro::Integer over the ring Givaro::ZRing on a multicore server

ZHG2017 edited this page Mar 26, 2019 · 23 revisions

Note: the -p option selects the parallelization strategy (0 for sequential, 1 for <Recursive,Thread>, 2 for <Row,Thread>, 3 for <Row,Grain>).
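Every result line on this page has the same shape, so it can be turned into a record for further analysis. The sketch below is one possible parser. The flag meanings for -b, -m, -k, -t and -g are inferred from the headings on this page (bit size, matrix dimensions, thread count, grain size) and are assumptions, not taken from the benchmark's source. Flags whose meaning the page does not state (-q, -i, -s) are kept under their own names.

```python
import re

# Assumed flag meanings, inferred from the headings on this page.
FLAG_NAMES = {
    "-b": "bit_size",
    "-p": "strategy",
    "-m": "rows",
    "-k": "cols",
    "-t": "threads",
    "-g": "grain_size",
}

# Strategy numbering as given in the note above.
STRATEGIES = {
    0: "sequential",
    1: "<Recursive,Thread>",
    2: "<Row,Thread>",
    3: "<Row,Grain>",
}


def parse_result(line):
    """Parse one 'Time: ... Gflops: ... -flag value ...' line into a dict."""
    head = re.match(r"Time:\s*(\S+)\s+Gflops:\s*(\S+)", line)
    if head is None:
        raise ValueError("not a benchmark result line: %r" % line)
    record = {"time": float(head.group(1)), "gflops": float(head.group(2))}
    # Each option is a single-letter flag followed by an integer value.
    for flag, value in re.findall(r"(-[a-z])\s+(\d+)", line):
        record[FLAG_NAMES.get(flag, flag)] = int(value)
    record["strategy_name"] = STRATEGIES.get(record.get("strategy"), "unknown")
    return record


if __name__ == "__main__":
    sample = ("Time: 0.756837 Gflops: 0.0422812 -q 0 -b 100 -p 3 "
              "-m 4000 -k 4000 -t 8 -i 10 -s 1020440166 -g 64")
    print(parse_result(sample))
```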

Benchmark using OpenMP

OMP_NUM_THREADS=1

Time: 7.58827 Gflops: 0.00421703 -q 0 -b 100 -p 0 -m 4000 -k 4000 -t 1 -i 10 -s 1020440166 -g 64
Time: 13.7061 Gflops: 0.00233473 -q 0 -b 200 -p 0 -m 4000 -k 4000 -t 1 -i 10 -s 1020440166 -g 64
Time: 30.8435 Gflops: 0.00414999 -q 0 -b 100 -p 0 -m 8000 -k 8000 -t 1 -i 10 -s 1020440166 -g 64
Time: 35.8357 Gflops: 0.00357186 -q 0 -b 200 -p 0 -m 8000 -k 8000 -t 1 -i 10 -s 1020440166 -g 64
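The reported Gflops values appear to follow the usual operation count for a matrix-vector product, 2 * m * k, divided by the reported time. The sketch below checks this against the four sequential runs above. The formula is inferred from the numbers on this page, not from the benchmark's source, and the function name is illustrative.

```python
def gflops(rows, cols, seconds):
    """Rate in Gflops, assuming 2 * rows * cols operations per product."""
    return 2.0 * rows * cols / seconds / 1e9


# (rows, cols, reported time in seconds, reported Gflops), sequential runs
sequential_runs = [
    (4000, 4000, 7.58827, 0.00421703),
    (4000, 4000, 13.7061, 0.00233473),
    (8000, 8000, 30.8435, 0.00414999),
    (8000, 8000, 35.8357, 0.00357186),
]

if __name__ == "__main__":
    for rows, cols, seconds, reported in sequential_runs:
        computed = gflops(rows, cols, seconds)
        print(f"{rows}x{cols}: computed {computed:.8f}, reported {reported:.8f}")
```

Under this reading, each counted operation is an arithmetic operation on a Givaro::Integer entry rather than a hardware floating-point operation, which would explain why the rate depends on the bit size.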

OMP_NUM_THREADS=8

4000x4000 and 100 bits

Time: 1.69488 Gflops: 0.0188803 -q 0 -b 100 -p 1 -m 4000 -k 4000 -t 8 -i 10 -s 1020440166 -g 64
Time: 1.56445 Gflops: 0.0204545 -q 0 -b 100 -p 2 -m 4000 -k 4000 -t 8 -i 10 -s 1020440166 -g 64
Time: 0.756837 Gflops: 0.0422812 -q 0 -b 100 -p 3 -m 4000 -k 4000 -t 8 -i 10 -s 1020440166 -g 64

4000x4000 and 200 bits

Time: 2.85925 Gflops: 0.0111918 -q 0 -b 200 -p 1 -m 4000 -k 4000 -t 8 -i 10 -s 1020440166 -g 64
Time: 3.10003 Gflops: 0.0103225 -q 0 -b 200 -p 2 -m 4000 -k 4000 -t 8 -i 10 -s 1020440166 -g 64
Time: 2.05891 Gflops: 0.0155422 -q 0 -b 200 -p 3 -m 4000 -k 4000 -t 8 -i 10 -s 1020440166 -g 64

8000x8000 and 100 bits

Time: 7.8522 Gflops: 0.0163012 -q 0 -b 100 -p 1 -m 8000 -k 8000 -t 8 -i 10 -s 1020440166 -g 64
Time: 7.8062 Gflops: 0.0163972 -q 0 -b 100 -p 2 -m 8000 -k 8000 -t 8 -i 10 -s 1020440166 -g 64
Time: 4.26278 Gflops: 0.0300274 -q 0 -b 100 -p 3 -m 8000 -k 8000 -t 8 -i 10 -s 1020440166 -g 64

8000x8000 and 200 bits

Time: 13.8473 Gflops: 0.0092437 -q 0 -b 200 -p 1 -m 8000 -k 8000 -t 8 -i 10 -s 1020440166 -g 64
Time: 14.5544 Gflops: 0.00879462 -q 0 -b 200 -p 2 -m 8000 -k 8000 -t 8 -i 10 -s 1020440166 -g 64
Time: 8.93516 Gflops: 0.0143254 -q 0 -b 200 -p 3 -m 8000 -k 8000 -t 8 -i 10 -s 1020440166 -g 64

OMP_NUM_THREADS=16

4000x4000 and 100 bits

Time: 1.38343 Gflops: 0.0231309 -q 0 -b 100 -p 1 -m 4000 -k 4000 -t 16 -i 10 -s 1020440166 -g 64
Time: 1.52409 Gflops: 0.0209962 -q 0 -b 100 -p 2 -m 4000 -k 4000 -t 16 -i 10 -s 1020440166 -g 64
Time: 0.625066 Gflops: 0.0511946 -q 0 -b 100 -p 3 -m 4000 -k 4000 -t 16 -i 10 -s 1020440166 -g 64

4000x4000 and 200 bits

Time: 2.8539 Gflops: 0.0112127 -q 0 -b 200 -p 1 -m 4000 -k 4000 -t 16 -i 10 -s 1020440166 -g 64
Time: 2.8737 Gflops: 0.0111355 -q 0 -b 200 -p 2 -m 4000 -k 4000 -t 16 -i 10 -s 1020440166 -g 64
Time: 1.53008 Gflops: 0.020914 -q 0 -b 200 -p 3 -m 4000 -k 4000 -t 16 -i 10 -s 1020440166 -g 64

8000x8000 and 100 bits

Time: 5.96511 Gflops: 0.0214581 -q 0 -b 100 -p 1 -m 8000 -k 8000 -t 16 -i 10 -s 1020440166 -g 64
Time: 6.13564 Gflops: 0.0208617 -q 0 -b 100 -p 2 -m 8000 -k 8000 -t 16 -i 10 -s 1020440166 -g 64
Time: 3.49505 Gflops: 0.0366233 -q 0 -b 100 -p 3 -m 8000 -k 8000 -t 16 -i 10 -s 1020440166 -g 64

8000x8000 and 200 bits

Time: 10.6427 Gflops: 0.012027 -q 0 -b 200 -p 1 -m 8000 -k 8000 -t 16 -i 10 -s 1020440166 -g 64
Time: 12.0617 Gflops: 0.0106121 -q 0 -b 200 -p 2 -m 8000 -k 8000 -t 16 -i 10 -s 1020440166 -g 64
Time: 8.41178 Gflops: 0.0152167 -q 0 -b 200 -p 3 -m 8000 -k 8000 -t 16 -i 10 -s 1020440166 -g 64
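To compare the parallel runs with the sequential baseline, one can compute speedup (sequential time divided by parallel time) and parallel efficiency (speedup divided by thread count). The sketch below does this for strategy 3 (<Row,Grain>), using the times listed above. The dictionary layout and function names are illustrative only.

```python
# Times in seconds, copied from the OpenMP results above.
# Keys are (matrix size, bit size).
sequential = {
    (4000, 100): 7.58827, (4000, 200): 13.7061,
    (8000, 100): 30.8435, (8000, 200): 35.8357,
}

# Strategy -p 3 (<Row,Grain>), keyed by thread count.
row_grain = {
    8: {(4000, 100): 0.756837, (4000, 200): 2.05891,
        (8000, 100): 4.26278, (8000, 200): 8.93516},
    16: {(4000, 100): 0.625066, (4000, 200): 1.53008,
         (8000, 100): 3.49505, (8000, 200): 8.41178},
}


def speedup(t_seq, t_par):
    """Ratio of sequential time to parallel time."""
    return t_seq / t_par


def efficiency(t_seq, t_par, threads):
    """Speedup divided by the number of threads."""
    return speedup(t_seq, t_par) / threads


if __name__ == "__main__":
    for threads, runs in sorted(row_grain.items()):
        for (size, bits), t_par in sorted(runs.items()):
            t_seq = sequential[(size, bits)]
            print(f"{threads:2d} threads, {size}x{size}, {bits} bits: "
                  f"speedup {speedup(t_seq, t_par):.2f}, "
                  f"efficiency {efficiency(t_seq, t_par, threads):.2f}")
```

For 4000x4000 and 100 bits the speedup at 8 threads comes out above 8, which suggests that the sequential and parallel runs differ in more than the thread count. The page does not say why.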

Vary GrainSize from 16 to 256 for 4000x4000 and 100 bits with OMP_NUM_THREADS=16

GrainSize=16

Time: 0.510995 Gflops: 0.062623 -q 0 -b 100 -p 3 -m 4000 -k 4000 -t 16 -i 3 -s 1020440166 -g 16

GrainSize=32

Time: 0.522177 Gflops: 0.0612819 -q 0 -b 100 -p 3 -m 4000 -k 4000 -t 16 -i 3 -s 1020440166 -g 32

GrainSize=64

Time: 0.654195 Gflops: 0.0489151 -q 0 -b 100 -p 3 -m 4000 -k 4000 -t 16 -i 3 -s 1020440166 -g 64

GrainSize=128

Time: 0.952734 Gflops: 0.0335875 -q 0 -b 100 -p 3 -m 4000 -k 4000 -t 16 -i 3 -s 1020440166 -g 128

GrainSize=256

Time: 1.56165 Gflops: 0.0204912 -q 0 -b 100 -p 3 -m 4000 -k 4000 -t 16 -i 3 -s 1020440166 -g 256
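One way to read the sweep above is in terms of how many tasks each grain size produces. The sketch below assumes that the grain size is the number of rows handed to each task, so a matrix with m rows is cut into ceil(m / grain) tasks. That interpretation is an assumption and is not stated on this page. The times are the OpenMP measurements listed above.

```python
import math

ROWS = 4000
THREADS = 16

# GrainSize -> time in seconds, from the OpenMP sweep above.
openmp_sweep = {16: 0.510995, 32: 0.522177, 64: 0.654195,
                128: 0.952734, 256: 1.56165}


def chunk_count(rows, grain):
    """Number of row blocks, assuming each task gets `grain` rows."""
    return math.ceil(rows / grain)


# Grain size with the smallest measured time.
best = min(openmp_sweep, key=openmp_sweep.get)

if __name__ == "__main__":
    for grain, seconds in sorted(openmp_sweep.items()):
        tasks = chunk_count(ROWS, grain)
        print(f"grain {grain:3d}: {tasks:3d} tasks, "
              f"{tasks / THREADS:5.2f} tasks per thread, {seconds:.6f} s")
    print("fastest grain size:", best)
```

Under that assumption, a grain size of 256 leaves only one task per thread, while smaller grains give the scheduler more tasks to balance across threads.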

Benchmark using KAAPI

OMP_NUM_THREADS=16

4000x4000 and 100 bits

Time: 1.89803 Gflops: 0.0168596 -q 0 -b 100 -p 1 -m 4000 -k 4000 -t 16 -i 10 -s 1020440166 -g 64
Time: 1.78987 Gflops: 0.0178784 -q 0 -b 100 -p 2 -m 4000 -k 4000 -t 16 -i 10 -s 1020440166 -g 64
Time: 0.819725 Gflops: 0.0390375 -q 0 -b 100 -p 3 -m 4000 -k 4000 -t 16 -i 10 -s 1020440166 -g 64

4000x4000 and 200 bits

Time: 3.38081 Gflops: 0.00946519 -q 0 -b 200 -p 1 -m 4000 -k 4000 -t 16 -i 10 -s 1020440166 -g 64
Time: 3.39292 Gflops: 0.00943141 -q 0 -b 200 -p 2 -m 4000 -k 4000 -t 16 -i 10 -s 1020440166 -g 64
Time: 1.96796 Gflops: 0.0162605 -q 0 -b 200 -p 3 -m 4000 -k 4000 -t 16 -i 10 -s 1020440166 -g 64

8000x8000 and 100 bits

Time: 7.70975 Gflops: 0.0166024 -q 0 -b 100 -p 1 -m 8000 -k 8000 -t 16 -i 10 -s 1020440166 -g 64
Time: 7.88548 Gflops: 0.0162324 -q 0 -b 100 -p 2 -m 8000 -k 8000 -t 16 -i 10 -s 1020440166 -g 64
Time: 4.9875 Gflops: 0.0256641 -q 0 -b 100 -p 3 -m 8000 -k 8000 -t 16 -i 10 -s 1020440166 -g 64

8000x8000 and 200 bits

Time: 14.4491 Gflops: 0.00885866 -q 0 -b 200 -p 1 -m 8000 -k 8000 -t 16 -i 10 -s 1020440166 -g 64
Time: 14.5161 Gflops: 0.00881781 -q 0 -b 200 -p 2 -m 8000 -k 8000 -t 16 -i 10 -s 1020440166 -g 64
Time: 10.8692 Gflops: 0.0117764 -q 0 -b 200 -p 3 -m 8000 -k 8000 -t 16 -i 10 -s 1020440166 -g 64
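The KAAPI runs above use the same parameters as the 16-thread OpenMP runs earlier on this page, so the two runtimes can be compared case by case. The sketch below computes the ratio of KAAPI time to OpenMP time for each strategy and problem size. The data layout is illustrative and the times are copied from this page.

```python
# Problem sizes in the order used on this page: (matrix size, bit size).
CASES = [(4000, 100), (4000, 200), (8000, 100), (8000, 200)]

# Times in seconds at 16 threads, keyed by strategy (-p), one entry per case.
openmp = {
    1: [1.38343, 2.8539, 5.96511, 10.6427],
    2: [1.52409, 2.8737, 6.13564, 12.0617],
    3: [0.625066, 1.53008, 3.49505, 8.41178],
}
kaapi = {
    1: [1.89803, 3.38081, 7.70975, 14.4491],
    2: [1.78987, 3.39292, 7.88548, 14.5161],
    3: [0.819725, 1.96796, 4.9875, 10.8692],
}


def time_ratios(slower, faster):
    """Per-case ratio of times for each strategy (a value > 1 means `slower` took longer)."""
    return {p: [a / b for a, b in zip(slower[p], faster[p])] for p in slower}


ratios = time_ratios(kaapi, openmp)

if __name__ == "__main__":
    for p in sorted(ratios):
        for (size, bits), r in zip(CASES, ratios[p]):
            print(f"-p {p}, {size}x{size}, {bits} bits: KAAPI/OpenMP = {r:.2f}")
```

In these measurements every ratio is above 1, meaning the OpenMP runs were faster in each listed configuration.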

Vary GrainSize from 16 to 256 for 4000x4000 and 100 bits with OMP_NUM_THREADS=16

GrainSize=16

Time: 0.865666 Gflops: 0.0369658 -q 0 -b 100 -p 3 -m 4000 -k 4000 -t 16 -i 3 -s 1020440166 -g 16

GrainSize=32

Time: 0.739671 Gflops: 0.0432625 -q 0 -b 100 -p 3 -m 4000 -k 4000 -t 16 -i 3 -s 1020440166 -g 32

GrainSize=64

Time: 0.819725 Gflops: 0.0390375 -q 0 -b 100 -p 3 -m 4000 -k 4000 -t 16 -i 3 -s 1020440166 -g 64

GrainSize=128

Time: 1.22491 Gflops: 0.0261244 -q 0 -b 100 -p 3 -m 4000 -k 4000 -t 16 -i 3 -s 1020440166 -g 128

GrainSize=256

Time: 1.94347 Gflops: 0.0164654 -q 0 -b 100 -p 3 -m 4000 -k 4000 -t 16 -i 3 -s 1020440166 -g 256
