Benchmark of fgemv for Givaro::Integer in the field of RNS on a multicore server

ZHG2017 edited this page May 16, 2019 · 11 revisions

Note: the -p option selects the parallel strategy (0 for sequential, 1 for <Recursive,Thread>, 2 for <Row,Thread>, 3 for <Row,Grain>).

Benchmark using OpenMP

OMP_NUM_THREADS=1

Time: 4.71389 Gflops: 0.00678845 -q 0 -b 100 -p 0 -m 4000 -k 4000 -t 1 -N 1 -i 10 -s 1020440166 -g 64
Time: 8.42978 Gflops: 0.00379607 -q 0 -b 200 -p 0 -m 4000 -k 4000 -t 1 -N 1 -i 10 -s 1020440166 -g 64
Time: 18.8713 Gflops: 0.00678278 -q 0 -b 100 -p 0 -m 8000 -k 8000 -t 1 -N 1 -i 10 -s 1020440166 -g 64
Time: 33.7645 Gflops: 0.00379096 -q 0 -b 200 -p 0 -m 8000 -k 8000 -t 1 -N 1 -i 10 -s 1020440166 -g 64
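
The Gflops values above appear to be consistent with counting 2·m·k operations per matrix-vector product and dividing by the reported time. This is inferred from the figures only, not confirmed from the benchmark source, so the following is a minimal sketch under that assumption; the function name `gflops` is hypothetical.

```python
# Assumption: the benchmark counts 2*m*k operations per matrix-vector product
# (inferred from the reported figures, not taken from the benchmark source).

def gflops(m, k, seconds):
    """Return the rate in Gflops for one m x k matrix-vector product."""
    return 2.0 * m * k / seconds / 1e9

# (m, k, reported time in seconds, reported Gflops) from the sequential runs
sequential_runs = [
    (4000, 4000, 4.71389, 0.00678845),
    (4000, 4000, 8.42978, 0.00379607),
    (8000, 8000, 18.8713, 0.00678278),
    (8000, 8000, 33.7645, 0.00379096),
]

for m, k, seconds, reported in sequential_runs:
    computed = gflops(m, k, seconds)
    print(f"{m}x{k}: computed {computed:.6g} Gflops, reported {reported}")
```

Under this reading, the rate depends only on the matrix dimensions and the time, so doubling the bit size at a fixed dimension lowers the reported Gflops in proportion to the longer run time.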

OMP_NUM_THREADS=8

4000x4000 and 100 bits

Time: 1.22053 Gflops: 0.026218 -q 0 -b 100 -p 1 -m 4000 -k 4000 -t 8 -N 8 -i 10 -s 1020440166 -g 64
Time: 0.903147 Gflops: 0.0354317 -q 0 -b 100 -p 2 -m 4000 -k 4000 -t 8 -N 8 -i 10 -s 1020440166 -g 64
Time: 0.581564 Gflops: 0.055024 -q 0 -b 100 -p 3 -m 4000 -k 4000 -t 8 -N 8 -i 10 -s 1020440166 -g 64

4000x4000 and 200 bits

Time: 2.49864 Gflops: 0.012807 -q 0 -b 200 -p 1 -m 4000 -k 4000 -t 8 -N 8 -i 10 -s 1020440166 -g 64
Time: 2.05261 Gflops: 0.0155899 -q 0 -b 200 -p 2 -m 4000 -k 4000 -t 8 -N 8 -i 10 -s 1020440166 -g 64
Time: 1.48643 Gflops: 0.0215281 -q 0 -b 200 -p 3 -m 4000 -k 4000 -t 8 -N 8 -i 10 -s 1020440166 -g 64

8000x8000 and 100 bits

Time: 5.76146 Gflops: 0.0222166 -q 0 -b 100 -p 1 -m 8000 -k 8000 -t 8 -N 8 -i 10 -s 1020440166 -g 64
Time: 4.50027 Gflops: 0.0284428 -q 0 -b 100 -p 2 -m 8000 -k 8000 -t 8 -N 8 -i 10 -s 1020440166 -g 64
Time: 2.52823 Gflops: 0.0506284 -q 0 -b 100 -p 3 -m 8000 -k 8000 -t 8 -N 8 -i 10 -s 1020440166 -g 64

8000x8000 and 200 bits

Time: 13.4817 Gflops: 0.00949435 -q 0 -b 200 -p 1 -m 8000 -k 8000 -t 8 -N 8 -i 10 -s 1020440166 -g 64
Time: 10.415 Gflops: 0.01229 -q 0 -b 200 -p 2 -m 8000 -k 8000 -t 8 -N 8 -i 10 -s 1020440166 -g 64
Time: 6.56747 Gflops: 0.01949 -q 0 -b 200 -p 3 -m 8000 -k 8000 -t 8 -N 8 -i 10 -s 1020440166 -g 64
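
To compare the three parallel strategies, the 8-thread times can be turned into speedup and parallel efficiency relative to the sequential runs. The sketch below uses the usual definitions (speedup = sequential time / parallel time, efficiency = speedup / thread count); these definitions and the helper names are assumptions added for illustration and are not part of the benchmark output.

```python
# Times in seconds copied from the runs above, keyed by (matrix size, bit size).
sequential = {
    (4000, 100): 4.71389,
    (4000, 200): 8.42978,
    (8000, 100): 18.8713,
    (8000, 200): 33.7645,
}

# 8-thread times, keyed by the -p value (1, 2 or 3).
parallel_8 = {
    (4000, 100): {1: 1.22053, 2: 0.903147, 3: 0.581564},
    (4000, 200): {1: 2.49864, 2: 2.05261, 3: 1.48643},
    (8000, 100): {1: 5.76146, 2: 4.50027, 3: 2.52823},
    (8000, 200): {1: 13.4817, 2: 10.415, 3: 6.56747},
}

THREADS = 8

def speedup(t_seq, t_par):
    """Ratio of sequential time to parallel time."""
    return t_seq / t_par

def efficiency(t_seq, t_par, threads):
    """Speedup divided by the number of threads."""
    return speedup(t_seq, t_par) / threads

for (size, bits), times in parallel_8.items():
    for p, t_par in sorted(times.items()):
        s = speedup(sequential[(size, bits)], t_par)
        e = efficiency(sequential[(size, bits)], t_par, THREADS)
        print(f"{size}x{size}, {bits} bits, p={p}: "
              f"speedup {s:.2f}, efficiency {e:.2f}")
```

In every 8-thread configuration listed above, p = 3 has the shortest time and therefore the highest speedup of the three strategies.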

OMP_NUM_THREADS=16

4000x4000 and 100 bits

Time: 0.841616 Gflops: 0.0380221 -q 0 -b 100 -p 1 -m 4000 -k 4000 -t 16 -N 16 -i 10 -s 1020440166 -g 64
Time: 0.632509 Gflops: 0.0505922 -q 0 -b 100 -p 2 -m 4000 -k 4000 -t 16 -N 16 -i 10 -s 1020440166 -g 64
Time: 0.292114 Gflops: 0.109546 -q 0 -b 100 -p 3 -m 4000 -k 4000 -t 16 -N 16 -i 10 -s 1020440166 -g 64

4000x4000 and 200 bits

Time: 1.85376 Gflops: 0.0172623 -q 0 -b 200 -p 1 -m 4000 -k 4000 -t 16 -N 16 -i 10 -s 1020440166 -g 64
Time: 1.50288 Gflops: 0.0212924 -q 0 -b 200 -p 2 -m 4000 -k 4000 -t 16 -N 16 -i 10 -s 1020440166 -g 64
Time: 1.04936 Gflops: 0.0304947 -q 0 -b 200 -p 3 -m 4000 -k 4000 -t 16 -N 16 -i 10 -s 1020440166 -g 64

8000x8000 and 100 bits

Time: 3.39264 Gflops: 0.0377288 -q 0 -b 100 -p 1 -m 8000 -k 8000 -t 16 -N 16 -i 10 -s 1020440166 -g 64
Time: 3.16215 Gflops: 0.0404788 -q 0 -b 100 -p 2 -m 8000 -k 8000 -t 16 -N 16 -i 10 -s 1020440166 -g 64
Time: 1.68661 Gflops: 0.0758919 -q 0 -b 100 -p 3 -m 8000 -k 8000 -t 16 -N 16 -i 10 -s 1020440166 -g 64

8000x8000 and 200 bits

Time: 8.18479 Gflops: 0.0156388 -q 0 -b 200 -p 1 -m 8000 -k 8000 -t 16 -N 16 -i 10 -s 1020440166 -g 64