Benchmark of fgemv for Givaro::Integer in the field of RNS on a multicore server

ZHG2017 edited this page May 16, 2019 · 11 revisions

Note: p (the -p option in the lines below) selects the parallel strategy: 0 for sequential, 1 for <Recursive,Thread>, 2 for <Row,Thread>, 3 for <Row,Grain>.

Benchmark using OpenMP

OMP_NUM_THREADS=1

Time: 4.71389 Gflops: 0.00678845 -q 0 -b 100 -p 0 -m 4000 -k 4000 -t 1 -N 1 -i 10 -s 1020440166 -g 64
Time: 8.42978 Gflops: 0.00379607 -q 0 -b 200 -p 0 -m 4000 -k 4000 -t 1 -N 1 -i 10 -s 1020440166 -g 64
Time: 18.8713 Gflops: 0.00678278 -q 0 -b 100 -p 0 -m 8000 -k 8000 -t 1 -N 1 -i 10 -s 1020440166 -g 64
Time: 33.7645 Gflops: 0.00379096 -q 0 -b 200 -p 0 -m 8000 -k 8000 -t 1 -N 1 -i 10 -s 1020440166 -g 64
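
The Gflops column in the sequential runs above appears consistent with counting 2·m·k operations per matrix-vector product and dividing by the reported time. This is inferred from the numbers on this page, not stated by the benchmark itself. The helper name `gemv_gflops` below is hypothetical, and the sketch only checks that assumption against the four sequential lines.

```python
def gemv_gflops(m: int, k: int, seconds: float) -> float:
    """Rate in Gflops, ASSUMING 2*m*k operations per m x k matrix-vector product."""
    return 2.0 * m * k / seconds / 1e9


# Sequential runs copied from the lines above: (m, k, time, reported Gflops)
runs = [
    (4000, 4000, 4.71389, 0.00678845),
    (4000, 4000, 8.42978, 0.00379607),
    (8000, 8000, 18.8713, 0.00678278),
    (8000, 8000, 33.7645, 0.00379096),
]

for m, k, t, reported in runs:
    computed = gemv_gflops(m, k, t)
    print(f"m={m} k={k} time={t}: computed {computed:.8f}, reported {reported}")
```

Under this assumption the rate depends only on the matrix dimensions and the time, which would explain why the 4000x4000 and 8000x8000 runs report nearly the same Gflops at a given bit size.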

OMP_NUM_THREADS=8

4000x4000 and 100 bits

Time: 1.22053 Gflops: 0.026218 -q 0 -b 100 -p 1 -m 4000 -k 4000 -t 8 -N 8 -i 10 -s 1020440166 -g 64
Time: 0.903147 Gflops: 0.0354317 -q 0 -b 100 -p 2 -m 4000 -k 4000 -t 8 -N 8 -i 10 -s 1020440166 -g 64
Time: 0.581564 Gflops: 0.055024 -q 0 -b 100 -p 3 -m 4000 -k 4000 -t 8 -N 8 -i 10 -s 1020440166 -g 64

4000x4000 and 200 bits

Time: 2.49864 Gflops: 0.012807 -q 0 -b 200 -p 1 -m 4000 -k 4000 -t 8 -N 8 -i 10 -s 1020440166 -g 64
Time: 2.05261 Gflops: 0.0155899 -q 0 -b 200 -p 2 -m 4000 -k 4000 -t 8 -N 8 -i 10 -s 1020440166 -g 64
Time: 1.48643 Gflops: 0.0215281 -q 0 -b 200 -p 3 -m 4000 -k 4000 -t 8 -N 8 -i 10 -s 1020440166 -g 64

8000x8000 and 100 bits

Time: 5.76146 Gflops: 0.0222166 -q 0 -b 100 -p 1 -m 8000 -k 8000 -t 8 -N 8 -i 10 -s 1020440166 -g 64
Time: 4.50027 Gflops: 0.0284428 -q 0 -b 100 -p 2 -m 8000 -k 8000 -t 8 -N 8 -i 10 -s 1020440166 -g 64
Time: 2.52823 Gflops: 0.0506284 -q 0 -b 100 -p 3 -m 8000 -k 8000 -t 8 -N 8 -i 10 -s 1020440166 -g 64

8000x8000 and 200 bits

Time: 13.4817 Gflops: 0.00949435 -q 0 -b 200 -p 1 -m 8000 -k 8000 -t 8 -N 8 -i 10 -s 1020440166 -g 64
Time: 10.415 Gflops: 0.01229 -q 0 -b 200 -p 2 -m 8000 -k 8000 -t 8 -N 8 -i 10 -s 1020440166 -g 64
Time: 6.56747 Gflops: 0.01949 -q 0 -b 200 -p 3 -m 8000 -k 8000 -t 8 -N 8 -i 10 -s 1020440166 -g 64

OMP_NUM_THREADS=16

4000x4000 and 100 bits

Time: 0.841616 Gflops: 0.0380221 -q 0 -b 100 -p 1 -m 4000 -k 4000 -t 16 -N 16 -i 10 -s 1020440166 -g 64
Time: 0.632509 Gflops: 0.0505922 -q 0 -b 100 -p 2 -m 4000 -k 4000 -t 16 -N 16 -i 10 -s 1020440166 -g 64
Time: 0.292114 Gflops: 0.109546 -q 0 -b 100 -p 3 -m 4000 -k 4000 -t 16 -N 16 -i 10 -s 1020440166 -g 64

4000x4000 and 200 bits

Time: 1.85376 Gflops: 0.0172623 -q 0 -b 200 -p 1 -m 4000 -k 4000 -t 16 -N 16 -i 10 -s 1020440166 -g 64
Time: 1.50288 Gflops: 0.0212924 -q 0 -b 200 -p 2 -m 4000 -k 4000 -t 16 -N 16 -i 10 -s 1020440166 -g 64
Time: 1.04936 Gflops: 0.0304947 -q 0 -b 200 -p 3 -m 4000 -k 4000 -t 16 -N 16 -i 10 -s 1020440166 -g 64

8000x8000 and 100 bits

Time: 3.39264 Gflops: 0.0377288 -q 0 -b 100 -p 1 -m 8000 -k 8000 -t 16 -N 16 -i 10 -s 1020440166 -g 64
Time: 3.16215 Gflops: 0.0404788 -q 0 -b 100 -p 2 -m 8000 -k 8000 -t 16 -N 16 -i 10 -s 1020440166 -g 64
Time: 1.68661 Gflops: 0.0758919 -q 0 -b 100 -p 3 -m 8000 -k 8000 -t 16 -N 16 -i 10 -s 1020440166 -g 64

8000x8000 and 200 bits

Time: 8.18479 Gflops: 0.0156388 -q 0 -b 200 -p 1 -m 8000 -k 8000 -t 16 -N 16 -i 10 -s 1020440166 -g 64
Time: 6.72505 Gflops: 0.0190333 -q 0 -b 200 -p 2 -m 8000 -k 8000 -t 16 -N 16 -i 10 -s 1020440166 -g 64
Time: 5.08036 Gflops: 0.0251951 -q 0 -b 200 -p 3 -m 8000 -k 8000 -t 16 -N 16 -i 10 -s 1020440166 -g 64

OMP_NUM_THREADS=32

4000x4000 and 100 bits

Time: 0.527186 Gflops: 0.0606996 -q 0 -b 100 -p 1 -m 4000 -k 4000 -t 32 -N 32 -i 10 -s 1020440166 -g 64
Time: 0.493293 Gflops: 0.0648701 -q 0 -b 100 -p 2 -m 4000 -k 4000 -t 32 -N 32 -i 10 -s 1020440166 -g 64
Time: 0.258855 Gflops: 0.123621 -q 0 -b 100 -p 3 -m 4000 -k 4000 -t 32 -N 32 -i 10 -s 1020440166 -g 64

4000x4000 and 200 bits

Time: 1.33995 Gflops: 0.0238815 -q 0 -b 200 -p 1 -m 4000 -k 4000 -t 32 -N 32 -i 10 -s 1020440166 -g 64
Time: 1.31883 Gflops: 0.024264 -q 0 -b 200 -p 2 -m 4000 -k 4000 -t 32 -N 32 -i 10 -s 1020440166 -g 64
Time: 0.887425 Gflops: 0.0360594 -q 0 -b 200 -p 3 -m 4000 -k 4000 -t 32 -N 32 -i 10 -s 1020440166 -g 64

8000x8000 and 100 bits

Time: 2.70477 Gflops: 0.0473239 -q 0 -b 100 -p 1 -m 8000 -k 8000 -t 32 -N 32 -i 10 -s 1020440166 -g 64
Time: 2.59823 Gflops: 0.0492644 -q 0 -b 100 -p 2 -m 8000 -k 8000 -t 32 -N 32 -i 10 -s 1020440166 -g 64
Time: 1.6257 Gflops: 0.0787351 -q 0 -b 100 -p 3 -m 8000 -k 8000 -t 32 -N 32 -i 10 -s 1020440166 -g 64

8000x8000 and 200 bits

Time: 6.01443 Gflops: 0.0212821 -q 0 -b 200 -p 1 -m 8000 -k 8000 -t 32 -N 32 -i 10 -s 1020440166 -g 64
Time: 5.74915 Gflops: 0.0222641 -q 0 -b 200 -p 2 -m 8000 -k 8000 -t 32 -N 32 -i 10 -s 1020440166 -g 64
Time: 4.58431 Gflops: 0.0279213 -q 0 -b 200 -p 3 -m 8000 -k 8000 -t 32 -N 32 -i 10 -s 1020440166 -g 64
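
To compare strategies across thread counts, the result lines can be parsed and turned into speedups relative to the sequential run. The sketch below is a minimal illustration, not part of the benchmark: the names `parse_line` and `speedup` are hypothetical, and it assumes every option after the Gflops value comes as a `-x value` pair with an integer value, as in the lines on this page. The sample lines are the 4000x4000, 100-bit runs for -p 0 and -p 3.

```python
import re

# Matches "Time: <float> Gflops: <float> <flags...>" as printed on this page.
LINE_RE = re.compile(r"Time:\s*(?P<time>\S+)\s+Gflops:\s*(?P<gflops>\S+)\s+(?P<flags>.*)")


def parse_line(line: str) -> dict:
    """Parse one result line into a dict of time, gflops and integer flags."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a benchmark line: {line!r}")
    tokens = match.group("flags").split()
    # ASSUMPTION: flags come in "-x value" pairs with integer values.
    flags = {tokens[i].lstrip("-"): int(tokens[i + 1]) for i in range(0, len(tokens), 2)}
    return {"time": float(match.group("time")),
            "gflops": float(match.group("gflops")),
            **flags}


def speedup(sequential: dict, parallel: dict) -> float:
    """Sequential time divided by parallel time for the same problem."""
    return sequential["time"] / parallel["time"]


SEQ = parse_line(
    "Time: 4.71389 Gflops: 0.00678845 -q 0 -b 100 -p 0 -m 4000 -k 4000 "
    "-t 1 -N 1 -i 10 -s 1020440166 -g 64")
PAR = [parse_line(line) for line in (
    "Time: 0.581564 Gflops: 0.055024 -q 0 -b 100 -p 3 -m 4000 -k 4000 "
    "-t 8 -N 8 -i 10 -s 1020440166 -g 64",
    "Time: 0.292114 Gflops: 0.109546 -q 0 -b 100 -p 3 -m 4000 -k 4000 "
    "-t 16 -N 16 -i 10 -s 1020440166 -g 64",
    "Time: 0.258855 Gflops: 0.123621 -q 0 -b 100 -p 3 -m 4000 -k 4000 "
    "-t 32 -N 32 -i 10 -s 1020440166 -g 64",
)]

for run in PAR:
    # The -t value matches OMP_NUM_THREADS in every line on this page.
    print(f"-p {run['p']} -t {run['t']}: speedup {speedup(SEQ, run):.2f}")
```

For these three lines the speedup is roughly 8 and 16 at 8 and 16 threads, and about 18 at 32 threads.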