Matrix multiplication for nfloat_complex #2037

fredrik-johansson · 2024-07-19T10:55:49Z

The following algorithms are used:

Classical
Fixed-point + Karatsuba (reducing to three real matrix multiplications)
Waksman + Karatsuba
Blocked fmpz_mat multiplication + Karatsuba
Reordering into four real matrix multiplications (for large multiplications when the real and imaginary parts are not balanced)
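The Karatsuba step shared by several of the algorithms above can be sketched as follows. This is only an illustration, not the code in this PR: it uses plain `double` entries instead of nfloat, the function names `real_matmul` and `complex_matmul_karatsuba` are hypothetical, and a classical real product stands in for whichever real algorithm (fixed-point, Waksman, blocked fmpz_mat) would actually be selected.

```c
#include <assert.h>
#include <stdlib.h>

/* C = A * B for n x n real matrices stored in row-major order
   (classical algorithm; a stand-in for a faster real multiplication). */
static void real_matmul(double *C, const double *A, const double *B, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
        {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
}

/* Hypothetical helper: (Cre + i Cim) = (Are + i Aim)(Bre + i Bim)
   using three real matrix products instead of four:
       P1 = Are * Bre
       P2 = Aim * Bim
       P3 = (Are + Aim) * (Bre + Bim)
       Cre = P1 - P2
       Cim = P3 - P1 - P2
   The outputs must not alias the inputs.
   Returns 0 on success, -1 if scratch memory cannot be allocated. */
int complex_matmul_karatsuba(double *Cre, double *Cim,
    const double *Are, const double *Aim,
    const double *Bre, const double *Bim, int n)
{
    assert(n > 0);

    size_t len = (size_t) n * (size_t) n;
    double *tmp = malloc(4 * len * sizeof(double));
    if (tmp == NULL)
        return -1;

    double *SA = tmp;            /* Are + Aim */
    double *SB = tmp + len;      /* Bre + Bim */
    double *P2 = tmp + 2 * len;
    double *P3 = tmp + 3 * len;

    for (size_t i = 0; i < len; i++)
    {
        SA[i] = Are[i] + Aim[i];
        SB[i] = Bre[i] + Bim[i];
    }

    real_matmul(Cre, Are, Bre, n);   /* P1, kept in Cre for now */
    real_matmul(P2, Aim, Bim, n);
    real_matmul(P3, SA, SB, n);

    for (size_t i = 0; i < len; i++)
    {
        double p1 = Cre[i];
        Cre[i] = p1 - P2[i];
        Cim[i] = P3[i] - p1 - P2[i];
    }

    free(tmp);
    return 0;
}
```

The sums Are + Aim and Bre + Bim mix the real and imaginary parts, which is one plausible reason to fall back to four separate real products when the two parts differ greatly in magnitude.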

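Speedup vs classical multiplication, for uniformly random [-1,1] + [-1,1]i matrices (rows: precision in bits; columns: matrix dimension n; the run was interrupted before the last entry):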

           2     3     4     8    16    24    32    48    64    80    96   128   144   256   512  1024 
    64  0.977 0.995 0.992 0.998 1.000 1.000 0.998 1.000 1.005 1.350 1.475 1.586 1.571 1.814 2.976 3.237 
   128  0.996 0.997 1.000 1.000 1.006 1.004 1.003 1.004 1.005 1.129 1.252 1.378 1.420 1.650 2.390 2.935 
   192  1.003 0.991 1.000 1.104 1.256 1.060 1.068 1.088 1.099 1.512 1.436 1.509 1.536 1.865 2.709 3.834 
   256  0.995 1.000 1.003 1.146 1.196 1.114 1.106 1.128 1.136 1.410 1.329 1.449 1.476 1.905 3.092 3.904 
   320  0.996 1.274 1.380 1.577 1.471 1.455 1.457 1.470 1.481 1.488 1.490 1.490 1.514 2.369 3.460 4.749 
   384  0.997 1.232 1.324 1.495 1.411 1.379 1.401 1.396 1.460 1.472 1.443 1.509 1.540 2.524 3.603 4.644 
   448  0.998 1.280 1.380 1.594 1.492 1.501 1.527 1.576 1.585 1.620 1.631 1.634 1.621 2.323 3.380 4.376 
   512  0.994 1.248 1.333 1.550 1.503 1.358 1.520 1.548 1.580 1.597 1.620 1.609 1.646 2.280 3.246 4.292 
  1024  0.996 1.128 1.300 1.606 1.725 1.798 1.850 1.845 1.869 1.888 1.887 2.142 2.161 3.218 4.824 6.474 
  2048  0.999 1.103 1.288 1.547 1.739 1.813 1.831 1.870 1.891 1.917 1.926 2.446 2.686 3.966 6.058 8.124 
  4096  0.996 2.827 1.360 1.593 1.762 1.821 1.876 1.910 1.931 2.213 2.529 3.223 3.599 5.519 8.790 ^C

Speedup vs acf:

           2     3     4     8    16    24    32    48    64    80    96   128   144   256   512  1024 
    64  1.824 1.831 1.845 1.930 2.572 1.777 1.614 1.542 1.630 2.034 2.042 1.543 1.499 1.529 1.436 1.099 
   128  1.722 1.767 1.728 1.788 2.504 1.868 1.750 1.673 1.666 1.869 2.050 1.359 1.339 1.378 1.391 1.252 
   192  2.089 2.114 2.248 2.464 2.654 2.323 2.333 2.421 1.419 1.584 1.715 1.779 1.611 1.555 1.342 1.433 
   256  1.918 1.956 2.049 2.325 2.301 2.215 2.204 2.388 1.450 1.571 1.705 1.592 1.493 1.472 1.418 1.318 
   320  1.435 1.865 2.053 2.349 2.256 2.184 2.182 2.518 1.615 1.528 1.451 1.272 1.188 1.439 1.343 1.350 
   384  1.504 1.888 1.906 2.269 2.175 2.156 2.195 2.548 1.669 1.509 1.385 1.168 1.127 1.369 1.359 1.324 
   448  1.536 1.981 2.197 2.364 2.279 2.334 2.333 2.588 1.899 1.760 1.655 1.339 1.282 1.398 1.376 1.313 
   512  1.564 1.924 2.105 2.519 2.330 2.351 2.539 2.727 1.903 1.773 1.644 1.377 1.313 1.330 1.346 1.334 
  1024  1.531 1.697 1.963 2.448 2.734 2.846 2.960 3.099 1.807 1.694 1.447 1.333 1.365 1.335 1.332 1.323 
  2048  1.554 1.645 1.960 2.303 2.583 2.707 2.864 2.126 1.731 1.487 1.349 1.356 1.342 1.334 1.336 1.340 
  4096  1.378 1.643 1.886 2.297 2.556 2.624 2.708 2.756 2.849 1.344 1.346 1.346 1.348 1.358 1.335 1.348

There is a 1.3x asymptotic speedup over acf, which currently always does four real multiplications in the block case.
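The 1.3x figure is consistent with a simple multiplication count. The following is a sketch under the assumption that the cost is dominated by real matrix products of equal cost M and that the extra matrix additions are negligible; the symbols A_r, A_i, B_r, B_i (real and imaginary parts of the operands) are notation introduced here.

```latex
\begin{aligned}
(A_r + iA_i)(B_r + iB_i) &= (A_rB_r - A_iB_i) + i\,(A_rB_i + A_iB_r)
  && \text{(four real products)} \\
A_rB_i + A_iB_r &= (A_r + A_i)(B_r + B_i) - A_rB_r - A_iB_i
  && \text{(three real products in total)} \\
\frac{4M}{3M} &= \frac{4}{3} \approx 1.33
\end{aligned}
```

This matches the values of roughly 1.33 to 1.35 in the right-hand columns of the acf table at high precision.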

As with nfloat_mat, the tuning may not be optimal for non-uniform matrices.

fredrik-johansson added 3 commits July 19, 2024 11:04

matrix multiplication for nfloat_complex

a6c2a9c

add test case

27a43da

test code tweak

60d5397

fredrik-johansson merged commit 91f0ece into flintlib:main Jul 19, 2024
13 checks passed
