Matrix multiplication for nfloat_complex #2037

Merged (3 commits), Jul 19, 2024

fredrik-johansson
Collaborator

The following algorithms are used:

  • Classical
  • Fixed-point + Karatsuba (reducing to three real matrix multiplications)
  • Waksman + Karatsuba
  • Blocked fmpz_mat multiplication + Karatsuba
  • Reordering into four real matrix multiplications (for large multiplications when the real and imaginary parts are not balanced)
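
The Karatsuba reduction named in the list can be illustrated with a minimal sketch. This is not the FLINT implementation: it uses plain Python lists of exact integers rather than nfloat or fixed-point arithmetic, and the helper names (`matmul`, `matadd`, `matsub`, `complex_matmul_karatsuba`, `complex_matmul_four`) are hypothetical. It only shows how (A + Bi)(C + Di) can be formed from three real matrix products instead of four.

```python
def matmul(X, Y):
    """Classical real matrix product of two nested-list matrices."""
    inner = len(Y)
    cols = len(Y[0])
    return [[sum(row[k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for row in X]


def matadd(X, Y):
    """Entrywise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def matsub(X, Y):
    """Entrywise difference of two matrices of equal shape."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def complex_matmul_four(A, B, C, D):
    """(A + Bi)(C + Di) with four real products: AC, BD, AD, BC."""
    real = matsub(matmul(A, C), matmul(B, D))
    imag = matadd(matmul(A, D), matmul(B, C))
    return real, imag


def complex_matmul_karatsuba(A, B, C, D):
    """(A + Bi)(C + Di) with three real products.

    Uses AD + BC = (A + B)(C + D) - AC - BD, trading one real
    matrix product for a few extra matrix additions.
    """
    P1 = matmul(A, C)
    P2 = matmul(B, D)
    P3 = matmul(matadd(A, B), matadd(C, D))
    real = matsub(P1, P2)
    imag = matsub(matsub(P3, P1), P2)
    return real, imag
```

With exact integer entries both functions return identical results. In floating-point arithmetic the extra additions can cost accuracy when the real and imaginary parts differ greatly in magnitude, which this sketch does not model.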

Speedup vs classical multiplication, for uniformly random [-1,1] + [-1,1]i matrices (rows: precision in bits; columns: matrix dimension):

           2     3     4     8    16    24    32    48    64    80    96   128   144   256   512  1024 
    64  0.977 0.995 0.992 0.998 1.000 1.000 0.998 1.000 1.005 1.350 1.475 1.586 1.571 1.814 2.976 3.237 
   128  0.996 0.997 1.000 1.000 1.006 1.004 1.003 1.004 1.005 1.129 1.252 1.378 1.420 1.650 2.390 2.935 
   192  1.003 0.991 1.000 1.104 1.256 1.060 1.068 1.088 1.099 1.512 1.436 1.509 1.536 1.865 2.709 3.834 
   256  0.995 1.000 1.003 1.146 1.196 1.114 1.106 1.128 1.136 1.410 1.329 1.449 1.476 1.905 3.092 3.904 
   320  0.996 1.274 1.380 1.577 1.471 1.455 1.457 1.470 1.481 1.488 1.490 1.490 1.514 2.369 3.460 4.749 
   384  0.997 1.232 1.324 1.495 1.411 1.379 1.401 1.396 1.460 1.472 1.443 1.509 1.540 2.524 3.603 4.644 
   448  0.998 1.280 1.380 1.594 1.492 1.501 1.527 1.576 1.585 1.620 1.631 1.634 1.621 2.323 3.380 4.376 
   512  0.994 1.248 1.333 1.550 1.503 1.358 1.520 1.548 1.580 1.597 1.620 1.609 1.646 2.280 3.246 4.292 
  1024  0.996 1.128 1.300 1.606 1.725 1.798 1.850 1.845 1.869 1.888 1.887 2.142 2.161 3.218 4.824 6.474 
  2048  0.999 1.103 1.288 1.547 1.739 1.813 1.831 1.870 1.891 1.917 1.926 2.446 2.686 3.966 6.058 8.124 
  4096  0.996 2.827 1.360 1.593 1.762 1.821 1.876 1.910 1.931 2.213 2.529 3.223 3.599 5.519 8.790 ^C

Speedup vs acf:

           2     3     4     8    16    24    32    48    64    80    96   128   144   256   512  1024 
    64  1.824 1.831 1.845 1.930 2.572 1.777 1.614 1.542 1.630 2.034 2.042 1.543 1.499 1.529 1.436 1.099 
   128  1.722 1.767 1.728 1.788 2.504 1.868 1.750 1.673 1.666 1.869 2.050 1.359 1.339 1.378 1.391 1.252 
   192  2.089 2.114 2.248 2.464 2.654 2.323 2.333 2.421 1.419 1.584 1.715 1.779 1.611 1.555 1.342 1.433 
   256  1.918 1.956 2.049 2.325 2.301 2.215 2.204 2.388 1.450 1.571 1.705 1.592 1.493 1.472 1.418 1.318 
   320  1.435 1.865 2.053 2.349 2.256 2.184 2.182 2.518 1.615 1.528 1.451 1.272 1.188 1.439 1.343 1.350 
   384  1.504 1.888 1.906 2.269 2.175 2.156 2.195 2.548 1.669 1.509 1.385 1.168 1.127 1.369 1.359 1.324 
   448  1.536 1.981 2.197 2.364 2.279 2.334 2.333 2.588 1.899 1.760 1.655 1.339 1.282 1.398 1.376 1.313 
   512  1.564 1.924 2.105 2.519 2.330 2.351 2.539 2.727 1.903 1.773 1.644 1.377 1.313 1.330 1.346 1.334 
  1024  1.531 1.697 1.963 2.448 2.734 2.846 2.960 3.099 1.807 1.694 1.447 1.333 1.365 1.335 1.332 1.323 
  2048  1.554 1.645 1.960 2.303 2.583 2.707 2.864 2.126 1.731 1.487 1.349 1.356 1.342 1.334 1.336 1.340 
  4096  1.378 1.643 1.886 2.297 2.556 2.624 2.708 2.756 2.849 1.344 1.346 1.346 1.348 1.358 1.335 1.348 

There is a 1.3x asymptotic speedup over acf, which currently always does four real multiplications in the block case; doing three real multiplications instead of four gives the expected ratio of 4/3 ≈ 1.33.

As with nfloat_mat, the tuning may not be optimal for non-uniform matrices.

fredrik-johansson merged commit 91f0ece into flintlib:main on Jul 19, 2024
13 checks passed