Fixed-point matrix multiplication improvements #2062

Merged 16 commits on Sep 6, 2024

fredrik-johansson
Collaborator

  • The add_ss* / sub_dd* macros are now defined for up to 8 limbs on all architectures. Note that this is a bit iffy: the C fallbacks might compile to poor assembly depending on the compiler, and the inline asm versions can exhaust the available registers. On x86-32 I had to switch to the C fallbacks for the longer macros. I think failures are possible with 7-8 operands on x86-64 too, depending on how the macros are used, though the uses in the current codebase don't seem to hit this limit. All of this trouble because compilers don't understand carry flags, sigh.

  • Fixed-point matrix multiplication is optimized for medium-sized matrices by using dot products with inlined code and by incorporating Strassen multiplication, along with new tuning values. The internal nfixed representation is also made semi-public.
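
To illustrate the first point, here is a minimal sketch of what a portable C fallback for a 3-limb addition can look like. The names `limb_t` and `add_3limb` are hypothetical and chosen only for this example; this is not the code of FLINT's add_ss* macros. It shows why the C versions are awkward: each carry has to be recovered with a comparison, and it is up to the compiler to turn that back into an add-with-carry chain.

```c
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical limb type for this sketch */

/* s = a + b mod 2^192, where s[0], a[0], b[0] are the least significant
   limbs. Standard C cannot name the hardware carry flag, so the carry out
   of each limb is detected by checking whether the sum wrapped around.
   Results are stored last, so s may alias a or b. */
static void add_3limb(limb_t s[3], const limb_t a[3], const limb_t b[3])
{
    limb_t t0 = a[0] + b[0];
    limb_t c0 = (t0 < a[0]);      /* carry out of limb 0 */

    limb_t t1 = a[1] + b[1];
    limb_t c1 = (t1 < a[1]);      /* carry from a[1] + b[1] */
    limb_t u1 = t1 + c0;
    c1 += (u1 < t1);              /* carry from adding c0; both cannot occur at once */

    limb_t t2 = a[2] + b[2] + c1; /* top limb: carry out is discarded */

    s[0] = t0;
    s[1] = u1;
    s[2] = t2;
}
```

For example, adding the limbs {2^64 - 1, 2^64 - 1, 1} and {1, 0, 2} propagates a carry through both lower limbs and gives {0, 0, 4}. An 8-limb version written this way needs many more temporaries, which is where the quality of the generated assembly starts to depend heavily on the compiler.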

I will post some more notes about possible improvements in a follow-up issue.
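
As a reminder of what the Strassen step in the second point does, here is a minimal sketch of one Strassen level on a 2x2 matrix, with `int64_t` entries standing in for fixed-point values. The function name `strassen_2x2` is hypothetical, and this is not the nfixed implementation: in a real matrix product the entries would be sub-blocks, the scheme would recurse down to a tuned cutoff, and the scaling of the fixed-point products would be handled as well.

```c
#include <stdint.h>

/* One level of Strassen multiplication, c = a * b, for 2x2 matrices stored
   row-major as {x11, x12, x21, x22}. It uses 7 multiplications instead of
   8, at the cost of extra additions and subtractions. With integer
   (fixed-point) entries those additions are exact as long as they do not
   overflow, so the operands need a little headroom. */
static void strassen_2x2(int64_t c[4], const int64_t a[4], const int64_t b[4])
{
    int64_t m1 = (a[0] + a[3]) * (b[0] + b[3]);
    int64_t m2 = (a[2] + a[3]) * b[0];
    int64_t m3 = a[0] * (b[1] - b[3]);
    int64_t m4 = a[3] * (b[2] - b[0]);
    int64_t m5 = (a[0] + a[1]) * b[3];
    int64_t m6 = (a[2] - a[0]) * (b[0] + b[1]);
    int64_t m7 = (a[1] - a[3]) * (b[2] + b[3]);

    c[0] = m1 + m4 - m5 + m7;   /* c11 */
    c[1] = m3 + m5;             /* c12 */
    c[2] = m2 + m4;             /* c21 */
    c[3] = m1 - m2 + m3 + m6;   /* c22 */
}
```

With a = {1, 2, 3, 4} and b = {5, 6, 7, 8} this gives {19, 22, 43, 50}, matching the classical product. The saving only pays off once the blocks are large enough that a multiplication costs much more than the extra additions, which is why tuning values matter here.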

Speedup for nfloat_mat_mul (with uniform matrices) in this PR:

prec \ n (rows: precision in bits, columns: matrix size n)

          2     3     4     8    16    24    32    48    64    80    96   128   144   256   512  1024 
   64  1.13  1.04  1.05  1.04  1.03  2.05  1.30  1.08  0.98  1.25  1.04  1.01  1.01  1.01  1.04  1.00 
  128  1.06  1.02  1.00  1.02  0.98  1.51  1.20  1.03  0.99  1.06  1.00  0.98  0.99  1.00  1.00  0.99 
  192  0.99  1.01  0.99  1.29  1.48  1.82  1.44  1.38  1.34  1.40  1.07  1.01  0.98  1.00  0.99 
  256  1.01  1.01  1.01  1.04  1.11  1.30  1.18  1.16  1.19  1.24  0.99  0.99  1.00  0.99  0.99 
  320  1.03  1.08  1.03  1.10  1.03  1.40  1.31  1.26  1.32  1.33  1.13  1.05  0.98  1.02  1.00 
  384  1.02  1.00  1.04  1.08  0.99  1.23  1.24  1.21  1.26  1.30  1.13  1.03  1.01  1.01  1.01 
  448  1.05  0.98  1.02  1.07  0.94  1.10  1.13  1.07  1.15  1.17  1.11  1.06  1.03  1.02  1.00 
  512  0.98  0.95  0.99  0.97  0.99  1.00  1.00  1.02  1.07  1.10  1.11  1.07  1.06  1.02  1.00 
  576  0.99  0.99  0.99  0.98  1.03  1.01  1.03  1.08  1.13  1.18  0.97  0.96  0.93  1.04  1.01 
  640  0.99  1.02  1.00  1.03  1.02  1.03  1.05  1.08  1.15  1.10  1.02  0.93  0.96  1.02  1.02 
  704  0.99  0.99  0.97  1.00  1.00  1.00  1.03  1.06  1.13  1.17  1.06  1.00  0.98  1.04  0.99 
  768  0.97  0.99  0.98  0.98  1.00  0.99  1.01  1.05  1.12  1.16  1.07  1.02  0.99  1.04  1.01 
  832  1.00  1.01  0.98  0.99  1.02  0.99  1.04  1.05  1.17  1.17  1.06  1.02  0.98  1.03 
  896  1.00  0.98  1.03  1.00  1.00  1.00  1.05  1.07  1.15  1.19  1.14  1.06  1.04  1.04 
  960  0.99  0.99  0.99  1.01  0.99  1.00  1.02  1.07  1.19  1.17  1.17  1.07  1.07  1.02 
 1024  1.00  0.94  0.98  0.99  0.99  0.99  1.02  1.06  1.16  1.18  1.18  1.14  1.06  0.99 
 1536  0.99  0.99  0.99  0.99  1.00  0.99  1.05  1.09  1.17  1.19  1.15  1.05  0.99  1.00 
 2048  0.97  0.99  0.99  0.99  0.99  0.99  1.06  1.13  1.16  1.18  1.11  1.05  1.02  1.01 
 2560  0.99  0.99  0.99  0.99  0.99  0.99  1.05  1.08  1.17  1.17  1.05  1.00  1.00 
 3072  1.00  0.99  0.99  0.98  0.98  0.99  1.06  1.08  1.17  1.20  0.99  0.99 
 3584  0.99  0.98  0.99  0.99  0.99  0.99  1.06  1.07  1.19  1.09  1.02  0.99 
 4096  0.97  0.94  0.95  0.93  0.94  0.94  1.01  1.03  1.09  1.14  0.98  0.97 

Speedup of nfloat_complex_mat_mul:

          2     3     4     8    16    24    32    48    64    80    96   128   144   256   512  1024 
   64  0.95  0.98  1.00  1.13  1.40  1.15  1.16  1.18  1.24  1.00  1.02  1.05  0.98  0.97  1.00  1.01 
  128  0.98  1.00  1.02  1.17  1.37  1.23  1.20  1.23  1.26  1.21  1.17  1.15  1.24  1.02  1.06 
  192  0.96  0.96  1.15  1.46  1.61  1.49  1.42  1.45  1.46  1.22  1.20  1.16  1.20  1.06  1.08 
  256  0.99  1.02  1.11  1.13  1.29  1.19  1.21  1.23  1.31  1.09  1.08  1.15  1.23  1.24  1.06 
  320  1.02  1.04  1.06  1.11  1.35  1.20  1.20  1.23  1.31  1.36  1.36  1.48  1.48  1.05  1.07 
  384  0.99  0.99  1.04  1.08  1.34  1.22  1.20  1.23  1.30  1.29  1.35  1.41  1.40  1.11  1.07 
  448  0.98  0.94  0.98  1.05  1.25  1.11  1.11  1.10  1.18  1.20  1.22  1.31  1.31  1.08  1.08 
  512  1.00  1.00  1.00  0.99  0.98  0.99  0.99  1.03  1.06  1.12  1.14  1.20  1.21  1.12  1.07 
  576  0.98  0.99  0.99  1.04  1.02  1.03  1.03  1.08  1.14  1.18  1.21  1.17  1.17  1.15  1.06 
  640  1.02  0.99  1.02  1.02  1.03  1.02  1.05  1.10  1.14  1.19  1.23  1.21  1.18  1.13 
  704  0.99  0.96  0.99  0.99  1.01  0.98  1.01  1.08  1.12  1.17  1.19  1.27  1.19  1.10 
  768  1.02  0.96  1.01  1.02  0.97  1.00  1.03  1.07  1.12  1.17  1.21  1.27  1.20  1.06 
  832  1.01  0.96  1.02  1.00  1.00  0.97  1.04  1.10  1.16  1.19  1.22  1.24  1.18  1.07 
  896  1.01  0.98  1.01  1.00  1.01  0.99  1.05  1.09  1.14  1.18  1.22  1.28  1.21  1.04 
  960  1.00  0.97  1.01  1.01  1.00  1.00  1.04  1.10  1.14  1.19  1.20  1.31  1.21  1.03 
 1024  1.01  0.98  1.00  1.00  0.98  1.00  1.04  1.09  1.16  1.19  1.22  1.28  1.19  1.01 
 1536  1.00  1.01  0.99  1.00  0.99  1.00  1.06  1.09  1.17  1.21  1.22  1.14  1.07  1.01 
 2048  1.00  0.99  1.00  1.00  1.00  1.01  1.06  1.09  1.15  1.18  1.20  1.02  1.03  1.02 
 2560  1.00  1.00  0.99  1.00  1.01  1.01  1.06  1.10  1.17  1.21  1.21  1.02  1.02 
 3072  0.99  1.00  1.00  1.00  1.00  0.99  1.07  1.09  1.18  1.19  1.20  0.99 
 3584  1.01  1.00  0.99  0.99  1.00  0.99  1.06  1.08  1.18  1.09  0.99  1.00 
 4096  0.99  0.95  0.96  0.95  0.96  0.96  1.07  1.09  1.18  0.99  1.00  1.00 

@fredrik-johansson fredrik-johansson merged commit 6c38679 into flintlib:main Sep 6, 2024
13 checks passed