in-place (scaled) matrix transposition: imatcopy #1017
Comments
I had a look at in-place matrix transposes at some point, but they are far from trivial to implement. If the matrix is square, it is just a question of swapping A(i,j) and A(j,i), and even a simple recursive implementation is reasonably efficient. If the matrix is not square, the permutation cycles can be longer than two elements, so we almost certainly need some workspace (not just for an efficient implementation, but for a reference one too).
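To make the two cases in this comment concrete, here is a minimal Python sketch, not a proposed reference implementation. It assumes column-major storage in a flat list. The function names `imatcopy_square_t` and `imatcopy_rect_t` are hypothetical, and the rectangular version assumes contiguous storage (lda = m on input, ldb = n on output) and spends one flag per element as workspace, which is the simplest choice rather than the most economical one.

```python
def imatcopy_square_t(n, alpha, a, lda):
    """In place, A := alpha * A^T for an n-by-n column-major matrix.

    Hypothetical helper: `a` is a flat list and A(i,j) lives at a[i + j*lda].
    """
    for j in range(n):
        a[j + j * lda] *= alpha              # diagonal entries are only scaled
        for i in range(j + 1, n):
            lo, up = i + j * lda, j + i * lda  # A(i,j) and A(j,i)
            a[lo], a[up] = alpha * a[up], alpha * a[lo]


def imatcopy_rect_t(m, n, alpha, a):
    """In place, A := alpha * A^T for a contiguous m-by-n column-major matrix.

    Assumes lda = m on input; the result is n-by-m with ldb = n.
    Uses m*n boolean flags as workspace to mark finished positions.
    """
    size = m * n
    visited = [False] * size
    for start in range(size):
        if visited[start]:
            continue
        k = start
        carried = a[start]                   # value still waiting to be placed
        while True:
            i, j = k % m, k // m             # source index k holds A(i,j)
            dest = j + i * n                 # A(i,j) becomes B(j,i) at j + i*n
            carried, a[dest] = a[dest], alpha * carried
            visited[dest] = True
            k = dest
            if k == start:                   # cycle closed
                break
```

For a 2-by-3 matrix stored as `[1, 4, 2, 5, 3, 6]`, the rectangular routine follows one cycle of length four plus two fixed points, which is why a plain pairwise swap is not enough in the non-square case.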
Indeed, a cache-oblivious transpose is really hard to get right and quite challenging in terms of low-level decisions. But I would be really happy if we got it. I don't think it will be as performant as the copy-transpose, but there is still a lot to gain compared to a double loop.
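As a rough illustration of the recursive, cache-oblivious structure being discussed, the sketch below handles the square case only. It is written in Python purely to show the recursion pattern, so it says nothing about actual performance. The name `transpose_recursive` and the `base` cutoff are assumptions for illustration.

```python
def transpose_recursive(a, lda, n, alpha=1.0, base=16):
    """In place, A := alpha * A^T for an n-by-n column-major matrix.

    Hypothetical sketch: blocks are halved recursively until they are at
    most `base` wide, then handled with a plain double loop.
    """

    def swap_blocks(r0, c0, rows, cols):
        # Exchange the strictly-below-diagonal block A[r0:r0+rows, c0:c0+cols]
        # with its mirror image above the diagonal, scaling both by alpha.
        if rows <= base and cols <= base:
            for j in range(c0, c0 + cols):
                for i in range(r0, r0 + rows):
                    p, q = i + j * lda, j + i * lda
                    a[p], a[q] = alpha * a[q], alpha * a[p]
        elif rows >= cols:
            h = rows // 2                    # split the longer dimension
            swap_blocks(r0, c0, h, cols)
            swap_blocks(r0 + h, c0, rows - h, cols)
        else:
            h = cols // 2
            swap_blocks(r0, c0, rows, h)
            swap_blocks(r0, c0 + h, rows, cols - h)

    def diag(k0, size):
        # Transpose the diagonal block starting at (k0, k0).
        if size <= base:
            for j in range(k0, k0 + size):
                a[j + j * lda] *= alpha
                for i in range(j + 1, k0 + size):
                    p, q = i + j * lda, j + i * lda
                    a[p], a[q] = alpha * a[q], alpha * a[p]
        else:
            h = size // 2
            diag(k0, h)                       # top-left diagonal block
            diag(k0 + h, size - h)            # bottom-right diagonal block
            swap_blocks(k0 + h, k0, size - h, h)  # off-diagonal pair

    diag(0, n)
```

The low-level decisions mentioned above (cutoff size, how the base case is vectorized, how odd sizes are split) are exactly the parts this sketch glosses over.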
AB := alpha*op(AB)." (emphasis mine) Since the source and destination are both AB, I gather that "a normal matrix copy" (trans=N) is basically changing the stride in place, from lda to ldb? (And applying scaling.) Gustavson has several papers on in-place transpose, e.g., https://doi.org/10.1145/2168773.2168775. Something akin to this is implemented in PLASMA.
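If the trans=N case is read the way this comment guesses (an in-place change of leading dimension plus scaling), it could be sketched as follows. This is an interpretation for illustration, not a statement of what any library does. The name `imatcopy_n` is hypothetical, storage is assumed column-major in a flat list, and the buffer is assumed large enough for both layouts.

```python
def imatcopy_n(m, n, alpha, ab, lda, ldb):
    """In place, restride an m-by-n column-major matrix from lda to ldb
    and scale it by alpha.

    Hypothetical sketch: assumes m <= min(lda, ldb) and that `ab` holds at
    least (n - 1) * max(lda, ldb) + m entries.
    """
    if ldb <= lda:
        # Columns move toward the front: walk forward so no unread source
        # entry is overwritten.
        cols, rows = range(n), range(m)
    else:
        # Columns move toward the back: walk backward for the same reason.
        cols = range(n - 1, -1, -1)
        rows = range(m - 1, -1, -1)
    for j in cols:
        for i in rows:
            ab[i + j * ldb] = alpha * ab[i + j * lda]
```

The traversal direction is the only subtle point: going the wrong way would overwrite entries of later (or earlier) columns before they are read.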
@mgates3, while I agree that imatcopy is not a great name, it's a fairly common one:
In situations like these I think it's preferable to go with the crowd, and just have good documentation about what can be done with the function. @thijssteel and @ilayn: it'd be fine with me to use some workspace. When I said in-place, I really just meant "don't force the user to make a full copy."
@rileyjmurray Respectfully, I disagree. If Reference BLAS / LAPACK is a (de facto) standard, it should be reasonably self-consistent. This goes to my point on gemmt and batch gemm, that we as a community need a mechanism to propose and adopt new standard routines.
@mgates3, fair enough! I don't actually have a strong opinion on what the function should be called.
Intel MKL has a useful function for (scaled) in-place transposition:
https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-fortran/2024-1/mkl-imatcopy.html
I raised the issue of having this function as a utility in LAPACK proper, and Julian (@langou) expressed support for that. I'm making a note of it here so we don't lose track of this.