Highly scale-dependent efficiency of ZHEEVR and DSYEVR #995
Comments
That is a very interesting observation. I don't know MRRR that well (but I don't think any of the currently active contributors knows it well). My initial hunch is that it is just the selection of the algorithm. The documentation of zstemr says this:
Perhaps it is using the more expensive algorithm for the scaled matrix? Definitely worth investigating.
When asked for all vectors, DSYEVR tries the MRRR scheme, and if it fails it runs the more expensive inverse-iteration approach. It appears that the jump in runtime for at least some cases like the ones reported above is due to this fallback. (Inverse iteration is especially slow for clustered eigenvalues.) In a few cases I tried, the internal MRRR routine DLARRV (which is only used when vectors are wanted) fails for a magnified copy of a matrix for which it gave good results. A quick fix would be to recalibrate the existing scaling logic in DSTEMR.
The cause of the failure reported by DSTEMR for certain matrices of large norm is an inconsistency in the scaling used in a test in DLARRF (lines 467 to 477 at commit 5426147, compared with lines 289 to 292 of the same file). Presumably the test at line 491 should use the consistently scaled quantity. This may account for some of the failures noted by @oamarques in #100.
The slowdown reported by @amilsted is indeed due to ZSTEMR failing and ZHEEVR then resorting to plan B (inverse iteration), as noted by @RalphAS. The failure stems from DLARRF (in the ZSTEMR chain), as also noted. (On a Linux box, I have not observed a ~20x slowdown in ZHEEVR for N=4000, rather ~1.3x, since the reduction to tridiagonal form typically takes most of the time. That is a detail, though.) I could reproduce the failure for N=500, generating a matrix A with ZLATMR (which generates random matrices of various types for testing LAPACK programs). Given M = A*c with c = 1.0e-13, 1.0e+00, and 1.0e+13, ZSTEMR works for the first two values of c and fails for the third. This suggests an issue with scaling; we will continue looking to see what the fix could be. BTW, we tested these algorithms with all kinds of matrices we could think of (see https://github.com/oamarques/STCollection) but we evidently missed something 😊
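For anyone wanting to try this from a high-level language, here is a minimal sketch of that experiment. It assumes SciPy's `eigh` with `driver="evr"`, which dispatches to ZHEEVR for complex input; the random Hermitian matrix is only a stand-in for the ZLATMR-generated one, so the exact scale at which ZSTEMR fails may differ, and the internal failure itself is not visible from Python - only the resulting slowdown is.

```python
import time

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 500
# Random Hermitian matrix with entries of order 1 (stand-in for the ZLATMR matrix).
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
a = (a + a.conj().T) / 2

for c in (1.0e-13, 1.0e+00, 1.0e+13):
    m = a * c
    t0 = time.perf_counter()
    w, v = eigh(m, driver="evr")  # complex input dispatches to ZHEEVR
    dt = time.perf_counter() - t0
    resid = np.linalg.norm(m @ v - v * w) / np.linalg.norm(m)
    print(f"c = {c:.0e}: {dt:.3f} s, relative residual = {resid:.2e}")
```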
Good to know this is reproducible. The 20x slowdown does not appear with random matrices - I've only seen things get that bad on certain non-random matrices that we encounter fairly frequently. I am thinking that you probably don't need an example of that, as fixing the fallback behavior that causes the 1.3x slowdown likely fixes the 20x slowdown too!
It occurs to me that you could try the quick fix for dlarrf.f suggested in #995 (comment), if you have the time and if those matrices still occur frequently enough in your work. (I'm not a mathematician, though, and wouldn't know if there is something fundamentally unsatisfactory or inelegant in that approach.)
Well, there's an easier workaround, which is just to scale the matrix to ~1 before solving. I am using LAPACK rather indirectly (via Julia or numpy and OpenBLAS), so trying the fix is not so easy.
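A sketch of that workaround, as a hypothetical wrapper (`eigh_rescaled` is not part of any library): eigenvectors are invariant under a positive scaling of the matrix, so one can normalize, solve, and rescale the eigenvalues afterward.

```python
import numpy as np
from scipy.linalg import eigh

def eigh_rescaled(m, **kwargs):
    """Hypothetical wrapper: scale m to entries of order ~1, solve, rescale.

    Eigenvectors of m and m/s coincide for s > 0; the eigenvalues are
    recovered by multiplying back by s.
    """
    s = np.abs(m).max()
    if s == 0.0:
        return eigh(m, **kwargs)  # zero matrix: nothing to rescale
    w, v = eigh(m / s, **kwargs)
    return w * s, v

# e.g. w, v = eigh_rescaled(m, driver="evr")
```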
I see... I guess I could sneak that tentative fix into the next OpenBLAS, but I'm not sure I'd want to diverge from Reference-LAPACK while the matter is still undecided. I just thought you are the one with the definitive data for reproducing the issue.
If the change is in an OpenBLAS branch, I could probably give it a try. I could also probably provide an example matrix that exhibits the large slowdown.
Description
Requesting all eigenvalues and eigenvectors of a large matrix
M = A * c
with ZHEEVR can take an amount of time that strongly depends on the overall real scale factor c.
For example, a Gaussian random matrix of size 4096 x 4096 with values close to 1.0 takes ~7.5 s on my system with ZHEEVR. Scaling the matrix by a factor of 1e13 makes the process take ~14 s, an increase of almost 100%. There is a similar slowdown with DSYEVR; a timing sketch is given below.
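A sketch of that timing comparison for the real/DSYEVR case (assuming SciPy's `eigh` with `driver="evr"`, which dispatches to DSYEVR for real symmetric input; absolute timings will of course depend on the machine and BLAS):

```python
import time

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 4096
a = rng.standard_normal((n, n))
a = (a + a.T) / 2  # symmetric Gaussian random matrix with entries of order 1

for c in (1.0, 1.0e13):
    t0 = time.perf_counter()
    w, v = eigh(a * c, driver="evr")  # real input dispatches to DSYEVR
    print(f"c = {c:.0e}: {time.perf_counter() - t0:.1f} s")
```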
In a more extreme case, with a different example matrix of the same size, the process takes 180 s with ZHEEVR when the scale of the eigenvalues is ~1e12, vs. 8 s when the scale is ~1 - a ~20x slowdown. In this case the matrix is real, but stored in complex representation. I am happy to upload the matrix if you can tell me where you want it. It is sparse and may have degeneracies (computed eigenvalues are close up to about 1e-9 relative). Random unitary transformations of the same matrix show the same slowdown.
With the same matrix, using DSYEVR still shows a slowdown with the "bad" scaling, but not by as much (~20 s vs. ~6 s).
The slowdown only occurs when eigenvectors are requested. The ABSTOL parameter is set to 0 (I am calling via Julia and SciPy - Julia definitely sets ABSTOL to 0, and SciPy probably does too).
The accuracy of the eigendecomposition does not seem to be strongly related to the scale. I get sufficiently accurate results in all cases.
The extra time is taken in a single-threaded part of the algorithm (I have seen the problem with both MKL and OpenBLAS).
HEEV does not seem to exhibit the problem and solves quickly.
I was surprised that an overall scaling factor makes such a big difference to solution speed; I would have expected the scaling not to matter.
This seems like a significant problem, since both Julia and SciPy now choose SYEVR/HEEVR as the default symmetric/Hermitian eigensolvers.