Highly scale-dependent efficiency of ZHEEVR and DSYEVR #995
Comments
That is a very interesting observation. I don't know MRRR that well (but I don't think any of the currently active contributors knows it well). My initial hunch is that it is just the selection of the algorithm. The documentation of zstemr says this:
Perhaps it is using the more expensive algorithm for the scaled matrix? Definitely worth investigating.
When asked for all vectors, DSYEVR tries the MRRR scheme, and if it fails it runs the more expensive inverse-iteration approach. It appears that the jump in runtime for at least some cases like the ones reported above is due to this fallback. (Inverse iteration is especially slow for clustered eigenvalues.) In a few cases I tried, the internal MRRR routine DLARRV (which is only used when vectors are wanted) fails for a magnified copy of a matrix for which it gave good results. A quick fix would be to recalibrate the existing scaling logic in DSTEMR.
The cause of the failure reported by DSTEMR for certain matrices of large norm is an inconsistency in the scaling used in a test in DLARRF (lines 467 to 477 at commit 5426147, compared with lines 289 to 292 of the same file). Presumably the test at line 491 should use the consistently scaled quantity. This may account for some of the failures noted by @oamarques in #100.
The slowdown reported by @amilsted is indeed due to ZSTEMR failing and ZHEEVR then resorting to plan B (inverse iteration), as noted by @RalphAS. The failure stems from DLARRF (in the ZSTEMR chain), as also noted. (On a Linux box, I have not observed a ~20x slowdown in ZHEEVR for N=4000, rather ~1.3x, since the reduction to tridiagonal form typically takes most of the time. That is a detail, though.) I could reproduce the failure for N=500, generating a matrix A with ZLATMR (which generates random matrices of various types for testing LAPACK programs). Given M = A*c with c = 1.0e-13, 1.0e+00, and 1.0e+13, ZSTEMR works for the first two values of c and fails for the third. This suggests an issue with scaling; we will continue looking to see what the fix could be. BTW, we tested these algorithms with all kinds of matrices we could think of (see https://github.com/oamarques/STCollection) but we evidently missed something 😊
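For anyone wanting to try this from a high-level language, here is a minimal sketch of that experiment. It assumes SciPy's `eigh` with `driver="evr"`, which dispatches to ZHEEVR for complex input; the random Hermitian matrix is only a stand-in for the ZLATMR-generated one, so the exact scale at which ZSTEMR fails may differ, and the internal failure itself is not visible from Python - only the resulting slowdown is.

```python
import time

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 500
# Random Hermitian matrix with entries of order 1 (stand-in for the ZLATMR matrix).
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
a = (a + a.conj().T) / 2

for c in (1.0e-13, 1.0e+00, 1.0e+13):
    m = a * c
    t0 = time.perf_counter()
    w, v = eigh(m, driver="evr")  # complex input dispatches to ZHEEVR
    dt = time.perf_counter() - t0
    resid = np.linalg.norm(m @ v - v * w) / np.linalg.norm(m)
    print(f"c = {c:.0e}: {dt:.3f} s, relative residual = {resid:.2e}")
```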
Good to know this is reproducible. The 20x slowdown does not appear with random matrices - I've only seen things get that bad on certain non-random matrices that we encounter fairly frequently. I am thinking that you probably don't need an example of that, as fixing the fallback behavior that causes the 1.3x slowdown likely fixes the 20x slowdown too!
It occurs to me that you could try the quick fix for dlarrf.f suggested in #995 (comment), if you have the time and if those matrices still occur frequently enough in your work. (I'm not a mathematician, though, and wouldn't know if there is something fundamentally unsatisfactory or inelegant in that approach.)
Well, there's an easier workaround, which is just to scale the matrix to ~1 before solving. I am using LAPACK rather indirectly (via Julia or numpy and OpenBLAS), so trying the fix is not so easy.
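A sketch of that workaround, as a hypothetical wrapper (`eigh_rescaled` is not part of any library): eigenvectors are invariant under a positive scaling of the matrix, so one can normalize, solve, and rescale the eigenvalues afterward.

```python
import numpy as np
from scipy.linalg import eigh

def eigh_rescaled(m, **kwargs):
    """Hypothetical wrapper: scale m to entries of order ~1, solve, rescale.

    Eigenvectors of m and m/s coincide for s > 0; the eigenvalues are
    recovered by multiplying back by s.
    """
    s = np.abs(m).max()
    if s == 0.0:
        return eigh(m, **kwargs)  # zero matrix: nothing to rescale
    w, v = eigh(m / s, **kwargs)
    return w * s, v

# e.g. w, v = eigh_rescaled(m, driver="evr")
```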
I see... I guess I could sneak that tentative fix into the next OpenBLAS, but I'm not sure I'd want to diverge from Reference-LAPACK while the matter is still undecided. I just thought you are the one with the definitive data for reproducing the issue.
If the change is in an OpenBLAS branch, I could probably give it a try. I could also probably provide an example matrix that exhibits the large slowdown.
Description
Requesting all eigenvalues and eigenvectors of a large matrix
M = A * c
with ZHEEVR can take an amount of time that strongly depends on the overall real scale factor c.
For example, a Gaussian random matrix of size 4096 x 4096 with values close to 1.0 takes ~7.5 s on my system with ZHEEVR. Scaling the matrix by a factor of 1e13 makes the process take ~14 s, an increase of almost 100%. There is a similar slowdown with DSYEVR; a timing sketch is given below.
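A sketch of that timing comparison for the real/DSYEVR case (assuming SciPy's `eigh` with `driver="evr"`, which dispatches to DSYEVR for real symmetric input; absolute timings will of course depend on the machine and BLAS):

```python
import time

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 4096
a = rng.standard_normal((n, n))
a = (a + a.T) / 2  # symmetric Gaussian random matrix with entries of order 1

for c in (1.0, 1.0e13):
    t0 = time.perf_counter()
    w, v = eigh(a * c, driver="evr")  # real input dispatches to DSYEVR
    print(f"c = {c:.0e}: {time.perf_counter() - t0:.1f} s")
```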
In a more extreme case, with a different example matrix of the same size, the process takes 180 s with ZHEEVR when the scale of the eigenvalues is ~1e12, vs. 8 s when the scale is ~1 - a ~20x slowdown. In this case the matrix is real, but stored in complex representation. I am happy to upload the matrix if you can tell me where you want it. It is sparse and may have degeneracies (computed eigenvalues are close up to about 1e-9 relative). Random unitary transformations of the same matrix show the same slowdown.
With the same matrix, using DSYEVR still shows a slowdown with the "bad" scaling, but not by as much (~20 s vs. ~6 s).
The slowdown only occurs when eigenvectors are requested. The ABSTOL parameter is set to 0 (I am calling via Julia and SciPy - Julia definitely sets ABSTOL to 0, and SciPy probably does too).
The accuracy of the eigendecomposition does not seem to be strongly related to the scale. I get sufficiently accurate results in all cases.
The extra time is taken in a single-threaded part of the algorithm (I have seen the problem with both MKL and OpenBLAS).
HEEV does not seem to exhibit the problem and solves quickly.
I was surprised that an overall scaling factor makes such a big difference to solution speed; I would have expected the scaling not to matter.
This seems like a significant problem, since both Julia and SciPy now choose SYEVR/HEEVR as the default symmetric/Hermitian eigensolvers.