Quick return with identity Householder transformation if there are no off-band elements to annihilate #980

Merged: 28 commits into eth-cscs:master on Sep 20, 2023

@RMeli RMeli commented Sep 12, 2023

If the input matrix is banded and its band size is smaller than the target band size for the reduction, NaNs are produced by a division by zero in the definition of tau:

const T norm = std::sqrt(x0_and_squares[1]);
const T x0 = x0_and_squares[0];
const T y = std::signbit(std::real(x0_and_squares[0])) ? norm : -norm;
const T tau = (y - x0) / y;

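This PR adds an early return when x0_and_squares[1] == 0: in that case norm, and therefore y, is zero, so the division defining tau is undefined. Returning tau = 0 instead yields an identity Householder transformation and avoids the problem. Fixes #974. Thanks to @albestro for helping to identify the issue.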
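The early return can be sketched in isolation as follows. This is a minimal illustration for real-valued types only, not the code merged in this PR: the struct `X0AndSquares` and the function `compute_tau` are hypothetical names, with `x0` and `squares` standing in for `x0_and_squares[0]` and `x0_and_squares[1]` from the snippet above.

```cpp
#include <cmath>

// Hypothetical stand-in for the pair used in the PR snippet:
// x0 is the leading element, squares is the sum of squares
// (the value whose square root gives the norm).
template <class T>
struct X0AndSquares {
  T x0;
  T squares;
};

// Sketch of tau for a real-valued Householder reflector.
// Returns 0 when the sum of squares is zero, so that
// H = I - tau * v * v^T reduces to the identity.
template <class T>
T compute_tau(const X0AndSquares<T>& v) {
  // Early return: nothing to annihilate, and y would be zero below.
  if (v.squares == T(0))
    return T(0);

  const T norm = std::sqrt(v.squares);
  // Choose y with the sign opposite to x0, as in the original snippet.
  const T y = std::signbit(v.x0) ? norm : -norm;
  return (y - v.x0) / y;
}
```

Without the guard, an all-zero input gives y = 0 and tau evaluates to a 0/0 division, which is where the NaNs come from.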


With this PR, the CP2K regression test results are as follows (using the DLA-Future eigensolver for every matrix of size 2x2 or larger):

------------------------------- Summary --------------------------------
Number of FAILED  tests 2
Number of WRONG   tests 0
Number of CORRECT tests 2935
Total number of   tests 2937

Summary: correct: 2935 / 2937; failed: 2; 28min
Status: FAILED

*************************** Testing ended ******************************

All regression tests that were still returning NaNs (see #974) now pass. Of the two remaining failures, one has been fixed in CP2K (missing DLAF/pika initialization when using CP2K as a library), while the other also fails with ScaLAPACK.

RMeli commented Sep 12, 2023

cscs-ci run

RMeli commented Sep 12, 2023

cscs-ci run

RMeli commented Sep 12, 2023

cscs-ci run

RMeli commented Sep 12, 2023

cscs-ci run

RMeli commented Sep 12, 2023

Current CI failures should be fixed by #983.

RMeli commented Sep 13, 2023

cscs-ci run

RMeli commented Sep 13, 2023

cscs-ci run

RMeli commented Sep 13, 2023

cscs-ci run

RMeli commented Sep 13, 2023

cscs-ci run

RMeli commented Sep 13, 2023

Issue reproduced in CI.

cmake/DLAF_AddTest.cmake (review thread outdated, resolved)
.github/format.sh (review thread outdated, resolved)
@RMeli RMeli requested a review from msimberg September 18, 2023 11:18

RMeli commented Sep 18, 2023

cscs-ci run

RMeli commented Sep 19, 2023

cscs-ci run

@RMeli RMeli requested a review from msimberg September 19, 2023 10:17
@rasolca rasolca merged commit e158aa5 into eth-cscs:master Sep 20, 2023
3 checks passed
github-actions bot pushed a commit that referenced this pull request Sep 20, 2023
@RMeli RMeli deleted the issue974 branch September 20, 2023 10:45
@msimberg msimberg added the Type:Bug, TODO:Task, Category:CI, not planned, Priority:Low, and Priority:High labels and removed the TODO:Task, Category:CI, not planned, and Priority:Low labels Sep 27, 2023
Labels: Priority:High, Type:Bug
Successfully merging this pull request may close these issues.

NaNs observed in reduction to band