Add parallel gmres implementation #662
base: develop
Conversation
MPI_Comm const global_communicator = distro_handle.get_global_comm();

auto success =
    MPI_Comm_split(global_communicator, my_row, my_col, &row_communicator);
The MPI_Comm_split() call causes global communication; see the algorithm description here:
https://www.mpich.org/static/docs/v3.1.3/www3/MPI_Comm_split.html
It's OK to use for prototype purposes, but eventually this step should be done only once. You can have a reducer class where you initialize an object once and keep the communicator, then just call MPI_Allreduce() on every iteration.
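A minimal sketch of that suggestion, assuming the split happens once at construction and the cached communicator is reused for every per-iteration reduction. The class and member names (`row_reducer`, `row_comm_`, `allreduce_sum`) are illustrative only, not the project's actual API:

```cpp
#include <mpi.h>
#include <vector>

// Hypothetical reducer: MPI_Comm_split() runs once, in the constructor.
class row_reducer
{
public:
  row_reducer(MPI_Comm global_comm, int my_row, int my_col)
  {
    // Global communication happens here exactly once.
    MPI_Comm_split(global_comm, my_row, my_col, &row_comm_);
  }
  ~row_reducer() { MPI_Comm_free(&row_comm_); }

  // Non-copyable so the communicator is freed exactly once.
  row_reducer(row_reducer const &) = delete;
  row_reducer &operator=(row_reducer const &) = delete;

  // Per-iteration reduction reuses the cached row communicator.
  void allreduce_sum(std::vector<double> &x) const
  {
    MPI_Allreduce(MPI_IN_PLACE, x.data(), static_cast<int>(x.size()),
                  MPI_DOUBLE, MPI_SUM, row_comm_);
  }

private:
  MPI_Comm row_comm_;
};
```

Usage would then be: construct one `row_reducer` alongside the distribution handle, and call `allreduce_sum()` inside the GMRES iteration loop.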
…x abstraction. Only tested on continuity_1 CPU and 4 processes. Need to try more PDEs with different numbers of processes. Haven't tried with GPUs. I think this will be easiest with CUDA-aware MPI. Kept existing serial implementation to avoid additional copies. We may want to revisit this soon. Signed-off-by: Steven Hahn <[email protected]>
Please review the developer documentation on the project wiki, which contains help and requirements.
Proposed changes
Describe what this PR changes and why. If it closes an issue, link to it here
with a supported keyword.
Haven't tried with GPUs. I think this will be easiest with CUDA-aware MPI.
Kept existing serial implementation to avoid additional copies. We may want to revisit this soon.
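For the GPU path mentioned above, a hedged sketch of what the CUDA-aware MPI route could look like: with a CUDA-aware build, device pointers can be passed to MPI_Allreduce() directly, avoiding a staging copy to host memory. This is not part of the PR; the function and buffer names are hypothetical.

```cpp
#include <mpi.h>
#include <cuda_runtime.h>

// d_x points to device memory (e.g. allocated with cudaMalloc).
// A CUDA-aware MPI implementation reads and writes it in place;
// a non-aware build would need a cudaMemcpy to a host buffer first.
void allreduce_device(double *d_x, int n, MPI_Comm comm)
{
  MPI_Allreduce(MPI_IN_PLACE, d_x, n, MPI_DOUBLE, MPI_SUM, comm);
}
```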
What type(s) of changes does this code introduce?
Put an x in the boxes that apply.
Does this introduce a breaking change?
What systems has this change been tested on?
Only tested on continuity_1 CPU and 4 processes. Need to try more PDEs with different numbers of processes.
Checklist
Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask. This is
simply a reminder of what we are going to look for before merging your code.