Add parallel gmres implementation #662
base: develop
Conversation
MPI_Comm const global_communicator = distro_handle.get_global_comm();

auto success =
    MPI_Comm_split(global_communicator, my_row, my_col, &row_communicator);
The MPI_Comm_split() call causes global communication; see the algorithm description here:
https://www.mpich.org/static/docs/v3.1.3/www3/MPI_Comm_split.html
It's OK to use for prototype purposes, but eventually this step should be done only once. You can have a reducer class where you initialize an object once and keep the communicator, then just call MPI_Allreduce() on every iteration.
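A minimal sketch of that suggestion, assuming the split happens once at construction and the cached communicator is reused for every per-iteration reduction. The class and member names (`row_reducer`, `row_comm_`, `allreduce_sum`) are illustrative only, not the project's actual API:

```cpp
#include <mpi.h>
#include <vector>

// Hypothetical reducer: MPI_Comm_split() runs once, in the constructor.
class row_reducer
{
public:
  row_reducer(MPI_Comm global_comm, int my_row, int my_col)
  {
    // Global communication happens here exactly once.
    MPI_Comm_split(global_comm, my_row, my_col, &row_comm_);
  }
  ~row_reducer() { MPI_Comm_free(&row_comm_); }

  // Non-copyable so the communicator is freed exactly once.
  row_reducer(row_reducer const &) = delete;
  row_reducer &operator=(row_reducer const &) = delete;

  // Per-iteration reduction reuses the cached row communicator.
  void allreduce_sum(std::vector<double> &x) const
  {
    MPI_Allreduce(MPI_IN_PLACE, x.data(), static_cast<int>(x.size()),
                  MPI_DOUBLE, MPI_SUM, row_comm_);
  }

private:
  MPI_Comm row_comm_;
};
```

Usage would then be: construct one `row_reducer` alongside the distribution handle, and call `allreduce_sum()` inside the GMRES iteration loop.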
…x abstraction. Only tested on continuity_1 CPU and 4 processes. Need to try more PDEs with different numbers of processes. Haven't tried with GPUs. I think this will be easiest with CUDA-aware MPI. Kept existing serial implementation to avoid additional copies. We may want to revisit this soon. Signed-off-by: Steven Hahn <[email protected]>
Please review the developer documentation on the project wiki, which contains help and requirements.
Proposed changes
Describe what this PR changes and why. If it closes an issue, link to it here
with a supported keyword.
Haven't tried with GPUs. I think this will be easiest with CUDA-aware MPI.
Kept existing serial implementation to avoid additional copies. We may want to revisit this soon.
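For the GPU path mentioned above, a hedged sketch of what the CUDA-aware MPI route could look like: with a CUDA-aware build, device pointers can be passed to MPI_Allreduce() directly, avoiding a staging copy to host memory. This is not part of the PR; the function and buffer names are hypothetical.

```cpp
#include <mpi.h>
#include <cuda_runtime.h>

// d_x points to device memory (e.g. allocated with cudaMalloc).
// A CUDA-aware MPI implementation reads and writes it in place;
// a non-aware build would need a cudaMemcpy to a host buffer first.
void allreduce_device(double *d_x, int n, MPI_Comm comm)
{
  MPI_Allreduce(MPI_IN_PLACE, d_x, n, MPI_DOUBLE, MPI_SUM, comm);
}
```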
What type(s) of changes does this code introduce?
Put an x in the boxes that apply.
Does this introduce a breaking change?
What systems has this change been tested on?
Only tested on continuity_1 CPU and 4 processes. Need to try more PDEs with different numbers of processes.
Checklist
Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask. This is
simply a reminder of what we are going to look for before merging your code.