Adds distributed row gatherer #1589
base: neighborhood-communicator
One issue that I have is the constructor. It takes a
If I can't come up with anything better, I guess I will use that.
Do we need to have the
Some comments regarding synchronizations.
Really nice work! LGTM!
namespace distributed {

#if GINKGO_HAVE_OPENMPI_POST_4_1_X
I think this should be PRE, and the other way around? I don't think a macro like this is defined.
int is_inactive;
MPI_Status status;
GKO_ASSERT_NO_MPI_ERRORS(
    MPI_Request_get_status(req_listener_, &is_inactive, &status));
Can we maybe move this MPI function into mpi.hpp and create a wrapper for it?
mutable array<char> send_workspace_;

mutable MPI_Request req_listener_{MPI_REQUEST_NULL};
This can be of type mpi::request?
template <typename LocalIndexType>
void RowGatherer<LocalIndexType>::apply_impl(const LinOp* alpha, const LinOp* b,
                                             const LinOp* beta, LinOp* x) const
    GKO_NOT_IMPLEMENTED;
I think you can also implement the advanced apply by replacing b_local->row_gather(idxs, buffer) by b_local->row_gather(alpha, idxs, beta, buffer)?
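To make the suggestion concrete, here is a minimal sketch of what a scaled row gather would compute, assuming the four-argument row_gather follows the usual convention target = alpha * gathered rows + beta * target. The function `scaled_row_gather` and its use of `std::vector` are hypothetical stand-ins for illustration, not Ginkgo code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scaled row gather on a single column:
//   out[i] = alpha * b[idxs[i]] + beta * out[i]
// Plain std::vector is used so the sketch stays self-contained.
inline void scaled_row_gather(double alpha, const std::vector<double>& b,
                              const std::vector<std::size_t>& idxs,
                              double beta, std::vector<double>& out)
{
    // The target must already have one entry per gathered row.
    assert(out.size() == idxs.size());
    for (std::size_t i = 0; i < idxs.size(); ++i) {
        assert(idxs[i] < b.size());
        out[i] = alpha * b[idxs[i]] + beta * out[i];
    }
}
```

With alpha = 1 and beta = 0 this reduces to the plain gather for finite values in the target, which is why the advanced apply could share the same code path.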
- only allocate if necessary
- synchronize correct executor

Co-authored-by: Pratik Nayak <[email protected]>
This PR adds a distributed row gatherer. This operator essentially provides the communication required in our matrix apply.
Besides the normal apply (which is blocking), it also provides two asynchronous calls. One version has an additional workspace parameter, which is used as the send buffer. This version can be called multiple times without restrictions, provided a different workspace is used for each call. The other version has no workspace parameter and instead uses an internal buffer. As a consequence, it can only be called a second time once the request of the previous call has been waited on; otherwise, it throws.
This is the second part of splitting up #1546.
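The calling rules for the two asynchronous variants can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyGatherer`, `ToyRequest`, and `apply_async` are hypothetical names, the gather runs synchronously, and no MPI communication is involved. It only mirrors the rule that the internal-buffer variant throws while a previous request has not been waited on.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical handle standing in for a communication request.
class ToyRequest {
public:
    explicit ToyRequest(bool* in_flight = nullptr) : in_flight_(in_flight) {}

    // Marks the associated internal buffer as free again.
    void wait()
    {
        if (in_flight_) {
            *in_flight_ = false;
            in_flight_ = nullptr;
        }
    }

private:
    bool* in_flight_;
};

// Hypothetical toy gatherer; not the interface added by this PR.
class ToyGatherer {
public:
    explicit ToyGatherer(std::vector<std::size_t> idxs)
        : idxs_(std::move(idxs))
    {}

    // Variant with a caller-owned workspace: no restriction on repeated
    // calls, as long as each call gets its own workspace.
    ToyRequest apply_async(const std::vector<double>& b,
                           std::vector<double>& workspace) const
    {
        gather(b, workspace);
        return ToyRequest{};
    }

    // Variant with the internal buffer: throws if the previous request
    // has not been waited on yet.
    ToyRequest apply_async(const std::vector<double>& b) const
    {
        if (in_flight_) {
            throw std::runtime_error(
                "previous request has not been waited on");
        }
        in_flight_ = true;
        gather(b, internal_);
        return ToyRequest{&in_flight_};
    }

    const std::vector<double>& internal_buffer() const { return internal_; }

private:
    void gather(const std::vector<double>& b, std::vector<double>& out) const
    {
        out.resize(idxs_.size());
        for (std::size_t i = 0; i < idxs_.size(); ++i) {
            assert(idxs_[i] < b.size());
            out[i] = b[idxs_[i]];
        }
    }

    std::vector<std::size_t> idxs_;
    mutable std::vector<double> internal_;
    mutable bool in_flight_{false};
};
```

In this model, two workspace calls in a row are fine, while a second internal-buffer call before `wait()` on the first request throws.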
It also introduces some intermediate changes, which could be extracted out beforehand:
- a type-erased DenseCache
- making detail::run easier to use (now part of "Use index_map in distributed::matrix" #1544)

PR Stack: