Add inclusive and exclusive scan (prefix sum) operations #488
Has anyone seen a scan operation that generalizes on the order in which entries are accumulated (e.g., bottom-up, as drafted, vs. top-down)? Alternatively, can an OpenSHMEM team be created that reverses the PE order? For example, is the following valid?
shmem_team_t world_reversed;
shmem_team_split_strided(SHMEM_TEAM_WORLD, shmem_n_pes() - 1, -1, shmem_n_pes(), NULL, 0, &world_reversed);
AFAICT, there's nothing precluding the creation of such a team. I'd guess a lot of first implementations of OpenSHMEM 1.5 might break on this, though.
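To make the question concrete, here is a minimal serial sketch of the PE mapping such a split would imply. It is not OpenSHMEM code: `model_strided_parent_pe` is a hypothetical helper, and it assumes that member `i` of the new team maps to parent PE `start + i * stride`. Whether a negative stride is actually permitted is exactly the open question in this thread.

```c
#include <assert.h>

/* Hypothetical model, not part of OpenSHMEM: returns the parent-team PE
 * number that member i of a (start, stride, size) split would map to,
 * or -1 if i is out of range or the mapped PE falls outside the parent. */
int model_strided_parent_pe(int start, int stride, int size,
                            int parent_npes, int i) {
    assert(size > 0 && parent_npes > 0);
    if (i < 0 || i >= size)
        return -1;
    int pe = start + i * stride;   /* assumed mapping rule */
    return (pe >= 0 && pe < parent_npes) ? pe : -1;
}
```

With `start = npes - 1` and `stride = -1`, member 0 maps to the last parent PE and member `npes - 1` maps to parent PE 0, which is the reversal asked about above.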
I think the MPI answer to this would be to create a communicator that renumbers the MPI processes. I don't think that a negative stride is forbidden by the OpenSHMEM spec, but I'm also not sure whether it's something that we intended to support. The legacy collectives didn't support a negative stride, since the stride argument was treated as 2^(stride). Can anyone remember better than me? @davidozog or @manjugv?
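As an illustration of the legacy behavior mentioned here, the following serial sketch enumerates an active set described by `PE_start`, `logPE_stride`, and `PE_size`. The helper name `model_active_set_pe` is hypothetical. Because the stride is `2^logPE_stride`, it is always a positive power of two, so a reversed ordering cannot be expressed this way.

```c
#include <assert.h>

/* Hypothetical model of a legacy active set: returns the PE number of the
 * i-th member, where the stride between members is 2^log_pe_stride. */
int model_active_set_pe(int pe_start, int log_pe_stride, int pe_size, int i) {
    assert(log_pe_stride >= 0);
    assert(i >= 0 && i < pe_size);
    return pe_start + i * (1 << log_pe_stride);
}
```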
Making a note to consider the behavior of scan operations and NaN values (cf. #467); for example:
static double dst = 0;
static double src = 0;
src = (shmem_my_pe() == shmem_n_pes() / 2) ? NAN : shmem_my_pe();
shmem_sum_inscan(SHMEM_TEAM_WORLD, &dst, &src, 1);
if (shmem_my_pe() >= shmem_n_pes() / 2)
    assert(isnan(dst));
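The expected outcome can be checked without a SHMEM runtime using a small serial model. The sketch below is hypothetical (`model_nan_inscan` is not a proposed API) and assumes IEEE 754 arithmetic, where any sum that includes a NaN is itself NaN. It accumulates in PE order, which a real implementation is not required to do.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical serial model of the snippet above: PE i contributes the
 * value i, except PE npes/2, which contributes NaN. dst[i] receives the
 * inclusive prefix sum for PE i. Returns how many results are NaN. */
size_t model_nan_inscan(double *dst, size_t npes) {
    assert(dst != NULL);
    double running = 0.0;
    size_t nan_count = 0;
    for (size_t pe = 0; pe < npes; pe++) {
        double src = (pe == npes / 2) ? NAN : (double)pe;
        running += src;            /* NaN propagates once it is added */
        dst[pe] = running;
        if (isnan(dst[pe]))
            nan_count++;
    }
    return nan_count;
}
```

In this model, every PE with an index of `npes / 2` or higher ends up with NaN, while lower-numbered PEs keep ordinary partial sums.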
Are we keeping the |
The \source{} and \dest{} arguments must either be the same
symmetric address, or two different symmetric addresses
corresponding to buffers that do not overlap in memory. That is,
they must be completely overlapping or completely disjoint.
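A debug build could check this rule with a helper along the following lines. This is only a sketch: `buffers_same_or_disjoint` is a hypothetical name, both buffers are assumed to span the same number of bytes, and the comparison assumes a flat address space in which converting pointers to `uintptr_t` preserves their ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: true if dest and source are the same address or
 * refer to two nbytes-long buffers that share no bytes. */
bool buffers_same_or_disjoint(const void *dest, const void *source,
                              size_t nbytes) {
    assert(dest != NULL && source != NULL);
    uintptr_t d = (uintptr_t)dest;
    uintptr_t s = (uintptr_t)source;
    if (d == s || nbytes == 0)
        return true;               /* completely overlapping, or empty */
    /* Disjoint when one buffer ends at or before the other begins. */
    return (d + nbytes <= s) || (s + nbytes <= d);
}
```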
Should we apply the clarifications from #290 here, as well?
Is #290 the right reference here? I don't see how that applies here.
🤦 It was #490. Please incorporate that (minor) change to the reductions text here.
@wrrobin This API accepts arrays. The example just happens to only use a single element.
@naveen-rn Add me to this issue
%% C11
\begin{C11synopsis}
int @\FuncDecl{shmem\_sum\_inscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce);
int @\FuncDecl{shmem\_sum\_exscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce);
Should the last parameter be named something like nscan or nelem instead of nreduce?
int collect_at(shmem_team_t team, void *dest, const void *source, size_t nbytes, int who) {
    static size_t sym_nbytes;
    sym_nbytes = nbytes;
    shmem_team_sync(team);
Can we get rid of this sync if the src and dest buffers are different and dest is statically initialized?
If not, we may need to address this case in the section matter above.
\LibConstRef{SHMEM\_TEAM\_INVALID} or is otherwise invalid, the
behavior is undefined.

Before any \ac{PE} calls a scan routine, the \dest{} array on all
In response to the message in the example section below:
By omission, this is saying that the src arrays on all PEs do not need to be ready.
}
\apiargument{OUT}{dest}{
Symmetric address of an array, of length \VAR{nreduce} elements,
to receive the result of the scan routines. The type of
- to receive the result of the scan routines. The type of
+ to receive the result of the scan operations. The type of
}
\apiargument{IN}{source}{
Symmetric address of an array, of length \VAR{nreduce} elements,
that contains one element for each separate scan routine.
- that contains one element for each separate scan routine.
+ that contains one element for each separate scan operation.
Before any \ac{PE} calls a scan routine, the \dest{} array on all
\acp{PE} participating in the operation must be ready to accept the
results of the operation. Otherwise, the behavior is undefined.
@davidozog Proposed text for the collectives section committee. We would add it here and to the other collectives:
The \source{} buffer at the local \ac{PE} must be ready to be read by any PE in the team.
The application does not need to synchronize to ensure that the \source{} buffer is ready
across all \acp{PE} prior to calling this routine.
Please feel free to bikeshed this and improve the text. :)
Collectives section committee: there are several minor wording changes mentioned on this PR. Please incorporate those changes as section edits.
This PR adds inclusive and exclusive scan (prefix sum) operations as shmem_sum_inscan and shmem_sum_exscan operations. Some comments:
MPI_Scan.
MPI_Exscan leaves the contents of the destination buffer on rank 0 undefined. The proposed shmem_sum_exscan explicitly zeros the destination buffer on PE 0. (Rationale: MPI_Exscan supports multiple operators; this PR only supports addition.)
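The difference between the two operations, including the zeroed result on PE 0, can be seen in a serial reference model. The sketch below is not OpenSHMEM code: `model_sum_inscan` and `model_sum_exscan` are hypothetical helpers in which `src[i]` stands for the single element contributed by PE `i` and `dst[i]` for the value PE `i` would receive. Both helpers also work when `dst` and `src` are the same array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical inclusive scan model: dst[i] = src[0] + ... + src[i]. */
void model_sum_inscan(long *dst, const long *src, size_t npes) {
    assert(dst != NULL && src != NULL);
    long running = 0;
    for (size_t i = 0; i < npes; i++) {
        running += src[i];
        dst[i] = running;
    }
}

/* Hypothetical exclusive scan model: dst[i] = src[0] + ... + src[i-1].
 * dst[0] is set to zero, matching the PE 0 behavior described above. */
void model_sum_exscan(long *dst, const long *src, size_t npes) {
    assert(dst != NULL && src != NULL);
    long running = 0;
    for (size_t i = 0; i < npes; i++) {
        long contribution = src[i];  /* read first so dst may alias src */
        dst[i] = running;
        running += contribution;
    }
}
```

For contributions 1, 2, 3, 4, the inclusive model yields 1, 3, 6, 10 and the exclusive model yields 0, 1, 3, 6.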