Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add inclusive and exclusive scan (prefix sum) operations #488

Merged
merged 2 commits into from
Aug 23, 2024

Conversation

nspark
Copy link
Contributor

@nspark nspark commented Feb 28, 2022

This PR adds inclusive and exclusive scan (prefix sum) operations as shmem_sum_inscan and shmem_sum_exscan operations. Some comments:

  • MPI's inclusive scan is just called MPI_Scan.
  • MPI_Exscan leaves the contents of the destination buffer on rank 0 undefined. The proposed shmem_sum_exscan explicitly zeros the destination buffer on PE 0. (Rationale: MPI_Exscan supports multiple operators; this PR only supports addition.)
  • C++ uses the "full" names: std::inclusive_scan, std::exclusive_scan
  • Inclusive scan seems somewhat more common, though I personally envision using exclusive scan more for indexing in gather-like operations.
  • The example is a bit weak.
  • I personally want value-based reductions/scans way more than array-based reductions/scans.

@nspark
Copy link
Contributor Author

nspark commented Mar 9, 2022

Has anyone seen a scan operation that generalizes on the order in which entries are accumulated (e.g., bottom-up, as drafted, vs. top-down)?

Alternatively, can an OpenSHMEM team be created that reverses the PE order? For example, is the following valid...?

shmem_team_t world_reversed;
shmem_team_split_strided(SHMEM_TEAM_WORLD, shmem_n_pes() - 1, -1, shmem_n_pes(), NULL, 0, &world_reversed);

AFAICT, there's nothing precluding the creation of such a team. I'd guess a lot of first implementations of OpenSHMEM 1.5 might break on this, though.

@jdinan
Copy link
Collaborator

jdinan commented Mar 10, 2022

I think the MPI answer to this would be to create a communicator that renumbers the MPI processes. I don't think that a negative stride is forbidden by the OpenSHMEM spec, but I'm also not sure whether it's something that we intended to support. The legacy collectives didn't support a negative stride, since the stride argument was treated as 2^(stride). Can anyone remember better than me? @davidozog or @manjugv?

@nspark
Copy link
Contributor Author

nspark commented Mar 23, 2022

Making a note to consider the behavior of scan operations and NaN values (cf. #467); for example:

static double dst = 0;
static double src = 0;
src = (shmem_my_pe() == shmem_n_pes() / 2) ? NAN : shmem_my_pe();
shmem_sum_inscan(SHMEM_TEAM_WORLD, &dst, &src, 1);
if (shmem_my_pe() >= shmem_n_pes())
  assert(isnan(dst));

@wrrobin
Copy link
Collaborator

wrrobin commented Mar 25, 2022

Are we keeping the src single element data type for now and add the support for arrays later? Or, is the example above a specific one? For arrays, do we need to provide the indices similar to what C++ has?

The \source{} and \dest{} arguments must either be the same
symmetric address, or two different symmetric addresses
corresponding to buffers that do not overlap in memory. That is,
they must be completely overlapping or completely disjoint.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we apply the clarifications from #290 here, as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is #290 the right reference here? I don't see how that applies here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 It was #490. Please incorporate that (minor) change to the reductions text here.

content/shmem_scan.tex Outdated Show resolved Hide resolved
@nspark
Copy link
Contributor Author

nspark commented Nov 17, 2022

Are we keeping the src single element data type for now and add the support for arrays later? Or, is the example above a specific one? For arrays, do we need to provide the indices similar to what C++ has?

@wrrobin This API accepts arrays. The example just happens to only use a single element.

@kwaters4
Copy link
Contributor

@naveen-rn Add me to this issue

%% C11
\begin{C11synopsis}
int @\FuncDecl{shmem\_sum\_inscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce);
int @\FuncDecl{shmem\_sum\_exscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce);
Copy link
Collaborator

@jdinan jdinan Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the last parameter be named something like nscan or nelem instead of nreduce?

int collect_at(shmem_team_t team, void *dest, const void *source, size_t nbytes, int who) {
static size_t sym_nbytes;
sym_nbytes = nbytes;
shmem_team_sync(team);
Copy link
Collaborator

@BKP BKP Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get rid of this sync if the src and dest buffers are different and dest is statically initialized?
If not, we may need to address this case in the section matter above.

\LibConstRef{SHMEM\_TEAM\_INVALID} or is otherwise invalid, the
behavior is undefined.

Before any \ac{PE} calls a scan routine, the \dest{} array on all
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In response to the message in the example section below:
By omission, this is saying that the src arrays on all pes does not need to be ready.

}
\apiargument{OUT}{dest}{
Symmetric address of an array, of length \VAR{nreduce} elements,
to receive the result of the scan routines. The type of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
to receive the result of the scan routines. The type of
to receive the result of the scan operations. The type of

}
\apiargument{IN}{source}{
Symmetric address of an array, of length \VAR{nreduce} elements,
that contains one element for each separate scan routine.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
that contains one element for each separate scan routine.
that contains one element for each separate scan operation.


Before any \ac{PE} calls a scan routine, the \dest{} array on all
\acp{PE} participating in the operation must be ready to accept the
results of the operation. Otherwise, the behavior is undefined.
Copy link
Collaborator

@jdinan jdinan Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidozog Proposed text for the collectives section committee. We would add it here and to the other collectives:

The \source{} buffer at the local \ac{PE} must be ready to be read by any PE in the team.
The application does not need to synchronize to ensure that the \source{} buffer is ready
across all \acp{PE} prior to calling this routine.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please feel free to bikeshed this and improve the text. :)

@jdinan
Copy link
Collaborator

jdinan commented Aug 23, 2024

Collectives section committee -- there are several minor wording changes mentioned on this PR. Please incorporate those changes as section edits.

@jdinan jdinan merged commit 1d6f40e into openshmem-org:master Aug 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

5 participants