Add inclusive and exclusive scan (prefix sum) operations #488

Merged
merged 2 commits into from
Aug 23, 2024
121 changes: 121 additions & 0 deletions content/shmem_scan.tex
@@ -0,0 +1,121 @@
\apisummary {
Performs inclusive or exclusive prefix sum operations
}

\begin{apidefinition}

%% C11
\begin{C11synopsis}
int @\FuncDecl{shmem\_sum\_inscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce);
int @\FuncDecl{shmem\_sum\_exscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce);
Review comment (jdinan, Collaborator, Aug 15, 2024): Should the last parameter be named something like nscan or nelem instead of nreduce?

\end{C11synopsis}
where \TYPE{} is one of the integer, real, or complex types supported
for the SUM operation as specified by Table \ref{teamreducetypes}.

%% C/C++
\begin{Csynopsis}
int @\FuncDecl{shmem\_\FuncParam{TYPENAME}\_sum\_inscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce);
int @\FuncDecl{shmem\_\FuncParam{TYPENAME}\_sum\_exscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce);
\end{Csynopsis}
where \TYPE{} is one of the integer, real, or complex types supported
for the SUM operation and has a corresponding \TYPENAME{} as specified
(Review note: jdinan marked this conversation as resolved.)
by Table \ref{teamreducetypes}.

\begin{apiarguments}
\apiargument{IN}{team}{
The team over which to perform the operation.
}
\apiargument{OUT}{dest}{
Symmetric address of an array, of length \VAR{nreduce} elements,
to receive the result of the scan routines. The type of
Review comment (Collaborator), suggested change:
- to receive the result of the scan routines. The type of
+ to receive the result of the scan operations. The type of

\dest{} should match that implied in the SYNOPSIS section.
}
\apiargument{IN}{source}{
Symmetric address of an array, of length \VAR{nreduce} elements,
that contains one element for each separate scan routine.
Review comment (Collaborator), suggested change:
- that contains one element for each separate scan routine.
+ that contains one element for each separate scan operation.

The type of \source{} should match that implied in the SYNOPSIS
section.
}
\apiargument{IN}{nreduce}{
The number of elements in the \dest{} and \source{} arrays.
}
\end{apiarguments}

\apidescription{

The \FUNC{shmem\_sum\_inscan} and \FUNC{shmem\_sum\_exscan} routines
are collective routines over an \openshmem team that compute one or
more scan (or prefix sum) operations across symmetric arrays on
multiple \acp{PE}. The scan operations are performed with the SUM
operator.

The \VAR{nreduce} argument determines the number of separate scan
operations to perform. The \source{} array on all \acp{PE}
participating in the operation provides one element for each scan.
The results of the scan operations are placed in the \dest{} array
on all \acp{PE} participating in the scan.

The \FUNC{shmem\_sum\_inscan} routine performs an inclusive scan
operation, while the \FUNC{shmem\_sum\_exscan} routine performs an
exclusive scan operation.

For \FUNC{shmem\_sum\_inscan}, the value of the $j$-th element in
the \VAR{dest} array on \ac{PE}~$i$ is defined as:
\begin{equation*}
\textrm{dest}_{i,j} = \displaystyle\sum_{k=0}^{i} \textrm{source}_{k,j}
\end{equation*}

For \FUNC{shmem\_sum\_exscan}, the value of the $j$-th element in
the \VAR{dest} array on \ac{PE}~$i$ is defined as:
\begin{equation*}
\textrm{dest}_{i,j} =
\begin{cases}
\displaystyle\sum_{k=0}^{i-1} \textrm{source}_{k,j}, & \text{if} \; i \neq 0 \\
0, & \text{if} \; i = 0
\end{cases}
\end{equation*}

The \source{} and \dest{} arguments must either be the same
symmetric address, or two different symmetric addresses
corresponding to buffers that do not overlap in memory. That is,
they must be completely overlapping or completely disjoint.
Review comment (Collaborator): Should we apply the clarifications from #290 here, as well?

Reply (Contributor Author): Is #290 the right reference here? I don't see how that applies here.

Reply (Collaborator): 🤦 It was #490. Please incorporate that (minor) change to the reductions text here.


Team-based scan routines operate over all \acp{PE} in the provided
team argument. All \acp{PE} in the provided team must participate in
the scan operation. If \VAR{team} compares equal to
\LibConstRef{SHMEM\_TEAM\_INVALID} or is otherwise invalid, the
behavior is undefined.

Before any \ac{PE} calls a scan routine, the \dest{} array on all
Review comment (Collaborator), in response to the comment in the example section below: by omission, this says that the source arrays on all PEs do not need to be ready.

\acp{PE} participating in the operation must be ready to accept the
results of the operation. Otherwise, the behavior is undefined.
Review comment (jdinan, Collaborator, Aug 15, 2024): @davidozog Proposed text for the collectives section committee. We would add it here and to the other collectives:

The \source{} buffer at the local \ac{PE} must be ready to be read by any PE in the team.
The application does not need to synchronize to ensure that the \source{} buffer is ready
across all \acp{PE} prior to calling this routine.

Follow-up (Collaborator): Please feel free to bikeshed this and improve the text. :)


Upon return from a scan routine, the following are true for the
local \ac{PE}: the \dest{} array is updated, and the \source{} array
may be safely reused.

When the \Cstd translation environment does not support complex
types, an \openshmem implementation is not required to provide
support for these complex-typed interfaces.
}

\apireturnvalues{
Zero on successful local completion. Nonzero otherwise.
}

\begin{apiexamples}

\apicexample{
In the following \Cstd[11] example, the \FUNC{collect\_at}
function gathers a variable amount of data from each \ac{PE} and
concatenates it, in order, at the target \ac{PE} \VAR{who}. Note
that this routine is behaviorally similar to
\FUNC{shmem\_collect}, except that this routine only gathers the
data to a single \ac{PE}.
}
{./example_code/shmem_scan_example.c}
{}

\end{apiexamples}

\end{apidefinition}
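The inclusive and exclusive definitions in the description above can be checked against a small serial model. The sketch below is not OpenSHMEM code and makes several assumptions for illustration only: the names `model_sum_inscan` and `model_sum_exscan` are hypothetical, the element type is fixed to `long`, and the symmetric arrays of all PEs are flattened into one row-major `npes x nreduce` array in which row `i` stands in for PE `i`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Serial reference model of the scan definitions (NOT an OpenSHMEM call).
 * source and dest are hypothetical npes x nreduce row-major arrays in which
 * row i stands in for the symmetric array on PE i. dest may alias source.
 */
void model_sum_inscan(long *dest, const long *source, size_t npes, size_t nreduce) {
    assert(dest != NULL && source != NULL);
    for (size_t j = 0; j < nreduce; j++) {       /* one independent scan per element j */
        long running = 0;
        for (size_t i = 0; i < npes; i++) {
            running += source[i * nreduce + j];  /* include PE i's own element */
            dest[i * nreduce + j] = running;     /* sum over PEs 0..i */
        }
    }
}

void model_sum_exscan(long *dest, const long *source, size_t npes, size_t nreduce) {
    assert(dest != NULL && source != NULL);
    for (size_t j = 0; j < nreduce; j++) {
        long running = 0;
        for (size_t i = 0; i < npes; i++) {
            long own = source[i * nreduce + j];  /* read first so in-place use works */
            dest[i * nreduce + j] = running;     /* sum over PEs 0..i-1; 0 on PE 0 */
            running += own;
        }
    }
}
```

For three PEs contributing 1, 2, and 3 to a single scan, this model yields 1, 3, 6 for the inclusive scan and 0, 1, 3 for the exclusive scan, matching the two equations.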
12 changes: 12 additions & 0 deletions example_code/shmem_scan_example.c
@@ -0,0 +1,12 @@
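#include <shmem.h>
#include <stdint.h> /* uintptr_t */

int collect_at(shmem_team_t team, void *dest, const void *source, size_t nbytes, int who) {
  /* Symmetric (static) variable: holds this PE's byte count, then its offset. */
  static size_t sym_nbytes;
  sym_nbytes = nbytes;
  shmem_team_sync(team);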
Review comment (BKP, Collaborator, Aug 15, 2024): Can we get rid of this sync if the src and dest buffers are different and dest is statically initialized? If not, we may need to address this case in the section matter above.

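  /* In-place exclusive scan: sym_nbytes becomes the sum of nbytes on the PEs that precede this one in the team. */
  int rc = shmem_sum_exscan(team, &sym_nbytes, &sym_nbytes, 1);
  /* Write this PE's data at its byte offset within dest on PE who. */
  shmem_putmem((void *)((uintptr_t)dest + sym_nbytes), source, nbytes, who);
  shmem_quiet();
  shmem_team_sync(team);
  return rc;
}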
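Why an exclusive scan is the right operation for `collect_at` can be seen in a serial model of the same data movement. This is a sketch under stated assumptions, not the PR's code: `model_collect_at` is a hypothetical helper, `chunks[i]` and `nbytes[i]` stand in for the `source` buffer and size on PE `i`, and `memcpy` stands in for `shmem_putmem`. The running `offset` is exactly the exclusive prefix sum of the byte counts, so each contribution starts where the previous ones end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Serial model of collect_at (hypothetical helper, NOT an OpenSHMEM call).
 * chunks[i] / nbytes[i] stand in for the source buffer and size on PE i.
 * Returns the total number of bytes written to dest.
 */
size_t model_collect_at(void *dest, const void *const *chunks,
                        const size_t *nbytes, size_t npes) {
    assert(dest != NULL && chunks != NULL && nbytes != NULL);
    size_t offset = 0; /* exclusive prefix sum of nbytes[0..i-1] */
    for (size_t i = 0; i < npes; i++) {
        /* PE i writes at the offset produced by the exclusive scan. */
        memcpy((char *)dest + offset, chunks[i], nbytes[i]);
        offset += nbytes[i];
    }
    return offset;
}
```

With an inclusive scan, each PE's offset would already include its own byte count, so the first contribution would not start at byte 0 of `dest` and every contribution would be shifted past its intended slot.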
3 changes: 3 additions & 0 deletions main_spec.tex
@@ -389,6 +389,9 @@ \subsubsection{\textbf{SHMEM\_COLLECT, SHMEM\_FCOLLECT}}\label{subsec:shmem_coll
\subsubsection{\textbf{SHMEM\_REDUCTIONS}}\label{subsec:shmem_reductions}
\input{content/shmem_reductions.tex}

\subsubsection{\textbf{SHMEM\_SCAN}}\label{subsec:shmem_scan}
\input{content/shmem_scan.tex}
