[Collectives] Deprecate active based language #8

Merged
merged 17 commits into from
Aug 30, 2024
Changes from 14 commits
25 changes: 14 additions & 11 deletions content/collective_intro.tex
@@ -1,21 +1,24 @@
\emph{Collective routines} are defined as coordinated communication or synchronization
operations performed by a group of \acp{PE}.

\openshmem provides three types of collective routines:
\openshmem provides four types of collective routines:

\begin{enumerate}
\item Collective routines that operate on teams use a team handle parameter to determine
which \acp{PE} will participate in the routine, and use resources encapsulated by the team object
to perform operations. See Section~\ref{subsec:team} for details on team management.
\item Collective routines that operate on teams use a team handle parameter to determine
which \acp{PE} will participate in the routine, and use resources encapsulated by the team object
to perform operations. See Section~\ref{subsec:team} for details on team management.

\begin{DeprecateBlock}
\item Collective routines that operate on active sets use a set of parameters to determine
which \acp{PE} will participate and what resources are used to perform operations.
\end{DeprecateBlock}
\begin{DeprecateBlock}
\item Collective routines that operate on active sets use a set of parameters to determine
which \acp{PE} will participate and what resources are used to perform operations.

\item Collective routines that do not accept active set
parameters, which implicitly operate on the world team and, as
required, the default context.
\end{DeprecateBlock}

\item Collective routines that accept neither team nor active set
parameters, which implicitly operate on the world team and, as
required, the default context.
\item Collective routines that do not accept team
parameters, which implicitly operate on the world team and, as
required, the default context.
\end{enumerate}

Concurrent accesses to symmetric memory by an \openshmem collective
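Reviewer note: the deprecated active-set item above says that "a set of parameters" determines which PEs participate. As a minimal sketch of what that means, the plain C model below expands a (PE_start, logPE_stride, PE_size) triplet into its member PE numbers, assuming the usual OpenSHMEM convention that the stride between members is 2^logPE_stride. The helper name `active_set_members` and its `capacity` parameter are hypothetical and are not part of the OpenSHMEM API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill `members` with the PE numbers of the active set described by the
 * (PE_start, logPE_stride, PE_size) triplet. The stride between
 * consecutive members is 2^logPE_stride. Returns the number of entries
 * written. Hypothetical helper for illustration, not an OpenSHMEM routine.
 */
size_t active_set_members(int pe_start, int log_pe_stride, int pe_size,
                          int *members, size_t capacity)
{
    assert(pe_start >= 0 && log_pe_stride >= 0 && pe_size > 0);
    assert((size_t)pe_size <= capacity);

    const int stride = 1 << log_pe_stride;
    for (int i = 0; i < pe_size; i++) {
        members[i] = pe_start + i * stride;
    }
    return (size_t)pe_size;
}
```

For example, the triplet (2, 1, 4) names PEs 2, 4, 6, and 8. A team handle replaces all three parameters with a single object, which is the motivation for the team-based wording in this change.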
2 changes: 1 addition & 1 deletion content/programming_model_overview.tex
@@ -144,7 +144,7 @@
data object on another symmetric data object.
\item \OPR{All-to-All}: All \acp{PE} participating in the routine exchange
a fixed amount of contiguous or strided data with all other \acp{PE}
in the active set.
in the team.
\end{enumerate}

\item \textbf{Mutual Exclusion}
48 changes: 33 additions & 15 deletions content/shmem_alltoall.tex
@@ -35,17 +35,17 @@

\apiargument{OUT}{dest}{Symmetric address of a data object large enough to receive
the combined total of \VAR{nelems} elements from each \ac{PE} in the
active set.
participating \acp{PE}.
The type of \dest{} should match that implied in the SYNOPSIS section.}
\apiargument{IN}{source}{Symmetric address of a data object that contains \VAR{nelems}
elements of data for each \ac{PE} in the active set, ordered according to
elements of data for each participating \ac{PE}, ordered according to
destination \ac{PE}.
The type of \source{} should match that implied in the SYNOPSIS section.}
\apiargument{IN}{nelems}{
The number of elements to exchange for each \ac{PE}.
For \FUNC{shmem\_alltoallmem}, elements are bytes;
for \FUNC{shmem\_alltoall\{32,64\}}, elements are 4 or 8 bytes,
respectively.
The number of elements to exchange for each \ac{PE}.
For \FUNC{shmem\_alltoallmem}, elements are bytes;
for \FUNC{shmem\_alltoall\{32,64\}}, elements are 4 or 8 bytes,
respectively.
}

\begin{DeprecateBlock}
@@ -100,6 +100,21 @@
If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is
otherwise invalid, the behavior is undefined.

Before any \ac{PE} calls a \FUNC{shmem\_alltoall} routine,
the following conditions must be ensured:
\begin{itemize}
\item The \VAR{dest} data object on all \acp{PE} in the team is
ready to accept the \FUNC{shmem\_alltoall} data.
\end{itemize}

Upon return from a \FUNC{shmem\_alltoall} routine, the following is true for
the local PE:
\begin{itemize}
\item Its \VAR{dest} symmetric data object is completely updated and the
data has been copied out of the source data object.
\end{itemize}

\begin{DeprecateBlock}
Active-set-based collective routines operate over all \acp{PE} in the active set
defined by the \VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet.

@@ -116,23 +131,26 @@

Before any \ac{PE} calls a \FUNC{shmem\_alltoall} routine,
the following conditions must be ensured:

\begin{itemize}
\item The \VAR{dest} data object on all \acp{PE} in the active set is
ready to accept the \FUNC{shmem\_alltoall} data.
\item For active-set-based routines, the \VAR{pSync} array
on all \acp{PE} in the active set is not still in use from a prior call
to a \FUNC{shmem\_alltoall} routine.
\item The \VAR{dest} data object on all \acp{PE} in the active set is
ready to accept the \FUNC{shmem\_alltoall} data.
\item For active-set-based routines, the \VAR{pSync} array
on all \acp{PE} in the active set is not still in use from a prior call
to a \FUNC{shmem\_alltoall} routine.
\end{itemize}

Otherwise, the behavior is undefined.

Upon return from a \FUNC{shmem\_alltoall} routine, the following is true for
the local PE:
\begin{itemize}
\item Its \VAR{dest} symmetric data object is completely updated and
the data has been copied out of the \VAR{source} data object.
\item For active-set-based routines,
the values in the \VAR{pSync} array are restored to the original values.
\item Its \VAR{dest} symmetric data object is completely updated and the
data has been copied out of the source data object.
\item For active-set-based routines,
the values in the \VAR{pSync} array are restored to the original values.
\end{itemize}
\end{DeprecateBlock}
}

\apireturnvalues{
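Reviewer note: the `dest` and `source` argument descriptions in this file imply a specific block layout: block j of the source on PE i arrives as block i of the dest on PE j. The sketch below models that exchange in plain C for a single process, with every PE's buffers flattened into one array. It makes no OpenSHMEM calls. The name `model_alltoall` and the flattened [pe][block][element] layout are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model of the all-to-all exchange for `npes` PEs, each sending `nelems`
 * ints to every PE. `source` and `dest` are each laid out as
 * [pe][block][element], flattened into npes * npes * nelems ints.
 * Block j of PE i's source ends up as block i of PE j's dest.
 */
void model_alltoall(int *dest, const int *source, size_t npes, size_t nelems)
{
    assert(dest != NULL && source != NULL && dest != source);

    for (size_t i = 0; i < npes; i++) {         /* sending PE */
        for (size_t j = 0; j < npes; j++) {     /* receiving PE */
            for (size_t k = 0; k < nelems; k++) {
                dest[(j * npes + i) * nelems + k] =
                    source[(i * npes + j) * nelems + k];
            }
        }
    }
}
```

In this model each PE needs a dest of npes * nelems elements, which matches the requirement that dest be large enough for nelems elements from every participating PE.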
4 changes: 2 additions & 2 deletions content/shmem_alltoalls.tex
@@ -35,10 +35,10 @@

\apiargument{OUT}{dest}{Symmetric address of a data object large enough to receive
the combined total of \VAR{nelems} elements from each \ac{PE} in the
active set.
participating \acp{PE}.
The type of \dest{} should match that implied in the SYNOPSIS section.}
\apiargument{IN}{source}{Symmetric address of a data object that contains \VAR{nelems}
elements of data for each \ac{PE} in the active set, ordered according to
elements of data for each participating \ac{PE}, ordered according to
destination \ac{PE}.
The type of \source{} should match that implied in the SYNOPSIS section.}
\apiargument{IN}{dst}{The stride between consecutive elements of the \dest{}
80 changes: 51 additions & 29 deletions content/shmem_broadcast.tex
@@ -45,7 +45,7 @@
respectively.
}
\apiargument{IN}{PE\_root}{Zero-based ordinal of the \ac{PE}, with respect to
the team or active set, from which the data is copied.}
the calling PEs, from which the data is copied.}

\begin{DeprecateBlock}

@@ -61,8 +61,7 @@
\end{apiarguments}

\apidescription{
\openshmem broadcast routines are collective routines over an active set or
valid \openshmem team.
\openshmem team-based broadcast routines are collective routines over a valid \openshmem team.
They copy the \source{} data object on the \ac{PE} specified by
\VAR{PE\_root} to the \dest{} data object on the \acp{PE}
participating in the collective operation.
@@ -75,66 +74,89 @@
\item The \dest{} object is updated on all \acp{PE}.
\item All \acp{PE} in the \VAR{team} argument must participate in
the operation.
\item Only \acp{PE} in the team may call the routine. If a
\ac{PE} not in the team calls a team-based
collective routine, the behavior is undefined.
\item If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is
otherwise invalid, the behavior is undefined.
\item \ac{PE} numbering is relative to the team. The specified
root \ac{PE} must be a valid \ac{PE} number for the team,
between \CONST{0} and \VAR{N$-$1}, where \VAR{N} is the size of
the team.
\end{itemize}

Before any \ac{PE} calls a broadcast routine, the following
conditions must be ensured:
\begin{itemize}
\item The \dest{} array on all \acp{PE} participating in the broadcast
is ready to accept the broadcast data.
\end{itemize}
Otherwise, the behavior is undefined.

Upon return from a team-based broadcast routine, the following are true for the local
\ac{PE}:
\begin{itemize}
\item The \dest{} data object is updated.
\item The \source{} data object may be safely reused.
\end{itemize}

\begin{DeprecateBlock}
\openshmem active-set broadcast routines are collective routines over an active set.
They copy the \source{} data object on the \ac{PE} specified by
\VAR{PE\_root} to the \dest{} data object on the \acp{PE}
participating in the collective operation.
The same \dest{} and \source{} data objects and the same value of
\VAR{PE\_root} must be passed by all \acp{PE} participating in the
collective operation.

For active-set-based broadcasts:
\begin{itemize}
\item The \dest{} object is updated on all \acp{PE} other than the
root \ac{PE}.
\item All \acp{PE} in the active set defined by the
\VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet
must participate in the operation.
\item Only \acp{PE} in the active set may call the routine. If a
\ac{PE} not in the active set calls an active-set-based
\item The \dest{} object is updated on all \acp{PE} other than the root \ac{PE}.
\item All \acp{PE} in the active set defined by the
\VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet
must participate in the operation.
\item Only \acp{PE} in the active set may call the routine. If a
\ac{PE} not in the active set calls an active-set-based
collective routine, the behavior is undefined.
\item The values of arguments \VAR{PE\_root}, \VAR{PE\_start},
\item The values of arguments \VAR{PE\_root}, \VAR{PE\_start},
\VAR{logPE\_stride}, and \VAR{PE\_size} must be the same value
on all \acp{PE} in the active set.
\item The value of \VAR{PE\_root} must be between \CONST{0} and
\item The value of \VAR{PE\_root} must be between \CONST{0} and
\VAR{PE\_size $-$ 1}.
\item The same \VAR{pSync} work array must be passed by all \acp{PE}
\item The same \VAR{pSync} work array must be passed by all \acp{PE}
in the active set.
\end{itemize}

Before any \ac{PE} calls a broadcast routine, the following
Before any \ac{PE} calls an active-set-based broadcast routine, the following
conditions must be ensured:
\begin{itemize}
\item The \dest{} array on all \acp{PE} participating in the broadcast
is ready to accept the broadcast data.
\item For active-set-based broadcasts, the
\VAR{pSync} array on all \acp{PE} in the
active set is not still in use from a prior call to an \openshmem
collective routine.
\item The \dest{} array on all \acp{PE} participating in the broadcast
is ready to accept the broadcast data.
\item The \VAR{pSync} array on all \acp{PE} in the
active set is not still in use from a prior call to an \openshmem
collective routine.
\end{itemize}
Otherwise, the behavior is undefined.

Upon return from a broadcast routine, the following are true for the local
Upon return from an active-set-based broadcast routine, the following are true for the local
\ac{PE}:
\begin{itemize}
\item For team-based broadcasts, the \dest{} data object is
updated.
\item For active-set-based broadcasts:
\begin{itemize}
\item If the current \ac{PE} is not the root \ac{PE}, the
\dest{} data object is updated.
\item If the current \ac{PE} is not the root \ac{PE}, the \dest{} data object is updated.
\item The \source{} data object may be safely reused.
\item The values in the \VAR{pSync} array are restored to the
original values.
\end{itemize}
\item The \source{} data object may be safely reused.
\end{itemize}
\end{DeprecateBlock}
}


\apireturnvalues{
For team-based broadcasts, zero on successful local completion; otherwise, nonzero.

\begin{DeprecateBlock}
For active-set-based broadcasts, none.
\end{DeprecateBlock}

}

\apinotes{
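Reviewer note: the split description in this file turns on one behavioural difference. Team-based broadcasts update dest on every PE, while the deprecated active-set broadcasts update dest on every PE except the root. The single-process model below illustrates that difference in plain C, without any OpenSHMEM calls. The function name `model_broadcast`, the `update_root` flag, and the flattened [pe][element] layout are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Model of a broadcast over `npes` PEs. `dest` and `source` are flattened
 * as [pe][element]. When `update_root` is true the root's dest is written
 * as well (team-based behaviour); when false the root's dest is left
 * untouched (deprecated active-set behaviour).
 */
void model_broadcast(int *dest, const int *source, size_t npes, size_t nelems,
                     size_t pe_root, bool update_root)
{
    assert(pe_root < npes); /* PE_root must lie between 0 and N-1 */

    const int *root_src = source + pe_root * nelems;
    for (size_t pe = 0; pe < npes; pe++) {
        if (pe == pe_root && !update_root) {
            continue; /* active-set semantics: skip the root's dest */
        }
        memcpy(dest + pe * nelems, root_src, nelems * sizeof(int));
    }
}
```

Code that relied on the root's dest staying unmodified under the active-set routines would need attention when moving to the team-based routines.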
37 changes: 31 additions & 6 deletions content/shmem_collect.tex
@@ -66,15 +66,13 @@
\openshmem \FUNC{collect} and \FUNC{fcollect} routines perform a collective
operation to concatenate \VAR{nelems}
data items from the \source{} array into the
\dest{} array, over an \openshmem team or active set
in processor number order. The resultant \dest{} array contains the contribution from
\dest{} array, over an \openshmem team in processor number order.
The resultant \dest{} array contains the contribution from
\acp{PE} as follows:

\begin{itemize}
\item For an active set, the data from \ac{PE} \VAR{PE\_start} is first, then the
contribution from \ac{PE} \VAR{PE\_start} + \VAR{PE\_stride} second, and so on.
\item For a team, the data from \ac{PE} number \CONST{0} in the team is first, then the
contribution from \ac{PE} \CONST{1} in the team, and so on.
\item For a team, the data from \ac{PE} number \CONST{0} in the team is first, then the
contribution from \ac{PE} \CONST{1} in the team, and so on.
\end{itemize}

The collected result is written to the \dest{} array for all \acp{PE}
@@ -90,6 +88,26 @@
If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is
otherwise invalid, the behavior is undefined.

\begin{DeprecateBlock}
\openshmem \FUNC{collect} and \FUNC{fcollect} routines perform a collective
operation to concatenate \VAR{nelems}
data items from the \source{} array into the
\dest{} array, over an \openshmem active set
in processor number order. The resultant \dest{} array contains the contribution from
\acp{PE} as follows:
\begin{itemize}
\item For an active set, the data from \ac{PE} \VAR{PE\_start} is first, then the
contribution from \ac{PE} \VAR{PE\_start} + \VAR{PE\_stride} second, and so on.
\end{itemize}

The collected result is written to the \dest{} array for all \acp{PE}
that participate in the operation. The same \dest{} and \source{}
arrays must be passed by all \acp{PE} that participate in the operation.

The \FUNC{fcollect} routines require that \VAR{nelems} be the same value in all
participating \acp{PE}, while the \FUNC{collect} routines allow \VAR{nelems} to
vary from \ac{PE} to \ac{PE}.

Active-set-based collective routines operate over all \acp{PE} in the active set
defined by the \VAR{PE\_start}, \VAR{logPE\_stride}, \VAR{PE\_size} triplet.
As with all active-set-based collective routines,
@@ -108,16 +126,23 @@
\item For active-set-based collective routines, the values in the \VAR{pSync} array are
restored to the original values.
\end{itemize}
\end{DeprecateBlock}
}

\apireturnvalues{
Zero on successful local completion. Nonzero otherwise.
}

\apinotes{
\begin{DeprecateBlock}
The collective routines operate on active \ac{PE} sets that have a
non-power-of-two \VAR{PE\_size} with some performance degradation. They operate
with no performance degradation when \VAR{nelems} is a non-power-of-two value.
\end{DeprecateBlock}
The collective routines that operate on teams containing a
non-power-of-two number of \acp{PE} do so with some performance degradation. They operate
with no performance degradation when \VAR{nelems} is a non-power-of-two value.

}

\begin{apiexamples}
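Reviewer note: the collect description in this file specifies concatenation in PE order, with `collect` allowing nelems to differ per PE and `fcollect` requiring it to be equal. The plain C model below sketches that ordering for one process. Each PE's contribution lands at an offset equal to the sum of the contributions from lower-numbered PEs. The name `model_collect` and the `dest_capacity` parameter are hypothetical, and no OpenSHMEM calls are made.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Model of collect: concatenate each PE's contribution in PE order.
 * sources[pe] points to nelems[pe] ints contributed by that PE.
 * Returns the total number of elements written to `dest`.
 * For the fcollect case, every entry of `nelems` would be equal.
 */
size_t model_collect(int *dest, size_t dest_capacity,
                     const int *const *sources, const size_t *nelems,
                     size_t npes)
{
    size_t offset = 0;
    for (size_t pe = 0; pe < npes; pe++) {
        assert(offset + nelems[pe] <= dest_capacity);
        memcpy(dest + offset, sources[pe], nelems[pe] * sizeof(int));
        offset += nelems[pe];
    }
    return offset;
}
```

With a team, index 0 in this model is PE 0 of the team. With a deprecated active set, index 0 corresponds to PE_start, and later indices follow the set's stride.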