diff --git a/.github/issue_template.md b/.github/issue_template.md new file mode 100644 index 00000000..e6ed79bc --- /dev/null +++ b/.github/issue_template.md @@ -0,0 +1,28 @@ +--- +name: Issue Template +about: Template for OpenSHMEM Issues +title: '' +labels: '' +assignees: '' + +--- + +# Problem Statement + + + +# Proposed Changes + + + +# Impact on Implementations + + + +# Impact on Users + + + +# References and Pull Requests + + diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md new file mode 100644 index 00000000..5065a4b9 --- /dev/null +++ b/.github/pull_request_template.md @@ -0,0 +1,7 @@ +# Summary of changes + +# Proposal Checklist +- [ ] Link to issue(s) +- [ ] Changelog entry +- [ ] Reviewed for changes to front matter +- [ ] Reviewed for changes to back matter diff --git a/content/backmatter.tex b/content/backmatter.tex index 89bbe837..16e7ffcc 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -143,12 +143,6 @@ \chapter{Undefined Behavior in OpenSHMEM}\label{sec:undefined} immediately upon an \openshmem call into the uninitialized library. \tabularnewline \hline -Multiple calls to initialization routines & In an \openshmem program where -the initialization routines \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread} -have already been called, any subsequent calls to these initialization routines -result in undefined behavior. -\tabularnewline -\hline Specifying invalid \ac{PE} numbers & For \openshmem routines that accept a \ac{PE} number as an argument, if the \ac{PE} number is invalid for the team associated with the operation (either implicitly or explicitly), the @@ -661,6 +655,11 @@ \section{Version 1.6} The following list describes the specific changes in \openshmem[1.6]: \begin{itemize} % +\item Added support for initialization and finalization routines to be called + multiple times, and added an initialization status query API + \FUNC{shmem\_query\_initialized}. 
+\ChangelogRef{subsec:shmem_init, subsec:shmem_finalize, subsec:shmem_query_initialized}% +% \item Added interleaved block transfer APIs \FUNC{shmem\_ibget} and \FUNC{shmem\_ibput}. \ChangelogRef{subsec:shmem_ibget, subsec:shmem_ibput}% @@ -687,6 +686,10 @@ \section{Version 1.6} operations for team-based reductions. \ChangelogRef{teamreducetypes}% % +\item Added the session routines, \FUNC{shmem\_ctx\_session\_start} and + \FUNC{shmem\_ctx\_session\_stop}, which allow users to pass hints to the + \openshmem library to apply runtime optimizations. +\ChangelogRef{subsec:sessions}% \item Added fine grained completion routine: \FUNC{shmem\_pe\_quiet}. \ChangelogRef{subsec:shmem_pe_quiet}% % @@ -694,12 +697,31 @@ \section{Version 1.6} functions from a single entry in \openshmem[1.5] into separate entries. \ChangelogRef{subsec:shmem_malloc, subsec:shmem_free, subsec:shmem_realloc, subsec:shmem_align}% +% +\item Clarified that the \FUNC{shmem\_\{malloc, free, realloc, align, + malloc\_with\_hints, calloc\}} functions are collective operations on + the world team. +\ChangelogRef{subsec:shmem_malloc, subsec:shmem_free, subsec:shmem_realloc, + subsec:shmem_align, subsec:shmmallochint, subsec:shmem_calloc}% \item Corrected the level argument's recommended value in API notes for \FUNC{shmem\_pcontrol} to indicate that the value should be greater than 2 to enable profiling with profile library defined effects and additional arguments. \ChangelogRef{subsec:shmem_pcontrol} % +\item Clarified that \FUNC{shmem\_team\_get\_config} returns the current + configuration values, which may differ from the values assigned at the + time of the team's creation. +\ChangelogRef{subsec:shmem_team_get_config} +% +\item Clarified the behavior of \FUNC{shmem\_team\_get\_config} when the + \VAR{config\_mask} is 0 and/or the \VAR{config} argument is a null pointer. 
+\ChangelogRef{subsec:shmem_team_get_config} +% +\item Clarified the behavior of \FUNC{shmem\_team\_split\_strided} when the + stride argument is 0 or negative. +\ChangelogRef{subsec:shmem_team_split_strided} +% \end{itemize} \section{Version 1.5} diff --git a/content/collective_intro.tex b/content/collective_intro.tex index 56ba76f1..4996b178 100644 --- a/content/collective_intro.tex +++ b/content/collective_intro.tex @@ -22,7 +22,7 @@ \end{enumerate} Concurrent accesses to symmetric memory by an \openshmem collective -routine and any other means of access---where at least one updates the +routine and any other means of access---where at least one \ac{PE} updates the symmetric memory---results in undefined behavior. Since \acp{PE} can enter and exit collectives at different times, accessing such memory remotely may require additional synchronization. diff --git a/content/execution_model.tex b/content/execution_model.tex index a1ea1a69..a56f8bda 100644 --- a/content/execution_model.tex +++ b/content/execution_model.tex @@ -8,17 +8,15 @@ \ac{PE} execution is loosely coupled, relying on \openshmem operations to communicate and synchronize among executing \acp{PE}. The \openshmem phase in -a program begins with a call to the initialization routine \FUNC{shmem\_init} +a program begins with the first call to the initialization routine \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread}, which must be performed before using any of the other \openshmem library routines. -An \openshmem program concludes its use of the \openshmem library when all \acp{PE} call +An \openshmem program concludes its use of the \openshmem library when all \acp{PE} +make their final call to \FUNC{shmem\_finalize} or any \ac{PE} calls \FUNC{shmem\_global\_exit}. -During a call to \FUNC{shmem\_finalize}, the \openshmem library must -complete all pending communication and release all the resources associated to -the library using an implicit collective synchronization across \acp{PE}. 
-Calling any \openshmem routine before initialization or after -\FUNC{shmem\_finalize} leads to undefined behavior. After finalization, a -subsequent initialization call also leads to undefined behavior. +During the last call to \FUNC{shmem\_finalize}, the \openshmem library synchronizes +all \acp{PE}, completes all pending communication and releases all the resources +associated to the library. The \acp{PE} of the \openshmem program are identified by unique integers. The identifiers are integers assigned in a monotonically increasing manner from zero diff --git a/content/memmgmt_intro.tex b/content/memmgmt_intro.tex index 393785ff..8cb6605c 100644 --- a/content/memmgmt_intro.tex +++ b/content/memmgmt_intro.tex @@ -3,7 +3,7 @@ symmetric data objects in the symmetric heap. The symmetric memory allocation routines differ from the private heap -allocation routines in that they must be called by all \acp{PE} in a +allocation routines in that they must be called by all \acp{PE} in the world team. When specified, each of these routines includes at least one call to a procedure that is semantically equivalent to \FUNC{shmem\_barrier\_all}. This ensures that all \acp{PE} diff --git a/content/sessions_intro.tex b/content/sessions_intro.tex new file mode 100644 index 00000000..8e4d4ab8 --- /dev/null +++ b/content/sessions_intro.tex @@ -0,0 +1,31 @@ +\openshmem \emph{sessions} provide a mechanism for applications to inform the +\openshmem library of an upcoming sequence of communication routines that +exhibit suitable patterns for runtime optimizations. +A session is associated with a specific \openshmem communication context +(Section~\ref{sec:ctx}), and it indicates the beginning and ending of +communication phases on that context. +The \FUNC{shmem\_ctx\_session\_start} routine indicates the beginning of a session, +and the \FUNC{shmem\_ctx\_session\_stop} routine indicates the end of a session. 
+The \LibConstRef{SHMEM\_CTX\_SESSION\_*} options (Table~\ref{session_opts}) indicate +which patterns of \openshmem RMA and AMO routines will occur within a session. +These options serve only as \textit{hints} to the library; it is up to the +implementation whether or not to apply any optimizations within a session. +A session may be provided a configuration argument that specifies attributes +associated with the session. This configuration argument is of type +\CTYPE{shmem\_ctx\_session\_config\_t}, which is detailed further in +Section~\ref{subsec:shmem_ctx_session_config_t}. + +Usage of the \openshmem session APIs on a particular context must comply with +the requirements of all options set on that context. +Starting and stopping \openshmem sessions should not affect the completion or +ordering semantics of any \openshmem routines in the program. +For these reasons, multi-threaded \openshmem programs may require additional +thread synchronization to ensure session hints are correctly applied to +shareable contexts. +Because sessions are associated with an \openshmem communication context, +routines not performed on a communication context (like collective routines) +are ineligible for session hints. + +The \CTYPE{shmem\_ctx\_session\_config\_t} object requires the \CONST{SIZE\_MAX} +macro defined in \HEADER{stdint.h} by \Cstd[99]~\S7.18.3 and +\Cstd[11]~\S7.20.3. diff --git a/content/shmem_align.tex b/content/shmem_align.tex index f054f66c..0cd97308 100644 --- a/content/shmem_align.tex +++ b/content/shmem_align.tex @@ -17,7 +17,8 @@ \apidescription{ - The \FUNC{shmem\_align} routine allocates a block in the symmetric + The \FUNC{shmem\_align} routine is a collective operation on the + world team that allocates a block in the symmetric heap that has a byte alignment specified by the \VAR{alignment} argument. 
The value of \VAR{alignment} shall be a multiple of \CONST{sizeof(void *)} that is also a power of two; otherwise, the diff --git a/content/shmem_ctx_session_config_t.tex b/content/shmem_ctx_session_config_t.tex new file mode 100644 index 00000000..11adff1f --- /dev/null +++ b/content/shmem_ctx_session_config_t.tex @@ -0,0 +1,79 @@ +\apisummary{ + A structure type representing communication session configuration arguments +} + +\begin{apidefinition} + +\begin{Csynopsis} +typedef struct { + size_t total_ops; +} shmem_ctx_session_config_t; +\end{Csynopsis} + +\begin{apiarguments} + None. +\end{apiarguments} + + +\apidescription{ + A communication session configuration object is provided as an argument to + the \FUNC{shmem\_ctx\_session\_start} routine. + The \VAR{shmem\_ctx\_session\_config\_t} object contains optional parameters + that are associated with the options of a communication session. + These parameters serve only as \textit{hints} to the library; it is up to + the implementation whether or not to use the parameter values within + a session. + + The \VAR{total\_ops} member indicates the expected maximum number of all + calls to \openshmem RMA routines within the session (i.e., after a call to + \FUNC{shmem\_ctx\_session\_start} and before a corresponding call to + \FUNC{shmem\_ctx\_session\_stop}). + If \VAR{total\_ops} differs from the \textit{actual} number of calls to + \openshmem RMA routines within the session, then application performance + might be suboptimal; however, the results of any data transfers, + completions, or memory ordering operations are unaffected by the value of + \VAR{total\_ops}. + + When passing a configuration structure to \FUNC{shmem\_ctx\_session\_start}, + the mask parameter specifies which fields the application requests to + associate with the session. + Any configuration parameter value that is not indicated in the mask will be + ignored, and the default value will be used instead. 
+ Therefore, a program must set only the fields for which it does not want + the default value. + + A configuration mask is created through a bitwise OR operation of the + following library constants. + A configuration mask value of \CONST{0} indicates that the session + should be started with the default values for all configuration + parameters. + + \widetablerow{\LibConstRef{SHMEM\_CTX\_SESSION\_TOTAL\_OPS}}{ + The value of the \VAR{total\_ops} member of the \VAR{config} structure is + unmasked within the session and applied as a hint. + } + + The default values for configuration parameters are: + + \widetablerow{\VAR{total\_ops} = \CONST{SIZE\_MAX}}{ + By default, the expected maximum number of calls to \openshmem RMA routines + in the session is set to the maximum value of a \VAR{size\_t} variable, + \CONST{SIZE\_MAX}. This default setting indicates that the \openshmem + application chooses not to specify a value for \VAR{total\_ops}. + } +} + +\apinotes{ + Users are discouraged from calling \FUNC{shmem\_fence}, + \FUNC{shmem\_ctx\_fence}, \FUNC{shmem\_quiet}, or \FUNC{shmem\_ctx\_quiet} + routines within a session whenever possible, because the library must + impose strict completions to comply with ordering semantics. + However, hints provided by \CTYPE{shmem\_ctx\_session\_config\_t} do not imply + the occurrence of any completion or memory ordering operations. + The requirements on buffers provided to \openshmem routines that are + \textit{in-use} (as described in Section + \ref{subsec:invoking_openshmem_operations}) apply regardless of any + \CTYPE{shmem\_ctx\_session\_config\_t} hints. +} + +\end{apidefinition} diff --git a/content/shmem_ctx_session_start.tex b/content/shmem_ctx_session_start.tex new file mode 100644 index 00000000..7c771d24 --- /dev/null +++ b/content/shmem_ctx_session_start.tex @@ -0,0 +1,113 @@ +\apisummary{ + Start a communication session. 
+} + +\begin{apidefinition} + +\begin{Csynopsis} +void @\FuncDecl{shmem\_ctx\_session\_start}@(shmem_ctx_t ctx, long options, const shmem_ctx_session_config_t *config, long config_mask); +\end{Csynopsis} + +\begin{apiarguments} + \apiargument{IN}{ctx}{A context handle specifying the context associated + with this session.} + \apiargument{IN}{options}{The set of requested options from + Table~\ref{session_opts} for this session. Multiple options may be + requested by combining them with a bitwise OR operation; otherwise, + \CONST{0} can be given if no options are requested.} + \apiargument{IN}{config}{ + A pointer to the configuration parameters for the session.} + \apiargument{IN}{config\_mask}{ + The bitwise mask representing the set of configuration parameters to use + from \VAR{config}.} +\end{apiarguments} + +\apidescription{ + \FUNC{shmem\_ctx\_session\_start} is a non-collective routine that begins a + session on communication context \VAR{ctx} with hints requested via + \VAR{options}. + Sessions on a communication context must be stopped with a call to + \FUNC{shmem\_ctx\_session\_stop} on the same context. + If a session is already started on a given context, another call to + \FUNC{shmem\_ctx\_session\_start} on that same context combines new options + via a bitwise OR operation. In such a case, unmasked member values in the + \VAR{config} argument replace any existing configuration values that are + already applied to the session. + + If \VAR{ctx} compares equal to \LibConstRef{SHMEM\_CTX\_INVALID} then + \FUNC{shmem\_ctx\_session\_start} performs no action and returns immediately. + + No combination of \VAR{options} passed to \FUNC{shmem\_ctx\_session\_start} + results in undefined behavior, but some combinations may be detrimental for + performance; for example, when selecting an option that is not applicable + to the session. It is the user's responsibility to determine which + combination of \VAR{options} benefits the performance of the session. 
+ + The \VAR{config} argument specifies session configuration parameters, + which are described in Section~\ref{subsec:shmem_ctx_session_config_t}. + + The \VAR{config\_mask} argument is a bitwise mask representing the set of + configuration parameters to use from \VAR{config}. + A \VAR{config\_mask} value of \CONST{0} indicates that the session should + be started with the default values for all configuration parameters. + See Section~\ref{subsec:shmem_ctx_session_config_t} for field mask names and + default configuration parameters. +} + +\apireturnvalues{ + None. +} + +\sessiontablebegin + +\sessiontablerow{\LibConstRef{SHMEM\_CTX\_SESSION\_BATCH}}{ + A \textit{batch} is a series of calls to \openshmem routines that occur + within a session on a communication context (i.e., after a call to + \FUNC{shmem\_ctx\_session\_start} and before a corresponding call to + \FUNC{shmem\_ctx\_session\_stop}), that might tolerate an increase in + individual call latencies. Designating a batch may provide an opportunity + to decrease the overall overhead typically involved with the \openshmem + library implementing the series as individual RMA operations. In other + words, the performance of \openshmem programs that issue many consecutive + and small-sized RMA routines might be improved by informing the library + implementation ahead of time that it is free to delay transferring data + in order to buffer, combine, and/or coalesce the issued \openshmem + routines. The specific mechanisms for improving performance using + batching optimizations depend on the \openshmem library implementation. + + The \VAR{SHMEM\_CTX\_SESSION\_BATCH} hint indicates that a communication + context will be used to issue a batch. An example of a batch is an + iterative loop of non-blocking RMA and/or AMO routines. A batch may + include a memory ordering or collective operation, but such routines + might require completions and/or synchronization that could degrade + performance. 
+ + Because sessions do not affect the completion or ordering semantics of any + \openshmem routines in the program, routines such as non-blocking RMAs, + non-blocking AMOs, non-blocking \OPR{put-with-signals}, blocking scalar + \OPR{puts}, small blocking \OPR{puts}, and blocking non-fetching AMOs are + viable candidates for batching. Other routines, such as large blocking + \OPR{puts}, all blocking \OPR{gets}, blocking fetching AMOs, and the + memory ordering routines might require the library to enforce + completions, reducing the potential benefit of batching. + + The \VAR{total\_ops} field of \VAR{config} indicates the expected maximum + number of calls to \openshmem RMA routines within the session. + See Section~\ref{subsec:shmem_ctx_session_config_t} for details + about \VAR{shmem\_ctx\_session\_config\_t} parameters. + } \hline + +\sessiontableend + +\apinotes{ + The \FUNC{shmem\_ctx\_session\_start} routine provides hints for improving + performance, and \openshmem implementations are not required to apply any + optimization. + \FUNC{shmem\_ctx\_session\_start} is non-collective, so there is no implied + synchronization. + Blocking puts must be sufficiently small to benefit from batching, and the + exact threshold for this benefit depends on the \openshmem implementation + and/or the application. +} + +\end{apidefinition} diff --git a/content/shmem_ctx_session_stop.tex b/content/shmem_ctx_session_stop.tex new file mode 100644 index 00000000..fc45fda8 --- /dev/null +++ b/content/shmem_ctx_session_stop.tex @@ -0,0 +1,46 @@ +\apisummary{ + Stop a communication session. +} + +\begin{apidefinition} + +\begin{Csynopsis} +void @\FuncDecl{shmem\_ctx\_session\_stop}@(shmem_ctx_t ctx); +\end{Csynopsis} + +\begin{apiarguments} + \apiargument{IN}{ctx}{A context handle specifying the context associated + with this session.} +\end{apiarguments} + +\apidescription{ + The \FUNC{shmem\_ctx\_session\_stop} routine ends a session on context \VAR{ctx}. 
+ If a session is already stopped on a given communication context, another + call to \FUNC{shmem\_ctx\_session\_stop} on that context has no effect. +} + +\apireturnvalues{ + None. +} + +\apinotes{ + Users are discouraged from including non-\openshmem code, such as a long + computation loop, within a session without first calling + \FUNC{shmem\_ctx\_session\_stop}. +} + + +\begin{apiexamples} + +\apicexample + {The following \CorCpp{} program demonstrates the usage of + \FUNC{shmem\_ctx\_session\_start} and \FUNC{shmem\_ctx\_session\_stop} with a loop of + random atomic non-fetching XOR updates to a distributed table, similar to + the HPC Challenge RandomAccess GUPS (Giga-updates per second) benchmark + \footnote{http://icl.cs.utk.edu/projectsfiles/hpcc/RandomAccess/}.} + {./example_code/shmem_ctx_session_example.c} + {} +\end{apiexamples} + +\end{apidefinition} + diff --git a/content/shmem_finalize.tex b/content/shmem_finalize.tex index cfa32d13..5496e9bf 100644 --- a/content/shmem_finalize.tex +++ b/content/shmem_finalize.tex @@ -15,23 +15,33 @@ \end{apiarguments} \apidescription{ - \FUNC{shmem\_finalize} is a collective operation that ends the \openshmem - portion of a program previously initialized by \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread} and - releases all resources used by the \openshmem library. This collective - operation requires all \acp{PE} to participate in the call. There is an - implicit global barrier in \FUNC{shmem\_finalize} to ensure that pending - communications are completed and that no resources are released until all - \acp{PE} have entered \FUNC{shmem\_finalize}. - This routine destroys all teams created by the \openshmem program. + \FUNC{shmem\_finalize} ends the \openshmem + portion of a program previously initialized by \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread}. + This is a collective + operation that requires all \acp{PE} to participate in the call. 
+ + An \openshmem program may perform a series of matching + initialization and finalization calls. + The last call to \FUNC{shmem\_finalize} in this series + releases all resources used by the \openshmem library. + This call destroys all teams created by the \openshmem program. As a result, all shareable contexts are destroyed. The user is responsible for destroying all contexts with the - \CONST{SHMEM\_CTX\_PRIVATE} option enabled prior to calling this routine; + \CONST{SHMEM\_CTX\_PRIVATE} option enabled prior to this call; otherwise, the behavior is undefined. - \FUNC{shmem\_finalize} must be - the last \openshmem library call encountered in the \openshmem portion of a - program. A call to \FUNC{shmem\_finalize} will release all resources - initialized by a corresponding call to \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread}. All processes + + The last call to \FUNC{shmem\_finalize} performs an implicit global barrier + to ensure that pending communications are completed and that no resources + are released until all \acp{PE} have entered \FUNC{shmem\_finalize}. All + other calls to \FUNC{shmem\_finalize} perform an operation semantically + equivalent to \FUNC{shmem\_barrier\_all} and return without freeing any + \openshmem resources. + + The last call to \FUNC{shmem\_finalize} causes the \openshmem library + to enter an uninitialized state. No further \openshmem calls may be + made until an \openshmem initialization routine is called. + All processes that represent the \acp{PE} will still exist after the call to \FUNC{shmem\_finalize} returns, but they will no longer have access to resources that have been released. @@ -42,12 +52,19 @@ } \apinotes{ - \FUNC{shmem\_finalize} releases all resources used by the \openshmem library + The last call to \FUNC{shmem\_finalize} releases all resources used by the \openshmem library including the symmetric memory heap and pointers initiated by \FUNC{shmem\_ptr}. 
This collective operation requires all \acp{PE} to participate in the call, not just a subset of the \acp{PE}. The non-\openshmem portion of a program may continue after a call to \FUNC{shmem\_finalize} by all \acp{PE}. + + Calls to \FUNC{shmem\_finalize} that are not the last in a series of + initialization and finalization calls do not free any \openshmem resources. + Thus, teams, contexts, or symmetric memory allocations may be leaked until + the final call to \FUNC{shmem\_finalize}. Applications that perform + multiple initialization and finalization calls should free resources prior + to calling \FUNC{shmem\_finalize} to avoid such leaks. } \begin{apiexamples} diff --git a/content/shmem_free.tex b/content/shmem_free.tex index 6d70228a..d37b8495 100644 --- a/content/shmem_free.tex +++ b/content/shmem_free.tex @@ -13,7 +13,8 @@ \end{apiarguments} \apidescription{ - The \FUNC{shmem\_free} routine causes the block to which \VAR{ptr} + The \FUNC{shmem\_free} routine is a collective operation on the + world team that causes the block to which \VAR{ptr} points to be deallocated, that is, made available for further allocation. If \VAR{ptr} is a null pointer, no action is performed; otherwise, \FUNC{shmem\_free} calls a barrier on entry. diff --git a/content/shmem_init.tex b/content/shmem_init.tex index 82fa3b72..6bfe2e1b 100644 --- a/content/shmem_init.tex +++ b/content/shmem_init.tex @@ -18,9 +18,15 @@ library. It is a collective operation that all \acp{PE} must call before any other \openshmem routine may be called. At the end of the \openshmem program which it initialized, the call to \FUNC{shmem\_init} must be matched with a - call to \FUNC{shmem\_finalize}. After the first call to \FUNC{shmem\_init}, a - subsequent call to \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread} in the - same program results in undefined behavior. + call to \FUNC{shmem\_finalize}. 
+ + The \FUNC{shmem\_init} and \FUNC{shmem\_init\_thread} initialization + routines may be called multiple times within an \openshmem program. A + corresponding call to \FUNC{shmem\_finalize} must be made for each call to + an \openshmem initialization routine. The \openshmem library must not be + finalized until after the last call to \FUNC{shmem\_finalize} and may be + re-initialized with a subsequent call to an initialization routine. + } \apireturnvalues{ diff --git a/content/shmem_init_thread.tex b/content/shmem_init_thread.tex index 2a4b081b..f1f397d8 100644 --- a/content/shmem_init_thread.tex +++ b/content/shmem_init_thread.tex @@ -24,13 +24,19 @@ \VAR{requested} are \CONST{SHMEM\_THREAD\_SINGLE}, \CONST{SHMEM\_THREAD\_FUNNELED}, \CONST{SHMEM\_THREAD\_SERIALIZED}, and \CONST{SHMEM\_THREAD\_MULTIPLE}. -An \openshmem program is initialized either by \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread}. -Once an \openshmem library initialization call has been performed, a subsequent -initialization call in the same program results in undefined behavior. +The \FUNC{shmem\_init} and \FUNC{shmem\_init\_thread} initialization +routines may be called multiple times within an \openshmem program. A +corresponding call to \FUNC{shmem\_finalize} must be made for each call to +an \openshmem initialization routine. The \openshmem library must not be +finalized until after the last call to \FUNC{shmem\_finalize} and may be +re-initialized with a subsequent call to an initialization routine. + If the call to \FUNC{shmem\_init\_thread} is unsuccessful in allocating and initializing resources for the \openshmem library, then the behavior of any subsequent call to the \openshmem library is undefined. + + } \apireturnvalues{ @@ -43,6 +49,9 @@ or \FUNC{shmem\_init\_thread}. If the \openshmem library is initialized by \FUNC{shmem\_init}, the library implementation can choose to support any one of the defined thread levels. 
+ +The \openshmem library may not be able to change the level of threading support +provided after the first initialization call has been made. } \end{apidefinition} diff --git a/content/shmem_malloc.tex b/content/shmem_malloc.tex index c7ef30c0..6b0b176f 100644 --- a/content/shmem_malloc.tex +++ b/content/shmem_malloc.tex @@ -15,7 +15,8 @@ \apidescription{ - The \FUNC{shmem\_malloc} routine returns the symmetric address of a + The \FUNC{shmem\_malloc} routine is a collective operation on the + world team and returns the symmetric address of a block of at least \VAR{size} bytes, which shall be suitably aligned so that it may be assigned to a pointer to any type of object. This space is allocated from the symmetric heap (in contrast to diff --git a/content/shmem_malloc_hints.tex b/content/shmem_malloc_hints.tex index 174e143a..ef4cbfc2 100644 --- a/content/shmem_malloc_hints.tex +++ b/content/shmem_malloc_hints.tex @@ -18,7 +18,8 @@ \apidescription{ - The \FUNC{shmem\_malloc\_with\_hints} routine, like \FUNC{shmem\_malloc}, returns a pointer to a block of at least + The \FUNC{shmem\_malloc\_with\_hints} routine, like \FUNC{shmem\_malloc}, + is a collective operation on the world team that returns a pointer to a block of at least \VAR{size} bytes, which shall be suitably aligned so that it may be assigned to a pointer to any type of object. This space is allocated from the symmetric heap (similar to \FUNC{shmem\_malloc}). When the \VAR{size} is zero, diff --git a/content/shmem_query_initialized.tex b/content/shmem_query_initialized.tex new file mode 100644 index 00000000..b3729b7c --- /dev/null +++ b/content/shmem_query_initialized.tex @@ -0,0 +1,30 @@ +\apisummary{ +Returns the initialized status of the \openshmem library. 
+} + +\begin{apidefinition} + +\begin{Csynopsis} +void @\FuncDecl{shmem\_query\_initialized}@(int *initialized); +\end{Csynopsis} + +\begin{apiarguments} +\apiargument{OUT}{initialized}{Nonzero if the \openshmem library is in the initialized state. Zero otherwise.} +\end{apiarguments} + +\apidescription{ + The \FUNC{shmem\_query\_initialized} call returns the initialization status + of the \openshmem library. If the application has called an \openshmem + initialization routine and has not yet made the corresponding call to + \FUNC{shmem\_finalize}, this routine sets \VAR{initialized} to a nonzero value. + Otherwise, it sets \VAR{initialized} to zero. + + This function may be called at any time, regardless of the thread safety + level of the \openshmem library. +} + +\apireturnvalues{ +None. +} + +\end{apidefinition} diff --git a/content/shmem_query_thread.tex b/content/shmem_query_thread.tex index b2144ff5..c917a314 100644 --- a/content/shmem_query_thread.tex +++ b/content/shmem_query_thread.tex @@ -19,6 +19,9 @@ initialized by \FUNC{shmem\_init\_thread}. If the library was initialized by \FUNC{shmem\_init}, the implementation can choose to provide any one of the defined thread levels, and \FUNC{shmem\_query\_thread} returns this thread level. + +This function may be called at any time, regardless of the thread safety +level of the \openshmem library. } \apireturnvalues{ diff --git a/content/shmem_realloc.tex b/content/shmem_realloc.tex index b061f0ac..388e7bb2 100644 --- a/content/shmem_realloc.tex +++ b/content/shmem_realloc.tex @@ -16,7 +16,8 @@ \apidescription{ - The \FUNC{shmem\_realloc} routine changes the size of the block to + The \FUNC{shmem\_realloc} routine is a collective operation on + the world team that changes the size of the block to which \VAR{ptr} points to the size (in bytes) specified by \VAR{size}. The contents of the block are unchanged up to the lesser of the new and old sizes. 
diff --git a/content/shmem_scan.tex b/content/shmem_scan.tex new file mode 100644 index 00000000..618a51a0 --- /dev/null +++ b/content/shmem_scan.tex @@ -0,0 +1,121 @@ +\apisummary { + Performs inclusive or exclusive prefix sum operations +} + +\begin{apidefinition} + +%% C11 +\begin{C11synopsis} +int @\FuncDecl{shmem\_sum\_inscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce); +int @\FuncDecl{shmem\_sum\_exscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce); +\end{C11synopsis} +where \TYPE{} is one of the integer, real, or complex types supported +for the SUM operation as specified by Table \ref{teamreducetypes}. + +%% C/C++ +\begin{Csynopsis} +int @\FuncDecl{shmem\_\FuncParam{TYPENAME}\_sum\_inscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce); +int @\FuncDecl{shmem\_\FuncParam{TYPENAME}\_sum\_exscan}@(shmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce); +\end{Csynopsis} +where \TYPE{} is one of the integer, real, or complex types supported +for the SUM operation and has a corresponding \TYPENAME{} as specified +by Table \ref{teamreducetypes}. + +\begin{apiarguments} + \apiargument{IN}{team}{ + The team over which to perform the operation. + } + \apiargument{OUT}{dest}{ + Symmetric address of an array, of length \VAR{nreduce} elements, + to receive the result of the scan routines. The type of + \dest{} should match that implied in the SYNOPSIS section. + } + \apiargument{IN}{source}{ + Symmetric address of an array, of length \VAR{nreduce} elements, + that contains one element for each separate scan routine. + The type of \source{} should match that implied in the SYNOPSIS + section. + } + \apiargument{IN}{nreduce}{ + The number of elements in the \dest{} and \source{} arrays. 
+ } +\end{apiarguments} + +\apidescription{ + + The \FUNC{shmem\_sum\_inscan} and \FUNC{shmem\_sum\_exscan} routines + are collective routines over an \openshmem team that compute one or + more scan (or prefix sum) operations across symmetric arrays on + multiple \acp{PE}. The scan operations are performed with the SUM + operator. + + The \VAR{nreduce} argument determines the number of separate scan + operations to perform. The \source{} array on all \acp{PE} + participating in the operation provides one element for each scan. + The results of the scan operations are placed in the \dest{} array + on all \acp{PE} participating in the scan. + + The \FUNC{shmem\_sum\_inscan} routine performs an inclusive scan + operation, while the \FUNC{shmem\_sum\_exscan} routine performs an + exclusive scan operation. + + For \FUNC{shmem\_sum\_inscan}, the value of the $j$-th element in + the \VAR{dest} array on \ac{PE}~$i$ is defined as: + \begin{equation*} + \textrm{dest}_{i,j} = \displaystyle\sum_{k=0}^{i} \textrm{source}_{k,j} + \end{equation*} + + For \FUNC{shmem\_sum\_exscan}, the value of the $j$-th element in + the \VAR{dest} array on \ac{PE}~$i$ is defined as: + \begin{equation*} + \textrm{dest}_{i,j} = + \begin{cases} + \displaystyle\sum_{k=0}^{i-1} \textrm{source}_{k,j}, & \text{if} \; i \neq 0 \\ + 0, & \text{if} \; i = 0 + \end{cases} + \end{equation*} + + The \source{} and \dest{} arguments must either be the same + symmetric address, or two different symmetric addresses + corresponding to buffers that do not overlap in memory. That is, + they must be completely overlapping or completely disjoint. + + Team-based scan routines operate over all \acp{PE} in the provided + team argument. All \acp{PE} in the provided team must participate in + the scan operation. If \VAR{team} compares equal to + \LibConstRef{SHMEM\_TEAM\_INVALID} or is otherwise invalid, the + behavior is undefined. 
+ + Before any \ac{PE} calls a scan routine, the \dest{} array on all + \acp{PE} participating in the operation must be ready to accept the + results of the operation. Otherwise, the behavior is undefined. + + Upon return from a scan routine, the following are true for the + local \ac{PE}: the \dest{} array is updated, and the \source{} array + may be safely reused. + + When the \Cstd translation environment does not support complex + types, an \openshmem implementation is not required to provide + support for these complex-typed interfaces. +} + +\apireturnvalues{ + Zero on successful local completion. Nonzero otherwise. +} + +\begin{apiexamples} + + \apicexample{ + In the following \Cstd[11] example, the \FUNC{collect\_at} + function gathers a variable amount of data from each \ac{PE} and + concatenates it, in order, at the target \ac{PE} \VAR{who}. Note + that this routine is behaviorally similar to + \FUNC{shmem\_collect}, except that this routine only gathers the + data to a single \ac{PE}. + } + {./example_code/shmem_scan_example.c} + {} + +\end{apiexamples} + +\end{apidefinition} diff --git a/content/shmem_team_config_t.tex b/content/shmem_team_config_t.tex index a82ce2c2..dd2ad01b 100644 --- a/content/shmem_team_config_t.tex +++ b/content/shmem_team_config_t.tex @@ -32,8 +32,8 @@ See Section~\ref{sec:ctx} for more on communication contexts and Section~\ref{subsec:shmem_team_create_ctx} for team-based context creation. - When using the configuration structure to create teams, a mask parameter - controls which fields may be accessed by the \openshmem library. + When passing a configuration structure to a team creation routine, the mask parameter + specifies which fields the application requests to associate with the new team. Any configuration parameter value that is not indicated in the mask will be ignored, and the default value will be used instead. Therefore, a program must set only the fields for which it does not want the default value. 
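Reviewer note on content/shmem_scan.tex: the inclusive and exclusive definitions given for shmem_sum_inscan and shmem_sum_exscan can be checked against a small serial model. The sketch below is not OpenSHMEM code and makes no library calls. The helpers model_sum_inscan and model_sum_exscan are hypothetical names introduced here, and each row of a row-major npes-by-nreduce array stands in for the symmetric array on one PE. It only illustrates the arithmetic, assuming the long element type; it says nothing about synchronization or how an implementation would compute the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical serial model (not an OpenSHMEM routine).
 * Row i of source/dest stands in for the array on PE i.
 * Inclusive scan: dest[i][j] = source[0][j] + ... + source[i][j]. */
void model_sum_inscan(long *dest, const long *source,
                      size_t npes, size_t nreduce)
{
    assert(dest != NULL && source != NULL);
    for (size_t j = 0; j < nreduce; j++) {
        long running = 0;
        for (size_t i = 0; i < npes; i++) {
            running += source[i * nreduce + j];
            dest[i * nreduce + j] = running;
        }
    }
}

/* Hypothetical serial model (not an OpenSHMEM routine).
 * Exclusive scan: dest[0][j] = 0 and, for i > 0,
 * dest[i][j] = source[0][j] + ... + source[i-1][j]. */
void model_sum_exscan(long *dest, const long *source,
                      size_t npes, size_t nreduce)
{
    assert(dest != NULL && source != NULL);
    for (size_t j = 0; j < nreduce; j++) {
        long running = 0;
        for (size_t i = 0; i < npes; i++) {
            /* Read before writing so that dest == source also works,
             * mirroring the "completely overlapping" case in the text. */
            long s = source[i * nreduce + j];
            dest[i * nreduce + j] = running;
            running += s;
        }
    }
}
```

With three modeled PEs contributing 1, 2, and 3 for a single element, the inclusive model yields 1, 3, 6 and the exclusive model yields 0, 1, 3. The exclusive result on each PE is the offset that the collect_at example uses when placing its data.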
diff --git a/content/shmem_team_get_config.tex b/content/shmem_team_get_config.tex index 6ee9d9ec..1dfb1839 100644 --- a/content/shmem_team_get_config.tex +++ b/content/shmem_team_get_config.tex @@ -21,10 +21,18 @@ \FUNC{shmem\_team\_get\_config} returns through the \VAR{config} argument the configuration parameters as described by the mask, which were assigned according to input configuration parameters when the team was created. +The output \VAR{config} argument indicates the \openshmem library's +parameter values at the time \FUNC{shmem\_team\_get\_config} is called. +These values may differ from the parameter values that were assigned at the +time of the team's creation. If \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID}, then no operation is performed. If \VAR{team} is otherwise invalid, the behavior is undefined. +If \VAR{config\_mask} is 0, then \FUNC{shmem\_team\_get\_config} performs no action +and \VAR{config} may or may not be a null pointer. +If \VAR{config} is a null pointer, then \VAR{config\_mask} must be 0; otherwise, +the behavior is undefined. } \apireturnvalues{ diff --git a/content/shmem_team_split_strided.tex b/content/shmem_team_split_strided.tex index d22a5ffd..59decede 100644 --- a/content/shmem_team_split_strided.tex +++ b/content/shmem_team_split_strided.tex @@ -50,10 +50,15 @@ i \in \mathbb{Z}_{size-1} \end{equation*} where $\mathbb{Z}$ is the set of natural numbers ($0, 1, \dots$), $N$ is the -number of \acp{PE} in the parent team and $size$ is a positive number indicating -the number of \acp{PE} in the new team. The index $i$ specifies the number of -the given PE in the new team. Thus, \acp{PE} in the new team remain in the same +number of \acp{PE} in the parent team, $size$ is a positive number indicating +the number of \acp{PE} in the new team, and $stride$ is an integer. +The index $i$ specifies the number of the given PE in the new team. 
+When $stride$ is greater than zero, \acp{PE} in the new team remain in the same relative order as in the parent team. +When $stride$ is less than zero, \acp{PE} in the new team are in \textit{reverse} +relative order with respect to the parent team. +If a $stride$ value equal to 0 is passed to \FUNC{shmem\_team\_split\_strided}, +then the $size$ argument passed must be 1, or the behavior is undefined. This routine must be called by all \acp{PE} in the parent team. All \acp{PE} must provide the same values for the \ac{PE} triplet. diff --git a/content/teams_intro.tex b/content/teams_intro.tex index 851d1c1e..cca6d01b 100644 --- a/content/teams_intro.tex +++ b/content/teams_intro.tex @@ -21,7 +21,7 @@ \subsubsection*{Predefined and Application-Defined Teams} portion of an application. Any team successfully created by a \FUNC{shmem\_team\_split\_*} routine is valid until it is destroyed. -All valid teams have a least one member. +All valid teams have at least one member. \subsubsection*{Team Handles} @@ -84,7 +84,7 @@ \subsubsection*{Team Creation} \acp{PE} in a newly created team are consecutively numbered starting with \ac{PE} number 0. \acp{PE} are ordered by their \ac{PE} number in -the parent team. Team relative \ac{PE} +the parent team. Team-relative \ac{PE} numbers can be used for point-to-point operations through team-based contexts (see Section~\ref{sec:ctx}) or using the translation routine \FUNC{shmem\_team\_translate\_pe}. 
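Reviewer note on content/shmem_team_split_strided.tex: the ordering rules for positive, negative, and zero stride can be illustrated with a small serial model. The defining equation is not visible in this hunk, so the sketch assumes that new-team PE i corresponds to parent PE start + i * stride, with no wraparound, and that every resulting number must lie in [0, N). The helper model_split_strided is a hypothetical name, not an OpenSHMEM routine. Where the text says the behavior is undefined (stride of 0 with size other than 1), the model simply returns -1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical serial model (not an OpenSHMEM routine).
 * Fills parent_pe[i] with the parent-team PE number assumed for
 * new-team PE i, namely start + i * stride (no wraparound assumed).
 * Returns 0 on success, or -1 if the triplet is rejected by this model;
 * parent_pe is left untouched on rejection. */
int model_split_strided(int start, int stride, int size, int n_parent,
                        int *parent_pe)
{
    assert(parent_pe != NULL);
    if (size < 1 || n_parent < 1) {
        return -1;
    }
    /* The text leaves stride == 0 with size != 1 undefined; reject it here. */
    if (stride == 0 && size != 1) {
        return -1;
    }
    /* The mapping is monotone in i, so checking both endpoints suffices. */
    long first = start;
    long last = (long)start + (long)(size - 1) * stride;
    if (first < 0 || first >= n_parent || last < 0 || last >= n_parent) {
        return -1;
    }
    for (int i = 0; i < size; i++) {
        parent_pe[i] = (int)((long)start + (long)i * stride);
    }
    return 0;
}
```

Under these assumptions, the triplet (0, 2, 4) on an 8-PE parent selects parent PEs 0, 2, 4, 6 in parent order, while (7, -2, 4) selects 7, 5, 3, 1, which is the reverse relative order described for a negative stride.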
diff --git a/example_code/shmem_ctx_session_example.c b/example_code/shmem_ctx_session_example.c new file mode 100644 index 00000000..8c96f49f --- /dev/null +++ b/example_code/shmem_ctx_session_example.c @@ -0,0 +1,51 @@ +#include <shmem.h> +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> + +#define N_UPDATES (1lu << 18) +#define N_INDICES (1lu << 10) +#define N_VALUES (1lu << 31) + +int main(void) { + + shmem_init(); + + uint64_t *table = shmem_calloc(N_INDICES, sizeof(uint64_t)); + + int mype = shmem_my_pe(); + int npes = shmem_n_pes(); + srand(mype); + + shmem_ctx_t ctx; + int ret = shmem_ctx_create(0, &ctx); + if (ret != 0) { + printf("%d: Error creating context (%d)\n", mype, ret); + shmem_global_exit(1); + } + + shmem_ctx_session_config_t config; + long config_mask; + config.total_ops = N_UPDATES; + config_mask = SHMEM_CTX_SESSION_TOTAL_OPS; + + shmem_ctx_session_start(ctx, SHMEM_CTX_SESSION_BATCH, &config, config_mask); + + for (size_t i = 0; i < N_UPDATES; i++) { + int random_pe = rand() % npes; + size_t random_idx = rand() % N_INDICES; + uint64_t random_val = rand() % N_VALUES; + shmem_ctx_uint64_atomic_xor(ctx, &table[random_idx], random_val, random_pe); + } + + shmem_ctx_session_stop(ctx); + shmem_ctx_quiet(ctx); /* shmem_ctx_session_stop() does not quiet the context. */ + shmem_sync_all(); /* shmem_ctx_session_stop() does not synchronize. */ + + /* At this point, it is safe to check and/or validate the table result... 
*/ + + shmem_ctx_destroy(ctx); + shmem_free(table); + shmem_finalize(); + return 0; +} diff --git a/example_code/shmem_scan_example.c b/example_code/shmem_scan_example.c new file mode 100644 index 00000000..12a13090 --- /dev/null +++ b/example_code/shmem_scan_example.c @@ -0,0 +1,12 @@ +#include <shmem.h> + +int collect_at(shmem_team_t team, void *dest, const void *source, size_t nbytes, int who) { + static size_t sym_nbytes; + sym_nbytes = nbytes; + shmem_team_sync(team); + int rc = shmem_sum_exscan(team, &sym_nbytes, &sym_nbytes, 1); + shmem_putmem((void *)((uintptr_t)dest + sym_nbytes), source, nbytes, who); + shmem_quiet(); + shmem_team_sync(team); + return rc; +} diff --git a/main_spec.tex b/main_spec.tex index 23e2d624..cfc3f8ae 100644 --- a/main_spec.tex +++ b/main_spec.tex @@ -92,6 +92,10 @@ \subsubsection{\textbf{SHMEM\_QUERY\_THREAD}} \label{subsec:shmem_query_thread} \input{content/shmem_query_thread} +\subsubsection{\textbf{SHMEM\_QUERY\_INITIALIZED}} +\label{subsec:shmem_query_initialized} +\input{content/shmem_query_initialized} + \subsection{Memory Management Routines} \label{sec:memory_management} @@ -166,7 +170,6 @@ \subsubsection{\textbf{SHMEM\_CTX\_GET\_TEAM}} \label{subsec:shmem_ctx_get_team} \input{content/shmem_ctx_get_team.tex} - \subsection{Remote Memory Access Routines}\label{sec:rma} \input{content/rma_intro.tex} @@ -378,6 +381,18 @@ \subsubsection{\textbf{SHMEM\_SIGNAL\_SET}}\label{subsec:shmem_signal_set} \input{content/shmem_signal_set.tex} +\subsection{Session Routines}\label{subsec:sessions} +\input{content/sessions_intro.tex} + +\subsubsection{\textbf{SHMEM\_CTX\_SESSION\_CONFIG\_T}}\label{subsec:shmem_ctx_session_config_t} +\input{content/shmem_ctx_session_config_t.tex} + +\subsubsection{\textbf{SHMEM\_CTX\_SESSION\_START}}\label{subsec:shmem_ctx_session_start} +\input{content/shmem_ctx_session_start.tex} + +\subsubsection{\textbf{SHMEM\_CTX\_SESSION\_STOP}}\label{subsec:shmem_ctx_session_stop} +\input{content/shmem_ctx_session_stop.tex} + 
\subsection{Collective Routines}\label{subsec:coll} \input{content/collective_intro.tex} @@ -409,6 +424,9 @@ \subsubsection{\textbf{SHMEM\_COLLECT, SHMEM\_FCOLLECT}}\label{subsec:shmem_coll \subsubsection{\textbf{SHMEM\_REDUCTIONS}}\label{subsec:shmem_reductions} \input{content/shmem_reductions.tex} +\subsubsection{\textbf{SHMEM\_SCAN}}\label{subsec:shmem_scan} +\input{content/shmem_scan.tex} + diff --git a/utils/defs.tex b/utils/defs.tex index 5fd1e6ac..9d2bdb64 100644 --- a/utils/defs.tex +++ b/utils/defs.tex @@ -371,6 +371,32 @@ \end{tabular}\\ } +\newcommand{\widetablerow}[2]{ + \begin{tabular}{p{6cm} p{8cm}} + #1 & #2 \tabularnewline + \end{tabular}\\ +} + +\newcommand{\sessiontablebegin} { +\begin{table}[h!] +\hspace{-1.0cm} +\begin{tabular}{|p{5.6cm}|p{12cm}|} + \hline + \textbf{Option} & \textbf{Usage hint} + \tabularnewline \hline +} + +\newcommand{\sessiontableend} { +\end{tabular} +\TableCaptionRef{Session options} +\label{session_opts} +\end{table} +} + +\newcommand{\sessiontablerow}[2]{ + #1 & #2 \tabularnewline +} + \newcommand{\apinotes}[1]{ \item[Notes] \hfill \\ #1