******************************************************************************
GPI-2
http://www.gpi-site.com
Version: 1.5.1
Copyright (C) 2013-2021
Fraunhofer ITWM
******************************************************************************
1. INTRODUCTION
===============
GPI-2 is the second generation of GPI (www.gpi-site.com). GPI-2
implements the GASPI specification (www.gaspi.de), an API
specification which originates from the ideas and concepts of GPI.
GPI-2 is an API for asynchronous communication. It provides a
flexible, scalable and fault-tolerant interface for parallel
applications.
2. INSTALLATION
===============
Requirements:
------------
The current version of GPI-2 has the following requirements.
Software:
- libibverbs v1.1.6 (Verbs library from OFED) if running on Infiniband.
- SSH server running on the compute nodes (passwordless login required).
- autotools utilities (autoconf >= 2.63, libtool >= 2.2, automake >= 1.11).
- gawk (GNU Awk) and sed utilities.
Hardware:
- Infiniband/RoCE device or Ethernet device.
Basic configuration:
-------------------
If GPI-2 is cloned from the repository, it is necessary to generate
the files and scripts required for its configuration. This is achieved
by the command line:
`./autogen.sh`
After this step, the configuration is done using the script
`./configure`. The available options and the relevant environment
variables are printed by `./configure --help`. The basic
configuration:
`./configure --prefix=$HOME/local`
uses the compilers defined by the environment variables CC and FC for
the general checking procedure and sets up `$HOME/local` as the
installation directory. By default, the script:
- checks for the Infiniband header and library files, and falls back
to the Ethernet device if they are not available or usable,
- targets the production, debug and statistics libraries (both
static and shared), as well as the Fortran modules (if a Fortran
compiler is found),
- configures GPI-2 to use PBS as the batch system,
- checks for `doxygen` and `dot`, which are needed by the documentation target.
Compilation, testing and cleaning:
----------------------------------
The compilation step:
`make -j$NPROC`
builds the GPI-2 libraries, the Fortran modules, and the test and
microbenchmark binaries in parallel. After successful completion, the
user can define the working hosts in `tests/machines` and run the
predefined tests with:
`make check`
or pass the working hosts through an environment variable, e.g.:
`GPI2_RUNTEST_OPTIONS="-m ~/my_machines" make check`
(see more options in [tests usage](tests/README)).
As usual, `make clean` removes the compilation files and
`make distclean` additionally removes the configuration files.
Documentation and tutorial:
--------------------------
If required, the (doxygen) documentation and the tutorial code are
built through `make docs` and `make tutorial`, respectively.
Installation and uninstallation:
-------------------------------
Finally, `make install` installs:
- the running scripts in the `$HOME/local/bin` directory,
- the shared and static libraries in `$HOME/local/lib64`,
- the headers and Fortran modules in `$HOME/local/include`,
- the full tests directory in `$HOME/local/tests`
As usual, the directory containing the GPI-2 shared libraries needs to
be added to the `LD_LIBRARY_PATH` environment variable:
`export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/local/lib64`
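One subtlety of the append form above: if `LD_LIBRARY_PATH` is unset or empty, `$LD_LIBRARY_PATH:$HOME/local/lib64` expands to a value with a leading colon, and the glibc dynamic loader treats such an empty entry as the current working directory. The C sketch below shows joining logic that avoids this; `append_library_dir` is a hypothetical helper, not part of GPI-2. In a POSIX shell the equivalent is `${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$HOME/local/lib64`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GPI-2): write "<old>:<dir>" into buf,
 * or just "<dir>" when old is NULL or empty, so that the resulting
 * LD_LIBRARY_PATH value never contains an empty (current-directory) entry.
 * Returns 0 on success, -1 if buf is too small (buf is left unchanged). */
int append_library_dir(char *buf, size_t bufsize,
                       const char *old, const char *dir)
{
    size_t old_len = (old != NULL) ? strlen(old) : 0;
    size_t dir_len = strlen(dir);
    size_t sep = (old_len > 0) ? 1 : 0;  /* ':' only between two parts */

    /* Room for both parts, the separator and the terminating '\0'. */
    if (old_len + sep + dir_len + 1 > bufsize) {
        return -1;
    }
    if (old_len > 0) {
        memcpy(buf, old, old_len);
        buf[old_len] = ':';
    }
    /* Copy dir including its '\0' terminator. */
    memcpy(buf + old_len + sep, dir, dir_len + 1);
    return 0;
}
```

A caller would typically pass `getenv("LD_LIBRARY_PATH")` as `old` and then export the result with the POSIX `setenv` function before starting the application.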
If required, the package can be removed from the target directory with
`make uninstall`.
Custom configurations:
----------------------
Specific configurations can be set up with the following configure flags.
### DEVICES ###
GPI-2 is intended to be linked against libibverbs from the OFED stack.
If the configure script cannot find it in the default paths of the
host system, the user can pass the path of the OFED installation:
`./configure --with-infiniband=<full_path_to_ofed>`
By default, GPI-2 is compiled without Infiniband Extensions support;
the user can enable it (if the header file is found) with
`--enable-infiniband-ext`. Note, however, that this is currently an
experimental feature.
On the other hand, GPI-2 can be installed on a system without
Infiniband, using standard TCP sockets:
`./configure --with-ethernet`
This support is, however, primarily targeted at developing GPI-2
applications without access to a system with Infiniband, with less
focus on performance.
### BATCH SYSTEM ###
PBS is the default batch system of GPI-2; however, the user can
configure it with Slurm support:
`./configure --with-slurm`
or LoadLeveler support:
`./configure --with-loadleveler`
### MPI Interoperability ###
If GPI-2 is to be used together with MPI, for instance to start an
incremental port of a large application or to use libraries that
require MPI, the user can enable MPI interoperability in several ways:
- checking for MPI in the standard path: `./configure --with-mpi`
- checking for MPI in a specific path, e.g.: `./configure --with-mpi=<path_to_mpi_installation>`
- specifying the MPI compilers, e.g.: `CC=mpicc FC=mpif90 ./configure`
For this MPI+GPI-2 mixed mode, the only constraint is that MPI_Init()
must be invoked before gaspi_proc_init(), and it is assumed that the
application is started with mpirun (or mpiexec, etc.). Note also that
this option requires the GPI-2 application to be linked against the MPI
library (even if MPI is not used). Therefore, if only GPI-2 is to be
used, GPI-2 must not be built with this option.
Finer control of MPI can be achieved through the
`--with-mpi-extra-flags` option. For example, to configure with the
Intel MPI compilers and link against the thread-safe version of the
Intel MPI Library:
`CC=mpiicc FC=mpiifort ./configure --with-mpi-extra-flags=-mt_mpi`
### GPU/CUDA interoperability ###
GPI-2 allows direct data transfer between NVIDIA GPUs through a Mellanox
HCA and the GPUDirect RDMA API. To this end, the system must satisfy
the following requirements:
- InfiniBand or RoCE adapter cards with Mellanox ConnectX-4 (or later)
technology,
- Kepler, Tesla or Quadro GPUs,
- NVIDIA software components (CUDA 5.0 or above),
- a properly loaded GPUDirect kernel module on each of the compute
nodes (this can be verified with `service nv_peer_mem status` or
`lsmod | grep nv_peer_mem`).
No special configuration or compilation setup of GPI-2, and no special
GASPI/GPI-2 functions, are needed to use GPUs and/or GPUDirect RDMA.
The user just needs to allocate the memory segments and buffers
properly on the host(s)/device(s) using the GPI-2 and CUDA APIs.
Specific considerations about memory management and the general design
of applications using GPUDirect RDMA can be found at
[https://docs.nvidia.com/cuda/gpudirect-rdma/index.html].
3. BUILDING GPI-2 APPLICATIONS
==============================
By default, GPI-2 provides two libraries: libGPI2.a and libGPI2-dbg.a,
and their corresponding shared versions: libGPI2.so and libGPI2-dbg.so.
The libGPI2.* libraries aim at high performance and are to be used in
production, whereas libGPI2-dbg.* provides a debug version, with extra
parameter checking and debug messages, to be used for debugging and
during development.
4. RUNNING GPI-2 APPLICATIONS
=============================
The gaspi_run utility is used to start and run GPI-2
applications. A machine file with the hostnames of the nodes where the
application will run must be provided.
For example, to start 1 process per node (on 4 nodes), the machine
file looks like:
    node01
    node02
    node03
    node04
Similarly, to start 2 processes per node (on 4 nodes):
    node01
    node01
    node02
    node02
    node03
    node03
    node04
    node04
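Since the machine file simply lists one hostname per process, it can also be generated programmatically. The C sketch below reproduces the two layouts above by writing each host `procs_per_node` times in a row; `format_machinefile` is a hypothetical helper, not a GPI-2 API, and the caller is assumed to write the buffer to a file that is then passed to gaspi_run with `-m`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GPI-2): format a gaspi_run machine file
 * in which every host is listed procs_per_node times in a row, one line
 * per process. The result is written into buf (always NUL-terminated).
 * Returns the number of lines written, or -1 if buf is too small. */
int format_machinefile(char *buf, size_t bufsize,
                       const char *const hosts[], size_t nhosts,
                       unsigned procs_per_node)
{
    size_t used = 0;
    int lines = 0;

    if (bufsize == 0) {
        return -1;
    }
    buf[0] = '\0';

    for (size_t h = 0; h < nhosts; h++) {
        size_t len = strlen(hosts[h]);
        for (unsigned p = 0; p < procs_per_node; p++) {
            /* Need room for the hostname, '\n' and the final '\0'. */
            if (used + len + 2 > bufsize) {
                return -1;
            }
            memcpy(buf + used, hosts[h], len);
            used += len;
            buf[used++] = '\n';
            buf[used] = '\0';
            lines++;
        }
    }
    return lines;
}
```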
The gaspi_run utility is invoked as follows:
    gaspi_run -m <machinefile> [OPTIONS] <path GASPI program>
IMPORTANT: The path to the program must exist on all nodes where the
program should be started.
The gaspi_run utility has the following further options [OPTIONS]:
    -b <binary file>  Use a different binary for the first node (master).
                      The master (first entry in the machine file) is
                      started with a different application than the rest
                      of the nodes (workers).
    -N                Enable NUMA for processes on the same node. With
                      this option it is only possible to start the same
                      number of processes as there are NUMA nodes on the
                      system. The processes running on the same node are
                      given affinity to the proper NUMA node.
    -n <procs>        Start only <procs> processes from the machine file.
                      This option is used to start fewer processes than
                      are listed in the machine file.
    -d                Run with GDB (debugger) on the master node. With
                      this option, GDB is started on the master node to
                      allow debugging the application.
    -p                Ping hosts before starting the binary to make sure
                      they are available.
    -h                Show help.
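Because `-N` ties the number of processes on a node to the number of NUMA nodes, it can help to check a machine file before launching. The C sketch below is a hypothetical helper, not part of gaspi_run: it finds the largest number of entries any single host has and, assuming the rule means "no more processes per host than NUMA nodes", compares that against a NUMA node count supplied by the caller (on Linux this count could, for example, be derived from `/sys/devices/system/node`).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given the machine-file entries (one hostname per
 * process), return the largest number of processes placed on one host.
 * Quadratic, which is fine for machine files of modest size. */
size_t max_procs_per_host(const char *const entries[], size_t n)
{
    size_t best = 0;

    for (size_t i = 0; i < n; i++) {
        size_t count = 0;
        for (size_t j = 0; j < n; j++) {
            if (strcmp(entries[i], entries[j]) == 0) {
                count++;
            }
        }
        if (count > best) {
            best = count;
        }
    }
    return best;
}

/* Assumption: with -N, no host may run more processes than it has NUMA
 * nodes. Returns 1 if the machine file satisfies that rule, 0 otherwise. */
int fits_numa_mode(const char *const entries[], size_t n, size_t numa_nodes)
{
    return max_procs_per_host(entries, n) <= numa_nodes;
}
```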
Non-interactive usage
---------------------
If gaspi_run is used within a batch system, a machine file must still
be provided. In general, the information required to set up such a
file can be obtained from environment variables defined by the job
scheduler. The directory docs/batch_examples includes sample
scripts for setting up the machine file and submitting jobs to common
batch processing systems. They can be used as a starting point for
more elaborate applications and particular environments.
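As an illustration, PBS exports `PBS_NODEFILE`, the path of a file that typically lists each allocated host once per requested slot. To start one process per node from such a list, the duplicates have to be removed. The C sketch below keeps the first occurrence of each hostname in that text; `unique_hosts` is a hypothetical helper, not taken from docs/batch_examples, and the caller is assumed to read the file named by `PBS_NODEFILE` into memory and write the result out as the machine file.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if name[0..len) already appears as a complete line in list. */
static int has_line(const char *list, const char *name, size_t len)
{
    const char *p = list;

    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t l = eol ? (size_t)(eol - p) : strlen(p);
        if (l == len && strncmp(p, name, len) == 0) {
            return 1;
        }
        if (eol == NULL) {
            break;
        }
        p = eol + 1;
    }
    return 0;
}

/* Hypothetical helper: copy the hostnames in text (one per line, "\n" or
 * "\r\n" endings) into out, keeping only the first occurrence of each and
 * skipping blank lines. Returns the number of unique hosts, or -1 if out
 * is too small. */
int unique_hosts(const char *text, char *out, size_t outsize)
{
    size_t used = 0;
    int count = 0;
    const char *p = text;

    if (outsize == 0) {
        return -1;
    }
    out[0] = '\0';

    while (*p != '\0') {
        size_t len = strcspn(p, "\r\n");  /* length of this hostname */
        if (len > 0 && !has_line(out, p, len)) {
            if (used + len + 2 > outsize) {
                return -1;
            }
            memcpy(out + used, p, len);
            used += len;
            out[used++] = '\n';
            out[used] = '\0';
            count++;
        }
        p += len;
        while (*p == '\r' || *p == '\n') {  /* skip line terminators */
            p++;
        }
    }
    return count;
}
```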
5. THE GASPI_LOGGER
===================
The gaspi_logger utility is used to view and separate the output from
all nodes when the function gaspi_printf is called. The gaspi_logger
is started in a separate session on the master node. Application
output produced with gaspi_printf is redirected to the gaspi_logger;
output from other I/O routines (e.g. printf) is not.
A further separation of output (useful for debugging) can be achieved
with the routine gaspi_printf_to, which sends the output to the
gaspi_logger started on a particular node. For example,
    gaspi_printf_to(1, "Hello 1\n");
will display the string "Hello 1" in the gaspi_logger started on rank
1.
6. TROUBLESHOOTING AND KNOWN ISSUES
===================================
If there are problems when building GPI-2 with Infiniband support,
make sure the OFED stack is correctly installed and running. As
mentioned above, the OFED path of the host system can be passed to
the configure script.
When installing GPI-2 with MPI mixed-mode support (using the options
`--with-mpi` or `--with-mpi=<path_to_mpi_installation>`), if the
installation fails while building the tests because of missing
libraries, try setting the MPI compilers (wrappers) directly through
the environment variables CC and FC.
Environment variables
---------------------
You might run into trouble when your application requires some
dynamically set environment settings (e.g. LD_LIBRARY_PATH), for
instance through the module system of your batch system. Currently,
neither gaspi_run nor the GPI-2 library takes care of such environment
settings. There are two workarounds for this situation:
i) set the required environment variables in your shell
initialization file (e.g. ~/.bashrc).
ii) create an executable shell script which sets the required
environment variables and then starts the application. Then use
gaspi_run to start the application, providing the shell script as
the application to execute:
    gaspi_run -m machinefile ./my_wrapper_script.sh
where my_wrapper_script.sh contains:
    #!/bin/sh
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_my_lib>
    <path_to_my_application>/my_application <my_app_args>
    exit $?
If you're running in MPI mixed-mode, starting your application with
mpirun/mpiexec, this should not be an issue.
7. UPCOMING FEATURES
====================
GPI-2 is ongoing work and more features are still to come. Here are
some that are on our roadmap:
- support for adding spare nodes (fault tolerance)
- better debugging possibilities
8. MORE INFORMATION
====================
For more information, check the GPI-2 website (www.gpi-site.com) and
don't forget to subscribe to the GPI-2 mailing list. You can subscribe
at https://listserv.itwm.fraunhofer.de/mailman/listinfo/gpi2-users