forked from jmrosinski/GPTL
-
Notifications
You must be signed in to change notification settings - Fork 0
General Purpose Timing Library
License
jedwards4b/GPTL
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This file contains information about using GPTL. For information on building and installing GPTL, see the file INSTALL. GPTL is the "General Purpose Timing Library". It can be used to manually instrument application codes with an arbitrary set of "regions" (or "timers") over which statistics such as wallclock time and CPU time are gathered and subsequently printed. If the target application is built with the GNU compilers (gcc or gfortran), Pathscale (pathcc or pathf95), or PGI compilers, GPTL can also be used to automatically instrument regions which are defined by function entry and exit points. This is an easy way to generate a dynamic call tree. See Auto-Instrumentation below for a description of how to use this feature. Similar to compiler-generated auto-instrumentation, GPTL can intercept and auto-profile MPI calls made by the application if the target MPI library supports the PMPI profiling layer. In this case an estimate of bytes transferred by each MPI call is presented in the printed output. If the PAPI library is installed (http://icl.cs.utk.edu/papi), GPTL also provides a convenient mechanism to access all available PAPI events. In addtion to PAPI preset and native events, GPTL defines derived events which are based on PAPI counters. See gptl.h for a list of available derived events. Of course these events can only be enabled if the PAPI counters they require are available on the target architecture. Using GPTL ---------- C codes making GPTL library calls should #include <gptl.h>. Fortran codes can "use gptl" or #include or Fortran include 'gptl.inc'. The C and Fortran interfaces are identical, except that the C interface uses mixed case. All user-accessible functions return either 0 (success) or -1 (failure). Example codes that use the library can be found in subdirectories ctests/ and ftests/. Code instrumentation to utilize GPTL involves zero or more calls to GPTLsetoption(), then a single call to GPTLinitialize(), then an arbitrary sequence of calls to GPTLstart() and GPTLstop(), and finally a call to GPTLpr() or GPTLpr_file(). See "Example" below for a sample calling sequence. Calls to GPTLstart() and GPTLstop() are thread-safe, with per-thread statistics printed by GPTLpr() or GPTLpr_file(). The purpose of GPTLsetoption() is to enable or disable various library options. For example, to enable the PAPI counter for total cycles, do this: ret = GPTLsetoption (PAPI_TOT_CYC, 1); The "1" says "enable". Use "0" for "disable". See the man pages for complete documentation on function usage and arguments. The list of available GPTL options is contained in gptl.h, and a complete list of available PAPI-based events can be found by running "ctests/avail". GPTLinitialize() initializes the GPTL library. There can be an arbitrary number of start/stop pairs before GPTLpr() or GPTLpr_file() is called to print the results. And an arbitrary amount of nesting of regions is also allowed. The printed results will be indented to indicate the level of nesting for each region. GPTLpr() prints the results to a file named timing.<number>, where the single argument <number> is an integer. For MPI jobs, it is most convenient to use the MPI rank of the invoking task for <number>. Equivalently, function GPTLpr_file() can be called. Its input argument is a character string indicating the output file name to be written. It is up to the user to ensure that these print functions write to uniquely-named files, in order to avoid name-space collisions. GPTLfinalize() can be called to clean up the GPTL environment. All space malloc'ed by the GPTL library will be freed by this call. Example ------- From "man GPTLstart", a simple example calling sequence to time a couple of code regions and print the results is: (void) GPTLsetoption (GPTLcpu, 1); /* enable cpu timings */ (void) GPTLsetoption (GPTLwall, 0); /* disable wallclock timings */ (void) GPTLsetoption (PAPI_TOT_CYC, 1); /* enable counting of total cycles */ ... (void) GPTLinitialize(); /* initialize the GPTL library */ (void) GPTLstart ("total"); /* start a timer */ ... (void) GPTLstart ("do_work"); /* start another timer */ do_work(); /* do some work */ (void) GPTLstop ("do_work"); /* stop a timer */ (void) GPTLstop ("total"); /* stop a timer */ ... (void) GPTLpr (mympitaskid); /* print the results to timing.<mympitaskid> */ Auto-instrumentation -------------------- If the regions to be timed are defined by function entry and exit points, and the application to be profiled is built with either the GNU or Pathscale compilers, you might find it convenient to use the auto-instrumentation feature of GPTL. Here's how: 1) Add the flag -finstrument-functions (-Minstrument:functions under PGI) when compiling the routines you'd like to profile. 2) Add calls to GPTLsetoption() (if desired), and GPTLinitialize() to the main program before any other routines are invoked. 3) Add a call to GPTLpr() or GPTLpr_file() wherever appropriate prior to where the code terminates. 4) Link with -lgptl (and -lpapi if PAPI is enabled). 5) Run the code. 6) Run "hex2name.pl <a.out> <timing.0> | less", where <a.out> is the name of the executable, and <timing.0> is the name of the timing file to be converted. The result should be a dynamic call tree with timings and (if enabled) PAPI counts and derived event statistics for each region, where regions are defined by function entry and exit points. Here's what's happening under the covers: The -finstrument-functions flag tells the compiler to insert calls to __cyg_profile_func_enter (void *this_fn, void *call_site) at function start, and __cyg_profile_func_exit (void *this_fn, void *call_site) at function exit. GPTL defines these functions as calls to (effectively) GPTLstart() and GPTLstop(), where the address of the function is used as the input sentinel to these routines. Running hex2name.pl converts the function addresses back to human-readable function names. It uses the UNIX "nm" utility to do this. When using MPI auto-profiling, steps 2) and 3) above can be omitted. In this case GPTL auto-generates calls to GPTLinitialize and GPTLpr from MPI_Init and MPI_finalize, respectively. Multi-processor instrumented codes ---------------------------------- With rev. 4.3 of GPTL, function GPTLpr_summary(mpi_communicator) was rewritten from scratch for scalability and the presentation of additional statistical information. Max, min, mean, and standard deviation of region timings, along with the process and thread index responsible for max and min, are presented in a single output file named timing.summary. With this rewrite, this is now the preferred method (over parsegptlout.pl) for gathering summary statistics across threads and tasks. See example3 in the web documentation for further information.
About
General Purpose Timing Library
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published
Languages
- C 56.0%
- Shell 27.8%
- Fortran 13.6%
- Perl 2.4%
- C++ 0.2%