LDMS Data Facilitates Analysis
IO wait profiles from LDMS data for a 64-node job, sampled at a 1 sec interval (top) and a 60 sec interval (bottom). Higher-fidelity sampling is needed to resolve details. Each line is a single node’s data (legend suppressed). The gray background shows times pre- and post-job. From Toward Rapid Understanding of Production HPC Applications and Systems @ IEEE Cluster 2015.
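The effect of the sampling interval can be illustrated with a small synthetic example. This is only a sketch: the series below is made up, not LDMS data, and the helper names `point_sample` and `window_means` are hypothetical. It compares a 1 sec series against two 60 sec views of it: one that reads an instantaneous value once per minute, and one that averages each minute (roughly what differencing a cumulative counter at a coarse interval would give).

```python
def point_sample(samples, step):
    """Keep every `step`-th sample, as a coarser instantaneous sampler would see."""
    return samples[::step]


def window_means(samples, step):
    """Average each consecutive block of `step` samples."""
    return [
        sum(samples[i:i + step]) / len(samples[i:i + step])
        for i in range(0, len(samples), step)
    ]


# Synthetic 1 sec IO wait series (percent) over 5 minutes:
# quiet at 1% except for a 10 sec burst at 80%. Values are invented.
one_sec = [1.0] * 300
for t in range(125, 135):
    one_sec[t] = 80.0

sixty_sec_points = point_sample(one_sec, 60)
sixty_sec_means = window_means(one_sec, 60)

print(max(one_sec))           # burst visible at full height
print(max(sixty_sec_points))  # burst falls between samples and is missed
print(max(sixty_sec_means))   # burst is smeared into a one-minute average
```

In this synthetic case the burst is either missed entirely or reduced to a fraction of its true height at the coarse interval, which is the kind of detail loss the figure shows.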
Full-system data enables investigation of conditions whose effects cannot be understood from the data visible to an application alone. For example, in shared networks, conditions along the communication routes affect the application's performance, but data about those routes is not available from within the application's allocation.
In order to get a coherent picture of conditions, the data must be collected at effectively the same time across possibly tens of thousands of disparate components.
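One common way to collect data at effectively the same time on many components is to align each sampler to wall-clock boundaries rather than to its own start time. The sketch below shows that general idea only; the function name `next_aligned` and its parameters are assumptions for illustration, not LDMS's implementation, and it presumes the nodes' clocks are already synchronized.

```python
def next_aligned(now_us, interval_us, offset_us=0):
    """Return the first time strictly after `now_us` of the form
    k * interval_us + offset_us for an integer k (all in microseconds)."""
    k = (now_us - offset_us) // interval_us + 1
    return k * interval_us + offset_us


# Two nodes that wake up at different moments within the same second
# still schedule their next sample for the same wall-clock instant.
node_a_now = 1_000_250_000
node_b_now = 1_000_900_000
interval = 1_000_000  # 1 sec

print(next_aligned(node_a_now, interval))
print(next_aligned(node_b_now, interval))
```

Because the target time depends only on the shared clock, interval, and offset, samples from different nodes can be compared by timestamp without interpolation.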
Lustre opens per node over a day on NCSA's Blue Waters. The arrow indicates a significant number of opens occurring across the system at the same time. Horizontal lines indicate a significant and sustained level of opens from a few nodes. From The Lightweight Distributed Metric Service: A Scalable Infrastructure for Continuous Monitoring of Large Scale Computing Systems and Applications @ SC 2014.
Run-time data collection and transport enable analysis while applications are running and while the system is experiencing conditions of interest. Thus, problems can be discovered early and remedial action can be taken. Post-processing analysis does not solve problems as they occur and is rarely performed in practice. LDMS supports streaming analysis as part of the store plugins or on the output of the store (e.g., a store feeding a named pipe). Analysis can also be performed on a database while LDMS data is being fed into it.
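Analysis on the output of a store can be sketched as a consumer that reads rows as they arrive and reacts immediately. The example below is a minimal illustration under stated assumptions: the three-column layout (time, node, value), the `#` header convention, the node names, and the function `flag_high_values` are all hypothetical, not the actual store output format. An in-memory stream stands in for the pipe; a real consumer would pass an opened named pipe instead.

```python
import csv
import io


def flag_high_values(stream, threshold):
    """Yield (timestamp, node, value) for each row whose value exceeds
    `threshold`, processing rows one at a time as they are read."""
    for row in csv.reader(stream):
        # Skip blank lines and header/comment lines (assumed to start with '#').
        if not row or row[0].startswith("#"):
            continue
        timestamp, node, value = float(row[0]), row[1], float(row[2])
        if value > threshold:
            yield timestamp, node, value


# Stand-in for a pipe fed by a store; the rows are invented.
demo = io.StringIO(
    "#Time,node,opens\n"
    "1.0,nid0001,3\n"
    "1.0,nid0002,250\n"
    "2.0,nid0001,4\n"
)

alerts = list(flag_high_values(demo, threshold=100))
print(alerts)
```

Because the function is a generator over the stream, each alert is produced as soon as its row is read, so the same code works on a live pipe without waiting for the job to finish.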