MPIO support for Basis classes #274

Merged: 53 commits, Mar 19, 2024

dreamer2368 commented Feb 17, 2024

This PR adds support for HDF5's MPI-IO file driver (MPIO), enabling multiple MPI processes to read and write a single file.

Currently, I/O supports only 'file-per-processor' parallel I/O, where each processor reads/writes its own file. While this keeps I/O scalable in most cases, it fixes the number of processors for the entire workflow, from basis generation through every later use of the basis.
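To illustrate why file-per-processor output ties later runs to the same processor count, here is a minimal sketch. The helper names `rankFileName` and `filesWritten` and the zero-padded `.%06d` suffix are assumptions made for illustration, not taken from the library.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical naming scheme: one file per MPI rank, e.g. "basis.000003".
// The suffix format is an assumption for illustration only.
std::string rankFileName(const std::string& base, int rank)
{
    char suffix[16];
    std::snprintf(suffix, sizeof(suffix), ".%06d", rank);
    return base + suffix;
}

// Names of the files a run with `num_procs` ranks would leave on disk.
std::vector<std::string> filesWritten(const std::string& base, int num_procs)
{
    std::vector<std::string> files;
    for (int rank = 0; rank < num_procs; ++rank)
        files.push_back(rankFileName(base, rank));
    return files;
}
```

Under this scheme, a basis written with 4 ranks yields 4 files. A later run with 8 ranks would look for suffixes `.000004` to `.000007`, which were never written, and the rows stored per file would not match the new partition either.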

Single-file I/O with multiple processors can, of course, incur significant overhead. The HDF5 library has optimized this to some extent, however: as long as the file system is set up appropriately (e.g. Lustre stripe count), parallel I/O on a single file can maintain a reasonable time cost (from HDF5 group post).
Used carefully, this option gives users the freedom to change the number of processors throughout their own basis generation/use workflow.

Implemented items:

  • HDFDatabaseMPIO class
  • Deploying HDFDatabaseMPIO as an option in BasisGenerator, BasisReader, and BasisWriter
  • (optional) removing the concept of time interval in basis data formats.

Scalability comparison between HDFDatabase (Base) and HDFDatabaseMPIO (MPIO) for a 5000x1000 snapshot matrix:

  • BasisGenerator::endSamples (SVD + basis writing)

| Number of processors | Base | MPIO |
|---|---|---|
| 1 | 10352 ms | 10253 ms |
| 2 | 2131 ms | 2150 ms |
| 4 | 1548 ms | 1605 ms |
| 8 | 1566 ms | 1524 ms |

  • BasisGenerator::writeSnapshots (snapshot writing only)

| Number of processors | Base | MPIO |
|---|---|---|
| 1 | 232 ms | 246 ms |
| 2 | 213 ms | 200 ms |
| 4 | 171 ms | 175 ms |
| 8 | 198 ms | 178 ms |

The comparison was run on an LC Quartz debug node with a Lustre stripe count of 36.
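To put the two columns side by side, the sketch below computes the relative difference of MPIO against Base from the timings reported in this comment. The `Timing` struct and the helper names are illustrative only; the numbers are copied from the comparison above.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

// One row of the comparison: processor count and wall time in milliseconds.
struct Timing {
    int procs;
    double base_ms;
    double mpio_ms;
};

// Timings copied from the comparison above.
const std::array<Timing, 4> kEndSamples = {{
    {1, 10352.0, 10253.0},
    {2, 2131.0, 2150.0},
    {4, 1548.0, 1605.0},
    {8, 1566.0, 1524.0},
}};

const std::array<Timing, 4> kWriteSnapshots = {{
    {1, 232.0, 246.0},
    {2, 213.0, 200.0},
    {4, 171.0, 175.0},
    {8, 198.0, 178.0},
}};

// Relative difference of MPIO against Base, in percent.
// Positive means MPIO was slower, negative means MPIO was faster.
double overheadPercent(const Timing& t)
{
    return 100.0 * (t.mpio_ms - t.base_ms) / t.base_ms;
}

// Largest magnitude of the relative difference over all rows.
double maxAbsOverheadPercent(const std::array<Timing, 4>& rows)
{
    double worst = 0.0;
    for (const Timing& t : rows)
        worst = std::max(worst, std::fabs(overheadPercent(t)));
    return worst;
}
```

On these numbers the two databases stay within about 4% of each other for endSamples and within about 10% for writeSnapshots, with the sign of the difference changing from row to row.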

dreamer2368 changed the title from "Mpi io" to "MPIO support for Basis classes and data format update" Feb 17, 2024
dreamer2368 changed the title from "MPIO support for Basis classes and data format update" to "MPIO support for Basis classes" Feb 17, 2024
dreamer2368 force-pushed the mpi-io branch 2 times, most recently from 06d0818 to 55acb08, February 22, 2024 00:17
dreamer2368 marked this pull request as ready for review February 22, 2024 01:16
dreamer2368 added the enhancement and RFR (Ready for review) labels Feb 22, 2024

dreamer2368 commented Mar 14, 2024

Reflecting @dylan-copeland's comments, the function signatures of the database classes are changed:

  • open and create now take an optional input MPI_Comm comm=MPI_COMM_NULL. If comm==MPI_COMM_NULL, the file is opened/created serially.
  • put...Array and get...Array functions take an optional input bool distributed=false. The resulting I/O behavior depends on the class:
    • CSVDatabase: all I/O is performed serially, regardless of distributed.
    • HDFDatabase: parallel I/O is performed as file-per-process, where each process opens/creates a file corresponding to its own rank. Thus the I/O behavior is the same regardless of distributed.
    • HDFDatabaseMPIO: if distributed, all processes perform parallel I/O on a single file. Otherwise, only the root process performs I/O.
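The single-file behavior described for HDFDatabaseMPIO can be sketched without MPI or HDF5. Both helpers below are hypothetical and are not part of the library. The sketch also assumes that each rank's block is stored contiguously in rank order within the shared dataset, which this PR does not state.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: does `rank` touch the file for this put/get call?
// With distributed == true every rank takes part in the I/O;
// otherwise only the root rank does.
bool rankPerformsIO(int rank, bool distributed, int root = 0)
{
    return distributed || rank == root;
}

// Hypothetical helper: starting offset (in elements) of each rank's block
// in the shared dataset, given how many elements each rank holds.
// This is an exclusive prefix sum over the local sizes.
std::vector<long> blockOffsets(const std::vector<long>& local_sizes)
{
    std::vector<long> offsets(local_sizes.size(), 0);
    for (std::size_t r = 1; r < local_sizes.size(); ++r)
        offsets[r] = offsets[r - 1] + local_sizes[r - 1];
    return offsets;
}
```

For local sizes {3, 2, 4}, the offsets are {0, 3, 5}. In an actual MPI program each rank would typically obtain its own offset with a collective such as MPI_Exscan rather than from a full list of sizes.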

dreamer2368 commented

Reflecting @ckendrick's comment, parallel HDF5 is now optional. HDFDatabaseMPIO has its full capability only when parallel HDF5 is available; otherwise, it behaves identically to its base class HDFDatabase.
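One way such a compile-time fallback could be structured is sketched below. The class names carry a `Sketch` suffix and the macro `SKETCH_USE_PARALLEL_HDF5` is invented for this example; the macro and class layout actually used in this PR may differ.

```cpp
#include <string>

// Hypothetical configuration macro. A real build system would define
// something like it when a parallel HDF5 installation is detected.
// #define SKETCH_USE_PARALLEL_HDF5

// Stand-in for the base class: file-per-process I/O.
class HDFDatabaseSketch {
public:
    virtual ~HDFDatabaseSketch() = default;
    virtual std::string ioMode() const { return "file-per-process"; }
};

// Stand-in for the MPIO class: overrides the base behavior only when
// parallel HDF5 is available at compile time.
class HDFDatabaseMPIOSketch : public HDFDatabaseSketch {
public:
#if defined(SKETCH_USE_PARALLEL_HDF5)
    std::string ioMode() const override { return "single-file MPI I/O"; }
#endif
};
```

With the macro left undefined, as above, `HDFDatabaseMPIOSketch` inherits everything from its base class, so callers can keep using the MPIO type without their own preprocessor checks.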

dreamer2368 merged commit c03ae94 into master Mar 19, 2024
4 checks passed
andersonw1 pushed a commit that referenced this pull request Apr 2, 2024
* BasisWriter::writeBasis creates/opens a file and closes it at the end.

* stylization.

* HDFDatabase::putIntegerArray - overwrites if the dataset exists.

* enforce single time interval in Options.

* HDFDatabase::putIntegerArray does not allow overwrite.

* BasisWriter::writeBasis always creates the file, which will overwrite the existing file.

* add a header and stylization.

* remove increase time interval test, as time interval is fixed to 1.

* add an error message for a guidance.

* remove test_SVD from ci workflow.

* SVD::increaseTimeInterval - allow the initial time interval.

* minor fix in test_IncrementalSVDBrand.

* reflecting the comments.

* removed the concept of time interval in BasisReader. time argument remains for backward compatibility.

* BasisWriter: removed the concept of time interval.

* minor fix in BasisReader.

* SVD: removed the concept of time intervals.

* BasisGenerator: removed the concept of time interval.

* add test_SVD.cpp for resolving conflict.

* stylization.

* changed function signature of BasisGenerator::takeSample.

* rebased to resolve conflict.

* changed function signature for BasisReader::getSpatialBasis.

* changed function signature for BasisReader::getTemporalBasis.

* changed function signature for BasisReader::getSingularValues.

* changed function signature for BasisReader::getSnapshotMatrix.

* unit test with fapl_mpi.

* parallel integer array writing example.

* add timing for H5DWrite.

* 2d integer array parallel writing example.

* hdf5 parallel integer array reading example.

* add d_dim as const member variable of BasisGenerator/Reader.

* BasisReader: CSV format is never used. Enforce HDF5 format from now on.

* create/open_parallel function.

* stylization

* test_HDFDatabase: test for selective parallel I/O.

* parallel I/O routines within basis classes.

* HDFDatabaseMPIO class initial loading.

* rebase from generator-fix2

* deployed HDFDatabaseMPIO to basis classes.

* test for partial getSpatialBasis.

* test routine for scaling.

* add hdf database test in ci workflow.

* update doxygen description

* reflecting comments 1

* reflecting comments 2. changed interface.

* minor fix

* reflecting comments

* hdf5 parallel is optional. set compile-time if statements.

* minor fix

3 participants