Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removal of time interval and closing files within BasisWriter::writeBasis #261

Merged
merged 37 commits into from
Feb 29, 2024

Conversation

dreamer2368
Copy link
Collaborator

@dreamer2368 dreamer2368 commented Jan 17, 2024

Closing files at the end of BasisWriter::writeBasis

This PR revamps #150 . Now closing the database file is not associated with the destruction of BasisWriter. This enables writing both the basis and the snapshot at a given moment. As far as I believe, this should keep the backward compatibility as well, but it needs other people's review as we do not have a regression test to check this.

  • Database class prints out the message when creating/opening a file.
  • BasisWriter class initializes/destroys the member Database* d_database and Database* d_snap_database at its initialization/destruction, respectively. This, however, does not involve a file creation/opening/close.
  • The putInteger operation in ~BasisWriter is moved into BasisWriter::writeBasis.
  • BasisWriter::writeBasis now closes the file at the end of the function call.

Removal of time interval

The concept of time interval is obsolete, and it hasn't been used in practice. This is removed, addressing #267 .
While the inside of all functions do not use the concept of time intervals, some function signatures remain in order to keep the backward compatibility:

  • BasisGenerator::takeSample used double time only for determining the time interval. Now time is a dummy variable and not used in the function.
    • In our practice, incremental svd's next sample is determined outside of this function.
  • Member functions of BasisReader used double time for determining the time interval, but now time is a dummy variable.
    • getSpatialBasis
    • getTemporalBasis
    • getSingularValues
    • getSnapshotMatrix

The concept of time interval is obsolete, and it hasn't been used in practice. This will be removed in future (#267 ), and for now we enforce the number of interval not to be more than 1.

  • Options class will return error if max_time_intervals > 1. This will be kept until we actually remove the time interval concept.
  • BasisWriter::writeBasis assumes to newly create the file every time it is called.
    • HDF5Database::putIntegerArray does not overwrite the dataset. If the dataset num_time_intervals is attempted to be re-written again, then an error will be returned.

@dreamer2368 dreamer2368 changed the title generator-fix2 BasisWriter::writeBasis closes the file at the function call Jan 17, 2024
@dreamer2368 dreamer2368 added the RFR Ready for review label Jan 17, 2024
@dreamer2368 dreamer2368 marked this pull request as ready for review January 17, 2024 02:24
@chldkdtn chldkdtn removed the RFR Ready for review label Feb 8, 2024
@dreamer2368
Copy link
Collaborator Author

@dylan-copeland ,
After looking throughout the code, I'd like to suggest that we work on a separate PR for removing the concept of time intervals.

Even though the multiple time intervals are never used in practice, num_time_intervals or d_num_intervals_written are widely used in many classes and dataset names in hdf5 format. Since this concept of time interval has been so engrained into the code, it will take quite some time to remove it while keeping the consistency of the code. We will also have to discuss how we change the format of dataset name, which will have quite a big impact in backward compatibility.

@dreamer2368
Copy link
Collaborator Author

@dylan-copeland , per our discussion, I added a commit that spits an error message whenever a user attempts to increase time interval, directly or indirectly through BasisGenerator or SVD class. Technically it still returns an error without this message, but this message will guide the user to increase the maximum number of samples.

Now unit_tests/test_SVD.cpp is removed from ci workflow. This test checks nothing but the concept of time interval, which is obsolete from now on.

I thought about the data format of basis classes, and ended up keeping the current implementation, allowing both csv and hdf5 format. Currently both data formats are compatible, and no reason to remove a functionality that we can support without any complication.

With that, I think this PR is ready for review and merge.

@dreamer2368
Copy link
Collaborator Author

@dylan-copeland ,

In my work for PR #274 , I ended up removing the concept of time interval. I moved those changes to here since it fits more to this PR.

While the inside of all functions do not use the concept of time intervals, some function signatures remain in order to keep the backward compatibility:

  • BasisGenerator::takeSample used double time only for determining the time interval. Now time is a dummy variable and not used in the function.
    • In our practice, incremental svd's next sample is determined outside of this function.
  • Member functions of BasisReader used double time for determining the time interval, but now time is a dummy variable.
    • getSpatialBasis
    • getTemporalBasis
    • getSingularValues
    • getSnapshotMatrix

Not exactly sure if we'd want to change these function signatures right away, which will impact all applications using libROM. Need more discussion about this @chldkdtn @ckendrick

@dreamer2368 dreamer2368 changed the title BasisWriter::writeBasis closes the file at the function call Removal of time interval and closing files within BasisWriter::writeBasis Feb 17, 2024
@dreamer2368
Copy link
Collaborator Author

Dataset names for basis are now updated, removing the trailing 6 digits of integer that used to indicate time interval.

Any out-dated snapshot/basis hdf5 files can be updated using scripts/data/update_basis_formats.py. The syntax for using this python script is:

python3 update_basis_format.py filename

This basis/snapshot format update is tested with unit_tests/test_basis_conversion.cpp. In .github/workflows/run_tests/action.yml, outdated basis/snapshot files are first converted with the python script, and checked if the BasisReader class can read the updated files.

@dreamer2368
Copy link
Collaborator Author

@ebchin ,

After some discussion throughout this PR, we decided to change the dataset names within snapshot/basis hdf5 files. I heard from @dylan-copeland that you might have data files for your production and your simulation can be impacted by this update. I prepared a python script scripts/data/update_basis_formats.py, which can update the dataset names of the out-dated hdf5 snapshot/basis files. This script is also tested in the ci workflow. Please feel free to bring up any suggestion/concerns.

@dreamer2368 dreamer2368 merged commit 23f9e83 into master Feb 29, 2024
4 checks passed
andersonw1 pushed a commit that referenced this pull request Apr 2, 2024
…asis (#261)

* BasisWriter::writeBasis create/open a file and closes the file at the end.

* stylization.

* HDFDatabase::putIntegerArray - overwrites if the dataset exists.

* enforce single time interval in Options.

* HDFDatabase::putIntegerArray does not allow overwrite.

* BasisWriter::writeBasis always create the file, which will overwrite the exisiting file.

* add a header and stylization.

* remove increase time interval test, as time interval is fixed to 1.

* add an error message for a guidance.

* remove test_SVD from ci workflow.

* SVD::increaseTimeInterval - allow the initial time interval.

* minor fix in test_IncrementalSVDBrand.

* reflecting the comments.

* removed the concept of time interval in BasisReader. time argument remains for backward compatibility.

* BasisWriter: removed the concept of time interval.

* minor fix in BasisReader.

* SVD: removed the concept of time intervals.

* BasisGenerator: removed the concept of time interval.

* add test_SVD.cpp for resolving conflict.

* stylization.

* changed function signature of BasisGenerator::takeSample.

* rebased to resolve conflict.

* changed function signature for BasisReader::getSpatialBasis.

* changed function signature for BasisReader::getTemporalBasis.

* changed function signature for BasisReader::getSingularValues.

* changed function signature for BasisReader::getSnapshotMatrix.

* removed Options::max_time_intervals

* SVD::isNewSample -> isFirstSample

* update comments for StaticSVD::isBasisCurrent

* minor comment updates.

* stylization.

* update dataset names and added python script for dataset name update.

* fixed the ci workflow

* minor fix in ci workflow.

* minor fix

* removed BasisReader:isNewBasis

* minor doxygen comment update
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement RFR Ready for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants