Should there be a PrePARE testing suite? #491
Comments
@doutriaux1 Those tests seem to test only the CMOR Python interface and the CMIP6 tables. I was thinking of tests for the PrePARE.py script itself.
Yes, this is a good idea. Not sure I will have time soon to think about this.
@taylor13 @durack1 @sashakames I would like to get back to this since there are many issues posted related to PrePARE (#532, #533, #534, #540, #541, #553). It would be helpful to have continuous integration tests for PrePARE to cover the changes that will be added to it. I think we should make a directory of small NetCDF files, some with flaws that PrePARE should catch and some that should pass. There should be a script that runs the PrePARE tests and captures stdout and stderr to see whether the output matches what we expect. Any suggestions are welcome.
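The script described above could be sketched roughly as follows. This is only an illustration, not existing CMOR code: the `Case` structure, `run_case`, and `fake_checker` are hypothetical names, and the sketch assumes that the checker signals a rejected file with a nonzero exit status. The stand-in command lets the sketch run without PrePARE installed.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import List


@dataclass
class Case:
    """One expected outcome for one command line (hypothetical structure)."""
    name: str
    command: List[str]   # full command line: the checker plus a file path
    expect_pass: bool    # True if the file should be accepted
    expected_text: List[str] = field(default_factory=list)  # substrings required in the output


def run_case(case: Case) -> List[str]:
    """Run the command and return a list of mismatches (empty means the case passed)."""
    proc = subprocess.run(case.command, capture_output=True, text=True)
    output = proc.stdout + proc.stderr
    problems = []
    # Assumption: exit status 0 means "file accepted", anything else means "rejected".
    accepted = proc.returncode == 0
    if accepted != case.expect_pass:
        wanted = "pass" if case.expect_pass else "failure"
        problems.append(f"{case.name}: exit status {proc.returncode}, expected {wanted}")
    for text in case.expected_text:
        if text not in output:
            problems.append(f"{case.name}: output does not contain {text!r}")
    return problems


def fake_checker(message: str, status: int) -> List[str]:
    """Stand-in command that writes to stderr and exits, so the sketch runs without PrePARE."""
    code = f"import sys; sys.stderr.write({message!r}); sys.exit({status})"
    return [sys.executable, "-c", code]
```

With PrePARE installed, a case's `command` would be the real invocation instead of `fake_checker(...)`, for example something like `["PrePARE", "--table-path", "Tables", "tests/data/bad_units.nc"]`. The paths here are made up, and the exact options should be checked against `PrePARE --help`.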
@mauzey1 thanks again for raising this, a test suite is a great idea. To be honest, we have a huge multi-PB archive of CMIP6 files mounted on the css03 hardware, so coming up with a very comprehensive test suite wouldn't be an issue (we have every pathology you have ever thought of in the ~1 million files). I suppose CircleCI runs on remote systems, right? So we can't mount the archive directly?
Running PrePARE on a million files (if that is what is suggested), especially the very large files in the CMIP6 archive, would seem to be inefficient, and perhaps not even practical. One way to make incremental progress would be to design a test each time we add or modify a PrePARE check, to determine whether it actually catches and correctly describes the problem. You may have a more ambitious (and comprehensive) testing strategy in mind, which I'd be happy to discuss next week. I'm not sure the test suite should necessarily hold up moving forward on the issues mentioned above.
Agreed that a million files is too many. A limit of 100 representative files, though, is reasonable for a test suite that we expect to run repeatedly and that others can easily run to verify that their installation is working properly. For starters the files are here, but we could provide a script to allow others to download them. As an aside with regard to testing, I've run with tens of thousands of files via the publisher and could easily continue that.
I'm not sure the number of files matters, especially if they are all QC'd files in the CMIP6 archive. How would such files test PrePARE's ability to identify non-compliant files? I think we need to specially construct non-compliant files and then see if PrePARE finds them and provides helpful error messages to users.
Sorry folks, I should have been far clearer. I was not suggesting a test suite that uses a million files; rather, I was suggesting that we select a subset of these files that captures known pathologies and then build the test suite on this comprehensive pathology archive. If we encounter new pathologies, we increment the test suite by one or more files. Also, for the purposes of the test suite, we could temporally subset the test files to include a single time step, reducing the storage footprint and file copy/read times.
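One way to keep such a pathology archive easy to extend is a small manifest with one entry per test file, so that a newly found pathology means adding one file and one entry. The following is a minimal sketch under assumptions: the manifest layout, the key names, and the file names are all hypothetical and not part of CMOR or PrePARE.

```python
import json
from typing import Dict, List

# Hypothetical keys that every manifest entry must provide.
REQUIRED_KEYS = {"file", "pathology", "expect_pass", "expected_text"}


def load_manifest(text: str) -> List[Dict]:
    """Parse a JSON manifest and reject entries that are incomplete or duplicated."""
    entries = json.loads(text)
    seen = set()
    for entry in entries:
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"{entry.get('file', '<unknown>')}: missing keys {sorted(missing)}")
        if entry["file"] in seen:
            raise ValueError(f"duplicate entry for {entry['file']}")
        seen.add(entry["file"])
        # A file that should be rejected must also say which message is expected,
        # so that unhelpful error messages are caught as well.
        if not entry["expect_pass"] and not entry["expected_text"]:
            raise ValueError(f"{entry['file']}: no expected message given")
    return entries


# Illustrative content only; these files do not exist in the repository.
EXAMPLE_MANIFEST = """
[
  {"file": "good_tas_one_step.nc", "pathology": "none",
   "expect_pass": true, "expected_text": []},
  {"file": "bad_missing_units.nc", "pathology": "units attribute removed",
   "expect_pass": false, "expected_text": ["units"]}
]
"""
```

Calling `load_manifest(EXAMPLE_MANIFEST)` returns the two entries as dictionaries, which a test runner could then loop over.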
Given that many files already published contain various pathologies (due to several causes), I agree with Paul that we can and should source these from the archive. The related issues are mainly (1) "soft" false negatives, i.e. warnings that should be errors (return a -1 or False), and (2) poor error messages that leave the user scratching their head. When these are fixed.
There are currently tests for the C, Fortran, and Python CMOR API but no tests for PrePARE. This would be helpful in making sure PrePARE is functioning as intended after making changes.
We could have a list of sample NetCDF files, or generate test data with CMOR. Any suggestions?