Updates Software Development BP with changes from recent hack #39

Merged
merged 4 commits into from
Apr 10, 2024
229 changes: 150 additions & 79 deletions SoftwareDevelopmentBestPractices.md

## Executive Summary

This document describes best practices for software in the CIG (Computational Infrastructure for Geodynamics) collection, organized into three levels:

* **Minimum Best Practices** are the minimum that we expect all software to meet.
* **Standard Best Practices** are the suite of standards CIG software should be following. If the software falls short of these standards, developers should have a plan of active development to achieve this level.
* **Target Best Practices** should be considered in the development plan for software under active development.

The [CIG software template repository](https://github.com/geodynamics/software_template) is a sample repository that demonstrates these best practices.

## Minimum Best Practices
*Practices that software must follow in order to be accepted by CIG.*

1. **Licensing**

   1. Use an open source license approved by the [Open Source Initiative](https://opensource.org/license) (OSI).
      <details>
      <summary>Examples</summary>
      GPL, MIT, BSD
      </details>

2. **Version control**
   1. Use version control to manage code changes.
   2. Use a public repository that is accessible without registration.
      <details>
      <summary>Examples</summary>
      GitHub, GitLab
      </details>

   3. Obtain a persistent identifier, such as a DOI, for each named version of the software (for example, each release).

3. **Portability, configuration, and building**
   1. Ensure that the code builds on Unix-like machines (Linux, macOS) using only free tools.
   2. Use a well-designed, portable build system.
      <details>
      <summary>Examples</summary>
      CMake, Make, Autotools (Unix only), setup.py
      </details>

4. **Testing**
   1. The software includes tests to verify that it runs properly.
   2. The software reports the results of accuracy and/or performance benchmarks established by the research community.
5. **Documentation**
   1. Describe the research problem the software is designed to address, and discuss significant limitations.
   2. Provide instructions for building and installing the software.
   3. Describe all parameters, including their units. If a parameter is dimensionless, specify the scaling used.
   4. Explain the physics the software simulates.
   5. Illustrate how to use the software to solve scientific problems with a few cookbook examples that have sample, editable input files.
   6. Provide documentation online or offline.
   7. Explain how to cite the software (see also item 6 below).
6. **Citable publication**
   1. Provide a citable publication.
7. **Support**

   1. Clearly indicate whether the software is actively supported and, if so, how to report issues, contribute modifications, and get help.
      <details>
      <summary>Example</summary>
      provide a CONTRIBUTING.md document
      </details>

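The following is a minimal sketch of the Minimum Best Practice on testing. It is written in Python and assumes the standard-library `unittest` module as the test framework. The function `halfspace_temperature` is a hypothetical example that implements the classical error-function solution for a cooling half-space. Its default values are illustrative only, and its docstring also shows one way to document parameters with their units. The tests check known properties of the analytic solution instead of comparing against stored output. They can be run with `python -m unittest` or pytest.

```python
import math
import unittest


def halfspace_temperature(depth_m, age_s, surface_temp_k=273.0,
                          mantle_temp_k=1623.0, diffusivity_m2_s=1.0e-6):
    """Temperature [K] of a cooling half-space (hypothetical example function).

    Parameters (units in brackets):
        depth_m:          depth below the surface [m], >= 0
        age_s:            time since cooling began [s], > 0
        surface_temp_k:   surface temperature [K]
        mantle_temp_k:    initial mantle temperature [K]
        diffusivity_m2_s: thermal diffusivity [m^2/s]
    """
    # Error-function solution of the 1D heat equation for a cooling half-space.
    diffusion_length_m = 2.0 * math.sqrt(diffusivity_m2_s * age_s)
    return surface_temp_k + (mantle_temp_k - surface_temp_k) * math.erf(depth_m / diffusion_length_m)


class HalfspaceTemperatureTest(unittest.TestCase):
    """Pass/fail tests against known properties of the analytic solution."""

    def test_surface_is_at_surface_temperature(self):
        self.assertEqual(halfspace_temperature(0.0, 1.0e15), 273.0)

    def test_deep_interior_is_at_mantle_temperature(self):
        self.assertAlmostEqual(halfspace_temperature(1.0e7, 1.0e12), 1623.0, places=6)

    def test_one_diffusion_length(self):
        # At depth 2*sqrt(kappa*t) the solution equals T_s + (T_m - T_s) * erf(1).
        age_s = 1.0e14
        depth_m = 2.0 * math.sqrt(1.0e-6 * age_s)
        expected_k = 273.0 + (1623.0 - 273.0) * 0.8427007929497149
        self.assertAlmostEqual(halfspace_temperature(depth_m, age_s), expected_k, places=6)
```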
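The Minimum Best Practice on dimensionless parameters can be supported by recording the scaling in code as well as in the documentation. The sketch below assumes a thermal-convection model that is nondimensionalized by a length L, a temperature contrast Delta T, and the diffusive time scale L^2/kappa. The class and function names are hypothetical. The Rayleigh number uses the standard definition Ra = rho g alpha Delta T L^3 / (kappa eta).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvectionScales:
    """Characteristic scales used to nondimensionalize a convection model.

    All fields are in SI units; the names and values are illustrative.
    """
    length_m: float          # domain depth L [m]
    temperature_k: float     # temperature contrast Delta T [K]
    diffusivity_m2_s: float  # thermal diffusivity kappa [m^2/s]

    @property
    def time_s(self) -> float:
        """Diffusive time scale L^2 / kappa [s]."""
        return self.length_m ** 2 / self.diffusivity_m2_s

    @property
    def velocity_m_s(self) -> float:
        """Velocity scale kappa / L [m/s]."""
        return self.diffusivity_m2_s / self.length_m


def rayleigh_number(density_kg_m3, gravity_m_s2, expansivity_1_k, viscosity_pa_s, scales):
    """Ra = rho * g * alpha * Delta T * L^3 / (kappa * eta), dimensionless."""
    return (density_kg_m3 * gravity_m_s2 * expansivity_1_k * scales.temperature_k
            * scales.length_m ** 3 / (scales.diffusivity_m2_s * viscosity_pa_s))


def to_dimensional_time(nondimensional_time, scales):
    """Convert a model time to seconds using the diffusive time scale."""
    return nondimensional_time * scales.time_s
```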
## Standard Best Practices

*Practices in addition to the above* Minimum Best Practices *that should be used by all software developed within the CIG community. Developers of software that does not yet meet all of these standards should be actively working to eliminate the deficiencies.*

1. **Version control**
   1. Limit the source tree to the files necessary to build the software and documentation and to run verification tests.
   2. In each release, include release notes that distinguish between significant changes, new features, and bug fixes.
      <details>
      <summary>Example</summary>
      use a changelog that follows https://keepachangelog.com/en/1.1.0/
      </details>

2. **Coding**
   1. Specify parameters in a user-friendly way, outside of the source code. Parameters should be specified at runtime, not at compile time.
      <details>
      <summary>Examples</summary>
      graphical user interfaces, human-readable parameter files
      </details>

   2. Provide a development plan, updated yearly, that prioritizes new features and gives an estimated timetable for their implementation.
   3. Use comments in the software that describe the following:
      1. Algorithms, with appropriate references.
      2. The purpose of functions, objects, etc., descriptions of their arguments (inputs/outputs), and groups of objects.
   4. Strive for a modular design:
      1. Balance the use of external libraries to maximize reuse while minimizing dependencies and maintenance.
         <details>
         <summary>Examples</summary>
         PETSc, deal.II
         </details>

      2. Allow users to extend the code with new features or alternative implementations without destroying original functionality or modifying the main branch.
   5. Use error trapping strategies:
      1. User errors should result in a message that helps the user correct the problem. User errors should not result in crashes without error messages.
      2. Internal errors are generally bugs or unintended uses. Use consistency checks to catch internal errors and generate error messages that help the developer fix the problem.
   6. Aim for scalability:
      1. Use distributed/parallel data structures.
      2. Use messages, rather than the filesystem, to transfer information between processes.
         <details>
         <summary>Example</summary>
         MPI
         </details>

3. **Portability, configuration, and building**
   1. Let the build system verify that dependencies are available and usable.
   2. Use an automated and portable configuration and build system.
   3. Output all configuration and build options at runtime to facilitate reproducibility.
      <details>
      <summary>Examples</summary>
      commit ID, compiler options, checksum
      </details>

4. **Testing**
   1. Include pass/fail tests that verify that the software runs properly.
   2. Create a development pipeline that uses continuous integration (CI) to automate running tests.
      <details>
      <summary>Examples</summary>
      GitLab CI/CD pipelines, GitHub Actions workflows, Azure Pipelines, Jenkins
      </details>

5. **Documentation**
   1. Provide user documentation that describes workflows for research use.
   2. Provide developer documentation that explains how to extend the code in anticipated ways.
   3. Provide documentation in a dynamic form (such as HTML) that is also available offline (for example, as a PDF file).
      <details>
      <summary>Example</summary>
      HTML generated by Sphinx, combined with a PDF file
      </details>

   4. Illustrate how to use the software to solve major scientific use cases with cookbook examples that have sample, editable input files.
   5. List authors and contributors.
      <details>
      <summary>Example</summary>
      include a CITATION.cff file
      </details>

6. **User workflow**
   1. Ensure that running different simulations does not require rebuilding.
   2. Ensure that the code uses user-specified directories and filenames for input and output.
   3. Use standard binary file formats.
      <details>
      <summary>Examples</summary>
      NetCDF, HDF5, VTK
      </details>

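One way to meet the Standard Best Practice of specifying parameters at runtime in a human-readable file is sketched below. It parses an INI-style parameter file with Python's standard `configparser` module. The section and parameter names are hypothetical. A real code would read the file named on the command line instead of an embedded string.

```python
import configparser

# Contents of a hypothetical parameter file (for example "model.cfg"); a real
# code would read the file named on the command line instead of a string.
EXAMPLE_PARAMETER_FILE = """
[mesh]
# Number of cells in x and y
nx = 64
ny = 32

[material]
# Reference viscosity [Pa s]
viscosity = 1.0e21
"""


def load_parameters(text):
    """Parse runtime parameters, so changing them never requires recompiling."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "nx": parser.getint("mesh", "nx"),
        "ny": parser.getint("mesh", "ny"),
        # fallback= supplies a documented default when the user omits the value
        "viscosity_pa_s": parser.getfloat("material", "viscosity", fallback=1.0e21),
    }
```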
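A common Python pattern for letting users add alternative implementations without modifying the main code is a plugin registry that a decorator populates. The sketch below is only an illustration of that pattern under stated assumptions. It is not the mechanism used by any particular CIG code, and compiled codes typically use other approaches. The viscosity models and their parameters are hypothetical. The `step` model at the end stands in for code that a user could place in their own module.

```python
import math
from typing import Callable, Dict

# Registry mapping a model name (as written in a parameter file) to a factory.
VISCOSITY_MODELS: Dict[str, Callable[..., Callable[[float], float]]] = {}


def register_viscosity_model(name):
    """Decorator that adds a viscosity-model factory to the registry."""
    def decorator(factory):
        if name in VISCOSITY_MODELS:
            raise ValueError(f"viscosity model '{name}' is already registered")
        VISCOSITY_MODELS[name] = factory
        return factory
    return decorator


@register_viscosity_model("constant")
def constant_viscosity(value_pa_s=1.0e21):
    return lambda temperature_k: value_pa_s


@register_viscosity_model("arrhenius")
def arrhenius_viscosity(reference_pa_s=1.0e21, activation=10.0, reference_temperature_k=1600.0):
    # eta(T) = eta_ref * exp(E * (T_ref / T - 1)); E is a nondimensional activation parameter
    return lambda temperature_k: reference_pa_s * math.exp(
        activation * (reference_temperature_k / temperature_k - 1.0))


def create_viscosity_model(name, **options):
    """Build the model selected by the user, listing valid names on a typo."""
    try:
        factory = VISCOSITY_MODELS[name]
    except KeyError:
        choices = ", ".join(sorted(VISCOSITY_MODELS))
        raise ValueError(f"unknown viscosity model '{name}'; available: {choices}") from None
    return factory(**options)


# A user's own module can add a model without modifying the code above.
@register_viscosity_model("step")
def step_viscosity(cold_pa_s=1.0e22, hot_pa_s=1.0e20, threshold_temperature_k=2000.0):
    return lambda temperature_k: hot_pa_s if temperature_k > threshold_temperature_k else cold_pa_s
```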
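The two error-trapping practices are sketched below with hypothetical names. `validate_timestep` turns a user mistake into a message that says how to fix it. `check_finite` is an internal consistency check whose message is aimed at developers. Having `main` return an exit code, instead of letting the exception escape as a traceback, is one design choice among several.

```python
import math
import sys


class ParameterError(ValueError):
    """A mistake in user input; the message says how to fix it."""


def validate_timestep(dt_s, max_dt_s):
    """User-error check: reject bad input with an actionable message."""
    if not 0.0 < dt_s <= max_dt_s:
        raise ParameterError(
            f"time step dt = {dt_s} s is out of range; choose 0 < dt <= {max_dt_s} s")
    return dt_s


def check_finite(values, step):
    """Internal consistency check: non-finite values indicate a bug or an unintended use."""
    bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
    if bad:
        raise RuntimeError(
            f"internal error at step {step}: non-finite values at indices {bad[:5]}; "
            "please report this together with your input file")


def main(dt_s, max_dt_s=1.0e3):
    """Return an exit code; user errors print a message instead of a traceback."""
    try:
        validate_timestep(dt_s, max_dt_s)
    except ParameterError as err:
        print(f"Input error: {err}", file=sys.stderr)
        return 1
    solution = [1.0, 2.0]  # placeholder for the result of a real solver step
    check_finite(solution, step=0)
    return 0
```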
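The practice of outputting configuration details at runtime is illustrated below. The sketch gathers a git commit ID (if git and a repository are available), interpreter and platform information, and a SHA-256 checksum of the input. In a compiled code, the commit ID and compiler options are usually captured at configure or build time instead. The function names here are hypothetical.

```python
import hashlib
import json
import platform
import subprocess


def git_commit(default="unknown"):
    """Return the current git commit ID, or `default` if git or a repository is unavailable."""
    try:
        result = subprocess.run(["git", "rev-parse", "HEAD"],
                                capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return default
    return result.stdout.strip()


def run_provenance(input_bytes):
    """Collect configuration details worth printing at the start of every run."""
    return {
        "commit": git_commit(),
        "python_version": platform.python_version(),
        "compiler": platform.python_compiler(),  # compiler that built the interpreter
        "platform": platform.platform(),
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
    }


if __name__ == "__main__":
    print(json.dumps(run_provenance(b"example input"), indent=2))
```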
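The user-workflow practices are sketched below using Python's standard `argparse` module. The user chooses the parameter file, output directory, and filename prefix at runtime, so nothing needs to be rebuilt between simulations. The option names are hypothetical. The `.h5` suffix assumes HDF5 output written by a library such as h5py, which is not shown.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Command-line options for a hypothetical simulation driver."""
    parser = argparse.ArgumentParser(description="Run a simulation from a parameter file.")
    parser.add_argument("parameter_file", type=Path,
                        help="human-readable parameter file")
    parser.add_argument("--output-dir", type=Path, default=Path("output"),
                        help="directory for results (default: ./output)")
    parser.add_argument("--output-prefix", default="solution",
                        help="prefix for output filenames (default: solution)")
    return parser.parse_args(argv)


def output_path(args, step):
    """Build each output filename from user options instead of a hard-coded path."""
    return args.output_dir / f"{args.output_prefix}_{step:05d}.h5"
```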
## Target Best Practices

*Practices in addition to the above* Standard Best Practices *that developers should consider when defining long-term development priorities for software developed within the CIG community. These practices are most important for long-term projects.*

1. **Version control**
   1. Add new features in separate branches.
   2. Use a stable development (or main) branch for rapid release of new features.
2. **Coding**
   1. Implement functionality as a library rather than an application.
   2. Construct higher-level applications using libraries as building blocks.
   3. Output provenance information (such as the parameters used).
   4. Strive for scalability:
      1. Use parallel access to inputs and outputs.
         <details>
         <summary>Example</summary>
         HDF5
         </details>

   5. Implement checkpointing and restart capability.
3. **Portability, configuration, and building**
   1. Let users select compilers, optimization levels, and additional build flags during configuration without modifying files under version control.
   2. Permit multiple builds using the same source tree.
   3. Ensure that the software can be installed to a central location.
   4. Make the software available as a package and/or containerized application that does not require manual build steps.
   5. Provide executable software via an online portal.
      <details>
      <summary>Examples</summary>
      Jupyter servers, online software gateways
      </details>

4. **Testing**
   1. Provide pass/fail unit tests for software verification at a fine-grained level.
   2. Use the Method of Manufactured Solutions for software verification at a coarse-grained level.
   3. Use code coverage tools to assess gaps in test coverage.
      <details>
      <summary>Examples</summary>
      python-coverage, gcov
      </details>

5. **Documentation**
   1. Include guidelines on the parameter scales/combinations for which the software is designed/tested.
   2. Provide a list of publications that cite or use the software.
      <details>
      <summary>Examples</summary>
      link to the citations tracked by CIG and/or by the project
      </details>

   3. List the ORCID iD of each author and contributor, and encourage all contributors to add their ORCID iD to their GitHub profile.
   4. Create a wiki, FAQ, or knowledge base that answers common questions.
   5. Provide guidance on archiving model data for publication.
6. **User workflow**
   1. Enable reproducibility by archiving workflows.
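
The following is a minimal checkpoint/restart sketch for the Target Best Practices. It assumes that the model state fits in a JSON file, whereas production codes typically use a binary format such as HDF5. Each checkpoint is first written to a temporary file and then renamed into place. This way, an interrupted write cannot replace the previous checkpoint with a partial one. The toy update inside `run` and all names are hypothetical.

```python
import json
import os
import tempfile


def write_checkpoint(path, step, time_s, state):
    """Write a checkpoint to a temporary file, then rename it over `path`,
    so an interrupted write never replaces the previous checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump({"step": step, "time_s": time_s, "state": state}, f)
        os.replace(tmp_path, path)  # atomic rename on POSIX file systems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_checkpoint(path):
    with open(path) as f:
        data = json.load(f)
    return data["step"], data["time_s"], data["state"]


def run(n_steps, dt_s, checkpoint_path, checkpoint_every=10):
    """Toy time loop that restarts from `checkpoint_path` when it exists."""
    if os.path.exists(checkpoint_path):
        step, time_s, state = read_checkpoint(checkpoint_path)
    else:
        step, time_s, state = 0, 0.0, [0.0, 0.0]
    while step < n_steps:
        state = [x + dt_s for x in state]  # placeholder for a real solver update
        step += 1
        time_s += dt_s
        if step % checkpoint_every == 0:
            write_checkpoint(checkpoint_path, step, time_s, state)
    return step, time_s, state
```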
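
The Method of Manufactured Solutions works in three steps:

1. Pick an exact solution.
2. Derive the source term that this solution implies.
3. Check that the numerical error shrinks at the expected rate as the grid is refined.

The sketch below assumes a 1D Poisson problem -u'' = f on (0, 1) with u(0) = u(1) = 0 and the manufactured solution u(x) = sin(pi x), so that f(x) = pi^2 sin(pi x). It also assumes a second-order central-difference discretization, so the observed order of convergence should be close to 2. All function names are hypothetical.

```python
import math


def solve_poisson_1d(source, n):
    """Solve -u'' = source(x) on (0, 1) with u(0) = u(1) = 0 using second-order
    central differences on n interior points and the Thomas algorithm."""
    h = 1.0 / (n + 1)
    x = [(i + 1) * h for i in range(n)]
    # Row i of the tridiagonal system: -u[i-1] + 2 u[i] - u[i+1] = h^2 f(x[i])
    lower, diag, upper = -1.0, 2.0, -1.0
    rhs = [h * h * source(xi) for xi in x]
    # Forward elimination
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper / diag
    d_prime[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - lower * c_prime[i - 1]
        c_prime[i] = upper / denom
        d_prime[i] = (rhs[i] - lower * d_prime[i - 1]) / denom
    # Back substitution
    u = [0.0] * n
    u[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d_prime[i] - c_prime[i] * u[i + 1]
    return x, u


def exact_solution(x):
    """Manufactured solution u(x) = sin(pi x)."""
    return math.sin(math.pi * x)


def manufactured_source(x):
    """Source term f = -u'' implied by the manufactured solution."""
    return math.pi ** 2 * math.sin(math.pi * x)


def max_error(n):
    x, u = solve_poisson_1d(manufactured_source, n)
    return max(abs(ui - exact_solution(xi)) for xi, ui in zip(x, u))


def observed_orders(resolutions):
    """Convergence order between successive grids: log(e1/e2) / log(h1/h2)."""
    errors = [max_error(n) for n in resolutions]
    spacings = [1.0 / (n + 1) for n in resolutions]
    return [math.log(errors[k] / errors[k + 1]) / math.log(spacings[k] / spacings[k + 1])
            for k in range(len(resolutions) - 1)]


print(observed_orders([15, 31, 63]))  # values close to 2 indicate second-order accuracy
```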
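
For reproducibility via archiving of workflows, one option is to bundle the inputs with a manifest. The manifest records the SHA-256 checksum of each input along with free-form provenance metadata. The sketch below uses Python's standard `tarfile`, `hashlib`, and `json` modules. The archive layout and the `MANIFEST.json` name are assumptions, not a CIG convention.

```python
import hashlib
import io
import json
import tarfile


def archive_workflow(archive_path, files, metadata):
    """Write `files` (member name -> bytes) plus a MANIFEST.json with their
    SHA-256 checksums and the given provenance `metadata` to a .tar.gz file."""
    manifest = {
        "metadata": metadata,
        "sha256": {name: hashlib.sha256(data).hexdigest() for name, data in files.items()},
    }
    members = dict(files)
    members["MANIFEST.json"] = json.dumps(manifest, indent=2, sort_keys=True).encode("utf-8")
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(members):
            data = members[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return manifest


def verify_archive(archive_path):
    """Recompute each member's checksum and compare it with the manifest."""
    with tarfile.open(archive_path, "r:gz") as tar:
        manifest = json.loads(tar.extractfile("MANIFEST.json").read().decode("utf-8"))
        for name, expected in manifest["sha256"].items():
            if hashlib.sha256(tar.extractfile(name).read()).hexdigest() != expected:
                return False
    return True
```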