start doing quarterly(-ish) versioned releases #890

Open
edwardhartnett opened this issue Nov 23, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@edwardhartnett
Contributor

Description

Like other software packages, fv3atm should have versioned releases.

Solution

We need to pick a version number to start with, and do a release. Then we need to do quarterly releases thereafter.

Alternatives

The alternative is what we are now doing: unlabeled, undocumented releases via submodule hash.

Testing:

No testing needed to do the first release. In future releases, unit testing will be added.

Dependent PRs:

Required to support documentation and testing efforts.

@edwardhartnett edwardhartnett added the enhancement New feature or request label Nov 23, 2024
@DusanJovic-NOAA
Collaborator

fv3atm currently has versioned releases based on which production system it is used for (for example GFS) or which public release (ufs-wm or ufs-srw) etc. Who is going to use these quarterly releases? And how? Did anyone request these quarterly releases (NCO or ufs-community)? If so, then they should tell us what version numbering scheme they want to use.

Are the submodules used by fv3atm (atmos_cubed_sphere, ccpp-physics, ccpp-framework and upp) going to make releases with exactly the same version and the correct commit hashes? We must update the corresponding submodules before making a release/tag, which means we must coordinate that release with the code managers of those projects.

@edwardhartnett
Contributor Author

edwardhartnett commented Nov 27, 2024

This is part of adding documentation and unit testing to fv3atm. Regular and frequent releases are an agile practice.

Is there some reason that fv3atm cannot have a release, with version numbers, like all other software packages do?

We do not have to update anything before this release, the point is to get a release before we start doing any testing.

Each component should and must have its own versioning, just like netcdf-c, HDF5 and, in fact, every software package you've ever heard of. We have this right now, but we use a hash instead of a version number, and we don't document the release. Instead, we will document a release each time we need to move the ufs_weather_model hash of fv3atm.

Yes, releases must be coordinated by code managers. That is the job of the code manager. If doing a release takes more than 5 minutes, the code manager is doing it wrong. I will give a presentation on the agile release process...

Perhaps this confusion stems from the fact that you are used to a lot of manual or system-level testing before a release. Instead, we will do a release without any such testing. However, the next release after this will start to include unit tests.

Eventually we will be releasing fully-tested releases, and then we will run those tests on spack install, so spack and the EMC/NCO install team will be using these releases to work out that solution. If we convince the UFS steering committee not to use submodules, but instead to use libraries, we will be ready to transition fv3atm to a library instead of a subcomponent.

@DusanJovic-NOAA
Collaborator

> Instead, we will document a release each time we need to move the ufs_weather_model hash of fv3atm.

We 'move the ufs_weather_model hash of fv3atm' every time we make a commit to fv3atm, which is sometimes multiple times a week. Are you suggesting that we make a release with a new version number multiple times a week? That could be hundreds of releases a year. What's the purpose of all these releases when ufs-weather-model must still know the exact hash for its fv3atm submodule?

@DusanJovic-NOAA
Collaborator

> If we convince the UFS steering committee not to use submodules, but instead to use libraries

Why would we do that? That will mean that any change in any of the subcomponents will require a new library build. Who is going to do all those library builds? How? Where are we going to store all these library rebuilds? How is ufs-weather-model going to 'find' all those libraries?

@WenMeng-NOAA
Contributor

UPP was previously configured as a library for inline post in the UFS-Weather-Model. The library team required quarterly releases for UPP library installation, which significantly slowed down UPP development to support the GFS, RRFS, GEFS, HAFS, and AQM implementations, until Dusan updated the configuration to make UPP a submodule of fv3atm.

@BrianCurtis-NOAA
Collaborator

I will put my support behind the move towards the agile development. It's what everyone else does, so we should be there as well.

To answer the question on the builds, I think that process is (or can be) mostly, if not completely, automated through CI/CD, which GitHub supports. For example, once the button is clicked to make a release, a set of actions start building artifacts for the public to use and save them in GitHub (please correct me if I'm wrong).

@edwardhartnett After a "first release", what pieces/methodologies would need to be in place? How would the CM workflow need to change to move toward Agile? Maybe painting the picture a bit might help?

@edwardhartnett
Contributor Author

edwardhartnett commented Nov 27, 2024

Wow, great discussion.

@BrianCurtis-NOAA you're correct, I have not fully explained this issue. We have a meeting to discuss this topic the week after next; I will add you to the invitation list. I have a slide which explains the release process and why we do it. (And we can meet in person, since I will be in College Park for the meeting.)

@WenMeng-NOAA I would not expect improvement from making untested code into a library. Without the ability to test, the library brings little benefit. This is why all modern libraries use unit testing. However, it's not clear to me how doing releases can slow other development - a release takes only a few minutes to do. How did this slow other development? Manual testing?

Also, there was no time at which the libraries team required quarterly releases. UPP has only done two versioned releases, 10.0.0 and 11.0.0 in May and June of 2022.

@DusanJovic-NOAA if we move the hash every week, then indeed we should not do new versions every time. I presented about spack and how the UFS will be installed on WCOSS2, no longer with a set of manually executed git commands, but with spack. So spack will install all libraries, as well as the model and all submodules.

@DusanJovic-NOAA
Collaborator

I still struggle to understand what specific problems removing git submodules and replacing them with externally built libraries for all components is supposed to solve. In my opinion, taking that approach would likely just increase the overall complexity of the project, slow down the development process, and make the program more difficult to build than is necessary. I'm not convinced that the potential benefits, if there are any, would outweigh the drawbacks of such a significant architectural change.

@edwardhartnett
Contributor Author

edwardhartnett commented Nov 30, 2024

Specific problems with the submodules:

  1. They don't install, so they don't leave any information for the packages that use them (e.g. HDF5, when installed, leaves a lot of useful information for netcdf-c).
  2. All files in the repo end up on WCOSS2, even test code and other files we don't want or need there. (E.g. when we install HDF5, we get only the files we want to install; everything else is deleted when the build is complete. This uses less space and fewer inodes, and leaves behind a much less complex set of files on WCOSS2.)
  3. Submodules do not adjust to the changing dependency landscape. For example, netcdf-c will build with all versions of HDF5, but some features are not available with earlier versions; at build time, netcdf-c can determine this and adjust its functioning. fv3atm cannot adjust its functioning based on UPP capabilities, because UPP is never properly installed, so it does not leave behind any information, and fv3atm is never properly installed, so it does not have an opportunity to use that information and adjust to the dependency.
  4. Submodules resist unit testing - why can't fv3atm have unit tests? Because it is a submodule. (Until Alex severely hacked the build to allow it.)
  5. Submodules force all debugging to be the most expensive kind - all problems have to be debugged at the system level, the most expensive form of debugging we know. If there's a problem with zlib, we don't attempt to debug it by running netcdf-fortran programs, even though that would be possible, because it would be much more expensive.
  6. I cannot easily test with multiple versions of a submodule, but I can do so with trivial ease with libraries. fv3atm should be testing against the last released version of UPP, and the develop branch of UPP, so that backward compatibility is maintained, and bugs are caught at the earliest possible moment, which is much cheaper. But we cannot do that with submodules. We do this with all the NCEPLIBS that depend on each other - g2 builds with the last two released versions of ip, plus the ip develop branch. It helps a lot.
  7. When we install the ufs-weather-system on WCOSS2, we would like to run tests of each component after install (but before installing the rest of the system.) This is the cheapest way for EMC and NCO to do things, and it is the standard practice. With this method, we find problems at the cheapest level. But submodules prevent this. (For example, if there is a problem with HDF5 on WCOSS2, we find that before downloading and building netcdf-c, netcdf-fortran, or the ufs-weather-model. If HDF5 tests fail, we do not have to debug the whole ufs-weather-model to find that the problem is in HDF5.)
  8. Submodules create a highly coupled system, which is more expensive (see https://www.geeksforgeeks.org/software-engineering-coupling-and-cohesion/). For example, fv3atm cannot be unit tested without hackery because it is tightly coupled to another repo. Libraries, by contrast, are loosely coupled (e.g. we can easily build and test any library in isolation from all other code except its dependencies).
  9. Contributors have to work with the whole ufs-weather-model to work on any part of it.
  10. Even when (and if) we get unit testing for submodule components, maintaining test ordering at the top-level project for every model which uses the component is extra work and risk that libraries eliminate. Running unit tests for all components from the top level means we may sit through 30 minutes of HPC testing when debugging a problem that only appears in the later tests. Libraries can be installed and tested without the entire codebase.

NOAA has invented a way of distributing software with submodules. It's what everyone is used to, so it's hard to change, but it is less productive than what everyone else does. Submodules are not well supported by cmake and spack, which provide few tools and functions to deal with them, but copious free functionality for libraries.

NOAA should not be originating new methods of software distribution. We should be using commercial off-the-shelf tools for free, instead of rolling our own. This is not science, it is mere engineering.


4 participants