MOM6 repository policies

Alistair Adcroft (GFDL) edited this page Feb 10, 2015 · 11 revisions

MOM6 repository policies (for GitHub and GitLab).

Rationale

MOM6 source code management (SCM) is conducted using git. git is a distributed SCM, meaning that it does not require a single centralized server. For organizational and management purposes we will consider one particular repository and branch to be the canonical master, but the distributed approach provides great flexibility in how we operate and collaborate.

There are initially two official repositories: one on github.com, the other at gitlab.gfdl.noaa.gov. Bear in mind that your working directory is also a repository in its own right. There will also be forks of repositories (snapshots of a repository) that become self-contained repositories. Ultimately, there will also be a third official repository with public visibility on GitHub.

GitHub provides robust hosting with a modern web interface for collaboration, both with external developers and within a private project. GitLab is a free, open source, self-hosted alternative to GitHub which lives within the GFDL firewall.

Context

MOM6 relies on FMS software, and MOM6 configurations rely on other FMS components. The repository configuration is unique to MOM6 but our intent is to follow the FMS policies as much as possible. Since the MOM6 repositories are unique, there are some MOM6-specific policies, and because git is new to FMS, these policies are a work in progress.

Terminology

A repository primarily contains files and their history. It may also contain images of other remote repositories. Your working directory is a repository and users could add your working directory as a remote in their own repository.

Cloning creates a local copy of a remote repository into a working directory (better thought of as a local working repository). It typically will be a complete copy including all the remote history up to the point when you cloned. Once cloned, your local repository is disconnected and needs to pull down updates from the parent to stay in sync.

Committing is the process of adding changes to the repository history. It is not file based but repository based. This distinction is powerful because it allows API changes (argument lists and calls) to be associated and consistent. Commits are always made to your local working repository.

Pushing is the process of sending your new history up to a parent (or other) remote repository. Until you push, none of your commits are shared with others using the same remote. Until you push, you can rearrange your commits, retroactively edit them and do other mischievous things. Once you have pushed your commits, there is no going back.

Fetching is the process of updating your copy of the remote repositories. This does not change your local branches or working files.

Merging combines histories from two branches or repositories and is implemented in one of two ways: i) by stacking changes consecutively (known as a fast-forward), or ii) by creating a new merged state (a conventional merge).

Pulling is essentially a "fetch" followed by a "merge" for the current branch you are on.

Forking is a web-based analog of cloning. A fork is a new repository that looks like a snapshot of the parent repository. There is a virtual connection between the two repositories but commits (pushes) are independent, and forks are thus potentially static unless explicitly updated. Forks allow development by a user without interference with the parent repository. To work with a fork, the user must clone with the URL of the fork, or add the fork as a second remote.
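The relationship between fetching, merging and pulling described above can be sketched as a small shell function. The name pull_by_hand is hypothetical and chosen only for illustration; the function strings together the standard git fetch and git merge commands and approximates what git pull does for the current branch.

```shell
# pull_by_hand REMOTE BRANCH   (hypothetical helper, for illustration only)
# Roughly what "git pull REMOTE BRANCH" does, as two visible steps.
pull_by_hand() {
    remote=$1
    branch=$2
    # Step 1: update the local copy of the remote; local branches are untouched.
    git fetch "$remote" || return 1
    # Step 2: combine the fetched history into the branch you are currently on.
    git merge "$remote/$branch"
}
```

For example, pull_by_hand origin dev/master would fetch from origin and then merge origin/dev/master into the current branch.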

MOM6 repositories

MOM6 GitHub repository

The GitHub repository requires you to have a GitHub account. If you are a federal employee then you need to have applied for permission to have a GitHub account to use for government work. This repository is where the core developers will push their commits to share with each other. The front end is at: https://github.com/CommerceGov/NOAA-GFDL-MOM6.

MOM6 GitLab repository

The GitLab repository is accessible from within the GFDL firewall and through ssh tunnels or https from our HPCs. The GitLab repository is strictly read-only and mirrors the entirety of the GitHub MOM6 repository. The mirroring process occurs at 15-minute intervals. Any GFDL user can use GitLab, can clone the GitLab repository and can create forks of the repository to which they can then write. The GitLab front end is at: http://gitlab.gfdl.noaa.gov.

MOM6 versus MOM6-examples repository

Everything that applies to the MOM6 repository also applies to the MOM6-examples repository. As of October 2014, the MOM6 repository exclusively contains source code while everything else, namely the configuration data and tools, resides in the MOM6-examples repository.

Branches on MOM6 repositories

These branch naming conventions apply to the MOM6 GitHub repository but reflect both i) the FMS policies (adopted on other repositories) and ii) a recommendation of best practices for personal repositories forked by users.

| Branch name | Purpose | Comments |
|---|---|---|
| master | Released code. | City and public releases. |
| dev/master | The effective master branch for developers. | Requires complete compliance to commit rules (see section on dev/master). |
| dev/<project> | A collaborative side-project branch. | For large-scale changes that cannot continuously comply to the dev/master rules for commits. Project dependent restrictions on commits. |
| user/<abc>/<project> | A personal branch belonging to user "abc". | Used for experimental commits which might be dead-end. No restrictions. |
| public [1] | Publicly released code. | This may lag the city releases if there is non-published work at risk of exposure. |

[1] New suggestion to better accommodate FMS policies.
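The naming conventions in the table above can be expressed as shell patterns. The following is a minimal sketch, not part of any MOM6 tooling: the function name branch_kind and the category labels it prints are hypothetical, chosen only to show how the a/b/c notation lends itself to simple categorization.

```shell
# branch_kind NAME   (hypothetical helper)
# Prints a label for the category that a branch name falls into.
branch_kind() {
    case "$1" in
        master)      echo "release" ;;
        public)      echo "public" ;;
        dev/master)  echo "dev-master" ;;   # must be tested before dev/<project>
        dev/?*)      echo "project" ;;
        user/?*/?*)  echo "personal" ;;
        *)           echo "unknown" ;;
    esac
}

branch_kind user/aja/refactor_grid_module
```

A name such as user/aja, which has no project component, falls through to "unknown" in this sketch.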

A schematic shows how branches can be related.

Comment on the style of branch names

The a/b/c notation was adopted because it allows word separation within each of a, b or c. For example, user/aja/refactor_grid_module is easier to categorize than user_aja_refactor_grid_module. One side effect has been discovered: if a branch user/aja is created, then user/aja/xyz has trouble being pushed. It appears that the branch name is used for a hidden internal filename. In this example the file .git/refs/heads/user/aja would exist, so the file .git/refs/heads/user/aja/xyz cannot be created, since .git/refs/heads/user/aja needs to be a directory but is already a file. While this side effect makes the notation potentially fragile, the readability and categorization of branches is sufficiently useful that we are keeping it. Cleanup simply requires deleting the offending branch.
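The file-versus-directory clash can be reproduced without git at all. The sketch below only mimics the layout that git uses for unpacked branch references (one file per branch under .git/refs/heads); the temporary directory and variable names are made up for the demonstration.

```shell
# Mimic git's one-file-per-branch layout with ordinary files (no git needed).
refs=$(mktemp -d)/refs/heads
mkdir -p "$refs/user"

# A branch named user/aja is stored as a *file* called .../user/aja.
: > "$refs/user/aja"

# A branch named user/aja/xyz needs .../user/aja to be a *directory*.
if mkdir -p "$refs/user/aja" 2>/dev/null; then
    result="ok"
else
    result="conflict"
fi
echo "$result"
```

Removing the file (the equivalent of deleting the user/aja branch) lets the directory be created, which is why deleting the branch is sufficient cleanup.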

Commit policies

All branches

There are some guidelines that apply to all branches.

  1. A commit should include all changes to all files associated with a particular update. This is important for keeping APIs consistent.

  2. Commits should not involve multiple unrelated issues at once. If you are making two unrelated changes, make two commits.

  3. Commits should be properly logged (see "Commit logging guidelines").

  4. If the commit refers to an issue listed in the issue tracker (Redmine or GitHub), use the # notation, such as "#101", "Fixes #43", "resolves #2" or "closes #21". See this Stack Overflow question for help.

  5. Never rebase a branch that has been pushed. Once an upstream repository has your history, any changes to that history will be rejected.

Policies for dev/master

dev/master is the collaborative branch for developers. dev/master has strict guidelines for commits since any code/data/results on dev/master is considered sanctioned.

  1. At all times, the code compiles without errors using all sanctioned compilers (currently GNU, Intel, PGI).

  2. At all times, all the regression tests pass with all the sanctioned compilers. Pass means that the checksums generated by running the code are the same as those committed to the repository.

  3. All commits are logged following the commit logging guidelines.

  4. If a commit [intentionally] changes answers in any test then put an asterisk (*) as the first character of the one-line summary. This helps identify answer-changing commits when in forensic mode.

  5. No partial commits.

  6. Changes to checksums (timestats) must be scientifically justified.

Other branches

dev/<project>

dev/<project> is for collaborative development on a feature/project that cannot meet the strict rules of dev/master, such as being in a working state at all times. We typically might use such a branch for a large-scale refactor.

user/<abc>/<project>

user/<abc>/<project> is under user control and has no restrictions. It can be used for keeping personal versions of updates or for preparing updates to present back to dev/master. If you intend to maintain lots of, or long-lived, branches then you might be better served by a forked repository.

Commit logging guidelines

All commits require a text log. git allows short logs to be provided via a command line option but will otherwise invoke an editor (preferred) at which point you can create the log entry. The format of a log entry should follow:

One-line summarizing the commit in <=50 characters

Detailed explanation of the commit, rationale, issues addressed, etc.
goes after a blank line (THE BLANK LINE IS VERY IMPORTANT). Also:
 - It is helpful to use an ascii pseudo-formatting like
   this "-" notation.
 - There is no need to add user information or dates since that
   information is recorded by git for you.
Yes, that one-liner is exactly 50 characters long! Your editor will
probably indicate when you exceed the 50 characters. Subsequent text
is best word-wrapped at 72 or 80 characters.
Here is a ruler:
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890

A discussion/recommendation about commit messages can be found here.
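The layout rules of the log format can be checked mechanically. The following is a minimal sketch under stated assumptions: check_log is a hypothetical function name, the labels it prints are invented for this example, and the 50-character test is treated as advisory rather than as an error.

```shell
# check_log FILE   (hypothetical helper)
# Inspects a commit message stored in FILE and prints one label.
check_log() {
    summary=$(sed -n '1p' "$1")   # first line: the one-line summary
    second=$(sed -n '2p' "$1")    # second line: must be blank
    if [ -z "$summary" ]; then
        echo "empty-summary"
    elif [ -n "$second" ]; then
        echo "missing-blank-line"
    elif [ "${#summary}" -gt 50 ]; then
        echo "long-summary"       # advisory only
    else
        echo "ok"
    fi
}
```

git passes the path of the message file as the first argument to a commit-msg hook, so a function like this could be called from such a hook; whether to do so is left to the user.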

Useful summary line

It can be challenging to get enough information into the short summary line and so the 50 character limit is not a hard limit. Too little information is worse than a slightly over-long line. A summary line such as "Bug fix" is almost useless. "Bug fix in MOM_ALE.F90" is better and "Bug fix: uninitialized variable (dXdYdZ) in MOM_ALE.F90" is even better despite being 55 characters long.

Short-hand to indicate answer changes

To indicate when a commit changes answers (i.e. when a timestats.* file was updated) please insert an asterisk (*) at the start of the summary line, e.g. "*Bug fix: uninitialized variable (dXdYdZ) in MOM_ALE", which is 52 characters long. This helps us track experiment evolution. When answers change, the commit message should list the experiments for which the answers changed.
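Because the asterisk is always the first character of the summary, answer-changing commits can be picked out with a simple filter. This is a sketch only; answer_changing is a hypothetical name, and it reads summary lines on standard input rather than talking to git itself.

```shell
# answer_changing   (hypothetical helper)
# Filters summary lines on stdin, keeping those that begin with "*".
answer_changing() {
    grep '^\*'
}

printf '%s\n' '*Bug fix: uninitialized variable' 'Tidy comments' | answer_changing
```

In a working repository the summaries could be supplied with git log --format=%s, which prints one summary line per commit.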

Workflows

There are several flavors of workflow depending on access and role. Note that these use cases are written for a single repository for brevity. Since MOM6, SIS2 and MOM6-examples are spread out over linked repositories, the actual commit procedure is a little longer than indicated here. See the workflow use cases on the MOM6-examples wiki.

1. A core-developer working on GitHub

Core developers typically will be working on the dev/master collaborative branch. Working on dev/master requires they validate all the code with all the examples using all the compilers - it IS a lot of work. They also typically work out of one cloned repository, initialized as follows:

git clone git@github.com:CommerceGov/NOAA-GFDL-MOM6.git MOM6

Their workflow involves the following (nested) process:

  1. Develop code/inputs/regression checksums

    1. edit code

    2. compile with all compilers

    3. run all tests with all executables

    4. check all answers

THIS WHOLE BULLET IS VERY IMPORTANT, AND IS VERY TIME CONSUMING, AND IS MANDATORY.

  2. Commit updates (to working repository)

    1. git add <files>

    2. git commit (see the commit log guidelines)

  3. Fetch and pull from GitHub

    1. git fetch (this will indicate whether the upstream has newer updates)

    2. if any updates

      1. git pull (or git merge origin/dev/master)

      2. if any conflicts, resolve

      3. return to step 1

  4. Push updates to GitHub

    1. git push

This workflow requires very frequent synchronization; otherwise the merge/conflict resolution steps can take so long that the parent repository is updated again before you are ready to push.
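One way to see whether the upstream has moved on before pushing is to count the commits that are on the remote branch but not yet in your own history. This is a minimal sketch; behind_count is a hypothetical helper name, and the usage shown assumes the remote is called origin, as it is after the clone command above.

```shell
# behind_count REMOTE BRANCH   (hypothetical helper)
# Prints how many commits REMOTE/BRANCH has that the current HEAD lacks.
behind_count() {
    # Refresh the local copy of the remote first.
    git fetch -q "$1" || return 1
    # Count commits reachable from the remote branch but not from HEAD.
    git rev-list --count "HEAD..$1/$2"
}
```

A result of 0 from behind_count origin dev/master suggests there is nothing new to merge; anything larger means returning to the fetch, merge and re-test steps before pushing.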

Comment: The whole testing procedure takes approximately 10-15 minutes and involves running over a hundred tests with half a dozen executables. The core developers have each independently developed their own method of running these tests. It is not uncommon for the different methods to disagree, which has inevitably led us to uncover subtle bugs. The effort involved is more than we can expect of all contributors, so we only insist on this rigorous testing when committing directly to dev/master. Other contributors should use the next workflow.

2. A developer working on GitHub via a branch

If a developer is not able to meet all the testing requirements of dev/master, they should create and work on a user branch. This involves creating a local branch in the form user/<abc>/<project>. For example, developer "wga" would use the following commands:

git checkout dev/master (the new branch starts from dev/master)

git checkout -b user/wga/my_new_stuff

Thereafter "wga" can follow a less stringent development process. Once done, they can commit and push as follows:

  1. git commit

  2. git push origin user/wga/my_new_stuff

The last command pushes the local branch to a new remote branch on GitHub. Presumably, "wga" will want their updates incorporated into the dev/master branch. The onus is on "wga" to minimize the work needed to merge their changes. If they have let their branch age for a long time, they must first bring in the latest history from dev/master:

  1. git fetch origin

  2. git rebase origin/dev/master

Note that the rebase command attempts to replay your local commits on top of the end of dev/master. If it cannot do so it will stop, in which case you can abandon the rebase with git rebase --abort and do a merge instead:

git merge origin/dev/master

Now commit and push again, and then ask the core developers to merge in your branch. The most expedient way to notify the core developers is via a GitHub pull request.
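The "try a rebase, otherwise merge" sequence for catching up with dev/master could be wrapped in a function along the following lines. This is a sketch under stated assumptions: update_from_dev_master is a hypothetical name, the remote is assumed to be called origin, and the git rebase --abort step is included because a rebase that stops part-way leaves the repository mid-rebase until it is abandoned or completed.

```shell
# update_from_dev_master   (hypothetical helper)
# Brings the current branch up to date with origin/dev/master.
update_from_dev_master() {
    git fetch origin || return 1
    # First choice: replay local commits on top of the latest dev/master.
    if ! git rebase origin/dev/master; then
        # The rebase stopped (e.g. on a conflict): abandon it and merge instead.
        git rebase --abort
        git merge origin/dev/master
    fi
}
```

If the merge also reports conflicts they still have to be resolved by hand before committing.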

3. A developer working on GitHub via a fork

Not everyone will have write access to the GitHub MOM6 repository; some may have only read access. In this instance they can use the fork process to submit updates. This can also be used when the developer does not want their branches to show up on the main repository while they are experimenting with updates. First the user forks the repository on the GitHub website and then:

git clone git@github.com:<ghuser>/NOAA-GFDL-MOM6.git MOM6

where <ghuser> is their GitHub user id (GitHub will provide the URL and clone instruction on the website). The developer can now push and pull at will to their own repository. They can even work on dev/master (not recommended). However, their repository will not be up to date with the parent MOM6 repository. Updating and staying in sync is up to the user. To sync:

  1. git remote add parent git@github.com:CommerceGov/NOAA-GFDL-MOM6.git

  2. git fetch parent

  3. git rebase parent/dev/master

  4. git push (by default will push to origin = forked repository)

Here, parent is just a name that the user can choose. The user can then merge her up-to-date copy of dev/master onto her local branch that she was previously working on, just as if she were following the "A developer working on GitHub via a branch" workflow. Once pushed, she can invoke a "pull request" via the GitHub website.

4. A developer working on GitLab

For a developer without GitHub access, the analogous workflow to "A developer working on GitHub via a fork" can be followed except that the fork is made on GitLab. Sign in to GitLab and then at http://gitlab.gfdl.noaa.gov/github_mirror/noaa-gfdl-mom6 click "Fork". You will receive an email saying you have been granted "master access" and a link to the repository. You can now clone with:

git clone git@gitlab.gfdl.noaa.gov:<first.last>/noaa-gfdl-mom6.git

The process of development now follows that above except for the pull request. A pull request can be submitted via GitLab which will be emailed to the developers. However, it will be handled at the command-line and pushed via GitHub. The request will then be removed from GitLab once it is handled. It will suffice to email the developers with the GitLab branch to pull from.

5. An end user from within GFDL

An end user who is not making changes to the code or repository can clone using the http protocol from within the GFDL/HPC firewall. This is how the XML-based production-runs will operate:

git clone http://gitlab.gfdl.noaa.gov/github_mirror/noaa-gfdl-mom6.git

Pull requests

Submitting a GitHub pull request

Whether you are working on a branch or a fork, when you have code ready to submit to the core developers to merge onto dev/master you should make a "pull request". This is managed via the GitHub website.

  1. Navigate to your branch (either via the branch tab or the pull-down menu in the commits tab).

  2. Click the green icon near the top left, with a mouse-over that reads "Compare, review, create pull request".

  3. The next page shows you the changes relative to where you started your branch on dev/master.

  4. Click [button image].

  5. Fill out a descriptive but succinct title.

  6. In the comment box, please summarize all the commits involved and explain or justify the code changes.

  7. Then click [button image] and you are done.

Handling a GitHub pull request (for core developers only)

This section assumes you are prepared to meet the dev/master requirements for testing, described above.

Pull requests should be handled promptly to avoid stale code developing conflicts. Conflicts mean more work. Pull requests are sent out as notifications (emails and messages on the website) but can also be found in the right-side column of icons: [icon image]. The workflow to handle a pull request is as follows:

  1. Assign the request to yourself by clicking "Assignee" and then your id.

  2. Click the blue words "command line", which will expand the commands you will use to obtain the code. Something like

git fetch origin
git checkout -b user/aja/stuff origin/user/aja/stuff
git merge dev/master

This checks out the branch user/aja/stuff and then makes sure it is up to date with dev/master by merging in the latest code.

  3. Compile and run the tests as if you were testing a mod on dev/master.

  4. If everything passes muster, you should now merge back onto dev/master with:

git checkout dev/master
git merge --no-ff user/aja/stuff

  5. The last step is to push your changes to GitHub. There is a choice here:

    1. You can simply issue git push origin dev/master. The pull request should now appear as "closed" on the website.
    2. OR, if there were no conflicts and the web page has the icon [icon image], you can complete the merge via the web. This latter option has one advantage, which is that you can annotate the handling of the request, i.e. write comments whilst closing the request.

Tip

GitHub recommends the above approach but you can reduce the number of times you test a pull request by doing the following instead:

git fetch origin
git checkout -b user/aja/stuff origin/user/aja/stuff
git checkout dev/master
git merge --no-ff --no-commit user/aja/stuff

Now test, and if everything passes, finish up with git commit. If things fail, git reset HEAD will un-stage the merge and you can clean up. This method is shorter if things work but longer if there is a problem with the code.
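The test-before-commit idea in this tip can be sketched as a function that takes the branch to merge and a test command. The name trial_merge is hypothetical. Note one deliberate difference from the text above: on failure this sketch uses git merge --abort, which restores the pre-merge state in one step, in place of git reset HEAD followed by manual cleanup.

```shell
# trial_merge BRANCH TEST_COMMAND...   (hypothetical helper)
# Stages a merge of BRANCH, runs the tests, and commits only if they pass.
trial_merge() {
    branch=$1
    shift
    # Stage the merge without recording it yet.
    git merge --no-ff --no-commit "$branch" || return 1
    if "$@"; then
        # Tests passed: record the merge using the prepared merge message.
        git commit --no-edit
    else
        # Tests failed: discard the staged merge entirely.
        git merge --abort
        return 1
    fi
}
```

It would be invoked from dev/master as, for example, trial_merge user/aja/stuff ./run_all_tests.sh, where run_all_tests.sh stands in for whatever testing procedure the developer uses.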

Handling a pull request from GitLab

To merge a branch or fork hosted on GitLab, a new remote should be added (once-only) to your working copy:

git remote add XYZ git@gitlab.gfdl.noaa.gov:<user>/noaa-gfdl-mom6.git

Check out the specific branch with:

git checkout -b user/abc/stuff XYZ/user/abc/stuff
git merge dev/master

"XYZ" is an arbitrary label for the remote repository. Thereafter, proceed as for a GitHub pull request using the command line.
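Since the remote only needs to be added once, a small guard avoids the error git reports when a remote of the same name already exists. This is a sketch; ensure_remote is a hypothetical helper name, and it relies only on git config --get and git remote add.

```shell
# ensure_remote NAME URL   (hypothetical helper)
# Adds the remote NAME pointing at URL unless NAME is already configured.
ensure_remote() {
    # git config --get fails when the key is absent, i.e. no such remote yet.
    git config --get "remote.$1.url" >/dev/null || git remote add "$1" "$2"
}
```

Calling it a second time with the same name leaves the existing URL untouched, so it is safe to keep in a script that is run for every request.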