Skip to content
This repository has been archived by the owner on Feb 8, 2024. It is now read-only.

Branch Naming Conventions and Standardization of Workflows #170

Open
Wackerbarth opened this issue Jan 15, 2019 · 33 comments
Open

Branch Naming Conventions and Standardization of Workflows #170

Wackerbarth opened this issue Jan 15, 2019 · 33 comments

Comments

@Wackerbarth
Copy link
Contributor

As a part of an effort to standardize the workflow and convert version generation to an automated scheme derived from the git repository, I propose the following:

(1) All of our components (Redeem, Toggle, Umikaze, etc.) will migrate to the same scheme and workflow.

(2) The name of the branch which is checked out for the build will be a part of the version string when the head commit is not directly tagged.

(3) For this purpose, the following patterns are reserved:
(a) number, number, "x" (eg. "2.2.x") -- Reserved for STABLE releases and backported patches. These branches are updated only by "release engineering" after verifying that the resulting change incorporates a properly tested change. Typically, this will be accomplished by a "fast forward" merge of a "gold master".
(b) "GM", "RC", "b", "a" -- These branches are curated changes leading to a particular release.
(c) "dev" -- This is the base upon which new work will be introduced. It is a very transient branch which will be rebased frequently to reflect corrections applied to the previous category.

(4) Unreserved branch names will identify the feature they are adding, the issue that they address, etc. (eg. "auth", "pr44", "issue_179", "this_is_valid_but_likely_a_poor_choice"). These are "transient" branches for the presentation of proposed contributions.

In keeping with this scheme, I suggest that we immediately replace any "develop" branch with an appropriate "dev" branch (the shorter name is used because it will appear in many build version tags). All new PRs would be applied to these branches. Existing open PRs will be handled on an individual basis. If there is little "history" of importance to a particular one, the easiest process might be to simply close the existing PR with a notation that it has been superseded by a new one.

Your discussion is encouraged. However, unless there is significant objection to creating a "dev" branch to replace existing development branches, we can implement that change while the exact workflow remains under discussion.

Any objections to switching the branch name to "dev"?

@eliasbakken
Copy link
Contributor

Using "dev" instead of "develop" sounds like a good idea.
Are you thinking we keep separate branches for all stable releases? Would it not be better to use "release" for releases?
What are you imagining the role of Master to be? Most Git repos seem to use "Master" as the main tip though, are you sure we should not adopt this naming strategy instead to keep with what is the de-factor standard?

@Wackerbarth
Copy link
Contributor Author

I anticipate that, in the future, we will support multiple "stable" branches. You cannot use any single name when you have more than one such version in the same repo. And using separate repos never worked in the past. Even if the content was the same, or at least compatible, everyone seemed to want a single repo as the "point of contact" for the multiple versions of the project. Projects that use "master" are an artifact of the default setup which dates to the idea that there would be only one supported version of any project. Github now allows us to designate the default branch. It need not be called "master". But, as you note, many projects have only one version and use the default "master". I would suggest that we also have a branch that is called "master". However, this branch would contain NO actual code. And it would not be the default either. Instead, it would serve as the README which would explain our branch naming and direct individuals to the current branch appropriate for their use. It could be either a single file or a set of documentation about the project. Exactly how we use it is another discussion.

@Wackerbarth
Copy link
Contributor Author

The problem that I have with having a "master" is that it doesn't reflect any organization of the code. It's like keeping all of your receipts in a shoebox, sorted neither by topic nor accounting period. Yes, it is all there, but everything gets all jumbled together. That makes it very difficult to have a coherent thread to follow.

@goeland86
Copy link
Member

@Wackerbarth I disagree. master has a role to play in that it should always represent the latest stable released code available.

As to what @eliasbakken was saying regarding a single release branch, well, it really depends how many stable releases we end up supporting. At the moment the point is moot as there is no way to do an OTA update of Redeem without user intervention. Until such time as we have that mechanism, we'll never have more than a single main line of code to support.

@goeland86
Copy link
Member

Specifically you keep thinking of us needing to keep a clean audit trail of code.
We do not need one. And you seem intent on not understanding that the organization you're proposing to get rid of master is contrary to even what the Linux kernel developers do. They created git based on the workflow they wanted for it. And master is very much the default and used actively.

Your notion of audit trail makes no sense with the multitude of tools available within Git to track branch merge history in a graphical and reasonable way.

I'm starting to get really annoyed by your insistence to make us use a branching scheme so outside the norm for Git-based projects as to be unusable by anyone who hasn't had an in-depth introduction to your ill-notioned schema of branching. There's a reason everyone uses master. That you don't grasp it or recognize its importance to collaborative workflow is not sound reasoning for us to disuse the concept of master entirely.

dev should be the feature-integration branch, and a single RC or feature-freeze branch to bugfix the code before it gets released.

On release, the RC or feature-freeze branch gets merged into master. At the same time, a new branch from after that merge is created to track a stable version 2.2.x for instance. And release tags are created on the release-version branch. This is how not only other open-source projects operate (see OctoPrint, Linux kernel, etc.), but also many companies. I see no reason to go against the flow in the name of a cleaner "code audit trail" that nobody who wants to look at it would have any difficulty understanding the common workflow processes of using a 3rd-generation version control system.

@Wackerbarth
Copy link
Contributor Author

Still think that we will need to support multiple versions of Redeem in a manner for which a single "master" does not fit. In particular, at least for the short term, the "product" for Revolve will not be just a configuration option in a single unified product. However it will share much of the code with the Replicape version. But while those two products exist, each will have all of the needs for its own repository branch functions. Thus, there are now two distinct "master" branches, etc. I don't see how you will handle that without splitting the repository or using a differing spelling in the namespace. That is, effectively, what we are trying to do with the n.n.x branches. They serve as the "master" branch for a particular supported product. In many respects, it is advantageous to "spin off" portions of the functionality into separate libraries because that allows each to be more focused and each then has its own private repository structure.

@goeland86
Copy link
Member

No, I disagree on this. Specifically the Revolve support is still experimental at the moment even in its own branch. However we will want to merge all the hardware support into a single repository, and differentiate within the code the various hardware support functions, as detected from the EEPROM readouts.

Ergo we keep the single master concept, even if we will have a dev-revolve branch (which, in my company nomenclature is a feature development branch to then be merged into dev, instead of having free-form naming). And having legacy support for the older branches does not preclude from having master as the latest-release code ready to run in that instance.

@Wackerbarth
Copy link
Contributor Author

"Your notion of audit trail makes no sense with the multitude of tools available within Git to track branch merge history in a graphical and reasonable way." -- Then explain why I had to create items (PR#130 is just one example) to correct the dropping of code that was obscured in a merge. If you adopt the practice of "squashing" everything when you promote it up the ladder, then "throw it in a shoebox" is workable because of the limited scope while it is under development. But merging threads in a non-linear fashion creates a hopeless mess. Can't you realize the "commits" are to the project as lines of code are to the module. They are just more complex "instructions". And there was good reason why the best practices got away from the "GOTO" statement. Commit "merge" is just the dual of GO_TO. It is a "COME_FROM" scheme.

@goeland86
Copy link
Member

That indicates a problem in the executor, not the concept. Again. I ask you to look at how every major git project handles their code. None of them that I'm aware of, shy away from git merges. Bad merges happen, yes. And they may have to be reverted or corrected after the fact. Such is a developer's life. But you're attempting to paint us in a corner use-case which has no reason to exist. I refuse to create extra branches & audit trail fluff when it is not warranted. Unless someone else comes out in support of your philosophy (and nobody to date has that I have seen), that indicates to me that your scheme is not the way forward for this community.

@Wackerbarth
Copy link
Contributor Author

I have no problem with the way you describe dev-revolve IF you can assure that the timelines are such that we will not ever need to distribute more than one version (beyond development testing). However, I fear that such is rarely the case. The marketing pressures usually create a situation where "we can ship the product with that dev version" override the necessity to fully integrate it with the "master" version.

@goeland86
Copy link
Member

My last discussion with @eliasbakken about the Revolve involved back of the envelope calculation on the time required to ship the boards with software flashed on it. Eventually the time required for flashing would delay shipping by such a long time that the Revolve software will be shipped separately and the user will have to assume responsibility for installing it on their boards. Further, the ease of flashing via USB key (and of reliable USB hubs or USB extension cables) means that even if someone has buried their Revolve in their printer, updating will be a 20-25 minute procedure at most. So if we have to issue a number of corrective releases, it will not be a deal-breaker.

@Wackerbarth
Copy link
Contributor Author

"That indicates a problem in the executor, not the concept" -- That is why you NEED to be able to audit in a structured manner. That is the purpose in having a "curator" in the process. By that, you create a "clean" set of commits that lead to exactly the same final result. That cleaner up version is then something that can be audited.

@goeland86
Copy link
Member

This is why we have an integration branch (dev) to make sure the code is actually functional and the merges were correct. When we have functional code in dev, it gets squash-merged into RC or feature-freeze for beta-testing and bugfixing of underlying issues. After RC is stable, you merge it to master and the appropriate release-branch, with a version tag.

The curation happens in dev - a structured audit does not matter if the end-result functions as needed without regresssions. Again, I think you're letting your previous personal experience cloud your vision of how open-source software development is usually done.

@Wackerbarth
Copy link
Contributor Author

I'm glad that the thinking is moving toward the type of updates that you describe. It will also help the process of updating the Replicape versions because the same technique, but perhaps with different code image files, should work there also. But, in my mind, this doesn't change the situation that I reference about "marketing". You cannot ship any product without workable, but not necessarily perfect, software. And whatever we ship becomes a "product" that we have support until we can deprecate it.

@Wackerbarth
Copy link
Contributor Author

If you reduce things to "Squash-merge", then you are, in effect insisting that each "unit" is sufficiently small that there is no advantage in keeping any of the details about the sub-parts of the entity.

In a large project, I have yet to see where having identifiable commits doesn't aid in understanding the whole.

@goeland86
Copy link
Member

If you reduce things to "Squash-merge", then you are, in effect insisting that each "unit" is sufficiently small that there is no advantage in keeping any of the details about the sub-parts of the entity.

In a large project, I have yet to see where having identifiable commits doesn't aid in understanding the whole.

Then that implies a lack of experience of what large projects truly are. My day job involves many more modules & classes than what is in Redeem (factor 5 to 10-fold of objects, closer to 20-25 for lines of code), and that large of a project has never required me to go digging through the commit log more than once or twice in the last year to figure out which side of a local merge was the one to keep. On something as small, code-wise, as Redeem, with a smaller development team, I don't expect this to be a problem.

@Wackerbarth
Copy link
Contributor Author

And I suspect that you have little experience with programs that are long-lived and need to incorporate "new hires". In my experience, and Redeem is certainly no exception, there is a definite shortage of higher level documentation that provides a correct description of how the parts fit together to provide a working system. Sure, there is documentation of the calling sequence for a particular API endpoint. But what is usually lacking is the description that shows how you combine calls to create a functioning behavior. The "learning curve" is immense. And being able to recognize and comprehend it in small pieces is greatly aided when you have more than "here it is". Properly configured commits that encapsulate a particular functionality's introduction, or modification, are a partial substitute for the missing documentation and unit tests.

I also come from an environment where things were "mission critical" and, because of the high cost, or impossibility, of making a correction after launch, multiple individuals needed to review, understand, and sign off on each, and every, line of code. There, the ability to audit was essential.

I also find it quite helpful when I need to go back to code that I wrote years ago.

@goeland86
Copy link
Member

Well we'll just agree to disagree with regards to the onboarding of new hires.

And we're not "mission critical" as we can issue updates, and we have in-depth testing phases allowing us to discover the most significant bugs up front.

I agree there's a lack of documentation for higher level object architecture, which is one of the main reasons my dabbling in the Redeem codebase is so light. However to say that a good commit history would help me figure it out is like saying that leaving the windows open during winter will warm up my garden. While in theory true, the amount of benefit to be had for most folks is nigh non-existent.

I will insist that a good Github repository maintains an up to date and running version of its code in master.

@Wackerbarth
Copy link
Contributor Author

Wackerbarth commented Jan 16, 2019 via email

@Wackerbarth
Copy link
Contributor Author

The argument against having just a single "master" branch.

(1) It only allows for a single "flavor" of the project.
In my experience, "marketing" ends up requiring the distribution
of specialized adaptations of the same general product. Whether permanent,
or not, each of those should be developed in a style consistent with there
being a "master" branch that represents that exact product.
Unless they are in separate repositories, the branch namespace
provides for only one branch actually named "master". And one branch cannot
reflect multiple "products".

(2) You do not want to force the end user to "change trains" just because
a "new and improved" line becomes the standard product. That change
should be voluntary until the earlier branch is no longer supported.
So, in that context, it makes no sense for anyone to follow a unified "master"
since it would, on occasion, incorporate incompatible changes.
From the first release (or even before), an enduring (stable) branch name
needs to serve as the master for that release sequence. That is why I suggest
that each minor version should have its own master name.
By the version numbering convention, we are required to provide backward
compatibility within that branch, but not when we switch to a new version-master branch.

(3) In general, new features are not based on the "stable" version of a release branch.
They need to be based on the version that already includes those features which have
been accepted for inclusion in the next (less stable) release. The only PR submissions that
are appropriate for the master branch are the few that represent an actual formal release.
ALL others need to first pass through a less development branch.

@goeland86
Copy link
Member

The argument against having just a single "master" branch.

(1) It only allows for a single "flavor" of the project.
In my experience, "marketing" ends up requiring the distribution
of specialized adaptations of the same general product. Whether permanent,
or not, each of those should be developed in a style consistent with there
being a "master" branch that represents that exact product.
Unless they are in separate repositories, the branch namespace
provides for only one branch actually named "master". And one branch cannot
reflect multiple "products".

In our case, "marketing" is really only @eliasbakken, and as far as I understand it, we're not making customized versions of the software. If that happens, I suspect it will be in private forks that will not impact this community-based release.

(2) You do not want to force the end user to "change trains" just because
a "new and improved" line becomes the standard product. That change
should be voluntary until the earlier branch is no longer supported.
So, in that context, it makes no sense for anyone to follow a unified "master"
since it would, on occasion, incorporate incompatible changes.
From the first release (or even before), an enduring (stable) branch name
needs to serve as the master for that release sequence. That is why I suggest
that each minor version should have its own master name.
By the version numbering convention, we are required to provide backward
compatibility within that branch, but not when we switch to a new version-master branch.

We are not forcing users to do anything. We should, however, provide tools to aid in the forward migration of configuration files, as well as document properly what expected syntax/values are valid for each release version. Your objection doesn't make sense in that "following" doesn't mean anything to most people who want to just update their installations with a plugin click, which will execute migration scripts. We should be able to generate warnings or errors during the update if the migration scripts for configs fails to execute properly. This is an invalid argument from my standpoint. When you have a project that releases new versions, you always have the code for that latest version on the master branch. See https://github.com/balena-io/etcher for instance. Or any other popular project shipping binaries.

It doesn't matter that you have legacy versions elsewhere. Those are in separate branches - hence our proposal for the distinction between 2.2.x and 2.3.x as release branches. But those are maintenance branches for a reason - they're there so we can backport bugfixes into them if needed. If we instead approach a cycle where we say "ok, well, we just released 2.2.0, the 2.1.x series is no longer supported starting in 3 months", then you only have to maintain and backport fixes to the 2.1.x branch for 3 months. Then you can archive it and let it become part of history. But the point is that master will be pointing at the latest accepted and valid features that will make up the 2.3.x release.

(3) In general, new features are not based on the "stable" version of a release branch.
They need to be based on the version that already includes those features which have
been accepted for inclusion in the next (less stable) release. The only PR submissions that
are appropriate for the master branch are the few that represent an actual formal release.
ALL others need to first pass through a less development branch.

Yes, you're right. This is why we have the PRs for incoming new features on the dev branch, where we'll have time to fix merge conflicts, address the worst of the bugs, and have a few fearless people test that code on their printers to identify flaws. When it's reliable, that same feature branch is then merged into master. And releases are tagged on master. When we create a new major release (i.e. from 2.2.x to 2.3.x or 2.x.y to 3.x.y) that is when we create a new maintenance release branch. Otherwise all minor releases for the current latest code (i.e. 2.2.0 to 2.2.5) will all be tagged and done on master. Before the code to be tagged as 2.3.0 is merged, a 2.2.x branch is made from the latest master code for maintenance purposes only, knowing it will become deprecated as soon as support for it ends.

@goeland86
Copy link
Member

Further on (3):
Most feature PRs should actually be based on the latest stable code. As mentioned before, if something is a major breaking change in the code, master will inherit those changes before the release is actually made. And therefore carry forward all of the features that were merged into it previously (seeing as dev will always be ahead of master, we always have a 'sandbox' to test those breaking changes while keeping intact all previous features).
PRs comparing against dev will obviously get a chance to pull the breaking changes before it's released into master, thus assuring us of a smooth development process moving forward.

@eliasbakken
Copy link
Contributor

Guys, let's not spend hours discussing this. If all other Git projects can manage with a master, we should certainly be able to too. I think this was the document I was basing the initial develop workflow on, it seems like a good starting point. https://nvie.com/posts/a-successful-git-branching-model/

@ThatWileyGuy
Copy link
Member

To be honest, I wish I'd seen that blog post when I started contributing back in the day. That makes a lot of the old structure make a lot more sense. Intuitively, I was creating feature branches for PR to develop anyway, though, so I suppose it wouldn't have changed my behavior much.

I'm very much in favor of that as a structure - it makes sense, it keeps code moving along, and it makes possibly-unstable features more available to the users that want to seek them out.

@Wackerbarth
Copy link
Contributor Author

TIme will tell. The way things are going, I will not be surprised if there is a desire to ship Revolve with a version of the software that works, but is not fully integrated with the Replicape version. Again, your example of "etcher" is in a different category because there is only one "product" and there is nothing to carry over when upgrading. Our "Umikaze" build tool would fall into the same category. We have no need for multiple "current" versions of it because, even if it is generating different target images, the build procedure is the same.

Now, the procedure that you describe for "master" about 2.2 and 2.3 is NOT the same as you claimed previously. If "master" is always a stable release, it cannot contain any of the 2.3 branch until 2.3 is declared to be stable. So it would be the same as 2.2.x. And my point is that the user would be tracking 2.2.x and not master because that is the the name that remains constant when 2.3 becomes the current version. And, contrary to your description, the 2.3.x branch needs to exist at least from the time the first copy "leaks" from "engineering", certainly well before we are ready to declare it the stable release.

@Wackerbarth
Copy link
Contributor Author

OK, guys. Explain to me just how this is really going to work. Practical example -- Toggle.
We have a branch 1.3.x which is of (at least) RC quality and will soon be a part of our next release.
@goeland86 has some authorization changes on which he is working. Since they are not presently a working feature, they go on (for example) his auth branch which could be an extension of the present head. When he gets them working, he would submit them as a PR against the development branch for the next 1.3 release. So far, OK. Someone else may have another feature. It might become ready for inclusion before the auth branch. So the auth branch would get rebased/merged onto the end of that.

Now, I have a different set of changes, a conversion to python3. So, I would have my P3 branch and work on it. Now, the problem. -- What development branch do I submit these changes against? Eventually, we might manage to fold all of these together. But, until we do, they are not compatible. How will it all fold into just a single master?

@goeland86
Copy link
Member

@Wackerbarth so in the scheme outlined by the blog post linked above, the auth branch needs to be PRed against dev, as will your P3 branch.

If we were to be working on the intelligent-agent repository instead of our own forks, my auth work would be in dev-auth for instance, and yours in dev-P3. Both of those would PR against dev. But due to the order of merges, if dev-auth merges into dev before dev-P3, you would have to bring in the relative changes to the authentication from dev-auth into your dev-P3 branch for the dev branch to be consistent after we merge dev-P3 into it.

Now what would be more likely to happen is that after dev-auth gets merged into dev, it's tested, then released - so dev merges into 1.3.x, which then merges into master, where the 1.3.1 tag gets created. (We still need to merge 1.3.x into and then create the 1.3.0 tag on master btw).

When 1.3.1 is released, the changes it brings are already in dev - meaning dev-P3 can import those changes from "upstream" (it's the wrong word but I don't have a better one). dev-P3 is then up to date with dev and can be merged in.

We could also merge dev-auth and dev-P3 into dev without creating the 1.3.1 release first - so that 1.3.1 doesn't exist anymore - we'd be creating 1.4.0, as the change to Python 3 is a breaking one.

Does this clarify the workflow? May be easier to explain around a white board however.

@Wackerbarth
Copy link
Contributor Author

@goeland86 - Perhaps a whiteboard would help. -- I'm not worried about the particular names of any feature branches. Those are all transitory. What we must have enduring are master branches for the 1.3 series (which continues to its end-of-life even after the 1.4 and 2.0 series get created) and additional separate master branches for 1.4 and 2.0 ... . I presume that you will create those as you decide that the series currently claiming your master is deposed by a newer "preferred" one.

But, doing it that way, (1) You have no way to support the master history of a branch before it ascends to the throne and (2) no way to allow continued improvements without FIRST unifying the branches.

I guess I might explain it by relating to politics. In effect, you are recording the history of the OFFICE and not the OFFICE-HOLDERS. Thus, taking the US President as an example, your "master" is "the president".
And when you spin off your "stable" branches, each of them is a "past-president". But you have no way to properly record the events in the life of individuals while they were "congressmen" or "candidates" except as they might be found in the "public records" of all individuals. Thus when a new president is sworn in, he arrives with a wife. But when he married her is only "before becoming president".

On the other hand, what I propose maintains the complete life history of each individual as a single thread, before, during, and after he holds the office.

I contend that we need to apply all of the protocols for development in the same manner whether our "product" is "Life in the White House" or "Life of Mr. xxxx"

@goeland86
Copy link
Member

Umm. I guess your analogy makes sense as to what we're accomplishing.

What I don't understand is why we would track every single individual? This makes absolutely no logical sense to me in that setting.

Not to mention - you have the release branches to keep as they were when released. And the tag in master. So there's absolutely no need I can fathom to have multiple masters.

Especially if you were to use merge instead of rebase, the commits have the same hash across branches and it then becomes a whole lot easier to trace what happened where and when.

@goeland86
Copy link
Member

What I think you're misunderstanding is that 1.3.x for instance, is the master branch at the point in time when it was created and tagged. And the release 1.3.x branch will have tags for 1.3.0, 1.3.1, 1.3.2 etc. So in essence you are keeping track of all minor releases for every major release you're doing... That's the whole point of the scheme outlined in the blog post @eliasbakken linked.

@Wackerbarth
Copy link
Contributor Author

OK, we seem to agree that 1.3.x is the name of a "master" branch. As such, only releasable branches ever get merged into it. But, the question is "when does it get created?" Does it exist before it is released for the first time, at the time of its first release, or only when it becomes "past", but still supported?

@Wackerbarth
Copy link
Contributor Author

If I use merge instead of rebase then all of the past becomes just "public record". It requires a lot of digging to uncover the past. It's the difference in "a dump" and "a museum". In the former, the trash pickers can easily find "the surface". But, to an archeologist, that is just a small part of the story.

A major reason to have multiple masters is that you never know which candidate is going to be the ultimate winner. (Or, in reality, that there are simultaneous viable paths developing at the same time.)

For example, we now have a situation where Toggle works with OP 1.3.9 or, in a restricted way, with OP 1.3.10. So, defacto, we have two "Umikaze" builds that are viable. In one, the user may not care about the restrictions and in the other the user does care. Now, we need to be working on things other than unifying this conflict. We cannot really afford to "wait and see" which one, or when, one will become "the winner".
But we should be maintaining that "master" workflow for whichever one ends up winning. The only way I see to do so is to treat each as having its own master. And pieces of one don't get mixed into the other until there is an intentional migration to improve the unification.

@goeland86
Copy link
Member

A major reason to have multiple masters is that you never know which candidate is going to be the ultimate winner. (Or, in reality, that there are simultaneous viable paths developing at the same time.)

Yes, you do. The branch with the superseding version is the winner. For this repo. The only reason you keep the older ones is to demonstrate that an alternative architecture/change wasn't chosen for a reason, and you keep it documented. It also lets you look for potential issues in old code. But it should have no other purpose than archival.

And as to the parallel you're trying to draw now between OP 1.3.10 and 1.3.9 compatibility, I would argue that no. The current 1.3.x released code of Toggle is the release version 1.3.0. One of the restrictions of that release is its limited compatibility features with OP 1.3.10+. The release 1.3.1 should remove that restriction.

The master branch will document both 1.3.0 and 1.3.1 in sequence, and there's no need for a rebase of the fix coming from the auth branch onto 1.3.0 to create 1.3.1 - a simple merge will suffice. We have only a single master in that regard. I don't quite understand how you see that we should have 2.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants