Flip 298: Utilize Dynamic Protocol State for Version Beacon (coordinating upgrades of the Execution Stack) #296

AlexHentschel · 2024-10-31T22:22:39Z

This flip is tracked by issue #298.

Motivation

🚧

Context

This flip is a draft resulting from the Core Protocol Working Group meeting on Oct 31, 2024.
The original version was drafted in Notion: https://www.notion.so/flowfoundation/Core-Protocol-WG-Meeting-07-October-31-2024-1271aee1232480d0850eced4fafc1596?pvs=4

AlexHentschel · 2024-10-31T22:24:09Z

P.S. As part of this PR, I have added the markdown [.md] extension to filenames that were previously missing it

protocol/20241031-execution-stack-versioning.md

turbolent

Great proposal! +100 on adopting this for the execution stack and Cadence.

protocol/20241031-execution-stack-versioning.md

turbolent · 2024-10-31T23:25:29Z

protocol/20241031-execution-stack-versioning.md

+
+- Dynamic Protocol State should ingest Version Beacon Service Event and track’s the Execution Stack’s Component Version
+
+![Illustration of Process](20241031-execution-stack-versioning/Execution_Stack_Versioning_(2).png)


There is a lot going on in this diagram. Maybe add a description of what is important to this proposal.

Regarding this interface: maybe make this more abstract and explain it. I had a hard time understanding this core piece of the proposal. What is a "KVStoreReader"? What is a "ViewBasedActivator"? Maybe improve that naming and make it less based on current implementation details (Go/flow-go).

Maybe also make it clearer that the idea is that for each component, there will be two functions:

A function returning the current required version for this component

A function returning future/upcoming versioning for this component

It is not very clear that the example shows the versioning of the component named "ExectionStack".
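To make the "two functions per component" suggestion concrete, here is a minimal Go sketch. The interface name `ComponentVersioning`, the toy type `scheduledUpgrade`, and activation by view number are all assumptions made for illustration; they are not the interface from the flip or from flow-go.

```go
package main

import "fmt"

// ComponentVersioning is a hypothetical, implementation-agnostic view of what
// the protocol state could expose for one versioned component.
type ComponentVersioning interface {
	// CurrentVersion returns the component version required at the given view.
	CurrentVersion(view uint64) uint64
	// UpcomingVersion returns a scheduled future version and the view at which
	// it activates; ok is false if no upgrade is pending at the given view.
	UpcomingVersion(view uint64) (version uint64, activationView uint64, ok bool)
}

// scheduledUpgrade is a toy in-memory implementation (assumption: at most one
// pending upgrade; activationView == 0 means nothing is scheduled).
type scheduledUpgrade struct {
	current        uint64
	next           uint64
	activationView uint64
}

func (s scheduledUpgrade) CurrentVersion(view uint64) uint64 {
	if s.activationView != 0 && view >= s.activationView {
		return s.next
	}
	return s.current
}

func (s scheduledUpgrade) UpcomingVersion(view uint64) (uint64, uint64, bool) {
	if s.activationView != 0 && view < s.activationView {
		return s.next, s.activationView, true
	}
	return 0, 0, false
}

func main() {
	// The component being versioned here is the execution stack.
	var executionStack ComponentVersioning = scheduledUpgrade{current: 5, next: 6, activationView: 1000}
	fmt.Println(executionStack.CurrentVersion(999))
	fmt.Println(executionStack.CurrentVersion(1000))
	fmt.Println(executionStack.UpcomingVersion(999))
}
```

Each versioned component would get its own pair of such functions; the example instance stands in for the execution stack.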

protocol/20241031-execution-stack-versioning.md

• adding sections `Objectives` and `Motivation` and `User Benefit` according to Flip template • adding short explanation of terminology (Version Beacon, HCU)

bluesign · 2024-11-04T17:12:43Z

hmm I think semver is not a good idea here, actually I think version alone is bad idea tbh. I think this should be version + an array of feature flags instead.

jordanschalm · 2024-11-04T17:13:30Z

protocol/20241031-execution-stack-versioning.md

+- In addition, we introduce compatibility requirement from semantic versioning:
+
+    $\textnormal{Component Version :} \quad \texttt{major}\,.\,\texttt{minor}$


I disagree with this mapping from semver to component versions.

In semver, only major version changes are breaking changes. All other changes must be backward-compatible.

MAJOR version when you make incompatible API changes
MINOR version when you add functionality in a backward compatible manner
PATCH version when you make backward compatible bug fixes

We need to coordinate a component version upgrade only when that component is upgraded in a backward-incompatible manner (major version increment in semver). Otherwise a rolling upgrade suffices.

We may choose to require coordination of a backward-compatible component version upgrade (minor version increment in semver). Maybe we want to coordinate the release of a feature at a specific time. But by doing so, we are turning a backward-compatible upgrade into a backward-incompatible upgrade. Which is fine, but now it is a major version upgrade in semver terms.

So, if some component is using semver to internally version itself, then only major version changes should correspond to component version increments.
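As a small illustration of this reading of SemVer, the following Go sketch flags only major-version changes as needing a coordinated component version upgrade. The type `SemVer` and the function `needsCoordination` are hypothetical names for this example.

```go
package main

import "fmt"

// SemVer is a minimal semantic version triple (illustrative only).
type SemVer struct {
	Major, Minor, Patch uint64
}

// needsCoordination reports whether the change from one version to another is
// backward-incompatible in SemVer terms, i.e. the major version differs.
// Minor and patch changes are backward-compatible, so a rolling upgrade suffices.
func needsCoordination(from, to SemVer) bool {
	return to.Major != from.Major
}

func main() {
	// Minor bump: backward-compatible, no coordination required.
	fmt.Println(needsCoordination(SemVer{1, 4, 2}, SemVer{1, 5, 0}))
	// Major bump: breaking, corresponds to a component version increment.
	fmt.Println(needsCoordination(SemVer{1, 5, 0}, SemVer{2, 0, 0}))
}
```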

We need to coordinate a component version upgrade only when that component is upgraded in a backward-incompatible manner (major version increment in semver). Otherwise a rolling upgrade suffices.

Maybe I misunderstand. I added a new section, Discussion of Possible Versioning Schemes, which in part discusses your point, if I understood it correctly.

In a nutshell, a pure feature addition is still something that needs to be coordinated, because we need to agree when the new feature becomes usable. Nevertheless, I think one can make an argument for differentiating between major breaking changes and pure feature additions. That's all that I want to say here.
Nevertheless, in the end, I agree with you that for many scenarios SemVer does not make sense to me. I tried to explain that better in the new section Discussion of Possible Versioning Schemes. Please take a look. Curious about your thoughts.

protocol/20241031-execution-stack-versioning.md

turbolent · 2024-11-04T23:38:54Z

@bluesign the idea is to keep the solution/mechanism simple. you could imagine that a particular component could translate a version to feature flags internally, so there isn't really any reason to put features into the "component version" that is tracked/coordinated by the protocol.

turbolent · 2024-11-05T01:23:37Z

We've discussed this proposal in the Cadence team today. Some feedback based on questions to avoid confusion:
It might make sense to:

• explicitly state that it is not a requirement that a software version must support all previous component versions
• add an example where a component version is supported by multiple software versions, to illustrate that e.g. any change that does not change the behaviour of execution (i.e. the result, for example a performance optimization), does not require a component version bump

AlexHentschel · 2024-11-05T05:31:41Z

@bluesign also in response to your suggestion

version alone is bad idea tbh. I think this should be version + an array of feature flags instead.

I have extended section Discussion of Possible Versioning Schemes. It discusses Feature Vectors for versioning in detail. I would agree that certain features we would want to repeatedly turn off and on at runtime and in those cases feature flag make sense.

In most cases we batch updates and for those I think an integer version would just be fine. So I agree with your general assessment of version + an array of feature flags instead. Though, at the moment I am not sure of any feature that we want to repeatedly turn on and off on mainnet. Hence, I'd start with the integer version alone.

bluesign · 2024-11-05T05:40:19Z

I would agree that certain features we would want to repeatedly turn off and on at runtime and in those cases feature flag make sense.

I was more of thinking of rollback scenarios without building new version. Also it can be helpful for stuff that deprecates. ( without releasing a new version ) Also it can make backwards guarantees easier, you can say that you guarantee component to be compatible with 1 previous version, and next one is previous + feature flags.

something like this:

1.0 
1.0 + [feature_A, feature_B]
1.1 ( includes feature_A, feature_B) + [feature_C]

AlexHentschel · 2024-11-05T19:51:54Z

protocol/20241031-execution-stack-versioning.md

+additional complexity of SemVer with the associated correctness risks (SemVer assumes downwards compatability by default, while maintaining downwards compatability 
+in the implementation is generally additional work, so the default assumption of compatability induces additional risks for the happy path of block execution - no a good tradeoff in my opinion).
+
+### Versioning Scheme based on Feature Vectors


@bluesign I wanted to consolidate the conversation around feature flags into one thread here, which we can possibly resolve after we have reached alignment and transferred the conclusions into the flip. This also allows other contributors to comment on the ideas without different conversations interleaving.

@bluesign's initial comment: hmm I think semver is not a good idea here, actually I think version alone is bad idea tbh. I think this should be version + an array of feature flags instead.

@AlexHentschel reply: I have extended section Discussion of Possible Versioning Schemes. It discusses Feature Vectors for versioning in detail. I would agree that certain features we would want to repeatedly turn off and on at runtime and in those cases feature flag make sense.

In most cases we batch updates and for those I think an integer version would just be fine. So I agree with your general assessment of version + an array of feature flags instead. Though, at the moment I am not sure of any feature that we want to repeatedly turn on and off on mainnet. Hence, I'd start with the integer version alone.

@bluesign's reply: I was more of thinking of rollback scenarios without building new version. Also it can be helpful for stuff that deprecates. ( without releasing a new version ) Also it can make backwards guarantees easier, you can say that you guarantee component to be compatible with 1 previous version, and next one is previous + feature flags.

something like this:

1.0
1.0 + [feature_A, feature_B]
1.1 ( includes feature_A, feature_B) + [feature_C]

Thanks for the clarification @bluesign. It makes sense to me to introduce a feature flag to potentially roll back one specific feature that we are worried about. However, we could also just roll back to the prior version if the software supports that:

• version 6
• version 7 (extends version 1.0 by adding features A and B)
⋮
• version 6: we learn feature A is bugged and roll back to version 6
• version 8 (extends version 1.0 by adding features A and a fixed version of feature B)

I would guess we share the opinion that many different versioning schemes could potentially be useful for different scenarios. Though, I have been asked repeatedly for "best practice guidelines" on which versioning scheme should be picked when, which I am struggling with because I also don't think I have the necessary experience to give properly educated suggestions.
That's mainly the reason why I tend to come back to the questions:

Can we not express the same evolutionary path of the protocol by using solely integer versions?

And what is the strong benefit of using a more sophisticated versioning scheme (here SemVer + feature flags)? Sure, it can be useful in specific cases, though a more sophisticated versioning scheme also introduces more intellectual and implementation complexity.

I agree that integer versioning is a comparatively simplistic tool. Though it is also extremely general as it makes effectively no assumptions about usage patterns.

Interesting, I had not realized that this mechanism would support rollbacks 🤔 For whatever reason I assumed that versions would have to increment, even if a higher version effectively might just have the same behaviour as an earlier one, i.e. I had assumed that "version 6: we learn feature A is bugged and roll back to version 6" would rather be a "version 7: we learn feature A is bugged and roll back to the behaviour of version 6, features A and B are disabled"

Can we not express the same evolutionary path of the protocol by using solely integer versions?
[...] what is the strong benefit of using a more sophisticated versioning scheme

That was mostly the realization in prior discussions: A simple integer is sufficient (e.g. the notion of a supported component version range can be implemented in the component implementation, like software version 2 supports component version range [1,2]) and any benefits (which?) are likely not worth the complexity
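A minimal Go sketch of the "supported component version range" idea, where the range lives entirely in the component implementation and the protocol only tracks a single integer. The names `VersionRange`, `supportedBySoftware` and `canExecute`, and the concrete software versions, are hypothetical.

```go
package main

import "fmt"

// VersionRange is an inclusive range of integer component versions.
type VersionRange struct {
	Min, Max uint64
}

// Contains reports whether component version v lies within the range.
func (r VersionRange) Contains(v uint64) bool {
	return r.Min <= v && v <= r.Max
}

// supportedBySoftware maps a (hypothetical) software version to the component
// versions it implements, e.g. software version 2 supports range [1,2].
var supportedBySoftware = map[uint64]VersionRange{
	1: {Min: 1, Max: 1},
	2: {Min: 1, Max: 2},
}

// canExecute reports whether the given software version implements the
// component version currently required by the protocol.
func canExecute(softwareVersion, componentVersion uint64) bool {
	r, ok := supportedBySoftware[softwareVersion]
	return ok && r.Contains(componentVersion)
}

func main() {
	fmt.Println(canExecute(2, 1)) // software 2 still supports component version 1
	fmt.Println(canExecute(1, 2)) // software 1 does not know component version 2
}
```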

thanks @AlexHentschel , unfortunately I lack the experience in distributed protocols area, but from my experience in mobile game publishing, where deploying a new version is a bit delayed and many users are affected, it is always best to think with "bad version release" in mind.

There are few parameters to consider I guess

do we need rollback vs new version release ? (btw I was thinking like @turbolent when I commented, component rollback will be a new version )

I don't know the cost of releasing new version to network, I assume ( from execution context only ) it involves updating and restarting the nodes. This can be covered with version ranges for sure, but I think there would need to be some guarantees on at least supporting 1 previous version in case things go wrong.

what should the component support range be?

I think one version before is a must, just in case of a failure. version N supporting N-1 is probably enough. ( at least for security patches )

For bigger batches ( features bundled in a version ) it was the origin of the version + flags idea, as all those are parallel to previous version, linear rollback does not let us remove one feature without removing another. ( not sure how important is this one also to the business case here )

It is really tricky subject, I don't think there is single best practice to follow, it all depends on targets and trade offs.

always best to think with "bad version release" in mind

💯 agree. Especially for high-assurance systems such as flow, it's super important to have a fallback plan, because every software change requires time-intensive coordination and if mainnet were to break it would take on the order of hours to coordinate with a supermajority of node operators to deploy a fix. We thoroughly test, but of course that is not a guarantee that the software behaves as intended.

I really appreciate this discussion, because it helps find a good balance: we want to make our toolset better, but it doesn't have to be perfect. Nevertheless, the improvement of our tooling needs to be significant enough to warrant the respective investment of engineering resources. In the context of this Flip, the goal is to extend our toolset for deploying software upgrades and addressing problems with upgrades:

I personally think that already Integer Version via the Dynamic Protocol State for some (few) software components would be a significant step forward. We are going to learn a lot about for which upgrade scenarios our new approach is working well and for which scenarios there are challenges. Furthermore, integer versioning is simple and the most general, so it makes sense to me to start with that.

Nevertheless, @bluesign is raising very good points: What mechanisms do we need in addition, for integer versioning to be a significant improvement? Paraphrasing (my understanding) of @bluesign's previous points:

(i) Should we recommend that software supports version N plus N-1, to keep the option for quick rollback without requiring time-intensive software upgrades on supermajority of nodes?
(ii) How should we represent the rollback of a new version in the Component Versioning Scheme?
(iii) In case of releasing a bigger batch of features, do we want the ability to roll back select features without needing to reverse the entire upgrade as a whole?

(Hope I understood correctly 😅, don't hesitate to jump in @bluesign)

For the points (i)-(iii), the software implementation as well as the component versioning scheme have to support it.

regarding (i):
Whether we can make the implementation work (one software supporting component versions [N-1, N]) will strongly depend on the feature we want to upgrade/add and the component it lives in. We should recommend support for component versions [N-1, N] but not depend on it always being possible. I feel in many cases, it will be possible for the implementation to at least support component versions [N-1, N]. So for this flip, I think we should consider this scenario explicitly.

regarding (ii): (rollback of new version as a whole)
For this scenario, I think integer versioning is sufficient. Let's walk through the two important sub-cases:

The software only supports version N, but some time after deployment we discover there are problems with the new behaviour and we need to roll back. We need to deploy a new software to a majority of nodes, because the only behaviour the software supports is the bricked behaviour. Then, it doesn't really matter, whether we increment or decrement the version number for the following reason:

Example: Let's say we just deployed component version N=6 and now find out there are severe problems. If we switch back to exactly the same behaviour as version 5, we could decrement the component version from 6 to 5 in the protocol state. The nodes with bricked behaviour will halt at the switchover point, the node operators would have to deploy the version-5-software and restart the nodes, which then proceed from the switchover point. Nevertheless, there is also no strict reason why two component specifications couldn't behave identically. In other words, we could specify version 7 to behave exactly as 5 and then we could increase the version number from the bricked version 6 to 7. The deployment process for the node software would be exactly the same.

The other scenario is that our software supports component versions [N-1, N]; we just switched to version N=6 and now find out about the problems. But differently to the prior case, our software still supports the old component version 5. Then we could quickly switch back to version 5 via a governance transaction without needing to touch the software on the nodes. In contrast, if we hypothetically chose to increase the version to 7 to remove the bricked behaviour, we would need new software. So decrementing the version number is clearly advantageous in my opinion.

Hence, we conclude: decrementing the component version is generally suitable to rollback an upgrade as a whole. There are exceptions, but I would recommend this approach as the default.
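The two sub-cases above can be sketched in Go as follows. This is only an illustration of the reasoning; `VersionRange`, `SwitchPlan` and `planSwitch` are hypothetical names, and "needs redeploy" simply means the deployed software does not implement the target component version.

```go
package main

import "fmt"

// VersionRange is the inclusive range of component versions that the
// currently deployed software implements.
type VersionRange struct{ Min, Max uint64 }

func (r VersionRange) Contains(v uint64) bool { return r.Min <= v && v <= r.Max }

// SwitchPlan describes what switching the component version to Target requires.
type SwitchPlan struct {
	Target        uint64
	NeedsRedeploy bool // true if node operators must deploy new software first
}

// planSwitch checks whether the deployed software already implements the
// target component version. If it does, changing the version in the protocol
// state is enough; otherwise new software has to be rolled out.
func planSwitch(deployed VersionRange, target uint64) SwitchPlan {
	return SwitchPlan{Target: target, NeedsRedeploy: !deployed.Contains(target)}
}

func main() {
	// Sub-case 1: software only supports version 6. Going to 5 or to 7
	// both require deploying new software, so the direction does not matter.
	onlyN := VersionRange{Min: 6, Max: 6}
	fmt.Printf("%+v %+v\n", planSwitch(onlyN, 5), planSwitch(onlyN, 7))

	// Sub-case 2: software supports [5, 6]. Decrementing to 5 needs no new
	// software, whereas incrementing to a new version 7 does.
	nMinus1AndN := VersionRange{Min: 5, Max: 6}
	fmt.Printf("%+v %+v\n", planSwitch(nMinus1AndN, 5), planSwitch(nMinus1AndN, 7))
}
```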

regarding (iii): (rollback of new selective features of a new version)
So we have seen that the Integer Versioning Scheme works well for rolling back to a prior version (abandoning the new version as a whole). However, it doesn't allow us to roll back selective features (arbitrary combinations) that were all deployed as one new version. I am not sure how prominent this scenario will be - my gut feeling is that we need this fine-grained control only in a minority of cases. Hence, I am inclined to stick with Integer Versioning Scheme and think about optional extensions for this edge-case scenario (👇 proposal below).

Here is my proposal: "Integer Versioning with Variable Feature Vector Extension"

The following could be the format of a version

type ExtendedIntegerVersion struct {
    N               uint64
    FeatureControls []byte // by default empty or nil
}

With empty FeatureControls, this collapses to the pure Integer Versioning Scheme.

FeatureControls is a binary vector, which is solely interpreted by the software. There is one and only one way to interpret the FeatureControls, which is technically part of the component behaviour specification. So software that understands component version N can decode and interpret FeatureControls for exactly that version. In the most simplistic form, every bit could correspond to a particular feature flag for the component. Though, we can also put a lot more complex structures in this FeatureControls slice. FeatureControls should be small in size, that's the only limitation. But otherwise, it is essentially a completely generic binary blob, which carries a version-specific configuration for the component.
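A minimal Go sketch of the "most simplistic form", where each bit of FeatureControls is one flag. The encoding chosen here is an assumption for illustration only: a set bit disables feature i, so that an empty vector leaves every feature of version N active and the scheme collapses to pure integer versioning. The method name `FeatureDisabled` and the bit layout are hypothetical.

```go
package main

import "fmt"

// ExtendedIntegerVersion mirrors the struct proposed above.
type ExtendedIntegerVersion struct {
	N               uint64
	FeatureControls []byte // by default empty or nil
}

// FeatureDisabled interprets FeatureControls as a bit vector (assumed layout:
// feature i lives in byte i/8, bit i%8; a set bit switches the feature off).
// Bits beyond the end of the slice count as unset.
func (v ExtendedIntegerVersion) FeatureDisabled(i uint) bool {
	byteIdx := i / 8
	if byteIdx >= uint(len(v.FeatureControls)) {
		return false
	}
	return v.FeatureControls[byteIdx]&(1<<(i%8)) != 0
}

func main() {
	// Pure integer versioning: no controls, nothing is disabled.
	plain := ExtendedIntegerVersion{N: 7}
	fmt.Println(plain.FeatureDisabled(0))

	// Same component version 7, but feature 1 is selectively rolled back.
	patched := ExtendedIntegerVersion{N: 7, FeatureControls: []byte{0b00000010}}
	fmt.Println(patched.FeatureDisabled(0), patched.FeatureDisabled(1))
}
```

Because the interpretation is tied to component version N, a different version could assign entirely different meanings to the same bytes.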

AlexHentschel · 2024-11-05T23:22:27Z

@turbolent, thanks for discussing the flip with the Cadence team and providing a summary:

• explicitly state that it is not a requirement that a software version must support all previous component versions
• add an example where a component version is supported by multiple software versions, to illustrate that e.g. any change that does not change the behaviour of execution (i.e. the result, for example a performance optimization), does not require a component version bump

I have extended the section Relationships between Software and Component Version and added explanations of both points. Thanks

Alexander Hentschel added 2 commits October 31, 2024 15:16

added markdown [.md] extension to filenames that were previously missing it

f688d0d

initial draft of version beacon flip

c609359

github-actions bot assigned AlexHentschel Oct 31, 2024

AlexHentschel changed the title ~~[Flip ] Utilize Dynamic Protocol State for Version Beacon (coordinating upgrades of the Execution Stack)~~ [Flip 296] Utilize Dynamic Protocol State for Version Beacon (coordinating upgrades of the Execution Stack) Oct 31, 2024

AlexHentschel commented Oct 31, 2024

Alexander Hentschel added 4 commits October 31, 2024 15:29

added Flip number

4b40ac4

cleanup

744cda2

extension

4c37f09

fixed typos

3a19aaa

AlexHentschel marked this pull request as ready for review October 31, 2024 22:40

turbolent reviewed Oct 31, 2024

AlexHentschel mentioned this pull request Nov 1, 2024

Utilize Dynamic Protocol State for Version Beacon (coordinating upgrades of the Execution Stack) #298

Open

AlexHentschel changed the title ~~[Flip 296] Utilize Dynamic Protocol State for Version Beacon (coordinating upgrades of the Execution Stack)~~ Flip 298: Utilize Dynamic Protocol State for Version Beacon (coordinating upgrades of the Execution Stack) Nov 1, 2024

turbolent added the flip: protocol Protocol FLIP label Nov 1, 2024

FLIP 298: addressing comments

fd9c54d

AlexHentschel force-pushed the version-beacon-flip_initial-draft branch from b5f8620 to fd9c54d Compare November 1, 2024 23:42

jordanschalm reviewed Nov 4, 2024

FLIP 298: incorporating thoughts/ideas from comments

1a424d2

extended discussion of versioning

3bf9ad0

addressing feedback

f491240

AlexHentschel commented Nov 5, 2024

included Jordan's comment about SemVer being more a hinderance than help

cba3277

AlexHentschel force-pushed the version-beacon-flip_initial-draft branch from 97f19b6 to cba3277 Compare November 5, 2024 21:09

extended Guidelines on component versioning

51adc73

Alexander Hentschel added 4 commits November 5, 2024 14:34

included additional suggestions for clarifications

240b378

added that It is not a requirement that a software version must support all previous component versions.

f192cb5

addressing feedback

33e0303

fixed typo

62967bb

