-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
coordinated upgrades aka hardforks #665
Conversation
Need to rebase and play with some related changes, but PR desc is fine and code is cleaned up. |
if err != nil { | ||
return txRes(spend, codeForEngineError(err), err) | ||
return transactions.CodeUnknownError, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I lost a few codeForEngineError(err)
calls in the rebase. Fixing.
|
||
d.identifier, err = ident.Identifier(tx.Signature.Type, tx.Sender) | ||
if err != nil { | ||
return transactions.CodeUnknownError, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another place to return a better code. This is 100% a problem with either the sig type or the sender/pubkey, and code should indicate.
extensions/consensus/consensus.go
Outdated
// EncodingTypeCustom offset should be used as the first possible external | ||
// codec's type to leave space for more kwild canonical codecs in the | ||
// future. Choose an encoding type that does not collide with other codecs. | ||
Encoders []*serialize.Codec // unimplemented! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused field. Should we maybe comment this out for now?
// ParamsUpdates are updates to the consensus engine parameters that should | ||
// go into affect at after the activation height (returned to the consensus | ||
// engine when finalizing the block at this height). | ||
ParamsUpdates *ParamUpdates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit funny, simply because the entire point of consensus params is that they can evolve using pure consensus, rather than a fork, right? This doesn't necessarily mean we have to get rid of this, just feels maybe a bit off. I just always imagined we would actually allow consensus param updates via vote store, as this seems like the point of consensus params.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit funny, simply because the entire point of consensus params is that they can evolve using pure consensus, rather than a fork, right?
These are the same concepts. Consensus is the logic that the validators (block producers and approvers) abide by being the same, not conflicting, at any given height. Voting is one way to compute the parameter updates fed to cometbft in FinalizeBlock
. When you say "pure consensus", you are referring to governance-based consensus. That's a great feature to have, but it's not what defines consensus. IMO, coordinating the updates to the parameters used by the consensus engine at any given block is how we prevent consensus failure, but as long as it is coordinated, consensus is maintained. Upgrade by height X or be forced off the network is the most basic form of coordinating rule changes.
The basic ability to seamlessly change rules simply by a consensus among validators (real world, not on-chain governance) deciding to update their genesis configurations with the agreed upon activation height is the most fundamental form of consensus updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To expand on that, almost any of these rule changes can be done with the voting systems triggering resolutions to put them into affect, but we've focused almost exclusively on federated use cases, which makes this traditional height based activation pretty appropriate.
Also, if you consider doing this through resolutions, the nodes still have to upgrade their software to include the definition of the resolutions even if they want to vote against the resolution, so that approach to forward compatibility is extremely limited.
extensions/consensus/consensus.go
Outdated
// StateMod is triggered at activation. It can do anything, one time. For | ||
// instance, arbitrary change to application state via the Engine or more | ||
// directly to the DB may be made at the end of block at the activation | ||
// height. This is to be called inside the outer transaction of activation | ||
// block, so changes to date state are captured in the normal apphash diff. | ||
// This is a reasonable capability for a hardfork to make state changes | ||
// outside of transaction execution, but most such changes can probably be | ||
// achieved through the resolution system and voting. Doing it in a hardfork | ||
// would be needed if there are other changes (either baked in to kwild or | ||
// in the above fields) that should be done in concert with the fork. | ||
StateMod func(context.Context, *common.App) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To give a concrete instance of this, there was an etherum hard fork "tangerine whistle" that included in its' changes clearing of empty accounts to eliminate state bloat caused by an attack. Rule changes were complementary to the state modification. TheDAO hard fork also included state modifications to rectify issues.
Still working through it, but a few higher-level questions / considerations: GremlinCan you explain this? I'm gathering that it seems like it is meant to be included as an unactivateable fork for all networks- if this is the case, what is its purpose? ExtensionsI really like the upgradeability of resolution extensions. I was wondering why you didn't include upgrades for precompiles and authenticators? ForksI'm somewhat confused by the |
I expanded the PR description a fair amount this morning, in part to explain the two different types of forks (although they appear the same in the genesis file). Sorry that it is quite a tedious description. A fork defined by an extension is limited by what the We place an emphasis on extensions to allow network operators to expand functionality of kwild without PRing our repo to make it baked into the code. These are the extended forks. To allow some network-level customization without forking and hacking into kwil-db (a bad thing for many reasons), I've concocted a first iteration of some well defined changes that can be fully implemented and activated without modifying the kwild code, only registering your If we consider the canonical forks, it's impossible to test without some initial named fork. That's Gremlin. It has an implementation in Go, but it is not registered (won't activate) unless a genesis.json has it like For instance, search go-ethereum code for I think we should do a Q&A after the meeting since there is a lot of concepts here that are very subtle, as well as changes that are essentially shower thoughts turned to proof-of-concept, as you've noted on some of the commented fields of the
Open to it! One goal of this work was to explore the possibilities and get feedback for what kind of changes we might want to be programmable. Initially I intended to basically just:
So, basically just give us an out rather than "well, forget that network, on to the next version". The extended forks concept seemed like a good idea given Kwil use cases thus far. But in general, if the Kwil team identified a sensible or even critical update that applied to all conceivable Kwil networks, that's where we'd cut to the chase and make the backward-incompatible changes, but with a hard fork definition and activation height to coordinate it's rollout (and allow old blocks to still process). |
I'm not sure how I missed the section on Gremlin in the description 🤦. Apologies for that, that was a totally redundant question. I read the part on forks and was still a bit confused, but totally missed Gremlin. I definitely think I am on the same page now. I guess my only other question is regarding the |
// To modify an existing payload in a backward incompatible way, instead | ||
// create a new version of the payload such as PayloadTypeExecuteActionV2. | ||
// TODO: consider payload removal/invalidation and replace. | ||
TxPayloads []Payload // Type() and tx app Route() implementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't particularly mind this, however Im not sure if I quite see the usecase for it, since users cannot register their own routes and payloads in the first place.
Obviously here they could, but does it feel weird to only expose this functionality if a user has a fork?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we're discussing this in the same context, because this is to do just that. It gives network operators (and the core kwil-db dev team) ways to add or potentially modify payloads with some level of coordination because validators cannot upgrade their software at the same instant in practice. And removal of payloads (although not currently supported like with the resolution mods) necessitates an associated height otherwise we cannot replay any blocks or sync new nodes.
With TxPayloads
, network X operators would define new playloads in the extension system, and have them either activated at genesis if they are planning a new network, or at some arbitrary point in their existing network's timeline, such as if they conceive of a need for a new payload type that they want to enable without breaking the network.
New payloads becoming available smoothly in a network upgrade to even the official kwil-db code base (canonical) necessitate this too. But with an extended fork, that would mean they are too specific to network X to be part of the kwil-db official code base, and are thus done as an extended fork so hacking the code is not necessary.
This is just one way to handle new payloads. Registering them at activation is one way to have the tx router only use them when it is time, but conditional logic could exist in the routes as well.
internal/abci/abci.go
Outdated
// If at activation height, submit any consensus params updates associated | ||
// with the fork. They should not overlap (some forks should be superseded | ||
// by later fork definitions and not used on new networks). | ||
paramUpdates := defaultParamUpdates() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In defaultParams
, you specify ed25519 as the only key type. Wouldn't you just want a totally empty params here, so that any specially configured values for the PubKeyTypes
won't be overwritten by this default value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are updates rather than the entire set of parameter. But yes, the default updates should have been empty, but historically were always "updating" at every block to the same parameters. If we change them to empty now, the resulting apphash will diverge because the updates become part of cometbft's apphash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, so we update cometbft at every block? Damn that seems really messy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're basically starting fresh with 0.8, I'm fixing this longstanding quirk. Using totally empty params as the updates, as is intended when nothing changes.
chainID: chainID, | ||
GasEnabled: GasEnabled, | ||
chainID: chainParams.ChainID, | ||
GasEnabled: !chainParams.ConsensusParams.WithoutGasCosts, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we now support updating of these params via a fork, we now need a way to read this data dynamically, right? Probably by passing the current block's consensus params as part of the TxContext
? If a user were to update their GasEnabled
via a fork, it would not get updated here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a sharp observation, although WithoutGasCosts
is unchangeable presently because the parameter updates pertain only to the BaseConsensusParams
:
type ConsensusParams struct {
BaseConsensusParams
// These are unchangeable after genesis. In fact, WithoutNonces probably doesn't even work.
WithoutNonces bool `json:"without_nonces"`
WithoutGasCosts bool `json:"without_gas_costs"`
}
I will have another scan through to see if we're failing to reference the active parameters anywhere else.
Which reminds me, WithoutNonces
should be eliminated. I think we've removed any app logic based on it, and there was nothing really sensible in the public key transaction signing infrastructure.
internal/txapp/txapp.go
Outdated
@@ -71,6 +77,9 @@ type TxApp struct { | |||
// The various internal stores (accounts, votes, etc.) are accessed through | |||
// the Database via the functions defined in relevant packages. | |||
|
|||
chainParams *chain.GenesisConfig // NOTE: .ConsensusParams can change (be out of date) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this field is totally unused, aside from when you set it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good catch. I'm pretty certain the note is a message to myself to remove the field on account of the problematic mutability.
continue | ||
} | ||
r.log.Infof("running StateMod for %q", fork.Name) | ||
if err := fork.StateMod(ctx, &common.App{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort've unfortunate we have to do this twice (once for consensus params, once for everything else), but seems probably unavoidable.
Want to make sure we don't forget about this, because I think it is pretty close to being good. Should we discuss this today? |
Just putting my main effort into JSON-RPC at the moment since there are a number of next steps for @Yaiba , but I will circle back asap. We could discuss this p.m. or tomorrow though. |
No sweat, we can circle back when it's more convenient. |
Oh, I had failed to push three commits yesterday. Pushed now. Will also extend the fork definition to include authenticator updates... |
core/types/serialize/encode.go
Outdated
// DecodeGeneric decodes the given serialized data into the given value. See | ||
// also Decode for use with an existing instance. This is generally no more | ||
// useful than Decode; it is syntactic sugar that requires no existing instance | ||
// of the type. | ||
func DecodeGeneric[T any](bts SerializedData) (*T, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'll remove this as discussed rather than rename. It's hard to see a reason to use it imo
Will get this reviewed tonight. I'm about there, just wanna scan through again |
Overall I think it looks good. A few small things, but I don't think any of these are blockers:
Overall, I'm good on this though, but will wait since Gavin is reviewing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGMT.
Barely scanned abci part so I don't have much insight of it.
I looked into the fork mechanism mostly and it works for me. A question about the toolings for fork to work, mostly KGW, say X network want to support a new Codec, KGW could break in this network unless upgrade as well, but currently I don't see how they can do it
|
||
// Update transaction payloads. | ||
for _, newPayload := range fork.TxPayloads { | ||
r.log.Infof("Registering transaction route for payload type %s", newPayload.Type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although it feels weird, is there use case that would want to remove a type of payload(or Codec)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely due to a non-determinism issue
r.log.Errorf("hardfork %v at height %d has no definition", name, height) | ||
continue // really could be a panic | ||
} | ||
activations = append(activations, fork) // how to handle multiple at same height? alphabetical?? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess if order really matters, just not to activate them in same height
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do have the order.OrderMap
util function in core
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess if order really matters, just not to activate them in same height
Exactly. forks with conflicting semantics should not overlap.
return nil, err | ||
} | ||
app.log.Infof("Preparing ABCI application at height %v, appHash %x", height, appHash) | ||
activeForks := app.forks.ActivatedBy(uint64(height)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we need the ordered forks(by height) from ActivatedBy
?
// at the given block height, and retrieves the associated changes from the | ||
// consensus package that contains the canonical and extended fork definitions. | ||
func (a *AbciApp) Activations(height int64) []*consensus.Hardfork { | ||
// hmm, this is a tup of the TxApp method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe make helper function like Activations(height int64, forks forks.Forks, logger Logger)
?
Got some edits, and want to test something. |
Are there more changes on the way? I think this pr is good to go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Yeah, sorry, need to resolve some conflicts and re-check something. Will be ready again pretty soon. |
No rush |
I agree, this is a likely desired change that would certainly need coordination lest blocks get rejected. In general, there is a lot left to the imagination here, and much that might end up being phased out like perhaps the tx payload addition. The main thing IMO is to get a system and machinery in place for defining and activating such changes. A minor or patch release can add fields that specify the changes, and either height or a resolution can be used to flip it on. |
The broad goal of this work is dynamic customizability of any consensus-critical aspect of kwild, with a basic height-based coordination schema. The specific changes are as follows: Refactor internal/txapp to eliminate a large amount of code duplication in each route. Changes to common logic is no longer required in every single route. Routes are registered in a map and retrieved by (*TxApp).Execute. All routes are composed of a baseRoute embedding a consensus.Route, which contains the route- specific logic. Transaction payloads (and payload types) may be dynamically registered. Authenticators may be added/modified/removed in the existing extensions/auth registry. Serialization schema may be dynamically registered. A schema is defined as a Codec, which associates an encoding type with an encoder and decoder. Resolutions may also be dynamically added/modified/removed. The new extensions/consensus package defines the Hardfork type that combines all changes as described above into a single change definition. When combined with the new genesis file's "activations" section, a height-based coordinated upgrade may be performed. To support these coordinated upgrades, both AbciApp and TxApp are updated. In addition to the dynamic functionality changes, a coordinated upgrade also has the ability to perform arbitrary state modifications and consensus engine parameter updates. A test node is defined in test/nodes/fork. This nodes uses the extensions/consensus mechanism for testing the above dynamic changes. The integration tests also test a coordinate network halt, which is intended for coordinating network migrations. - https://gist.github.com/alexanderbez/5e87886221eb304b9e85ad4b167c99c8 - https://web.archive.org/web/20201112011700/https://github.com/cosmos/cosmos-sdk/wiki/Cosmos-Hub-1-Upgrade - https://hub.cosmos.network/migration/latest#method-i-manual-upgrade The genesis file code is moved from cmd/kwild/config into the new common/chain package since it is more broadly useful in kwild than simply application config like with config.toml. Move internal/ident to common/ident (still the main module, but not hidden from external uses like extended kwild builds). Add a new `core/client.(*Client).NewSignedTx` method for advanced transaction authoring with any transactions.Payload. Remove the obsolete "withoutNonces" setting.
docs: kwilteam/docs#108
Background
Certain types of rule (consensus code) changes can be made forward-compatible so that older software versions accept data (e.g. blocks, transactions) from newer versions of the software. A soft fork is such rule modification that is forward compatible, usually making rules more strict or adding ignorable extra data, but not changing or adding things that would cause older software to reject these blocks.
A hard fork breaks this possibility. In a hard fork, a newer version of the software can create data that older versions reject. This type of change must be coordinated. Validators must adopt the breaking rule changes at the same time (height) to avoid network disruption and liveness issues, not just when they install the new release that ships the code with breaking changes.
Further, synchronizing a new node must know when the changes become applicable otherwise the (re)execution of blocks generated prior to the new code was deployed would fail entirely or with an incorrect appHash. This was a scenario we encountered in fixing a bug on the 0.6 testnet, preventing any new node from rejoining the network or even locally rebuilding state from block data.
This PR introduces mechanisms to coordinate and define these changes.
Overview of changes
Coordinated rule changes in this PR are accomplished by:
uint64
) for named forks (string
) =>map[string]*uint64
in theForkHeights
field of thecommon/chain.GenesisConfig
struct. (chaincfg.go) Named forks with activation heights defined per-chain are foundational in Ethereum. For instance: hereextensions/consensus.Hardfork
struct for each named fork, and making these known to kwild withRegisterHardfork
. This is more novel than ad-hoc conditions sprinkled throughout code. I'm trying to allow networks to customize their networks using the extension system to define changes that would otherwise break consensus if not activated with coordination.Types of forks in
kwild
There are two ways coordinated rule changes may be defined:
Canonical forks. These have their own named field in the
common/chain/forks.Forks
struct. Being baked into kwild, these can affect any part of the code more directly with the named method likeIsForkX(height uint64)
andBeginsForkX(uint64)
that may switch between different logic (e.g.ThresholdV1()
orThresholdV2()
) at/after the current height. This is exactly the pattern used withgo-etherum
's implementation (and other chains that have planned activation). For instance: here, here, etc.This is extremely flexible and open-ended, but not amenable to customization. These are changes that Kwil developers see as important or necessary, but which still require coordinated upgrade to avoid network failure or liveness issues if nodes simply update at their convenience. A network may define the appropriate height at which the corresponding rules should activate in genesis.json. NOTE: a fresh network may very well define several to activate at genesis, as is also Ethereum's way for new public or developer networks. For instance: sepolia and ad hoc test networks.
Extension-defined forks aka "extended forks". These are also in the
Forks
struct, but their names and activation heights are stored in the fieldExtended map[string]*uint64 //
. The definition (implementation) of the changes corresponding to an extended fork are defined with anjson:"extended"
extensions/consensus.Hardfork
instance, and are limited by the customization mechanisms exposed via that type. We are starting with a few clear and expected types of changes, and may expand on them for the sake of network customization without the need to forkkwild
. More details below.Note that canonical forks also have a
extensions/consensus.Hardfork
instance associated with the fork name, but they are not limited by the mechanisms exposed in the extension system.Applications for future hardfork definitions
etc. With these changes, we have a system for making such changes take place at a certain height.
packages
common/chain
contains theGenesisConfig
struct that models genesis.json. New:Forkheights
field, methods, package-levelForks
singleton, andBaseConsensusParams
from whichConsensusParams
derives. Much of this file was moved from the somewhat awkward location incmd/kwild/config
, which is now focused primarily on kwild application config rather than the network definition.common/chain/forks
contains theForks
struct definition that has the canonical forks as named fields, with methods to support checking activation status against both canonical and extended forks.extensions/consensus
. This is how the implementation of the hardfork changes are defined. See next section for detailed description.common/ident
is extracted frominternal/ident
by necessity (authenticator use in extended fork code).core/types/transactions
supports dynamically added payload types and route handlers from extensions (see corresponding changes ininternal/txapp
and router changes elsewhere).internal/txapp
is refactored. A great deal of code duplication is removed and theTxApp
's route system more extensible now, with a singlebaseRoute
that handles the common logic including spending and DB transaction lifetime management. Routes may also be registered by their implementation, which may be defined in terms of the types defines in more low-level packages and registered withRegisterRouteImpl
, unlike the baked-in routes defined withininternal/txapp
itself.types/serialize
is refactored a little to support registering aCodec
that can expand serialization capability via hardfork.Extension-defined forks
These "extended forks" are constrained by the limitations of the
extensions/consensus.Hardforks
type, but allow modifying network rules with the extension system without modifying kwild code.The
extensions/resolutions.Hardfork
structUsed by every canonical or extended fork, this type characterizes the changes in certain well defined ways. It is well documented, so read this, but it is essential aspects are:
Name
corresponds to the string in the genesis file that tells when to activate the rule changesTxPayloads []Payload
specifies new transaction payloads to recognize at activation. This includes a new system for definingRoute
s for the tx app, described in the next section.ResolutionUpdates []*ResolutionMod
defines changes (add/modify/remove) to the oracle/resolution system.ParamsUpdates *ParamUpdates
defines changes to the consensus paremeters. Some of these belong to cometbft, while some affect ABCI app behavior. All changes lead to app hash changes.StateMod func(context.Context, *common.App) error
is triggered at activation. It can do anything, one time. For instance, arbitrary change to application state via the Engine or more directly to the DB may be made at the end of block at the activation height.TxApp Route system refactoring
The refactoring of the
TxApp
routes was something to address regardless. Each route was mostly boilerplate code shared between each route. This is now factored out into abaseRoute
. ThebaseRoute
embeds aconsensus.Route
, which provides the unique implementation of the route. ThebaseRoute
's own method provide the boilerplate once, calling into theconsensus.Route
's methods to run route-specific logic that takes place before and after the db transaction is created, as well as pricing.For convenience, I am using a
*common.App
as one of the inputs to theconsensus.Route
. Thecommon.App
type was initially for the precompiles and resolutions extension, so this might be a little awkward (ExtensionConfigs
are useless in this context), but it has a similar purpose.Gremlin test hard fork
extensions/consensus/defaults.go
is where we would define any hardforks that we require to change rule, fix bugs, etc that would break a network if not coordinated.I had defined one fork call "gremlin" for testing all of the hardfork and extension machinery. In any production environment, this fork would never be enabled (absent/nil in genesis config). None of the tooling is aware of it.