SIP - Catastrophic blockchain failures and recovery #10

Open
wants to merge 8 commits into
base: main

jcnelson
Contributor

This SIP attempts to codify a set of procedures for recovering from catastrophic errors in the blockchain that either cause it to crash, or cause severe safety problems for other people's digital assets and code. I was inspired to write this in light of the recent network outage on 7 February. The procedures outlined in this SIP are meant to be "game plans" for dealing with future such events, as well as drawing a few lines in the sand as to what's a legitimate reason for following some of the more severe recovery procedures (e.g. forks).

@whoabuddy
Member

Thank you for this! Without knowing much about soft and hard forks on the technical side, this SIP gave me a great understanding of what capabilities exist to mitigate a catastrophic failure. Each possible action contained a clear description and action plan, with emphasis on using the least-disruptive method possible.

2. The branch will be submitted as a pull-request to the `master` branch, and
will be reviewed and approved by at least two blockchain engineers, representing
both the Stacks Foundation and at least one other major Stacks ecosystem entity.
3. If warranted, an unofficial announcement will be made to various public
Member

When would this not be warranted? Given:

A failure qualifies as a catastrophic failure if and only if there is no conceivable way for the correct nodes in the network to make progress and preserve safety without human intervention.

Two options that could happen automatically:

  • set up GitHub to announce any releases for stacks-blockchain automatically
  • set up Discord to accept and relay the message from [email protected]
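
As a rough illustration of the first option, the sketch below turns a GitHub `release` webhook event into the JSON body a Discord webhook accepts. It is only a sketch under assumptions: the function name and message wording are made up for this example, and actually POSTing the body to a Discord webhook URL (which would have to be created for the announcement channel) is left out.

```python
from typing import Optional


def release_to_discord_payload(event: dict) -> Optional[dict]:
    """Build a Discord webhook body from a GitHub `release` webhook event.

    Returns None for events that should not be announced, such as edits
    or draft releases. The function name is hypothetical.
    """
    # GitHub sends several actions for releases; only announce published ones.
    if event.get("action") != "published":
        return None

    release = event.get("release", {})
    if release.get("draft"):
        return None

    tag = release.get("tag_name", "unknown")
    url = release.get("html_url", "")
    repo = event.get("repository", {}).get("full_name", "")

    # Discord webhooks accept a JSON object with a plain-text "content" field.
    return {"content": f"New release {tag} of {repo}: {url}"}
```

A small relay service would call this on each incoming event and forward any non-None result, which keeps the filtering logic testable without network access.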

Contributor Author

All the more-disruptive procedures use this procedure as a step. Some of them, like the one below it, require a public embargo on talking about the bug because the act of making the bug public knowledge would make the problem worse. For example, if someone discovered a bug that would let an attacker steal anyone's STX, we would not announce the bug until the fix was already deployed.

I like the idea of relaying all messages from [email protected] to Discord. Is there a bot that can do this? Not too familiar with Discord (I'm old-school), but I'd love to have a dedicated #announce channel which just relayed these messages.

Contributor

Why even have a separate (3) step? IMO step (5) could very well be re-worded as "Availability of new binaries will be communicated via standard channels per established norms of the project -- currently via the announce@ email list and Discord"

In particular, is there some specific advantage an "unofficial" announcement has? If Github releases are a source of truth, availability of a new release is automatically a "formal" announcement IMO.

Contributor Author

(Github didn't render your comment in the appropriate place for some reason. Meant to post this reply here).

The point is to let node operators know when the source code for the fix is available for scrutiny and testing (which is mentioned at the end of (3)).

To encourage users who discover such sensitive blockchain bugs to report them
while keeping them secret, the Stacks Foundation will a bug-bounty program that
will be set up once this SIP activates.

typo: ..Foundation will start*? bug-bounty program.

Would love to explore this more, may be relevant to COC?
The incentive weights of malicious actors is also interesting to play with in bug-bounties mentioned presumably?

Ty kindly for all this; this kind of SIP goes a long way toward opening up maintenance know-how in the space. 💯 agree with @whoabuddy's sentiments there.

Contributor Author

@jcnelson jcnelson Feb 12, 2021

I really appreciate your feedback @HaroldDavis3. Ty kindly for taking the time to read through this SIP :)

> The incentive weights of malicious actors is also interesting to play with in bug-bounties mentioned presumably?

If at all possible, I think we'll want to make it more profitable for malicious actors who find bugs that trigger catastrophic failures to just tell us about the bug privately instead of exploiting it. We'd want to do some research and figure out what the ROI would be for exploiting various kinds of serious bugs, and see if we can offer some kind of ROI-matching bug bounty (it's a tricky calculation -- just because the attacker can steal funds on-chain, for example, doesn't necessarily mean that they can cash them out without consequences).

@jcnelson
Contributor Author

Would y'all like to talk about this some more at the governance call this week?

Contributor

@diwakergupta diwakergupta left a comment

Thanks @jcnelson -- this is a great starting point, left a bunch of minor comments!

sips/sip-011/sip-011-catastrophic-failure-recovery.md (outdated)
with `fix/` to the Stacks Blockchain reference implementation, hosted at
https://github.com/blockstack/stacks-blockchain.
2. The branch will be submitted as a pull-request to the `master` branch, and
will be reviewed and approved by at least two blockchain engineers, representing
Contributor

[nit] s/"blockchain engineers"/"developers with write access to the stacks-blockchain repo"

Alternatively: "two members of the Stacks Core Developers group"?

Contributor Author

> In particular, is there some specific advantage an "unofficial" announcement has? If Github releases are a source of truth, availability of a new release is automatically a "formal" announcement IMO.

The point is to let node operators know when the source code for the fix is available for scrutiny and testing (which is mentioned at the end of (3)).

Contributor Author

@jcnelson jcnelson Feb 17, 2021

Ah, I see Github corrupted the comments and moved the one that I quoted above into a different thread.

I don't think the set of people who have write access to the stacks blockchain repo accurately reflects the set of people who will be called upon to fix catastrophic bugs. I addressed this in 3832ae2 by defining the Stacks Core Developers as a list of developer names and contacts in a supplementary file, which can be added to or removed from by the Steering Committee. To be clear, this is just the list of folks who will be called upon to execute these procedures; this isn't meant to be an exclusionary list of who's-who in the blockchain or some other social club or clique. It's more like a "who is it okay to email at 3am on a Saturday if the blockchain halts and catches fire" list.

2. The branch will be submitted as a pull-request to the `master` branch, and
will be reviewed and approved by at least two blockchain engineers, representing
both the Stacks Foundation and at least one other major Stacks ecosystem entity.
3. If warranted, an unofficial announcement will be made to various public
Contributor

Why even have a separate (3) step? IMO step (5) could very well be re-worded as "Availability of new binaries will be communicated via standard channels per established norms of the project -- currently via the announce@ email list and Discord"

In particular, is there some specific advantage an "unofficial" announcement has? If Github releases are a source of truth, availability of a new release is automatically a "formal" announcement IMO.

@stacksgov stacksgov deleted a comment from diwakergupta Feb 16, 2021
@jcnelson
Contributor Author

@kantai I've added a set of case studies to demonstrate how each of these catastrophic error recovery procedures can be used.

phases vote to accept the soft fork rules within the activation window, then the new
rules will take effect starting in the next whole reward cycle (i.e. right after
the second prepare phase finishes). All new releases of the Stacks node will
adhere to the soft fork rules, since they are now part of the block validation rules.
Contributor

In this soft-fork process, would the node released in step 1 immediately begin enforcing the soft-fork rules (i.e., ignoring blocks that do not follow the soft fork)? Then is step 4 just a consolidation of this rule (i.e., the declaration that all future releases will also follow the soft fork rule)?

Contributor Author

@jcnelson jcnelson Mar 22, 2021

Not necessarily. This SIP only requires that miners who vote for the soft-fork do so through this signaling procedure, and that the new rules must take effect no later than when this activation threshold is met. The SIP intentionally does not specify anything about how early miners can start deciding when to ignore blocks they would no longer consider valid, since miners have this power already (with or without soft forks). In fact, per point 2, the provision that "stricter criteria are permitted" is meant to allow a particular soft-fork upgrade to impose additional requirements for miners to begin activating the new rules. For example, in the third case study below, a particular soft fork to repair a bug in smart contract processing could require miners to begin orphaning blocks in which the bug manifests ASAP.
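
To make the "no later than" timing concrete, here is a minimal sketch of the activation rule as it reads in the quoted text: once enough prepare phases inside the activation window vote for the soft fork, the rules take effect in the next whole reward cycle. The 80% threshold, the requirement that the two qualifying prepare phases be consecutive, and every name below are assumptions for illustration only; the excerpt quoted in this thread does not state them.

```python
from typing import Optional, Sequence


def activation_cycle(
    vote_fractions: Sequence[float],
    first_cycle: int,
    threshold: float = 0.8,    # assumed value, not taken from the SIP
    required_phases: int = 2,  # "second prepare phase" in the quoted text
) -> Optional[int]:
    """Return the reward cycle in which the soft-fork rules take effect.

    vote_fractions[i] is the fraction of miner votes in favor during the
    prepare phase at the end of reward cycle first_cycle + i. Only prepare
    phases inside the activation window should be passed in.
    Returns None if the threshold is never met within the window.
    """
    consecutive = 0
    for offset, fraction in enumerate(vote_fractions):
        # Count qualifying prepare phases; a failed vote resets the run
        # (the consecutiveness requirement is an assumption).
        consecutive = consecutive + 1 if fraction >= threshold else 0
        if consecutive >= required_phases:
            # Rules start in the next whole reward cycle, i.e. right after
            # the qualifying prepare phase finishes.
            return first_cycle + offset + 1
    return None
```

Under these assumptions, `activation_cycle([0.5, 0.9, 0.85], first_cycle=10)` yields cycle 13. Nothing in this check stops miners from enforcing the new rules earlier, which matches the point above that the SIP only sets an upper bound on when the rules must take effect.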

Comment on lines +524 to +530
In addition, the Stacks Core Developers would coordinate with the Foundation to
release a version of the node software that included a fix for the bug, as well
as code to re-activate smart contracts under the condition that
the vast majority of miners have _rejected_ all the forks in
which this bug's exploits have occurred. In other words, smart contracts would
only re-activate if a fork that does _not_ descend from any block in which an
exploit occurs becomes the dominant fork.
Contributor

Could this itself be a hard-fork? Repairing these kinds of bugs could easily create a hard-fork, but is this class of solution only considering soft-fork-able fixes?

Contributor Author

So, admittedly this example is a bit contrived because it assumes that there's a bug severe enough that anyone can take anyone else's STX, but at the same time, we can somehow identify each occurrence of the exploit and each STX's true owner. This maybe isn't the best example.

The point of this section is to explain how to use a chain fork to "undo" any active exploits, assuming they could be identified as such. The real-world example I had in mind of this was Bitcoin's integer overflow bug whereby roughly 184 billion BTC (about 2**64 satoshis) got minted. This was handled by a soft fork then, and we could do something similar if there was a STX minting bug or some other kind of bug that resulted in the number of liquid STX increasing unexpectedly.
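
The re-activation condition in the quoted SIP text (smart contracts come back only once the dominant fork does not descend from any block in which an exploit occurs) can be sketched as an ancestry walk. This is a minimal sketch, not the node's implementation: the block identifiers, the parent map, and both function names are hypothetical, and it assumes the exploit-bearing blocks have already been identified.

```python
from typing import Dict, Optional, Set


def descends_from_exploit(
    tip: str,
    parents: Dict[str, Optional[str]],
    exploit_blocks: Set[str],
) -> bool:
    """Return True if `tip` or any of its ancestors is an exploit block.

    `parents` maps each block ID to its parent ID (None for genesis).
    """
    block: Optional[str] = tip
    while block is not None:
        if block in exploit_blocks:
            return True
        # Walk one step toward genesis; unknown blocks end the walk.
        block = parents.get(block)
    return False


def contracts_may_reactivate(
    dominant_tip: str,
    parents: Dict[str, Optional[str]],
    exploit_blocks: Set[str],
) -> bool:
    """Smart contracts re-activate only on an exploit-free dominant fork."""
    return not descends_from_exploit(dominant_tip, parents, exploit_blocks)
```

For a toy chain where block `a` has two children, `b` on a clean fork and `x` on a fork containing the exploit, the check passes only when the dominant tip is on the `b` side. Deciding which tip is "dominant" is outside this sketch.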

@whoabuddy
Member

Note: the first goal for the Governance CAB will be to review and comment on SIP-011 by the next governance meeting on 2021/09/16, after which we can discuss any comments and help move this toward being ratified!

@jcnelson jcnelson changed the title from "SIP 011" to "SIP - Catastrophic blockchain failures and recovery" May 16, 2022
@jcnelson jcnelson added the "Draft" label (SIP is in draft status) Jun 9, 2022
@AcrossfireX

Further advancement of this SIP is pending more real-world stress testing of it in DR scenarios. Since it has been used successfully in recent Stacks updates, that should be noted, and moving it to Accepted should be considered.

Labels
Draft SIP is in draft status
6 participants