Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EFM] Add Epoch Fallback Phase #6116

Merged
merged 18 commits into from
Jun 25, 2024

Conversation

jordanschalm
Copy link
Member

@jordanschalm jordanschalm commented Jun 18, 2024

This PR adds a new epoch phase to the data model, representing a state where epoch fallback mode has been triggered and we are within the view range of the latest committed epoch.

This phase is subtly different than the existing EpochFallbackTriggered flag in the protocol state:

  • EpochFallbackTriggered may be true while in EpochPhaseCommitted (in this case the protocol state flag and the epoch phase are "unsynchronized")
  • If current phase is not EpochPhaseCommitted, then when EpochFallbackTriggered is true, we must be in EpochPhaseFallback

@jordanschalm jordanschalm changed the title wip: add sketch, todos WIP: Epoch Fallback Phase Jun 18, 2024
Base automatically changed from jord/6013-events-metrics to feature/efm-recovery June 20, 2024 14:57
@codecov-commenter
Copy link

codecov-commenter commented Jun 21, 2024

Codecov Report

Attention: Patch coverage is 32.22222% with 61 lines in your changes missing coverage. Please review.

Please upload report for BASE (feature/efm-recovery@45f80fe). Learn more about missing BASE report.

Files Patch % Lines
utils/unittest/fixtures.go 0.00% 34 Missing ⚠️
state/protocol/mock/snapshot.go 0.00% 17 Missing ⚠️
cmd/util/cmd/common/snapshot.go 80.00% 1 Missing and 1 partial ⚠️
model/flow/epoch.go 0.00% 2 Missing ⚠️
state/protocol/badger/mutator.go 75.00% 2 Missing ⚠️
utils/unittest/epoch_builder.go 0.00% 2 Missing ⚠️
state/protocol/inmem/epoch.go 0.00% 1 Missing ⚠️
state/protocol/invalid/snapshot.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@                   Coverage Diff                   @@
##             feature/efm-recovery    #6116   +/-   ##
=======================================================
  Coverage                        ?   41.67%           
=======================================================
  Files                           ?     1972           
  Lines                           ?   139198           
  Branches                        ?        0           
=======================================================
  Hits                            ?    58016           
  Misses                          ?    75130           
  Partials                        ?     6052           
Flag Coverage Δ
unittests 41.67% <32.22%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jordanschalm jordanschalm marked this pull request as ready for review June 21, 2024 18:51
@jordanschalm jordanschalm changed the title WIP: Epoch Fallback Phase [EFM] Add Epoch Fallback Phase Jun 21, 2024
Copy link
Member

@AlexHentschel AlexHentschel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Looks good.

state/protocol/badger/mutator.go Outdated Show resolved Hide resolved
state/protocol/badger/mutator.go Outdated Show resolved Hide resolved
state/protocol/badger/mutator.go Outdated Show resolved Hide resolved
state/protocol/badger/snapshot.go Outdated Show resolved Hide resolved
@AlexHentschel
Copy link
Member

Overall, I have one request to properly document the convention when we have the EpochPhaseFallback. My understanding is that we are in EpochPhaseFallback if and only if we would accept an Recovery event (the bocks where we enter EFM and exit EFM might be an exception from this rule, not sure). I think we should document this in detail in various places in the code base:

  • In particular, I think it is important to clearly document our convention when EpochPhaseFallback in the logic that controls the Epoch state during EFM
    state := parentState.EpochProtocolStateEntry.Copy()
    nextEpochCommitted := state.EpochPhase() == flow.EpochPhaseCommitted
    // we are entering fallback mode, this logic needs to be executed only once
    if !state.EpochFallbackTriggered {
    // the next epoch has not been committed, but possibly setup, make sure it is cleared
    if !nextEpochCommitted {
    state.NextEpoch = nil
    }
    state.EpochFallbackTriggered = true
    }
    together with the logic that interprets the Epoch State and deduces the EpochPhase:
    // EpochPhase returns the current epoch phase.
    // The receiver EpochProtocolStateEntry must be properly constructed.
    func (e *EpochProtocolStateEntry) EpochPhase() EpochPhase {
  • Furthermore, I would suggest to just copy this documentation to

For now, I would just drop the following documentation into all the different places mentioned above. I have tried to not only document the epoch phases but also when the notifications are emitted. Please check that it is correct. While it is a lot of words, I think it is super important to unambiguously specify our convention (both for people using the code as well as for engineers attempting to extend the code later to not break anything).

// EpochPhase returns the current epoch phase. During normal operations, each Epoch transitions through
// the phases:
//
//	║                                       Epoch N
//	║       ╭─────────────────────────────────┴─────────────────────────────────╮
//	║   finalize view            EpochSetup            EpochCommit
//	║     in epoch N            service event         service event
//	║        ⇣                       ⇣                     ⇣
//	║        ┌─────────────────┐     ┌───────────────┐     ┌───────────────────┐
//	║        │EpochPhaseStaking├─────►EpochPhaseSetup├─────►EpochPhaseCommitted├ ┄>
//	║        └─────────────────┘     └───────────────┘     └───────────────────┘
//	║        ⇣                       ⇣                     ⇣
//	║   EpochTransition     EpochSetupPhaseStarted    EpochCommittedPhaseStarted
//	║    Notification            Notification               Notification
//
// However, if the Protocol State encounters any unexpected epoch service events, or the subsequent epoch
// fails to be committed by the `EpochCommitSafetyThreshold`, then we enter Epoch Fallback Mode [EFM].
// Depending on whether the subsequent epoch has already been committed, the EFM progress differs slightly.
// In a nutshell, we always enter the _latest_ epoch already committed on the happy path (if there is any)
// and the follow the fallback protocol.
//
// Scenario A: the future Epoch N is already committed, when we enter EFM
//
//	║      Epoch N-1                            Epoch N
//	║   ···──┴─────────────────────────╮ ╭─────────────┴───────────────────────────────────────────────╮
//	║      invalid service                finalize view                   EpochRecover
//	║            event                    in epoch N                      service event
//	║              ⇣                      ⇣                    ┊          ⇣
//	║     ┌──────────────────────────┐    ┌────────────────────┊────┐     ┌───────────────────────────┐
//	║     │   EpochPhaseCommitted    ├────►    EpochPhaseFallback   ├─────►    EpochPhaseCommitted    ├ ┄>
//	║     └──────────────────────────┘    └────────────────────┊────┘     └───────────────────────────┘
//	║              ⇣                      ⇣                    ┊          ⇣
//	║   EpochFallbackModeTriggered     EpochTransition   EpochExtended*   EpochFallbackModeExited
//	║          Notification             Notification      Notification    + EpochCommittedPhaseStarted Notifications
//	║              ┆                                                      ┆
//	║              ╰┄┄┄┄┄┄┄┄┄┄ EpochFallbackTriggered is true ┄┄┄┄┄┄┄┄┄┄┄┄╯
//
// With 'EpochExtended*' we denote that there can be zero, one, or more Epoch Extension (depending on when
// we receive a valid EpochRecover service event.
//
// Scenario B: we are in Epoch N without any subsequent epoch being committed when entering EFM
//
//	║                         Epoch N
//	║ ···────────────────────────┴───────────────────────────────────────────────────────────────╮
//	║              invalid service event or                         EpochRecover
//	║         EpochCommitSafetyThreshold reached                    service event
//	║                           ⇣                      ┊            ⇣
//	║  ┌────────────────────┐   ┌──────────────────────┊──────┐     ┌───────────────────────────┐
//	║  │ EpochPhaseStaking  │   │     EpochPhaseFallback      │     │   EpochPhaseCommitted     │
//	║  │ or EpochPhaseSetup ├───►                      ┊      ├─────►                           ├ ┄>
//	║  └────────────────────┘   └──────────────────────┊──────┘     └───────────────────────────┘
//	║                           ⇣                      ┊            ⇣
//	║            EpochFallbackModeTriggered     EpochExtended*      EpochFallbackModeExited
//	║                     Notification           Notification       + EpochCommittedPhaseStarted Notifications
//	║                           ┆                                   ┆
//	║                           ╰┄┄ EpochFallbackTriggered true ┄┄┄┄╯
func (e *EpochProtocolStateEntry) EpochPhase() EpochPhase {

Copy link
Member

@durkmurder durkmurder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clean PR, nicely done!

I've added the extended documentation only to flow.EpochPhase and
linked to it in the other locations.
@jordanschalm
Copy link
Member Author

Furthermore, I would suggest to just copy this documentation to [...] the interface

// EpochPhase returns the epoch phase for the current epoch.
EpochPhase() flow.EpochPhase

I added this to EpochProtocolStateAdapter, but not the interface. Saying the concrete implementation must be properly constructed in an abstract interface definition didn't make much sense to me; correct construction of a specific struct type is the domain of the implementation, not the interface.

For now, I would just drop the following documentation into all the different places mentioned above.

Thank you for the excellent in-depth documentation. I have incorporated this into the flow.EpochPhase godoc, along with the existing phase change diagram. Elsewhere, I added links to the flow.EpochPhase type and corresponding godoc rather than copying the ~50 lines directly.

@jordanschalm jordanschalm merged commit 498d02e into feature/efm-recovery Jun 25, 2024
55 checks passed
@jordanschalm jordanschalm deleted the jord/6092-epoch-fallback-phase branch June 25, 2024 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants