Skip to content

Conversation

@alpe
Copy link
Contributor

@alpe alpe commented Oct 15, 2025

No description provided.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 15, 2025

The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedOct 17, 2025, 6:26 PM

@claude
Copy link
Contributor

claude bot commented Oct 15, 2025

Claude finished @alpe's task —— View job


Comprehensive Code Review ✓

I've completed a thorough review of the PR "fix: Validate block headers against state". Here are my findings organized by focus area:


Summary

This PR adds state-based validation of block headers by introducing a LastHeaderHash field to the blockchain state and implementing AssertValidForNextState() method. This strengthens the integrity checks during block synchronization and production.


Critical Issues

1. Inconsistent Error Handling in trySyncNextBlock (block/internal/syncing/syncer.go:433-439)

Issue: The error handling after validateBlock is inconsistent with reviewer expectations. The reviewer (julienrbrt) noted that all errors from this path should return errInvalidBlock, but the current implementation has complex conditional logic.

Current code:

if err := s.validateBlock(currentState, data, header); err != nil {
    s.cache.RemoveHeaderDAIncluded(header.Hash().String())
    if !errors.Is(err, errInvalidState) && !errors.Is(err, errInvalidBlock) {
        return errors.Join(errInvalidBlock, err)
    }
}

Problems:

  • If err is errInvalidState or errInvalidBlock, the function continues executing instead of returning
  • This creates a silent failure path where validation fails but block application proceeds
  • The logic doesn't match the comment on line 382: "If the error is not due to a validation error, re-store the event as pending"

Expected behavior (per review comment):
All validation errors should return errInvalidBlock. The function should not continue after validation failure.

Recommendation:

if err := s.validateBlock(currentState, data, header); err != nil {
    s.cache.RemoveHeaderDAIncluded(header.Hash().String())
    if errors.Is(err, errInvalidState) {
        return err  // Preserve critical state error
    }
    return errors.Join(errInvalidBlock, err)  // All other errors are invalid blocks
}

2. Redundant Height Check (block/internal/syncing/syncer.go:361-365)

Issue: The reviewer noted that the height check in processHeightEvent is redundant because it's already validated prior to calling trySyncNextBlock.

Current code (line 361-365):

// If this is not the next block in sequence, store as pending event
// This check is crucial as trySyncNextBlock simply attempts to sync the next block
if height != currentHeight+1 {
    s.cache.SetPendingEvent(height, event)
    ...
    return
}

This check occurs before calling trySyncNextBlock, which then validates the same condition inside AssertValidForNextState (types/state.go:70-72).

Recommendation: Consider documenting why this check exists at both levels (early rejection vs. validation), or remove the redundancy if the comment is incorrect.


Code Quality & Architecture

Positive Aspects

  1. Good separation of concerns: The new AssertValidForNextState method cleanly encapsulates state transition validation
  2. Comprehensive test coverage: types/state_test.go includes thorough test cases for all validation scenarios
  3. Proper use of atomic operations: State management uses atomic.Pointer correctly for concurrent access
  4. Protobuf schema evolution: Properly reserves field 7 and adds new field 9 for LastHeaderHash

Areas for Improvement

3. Protobuf Field Naming Inconsistency (proto/evnode/v1/state.proto:18)

Issue: Field naming doesn't follow protobuf conventions.

bytes LastHeaderHash = 9;  // PascalCase

All other fields use snake_case (chain_id, initial_height, last_block_height, last_block_time, da_height, app_hash). This field should be:

bytes last_header_hash = 9;

Impact: While functionally correct due to the go_package option, this creates inconsistency and may confuse developers.


4. Memory Efficiency - Unnecessary Cloning (block/internal/syncing/syncer.go:159-161)

Issue: GetLastState() creates defensive copies of byte slices, but this may be unnecessary overhead depending on usage patterns.

stateCopy := *state
stateCopy.AppHash = bytes.Clone(state.AppHash)
stateCopy.LastHeaderHash = bytes.Clone(state.LastHeaderHash)

Questions:

  • Are callers expected to mutate the returned state?
  • Could we return a read-only view or document that mutation is forbidden?
  • What's the frequency of this call in hot paths?

Recommendation: Either:

  1. Document why defensive copying is necessary
  2. Profile and optimize if this shows up in hot paths
  3. Consider a read-only accessor method that returns pointers with clear documentation

Error Handling & Resilience

5. Critical State Divergence Handling (block/internal/syncing/syncer.go:386-389)

Good: The code properly identifies state divergence as fatal:

case errors.Is(err, errInvalidState):
    s.logger.Fatal().Uint64("block_height", event.Header.Height()).
        Uint64("state_height", s.GetLastState().LastBlockHeight).Err(err).
        Msg("Invalid state detected - block references do not match local state. Manual intervention required.")

Concern: Using logger.Fatal() will exit the process immediately. Consider whether this should instead:

  1. Send to the error channel for graceful shutdown
  2. Allow the node to halt cleanly and provide recovery instructions
  3. Include more diagnostic information (actual vs expected hashes)

Recommendation:

case errors.Is(err, errInvalidState):
    s.logger.Error().Uint64("block_height", event.Header.Height()).
        Uint64("state_height", s.GetLastState().LastBlockHeight).
        Hex("got_last_hash", event.Header.LastHeaderHash).
        Hex("want_last_hash", s.GetLastState().LastHeaderHash).
        Err(err).Msg("Invalid state detected")
    s.sendCriticalError(err)  // Let error channel handle shutdown
    return

6. Validation Order - Execute Before Validate Risk (block/internal/syncing/syncer.go:441-444)

Issue: If validation fails and returns nil (due to the bug in #1), execution proceeds:

// Apply block
newState, err := s.applyBlock(header.Header, data, currentState)

This means invalid blocks could be executed if the validation error handling is incorrect.

Recommendation: Add explicit return statement or better, fix the validation error handling per #1.


Security

7. Good: State Continuity Validation

The new validation logic properly checks:

  • Chain ID matches
  • Height is sequential (n+1)
  • Time is monotonically increasing
  • LastHeaderHash chains correctly
  • AppHash matches previous state

This significantly improves Byzantine fault tolerance and prevents:

  • Block injection attacks
  • Chain reorganization attacks
  • State forking

8. Consideration: Genesis State Handling (types/state.go:66-68)

if len(s.LastHeaderHash) == 0 { // initial state
    return nil
}

Question: Is an empty LastHeaderHash sufficient to detect genesis state? Could this create a vulnerability if state is corrupted?

Recommendation: Consider checking s.LastBlockHeight == s.InitialHeight - 1 as an additional condition for genesis state detection.


Performance & Resource Efficiency

9. Retry Logic Without Exponential Backoff (block/internal/syncing/syncer.go:511-538)

The executeTxsWithRetry method uses fixed delays:

case <-time.After(common.MaxRetriesTimeout):
    continue

Issues:

  • No exponential backoff
  • Fixed retry count may not be optimal for all failure scenarios
  • Could hammer a failing execution client unnecessarily

Recommendation: Consider implementing exponential backoff or making the retry strategy configurable.


10. Efficient: Atomic State Access Pattern

Good: The use of atomic.Pointer[types.State] is well-implemented:

  • Minimizes lock contention
  • Safe for concurrent reads
  • Proper defensive copying in GetLastState()

Testing & Reliability

11. Excellent Test Coverage (types/state_test.go)

Strengths:

  • Tests all validation error paths
  • Uses table-driven tests effectively
  • Includes both positive and negative cases
  • Tests edge cases (initial state, time ordering, hash mismatches)

12. Missing Test Cases

Gaps:

  1. No test for concurrent access to state (data race testing)
  2. No test for the validation error path in trySyncNextBlock (the bug in ABCI interface #1)
  3. No integration test showing state recovery after validation failure
  4. No test for genesis state edge cases

Recommendation: Add tests for:

func TestSyncer_InvalidBlockHandling(t *testing.T) {
    // Test that invalid blocks are rejected and not rescheduled
}

func TestState_ConcurrentAccess(t *testing.T) {
    // Test with race detector enabled
}

Documentation & Developer Experience

13. Good Documentation

Strengths:

  • Clear function-level comments
  • Explains validation logic in AssertValidForNextState
  • Comments explain the error handling strategy

14. Missing Documentation

Gaps:

  1. No comment explaining why validation happens in multiple places
  2. No documentation on state recovery procedures after errInvalidState
  3. Protobuf field LastHeaderHash lacks a comment explaining its purpose
  4. No migration notes for nodes upgrading to this version

Recommendation: Add:

// last_header_hash is the hash of the last block's header, used to validate
// chain continuity during block validation. Empty for genesis state.
bytes last_header_hash = 9;

Additional Observations

15. .gitignore Addition (test/e2e/.gitignore)

Good: Adding /testnet/ to gitignore prevents test artifacts from being committed.

16. Test Data Uniqueness Fix (block/internal/submitting/submitter_test.go:306)

Good: Making test transaction data more unique by including chainID and height reduces test flakiness:

d.Txs = types.Txs{types.Tx(fmt.Sprintf("any-unique-tx-%s-%d-%d", chainID, height, now.UnixNano()))}

Summary of Recommendations

Must Fix (Blocking Issues)

  1. Fix validation error handling in trySyncNextBlock (lines 433-439)
  2. Add explicit return after validation failure or fix silent failure bug

Should Fix (High Priority)

  1. Rename protobuf field to follow conventions: last_header_hash
  2. Improve state divergence error handling - use error channel instead of Fatal
  3. Add test coverage for validation error paths

Consider (Medium Priority)

  1. Document or remove redundant height check
  2. Improve retry logic with exponential backoff
  3. Add concurrent access tests
  4. Document state recovery procedures

Nice to Have (Low Priority)

  1. Consider memory efficiency of defensive copying
  2. Add more diagnostic information to error messages
  3. Document validation happens at multiple layers

Conclusion

This PR introduces important state validation logic that significantly improves blockchain integrity. However, the critical error handling bug in trySyncNextBlock must be fixed before merge, as it could allow invalid blocks to be processed. Once the critical issue is addressed, the PR will be in good shape.

Overall Assessment: Approve with required changes to error handling.


return nil
}
// Validate header against state
if header.Height() != state.LastBlockHeight+1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good sanity check, but this is validated prior to calling trySyncNextBlock

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved this into the AssertValidForNextState method. In context of the State object, it makes more sense and is used by Executor as well.

alpe added 3 commits October 17, 2025 17:36
* main:
  feat(cache): cache pruning (#2761)
  refactor: replace sort.Slice with slices.Sort for natural ordering (#2768)
  chore!: cleanup header unused types (#2766)
  chore: remove unused flag (#2765)
  chore: fix some comments (#2762)
@alpe alpe force-pushed the alex/fix_block_validation branch from d3aa64e to 74cc608 Compare October 17, 2025 15:59
@alpe alpe marked this pull request as ready for review October 20, 2025 06:35
case errors.Is(err, errInvalidBlock):
// do not reschedule
case errors.Is(err, errInvalidState):
s.logger.Fatal().Uint64("block_height", event.Header.Height()).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fatal send an os.Exit(1), which will shortcut all cleanup logic we have

// errInvalidBlock is returned when a block is failing validation
errInvalidBlock = errors.New("invalid block")
// errInvalidState is returned when the state has diverged from the DA blocks
errInvalidState = errors.New("invalid state")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error isn't used anymore

switch {
case errors.Is(err, errInvalidBlock):
// do not reschedule
case errors.Is(err, errInvalidState):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

if err := s.validateBlock(currentState, data, header); err != nil {
// remove header as da included (not per se needed, but keep cache clean)
s.cache.RemoveHeaderDAIncluded(header.Hash().String())
return errors.Join(errInvalidBlock, fmt.Errorf("failed to validate block: %w", err))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just keep this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants