
Conversation

@pratikspatil024
Member

Description

Please provide a detailed description of what was done in this PR

Changes

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)
  • Changes only for a subset of nodes

Breaking changes

Please complete this section if any breaking changes have been made, otherwise delete it

Nodes audience

In case this PR includes changes that must be applied only to a subset of nodes, please specify how you handled it (e.g. by adding a flag with a default value...)

Checklist

  • I have added at least 2 reviewers or the whole pos-v1 team
  • I have added sufficient documentation in code
  • I will be resolving comments - if any - by pushing each fix in a separate commit and linking the commit hash in the comment reply
  • Created a task in Jira and informed the team for implementation in Erigon client (if applicable)
  • Includes RPC method changes, and the Notion documentation has been updated

Cross repository changes

  • This PR requires changes to heimdall
    • If so, link the PR here:
  • This PR requires changes to matic-cli
    • If so, link the PR here:

Testing

  • I have added unit tests
  • I have added tests to CI
  • I have tested this code manually on local environment
  • I have tested this code manually on remote devnet using express-cli
  • I have tested this code manually on amoy
  • I have created new e2e tests into express-cli

Manual tests

Please complete this section with the steps you performed if you ran manual tests for this functionality, otherwise delete it

Additional comments

Please post additional comments in this section if you have them, otherwise delete it


// Validate witness.
Member Author

I think we should move this before the merge (cache -> witness.state). Let me know if you think the same. @cffls @lucca30

Contributor

Not sure if I understood the reason behind it.

Member Author

We validate the witness before the execution. Now that we have the cache, I merge the cache into the witness, so my question is whether to do this witness validation before the merge or leave it after the merge.
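For illustration, a minimal sketch of the "validate first, then merge" ordering proposed here; validateWitnessPreState is a hypothetical helper, and mergeSpanCacheIntoWitness mirrors the merge step discussed later in this thread, so neither name reflects the PR's actual API:

// Sketch only: validate the received witness against the block's pre-state
// first, then merge the locally cached nodes into witness.State before execution.
if err := bc.validateWitnessPreState(block, witness); err != nil {
	return err
}
bc.mergeSpanCacheIntoWitness(witness) // cache -> witness.State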

@pratikspatil024 changed the base branch from develop to psp-pos-2955 on October 27, 2025 at 10:19
// Also used for reduced witness requests (distinguished by IsReduced flag).
type GetWitnessRequest struct {
	WitnessPages []WitnessPageRequest // Request by list of witness pages
	IsReduced    bool                 // True if requesting reduced witness (omits cached states)
Contributor

Maybe Compact is a bit more descriptive?

Member Author

Done.

oldActiveSize := len(bc.activeCacheMap)
bc.activeCacheMap = bc.nextCacheMap
bc.nextCacheMap = make(map[string]struct{})
bc.cacheWindowStart = blockNum
Contributor

How do we ensure different nodes can all agree on the same cacheWindowStart? From the current state of the code, it seems like this value will be different depending on at which block the node starts syncing.

Member Author

Thanks! Fixed here. Right now, the window size is 20, but we can change it later if needed.
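For reference, a minimal sketch of deriving the window start purely from the block number, so every node computes the same boundary regardless of where it began syncing; the function name is illustrative, and windowSize corresponds to the 20-block window mentioned above:

// windowStartFor returns the start block of the window containing blockNum.
// Because it depends only on blockNum and a fixed window size, all nodes agree on it.
func windowStartFor(blockNum, windowSize uint64) uint64 {
	return (blockNum / windowSize) * windowSize
}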

Comment on lines 368 to 370
cacheWindowStart uint64 // Start block of current window
cacheWindowSize uint64 // Size of each window (e.g., 20 blocks)
cacheOverlapSize uint64 // Overlap between windows (e.g., 10 blocks)
Contributor

These values are tied to the consensus. Might be a good idea to make them constant.

Member Author

Updated, thanks!
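A minimal sketch of pinning the consensus-tied parameters as constants; the values are just the examples from the field comments above:

const (
	cacheWindowSize  uint64 = 20 // size of each window, in blocks
	cacheOverlapSize uint64 = 10 // overlap between consecutive windows, in blocks
)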


// updateSlidingWindowCache updates the sliding window cache after processing a block.
// This is used by both full nodes and witness-receiving nodes to maintain synchronized caches.
func (bc *BlockChain) updateSlidingWindowCache(blockNum uint64, witness *stateless.Witness, statedb *state.StateDB) {
Contributor

Would be good to have a timer metric here to understand the performance impact of constructing the caches.

Member Author

Added here.
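For context, a sketch of the kind of timer that could wrap this function, assuming go-ethereum's metrics package and the standard library time package are imported; the metric name is illustrative and not necessarily the one used in the PR:

var slidingWindowCacheTimer = metrics.NewRegisteredTimer("chain/witness/slidingcache", nil)

func (bc *BlockChain) updateSlidingWindowCache(blockNum uint64, witness *stateless.Witness, statedb *state.StateDB) {
	// Record how long cache maintenance takes for each processed block.
	defer slidingWindowCacheTimer.UpdateSince(time.Now())
	// ... existing cache maintenance ...
}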


// handleGetReducedWitness retrieves witnesses and filters them using the sliding window cache.
// This reduces bandwidth by omitting state nodes that the receiver should already have cached.
func (h *witHandler) handleGetReducedWitness(peer *wit.Peer, req *wit.GetWitnessPacket) (wit.WitnessPacketResponse, error) {
Contributor

The result could be cached to reduce computation.

Member Author

Does this look good? I'm unsure about the size of the cache, though; it should be fine to cache more blocks than the length of the window, right?
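A rough sketch of one way to cache the computed responses, assuming go-ethereum's common/lru helper; the names, key choice, and capacity are illustrative. Sizing the cache larger than the window is harmless: entries for old blocks simply age out and only cost memory.

// Cache reduced-witness responses by block hash; roughly twice the 20-block window.
var reducedWitnessCache = lru.NewCache[common.Hash, wit.WitnessPacketResponse](40)

func (h *witHandler) reducedWitnessFor(blockHash common.Hash, build func() (wit.WitnessPacketResponse, error)) (wit.WitnessPacketResponse, error) {
	if cached, ok := reducedWitnessCache.Get(blockHash); ok {
		return cached, nil
	}
	resp, err := build()
	if err != nil {
		return resp, err
	}
	reducedWitnessCache.Add(blockHash, resp)
	return resp, nil
}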


// Validate witness.
// During parallel import, defer pre-state validation to the end of the batch.
if !bc.parallelStatelessImportEnabled.Load() {
Contributor

In parallelStatelessImport mode, the witness cache won't work. We need to make sure parallel block import and the witness cache are used in a mutually exclusive way.

Member Author

Good point.
Just to double-check:

  1. If the sender has parallel mode enabled, it will only send full witness (irrespective of the request)?
  2. Also, if the receiver has parallel mode enabled, it will only be able to sync with full witness?

Member Author

Fixed here, thanks!
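A tiny sketch of the mutual exclusion agreed on above, with an illustrative method name: a node running parallel stateless import neither builds the sliding-window cache nor serves or requests compact witnesses.

// compactWitnessSupported reports whether this node can participate in the
// compact-witness flow; parallel stateless import disables it on both sides.
func (bc *BlockChain) compactWitnessSupported() bool {
	return !bc.parallelStatelessImportEnabled.Load()
}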

Comment on lines +4284 to +4286
if _, exists := bc.nextCacheMap[stateNode]; !exists {
	bc.nextCacheMap[stateNode] = struct{}{}
	cachedToNext++
Contributor
@cffls Oct 29, 2025

It seems like the witness here is a superset of all trie nodes from the active window start block to the current block (because of mergeSpanCacheIntoWitness) in stateless nodes. If this is true, bc.nextCacheMap will have the same size as bc.activeCacheMap and hence lose its original purpose of trimming off unnecessary nodes. The correct implementation, I think, is to make sure the witness passed to updateSlidingWindowCache contains only the trie nodes that are used by the current block.

Member Author

Nice catch! Fixed here.

@sonarqubecloud

sonarqubecloud bot commented Nov 3, 2025

Quality Gate failed

Failed conditions
1 Security Hotspot

See analysis details on SonarQube Cloud

// Only in sequential mode - parallel import is incompatible with cache
// Pass originalWitnessStates (not full witness) to avoid re-caching merged states
if !bc.parallelStatelessImportEnabled.Load() {
	bc.updateSlidingWindowCache(blockNum, originalWitnessStates, statedb)
Contributor

If I understand correctly, originalWitnessStates will be full if the block is at a window start (including the overlapped ones), and only compact for other blocks? Otherwise, if the witness is compact, the stateless node might not be able to execute a later block in the window correctly.

Example

Trie node usage

Trie nodes used by blocks [1-20]:

{
    "0x1", # trie node used by block 1
    "0x2", # trie node used by block 2
}

Trie nodes used by blocks [11-30]:

{
    "0x1", # trie node used by block 11 and 30
    "0x3", # trie node used by block 11
}

Witness

The compact witness for block 11 would be:

{
    "0x3", # new trie node used by block 11
}

because 0x1 is trimmed off due to its presence in the first window.

The full witness for block 11 would be:

{
    "0x1", # trie node used by block 11 and 30
    "0x3", # trie node used by block 11
}

If 0x1 isn't included in the second cache window [11-30], which is seeded from the compact witness of block 11, execution of block 30 will fail.

Alternative

I think an alternative solution is to force the stateless node to compute the full witness during block execution, and use that as the original full witness. This would allow nodes to receive only a compact witness for any block (except the first one) while still allowing the cache window to shift at the same time.
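A rough sketch of this alternative, with hypothetical names: record every trie node the stateless execution actually reads, so the node can rebuild the block's full witness locally even though the peer only sent the new nodes, and feed that set into the cache window.

// accessRecorder collects the hashes of trie nodes resolved while executing a block.
type accessRecorder struct {
	nodes map[string]struct{}
}

// onNodeRead would be invoked by the trie resolver for every node it loads
// (hypothetical hook; the actual integration point depends on the PR's code).
func (r *accessRecorder) onNodeRead(node string) {
	if r.nodes == nil {
		r.nodes = make(map[string]struct{})
	}
	r.nodes[node] = struct{}{}
}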

Comment on lines +166 to +176
expectedWindowStart := (blockNum / windowSize) * windowSize

// If block is at window start, need full witness (this is first block of window)
// The sender will have just cleared/slid their cache at this block
if blockNum == expectedWindowStart {
	return false
}

// Block is within window but not at start, can use compact witness
// The sender should have cached states from earlier blocks in this window
return true
Contributor

The first block of the overlapped window is also a start block. If this is true, the node should request full witness every 10 blocks instead of 20 (assuming 20-block cache window and 10-block overlap).
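To illustrate the point, a sketch of the boundary check implied by this comment (names and values illustrative): with overlapping windows, a new window begins every overlap interval, so a full witness would be needed every 10 blocks rather than every 20.

// isWindowStart reports whether blockNum begins a cache window, counting the
// overlapped windows as well, i.e. every overlapSize blocks.
func isWindowStart(blockNum, overlapSize uint64) bool {
	return blockNum%overlapSize == 0
}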
