
Conversation

@pratikspatil024
Member

Description

Please provide a detailed description of what was done in this PR

Changes

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)
  • Changes only for a subset of nodes

Breaking changes

Please complete this section if any breaking changes have been made, otherwise delete it

Nodes audience

In case this PR includes changes that must be applied only to a subset of nodes, please specify how you handled it (e.g. by adding a flag with a default value...)

Checklist

  • I have added at least 2 reviewers or the whole pos-v1 team
  • I have added sufficient documentation in code
  • I will be resolving comments - if any - by pushing each fix in a separate commit and linking the commit hash in the comment reply
  • Created a task in Jira and informed the team for implementation in Erigon client (if applicable)
  • Includes RPC method changes, and the Notion documentation has been updated

Cross repository changes

  • This PR requires changes to heimdall
    • If so, link the PR here:
  • This PR requires changes to matic-cli
    • If so, link the PR here:

Testing

  • I have added unit tests
  • I have added tests to CI
  • I have tested this code manually on local environment
  • I have tested this code manually on remote devnet using express-cli
  • I have tested this code manually on amoy
  • I have created new e2e tests into express-cli

Manual tests

Please complete this section with the steps you performed if you ran manual tests for this functionality, otherwise delete it

Additional comments

Please post additional comments in this section if you have them, otherwise delete it


// Validate witness.
Member Author

I think we should move this before the merge (cache -> witness.state). Let me know if you think the same. @cffls @lucca30

Contributor

Not sure if I understood the reason behind it.

Member Author

We validate the witness before the execution. Now that we have the cache, I merge the cache into the witness, so my question is whether to do this witness validation before the merge or leave it after the merge.
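For illustration, a minimal sketch of the "validate first, then merge" ordering proposed here; validateWitnessPreState is a hypothetical helper, and mergeSpanCacheIntoWitness mirrors the merge step discussed later in this thread, so neither name reflects the PR's actual API:

// Sketch only: validate the received witness against the block's pre-state
// first, then merge the locally cached nodes into witness.State before execution.
if err := bc.validateWitnessPreState(block, witness); err != nil {
	return err
}
bc.mergeSpanCacheIntoWitness(witness) // cache -> witness.State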

@pratikspatil024 changed the base branch from develop to psp-pos-2955 on October 27, 2025 at 10:19
// Also used for reduced witness requests (distinguished by IsReduced flag).
type GetWitnessRequest struct {
	WitnessPages []WitnessPageRequest // Request by list of witness pages
	IsReduced    bool                 // True if requesting reduced witness (omits cached states)
Contributor

Maybe Compact is a bit more descriptive?

Member Author

Done.

oldActiveSize := len(bc.activeCacheMap)
bc.activeCacheMap = bc.nextCacheMap
bc.nextCacheMap = make(map[string]struct{})
bc.cacheWindowStart = blockNum
Contributor

How do we ensure different nodes can all agree on the same cacheWindowStart? From the current state of the code, it seems like this value will be different depending on at which block the node starts syncing.

Member Author

Thanks! Fixed here. Right now, the window size is 20, but we can change it later if needed.
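For reference, a minimal sketch of deriving the window start purely from the block number, so every node computes the same boundary regardless of where it began syncing; the function name is illustrative, and windowSize corresponds to the 20-block window mentioned above:

// windowStartFor returns the start block of the window containing blockNum.
// Because it depends only on blockNum and a fixed window size, all nodes agree on it.
func windowStartFor(blockNum, windowSize uint64) uint64 {
	return (blockNum / windowSize) * windowSize
}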

Comment on lines 368 to 370
cacheWindowStart uint64 // Start block of current window
cacheWindowSize uint64 // Size of each window (e.g., 20 blocks)
cacheOverlapSize uint64 // Overlap between windows (e.g., 10 blocks)
Contributor

These values are tied to the consensus. Might be a good idea to make them constant.

Member Author

Updated, thanks!
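A minimal sketch of pinning the consensus-tied parameters as constants; the values are just the examples from the field comments above:

const (
	cacheWindowSize  uint64 = 20 // size of each window, in blocks
	cacheOverlapSize uint64 = 10 // overlap between consecutive windows, in blocks
)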


// updateSlidingWindowCache updates the sliding window cache after processing a block.
// This is used by both full nodes and witness-receiving nodes to maintain synchronized caches.
func (bc *BlockChain) updateSlidingWindowCache(blockNum uint64, witness *stateless.Witness, statedb *state.StateDB) {
Contributor

Would be good to have a timer metric here to understand the performance impact of constructing the caches.

Member Author

Added here.
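For context, a sketch of the kind of timer that could wrap this function, assuming go-ethereum's metrics package and the standard library time package are imported; the metric name is illustrative and not necessarily the one used in the PR:

var slidingWindowCacheTimer = metrics.NewRegisteredTimer("chain/witness/slidingcache", nil)

func (bc *BlockChain) updateSlidingWindowCache(blockNum uint64, witness *stateless.Witness, statedb *state.StateDB) {
	// Record how long cache maintenance takes for each processed block.
	defer slidingWindowCacheTimer.UpdateSince(time.Now())
	// ... existing cache maintenance ...
}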


// handleGetReducedWitness retrieves witnesses and filters them using the sliding window cache.
// This reduces bandwidth by omitting state nodes that the receiver should already have cached.
func (h *witHandler) handleGetReducedWitness(peer *wit.Peer, req *wit.GetWitnessPacket) (wit.WitnessPacketResponse, error) {
Contributor

The result could be cached to reduce computation.

Member Author

Does this look good? I'm unsure about the size of the cache, though; it should be fine to cache more blocks than the length of the window, right?
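A rough sketch of one way to cache the computed responses, assuming go-ethereum's common/lru helper; the names, key choice, and capacity are illustrative. Sizing the cache larger than the window is harmless: entries for old blocks simply age out and only cost memory.

// Cache reduced-witness responses by block hash; roughly twice the 20-block window.
var reducedWitnessCache = lru.NewCache[common.Hash, wit.WitnessPacketResponse](40)

func (h *witHandler) reducedWitnessFor(blockHash common.Hash, build func() (wit.WitnessPacketResponse, error)) (wit.WitnessPacketResponse, error) {
	if cached, ok := reducedWitnessCache.Get(blockHash); ok {
		return cached, nil
	}
	resp, err := build()
	if err != nil {
		return resp, err
	}
	reducedWitnessCache.Add(blockHash, resp)
	return resp, nil
}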


// Validate witness.
// During parallel import, defer pre-state validation to the end of the batch.
if !bc.parallelStatelessImportEnabled.Load() {
Contributor

In parallelStatelessImport mode, the witness cache won't work. We need to make sure parallel block import and the witness cache are used in a mutually exclusive way.

Member Author

Good point.
Just to double-check:

  1. If the sender has parallel mode enabled, it will only send full witness (irrespective of the request)?
  2. Also, if the receiver has parallel mode enabled, it will only be able to sync with full witness?

Member Author

Fixed here, thanks!
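A tiny sketch of the mutual exclusion agreed on above, with an illustrative method name: a node running parallel stateless import neither builds the sliding-window cache nor serves or requests compact witnesses.

// compactWitnessSupported reports whether this node can participate in the
// compact-witness flow; parallel stateless import disables it on both sides.
func (bc *BlockChain) compactWitnessSupported() bool {
	return !bc.parallelStatelessImportEnabled.Load()
}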

Comment on lines +4284 to +4286
if _, exists := bc.nextCacheMap[stateNode]; !exists {
	bc.nextCacheMap[stateNode] = struct{}{}
	cachedToNext++
Contributor
@cffls Oct 29, 2025

It seems like the witness here is a superset of all trie nodes from the active window start block to the current block (because of mergeSpanCacheIntoWitness) in stateless nodes. If this is true, bc.nextCacheMap will have the same size as bc.activeCacheMap and hence lose its original purpose of trimming off unnecessary nodes. The correct implementation, I think, is to make sure the witness passed to updateSlidingWindowCache contains only the trie nodes that are used by the current block.

Member Author

Nice catch! Fixed here.

@sonarqubecloud

sonarqubecloud bot commented Nov 3, 2025

Quality Gate failed

Failed conditions
1 Security Hotspot

See analysis details on SonarQube Cloud

// Only in sequential mode - parallel import is incompatible with cache
// Pass originalWitnessStates (not full witness) to avoid re-caching merged states
if !bc.parallelStatelessImportEnabled.Load() {
	bc.updateSlidingWindowCache(blockNum, originalWitnessStates, statedb)
Contributor

If I understand correctly, originalWitnessStates will be full if the block is at a window start (including the overlapped ones), and only compact for other blocks? Otherwise, if the witness is compact, the stateless node might not be able to execute a later block in the window correctly.

Example

Trie node usage

Trie nodes used by blocks [1-20]:

{
    "0x1", # trie node used by block 1
    "0x2", # trie node used by block 2
}

Trie nodes used by blocks [11-30]:

{
    "0x1", # trie node used by block 11 and 30
    "0x3", # trie node used by block 11
}

Witness

The compact witness for block 11 would be:

{
    "0x3", # new trie node used by block 11
}

because 0x1 is trimmed off due to its presence in the first window.

The full witness for block 11 would be:

{
    "0x1", # trie node used by block 11 and 30
    "0x3", # trie node used by block 11
}

If 0x1 isn't included in the second cache window [11-30], which is seeded from the compact witness of block 11, execution of block 30 will fail.

Alternative

I think an alternative solution is to force the stateless node to compute the full witness during block execution, and use that as the original full witness. This would allow nodes to receive only a compact witness for any block (except the first one) while still allowing the cache window to shift at the same time.
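A rough sketch of this alternative, with hypothetical names: record every trie node the stateless execution actually reads, so the node can rebuild the block's full witness locally even though the peer only sent the new nodes, and feed that set into the cache window.

// accessRecorder collects the hashes of trie nodes resolved while executing a block.
type accessRecorder struct {
	nodes map[string]struct{}
}

// onNodeRead would be invoked by the trie resolver for every node it loads
// (hypothetical hook; the actual integration point depends on the PR's code).
func (r *accessRecorder) onNodeRead(node string) {
	if r.nodes == nil {
		r.nodes = make(map[string]struct{})
	}
	r.nodes[node] = struct{}{}
}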

Comment on lines +166 to +176
expectedWindowStart := (blockNum / windowSize) * windowSize

// If block is at window start, need full witness (this is first block of window)
// The sender will have just cleared/slid their cache at this block
if blockNum == expectedWindowStart {
	return false
}

// Block is within window but not at start, can use compact witness
// The sender should have cached states from earlier blocks in this window
return true
Contributor

The first block of the overlapped window is also a start block. If this is true, the node should request full witness every 10 blocks instead of 20 (assuming 20-block cache window and 10-block overlap).
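To illustrate the point, a sketch of the boundary check implied by this comment (names and values illustrative): with overlapping windows, a new window begins every overlap interval, so a full witness would be needed every 10 blocks rather than every 20.

// isWindowStart reports whether blockNum begins a cache window, counting the
// overlapped windows as well, i.e. every overlapSize blocks.
func isWindowStart(blockNum, overlapSize uint64) bool {
	return blockNum%overlapSize == 0
}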
