[WIP] witness state caching #1818
base: psp-pos-2955
Conversation
	}
}

// Validate witness.
Not sure if I understood the reason behind it.
We validate the witness before execution. Now that we have the cache, I merge the cache into the witness, so my question was whether to do the witness validation before the merge or leave it until after the merge.
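A minimal sketch of the ordering under discussion, assuming validation happens first (validateWitnessPreState and prepareWitness are illustrative names; mergeSpanCacheIntoWitness is the merge step mentioned later in this thread):

```go
// Sketch only: validate the witness while it still reflects exactly what the
// peer sent, then merge the locally cached states for execution.
func (bc *BlockChain) prepareWitness(block *types.Block, witness *stateless.Witness) error {
	// Reject a malformed compact witness before any local data is mixed in.
	if err := bc.validateWitnessPreState(block, witness); err != nil {
		return err
	}
	// Only then merge the sliding-window cache so execution sees the full node set.
	bc.mergeSpanCacheIntoWitness(witness)
	return nil
}
```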
…to psp-pos-2947
eth/protocols/wit/protocol.go
Outdated
// Also used for reduced witness requests (distinguished by IsReduced flag).
type GetWitnessRequest struct {
	WitnessPages []WitnessPageRequest // Request by list of witness pages
	IsReduced    bool                 // True if requesting reduced witness (omits cached states)
Maybe Compact is a bit more descriptive?
Done.
core/blockchain.go
Outdated
oldActiveSize := len(bc.activeCacheMap)
bc.activeCacheMap = bc.nextCacheMap
bc.nextCacheMap = make(map[string]struct{})
bc.cacheWindowStart = blockNum
How do we ensure different nodes can all agree on the same cacheWindowStart? From the current state of the code, it seems like this value will be different depending on at which block the node starts syncing.
Thanks! Fixed here. Right now the window size is 20, but we can change it later if needed.
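For reference, a minimal runnable sketch of a block-number-aligned window start, assuming the fixed 20-block window mentioned above. Because the boundary is derived purely from the block number, every node computes the same cacheWindowStart regardless of where it began syncing:

```go
package main

import "fmt"

const cacheWindowSize = 20 // blocks per window (current value in this PR)

// windowStart aligns a block number to its window boundary, matching the
// (blockNum / windowSize) * windowSize expression used later in the diff.
func windowStart(blockNum uint64) uint64 {
	return (blockNum / cacheWindowSize) * cacheWindowSize
}

func main() {
	for _, b := range []uint64{19, 20, 21, 39, 40} {
		fmt.Printf("block %d -> window start %d\n", b, windowStart(b))
	}
}
```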
core/blockchain.go
Outdated
cacheWindowStart uint64 // Start block of current window
cacheWindowSize  uint64 // Size of each window (e.g., 20 blocks)
cacheOverlapSize uint64 // Overlap between windows (e.g., 10 blocks)
These values are tied to consensus. It might be a good idea to make them constants.
Updated, thanks!
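A short sketch of the suggested constants; the names are illustrative, and the values match the 20-block window and 10-block overlap discussed in this PR:

```go
const (
	// Consensus-critical: all nodes must agree on these for compact witnesses to line up.
	witnessCacheWindowSize  uint64 = 20 // blocks per cache window
	witnessCacheOverlapSize uint64 = 10 // blocks shared between consecutive windows
)
```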
core/blockchain.go
Outdated
// updateSlidingWindowCache updates the sliding window cache after processing a block.
// This is used by both full nodes and witness-receiving nodes to maintain synchronized caches.
func (bc *BlockChain) updateSlidingWindowCache(blockNum uint64, witness *stateless.Witness, statedb *state.StateDB) {
Would be good to have a timer metric here to understand the performance impact of constructing the caches.
Added here.
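A minimal sketch of such a timer using go-ethereum's metrics package (the metric name is illustrative):

```go
var slidingWindowCacheTimer = metrics.NewRegisteredTimer("chain/witnesscache/update", nil)

func (bc *BlockChain) updateSlidingWindowCache(blockNum uint64, witness *stateless.Witness, statedb *state.StateDB) {
	// Record how long cache maintenance takes for each processed block.
	defer func(start time.Time) {
		slidingWindowCacheTimer.Update(time.Since(start))
	}(time.Now())

	// ... existing cache maintenance ...
}
```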
eth/handler_wit.go
Outdated
// handleGetReducedWitness retrieves witnesses and filters them using the sliding window cache.
// This reduces bandwidth by omitting state nodes that the receiver should already have cached.
func (h *witHandler) handleGetReducedWitness(peer *wit.Peer, req *wit.GetWitnessPacket) (wit.WitnessPacketResponse, error) {
The result could be cached to reduce computation.
Does this look good? I'm unsure about the size of the cache though; is it OK for it to hold more blocks than the length of the window?
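One possible shape for such a cache, sketched with hashicorp/golang-lru (the variable and helper names are illustrative). On sizing: holding more blocks than the window length is fine for correctness, since stale entries simply stop being requested and get evicted; the capacity only bounds memory.

```go
import (
	lru "github.com/hashicorp/golang-lru/v2"

	"github.com/ethereum/go-ethereum/common"
)

// reducedWitnessCache maps block hash -> encoded reduced witness.
// Capacity is only a memory bound; it may exceed the cache window length.
var reducedWitnessCache, _ = lru.New[common.Hash, []byte](64)

func reducedWitnessFor(hash common.Hash, build func() ([]byte, error)) ([]byte, error) {
	if enc, ok := reducedWitnessCache.Get(hash); ok {
		return enc, nil
	}
	enc, err := build()
	if err != nil {
		return nil, err
	}
	reducedWitnessCache.Add(hash, enc)
	return enc, nil
}
```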
// Validate witness.
// During parallel import, defer pre-state validation to the end of the batch.
if !bc.parallelStatelessImportEnabled.Load() {
In parallelStatelessImport mode, the witness cache won't work. We need to make sure parallel block import and the witness cache are used in a mutually exclusive way.
Good point. Just to double-check:
- If the sender has parallel mode enabled, will it only send full witnesses (irrespective of the request)?
- Also, if the receiver has parallel mode enabled, will it only be able to sync with full witnesses?
Fixed here, thanks!
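A sketch of that mutual exclusion (helper names are illustrative; the request flag is shown as IsReduced to match the earlier snippet): the sender ignores the compact flag when parallel import is on, and the receiver only asks for compact witnesses when running sequentially.

```go
// Sender side: never serve a compact witness while parallel import is enabled.
func (h *witHandler) shouldServeCompact(req *wit.GetWitnessRequest) bool {
	return req.IsReduced && !h.chain.ParallelStatelessImportEnabled()
}

// Receiver side: only request compact witnesses in sequential mode.
func shouldRequestCompact(parallelEnabled bool, blockNum uint64) bool {
	return !parallelEnabled && canUseCompactWitness(blockNum)
}
```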
if _, exists := bc.nextCacheMap[stateNode]; !exists {
	bc.nextCacheMap[stateNode] = struct{}{}
	cachedToNext++
It seems like the witness here is a superset of all trie nodes from the active window's start block to the current block (because of mergeSpanCacheIntoWitness) on stateless nodes. If that's true, bc.nextCacheMap will grow to the same size as bc.activeCacheMap and lose its original purpose of trimming off unnecessary nodes. The correct implementation, I think, is to make sure the witness passed to updateSlidingWindowCache contains only the trie nodes used by the current block.
Nice catch! Fixed here.
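A sketch of the kind of fix described above (assuming stateless.Witness keeps its trie nodes in the State set; variable names are illustrative): snapshot the witness exactly as it arrived, merge the cache only for execution, and feed the snapshot rather than the merged witness into the sliding window.

```go
// Snapshot the trie nodes the peer actually sent for this block.
originalWitnessStates := make(map[string]struct{}, len(witness.State))
for node := range witness.State {
	originalWitnessStates[node] = struct{}{}
}

// Merge cached nodes so the block can execute, without polluting the snapshot.
bc.mergeSpanCacheIntoWitness(witness)

// After successful execution, only the per-block snapshot feeds nextCacheMap.
bc.updateSlidingWindowCache(blockNum, originalWitnessStates, statedb)
```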
…apSize as constant
… constructing the caches
…s state caching) mutually exclusive
…to psp-pos-2947
…pdating GetWitnessRequest
// Only in sequential mode - parallel import is incompatible with cache
// Pass originalWitnessStates (not full witness) to avoid re-caching merged states
if !bc.parallelStatelessImportEnabled.Load() {
	bc.updateSlidingWindowCache(blockNum, originalWitnessStates, statedb)
If I understand correctly, originalWitnessStates will be full if the block is at a window start (even the overlapping ones), and only compact for other blocks? Otherwise, if the witness is compact, the stateless node might not be able to execute a later block in the window correctly.

Example

Trie node usage

Trie nodes used by blocks [1-20]:
{
    "0x1", # trie node used by block 1
    "0x2", # trie node used by block 2
}

Trie nodes used by blocks [11-30]:
{
    "0x1", # trie node used by blocks 11 and 30
    "0x3", # trie node used by block 11
}

Witness

The compact witness for block 11 would be:
{
    "0x3", # new trie node used by block 11
}
because 0x1 is trimmed off due to its presence in the first window.

The full witness for block 11 would be:
{
    "0x1", # trie node used by blocks 11 and 30
    "0x3", # trie node used by block 11
}

If the second cache window [11-30] is seeded from the compact witness of block 11, it won't contain 0x1, and execution of block 30 will fail.

Alternative

I think an alternative solution is to force the stateless node to compute the full witness during block execution and use that as the original full witness. This would allow nodes to receive only compact witnesses for every block (except the first one) while still allowing the cache window to shift at the same time.
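A very rough sketch of that alternative (the recording hook and helper below are hypothetical, shown only to illustrate the data flow): record every trie node touched during stateless execution and use that recorded set, rather than the compact witness that arrived over the wire, to advance the cache window.

```go
// Hypothetical hook: collect every trie node touched while executing the block.
recorded := make(map[string]struct{})
statedb.SetTrieNodeHook(func(node string) { // not a real StateDB method
	recorded[node] = struct{}{}
})

if err := bc.executeStateless(block, witness); err != nil { // illustrative helper
	return err
}

// The recorded set is the block's effective full witness for cache purposes,
// even though only a compact witness was received.
bc.updateSlidingWindowCache(block.NumberU64(), recorded, statedb)
```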
expectedWindowStart := (blockNum / windowSize) * windowSize

// If block is at window start, need full witness (this is first block of window)
// The sender will have just cleared/slid their cache at this block
if blockNum == expectedWindowStart {
	return false
}

// Block is within window but not at start, can use compact witness
// The sender should have cached states from earlier blocks in this window
return true
The first block of the overlapping window is also a start block. If that's the case, the node would need to request a full witness every 10 blocks instead of every 20 (assuming a 20-block cache window and a 10-block overlap).
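A small sketch of the boundary check this comment implies, assuming the 20/10 window/overlap values: if the overlapping window's first block also resets a cache map, then every multiple of the overlap size is a boundary, so a full witness would be needed every 10 blocks rather than every 20.

```go
const (
	windowSize  uint64 = 20 // active cache window
	overlapSize uint64 = 10 // overlap between consecutive windows
)

// canUseCompactWitness reports whether a compact witness suffices for blockNum,
// treating both active-window starts and overlapped-window starts as boundaries.
// Since overlapSize divides windowSize, every boundary is a multiple of overlapSize.
func canUseCompactWitness(blockNum uint64) bool {
	return blockNum%overlapSize != 0
}
```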

