Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EIP: State conversion to Verkle Tree #8752

Merged
merged 4 commits into from
Jul 31, 2024

Conversation

jsign
Copy link
Contributor

@jsign jsign commented Jul 23, 2024

This PR creates a new EIP for the state conversion to Verkle Tree.

@eth-bot
Copy link
Collaborator

eth-bot commented Jul 23, 2024

✅ All reviewers have approved.

@github-actions github-actions bot added the w-ci Waiting on CI to pass label Jul 23, 2024
@eth-bot eth-bot added e-consensus Waiting on editor consensus e-review Waiting on editor to review labels Jul 23, 2024
@jsign jsign force-pushed the jsign-conversion-eip branch 2 times, most recently from 7c88462 to c51570d Compare July 23, 2024 16:33
Signed-off-by: Ignacio Hagopian <[email protected]>
Signed-off-by: Ignacio Hagopian <[email protected]>
Copy link

The commit 64f729c (as a parent of 45375d3) contains errors.
Please inspect the Run Summary for details.

EIPS/eip-x.md Outdated Show resolved Hide resolved
EIPS/eip-x.md Outdated Show resolved Hide resolved
Signed-off-by: Ignacio Hagopian <[email protected]>
@github-actions github-actions bot removed the w-ci Waiting on CI to pass label Jul 24, 2024
EIPS/eip-7748.md Outdated

## Motivation

The accounts state is big enough to make inviable waiting for transactions to organically move all of them to the VKT thought the Overlay Tree. Thus, we need a strategy to convert all the state within a reasonable time. The state conversion completion allows removing the Overlay Tree abstraction introduced in [EIP-7612](./eip-7612.md) and use directly the VKT for all state access.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The accounts state is big enough to make inviable waiting for transactions to organically move all of them to the VKT thought the Overlay Tree. Thus, we need a strategy to convert all the state within a reasonable time. The state conversion completion allows removing the Overlay Tree abstraction introduced in [EIP-7612](./eip-7612.md) and use directly the VKT for all state access.
The accounts state is too large to wait for transactions to organically move all of them to the VKT through the Overlay Tree. Thus, we need a strategy to convert all the state within a reasonable time. The state conversion completion allows removing the Overlay Tree abstraction introduced in [EIP-7612](./eip-7612.md) and use directly the VKT for all state access.

Alternatively: "Given the size of the full accounts state, it is inviable to wait for the natural modification of state via user transactions to trickle all accounts from the MPT to the VKT through the Overlay Tree."

It seems to me a second motivation would be, the fact that some zombie parts of state will cause us to have to have the two tree structure forever.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've incorporated the suggestions.

It seems to me a second motivation would be, the fact that some zombie parts of state will cause us to have to have the two tree structure forever.

Indeed, that's what I intended to highlight by saying that waiting for "organic" activity would be problematic.


for chunk_num in range(len(chunked_code)):
state_set_codechunk(state, address, chunk_num, chunked_code[chunk_num])
n += 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so its possible for n > stride depending on code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this clarified in the Rationale/Account code chunking done in a single step subsection since might not be obvious why was decided this way.

else:
# Getting the code from the Overlay Tree is fine since promises returning
# the Account full code which would come from the MPT or a separate code database.
account = get_account(state, curr_account.address)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we always convert account and its code in one go (makes sense)

Performing the conversion step before the block txs execution has some benefits:

- If the state conversion step is done after txs execution, there's a possibility that txs execution writes overlap with converted key-values, having to care about them becoming stale in the same block. With the proposed ordering, they can only become stale by writes of previous blocks.
- It can reduce the complexity of optimizations, such as frontrunning the state conversion for the next block before it arrives.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very neat point

Copy link
Contributor

@g11tech g11tech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, there is a small nit which once addressed we can merge the PR

@github-actions github-actions bot added c-new Creates a brand new proposal s-draft This EIP is a Draft t-core labels Jul 31, 2024
Signed-off-by: Ignacio Hagopian <[email protected]>
Copy link
Contributor

@g11tech g11tech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@eth-bot eth-bot enabled auto-merge (squash) July 31, 2024 16:21
Copy link
Collaborator

@eth-bot eth-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All Reviewers Have Approved; Performing Automatic Merge...

@eth-bot eth-bot merged commit 53e3e29 into ethereum:master Jul 31, 2024
11 checks passed

- A contract storage slot.
- A contract code. (i.e. all the code is a single _conversion unit_)
- An account data. (e.g. balance, nonce, code-hash)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, would "balance" be a conversion unit, or would "balance+nonce+codehash" be a conversion unit? And how would that play out with the updated verkle leaf design where all version, balance, nonce and codesize are all package within a single slot (BASIC_DATA)? It seems to me like it would make sense to consider a "unit" whatever fits in a single "verkle slot" in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A conversion unit is "An account data" which includes all the fields (balance, nonce, code-hash). The rationale for this is that on the MPT side all these values are packed in a single tree key.

The complete space of options also have other variants. As you mention can be thought as "what is a unit" on the VKT side. Or maybe think of each field as a unit. And maybe others.

The option of "what is a unit on the MPT side" is the one chosen here. I see those other options as making the logic more complex, since now it means there're more places where you can partially migrate an account. And I don't see the benefit.

Copy link
Member

@gabrocheleau gabrocheleau Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that makes sense. Definitely, I think we should go for whatever has the smallest complexity burden, and partial account migration is a concern that I had in mind. If we imagine an edge case where an account gets partially migrated at block n:

  • balance and nonce get migrated, and then we stop the account conversion short of converting code_size, having reached "n = CONVERSION_TRIDE`
  • in block n+1, we need to resume the conversion of the account

Then we have a situation where the "BASIC_DATA" of the verkle tree has been partially updated. We therefore have to keep track of the fact that some BASIC_DATA fields have been converted (e.g. version, balance, nonce), while code_size has not. Clients will therefore have to make that discernment during the transition, and won't be able to rely on e.g. the simple fact that BASIC_DATA.version !== undefined. What do you think?

Copy link
Contributor Author

@jsign jsign Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly why I prefer the current AccountDataPhase. All the fields are moved in one go, and there's no partial migration for all those fields.

Currently, the ***Phase describes all the situations where you can have a partially migrated account. Today we only have StoragePhase and AccountDataPhase so to push as much as possible to reduce these "partially migrated" situations (but not abuse having a too big conversion unit).

So I believe we're both saying the same? I didn't quite get if your last question implies you're proposing something different than today.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah it sounds like we agree then! Good

Comment on lines +184 to +188
slot_num, slot_value, next_key = trie_get_next_at_key(trie, curr_account.phase.next_key)
# The Overlay Tree will write in the VKT. We use the new
# only_if_empty parameter to avoid writing stale values.
set_storage(state, curr_account.address, slot_num, slot_value, only_if_empty=True)
n += 1
Copy link
Member

@gabrocheleau gabrocheleau Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are clients able to disambiguate between a storage slot that has been (purposefully) set back to 0, and a storage slot that has never been touched? Trying to picture an edge case where confusion would arise with regards to stale values, let's say:

  • Conversion starts at time t
  • Slot s previously was assigned the value 31 in the MPT.
  • Sometime after the conversion has begun, slot s is reset to 0. The 0 value is therefore written to the verkle trie.
  • At a later block, conversion reaches storage slot s. How can a client detect that the 0 value is not stale (and therefore should not be replaced by the 31 value, which is outdated)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the EIP perspective, the only_if_empty is the way to signal that the write should only happen if it wasn't written by some tx.

From a real impl perspective, it up to the client to figure this out. One option is to do a Get in the Verkle Tree and see if is empty or has a value. Since VKT can distinguish between empty and zero, then that's an option. Accessing the tree can be slow so other faster options can be considered, such as some tricks using flat-dbs to write there if something was written after the conversion started, so you can assume exists in VKT and the MPT value is stale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c-new Creates a brand new proposal e-consensus Waiting on editor consensus e-review Waiting on editor to review s-draft This EIP is a Draft t-core
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants