Pull request #58: EOFv0 for packaging legacy code in Verkle Trees
# EOFv0 for packaging legacy code in Verkle Trees

A design draft that proposes using EOF for storing code in Verkle Trees,
as an alternative to the existing method of executing
31-byte code chunks accompanied by 1 byte of metadata.
## Goal
Provide the result of the jumpdest analysis of deployed code as an EOF section.
During code execution the jumpdest analysis is then already available,
and the question "is this jump target valid?" can be answered by a lookup
in the section. This allows using 32-byte Verkle Tree code chunks
(instead of 31 bytes of code + 1 byte of metadata).
## Specification Draft
1. Put the code in a single *code* EOF section.
2. Use the EOF container format proposed by [EIP-3540](https://eips.ethereum.org/EIPS/eip-3540) with
   version 0 and the following modifications to "Changes to execution semantics":
   1. `CODECOPY`/`CODESIZE`/`EXTCODECOPY`/`EXTCODESIZE`/`EXTCODEHASH` operate on the *code*
      section only.

      > **Review comment:** As discussed in our meeting, this is an important change.
      > This is a separate set of semantics from what EOFv1 proposes.

   2. `JUMP`/`JUMPI`/`PC` relate code positions to the *code* section only.
3. Perform the jumpdest analysis of the code at deploy time (during contract creation).

   > **Review comment:** This actually makes a big semantic change: we are locking in
   > the EVM version at the time of contract deployment (or Verkle transition).
   > Currently on mainnet we rely on the fact that the semantics of contracts can change.
   > This is both a negative (and maybe a positive?). Example: if a new opcode is
   > introduced, the jumpdest analysis result of a contract may change. This is not
   > the case after this proposal.
   >
   > **Reply:** Could we mitigate this by versioning the jumpdest analysis result and
   > updating it on first use after the new opcode was introduced? Similar to how we
   > do "packaging" (point 5), we do "re-packaging".
   >
   > **Reply:** Not sure what you mean. If the jumpdest analysis result is locked in at
   > the time of contract creation / Verkle transition, then the introduction of new
   > legacy opcodes will need special consideration for such contracts. Currently we
   > can do an on-chain analysis to see what the effect of a new opcode may be (does it
   > change the semantics of contracts?), but with the addition of the jumpdest table
   > this analysis is different.
   >
   > **Reply:** I mean that when you store the jumpdest analysis result during the
   > Verkle transition, you can prepend a version.
   >
   > **Reply:** That is unfeasible: it would require updating the entire chain at every
   > hardfork. This is one of the reasons any transition like Verkle or the flat tree
   > proposed earlier hits a roadblock.
   >
   > **Reply:** Interesting topic. Locking in the jumpdest analysis allows introducing
   > instructions with immediate data. But this only works for deployed code, not for
   > initcode (where the analysis runs before execution).

4. Store the result of the jumpdest analysis in the *jumpdest* EOF section as proposed
   by [EIP-3690](https://eips.ethereum.org/EIPS/eip-3690),
   but with the jumpdest encoding changed to a bitmap.
5. The packaging process is performed for every deployed code during the Verkle Tree
   migration and also for every contract creation afterwards
   (i.e. it becomes part of the consensus forever).
> **Review comment:** We discussed special handling for data contracts: one could
> check whether the first byte is a terminating instruction (STOP, REVERT, etc.) or
> an unassigned instruction. However, if a contract is marked as a data contract
> based on an unassigned instruction, the "problem" listed in
> https://github.com/ipsilon/eof/pull/58/files#r1484267443 comes up. The only "use
> case" I can see is someone deploying a contract with a soon-to-be-introduced
> instruction, which will not work. Think about those merge NFTs.
>
> **Reply:** So what happens to initcode? Do creation transactions and
> CREATE/CREATE2 accept the container? I assume you didn't intend that, as it's a
> big change, so they kind of accept only the stripped-down "code section". And the
> deployed container is also not returned from initcode as a container. This should
> be mentioned, I think. I would personally prefer not to package it into an EOF
> container and not to frame it as a "container" or anything related to EOF at all.
> Just prepend the bytecode with a jumptable in …
>
> **Reply:** It's a fix to the Verkle trie structure; it's not like we're inventing
> yet another EOF to make Verkle compatible with EOFv1.
## Backwards Compatibility
EOF-packaged code execution is fully compatible with legacy code execution.
This is achieved by prepending the legacy code with an EOF header and the section
containing the jumpdest metadata. The contents of the code section are identical
to the legacy code.
Moreover, the wrapping process is bidirectional: the wrapping can be created from
the legacy code, and the legacy code can be extracted from the wrapping, without
any information loss.
Implementations may consider keeping the legacy code in the database without
modifications and only constructing the EOF wrapping when loading the code from
the database.
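This round trip can be illustrated with a toy sketch. Note that the header framing
below (magic, version, two 2-byte size fields) is a placeholder invented for this
example; the draft does not fix the exact EOFv0 header encoding.

```python
# Toy round-trip sketch: wrap legacy code with a hypothetical EOFv0-style
# header plus a jumpdest section, and extract the unchanged code back.
# The header layout here is invented for illustration only.
MAGIC = b"\xef\x00"   # EOF magic from EIP-3540
VERSION = b"\x00"     # hypothetical "version 0"

def wrap(code: bytes, jumpdest_section: bytes) -> bytes:
    header = (MAGIC + VERSION
              + len(jumpdest_section).to_bytes(2, "big")
              + len(code).to_bytes(2, "big"))
    return header + jumpdest_section + code

def unwrap(container: bytes) -> bytes:
    # lossless extraction of the legacy code from the wrapping
    assert container[:3] == MAGIC + VERSION
    jd_len = int.from_bytes(container[3:5], "big")
    code_len = int.from_bytes(container[5:7], "big")
    return container[7 + jd_len : 7 + jd_len + code_len]
```

Because the code section is byte-identical to the legacy code, a database may
store only the legacy bytes and build the wrapping on load, as noted above.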
It can also be noted that the information in the *jumpdest* section is redundant
with the `JUMPDEST` instructions. However, we cannot remove these instructions
from the code, because doing so would break at least *dynamic* jumps
(whose jump targets we would not be able to adjust).
## Rationale
### Jumpdests encoding
Originally, EIP-3690 proposes delta encoding for the elements of the *jumpdest*
section. This is efficient for an average contract but behaves badly in the worst
case (every instruction in the code is a `JUMPDEST`).
The delta encoding also has another disadvantage for Verkle Tree code chunking:
the whole section must be loaded and preprocessed to check the validity of a
jump target.
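For contrast, a toy version of delta encoding shows why a validity check needs a
linear scan. The representation below (a plain list of deltas) is a simplification
invented for illustration; EIP-3690 specifies the actual byte-level format.

```python
# Toy delta encoding of jumpdest offsets (illustrative only; EIP-3690
# defines the real encoding). Each entry stores the distance from the
# previous jumpdest, so a lookup must scan from the start of the section.
def delta_encode(offsets: list[int]) -> list[int]:
    deltas, prev = [], 0
    for off in sorted(offsets):
        deltas.append(off - prev)
        prev = off
    return deltas

def is_jumpdest(deltas: list[int], pc: int) -> bool:
    # O(n) scan: the whole section is processed to answer one query
    pos = 0
    for d in deltas:
        pos += d
        if pos == pc:
            return True
        if pos > pc:
            return False
    return False
```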
We propose to use a bitmap to encode jumpdests.
Such an encoding needs no preprocessing and provides random access.
It has a constant 12.5% size overhead (one bit per code byte),
but avoids the two disadvantages mentioned above.
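A minimal sketch of the bitmap approach follows. The opcode values are the standard
EVM ones; the exact section layout (bit order, padding) is an assumption made for
illustration.

```python
# Sketch of the proposed bitmap: bit i is set iff code[i] is a JUMPDEST
# opcode outside PUSH immediate data, i.e. a valid jump target.
JUMPDEST = 0x5B

def build_jumpdest_bitmap(code: bytes) -> bytes:
    bitmap = bytearray((len(code) + 7) // 8)
    i = 0
    while i < len(code):
        op = code[i]
        if op == JUMPDEST:
            bitmap[i // 8] |= 0x80 >> (i % 8)
        i += 1
        if 0x60 <= op <= 0x7F:       # PUSH1..PUSH32: skip immediate bytes
            i += op - 0x5F
    return bytes(bitmap)

def is_valid_jump_target(bitmap: bytes, pc: int) -> bool:
    # random access, no preprocessing: one bit per code byte (12.5% overhead)
    return bool(bitmap[pc // 8] & (0x80 >> (pc % 8)))
```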
## Extensions
### Data section

> **Review comment:** I don't think this makes sense for the given example
> heuristic. What it does is search for unreachable code at the end without any
> valid jumpdests in it. We can achieve the same or better effect by trimming the
> *jumpdest* section to the last set bit.

Let's try to identify a segment at the end of the code where a contract stores
data. We require a heuristic that does not generate any false positives.
This arrangement ensures that the instructions inspecting the code
work without modifications on the contiguous *code*+*data* area.

Having a *data* section makes the *code* section, and therefore the *jumpdest*
section, smaller.
Example heuristic:

1. Decode the instructions.
2. Traverse the instructions in reverse order.
3. If during the traversal a terminating instruction (`STOP`, `INVALID`, etc.)
   or the beginning of the code is encountered,
   then the *data* section starts just after the current position.
   End here.
4. If during the traversal a `JUMPDEST` instruction is encountered,
   then there is no *data* section.
   End here.
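The steps above can be sketched as follows. The set of terminating opcodes is an
assumption for illustration; a real specification would need to pin it down.

```python
# Sketch of the data-section heuristic: decode forward, then walk the
# decoded instructions backwards looking for a terminator or a JUMPDEST.
TERMINATORS = {0x00, 0xF3, 0xFD, 0xFE, 0xFF}  # STOP, RETURN, REVERT, INVALID, SELFDESTRUCT
JUMPDEST = 0x5B

def decode(code: bytes):
    """Forward-decode into (offset, opcode) pairs, skipping PUSH immediates."""
    ops, i = [], 0
    while i < len(code):
        op = code[i]
        ops.append((i, op))
        i += 1
        if 0x60 <= op <= 0x7F:       # PUSH1..PUSH32
            i += op - 0x5F
    return ops

def data_section_start(code: bytes):
    """Offset where the *data* section starts, or None if there is none."""
    for off, op in reversed(decode(code)):
        if op == JUMPDEST:
            return None              # step 4: no data section
        if op in TERMINATORS:
            return off + 1           # step 3: data starts after the terminator
    return 0                         # step 3: reached the code beginning
```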
### Prove all jump targets are valid

If we can prove that all jump targets in the code are valid,
then there is no need for the *jumpdest* section.
In Solidity-generated code all `JUMPI` instructions are "static"
(preceded by a `PUSH` instruction).
Only some `JUMP` instructions are not "static"; these are used to implement
returns from functions.
The Erigon project had an analysis tool that was able to prove the validity of
all jumps for more than 90% of contracts.
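A sketch of such a proof for the purely static case follows. This only handles
jumps directly preceded by a `PUSH`; the Erigon tool mentioned above is more
sophisticated, so treat this as an illustration of the idea rather than its method.

```python
# Sketch: check whether every JUMP/JUMPI in `code` is "static", i.e.
# immediately preceded by a PUSH whose immediate is a valid JUMPDEST.
def all_jumps_provably_valid(code: bytes) -> bool:
    jumpdests = set()
    ops = []
    i = 0
    while i < len(code):
        op = code[i]
        if op == 0x5B:                              # JUMPDEST
            jumpdests.add(i)
        imm = op - 0x5F if 0x60 <= op <= 0x7F else 0
        ops.append((i, op, code[i + 1:i + 1 + imm]))
        i += 1 + imm
    for k, (off, op, _) in enumerate(ops):
        if op in (0x56, 0x57):                      # JUMP / JUMPI
            if k == 0:
                return False                        # no preceding instruction
            _, prev_op, prev_imm = ops[k - 1]
            if not (0x60 <= prev_op <= 0x7F):
                return False                        # dynamic jump: target unknown
            if int.from_bytes(prev_imm, "big") not in jumpdests:
                return False                        # static jump to invalid target
    return True
```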
### Super-dense metadata encoding (6-bit numbers)

Follow the original Verkle Tree idea of providing the metadata
"number of leading pushdata bytes in a chunk". However, instead of including
this metadata as a single byte in the chunk itself, place the value as a 6-bit
encoded number in the *metadata* EOF section. This provides the following benefits:

1. The code executes in full 32-byte chunks.
2. The *metadata* overhead is smaller (2.3% instead of 3.2%).
3. The *metadata* lookup is only needed for jumps
   (not needed when falling through to the next chunk).

> **Review comment:** Another variant of this could be storing …
>
> **Reply:** I'm not sure this is actually true … The main disadvantage is that
> you need to preprocess the section to use it. I did some quick calculations:
> for the maximum number of chunks (768), the size of the 5-bit encoded section
> is 480 bytes. Assuming 3 bytes per entry of the map encoding, that encoding
> brings savings only if at most 20% of chunks have a non-zero entry.
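The 6-bit packing and its random-access lookup can be sketched as below. The
section layout (big-endian bit packing) is an assumption for illustration; values
0..32 fit in 6 bits, giving 6 bits per 256-bit chunk, roughly the 2.3% overhead
stated above.

```python
# Sketch of a 6-bit "leading pushdata bytes" metadata section:
# one 6-bit value per 32-byte code chunk, packed big-endian.
def pack_metadata(values: list[int]) -> bytes:
    acc, bits = 0, 0
    out = bytearray()
    for v in values:
        assert 0 <= v <= 32          # fits in 6 bits
        acc = (acc << 6) | v
        bits += 6
        while bits >= 8:
            bits -= 8
            out.append((acc >> bits) & 0xFF)
    if bits:
        out.append((acc << (8 - bits)) & 0xFF)   # pad the final byte
    return bytes(out)

def metadata_for_chunk(section: bytes, chunk_index: int) -> int:
    """Random-access lookup of the 6-bit value for one chunk."""
    bit_off = chunk_index * 6
    byte_off, shift = divmod(bit_off, 8)
    # read two bytes to cover values straddling a byte boundary
    window = int.from_bytes(section[byte_off:byte_off + 2].ljust(2, b"\x00"), "big")
    return (window >> (16 - shift - 6)) & 0x3F
```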
> **Review comment:** I'd summarise the main goal as follows: by reusing basic
> EOF constructs, this allows a simpler Verkle implementation supporting both
> "EOFv0 legacy" code and EOFv1.
>
> **Reply:** Secondary objective: can this result in a better "code-to-data"
> ratio (by avoiding chunking)?
>
> **Reply:** I'm confused by the terms "not having chunking" and "avoiding
> chunking". The chunking will still be present. Do you mean the chunking scheme
> with a 31-byte code payload and an additional metadata byte?
>
> **Reply:** I agree that "simplifying the Verkle implementation" and "a better
> code-to-data ratio" (to be verified with data!) are the ultimate goals and
> benefits here, and they should be listed in the doc to keep it focused.