EOFv0 for packaging legacy code in Verkle Trees #58
Changes from all commits
1be5a30
9eef32a
cdb301a
380f1a7
68244b3
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,166 @@ | ||||||||
# EOFv0 for packaging legacy code in Verkle Trees | ||||||||
|
||||||||
This design draft proposes using EOF | ||||||||
for storing code in Verkle Trees. | ||||||||
It is an alternative to the existing method of executing | ||||||||
31-byte code chunks accompanied by 1 byte of metadata. | ||||||||
|
||||||||
## Goal | ||||||||
|
||||||||
Simplified legacy code execution in the Verkle Tree implementation. | ||||||||
|
||||||||
Better "code-to-data" ratio. | ||||||||
|
||||||||
Provide the result of the jumpdest analysis of a deployed code as the EOF section. | ||||||||
During code execution the jumpdest analysis is already available | ||||||||
and the answer to the question "is this jump target valid?" can be looked up | ||||||||
in the section. This allows using 32-byte Verkle Tree code chunks | ||||||||
(instead of 31 bytes of code + 1 byte of metadata). | ||||||||
|
||||||||
## Specification | ||||||||
|
||||||||
### Container | ||||||||
|
||||||||
1. Re-use the EOF container format defined by [EIP-3540](https://eips.ethereum.org/EIPS/eip-3540). | ||||||||
2. Set the EOF version to 0. I.e. the packaged legacy code will be referenced as EOFv0. | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Even if EOF is chosen as the format, with version |
||||||||
3. The EOFv0 consists of the header and two sections: | ||||||||
- *jumpdest* | ||||||||
- *code* | ||||||||
4. The header must contain information about the sizes of these sections. | ||||||||
For that the EIP-3540 header or a simplified one can be used. | ||||||||
5. The legacy code is placed in the *code* section without modifications. | ||||||||
6. The *jumpdest* section contains the set of all valid jump destinations matching the positions | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. has the alternative of storing invalid destinations been considered, that is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. By invalid destinations do you mean push data? There are some alternative encodings in the "Jumpdest section encoding" section. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Not exactly - I mean the invalid |
||||||||
of all `JUMPDEST` instructions in the *code*. | ||||||||
The exact encoding of this section is specified separately. | ||||||||
|
||||||||
### Changes to execution semantics | ||||||||
|
||||||||
1. Execution starts at the first byte of the *code* section, and `PC` is set to 0. | ||||||||
2. Execution stops if `PC` goes outside the code section bounds (in case of EOFv0 this is also the | ||||||||
end of the container). | ||||||||
3. `PC` returns the current position within the *code*. | ||||||||
4. The instructions which *read* code must refer to the *code* section only. | ||||||||
The modification keeps the behavior of these instructions unchanged. | ||||||||
These instructions are invalid in EOFv1. | ||||||||
The instructions are: | ||||||||
- `CODECOPY` (copies a part of the *code* section), | ||||||||
- `CODESIZE` (returns the size of the *code* section), | ||||||||
- `EXTCODECOPY`, | ||||||||
- `EXTCODESIZE`, | ||||||||
- `EXTCODEHASH`. | ||||||||
Comment on lines
+48
to
+50
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
These 3 are valid when called in legacy and targeting EOF, they have semantics equaling to treating EOF code as being "just two bytes There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe I should not mention "original EIP-3540" because it is no longer relevant. |
||||||||
5. To execute a `JUMP` or `JUMPI` instruction the jump target position must exist | ||||||||
in the *jumpdest* set. The *jumpdest* guarantees that the target instruction is `JUMPDEST`. | ||||||||
|
||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Applies only if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we should skip all details about interaction with the new EOF calls. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These points are to mostly to remember to be careful about "reusing" the |
||||||||
### Changes to contract creation semantics | ||||||||
|
||||||||
1. Initcode execution is performed without changes. I.e. initcode remains an ephemeral code | ||||||||
without EOF wrapping. However, because the EOF containers are not visible to any EVM program, | ||||||||
implementations may decide to wrap initcodes with EOFv0 and execute it the same way as | ||||||||
EOFv0 deployed codes. | ||||||||
2. The initcode size limit and cost remain defined by [EIP-3860](https://eips.ethereum.org/EIPS/eip-3860). | ||||||||
3. The initcode still returns a plain deploy code. | ||||||||
The plain code size limit and cost are defined by [EIP-170](https://eips.ethereum.org/EIPS/eip-170). | ||||||||
4. If the plain code is not empty, it must be wrapped with EOFv0 before being put in the state: | ||||||||
- perform jumpdest analysis of the plain code, | ||||||||
- encode the jumpdest analysis result as the *jumpdest* section, | ||||||||
- put the plain code in the *code* section, | ||||||||
- create EOFv0 container with the *jumpdest* and *code* sections. | ||||||||
5. The code deployment cost is calculated from the total EOFv0 size. | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Aren't we concerned that this de-facto cost increase will break existing CREATE/CREATE2 contracts? |
||||||||
This is a breaking change so the impact must be analysed. | ||||||||
6. During Verkle Tree migration perform the above EOFv0 wrapping of all deployed code. | ||||||||
|
||||||||
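The wrapping steps in item 4 can be sketched as follows. This is only an illustration under stated assumptions: the 7-byte header layout (EIP-3540 magic, version byte, two 2-byte big-endian section sizes) is a hypothetical "simplified header", the names `analyze_jumpdests`, `wrap_eofv0` and `unwrap_eofv0` are invented for this sketch, and the *jumpdest* encoder is passed in as a parameter because the encoding is specified separately.

```python
from typing import Callable

EOF_MAGIC = b"\xef\x00"  # magic from EIP-3540
EOF_VERSION = 0          # EOFv0
HEADER_SIZE = 7          # assumption: magic(2) + version(1) + 2 sizes(2 each)


def analyze_jumpdests(code: bytes) -> list[int]:
    """Legacy jumpdest analysis: positions of JUMPDEST (0x5b) outside push data."""
    dests = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0x5B:
            dests.append(pc)
        elif 0x60 <= op <= 0x7F:   # PUSH1..PUSH32
            pc += op - 0x5F        # skip the push data bytes
        pc += 1
    return dests


def wrap_eofv0(code: bytes, encode: Callable[[list[int], int], bytes]) -> bytes:
    """Wrap non-empty plain code into a (hypothetical) EOFv0 container."""
    if not code:
        return b""  # empty code is not wrapped
    jumpdest_section = encode(analyze_jumpdests(code), len(code))
    header = (
        EOF_MAGIC
        + bytes([EOF_VERSION])
        + len(jumpdest_section).to_bytes(2, "big")
        + len(code).to_bytes(2, "big")
    )
    return header + jumpdest_section + code


def unwrap_eofv0(container: bytes) -> bytes:
    """Recover the unmodified legacy code from the container."""
    if not container:
        return b""
    jumpdest_size = int.from_bytes(container[3:5], "big")
    code_size = int.from_bytes(container[5:7], "big")
    start = HEADER_SIZE + jumpdest_size
    return container[start:start + code_size]
```

Because the *code* section holds the legacy code unmodified, `unwrap_eofv0(wrap_eofv0(code, encode))` returns `code` for any encoder, which is the property that the Verkle Tree migration in item 6 relies on.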
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also worth to mention than verkle account's code_size and code_hash fields should refer to the code without wrapping. And how to calculate full contract size from that. |
||||||||
### Jumpdest section encoding | ||||||||
|
||||||||
#### Bitmap | ||||||||
|
||||||||
A valid `JUMPDEST` is represented as `1` in a byte-aligned bitset. | ||||||||
The trailing zero bytes must be trimmed. | ||||||||
Therefore, the size of the bitmap is at most `ceil(len(code) / 8)`, giving ~12.5% size overhead | ||||||||
(compared with the plain code size). | ||||||||
Such encoding doesn't require pre-processing and provides random access. | ||||||||
|
||||||||
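A minimal sketch of the bitmap encoding and its random-access lookup is given here. The bit order within a byte (least significant bit first) and the function names are assumptions, since the draft does not fix them.

```python
def encode_bitmap(jumpdests: list[int], code_len: int) -> bytes:
    """Byte-aligned bitset with one bit per code byte, trailing zero bytes trimmed."""
    bitmap = bytearray((code_len + 7) // 8)  # ceil(code_len / 8) bytes
    for pos in jumpdests:
        bitmap[pos // 8] |= 1 << (pos % 8)   # assumption: LSB-first bit order
    return bytes(bitmap).rstrip(b"\x00")


def bitmap_has_jumpdest(bitmap: bytes, target: int) -> bool:
    """Random-access check; positions past the trimmed end are not jumpdests."""
    byte_index = target // 8
    if byte_index >= len(bitmap):
        return False
    return bool((bitmap[byte_index] >> (target % 8)) & 1)
```

A lookup touches a single byte of the section, so only the Verkle Tree chunk holding that byte has to be loaded.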
Originally, EIP-3690 proposed delta encoding for the elements of the *jumpdest* section. | ||||||||
This should be efficient for an average contract but behaves badly in the worst case | ||||||||
(every instruction in the code is a `JUMPDEST`). | ||||||||
Delta encoding also has another disadvantage for Verkle Tree code chunking: | ||||||||
the whole (?) section must be loaded and preprocessed to check the validity of a single jump target. | ||||||||
|
||||||||
### Metadata encoding (8-bit numbers) | ||||||||
|
||||||||
Follow the original Verkle Tree idea to provide the single byte of metadata with the | ||||||||
"number of leading pushdata bytes in a chunk". | ||||||||
However, instead of including this in the chunk itself, | ||||||||
place the byte in order in the *jumpdest* section. | ||||||||
|
||||||||
This provides the following benefits over the original Verkle Tree design: | ||||||||
|
||||||||
1. The code executes by full 32-byte chunks. | ||||||||
2. The *metadata* size overhead is slightly smaller: 3.1% (`1/32`) instead of 3.2% (`1/31`). | ||||||||
3. The *metadata* lookup is only needed for executing jumps | ||||||||
(not needed when following through to the next chunk). | ||||||||
|
||||||||
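The per-chunk metadata and the jump check it enables can be sketched as below. The function names are hypothetical, and the sketch keeps the metadata as a separate byte string with one byte per 32-byte chunk, as it would appear in the *jumpdest* section.

```python
CHUNK = 32


def leading_pushdata(code: bytes) -> bytes:
    """One byte per 32-byte chunk: number of leading push data bytes in it."""
    is_data = bytearray(len(code))
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if 0x60 <= op <= 0x7F:               # PUSH1..PUSH32
            n = op - 0x5F
            for i in range(pc, min(pc + n, len(code))):
                is_data[i] = 1
            pc += n
    meta = bytearray()
    for start in range(0, len(code), CHUNK):
        count = 0
        while count < CHUNK and start + count < len(code) and is_data[start + count]:
            count += 1
        meta.append(count)
    return bytes(meta)


def is_valid_jump_target(code: bytes, meta: bytes, target: int) -> bool:
    """Check a jump target using only its own chunk and one metadata byte."""
    if target >= len(code) or code[target] != 0x5B:
        return False
    chunk_start = target - target % CHUNK
    pc = chunk_start + meta[target // CHUNK]  # first real instruction in the chunk
    while pc < target:
        op = code[pc]
        pc += 1
        if 0x60 <= op <= 0x7F:
            pc += op - 0x5F                   # skip push data
    return pc == target                       # overshoot means target is push data
```

The metadata byte is read only inside `is_valid_jump_target`; sequential execution that falls through into the next chunk never needs it, which is benefit 3 in the list above.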
### Super-dense metadata encoding (6-bit numbers) | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. New proposal discussed in person: Encode only "invalid pushdata" list. The goal is that we only encode invalid pushdatas (blocklist) as opposed to allowlist, and the hunch is that majority of the contracts have none so this header will be empty, and average contracts will only have a few entries. The list is basically a hashmap: Any dense encoding is sufficient if we want to reduce the header size the most. Analysis is still needed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Simple calculation with 4-bit encoding of
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We probably need a basic "RLE-style" encoding here with a skip mode (lets call this scheme 1), i.e.:
The size of chunk number deltas must / can be tuned based on actual data. Worst case with this becomes:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could also simplify with not having delta in the value encoding (let's call this scheme 2):
Here's a calculation we did, these are the "invalid pushdatas" from a 2147 byte long contract:
Encoding with scheme 1:
2 skips (2 * 11 bits), 5 values (5 * 11 bits) = 10-bytes header (0.465%) Encoding with scheme 2:
5 skips (5 * 11 bits), 6 values (6 * 7 bits) = 13-bytes header (0.605%) Another example of the uniswap router contract (17958 bytes):
Encoding using scheme 1:
22-bytes header (0.122%) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Documented in #98. |
||||||||
|
||||||||
The same as above, except that the values are encoded as 6-bit numbers | ||||||||
(the minimum number of bits needed for encoding `32`). | ||||||||
Such encoding lowers the size overhead from 3.1% (`8/256`) to 2.3% (`6/256`). | ||||||||
|
||||||||
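One possible way to pack the per-chunk values into 6 bits each is sketched below. The big-endian bit order, the zero padding of the last byte and the names `pack6` and `unpack6` are assumptions made for illustration; the draft does not specify a bit layout.

```python
def pack6(values: list[int]) -> bytes:
    """Pack values in 0..32 as consecutive 6-bit fields, zero-padded to a byte."""
    acc = 0
    for v in values:
        if not 0 <= v <= 32:
            raise ValueError("leading pushdata count must be in 0..32")
        acc = (acc << 6) | v
    nbits = 6 * len(values)
    pad = -nbits % 8                      # bits needed to reach a byte boundary
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def unpack6(data: bytes, count: int) -> list[int]:
    """Inverse of pack6; `count` is the number of code chunks."""
    acc = int.from_bytes(data, "big")
    acc >>= len(data) * 8 - 6 * count     # drop the padding bits
    return [(acc >> (6 * (count - 1 - i))) & 0x3F for i in range(count)]
```

Four chunks (128 bytes of code) then need 3 bytes of metadata instead of 4, which matches the `6/256` ratio.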
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We discussed a special handling for data contracts: one could check if the first byte is a terminating instruction (STOP, REVERT, etc.) or an unassigned instruction. However, marking it as a data contract based on an unassigned instruction, then the "problem" listed in https://github.com/ipsilon/eof/pull/58/files#r1484267443 comes up. The only "use case" here I can see is that someone deploys a contract with a soon-to-be-introduced instruction, and that will not work. Think about those merge NFTs. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So what happens to initcode? Do creation transaction and CREATE/CREATE2 accept the container? I assume you didn't intend it, as it's a big change, so they kinda accept only stripped-down "code section". And deployed container is also not returned from initcode as container. This should be mentioned I think. I think I personally would prefer not to package it into EOF container and not to frame it as "container" and anything related to EOF at all. Just prepend bytecode with jumptable in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's a fix to Verkle trie structure, it's not like we're inventing yet another EOF to make Verkle compatible with EOFv1. |
||||||||
## Backwards Compatibility | ||||||||
|
||||||||
EOF-packaged code execution is fully compatible with the legacy code execution. | ||||||||
This is achieved by prepending the legacy code with the EOF header and the section containing | ||||||||
jumpdest metadata. The contents of the code section are identical to the legacy code. | ||||||||
|
||||||||
Moreover, the wrapping process is bidirectional: the wrapping can be created from the legacy code | ||||||||
and the legacy code can be extracted from the wrapping without any information loss. | ||||||||
Implementations may consider keeping the legacy code in the database without modifications | ||||||||
and only construct the EOF wrapping when loading the code from the database. | ||||||||
|
||||||||
Note also that the information in the *jumpdest* section is redundant with the `JUMPDEST` | ||||||||
instructions. However, we **cannot** remove these instructions from the code because | ||||||||
this potentially breaks: | ||||||||
|
||||||||
- *dynamic* jumps (where we will not be able to adjust their jump targets), | ||||||||
- code introspection with `CODECOPY` and `EXTCODECOPY`. | ||||||||
|
||||||||
## Extensions | ||||||||
|
||||||||
### Detect unreachable code | ||||||||
|
||||||||
The bitmap encoding can omit a contract's trailing data from the *jumpdest* section | ||||||||
provided there are no `0x5b` bytes in the data. | ||||||||
|
||||||||
We can extend this capability by trying to detect unreachable code | ||||||||
(e.g. contract's metadata, data or initcodes and deploy codes for `CREATE` instructions). | ||||||||
For this we require a heuristic that does not generate any false positives. | ||||||||
|
||||||||
One interesting example is a "data" contract starting with a terminating instruction | ||||||||
(e.g. `STOP`, `INVALID` or any unassigned opcode). | ||||||||
|
||||||||
This method introduces new risks: | ||||||||
|
||||||||
1. Treating unassigned opcodes as terminating instructions prevents them | ||||||||
from being assigned to a new instruction. | ||||||||
2. The heuristic will be considered by compilers optimizing for code size. | ||||||||
|
||||||||
### Prove jump targets are valid | ||||||||
|
||||||||
#### Prove all "static jumps" | ||||||||
|
||||||||
By "static jump" we mean a jump instruction directly preceded by a `PUSH` instruction. | ||||||||
|
||||||||
In Solidity-generated code all `JUMPI` instructions and 85% of `JUMP` instructions are "static" | ||||||||
(these numbers must be verified on a bigger sample of contracts). | ||||||||
|
||||||||
We can easily validate all static jumps and mark a contract as "all static jumps valid" | ||||||||
at deploy time. Then at runtime static jumps can be executed without accessing the *jumpdest* section. | ||||||||
|
||||||||
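The deploy-time check could look roughly like the sketch below. It is not a specified algorithm: the function name is hypothetical, and counting `PUSH0` as a `PUSH` instruction is an assumption of this sketch.

```python
JUMP, JUMPI, JUMPDEST = 0x56, 0x57, 0x5B


def all_static_jumps_valid(code: bytes) -> bool:
    """True if every JUMP/JUMPI directly preceded by a PUSH targets a JUMPDEST."""
    jumpdests = set()
    static_targets = []
    prev_push = None                      # value pushed by the previous instruction
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            jumpdests.add(pc)
        if op in (JUMP, JUMPI) and prev_push is not None:
            static_targets.append(prev_push)
        prev_push = None
        if 0x5F <= op <= 0x7F:            # PUSH0..PUSH32 (PUSH0 is an assumption)
            n = op - 0x5F
            prev_push = int.from_bytes(code[pc + 1:pc + 1 + n], "big")
            pc += n                       # skip push data
        pc += 1
    return all(target in jumpdests for target in static_targets)
```

A contract passing this check could carry a one-bit flag so that static jumps skip the lookup at runtime, while dynamic jumps would still consult the *jumpdest* section.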
#### Prove all jumps | ||||||||
|
||||||||
If we can prove that all jump targets in the code are valid, | ||||||||
then there is no need for the *jumpdest* section. | ||||||||
|
||||||||
The Erigon project has a | ||||||||
[prototype analysis tool](https://github.com/ledgerwatch/erigon/blob/devel/cmd/hack/flow/flow.go#L488) | ||||||||
which is able to prove the validity of all jumps for 95+% of contracts. | ||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd summarise the main goal as the following:
With reusing basic EOF constructs, this allows a more simplified verkle implementation supporting both "eof0 legacy" and eof1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Secondary objective: can this result in a better "code-to-data" ratio (by avoiding chunking)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by your term "not having chunking" or "avoiding chunking". The chunking will be still present. Do you mean the chunking scheme with 31-byte code payload and additional metadata byte?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree "simplifying verkle impl" and "better code-to-data" (to be verified with data!) are the ultimate goals and benefits here, and should be listed in the doc to keep focused on that.