From fd4fc3890bd915d368d3228556617fbb4840d051 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pawe=C5=82=20Bylica?= Date: Thu, 25 Apr 2024 15:54:33 +0200 Subject: [PATCH] EOFv0 for packaging legacy code in Verkle Trees (#58) The design draft that proposes the use of EOF for storing code in Verkle Trees. An alternative to the existing method of executing 31-byte code chunks accompanied by 1 byte of metadata. --- spec/eofv0_verkle.md | 166 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 166 insertions(+) create mode 100644 spec/eofv0_verkle.md diff --git a/spec/eofv0_verkle.md b/spec/eofv0_verkle.md new file mode 100644 index 0000000..f18952d --- /dev/null +++ b/spec/eofv0_verkle.md @@ -0,0 +1,166 @@ +# EOFv0 for packaging legacy code in Verkle Trees + +The design draft that proposes the use of EOF +for storing code in Verkle Trees. +An alternative to the existing method of executing +31-byte code chunks accompanied by 1 byte of metadata. + +## Goal + +Simplified legacy code execution in the Verkle Tree implementation. + +Better "code-to-data" ratio. + +Provide the result of the jumpdest analysis of a deployed code as the EOF section. +During code execution the jumpdest analysis is already available +and the answer to the question "is this jump target valid?" can be looked up +in the section. This allows using 32-byte Verkle Tree code chunks +(instead of 31-byte of code + 1 byte of metadata). + +## Specification + +### Container + +1. Re-use the EOF container format defined by [EIP-3540](https://eips.ethereum.org/EIPS/eip-3540). +2. Set the EOF version to 0. I.e. the packaged legacy code will be referenced as EOFv0. +3. The EOFv0 consists of the header and two sections: + - *jumpdest* + - *code* +4. The header must contain information about the sizes of these sections. + For that the EIP-3540 header or a simplified one can be used. +5. The legacy code is placed in the *code* section without modifications. +6. The *jumpdest* section contains the set of all valid jump destinations matching the positions + of all `JUMPDEST` instructions in the *code*. + The exact encoding of this section is specified separately. + +### Changes to execution semantics + +1. Execution starts at the first byte of the *code* section, and `PC` is set to 0. +2. Execution stops if `PC` goes outside the code section bounds (in case of EOFv0 this is also the + end of the container). +3. `PC` returns the current position within the *code*. +4. The instructions which *read* code must refer to the *code* section only. + The modification keeps the behavior of these instructions unchanged. + These instructions are invalid in EOFv1. + The instructions are: + - `CODECOPY` (copies a part of the *code* section), + - `CODESIZE` (returns the size of the *code* section), + - `EXTCODECOPY`, + - `EXTCODESIZE`, + - `EXTCODEHASH`. +5. To execute a `JUMP` or `JUMPI` instruction the jump target position must exist + in the *jumpdest* set. The *jumpdest* guarantees that the target instruction is `JUMPDEST`. + +### Changes to contract creation semantics + +1. Initcode execution is performed without changes. I.e. initcode remains an ephemeral code + without EOF wrapping. However, because the EOF containers are not visible to any EVM program, + implementations may decide to wrap initcodes with EOFv0 and execute it the same way as + EOFv0 deployed codes. +2. The initcode size limit and cost remains defined by [EIP-3860](https://eips.ethereum.org/EIPS/eip-3860). +3. The initcode still returns a plain deploy code. + The plain code size limit and cost is defined by [EIP-170](https://eips.ethereum.org/EIPS/eip-170). +4. If the plain code is not empty, it must be wrapped with EOFv0 before put in the state: + - perform jumpdest analysis of the plain code, + - encode the jumpdest analysis result as the *jumpdest* section, + - put the plain code in the *code* section, + - create EOFv0 container with the *jumpdest* and *code* sections. +5. The code deployment cost is calculated from the total EOFv0 size. + This is a breaking change so the impact must be analysed. +6. During Verkle Tree migration perform the above EOFv0 wrapping of all deployed code. + +### Jumpdest section encoding + +#### Bitmap + +A valid `JUMPDEST` is represented as `1` in a byte-aligned bitset. +The tailing zero bytes must be trimmed. +Therefore, the size of the bitmap is at most `ceil(len(code) / 8)` giving ~12% size overhead +(comparing with plain code size). +Such encoding doesn't require pre-processing and provides random access. + +Originally, the EIP-3690 proposes to use delta encoding for the elements of the *jumpdest* section. +This should be efficient for an average contract but behaves badly in the worst case +(every instruction in the code is a `JUMPDEST`). +The delta encoding has also another disadvantage for Verkle Tree code chunking: +whole (?) section must be loaded and preprocessed to check a single jump target validity. + +### Metadata encoding (8-bit numbers) + +Follow the original Verkle Tree idea to provide the single byte of metadata with the +"number of leading pushdata bytes in a chunk". +However, instead of including this in the chunk itself, +place the byte in order in the *jumpdest* section. + +This provides the following benefits over the original Verkle Tree design: + +1. The code executes by full 32-byte chunks. +2. The *metadata* size overhead slightly smaller: 3.1% (`1/32`) instead of 3.2% (`1/31`). +3. The *metadata* lookup is only needed for executing jumps + (not needed when following through to the next chunk). + +### Super-dense metadata encoding (6-bit numbers) + +The same as above except encode the values as 6-bit numbers +(minimum number of bits needed for encoding `32`). +Such encoding lowers the size overhead from 3.1% to 2.3%. + +## Backwards Compatibility + +EOF-packaged code execution if fully compatible with the legacy code execution. +This is achieved by prepending the legacy code with EOF header and the section containing +jumpdest metadata. The contents of the code section is identical to the lagacy code. + +Moreover, the wrapping process is bidirectional: wrapping can be created from the legacy code +and legacy code extracted from the wrapping without any information loss. +Implementations may consider keeping the legacy code in the database without modifications +and only construct the EOF wrapping when loading the code from the database. + +It also can be noted that information in the *jumpdest* section is redundant to the `JUMPDEST` +instructions. However, we **cannot** remove these instructions from the code because +this potentially breaks: + +- *dynamic* jumps (where we will not be able to adjust their jump targets), +- code introspection with `CODECOPY` and `EXTCODECOPY`. + +## Extensions + +### Detect unreachable code + +The bitmap encoding has a potential of omitting contract's tailing data from the *jumpdest* section +provided there are no `0x5b` bytes in the data. + +We can extend this capability by trying to detect unreachable code +(e.g. contract's metadata, data or inicodes and deploy codes for `CREATE` instructions). +For this we require a heuristic that does not generate any false positives. + +One interesting example is a "data" contract staring with a terminating instruction +(e.g. `STOP`, `INVALID` or any unassigned opcode). + +There are new risks this method introduces. + +1. Treating unassigned opcodes as terminating instructions prevents them + from being assigned to a new instruction. +2. The heuristic will be considered by compilers optimizing for code size. + +### Prove jump targets are valid + +#### Prove all "static jumps" + +By "static jump" we consider a jump instruction directly preceded by a `PUSH` instruction. + +In the solidity generated code all `JUMPI` instructions and 85% of `JUMP` instructions are "static". +(these numbers must be verified on bigger sample of contracts). + +We can easily validate all static jumps and mark a contracts with "all static jumps valid" +at deploy time. Then at runtime static jumps can be executed without accessing jumpdest section. + +#### Prove all jumps + +If we can prove that all jump targets in the code are valid, +then there is no need for the *jumpdest* section. + +Erigon project has a +[prototype analysis tool](https://github.com/ledgerwatch/erigon/blob/devel/cmd/hack/flow/flow.go#L488) +which is able to prove all jump validity for 95+% of contracts. +