Skip to content

Commit

Permalink
EOFv0 for packaging legacy code in Verkle Trees (#58)
Browse files Browse the repository at this point in the history
The design draft that proposes the use of EOF for storing code in Verkle Trees.
An alternative to the existing method of executing 31-byte code chunks accompanied by 1 byte of metadata.
  • Loading branch information
chfast authored Apr 25, 2024
1 parent 40c4645 commit fd4fc38
Showing 1 changed file with 166 additions and 0 deletions.
166 changes: 166 additions & 0 deletions spec/eofv0_verkle.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,166 @@
# EOFv0 for packaging legacy code in Verkle Trees

The design draft that proposes the use of EOF
for storing code in Verkle Trees.
An alternative to the existing method of executing
31-byte code chunks accompanied by 1 byte of metadata.

## Goal

Simplified legacy code execution in the Verkle Tree implementation.

Better "code-to-data" ratio.

Provide the result of the jumpdest analysis of a deployed code as the EOF section.
During code execution the jumpdest analysis is already available
and the answer to the question "is this jump target valid?" can be looked up
in the section. This allows using 32-byte Verkle Tree code chunks
(instead of 31-byte of code + 1 byte of metadata).

## Specification

### Container

1. Re-use the EOF container format defined by [EIP-3540](https://eips.ethereum.org/EIPS/eip-3540).
2. Set the EOF version to 0. I.e. the packaged legacy code will be referenced as EOFv0.
3. The EOFv0 consists of the header and two sections:
- *jumpdest*
- *code*
4. The header must contain information about the sizes of these sections.
For that the EIP-3540 header or a simplified one can be used.
5. The legacy code is placed in the *code* section without modifications.
6. The *jumpdest* section contains the set of all valid jump destinations matching the positions
of all `JUMPDEST` instructions in the *code*.
The exact encoding of this section is specified separately.

### Changes to execution semantics

1. Execution starts at the first byte of the *code* section, and `PC` is set to 0.
2. Execution stops if `PC` goes outside the code section bounds (in case of EOFv0 this is also the
end of the container).
3. `PC` returns the current position within the *code*.
4. The instructions which *read* code must refer to the *code* section only.
The modification keeps the behavior of these instructions unchanged.
These instructions are invalid in EOFv1.
The instructions are:
- `CODECOPY` (copies a part of the *code* section),
- `CODESIZE` (returns the size of the *code* section),
- `EXTCODECOPY`,
- `EXTCODESIZE`,
- `EXTCODEHASH`.
5. To execute a `JUMP` or `JUMPI` instruction the jump target position must exist
in the *jumpdest* set. The *jumpdest* guarantees that the target instruction is `JUMPDEST`.

### Changes to contract creation semantics

1. Initcode execution is performed without changes. I.e. initcode remains an ephemeral code
without EOF wrapping. However, because the EOF containers are not visible to any EVM program,
implementations may decide to wrap initcodes with EOFv0 and execute it the same way as
EOFv0 deployed codes.
2. The initcode size limit and cost remains defined by [EIP-3860](https://eips.ethereum.org/EIPS/eip-3860).
3. The initcode still returns a plain deploy code.
The plain code size limit and cost is defined by [EIP-170](https://eips.ethereum.org/EIPS/eip-170).
4. If the plain code is not empty, it must be wrapped with EOFv0 before put in the state:
- perform jumpdest analysis of the plain code,
- encode the jumpdest analysis result as the *jumpdest* section,
- put the plain code in the *code* section,
- create EOFv0 container with the *jumpdest* and *code* sections.
5. The code deployment cost is calculated from the total EOFv0 size.
This is a breaking change so the impact must be analysed.
6. During Verkle Tree migration perform the above EOFv0 wrapping of all deployed code.

### Jumpdest section encoding

#### Bitmap

A valid `JUMPDEST` is represented as `1` in a byte-aligned bitset.
The tailing zero bytes must be trimmed.
Therefore, the size of the bitmap is at most `ceil(len(code) / 8)` giving ~12% size overhead
(comparing with plain code size).
Such encoding doesn't require pre-processing and provides random access.

Originally, the EIP-3690 proposes to use delta encoding for the elements of the *jumpdest* section.
This should be efficient for an average contract but behaves badly in the worst case
(every instruction in the code is a `JUMPDEST`).
The delta encoding has also another disadvantage for Verkle Tree code chunking:
whole (?) section must be loaded and preprocessed to check a single jump target validity.

### Metadata encoding (8-bit numbers)

Follow the original Verkle Tree idea to provide the single byte of metadata with the
"number of leading pushdata bytes in a chunk".
However, instead of including this in the chunk itself,
place the byte in order in the *jumpdest* section.

This provides the following benefits over the original Verkle Tree design:

1. The code executes by full 32-byte chunks.
2. The *metadata* size overhead slightly smaller: 3.1% (`1/32`) instead of 3.2% (`1/31`).
3. The *metadata* lookup is only needed for executing jumps
(not needed when following through to the next chunk).

### Super-dense metadata encoding (6-bit numbers)

The same as above except encode the values as 6-bit numbers
(minimum number of bits needed for encoding `32`).
Such encoding lowers the size overhead from 3.1% to 2.3%.

## Backwards Compatibility

EOF-packaged code execution if fully compatible with the legacy code execution.
This is achieved by prepending the legacy code with EOF header and the section containing
jumpdest metadata. The contents of the code section is identical to the lagacy code.

Moreover, the wrapping process is bidirectional: wrapping can be created from the legacy code
and legacy code extracted from the wrapping without any information loss.
Implementations may consider keeping the legacy code in the database without modifications
and only construct the EOF wrapping when loading the code from the database.

It also can be noted that information in the *jumpdest* section is redundant to the `JUMPDEST`
instructions. However, we **cannot** remove these instructions from the code because
this potentially breaks:

- *dynamic* jumps (where we will not be able to adjust their jump targets),
- code introspection with `CODECOPY` and `EXTCODECOPY`.

## Extensions

### Detect unreachable code

The bitmap encoding has a potential of omitting contract's tailing data from the *jumpdest* section
provided there are no `0x5b` bytes in the data.

We can extend this capability by trying to detect unreachable code
(e.g. contract's metadata, data or inicodes and deploy codes for `CREATE` instructions).
For this we require a heuristic that does not generate any false positives.

One interesting example is a "data" contract staring with a terminating instruction
(e.g. `STOP`, `INVALID` or any unassigned opcode).

There are new risks this method introduces.

1. Treating unassigned opcodes as terminating instructions prevents them
from being assigned to a new instruction.
2. The heuristic will be considered by compilers optimizing for code size.

### Prove jump targets are valid

#### Prove all "static jumps"

By "static jump" we consider a jump instruction directly preceded by a `PUSH` instruction.

In the solidity generated code all `JUMPI` instructions and 85% of `JUMP` instructions are "static".
(these numbers must be verified on bigger sample of contracts).

We can easily validate all static jumps and mark a contracts with "all static jumps valid"
at deploy time. Then at runtime static jumps can be executed without accessing jumpdest section.

#### Prove all jumps

If we can prove that all jump targets in the code are valid,
then there is no need for the *jumpdest* section.

Erigon project has a
[prototype analysis tool](https://github.com/ledgerwatch/erigon/blob/devel/cmd/hack/flow/flow.go#L488)
which is able to prove all jump validity for 95+% of contracts.

0 comments on commit fd4fc38

Please sign in to comment.