Document dense encoding of invalid pushdata in EOFv0 #98

Open
wants to merge 16 commits into
base: main
from
61 changes: 24 additions & 37 deletions spec/eofv0_verkle.md
@@ -111,50 +111,34 @@ Alternate option is instead of encoding all valid `JUMPDEST` locations, to only
By invalid `JUMPDEST` we mean a `0x5b` byte in any pushdata.

This is beneficial if our assumption is correct that most contracts only contain a limited number
of offending cases. Our initial analysis suggests this is the case, e.g. Uniswap router has 9 cases,
one of the Arbitrum validator contracts has 6 cases.
of offending cases. Our initial analysis of the top 1000 used bytecodes suggests this is the case:
only 0.07% of bytecode bytes are invalid jumpdests.

Since Solidity contracts have trailing metadata, which contains a Keccak-256 (32-byte) hash of the
Member Author

Don't we want to keep this trivia?

Member

We have data now, so this estimation is pointless. I actually checked, and the probability across the whole dataset is indeed ~12%.

Member

This actually wasn't correct. The contract hash is not inside a `PUSH32`, so it doesn't count towards invalid jumpdests.

source, there is a 12% probability ($1 - (255/256)^{32}$) that at least one of the bytes of the hash
will contain the `0x5b` value, which gives our minimum probability of having at least one invalid
`JUMPDEST` in the contract.

Let's create a map of `invalid_jumpdests[chunk_no] = first_instruction_offset`. We can densely encode this
map using techniques similar to *run-length encoding* to skip distances and delta-encode offsets.
Let's create a map of `invalid_jumpdests[chunk_index] = first_instruction_offset`. We can densely encode this
map using techniques similar to *run-length encoding* to skip distances and delta-encode indexes.
This map is always fully loaded prior to execution, and so it is important to ensure the encoded

Note to self: see how much of those costs could be covered by the 21000 gas.

version is as dense as possible (without sacrificing on complexity).

In *scheme 1*, for each entry in `invalid_jumpdests`:
We propose an encoding that uses fixed-size 8-bit elements.
For each entry in `invalid_jumpdests`:
- 1-bit mode (`skip`, `value`)
- For skip-mode:
  - 10-bit number of chunks to skip
  - 7-bit number of chunks to skip
- For value-mode:
  - 6-bit `first_instruction_offset`
  - 7-bit number combining number of chunks to skip `s` and `first_instruction_offset`
Member Author

How?

Member

next line?

produced as `s * 33 + first_instruction_offset`
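
The 8-bit element layout described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the reference encoder: the mode bit is assumed to be the most significant bit (set for skip-mode), skip counts are assumed to be stored without any bias, and skips are assumed to be relative to the chunk following the previous entry. The function names are hypothetical.

```python
SKIP_FLAG = 0x80     # assumption: the 1-bit mode is the top bit, 1 = skip-mode
PAYLOAD_MAX = 0x7F   # largest value that fits in the remaining 7 bits
OFFSET_RANGE = 33    # first_instruction_offset takes values 0..32


def encode_invalid_jumpdests(invalid_jumpdests):
    """Encode {chunk_index: first_instruction_offset} into 8-bit elements."""
    out = bytearray()
    next_chunk = 0  # first chunk index not yet covered by the encoding
    for chunk_index in sorted(invalid_jumpdests):
        offset = invalid_jumpdests[chunk_index]
        if not 0 <= offset < OFFSET_RANGE:
            raise ValueError("first_instruction_offset must be in 0..32")
        gap = chunk_index - next_chunk
        # Emit skip elements until s * 33 + offset fits into 7 bits.
        while gap * OFFSET_RANGE + offset > PAYLOAD_MAX:
            step = min(gap, PAYLOAD_MAX)
            out.append(SKIP_FLAG | step)
            gap -= step
        out.append(gap * OFFSET_RANGE + offset)  # value-mode element
        next_chunk = chunk_index + 1
    return bytes(out)


def decode_invalid_jumpdests(data):
    """Inverse of encode_invalid_jumpdests under the same assumptions."""
    result = {}
    chunk = 0
    for element in data:
        if element & SKIP_FLAG:
            chunk += element & PAYLOAD_MAX
        else:
            skip, offset = divmod(element, OFFSET_RANGE)
            chunk += skip
            result[chunk] = offset
            chunk += 1
    return result
```

With these assumptions a value-mode element can carry a skip of at most 3 chunks (and only for offsets up to 28 in that case), so longer gaps fall back to skip-mode elements. When every chunk has an entry, each entry costs exactly one byte.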

Worst case encoding where each chunk contains an invalid `JUMPDEST`:
```
total_chunk_count = 24576 / 32 = 768
total_chunk_count * (1 + 6) / 8 = 672 # bytes for the header, i.e. 2.7% overhead
number_of_verkle_leafs = 672 / 32 = 21 # header bytes / 32 bytes per leaf
```

*Scheme 2* differs slightly:
- 1-bit mode (`skip`, `value`)
- For skip-mode:
  - 10-bit number of chunks to skip
- For value-mode:
  - 4-bit number of chunks to skip
  - 6-bit `first_instruction_offset`
For the worst case, where each chunk contains an invalid `JUMPDEST`, the encoding length in bytes is equal
to the number of chunks in the code, i.e. the size overhead is 1/32 ≈ 3.1%.

Worst case encoding:
```
total_chunk_count = 24576 / 32 = 768
total_chunk_count * (1 + 4 + 6) / 8 = 1056 # bytes for the header, i.e. 4.3% overhead
number_of_verkle_leafs = 1056 / 32 = 33 # header bytes / 32 bytes per leaf
```
| code size limit | code chunks | encoding chunks |
|-----------------|-------------|-----------------|
| 24576 | 768 | 24 |
| 32768 | 1024 | 32 |
| 65536 | 2048 | 64 |

The decision between *scheme 1* and *scheme 2*, as well as the best encoding sizes, can be determined
through analysing existing code.
Our current hunch is that in average contracts this results in a sub-1% overhead, while the worst case is 3.1%.
This is strictly better than the 3.2% overhead of the current Verkle code chunking.
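
The worst-case figures above can be reproduced with a few lines of Python. This is a sketch assuming one encoding byte per code chunk in the worst case and 32-byte chunks for both code and encoding; the function name is hypothetical.

```python
CHUNK_SIZE = 32  # bytes per chunk, for both code and the encoded map


def worst_case(code_size_limit):
    """Return (code chunks, encoding chunks, overhead) for the worst case."""
    code_chunks = -(-code_size_limit // CHUNK_SIZE)    # ceiling division
    encoding_bytes = code_chunks                       # one 8-bit element per chunk
    encoding_chunks = -(-encoding_bytes // CHUNK_SIZE)
    overhead = encoding_bytes / code_size_limit
    return code_chunks, encoding_chunks, overhead


for limit in (24576, 32768, 65536):
    print(limit, worst_case(limit))
```

For comparison, the 3.2% figure for the current Verkle code chunking is consistent with spending 1 byte of each 32-byte chunk on metadata, i.e. 1/31 of the code bytes, which is an assumption about how that number was derived.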

#### Header location

@@ -165,9 +149,12 @@ This second option allows for the simplification of the `code_size` value, as it

#### Runtime after Verkle

During runtime execution two checks must be done in this order:
1) Check if the destination is on the invalid list, and abort if so.
2) Check if the value in the chunk is an actual `JUMPDEST`, and abort if not.
During execution of a jump two checks must be done in this order:

1. Check if the jump destination is the `JUMPDEST` opcode.
2. Check if the jump destination chunk is in the `invalid_jumpdests` map.
If yes, the jumpdest analysis of the chunk must be performed
to confirm the jump destination is not push data.
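
The two-step check can be sketched as below. This is a minimal illustration, assuming `code` is the full bytecode, `invalid_jumpdests` is the decoded map, and `first_instruction_offset` is the offset within the chunk of the first byte that is not push data spilling over from the previous chunk (32 meaning the whole chunk is push data). The function name is hypothetical.

```python
CHUNK_SIZE = 32
OP_JUMPDEST = 0x5B
OP_PUSH1 = 0x60
OP_PUSH32 = 0x7F


def is_valid_jump(code, dest, invalid_jumpdests):
    # Step 1: the destination byte must be the JUMPDEST opcode.
    if dest >= len(code) or code[dest] != OP_JUMPDEST:
        return False
    # Step 2: chunks absent from the map have no 0x5b inside push data.
    chunk_index = dest // CHUNK_SIZE
    if chunk_index not in invalid_jumpdests:
        return True
    # Otherwise run jumpdest analysis of this chunk only, starting at its
    # first instruction, and check that dest is an instruction boundary.
    pos = chunk_index * CHUNK_SIZE + invalid_jumpdests[chunk_index]
    while pos < dest:
        op = code[pos]
        push_len = op - OP_PUSH1 + 1 if OP_PUSH1 <= op <= OP_PUSH32 else 0
        pos += 1 + push_len
    return pos == dest
```

For example, with the code `PUSH2 0x5b5b; JUMPDEST` and the map `{0: 0}`, offsets 1 and 2 are rejected as push data while offset 3 is accepted.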

It is possible to reconstruct sparse account code prior to execution with all the submitted chunks of the transaction
and perform `JUMPDEST`-validation to build up a relevant *valid `JUMPDEST` locations* map instead.