Document dense encoding of invalid pushdata in EOFv0 #98
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -111,50 +111,34 @@ Alternate option is instead of encoding all valid `JUMPDEST` locations, to only | |
By invalid `JUMPDEST` we mean a `0x5b` byte in any pushdata. | ||
|
||
This is beneficial if our assumption is correct that most contracts only contain a limited number | ||
of offending cases. Our initial analysis suggests this is the case, e.g. Uniswap router has 9 cases, | ||
one of the Arbitrum validator contracts has 6 cases. | ||
of offending cases. Our initial analysis of the top 1000 used bytecodes suggests this is the case: | ||
only 0.07% of bytecode bytes are invalid jumpdests. | ||
|
||
Since Solidity contracts have trailing metadata, which contains a Keccak-256 (32-byte) hash of the | ||
source, there is a 12% probability ($1 - (255/256)^{32}$) that at least one of the bytes of the hash | ||
will contain the `0x5b` value, which gives our minimum probability of having at least one invalid | ||
`JUMPDEST` in the contract. | ||
|
||
Let's create a map of `invalid_jumpdests[chunk_no] = first_instruction_offset`. We can densely encode this | ||
map using techniques similar to *run-length encoding* to skip distances and delta-encode offsets. | ||
Let's create a map of `invalid_jumpdests[chunk_index] = first_instruction_offset`. We can densely encode this | ||
map using techniques similar to *run-length encoding* to skip distances and delta-encode indexes. | ||
This map is always fully loaded prior to execution, and so it is important to ensure the encoded | ||
Review comment: Note to self: see how much of those costs could be covered by the 21000 gas. |
||
version is as dense as possible (without adding undue complexity). | ||
|
||
In *scheme 1*, for each entry in `invalid_jumpdests`: | ||
We propose an encoding using fixed-size 8-bit elements. | ||
For each entry in `invalid_jumpdests`: | ||
- 1-bit mode (`skip`, `value`) | ||
- For skip-mode: | ||
- 10-bit number of chunks to skip | ||
- 7-bit number of chunks to skip | ||
- For value-mode: | ||
- 6-bit `first_instruction_offset` | ||
- 7-bit number combining number of chunks to skip `s` and `first_instruction_offset` | ||
Review comment: How?
Reply: next line? |
||
produced as `s * 33 + first_instruction_offset` | ||
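The element layout above can be sketched as a small encoder/decoder pair. This is only an illustration, not the proposal's reference implementation. It assumes that the top bit of each element selects skip-mode, that skips are counted from the chunk following the previous entry, and that `first_instruction_offset` ranges over 0..32 (hence the factor 33). The function names are hypothetical.

```python
SKIP_FLAG = 0x80    # assumption: top bit set means skip-mode
MAX_7BIT = 0x7F     # largest payload of one element
OFFSET_RADIX = 33   # first_instruction_offset ranges over 0..32


def encode_invalid_jumpdests(invalid_jumpdests):
    """Encode {chunk_index: first_instruction_offset} as 8-bit elements."""
    out = bytearray()
    cursor = 0  # next chunk index not yet covered by the encoding
    for chunk_index in sorted(invalid_jumpdests):
        offset = invalid_jumpdests[chunk_index]
        assert 0 <= offset < OFFSET_RADIX
        delta = chunk_index - cursor
        # Emit skip elements until s * 33 + offset fits in 7 bits.
        while delta * OFFSET_RADIX + offset > MAX_7BIT:
            step = min(delta, MAX_7BIT)
            out.append(SKIP_FLAG | step)
            delta -= step
        # Value element: remaining skip and offset combined.
        out.append(delta * OFFSET_RADIX + offset)
        cursor = chunk_index + 1
    return bytes(out)


def decode_invalid_jumpdests(encoded):
    """Inverse of encode_invalid_jumpdests."""
    result = {}
    cursor = 0
    for element in encoded:
        if element & SKIP_FLAG:
            cursor += element & MAX_7BIT
        else:
            skip, offset = divmod(element, OFFSET_RADIX)
            cursor += skip
            result[cursor] = offset
            cursor += 1
    return result
```

Under these assumptions, a map with an entry for every chunk encodes to exactly one byte per chunk, since each entry has a skip of zero.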
|
||
Worst case encoding where each chunk contains an invalid `JUMPDEST`: | ||
``` | ||
total_chunk_count = 24576 / 32 = 768 | ||
total_chunk_count * (1 + 6) / 8 = 672 # bytes for the header, i.e. 2.7% overhead | ||
number_of_verkle_leafs = 672 / 32 = 21 # header bytes / 32 | ||
``` | ||
|
||
*Scheme 2* differs slightly: | ||
- 1-bit mode (`skip`, `value`) | ||
- For skip-mode: | ||
- 10-bit number of chunks to skip | ||
- For value-mode: | ||
- 4-bit number of chunks to skip | ||
- 6-bit `first_instruction_offset` | ||
For the worst case where each chunk contains an invalid `JUMPDEST`, the encoding length in bytes is equal | ||
to the number of chunks in the code, i.e. the size overhead is 1/32 ≈ 3.1%. | ||
|
||
Worst case encoding: | ||
``` | ||
total_chunk_count = 24576 / 32 = 768 | ||
total_chunk_count * (1 + 4 + 6) / 8 = 1056 # bytes for the header, i.e. 4.3% overhead | ||
number_of_verkle_leafs = 1056 / 32 = 33 # header bytes / 32 | ||
``` | ||
| code size limit | code chunks | encoding chunks | | ||
|-----------------|-------------|-----------------| | ||
| 24576 | 768 | 24 | | ||
| 32768 | 1024 | 32 | | ||
| 65536 | 2048 | 64 | | ||
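The rows of the table can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming 32-byte chunks for both the code and the encoding (as the table's numbers imply) and one 8-bit element per code chunk in the worst case. The function name is hypothetical.

```python
CHUNK_SIZE = 32  # bytes per chunk, as implied by the table


def worst_case(code_size_limit):
    """Return (code_chunks, encoding_chunks, overhead_fraction)."""
    code_chunks = code_size_limit // CHUNK_SIZE
    encoding_bytes = code_chunks  # one 8-bit element per code chunk
    encoding_chunks = -(-encoding_bytes // CHUNK_SIZE)  # ceiling division
    overhead = encoding_bytes / code_size_limit
    return code_chunks, encoding_chunks, overhead


for limit in (24576, 32768, 65536):
    chunks, enc_chunks, overhead = worst_case(limit)
    print(f"{limit:>6} | {chunks:>5} | {enc_chunks:>3} | {overhead:.3%}")
```

The overhead is 1/32 regardless of the code size limit, which is where the 3.1% figure comes from.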
|
||
The decision between *scheme 1* and *scheme 2*, as well as the best encoding sizes, can be determined | ||
through analysing existing code. | ||
Our current hunch is that in average contracts this results in a sub-1% overhead, while the worst case is 3.1%. | ||
This is strictly better than the 3.2% overhead of the current Verkle code chunking. | ||
|
||
#### Header location | ||
|
||
|
@@ -165,9 +149,12 @@ This second option allows for the simplification of the `code_size` value, as it | |
|
||
#### Runtime after Verkle | ||
|
||
During runtime execution two checks must be done in this order: | ||
1) Check if the destination is on the invalid list, and abort if so. | ||
2) Check if the value in the chunk is an actual `JUMPDEST`, and abort if not. | ||
During execution of a jump two checks must be done in this order: | ||
|
||
1. Check if the jump destination is the `JUMPDEST` opcode. | ||
2. Check if the jump destination chunk is in the `invalid_jumpdests` map. | ||
If yes, the jumpdest analysis of the chunk must be performed | ||
to confirm the jump destination is not push data. | ||
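The two checks can be sketched as follows. This is an illustration only: it assumes 32-byte chunks, a decoded `invalid_jumpdests` dictionary mapping chunk index to `first_instruction_offset`, and that the map lists every chunk containing a `0x5b` byte in pushdata. The helper names are hypothetical.

```python
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F
CHUNK_SIZE = 32  # assumption: 32-byte code chunks


def is_push_data(chunk, first_instruction_offset, offset_in_chunk):
    """Walk the chunk's instructions; True if the offset is inside pushdata."""
    pos = first_instruction_offset
    while pos < len(chunk):
        if pos == offset_in_chunk:
            return False  # an instruction starts exactly here
        if pos > offset_in_chunk:
            return True   # the offset was stepped over as pushdata
        op = chunk[pos]
        push_len = op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0
        pos += 1 + push_len
    return True  # never reached as an instruction start


def is_valid_jump(code, dest, invalid_jumpdests):
    # Check 1: the destination byte must be the JUMPDEST opcode.
    if dest >= len(code) or code[dest] != JUMPDEST:
        return False
    # Check 2: only chunks listed in the map need jumpdest analysis.
    chunk_index, offset_in_chunk = divmod(dest, CHUNK_SIZE)
    first_instruction_offset = invalid_jumpdests.get(chunk_index)
    if first_instruction_offset is None:
        return True
    start = chunk_index * CHUNK_SIZE
    chunk = code[start:start + CHUNK_SIZE]
    return not is_push_data(chunk, first_instruction_offset, offset_in_chunk)
```

For example, with the code `PUSH2 0x5b00; JUMPDEST` and the map `{0: 0}`, offset 1 is rejected as pushdata while offset 3 is accepted.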
|
||
It is possible to reconstruct sparse account code prior to execution with all the submitted chunks of the transaction | ||
and perform `JUMPDEST`-validation to build up a relevant *valid `JUMPDEST` locations* map instead. | ||
|
Review comment: Don't we want to keep this trivia?
Reply: We have data now, so this estimate is pointless. I actually checked, and the probability across the whole dataset is indeed ~12%.
Reply: This actually wasn't correct. The contract hash is not in a `PUSH32`, and therefore doesn't count towards invalid jumpdests.