From 1be5a308dde4cb2a1fd56e92177c03fbc891cbd4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pawe=C5=82=20Bylica?=
Date: Wed, 7 Feb 2024 12:55:09 +0100
Subject: [PATCH 1/5] EOFv0 for packaging legacy code in Verkle Trees

The design draft that proposes the use of EOF for storing code in Verkle Trees.
An alternative to the existing method of executing 31-byte code chunks
accompanied by 1 byte of metadata.
---
 spec/eofv0_verkle.md | 93 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 spec/eofv0_verkle.md

diff --git a/spec/eofv0_verkle.md b/spec/eofv0_verkle.md
new file mode 100644
index 0000000..3d28310
--- /dev/null
+++ b/spec/eofv0_verkle.md
@@ -0,0 +1,93 @@
+# EOFv0 for packaging legacy code in Verkle Trees
+
+The design draft that proposes the use of EOF
+for storing code in Verkle Trees.
+An alternative to the existing method of executing
+31-byte code chunks accompanied by 1 byte of metadata.
+
+## Goal
+
+1. Provide the result of the jumpdest analysis of a deployed code as the EOF section.
+During code execution the jumpdest analysis is already available
+and the answer to the question "is this jump target valid?" can be looked up
+in the section. This allows using 32-byte Verkle Tree code chunks
+(instead of 31-byte of code + 1 byte of metadata).
+2. EOF-packaged code execution is fully compatible with the legacy code execution.
+
+## Specification Draft
+
+1. Put the code in the single *code* EOF section.
+2. Use the EOF container format proposed by [EIP-3540](https://eips.ethereum.org/EIPS/eip-3540) with
+   version 0 and following modifications to "Changes to execution semantics":
+   1. `CODECOPY`/`CODESIZE`/`EXTCODECOPY`/`EXTCODESIZE`/`EXTCODEHASH` operates on the *code*
+      section only.
+   2. `JUMP`/`JUMPI`/`PC` relates code positions to the *code* section only.
+3. Perform the jumpdest analysis of the code at deploy time (during contract creation).
+4. Store the result of the jumpdest analysis in the *jumpdest* EOF section as proposed
+   by [EIP-3690](https://eips.ethereum.org/EIPS/eip-3690),
+   but the jumpdests encoding changed to bitmap.
+5. The packaging process is done for every deployed code during Verkle Tree migration
+   and also for every contract creation later
+   (i.e. becomes the part of the consensus forever).
+
+## Rationale
+
+### Jumpdests encoding
+
+Originally, the EIP-3690 proposes to use delta encoding for the elements of the *jumpdest* section.
+This should be efficient for an average contract but behaves badly in the worst case
+(every instruction in the code is a `JUMPDEST`).
+The delta encoding has also another disadvantage for Verkle Tree code chunking:
+whole (?) section must be loaded and preprocessed to check a jump target validity.
+
+We propose to use a bitmap to encode jumpdests.
+Such encoding does not need pre-processing and provides random access.
+This gives constant 12.5% size overhead, but does not have the two mentioned disadvantages.
+
+## Extensions
+
+### Data section
+
+Let's try to identify a segment of code at the end of the code where a contract stores data.
+We require a heuristic that does not generate any false positives.
+This arrangement ensures that the instructions inspecting the code
+work without modifications on the continuous *code*+*data* area
+
+Having a *data* section makes the *code* section and therefore the *jumpdest* section smaller.
+
+Example heuristic:
+
+1. Decode instructions.
+2. Traverse instructions in reverse order.
+3. If during traversal a terminating instruction (`STOP`, `INVALID`, etc)
+   or the code beginning is encountered,
+   then the *data* section starts just after the current position.
+   End here.
+4. If during traversal a `JUMPDEST` instruction is encountered,
+   then there is no *data* section.
+   End here.
+
+### Prove all jump targets are valid
+
+If we can prove that all jump targets in the code are valid,
+then there is no need for the *jumpdest* section.
+
+In Solidity-generated code all `JUMPI` instructions are "static"
+(preceded by a `PUSH` instruction).
+Only some `JUMP` instructions are not "static" because they are used to implement
+returns from functions.
+
+Erigon project had an analysis tool which was able to prove all jump validity
+for 90+% of contracts.
+
+### Super-dense metadata encoding (6-bit numbers)
+
+Follow the original Verkle Tree idea to provide the metadata of
+"number of leading pushdata bytes in a chunk". However, instead of including
+this metadata as a single byte in the chunk itself, place the value as a 6-bit
+encoded number in the *metadata* EOF section. This provides the following benefits:
+
+1. The code executes by full 32-byte chunks.
+2. The *metadata* overhead is smaller (2.3% instead of 3.2%).
+3. The *metadata* lookup is only needed for jumps
+   (not needed when following through to the next chunk).

From 9eef32a1e4996ee99dad72626bc9d2dff4a62854 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pawe=C5=82=20Bylica?=
Date: Thu, 8 Feb 2024 20:44:22 +0100
Subject: [PATCH 2/5] Add separate Backwards Compatibility section

---
 spec/eofv0_verkle.md | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/spec/eofv0_verkle.md b/spec/eofv0_verkle.md
index 3d28310..096091b 100644
--- a/spec/eofv0_verkle.md
+++ b/spec/eofv0_verkle.md
@@ -7,12 +7,11 @@ An alternative to the existing method of executing
 
 ## Goal
 
-1. Provide the result of the jumpdest analysis of a deployed code as the EOF section.
+Provide the result of the jumpdest analysis of a deployed code as the EOF section.
 During code execution the jumpdest analysis is already available
 and the answer to the question "is this jump target valid?" can be looked up
 in the section. This allows using 32-byte Verkle Tree code chunks
 (instead of 31-byte of code + 1 byte of metadata).
-2. EOF-packaged code execution is fully compatible with the legacy code execution.
 
 ## Specification Draft
 
@@ -30,6 +29,21 @@ in the section. This allows using 32-byte Verkle Tree code chunks
    and also for every contract creation later
    (i.e. becomes the part of the consensus forever).
 
+## Backwards Compatibility
+
+EOF-packaged code execution is fully compatible with the legacy code execution.
+This is achieved by prepending the legacy code with EOF header and the section containing
+jumpdest metadata. The contents of the code section is identical to the legacy code.
+
+Moreover, the wrapping process is bidirectional: wrapping can be created from the legacy code
+and legacy code extracted from the wrapping without any information loss.
+Implementations may consider keeping the legacy code in the database without modifications
+and only construct the EOF wrapping when loading the code from the database.
+
+It also can be noted that information in the *jumpdest* section is redundant to the `JUMPDEST`
+instructions. However, we cannot remove these instructions from the code because
+this would break at least *dynamic* jumps (where we will not be able to adjust their jump targets).
+
 ## Rationale
 
 ### Jumpdests encoding

From cdb301a98c9f1a0ff05625f10298d80bb67f95b4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pawe=C5=82=20Bylica?=
Date: Mon, 19 Feb 2024 12:21:38 +0100
Subject: [PATCH 3/5] big spec update

---
 spec/eofv0_verkle.md | 169 +++++++++++++++++++++++++++----------------
 1 file changed, 108 insertions(+), 61 deletions(-)

diff --git a/spec/eofv0_verkle.md b/spec/eofv0_verkle.md
index 096091b..b0c93b8 100644
--- a/spec/eofv0_verkle.md
+++ b/spec/eofv0_verkle.md
@@ -1,8 +1,8 @@
 # EOFv0 for packaging legacy code in Verkle Trees
 
-The design draft that proposes the use of EOF
+The design draft that proposes the use of EOF
 for storing code in Verkle Trees.
-An alternative to the existing method of executing
+An alternative to the existing method of executing
 31-byte code chunks accompanied by 1 byte of metadata.
 
 ## Goal
@@ -13,23 +13,94 @@ and the answer to the question "is this jump target valid?" can be looked up
 in the section. This allows using 32-byte Verkle Tree code chunks
 (instead of 31-byte of code + 1 byte of metadata).
 
-## Specification Draft
+## Specification
+
+### Container
+
+1. Re-use the EOF container format defined by [EIP-3540](https://eips.ethereum.org/EIPS/eip-3540).
+2. Set the EOF version to 0. I.e. the packaged legacy code will be referenced as EOFv0.
+3. The EOFv0 consists of the header and two sections:
+   - *jumpdest*
+   - *code*
+4. The header must contain information about the sizes of these sections.
+   For that the EIP-3540 header or a simplified one can be used.
+5. The legacy code is placed in the *code* section without modifications.
+6. The *jumpdest* section contains the set of all valid jump destinations matching the positions
+   of all `JUMPDEST` instructions in the *code*.
+   The exact encoding of this section is specified separately.
+
+### Changes to execution semantics
+
+1. Execution starts at the first byte of the *code* section, and `PC` is set to 0.
+2. Execution stops if `PC` goes outside the code section bounds (in case of EOFv0 this is also the
+   end of the container).
+3. `PC` returns the current position within the *code*.
+4. The instructions which *read* code must refer to the *code* section only. This is significantly
+   different from what the original EIP-3540 proposed, however this difference is not relevant
+   in the latest EOFv1 revision where these instructions are invalid.
+   The instructions are:
+   - `CODECOPY` (copies a part of the *code* section),
+   - `CODESIZE` (returns the size of the *code* section),
+   - `EXTCODECOPY`,
+   - `EXTCODESIZE`,
+   - `EXTCODEHASH`.
+5. To execute a `JUMP` or `JUMPI` instruction the jump target position must exist
+   in the *jumpdest* set. The *jumpdest* guarantees that the target instruction is `JUMPDEST`.
+
+### Changes to contract creation semantics
+
+1. Initcode execution is performed without changes. I.e. initcode remains an ephemeral code
+   without EOF wrapping. However, because the EOF containers are not visible to any EVM program,
+   implementations may decide to wrap initcodes with EOFv0 and execute it the same way as
+   EOFv0 deployed codes.
+2. The initcode size limit remains defined by [EIP-3860](https://eips.ethereum.org/EIPS/eip-3860).
+3. The initcode still returns a plain deploy code.
+   The plain code size limit is defined by [EIP-170](https://eips.ethereum.org/EIPS/eip-170).
+4. The plain code is not empty it must be wrapped with EOFv0 before put in the state:
+   - perform jumpdest analysis of the plain code,
+   - encode the jumpdest analysis result as the *jumpdest* section,
+   - put the plain code in the *code* section,
+   - create EOFv0 container with the *jumpdest* and *code* sections.
+5. The code deployment cost is calculated from the total EOFv0 size.
+6. During Verkle Tree migration perform the above EOFv0 wrapping of all deployed code.
+
+### Jumpdest section encoding
+
+#### Bitmap
+
+A valid `JUMPDEST` is represented as `1` in a byte-aligned bitset.
+The trailing zero bytes must be trimmed.
+Therefore, the size of the bitmap is at most `ceil(len(code) / 8)` giving ~12% size overhead
+(comparing with plain code size).
+Such encoding doesn't require pre-processing and provides random access.
 
-1. Put the code in the single *code* EOF section.
-2. Use the EOF container format proposed by [EIP-3540](https://eips.ethereum.org/EIPS/eip-3540) with
-   version 0 and following modifications to "Changes to execution semantics":
-   1. `CODECOPY`/`CODESIZE`/`EXTCODECOPY`/`EXTCODESIZE`/`EXTCODEHASH` operates on the *code*
-      section only.
-   2. `JUMP`/`JUMPI`/`PC` relates code positions to the *code* section only.
-3. Perform the jumpdest analysis of the code at deploy time (during contract creation).
-4. Store the result of the jumpdest analysis in the *jumpdest* EOF section as proposed
-   by [EIP-3690](https://eips.ethereum.org/EIPS/eip-3690),
-   but the jumpdests encoding changed to bitmap.
-5. The packaging process is done for every deployed code during Verkle Tree migration
-   and also for every contract creation later
-   (i.e. becomes the part of the consensus forever).
+Originally, the EIP-3690 proposes to use delta encoding for the elements of the *jumpdest* section.
+This should be efficient for an average contract but behaves badly in the worst case
+(every instruction in the code is a `JUMPDEST`).
+The delta encoding has also another disadvantage for Verkle Tree code chunking:
+whole (?) section must be loaded and preprocessed to check a single jump target validity.
+
+### Metadata encoding (8-bit numbers)
+
+Follow the original Verkle Tree idea to provide the single byte of metadata with the
+"number of leading pushdata bytes in a chunk".
+However, instead of including this in the chunk itself,
+place the byte in order in the *jumpdest* section.
+
+This provides the following benefits over the original Verkle Tree design:
+
+1. The code executes by full 32-byte chunks.
+2. The *metadata* size overhead slightly smaller `1/32` instead of `1/31`.
+3. The *metadata* lookup is only needed for executing jumps
+   (not needed when following through to the next chunk).
+
+### Super-dense metadata encoding (6-bit numbers)
+
+The same as above except encode the values as 6-bit numbers
+(minimum number of bits needed for encoding `32`).
+Such encoding lowers the size overhead from 3.1% to 2.3%.
 
-## Backwards Compatibility
+## Backwards Compatibility
 
 EOF-packaged code execution is fully compatible with the legacy code execution.
 This is achieved by prepending the legacy code with EOF header and the section containing
@@ -41,45 +112,31 @@ Implementations may consider keeping the legacy code in the database without mod
 and only construct the EOF wrapping when loading the code from the database.
 
 It also can be noted that information in the *jumpdest* section is redundant to the `JUMPDEST`
-instructions. However, we cannot remove these instructions from the code because
-this would break at least *dynamic* jumps (where we will not be able to adjust their jump targets).
+instructions. However, we **cannot** remove these instructions from the code because
+this potentially breaks:
 
-## Rationale
-
-### Jumpdests encoding
-
-Originally, the EIP-3690 proposes to use delta encoding for the elements of the *jumpdest* section.
-This should be efficient for an average contract but behaves badly in the worst case
-(every instruction in the code is a `JUMPDEST`).
-The delta encoding has also another disadvantage for Verkle Tree code chunking:
-whole (?) section must be loaded and preprocessed to check a jump target validity.
-
-We propose to use a bitmap to encode jumpdests.
-Such encoding does not need pre-processing and provides random access.
-This gives constant 12.5% size overhead, but does not have the two mentioned disadvantages.
+- *dynamic* jumps (where we will not be able to adjust their jump targets),
+- code introspection with `CODECOPY` and `EXTCODECOPY`.
 
 ## Extensions
 
-### Data section
+### Detect unreachable code
 
-Let's try to identify a segment of code at the end of the code where a contract stores data.
-We require a heuristic that does not generate any false positives.
-This arrangement ensures that the instructions inspecting the code
-work without modifications on the continuous *code*+*data* area
+The bitmap encoding has a potential of omitting contract's trailing data from the *jumpdest* section
+provided there are no `0x5b` bytes in the data.
 
-Having a *data* section makes the *code* section and therefore the *jumpdest* section smaller.
+We can extend this capability by trying to detect unreachable code
+(e.g. contract's metadata, data or initcodes and deploy codes for `CREATE` instructions).
+For this we require a heuristic that does not generate any false positives.
 
-Example heuristic:
+One interesting example is a "data" contract starting with a terminating instruction
+(e.g. `STOP`, `INVALID` or any unassigned opcode).
 
-1. Decode instructions.
-2. Traverse instructions in reverse order.
-3. If during traversal a terminating instruction (`STOP`, `INVALID`, etc)
-   or the code beginning is encountered,
-   then the *data* section starts just after the current position.
-   End here.
-4. If during traversal a `JUMPDEST` instruction is encountered,
-   then there is no *data* section.
-   End here.
+There are new risks this method introduces.
+
+1. Treating unassigned opcodes as terminating instructions prevents them
+   from being assigned to a new instruction.
+2. The heuristic will be considered by compilers optimizing for code size.
 
 ### Prove all jump targets are valid
 
@@ -91,17 +148,7 @@ In Solidity-generated code all `JUMPI` instructions are "static"
 Only some `JUMP` instructions are not "static" because they are used to implement
 returns from functions.
 
-Erigon project had an analysis tool which was able to prove all jump validity
-for 90+% of contracts.
-
-### Super-dense metadata encoding (6-bit numbers)
-
-Follow the original Verkle Tree idea to provide the metadata of
-"number of leading pushdata bytes in a chunk". However, instead of including
-this metadata as a single byte in the chunk itself, place the value as a 6-bit
-encoded number in the *metadata* EOF section. This provides the following benefits:
+Erigon project has a
+[prototype analysis tool](https://github.com/ledgerwatch/erigon/blob/devel/cmd/hack/flow/flow.go#L488)
+which is able to prove all jump validity for 95+% of contracts.
 
-1. The code executes by full 32-byte chunks.
-2. The *metadata* overhead is smaller (2.3% instead of 3.2%).
-3. The *metadata* lookup is only needed for jumps
-   (not needed when following through to the next chunk).

From 380f1a7f818805c36b2f052893d6e76222eee889 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pawe=C5=82=20Bylica?=
Date: Tue, 26 Mar 2024 11:42:19 +0100
Subject: [PATCH 4/5] address some comments

---
 spec/eofv0_verkle.md | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/spec/eofv0_verkle.md b/spec/eofv0_verkle.md
index b0c93b8..0494c12 100644
--- a/spec/eofv0_verkle.md
+++ b/spec/eofv0_verkle.md
@@ -7,6 +7,10 @@ An alternative to the existing method of executing
 
 ## Goal
 
+Simplified legacy code execution in the Verkle Tree implementation.
+
+Better "code-to-data" ratio.
+
 Provide the result of the jumpdest analysis of a deployed code as the EOF section.
 During code execution the jumpdest analysis is already available
 and the answer to the question "is this jump target valid?" can be looked up
@@ -35,9 +39,9 @@ in the section. This allows using 32-byte Verkle Tree code chunks
 2. Execution stops if `PC` goes outside the code section bounds (in case of EOFv0 this is also the
    end of the container).
 3. `PC` returns the current position within the *code*.
-4. The instructions which *read* code must refer to the *code* section only. This is significantly
-   different from what the original EIP-3540 proposed, however this difference is not relevant
-   in the latest EOFv1 revision where these instructions are invalid.
+4. The instructions which *read* code must refer to the *code* section only.
+   The modification keeps the behavior of these instructions unchanged.
+   These instructions are invalid in EOFv1.
    The instructions are:
    - `CODECOPY` (copies a part of the *code* section),
    - `CODESIZE` (returns the size of the *code* section),
@@ -53,15 +57,16 @@ in the section. This allows using 32-byte Verkle Tree code chunks
    without EOF wrapping. However, because the EOF containers are not visible to any EVM program,
    implementations may decide to wrap initcodes with EOFv0 and execute it the same way as
    EOFv0 deployed codes.
-2. The initcode size limit remains defined by [EIP-3860](https://eips.ethereum.org/EIPS/eip-3860).
+2. The initcode size limit and cost remain defined by [EIP-3860](https://eips.ethereum.org/EIPS/eip-3860).
 3. The initcode still returns a plain deploy code.
-   The plain code size limit is defined by [EIP-170](https://eips.ethereum.org/EIPS/eip-170).
-4. The plain code is not empty it must be wrapped with EOFv0 before put in the state:
+   The plain code size limit and cost are defined by [EIP-170](https://eips.ethereum.org/EIPS/eip-170).
+4. If the plain code is not empty, it must be wrapped with EOFv0 before being put in the state:
    - perform jumpdest analysis of the plain code,
    - encode the jumpdest analysis result as the *jumpdest* section,
    - put the plain code in the *code* section,
    - create EOFv0 container with the *jumpdest* and *code* sections.
 5. The code deployment cost is calculated from the total EOFv0 size.
+   This is a breaking change so the impact must be analysed.
 6. During Verkle Tree migration perform the above EOFv0 wrapping of all deployed code.
 
 ### Jumpdest section encoding
@@ -90,7 +95,7 @@ place the byte in order in the *jumpdest* section.
 This provides the following benefits over the original Verkle Tree design:
 
 1. The code executes by full 32-byte chunks.
-2. The *metadata* size overhead slightly smaller `1/32` instead of `1/31`.
+2. The *metadata* size overhead is slightly smaller: 3.1% (`1/32`) instead of 3.2% (`1/31`).
 3. The *metadata* lookup is only needed for executing jumps
    (not needed when following through to the next chunk).
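
The wrapping steps touched by the contract creation hunk above (jumpdest analysis, then encoding the result as the *jumpdest* section) can be sketched in Python. This is an illustrative sketch only, not part of the patch: the function names are hypothetical, and the bit order inside each bitmap byte (least significant bit first) is an assumption, since the draft only says "byte-aligned bitset" with trailing zero bytes trimmed.

```python
# Illustrative sketch only; names and the LSB-first bit order are assumptions.
JUMPDEST = 0x5B
PUSH1 = 0x60
PUSH32 = 0x7F


def jumpdest_positions(code: bytes) -> list[int]:
    """Return positions of all JUMPDEST instructions, skipping PUSH immediates."""
    positions = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            positions.append(pc)
        elif PUSH1 <= op <= PUSH32:
            # Skip the 1..32 immediate bytes; a 0x5b inside them is data.
            pc += op - PUSH1 + 1
        pc += 1
    return positions


def encode_jumpdest_bitmap(code: bytes) -> bytes:
    """Encode valid jump destinations as a bitmap with trailing zero bytes trimmed."""
    bitmap = bytearray((len(code) + 7) // 8)  # at most ceil(len(code) / 8) bytes
    for pos in jumpdest_positions(code):
        bitmap[pos // 8] |= 1 << (pos % 8)
    return bytes(bitmap).rstrip(b"\x00")


def is_valid_jump_target(bitmap: bytes, target: int) -> bool:
    """Random-access lookup; positions past the trimmed bitmap are invalid."""
    byte_index = target // 8
    if byte_index >= len(bitmap):
        return False
    return bool((bitmap[byte_index] >> (target % 8)) & 1)
```

For example, the code `60 5b 5b 00` (`PUSH1 0x5b`, `JUMPDEST`, `STOP`) has a single valid destination at position 2; the `0x5b` at position 1 is push data and is not marked.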
From 68244b3ba4b4fdb6e0e9c876b7dd39af6618aee2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pawe=C5=82=20Bylica?=
Date: Tue, 26 Mar 2024 12:13:35 +0100
Subject: [PATCH 5/5] new idea for validating static jumps

---
 spec/eofv0_verkle.md | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/spec/eofv0_verkle.md b/spec/eofv0_verkle.md
index 0494c12..f18952d 100644
--- a/spec/eofv0_verkle.md
+++ b/spec/eofv0_verkle.md
@@ -143,16 +143,23 @@ There are new risks this method introduces.
    from being assigned to a new instruction.
 2. The heuristic will be considered by compilers optimizing for code size.
 
-### Prove all jump targets are valid
+### Prove jump targets are valid
+
+#### Prove all "static jumps"
+
+By "static jump" we mean a jump instruction directly preceded by a `PUSH` instruction.
+
+In Solidity-generated code all `JUMPI` instructions and 85% of `JUMP` instructions are "static"
+(these numbers must be verified on a bigger sample of contracts).
+
+We can easily validate all static jumps and mark a contract as "all static jumps valid"
+at deploy time. Then at runtime static jumps can be executed without accessing the *jumpdest* section.
+
+#### Prove all jumps
 
 If we can prove that all jump targets in the code are valid,
 then there is no need for the *jumpdest* section.
 
-In Solidity-generated code all `JUMPI` instructions are "static"
-(preceded by a `PUSH` instruction).
-Only some `JUMP` instructions are not "static" because they are used to implement
-returns from functions.
-
 Erigon project has a
 [prototype analysis tool](https://github.com/ledgerwatch/erigon/blob/devel/cmd/hack/flow/flow.go#L488)
 which is able to prove all jump validity for 95+% of contracts.
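
The deploy-time check for "static jumps" described in this patch can be sketched in Python. This is a hedged illustration, not part of the patch: `all_static_jumps_valid` is a hypothetical name, and treating only `PUSH1`..`PUSH32` as the preceding push (ignoring `PUSH0`) is an assumption made to keep the sketch short.

```python
# Illustrative sketch only; the function name and the PUSH1..PUSH32-only rule
# are assumptions, not something the draft specifies.
JUMP = 0x56
JUMPI = 0x57
JUMPDEST = 0x5B
PUSH1 = 0x60
PUSH32 = 0x7F


def all_static_jumps_valid(code: bytes) -> bool:
    """Return True if every JUMP/JUMPI directly preceded by a PUSH targets a JUMPDEST."""
    jumpdests = set()
    static_targets = []
    pushed = None  # immediate of the directly preceding PUSH, if any
    pc = 0
    while pc < len(code):
        op = code[pc]
        if PUSH1 <= op <= PUSH32:
            size = op - PUSH1 + 1
            pushed = int.from_bytes(code[pc + 1 : pc + 1 + size], "big")
            pc += size + 1
            continue
        if op == JUMPDEST:
            jumpdests.add(pc)
        elif op in (JUMP, JUMPI) and pushed is not None:
            # The jump destination is the top stack item, i.e. the value just pushed.
            static_targets.append(pushed)
        pushed = None
        pc += 1
    # Targets are checked after the full scan because jumps may point forward.
    return all(target in jumpdests for target in static_targets)
```

A contract for which this returns `True` could carry the "all static jumps valid" mark, so only dynamic jumps would still need a lookup in the *jumpdest* section at runtime.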