-
Notifications
You must be signed in to change notification settings - Fork 4
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
44 changed files
with
1,745 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
{ | ||
"label": "ethdebug/format/pointer", | ||
"position": 4, | ||
"link": { | ||
"type": "generated-index", | ||
"description": "Work-in-progress formal schema for ethdebug format" | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
{ | ||
"label": "Collections", | ||
"position": 5, | ||
"link": { | ||
"type": "generated-index", | ||
"description": "Work-in-progress formal schema for ethdebug format" | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
--- | ||
sidebar_position: 1 | ||
--- | ||
|
||
import SchemaViewer from "@site/src/components/SchemaViewer"; | ||
|
||
# Collection schema | ||
|
||
<SchemaViewer | ||
schema={{ id: "schema:ethdebug/format/pointer/collection" }} | ||
/> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
--- | ||
sidebar_position: 4 | ||
--- | ||
|
||
import SchemaViewer from "@site/src/components/SchemaViewer"; | ||
|
||
# Conditional | ||
|
||
<SchemaViewer | ||
schema={{ id: "schema:ethdebug/format/pointer/collection/conditional" }} | ||
/> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
--- | ||
sidebar_position: 2 | ||
--- | ||
|
||
import SchemaViewer from "@site/src/components/SchemaViewer"; | ||
|
||
# Group | ||
|
||
<SchemaViewer | ||
schema={{ id: "schema:ethdebug/format/pointer/collection/group" }} | ||
/> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
--- | ||
sidebar_position: 3 | ||
--- | ||
|
||
import SchemaViewer from "@site/src/components/SchemaViewer"; | ||
|
||
# List | ||
|
||
<SchemaViewer | ||
schema={{ id: "schema:ethdebug/format/pointer/collection/list" }} | ||
/> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,309 @@ | ||
--- | ||
sidebar_position: 2 | ||
--- | ||
|
||
import SchemaViewer from "@site/src/components/SchemaViewer"; | ||
|
||
# Key concepts | ||
|
||
## A **pointer** is a region or a collection of other pointers | ||
|
||
High-level languages allow programmers to describe and manipulate conceptual | ||
ideas as succinct, individual building blocks of machine execution. | ||
Nowadays, thanks to decades of compiler research, resulting low-level machine | ||
states are often reliably indecipherable, bearing no resemblance at all to | ||
even basic data abstractions. | ||
|
||
The **ethdebug/format/pointer** schema provides a tree-based syntax for | ||
representing complete (and often minutely detailed) address information for | ||
finding a particular high-level data object. (For instance: a compiler may need | ||
to inform a debugger about where to find a particular array in memory.) | ||
As such, this schema is specified recursively: a pointer is either a single, | ||
continuous sequence of bytes addresses (a **region**), or it aggregates other | ||
pointers (a **collection**). | ||
|
||
## A **region** is a single continuous range of byte addresses | ||
|
||
For simple allocations (like those that fit into a single word), the | ||
**ethdebug/format/pointer** representation is also quite simple: just a single, | ||
optionally named, continuous chunk of bytes in the machine state. | ||
|
||
<details> | ||
<summary> | ||
**Example**: Pointing to the first 32 bytes of memory | ||
</summary> | ||
|
||
```json | ||
{ | ||
"name": "memory-start", | ||
"location": "memory", | ||
"offset": "0x0000000000000000000000000000000000000000000000000000000000000000", | ||
"length": 32 | ||
} | ||
``` | ||
|
||
</details> | ||
|
||
This schema defines the concept of a **region** to be the representation | ||
of the addressing details for a particular block of continuous bytes. Different | ||
data locations use different, location-specific schemas for regions | ||
(since, e.g., stack regions are very different than storage regions). The | ||
**ethdebug/format/pointer/region** schema aggregates these using the | ||
`"location"` field as a polymorphic discriminator. | ||
|
||
## A **collection** aggregates other pointers | ||
|
||
Other allocations are not so cleanly represented by a single continuous block | ||
of bytes anywhere. In these situations, the **ethdebug/format/pointer** | ||
representation can describe the aggregation of other composed pointers. | ||
|
||
<details> | ||
<summary> | ||
**Example**: Solidity `struct` in memory | ||
</summary> | ||
|
||
Consider the `struct` definition: | ||
```solidity | ||
struct Record { | ||
uint256 x; | ||
uint256 y; | ||
} | ||
``` | ||
|
||
A minimal way to represent one possible memory allocation for this type is | ||
to `"group"` together two single-region pointers for each of the two struct | ||
members. | ||
|
||
```json | ||
{ | ||
"group": [{ | ||
"location": "memory", | ||
"offset": "0x40", | ||
"length": 32 | ||
}, { | ||
"location": "memory", | ||
"offset": "0x60", | ||
"length": 32 | ||
}] | ||
} | ||
``` | ||
|
||
</details> | ||
|
||
This kind of pointer is defined to be a **collection** of other pointers. | ||
This schema includes several sub-schemas for different kinds of collections | ||
(e.g., since the allocation of struct types are very different from the | ||
allocation of array types, etc.). The **ethdebug/format/pointer/collection** | ||
schema aggregates these. | ||
|
||
|
||
## Pointers allow describing a value as a complex **expression** | ||
|
||
The examples above all use literal values for `"offset"` and `"length"`, but | ||
these will very often be impossible to predict at compile-time. This format | ||
provides a JSON-based **expression** syntax for relating the data components | ||
involved in a complex allocation. | ||
|
||
Besides representing byte offsets, word addresses, byte range lengths, etc. | ||
using just unsigned integer literals, this schema also allows representing | ||
addressing details via numeric operations, references to named regions, | ||
explicit EVM lookup, and so on. | ||
reference to other regions by name, and a few builtin operations. | ||
|
||
<details> | ||
<summary> | ||
**Example**: Region defined using an arithmetic operation | ||
</summary> | ||
|
||
```json | ||
{ | ||
"location": "memory", | ||
"offset": { | ||
"$sum": [1, 1] | ||
}, | ||
"length": 1 | ||
} | ||
``` | ||
|
||
</details> | ||
|
||
More expressively, regions whose representations include a | ||
`"name": "<identifier>"` property allow the use of this `<identifier>` in | ||
references to this region elsewhere in the pointer representation. | ||
|
||
This can be used, for example, to indicate a storage slot whose address is | ||
known at some point in execution to be the value at the top of the stack. | ||
|
||
<details> | ||
<summary> | ||
**Example**: Naming a region and reading this region's data from machine state | ||
</summary> | ||
|
||
```json | ||
{ | ||
"group": [{ | ||
"name": "pointer-to-storage-slot", | ||
"location": "stack", | ||
"slot": 0 | ||
}, { | ||
"name": "storage-slot", | ||
"location": "storage", | ||
"slot": { | ||
"$read": "pointer-to-storage-slot" | ||
} | ||
}] | ||
} | ||
``` | ||
|
||
</details> | ||
|
||
Regions can also be referenced for the purposes of copying fields (to | ||
avoid duplicating constants, etc.) This is useful, e.g., with structs, whose | ||
members often are allocated one after the next with no gap. In this example, | ||
the sub-pointer corresponding to `y` is positioned based on `x`'s offset | ||
and length. | ||
|
||
<details> | ||
<summary> | ||
**Example**: Defining regions in terms of other regions | ||
</summary> | ||
|
||
```json | ||
{ | ||
"group": [{ | ||
"name": "record-pointer", | ||
"location": "stack", | ||
"slot": 0 | ||
}, { | ||
"name": "record-x", | ||
"location": "memory", | ||
"offset": { | ||
"$read": "record-pointer" | ||
}, | ||
"length": 32 | ||
}, { | ||
"name": "record-y", | ||
"location": "memory", | ||
"offset": { | ||
"$sum": [{ ".offset": "record-x" }, { ".length": "record-x" }] | ||
}, | ||
"length": 32 | ||
}] | ||
} | ||
``` | ||
|
||
</details> | ||
|
||
This schema is designed to allow compilers maximal expressiveness in producing | ||
self-contained representations that are completely knowable at compile-time. | ||
|
||
## Collections can be dynamic | ||
|
||
For collections whose cardinality or configuration is unpredictable at | ||
compile-time (e.g., `uint256[]` or `string storage` allocations, respectively), | ||
and for collections whose static representation would simply be too cumbersome | ||
(e.g., `bytes32[3200][5600][111]` allocations), | ||
**ethdebug/format/pointer/collection** provides sub-schemas for describing the | ||
full set of inter-related data addresses in dynamic terms. | ||
|
||
<details> | ||
<summary> | ||
**Example**: Representing a dynamically-sized array allocation | ||
</summary> | ||
|
||
The following represents an allocation where the array's item count is stored | ||
as the leading word-sized sequence of bytes, and each item in the array has | ||
the same fixed size and appears in memory sequentially following that with no | ||
gaps. | ||
|
||
Notice how `"list": { ... }` expects an object with an expression for the | ||
value of `"count"`, the name of the scalar variable to represent `"each"` | ||
item's index in the list, and the underlying pointer for the item itself. | ||
|
||
```json | ||
{ | ||
"group": [ | ||
{ | ||
"name": "array-count", | ||
"location": "memory", | ||
"offset": "0x40", | ||
"length": 32 | ||
}, | ||
{ | ||
"list": { | ||
"count": { "$read": "array-count" }, | ||
"each": "item-index", | ||
"is": { | ||
"name": "array-item", | ||
"location": "memory", | ||
"offset": { | ||
"$sum": [ | ||
{ ".offset": "array-count" }, | ||
{ ".length": "array-count" }, | ||
"$product": [ | ||
"item-index", | ||
{ ".length": "array-item" } | ||
] | ||
] | ||
}, | ||
"length": 32 | ||
} | ||
} | ||
} | ||
] | ||
} | ||
``` | ||
|
||
</details> | ||
|
||
## A region is specified in terms of an **addressing scheme** | ||
|
||
The EVM models its various data locations in a couple different ways based on | ||
how bytes are defined to be arranged in each location: e.g., storage is | ||
arranged in slots, but memory is just one long bytes array. | ||
|
||
This pointer schema does not make any attempt to unify these different access | ||
abstractions, but instead it defines the concept of an **addressing scheme**. | ||
|
||
Each location-specific region schema is defined as an extension of the schema | ||
for a particular addressing scheme. | ||
|
||
This format currently defines two such addressing schemes: | ||
**ethdebug/format/pointer/scheme/slice** and | ||
**ethdebug/format/pointer/scheme/segment**, for addressing ranges within a | ||
single continuous byte array and addressing a slot or collection of slots | ||
in a word-arranged locaton (respectively). | ||
|
||
<details> | ||
<summary> | ||
**Example**: Slice-based region | ||
</summary> | ||
|
||
This example addresses the 32 bytes starting at location `0x3334` in calldata: | ||
|
||
```json | ||
{ | ||
"location": "calldata", | ||
"offset": "0x3334", | ||
"length": "0x20" | ||
} | ||
``` | ||
|
||
</details> | ||
|
||
<details> | ||
<summary> | ||
**Example**: Segment-based region | ||
</summary> | ||
|
||
This example addresses the first four slots of transient storage (beginning at | ||
slot 0). | ||
|
||
```json | ||
{ | ||
"location": "transient", | ||
"slot": 0 | ||
} | ||
``` | ||
|
||
</details> |
Oops, something went wrong.