Skip to content

Commit

Permalink
[wip] pointer schema
Browse files Browse the repository at this point in the history
  • Loading branch information
gnidan committed Feb 6, 2024
1 parent 1fee602 commit bf85f5c
Show file tree
Hide file tree
Showing 44 changed files with 1,745 additions and 0 deletions.
2 changes: 2 additions & 0 deletions packages/tests/schemas/examples.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,8 @@ const idsOfSchemasAllowedToOmitExamples = new Set([
"schema:ethdebug/format/type",
"schema:ethdebug/format/type/complex",
"schema:ethdebug/format/type/elementary",
"schema:ethdebug/format/pointer/region",
"schema:ethdebug/format/pointer/collection",
]);

describe("Examples", () => {
Expand Down
8 changes: 8 additions & 0 deletions packages/web/spec/pointer/_category_.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
{
"label": "ethdebug/format/pointer",
"position": 4,
"link": {
"type": "generated-index",
"description": "Work-in-progress formal schema for ethdebug format"
}
}
8 changes: 8 additions & 0 deletions packages/web/spec/pointer/collection/_category_.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
{
"label": "Collections",
"position": 5,
"link": {
"type": "generated-index",
"description": "Work-in-progress formal schema for ethdebug format"
}
}
11 changes: 11 additions & 0 deletions packages/web/spec/pointer/collection/collection.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
---
sidebar_position: 1
---

import SchemaViewer from "@site/src/components/SchemaViewer";

# Collection schema

<SchemaViewer
schema={{ id: "schema:ethdebug/format/pointer/collection" }}
/>
11 changes: 11 additions & 0 deletions packages/web/spec/pointer/collection/conditional.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
---
sidebar_position: 4
---

import SchemaViewer from "@site/src/components/SchemaViewer";

# Conditional

<SchemaViewer
schema={{ id: "schema:ethdebug/format/pointer/collection/conditional" }}
/>
11 changes: 11 additions & 0 deletions packages/web/spec/pointer/collection/group.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
---
sidebar_position: 2
---

import SchemaViewer from "@site/src/components/SchemaViewer";

# Group

<SchemaViewer
schema={{ id: "schema:ethdebug/format/pointer/collection/group" }}
/>
11 changes: 11 additions & 0 deletions packages/web/spec/pointer/collection/list.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
---
sidebar_position: 3
---

import SchemaViewer from "@site/src/components/SchemaViewer";

# List

<SchemaViewer
schema={{ id: "schema:ethdebug/format/pointer/collection/list" }}
/>
309 changes: 309 additions & 0 deletions packages/web/spec/pointer/concepts.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,309 @@
---
sidebar_position: 2
---

import SchemaViewer from "@site/src/components/SchemaViewer";

# Key concepts

## A **pointer** is a region or a collection of other pointers

High-level languages allow programmers to describe and manipulate conceptual
ideas as succinct, individual building blocks of machine execution.
Nowadays, thanks to decades of compiler research, resulting low-level machine
states are often reliably indecipherable, bearing no resemblance at all to
even basic data abstractions.

The **ethdebug/format/pointer** schema provides a tree-based syntax for
representing complete (and often minutely detailed) address information for
finding a particular high-level data object. (For instance: a compiler may need
to inform a debugger about where to find a particular array in memory.)
As such, this schema is specified recursively: a pointer is either a single,
continuous sequence of bytes addresses (a **region**), or it aggregates other
pointers (a **collection**).

## A **region** is a single continuous range of byte addresses

For simple allocations (like those that fit into a single word), the
**ethdebug/format/pointer** representation is also quite simple: just a single,
optionally named, continuous chunk of bytes in the machine state.

<details>
<summary>
**Example**: Pointing to the first 32 bytes of memory
</summary>

```json
{
"name": "memory-start",
"location": "memory",
"offset": "0x0000000000000000000000000000000000000000000000000000000000000000",
"length": 32
}
```

</details>

This schema defines the concept of a **region** to be the representation
of the addressing details for a particular block of continuous bytes. Different
data locations use different, location-specific schemas for regions
(since, e.g., stack regions are very different than storage regions). The
**ethdebug/format/pointer/region** schema aggregates these using the
`"location"` field as a polymorphic discriminator.

## A **collection** aggregates other pointers

Other allocations are not so cleanly represented by a single continuous block
of bytes anywhere. In these situations, the **ethdebug/format/pointer**
representation can describe the aggregation of other composed pointers.

<details>
<summary>
**Example**: Solidity `struct` in memory
</summary>

Consider the `struct` definition:
```solidity
struct Record {
uint256 x;
uint256 y;
}
```

A minimal way to represent one possible memory allocation for this type is
to `"group"` together two single-region pointers for each of the two struct
members.

```json
{
"group": [{
"location": "memory",
"offset": "0x40",
"length": 32
}, {
"location": "memory",
"offset": "0x60",
"length": 32
}]
}
```

</details>

This kind of pointer is defined to be a **collection** of other pointers.
This schema includes several sub-schemas for different kinds of collections
(e.g., since the allocation of struct types are very different from the
allocation of array types, etc.). The **ethdebug/format/pointer/collection**
schema aggregates these.


## Pointers allow describing a value as a complex **expression**

The examples above all use literal values for `"offset"` and `"length"`, but
these will very often be impossible to predict at compile-time. This format
provides a JSON-based **expression** syntax for relating the data components
involved in a complex allocation.

Besides representing byte offsets, word addresses, byte range lengths, etc.
using just unsigned integer literals, this schema also allows representing
addressing details via numeric operations, references to named regions,
explicit EVM lookup, and so on.
reference to other regions by name, and a few builtin operations.

<details>
<summary>
**Example**: Region defined using an arithmetic operation
</summary>

```json
{
"location": "memory",
"offset": {
"$sum": [1, 1]
},
"length": 1
}
```

</details>

More expressively, regions whose representations include a
`"name": "<identifier>"` property allow the use of this `<identifier>` in
references to this region elsewhere in the pointer representation.

This can be used, for example, to indicate a storage slot whose address is
known at some point in execution to be the value at the top of the stack.

<details>
<summary>
**Example**: Naming a region and reading this region's data from machine state
</summary>

```json
{
"group": [{
"name": "pointer-to-storage-slot",
"location": "stack",
"slot": 0
}, {
"name": "storage-slot",
"location": "storage",
"slot": {
"$read": "pointer-to-storage-slot"
}
}]
}
```

</details>

Regions can also be referenced for the purposes of copying fields (to
avoid duplicating constants, etc.) This is useful, e.g., with structs, whose
members often are allocated one after the next with no gap. In this example,
the sub-pointer corresponding to `y` is positioned based on `x`'s offset
and length.

<details>
<summary>
**Example**: Defining regions in terms of other regions
</summary>

```json
{
"group": [{
"name": "record-pointer",
"location": "stack",
"slot": 0
}, {
"name": "record-x",
"location": "memory",
"offset": {
"$read": "record-pointer"
},
"length": 32
}, {
"name": "record-y",
"location": "memory",
"offset": {
"$sum": [{ ".offset": "record-x" }, { ".length": "record-x" }]
},
"length": 32
}]
}
```

</details>

This schema is designed to allow compilers maximal expressiveness in producing
self-contained representations that are completely knowable at compile-time.

## Collections can be dynamic

For collections whose cardinality or configuration is unpredictable at
compile-time (e.g., `uint256[]` or `string storage` allocations, respectively),
and for collections whose static representation would simply be too cumbersome
(e.g., `bytes32[3200][5600][111]` allocations),
**ethdebug/format/pointer/collection** provides sub-schemas for describing the
full set of inter-related data addresses in dynamic terms.

<details>
<summary>
**Example**: Representing a dynamically-sized array allocation
</summary>

The following represents an allocation where the array's item count is stored
as the leading word-sized sequence of bytes, and each item in the array has
the same fixed size and appears in memory sequentially following that with no
gaps.

Notice how `"list": { ... }` expects an object with an expression for the
value of `"count"`, the name of the scalar variable to represent `"each"`
item's index in the list, and the underlying pointer for the item itself.

```json
{
"group": [
{
"name": "array-count",
"location": "memory",
"offset": "0x40",
"length": 32
},
{
"list": {
"count": { "$read": "array-count" },
"each": "item-index",
"is": {
"name": "array-item",
"location": "memory",
"offset": {
"$sum": [
{ ".offset": "array-count" },
{ ".length": "array-count" },
"$product": [
"item-index",
{ ".length": "array-item" }
]
]
},
"length": 32
}
}
}
]
}
```

</details>

## A region is specified in terms of an **addressing scheme**

The EVM models its various data locations in a couple different ways based on
how bytes are defined to be arranged in each location: e.g., storage is
arranged in slots, but memory is just one long bytes array.

This pointer schema does not make any attempt to unify these different access
abstractions, but instead it defines the concept of an **addressing scheme**.

Each location-specific region schema is defined as an extension of the schema
for a particular addressing scheme.

This format currently defines two such addressing schemes:
**ethdebug/format/pointer/scheme/slice** and
**ethdebug/format/pointer/scheme/segment**, for addressing ranges within a
single continuous byte array and addressing a slot or collection of slots
in a word-arranged locaton (respectively).

<details>
<summary>
**Example**: Slice-based region
</summary>

This example addresses the 32 bytes starting at location `0x3334` in calldata:

```json
{
"location": "calldata",
"offset": "0x3334",
"length": "0x20"
}
```

</details>

<details>
<summary>
**Example**: Segment-based region
</summary>

This example addresses the first four slots of transient storage (beginning at
slot 0).

```json
{
"location": "transient",
"slot": 0
}
```

</details>
Loading

0 comments on commit bf85f5c

Please sign in to comment.