Add mapping key identification to format #37

Merged
merged 3 commits into from
Aug 30, 2023
14 changes: 14 additions & 0 deletions docs/source/format.md
@@ -54,6 +54,11 @@ For example, the Solidity compiler will in some cases perform a "tail-call" opti
the compiler will push the entry point of `f` as the return address for the call to `g`. The format should
help explicitly identify the targets of internal function calls and what arguments are being passed on the stack.

### Mapping key identification

EVM languages commonly include non-enumerable mappings. As such, it is useful to be able to dynamically identify any mapping keys that may appear
while analyzing a transaction trace or debugging.

## The Format

The format will be JSON so that it may be included in the standard input/output APIs that the Vyper and Solidity compilers support.
@@ -229,6 +234,7 @@ is itself a dictionary that (optionally) includes some of the following:
* The AST ID(s) that "correspond" to the opcode
* The layout of the stack, including type information and local variable names (if available)
* Jump target information (if available/applicable)
* Identification of mapping key information

In the above "correspond" roughly means "what source code caused the generation of this opcode".

@@ -238,6 +244,7 @@ that contributed to the generation of this opcode.
* `ast`: A list of AST ids for the "closest" AST node that contributed to the generation of this opcode.
* `stack`: A layout of the stack as understood by the compiler, represented as a list.
* `jumps`: If present, provides hints about the location being jumped to by a jump instruction (JUMP or JUMPI).
* `mappings`: If present, contains information about how the opcode relates to mapping keys.
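Since every field in the list above is optional, a consumer has to check for `mappings` before using it. The following is a minimal Python sketch of that check, not part of the proposal. It assumes the per-opcode dictionaries are collected in a container keyed by bytecode offset, which this excerpt does not confirm; the function name `opcodes_with_mappings` and all sample values are hypothetical.

```python
from typing import Any


def opcodes_with_mappings(entries: dict[int, dict[str, Any]]) -> list[int]:
    """Return the sorted offsets whose entry carries a `mappings` field.

    `entries` is assumed (not specified by the format) to map a bytecode
    offset to the per-opcode dictionary described above.
    """
    return sorted(pc for pc, entry in entries.items() if "mappings" in entry)


# Invented placeholder data: only the entry at offset 5 has `mappings`.
entries = {
    0: {"ast": [7], "stack": []},
    5: {"ast": [12], "stack": [], "mappings": {}},
}

print(opcodes_with_mappings(entries))  # prints [5]
```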

#### Source Locations

@@ -321,3 +328,10 @@ If the value of `sort` is `"return"`, then the dictionary has the following fiel
* `returns`: A list of dictionaries with the same format as the `arguments` array of `call`, but without any `return_address` entries.

**Discussion**: The above proposal doesn't really handle the case of "tail-calls" identified at the beginning of this document, where multiple return addresses can be pushed onto the stack. Is that something the debug format must explicitly model?

#### Mapping key identification

The value of this field (when present) is a dictionary with (some of) the following fields:
* `isMappingHash`: A boolean that identifies whether the opcode is computing a hash for a mapping.
Contributor

This is a bit vague, but I'm assuming only a single opcode that returns the final result should be marked? I.e. you're not interested in the whole sequence of opcodes that are involved in a high-level operation of indexing a mapping, only the final step giving you the hash, right?

Contributor Author

Yes, that's what I meant, sorry, likely that should be clarified.

Of course, there's one big exception to that statement, which is the case of mapping computations that themselves involve a hash at an earlier step (as can happen in Vyper). In that case the earlier SHA3 instruction should also be marked, but as a prehash instead.

* `isMappingPreHash`: For mappings that use two hashes, a boolean that identifies whether the opcode is computing the first of the two hashes. Possibly this field should be combined with the previous one into some sort of enum?
* `mappingHashFormat`: An enumeration that specifies the format of what gets hashed for the mapping. Formats could include `"prefix"` (for Solidity), `"postfix"` (for Vyper value types), and `"postfix-prehashed"` (for Vyper strings and bytestrings). Possibly `"prefix"` could be split further into `"prefix-padded"` (for Solidity value types) and `"prefix-unpadded"` (for Solidity strings and bytestrings). This could be expanded in the future if necessary. (Also, `"prefix-padded"`, if split out, could potentially be broken down even further by padding type: zero-padding (left) vs. sign-padding vs. zero-padding (right).)
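To make the three proposed format names concrete, here is a rough Python sketch of how a debugger might use these fields. It rests on assumptions the text does not spell out: that "prefix" and "postfix" describe where the key sits relative to a 32-byte slot word in the hashed data, and that the caller passes the key already encoded as bytes. The helper names (`pad32`, `mapping_preimage`, `mapping_role`) are hypothetical. The hash function is passed in because Python's standard library has no Keccak-256 (`hashlib.sha3_256` is the standardized SHA-3, which produces different digests).

```python
from typing import Any, Callable, Optional

WORD = 32  # EVM word size in bytes


def pad32(value: int) -> bytes:
    """Encode a non-negative integer as a 32-byte big-endian word."""
    return value.to_bytes(WORD, "big")


def mapping_preimage(
    fmt: str,
    slot: int,
    key: bytes,
    hash_fn: Callable[[bytes], bytes],
) -> bytes:
    """Return the bytes fed to the final mapping hash.

    The layouts below are an assumed reading of the format names,
    not something the proposal defines.
    """
    slot_word = pad32(slot)
    if fmt == "prefix":  # key first, then slot
        return key + slot_word
    if fmt == "postfix":  # slot first, then key
        return slot_word + key
    if fmt == "postfix-prehashed":  # slot first, then the hash of the key
        return slot_word + hash_fn(key)
    raise ValueError(f"unknown mappingHashFormat: {fmt!r}")


def mapping_role(info: dict[str, Any]) -> Optional[str]:
    """Fold the two booleans into one label: 'hash', 'prehash', or None."""
    if info.get("isMappingHash"):
        return "hash"
    if info.get("isMappingPreHash"):
        return "prehash"
    return None
```

Given a marked opcode and the preimage it hashed, a tool could invert `mapping_preimage` to recover the key for the `"prefix"` and `"postfix"` formats. For `"postfix-prehashed"` it would also need the input of the opcode marked as the prehash, which is presumably why that earlier instruction gets its own flag. `mapping_role` also illustrates the single-enum alternative floated for `isMappingPreHash`.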