From a32c9850a50a122e27293618dfe4449c46a8b398 Mon Sep 17 00:00:00 2001
From: Rob Mulholand
Date: Tue, 12 Feb 2019 15:41:44 -0600
Subject: [PATCH 1/2] Add first draft of storage transformers readme

- Explain process of creating a new storage transformer
---
 pkg/transformers/factories/storage/README.md | 124 +++++++++++++++++++
 1 file changed, 124 insertions(+)
 create mode 100644 pkg/transformers/factories/storage/README.md

diff --git a/pkg/transformers/factories/storage/README.md b/pkg/transformers/factories/storage/README.md
new file mode 100644
index 000000000..1196001b5
--- /dev/null
+++ b/pkg/transformers/factories/storage/README.md
@@ -0,0 +1,124 @@
+# Watching Contract Storage
+
+One approach VulcanizeDB takes to caching and indexing smart contracts is to ingest raw contract storage values.
+Assuming that you are running an ethereum node that is writing contract storage changes to a CSV file, VulcanizeDB can parse them and persist the results to postgres.
+
+## Assumptions
+
+The current approach for caching smart contract storage diffs assumes that you are running a node that is writing contract storage diffs to a CSV file.
+The CSV file is expected to have 5 columns: contract address, block hash, block number, storage key, storage value.
+
+We have [a branch on vulcanize/parity-ethereum](https://github.com/vulcanize/parity-ethereum/tree/watch-storage-diffs) that enables running a node that writes storage diffs this way.
+We also have [sample data](https://github.com/8thlight/maker-vulcanizedb/pull/132/files) that comes from running that node against Kovan through block 9796184.
+
+Looking forward, we would like to isolate this assumption as much as possible.
+We may end up needing to read CSV data that is formatted differently, or reading data from a non-CSV source, and we do not want resulting changes to cascade throughout the codebase.
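To make the five-column assumption concrete, here is a minimal sketch of turning one CSV record into a typed row. The `storageDiffRow` struct and `parseRow` function are hypothetical names chosen for illustration, not the project's `shared.StorageDiffRow`, and the sample values are made up.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// storageDiffRow is a hypothetical stand-in for the project's row type.
// Fields follow the column order described above.
type storageDiffRow struct {
	Contract     string
	BlockHash    string
	BlockHeight  int
	StorageKey   string
	StorageValue string
}

// parseRow converts one CSV record into a storageDiffRow, rejecting
// records that do not have exactly the five expected columns.
func parseRow(record []string) (storageDiffRow, error) {
	if len(record) != 5 {
		return storageDiffRow{}, fmt.Errorf("expected 5 columns, got %d", len(record))
	}
	height, err := strconv.Atoi(record[2])
	if err != nil {
		return storageDiffRow{}, fmt.Errorf("invalid block number %q: %v", record[2], err)
	}
	return storageDiffRow{
		Contract:     record[0],
		BlockHash:    record[1],
		BlockHeight:  height,
		StorageKey:   record[3],
		StorageValue: record[4],
	}, nil
}

func main() {
	// Made-up sample line; real values would be full-length hex strings.
	line := "0xcontract,0xblockhash,9796184,0xkey,0xvalue\n"
	record, err := csv.NewReader(strings.NewReader(line)).Read()
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	row, err := parseRow(record)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", row)
}
```

Keeping the column handling in one small function like this is one way to isolate the CSV assumption, so that a different file layout or data source would only require a different parser.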
+
+## Shared Code
+
+VulcanizeDB has shared code for continuously reading from the CSV file written by the ethereum node and writing a parsed version of each row to postgres.
+
+### Storage Watcher
+
+The storage watcher is responsible for continuously delegating CSV rows to the appropriate transformer as they are being written by the ethereum node.
+It maintains a mapping of contract addresses to transformers, and will ignore storage diff rows for contract addresses that do not have a corresponding transformer.
+
+The storage watcher is currently initialized from the `parseStorageDiffs` command, which also adds transformers that the watcher should know about in its mapping of addresses to transformers.
+
+### Storage Transformer
+
+The storage transformer is responsible for converting raw contract storage hex values into useful data and writing them to postgres.
+The storage transformer depends on contract-specific implementations of code capable of recognizing storage keys and writing the matching (decoded) storage value to disk.
+
+```golang
+func (transformer Transformer) Execute(row shared.StorageDiffRow) error {
+    metadata, lookupErr := transformer.Mappings.Lookup(row.StorageKey)
+    if lookupErr != nil {
+        return lookupErr
+    }
+    value, decodeErr := shared.Decode(row, metadata)
+    if decodeErr != nil {
+        return decodeErr
+    }
+    return transformer.Repository.Create(row.BlockHeight, row.BlockHash.Hex(), metadata, value)
+}
+```
+
+## Custom Code
+
+In order to watch an additional smart contract, a developer must create three things:
+
+1. Mappings - specify how to identify keys in the contract's storage trie.
+1. Repository - specify how to persist a parsed version of the storage value matching the recognized storage key.
+1. Instance - create an instance of the storage transformer that uses your mappings and repository.
+
+### Mappings
+
+```golang
+type Mappings interface {
+    Lookup(key common.Hash) (shared.StorageValueMetadata, error)
+    SetDB(db *postgres.DB)
+}
+```
+
+A contract-specific implementation of the mappings interface enables the storage transformer to fetch metadata associated with a storage key.
+
+Storage metadata contains: the name of the variable matching the storage key, a raw version of any keys associated with the variable (if the variable is a mapping), and the variable's type.
+
+```golang
+type StorageValueMetadata struct {
+    Name string
+    Keys map[Key]string
+    Type ValueType
+}
+```
+
+Keys are only relevant if the variable is a mapping. For example, in the following Solidity code:
+
+```solidity
+pragma solidity ^0.4.0;
+
+contract Contract {
+    uint x;
+    mapping(address => uint) y;
+}
+```
+
+The metadata for variable `x` would not have any associated keys, but the metadata for a storage key associated with `y` would include the address used to specify that key's index in the mapping.
+
+The `SetDB` function is required for the mappings to connect to the database.
+A database connection may be desired when keys in a mapping variable need to be read from log events (e.g. to look up what addresses may exist in `y`, above).
+
+### Repository
+
+```golang
+type Repository interface {
+    Create(blockNumber int, blockHash string, metadata shared.StorageValueMetadata, value interface{}) error
+    SetDB(db *postgres.DB)
+}
+```
+
+A contract-specific implementation of the repository interface enables the transformer to write the decoded storage value to the appropriate table in postgres.
+
+The `Create` function is expected to recognize and persist a given storage value by the variable's name, as indicated on the row's metadata.
+
+The `SetDB` function is required for the repository to connect to the database.
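The two interfaces above can be tried out without a database. The sketch below uses in-memory stand-ins, under several simplifying assumptions: storage keys and mapping keys are plain strings rather than `common.Hash` and `Key`, `SetDB` is omitted, and `valueMetadata`, `memoryMappings`, `memoryRepository`, and `errKeyNotFound` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// valueMetadata mirrors the shape of StorageValueMetadata, with plain
// strings standing in for the project's Key and ValueType types.
type valueMetadata struct {
	Name string
	Keys map[string]string
	Type string
}

// errKeyNotFound is a hypothetical error for unrecognized storage keys.
var errKeyNotFound = errors.New("storage key not recognized")

// memoryMappings looks up metadata from a fixed in-memory table.
type memoryMappings struct {
	known map[string]valueMetadata
}

func (m memoryMappings) Lookup(key string) (valueMetadata, error) {
	metadata, ok := m.known[key]
	if !ok {
		return valueMetadata{}, errKeyNotFound
	}
	return metadata, nil
}

// memoryRepository appends a formatted record instead of writing to postgres.
type memoryRepository struct {
	saved []string
}

func (r *memoryRepository) Create(blockNumber int, blockHash string, metadata valueMetadata, value interface{}) error {
	r.saved = append(r.saved, fmt.Sprintf("%s@%d=%v", metadata.Name, blockNumber, value))
	return nil
}

func main() {
	mappings := memoryMappings{known: map[string]valueMetadata{
		"0x00": {Name: "x", Type: "uint256"},
	}}
	repository := &memoryRepository{}

	metadata, err := mappings.Lookup("0x00")
	if err != nil {
		fmt.Println("lookup error:", err)
		return
	}
	if err := repository.Create(1, "0xhash", metadata, 42); err != nil {
		fmt.Println("create error:", err)
		return
	}
	fmt.Println(repository.saved)

	// An unknown key surfaces as an error the transformer can return.
	_, err = mappings.Lookup("0xff")
	fmt.Println(err)
}
```

Stand-ins like these can also be handy in tests of the shared transformer, since they exercise the lookup and persistence flow without a running postgres instance.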
+
+### Instance
+
+```golang
+type Transformer struct {
+    Address    common.Address
+    Mappings   storage_diffs.Mappings
+    Repository storage_diffs.Repository
+}
+```
+
+A new instance of the storage transformer is initialized with the contract-specific mappings and repository, as well as the contract's address.
+The contract's address is included so that the watcher can query that value from the transformer in order to build up its mapping of addresses to transformers.
+
+## Summary
+
+To begin watching an additional smart contract, create a new mappings file for looking up storage keys on that contract, a repository for writing storage values from the contract, and initialize a new storage transformer instance with the mappings, repository, and contract address.
+
+The new instance, wrapped in an initializer that calls `SetDB` on the mappings and repository, should be passed to the `AddTransformers` function on the storage watcher.
\ No newline at end of file

From 9b5794201ce18a022352517729c093e3f7a5c722 Mon Sep 17 00:00:00 2001
From: Rob Mulholand
Date: Wed, 13 Feb 2019 20:15:04 -0600
Subject: [PATCH 2/2] add example contract and instructions for watching its
 storage

---
 pkg/transformers/factories/storage/EXAMPLE.md | 167 ++++++++++++++++++
 1 file changed, 167 insertions(+)
 create mode 100644 pkg/transformers/factories/storage/EXAMPLE.md

diff --git a/pkg/transformers/factories/storage/EXAMPLE.md b/pkg/transformers/factories/storage/EXAMPLE.md
new file mode 100644
index 000000000..288b61f41
--- /dev/null
+++ b/pkg/transformers/factories/storage/EXAMPLE.md
@@ -0,0 +1,167 @@
+# Storage Transformer Example
+
+In the Storage Transformer README, we went over code that needs to be written to add a new storage transformer to VulcanizeDB.
+In this document, we'll go over an example contract and discuss how one would go about watching its storage.
+
+## Example Contract
+
+For the purposes of this document, we'll be assuming that we're interested in watching the following contract:
+
+```solidity
+pragma solidity ^0.5.1;
+
+contract Contract {
+    uint256 public num_addresses;
+    mapping(address => uint) public addresses;
+
+    event AddressAdded(
+        address addr,
+        uint256 num_addrs
+    );
+
+    constructor() public {
+        addresses[msg.sender] = 1;
+        num_addresses = 1;
+    }
+
+    function add_address(address addr) public {
+        bool exists = addresses[addr] > 0;
+        addresses[addr] = addresses[addr] + 1;
+        if (!exists) {
+            emit AddressAdded(addr, ++num_addresses);
+        }
+    }
+}
+```
+
+Disclaimer: this contract has not been audited and is not intended to be modeled or used in production. :)
+
+This contract persists two values in its storage:
+
+1. `num_addresses`: the total number of unique addresses known to the contract.
+2. `addresses`: a mapping that records the number of times an address has been added to the contract.
+
+It also emits an event each time a new address is added into the contract's storage.
+
+## Custom Code
+
+In order to monitor the state of this smart contract, we'd need to implement: an event transformer, a mappings namespace, and a repository.
+We will go through each of these in turn.
+
+### Event Transformer
+
+Given that the contract's storage includes a mapping, `addresses`, we will need to be able to identify the keys to that mapping that exist in the system so that we can recognize contract storage keys that correspond to non-zero values in that mapping.
+
+The simplest way to be aware of keys used in a contract's mapping is to listen for contract events that emit the keys that are used in its mapping(s).
+Since this contract includes an event, `AddressAdded`, that is emitted each time a new address is added to the `addresses` mapping, we will want to listen for those events and cache the addresses that map to non-zero values.
+
+Please see the event transformer README for detailed instructions about developing this code.
+In short, it should be feasible to recognize `AddressAdded` events on the blockchain and parse them to keep a record of addresses that have been added to the system.
+
+### Mappings
+
+If we point an ethereum node at a blockchain hosting this contract and our node is equipped to write out storage changes happening on this contract, we will expect such changes to appear each time `add_address` (which modifies the `addresses` mapping) is called.
+
+In order for those changes, which include raw hex versions of storage keys and storage values, to be useful to us, we need to know how to recognize and parse them.
+Our mappings file should assist us with both of these tasks: the `Lookup` function should recognize raw storage keys and return known metadata about the storage value.
+
+In order to perform this lookup, the mappings file should maintain its own mapping of known storage keys to the corresponding storage value metadata.
+This internal mapping should contain the storage key for `num_addresses` as well as a storage key for each `addresses` key known to be associated with a non-zero value.
+
+#### num_addresses
+
+`num_addresses` is the first variable declared on the contract, and it is a simple (non-array, non-mapping) type.
+Therefore, we know that its storage key is `0000000000000000000000000000000000000000000000000000000000000000`.
+The storage key for non-array and non-mapping variables is (usually*) the index of the variable on the contract's storage.
+If we see a storage diff being emitted from this contract with this storage key, we know that the `num_addresses` variable has been modified.
+
+In this case, we would expect that the call `mappings.Lookup("0000000000000000000000000000000000000000000000000000000000000000")` would return metadata corresponding to the `num_addresses` variable.
+This metadata would probably look something like:
+
+```golang
+shared.StorageValueMetadata{
+    Name: "num_addresses",
+    Keys: nil,
+    Type: shared.Uint256,
+}
+```
+
+* Occasionally, multiple variables may be packed into one storage slot, which complicates a direct translation of the index of the variable on the contract to its storage key.
+
+#### addresses
+
+`addresses` is the second variable declared on the contract, but it is a mapping.
+Since it is a mapping, the storage key is more complex than `0000000000000000000000000000000000000000000000000000000000000001` (which would be the key for the variable if it were not an array or mapping).
+Having a single storage slot for an entire mapping would not work, since there can be an arbitrary number of entries in a mapping, and a single storage value slot is constrained to 32 bytes.
+
+The way that smart contract mappings are maintained in storage (in Solidity) is by creating a new storage key/value pair for each entry in the mapping, where the storage key is a hash of the occupied slot's key (left-padded to 32 bytes) concatenated with the mapping's index on the contract.
+Given an occupied slot's key, `k`, and a mapping's index on the contract, `i`, both as 32-byte hex strings, we can generate the storage key with the following code:
+
+```golang
+func GetMappingStorageKey(k, i string) string {
+    return common.BytesToHash(crypto.Keccak256(common.FromHex(k + i))).Hex()
+}
+```
+
+If we were to call the contract's `add_address` function with `0xde0B295669a9FD93d5F28D9Ec85E40f4cb697BAe`, we would expect to see an `AddressAdded` event emitted, with `0xde0B295669a9FD93d5F28D9Ec85E40f4cb697BAe` in its payload.
+From that event, we would know that there exists in the contract's storage a storage key of (with the address left-padded to 32 bytes):
+
+```golang
+GetMappingStorageKey("0x000000000000000000000000de0B295669a9FD93d5F28D9Ec85E40f4cb697BAe", "0000000000000000000000000000000000000000000000000000000000000001")
+```
+
+Executing the above code results in: `0x0f96a1133cfd5b94c329aa0526b5962bd791dbbfc481ca82f7d4a439e1e9bc40`.
+
+Therefore, the first time `add_address` was called for this address, we would also expect to see a storage diff with a key of `0x0f96a1133cfd5b94c329aa0526b5962bd791dbbfc481ca82f7d4a439e1e9bc40` and a value of `0000000000000000000000000000000000000000000000000000000000000001`.
+This would be the indication that in contract storage, the address `0xde0B295669a9FD93d5F28D9Ec85E40f4cb697BAe` maps to the value 1.
+
+Given that we knew this address was a key in the mapping from our event transformer, we would expect a call to `mappings.Lookup("0x0f96a1133cfd5b94c329aa0526b5962bd791dbbfc481ca82f7d4a439e1e9bc40")` to return metadata corresponding to _this slot_ in the addresses mapping:
+
+```golang
+shared.StorageValueMetadata{
+    Name: "addresses",
+    Keys: map[Key]string{Address: "0xde0B295669a9FD93d5F28D9Ec85E40f4cb697BAe"},
+    Type: shared.Uint256,
+}
+```
+
+### Repository
+
+Once we have recognized a storage diff, we can decode the storage value to the data's known type.
+Since the metadata tells us that the above values are `uint256`, we can decode a value like `0000000000000000000000000000000000000000000000000000000000000001` to `1`.
+
+The purpose of the contract-specific repository is to write that value to the database in a way that makes it useful for future queries.
+Typically, this involves writing the block hash, block number, decoded value, and any keys in the metadata to a table.
+
+The current repository interface has a generalized `Create` function that can accept any arbitrary storage row along with its metadata.
+This is deliberate, to facilitate shared use of the common storage transformer.
+An implication of this decision is that the `Create` function typically includes a `switch` statement that selects which table to write to, as well as what data to include, based on the name of the variable as defined in the metadata.
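The decoding step mentioned at the start of this section can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's `shared.Decode`: the `decodeUint256` helper is hypothetical, and it accepts an optional `0x` prefix and values shorter than a full 32-byte slot for convenience.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"math/big"
	"strings"
)

// decodeUint256 interprets a hex-encoded storage value as an unsigned
// big-endian integer of at most 32 bytes.
func decodeUint256(raw string) (*big.Int, error) {
	trimmed := strings.TrimPrefix(raw, "0x")
	if len(trimmed) == 0 || len(trimmed) > 64 {
		return nil, fmt.Errorf("expected 1 to 64 hex characters, got %d", len(trimmed))
	}
	if len(trimmed)%2 == 1 {
		// hex.DecodeString needs an even number of digits.
		trimmed = "0" + trimmed
	}
	decoded, err := hex.DecodeString(trimmed)
	if err != nil {
		return nil, fmt.Errorf("invalid hex value: %v", err)
	}
	return new(big.Int).SetBytes(decoded), nil
}

func main() {
	value, err := decodeUint256("0000000000000000000000000000000000000000000000000000000000000001")
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(value.String())
}
```

A `*big.Int` is used because a `uint256` can exceed the range of Go's built-in integer types; a repository would typically pass its decimal string form to postgres as a numeric value.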
+
+An example implementation of `Create` for our example contract above might look like:
+
+```golang
+func (repository AddressStorageRepository) Create(blockNumber int, blockHash string, metadata shared.StorageValueMetadata, value interface{}) error {
+    switch metadata.Name {
+    case "num_addresses":
+        _, err := repository.db.Exec(`INSERT INTO storage.num_addresses (block_hash, block_number, n) VALUES ($1, $2, $3)`,
+            blockHash, blockNumber, value)
+        return err
+    case "addresses":
+        _, err := repository.db.Exec(`INSERT INTO storage.addresses (block_hash, block_number, address, n) VALUES ($1, $2, $3, $4)`,
+            blockHash, blockNumber, metadata.Keys[Address], value)
+        return err
+    default:
+        panic(fmt.Sprintf("unrecognized contract storage name: %s", metadata.Name))
+    }
+}
+```
+
+## Summary
+
+With our very simple address storing contract, we would be able to read its storage diffs by implementing an event transformer, mappings, and a repository.
+
+The mappings would be able to look up storage keys reflecting `num_addresses` or any slot in `addresses`, using addresses derived from watching the `AddressAdded` event for the latter.
+
+The repository would be able to persist the value of `num_addresses` or any slot in `addresses`, using metadata returned from the mappings.
+
+The mappings and repository could be plugged into the common storage transformer, enabling us to know the contract's state as it is changing.
\ No newline at end of file